Intermittent Unknown error 258 with no obvious cause

Describe the bug

Occasionally we see the following error:

Microsoft.Data.SqlClient.SqlException (0x80131904): Execution Timeout Expired.  The timeout period elapsed prior to completion of the operation or the server is not responding.
 ---> System.ComponentModel.Win32Exception (258): Unknown error 258
   at Microsoft.Data.SqlClient.SqlConnection.OnError(SqlException exception, Boolean breakConnection, Action`1 wrapCloseInAction)
   at Microsoft.Data.SqlClient.TdsParser.ThrowExceptionAndWarning(TdsParserStateObject stateObj, Boolean callerHasConnectionLock, Boolean asyncClose)
   at Microsoft.Data.SqlClient.SqlCommand.InternalEndExecuteReader(IAsyncResult asyncResult, Boolean isInternal, String endMethod)
   at Microsoft.Data.SqlClient.SqlCommand.EndExecuteReaderInternal(IAsyncResult asyncResult)
   at Microsoft.Data.SqlClient.SqlCommand.EndExecuteReaderAsync(IAsyncResult asyncResult)
   at System.Threading.Tasks.TaskFactory`1.FromAsyncCoreLogic(IAsyncResult iar, Func`2 endFunction, Action`1 endAction, Task`1 promise, Boolean requiresSynchronization)

However, SQL Server shows no long-running queries and is not using many of its resources during the periods when this happens.

It looks to be more of an intermittent connection issue but we’re unable to find any sort of root cause.

To reproduce

We’re not sure of the reproduction steps. I’ve been unable to reproduce this myself by simulating load. From what we can tell, it is more likely to happen when the pod is busy (not only with HTTP traffic, but also handling events from an external source). Equally, it can happen at random when the pod is nearly idle, which has caused us a substantial amount of confusion.

Expected behavior

Either more information on what the cause might be, or some solution to the issue. I realise the driver might not actually know the cause, and it may genuinely be a timeout from its point of view. We’re not yet sure where the problem lies, which is the biggest issue.

Further technical details

  • Microsoft.Data.SqlClient version: 3.0.1
  • .NET target: Core 3.1
  • SQL Server version: Microsoft SQL Azure (RTM) - 12.0.2000.8
  • Operating system: Docker Container - mcr.microsoft.com/dotnet/aspnet:3.1

Additional context

  • Running in AKS, against Elastic Pools.
  • SQL Server shows no long-running queries.
  • We sometimes get a TimeoutEvent from the metrics that are collected from the pool. When we do get them, the error_state varies.
    • For example, we had one this morning that was 145. We don’t know what this means and can find no information on what these codes relate to. I’ve raised a ticket with the Azure Docs team to look at this. I’ll add more here as they occur; we have not been keeping track of the error_state codes because we’re not sure whether they’re even relevant.
  • This might be related to this ticket: https://github.com/dotnet/SqlClient/issues/647
    • However, we don’t see ReadSniSyncOverAsync in our stack trace.
  • We do have event counter metrics being exported to Prometheus, but have found no obvious indicators that something is wrong.
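
For anyone trying to separate these failures from other exceptions in their logs, here is a minimal sketch of one way to recognise the inner error shown in the stack trace above. It relies only on `System.ComponentModel.Win32Exception` from the base class library. The class and method names (`TimeoutDiagnostics`, `IsWaitTimeout`) are hypothetical and not part of SqlClient. The code 258 corresponds to `WAIT_TIMEOUT` in the Win32 error table, which is most likely why it surfaces as "Unknown error 258" inside a Linux container, where no message text exists for that code.

```csharp
using System;
using System.ComponentModel;

// Hypothetical helper; not part of Microsoft.Data.SqlClient.
public static class TimeoutDiagnostics
{
    // 258 (0x102) is WAIT_TIMEOUT in the Win32 error table.
    private const int WaitTimeout = 258;

    // Walks the InnerException chain and reports whether any link is a
    // Win32Exception carrying native error code 258.
    public static bool IsWaitTimeout(Exception ex)
    {
        for (Exception current = ex; current != null; current = current.InnerException)
        {
            if (current is Win32Exception win32 && win32.NativeErrorCode == WaitTimeout)
            {
                return true;
            }
        }

        return false;
    }
}
```

A helper like this could be used in an exception filter around the failing command, for example `catch (SqlException ex) when (TimeoutDiagnostics.IsWaitTimeout(ex))`, so that these timeouts are counted or logged separately. It only identifies the symptom and says nothing about the root cause.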

Issue Analytics

  • State: open
  • Created 2 years ago
  • Reactions: 11
  • Comments: 45 (9 by maintainers)

Top GitHub Comments

DLS201 commented, Aug 10, 2022 (6 reactions)

Hello, Azure Support just told us that the product group has identified the issue and will fix it in Q4 2022.

Regards,

eisenwinter commented, Jun 27, 2023 (3 reactions)

@dazinator We only tested containers on different Linux host systems to see whether the issue was related to a particular kernel version, and it seems it is not. As mentioned, we stopped testing once the issue disappeared after we migrated the SQL Server to a newer version. I just wanted to share our experience in case it helps others.
