"Test host process crashed" error hard to diagnose

See original GitHub issue

Our CI has an intermittent failure:

The active test run was aborted. Reason: Test host process crashed
Results File: C:\a\_temp\AzDevOps_2019-6vse00024V_2021-06-24_02_47_47.trx
Test Run Aborted.

As far as I can tell, the message is printed by this code: https://github.com/microsoft/vstest/blob/eff66c00b217b355b6ee11034ff8396a618d04e3/src/Microsoft.TestPlatform.CommunicationUtilities/TestRequestSender.cs#L667

It would be really helpful here if it printed the full path to the process executable that crashed. I took a quick look but I couldn’t find where to get the executable full path from. Also it seems that the error output stream contents was empty (clientExitErrorMessage) so when the test process crashes it should print the full exception stack to the Console.Error so that the stack appears in the CI log.

Issue Analytics

  • State:open
  • Created 2 years ago
  • Reactions:7
  • Comments:42 (19 by maintainers)

github_iconTop GitHub Comments

10reactions
KirillOsenkovcommented, Jun 30, 2021

I have found out what my problem was. It’s that infamous 100ms bug striking again, I can’t even fathom how much that bug has caused our ecosystem in terms of productivity loss. I personally wasted four full days tracking this down.

I had to add --diag to my dotnet test arguments to publish the diagnostic logs, and then a separate step to upload them as artifacts.

Finally I saw this:

TpTrace Error: 0 : 6860, 7, 2021/06/26, 19:34:34.416, 8580065582, vstest.console.dll, LengthPrefixCommunicationChannel.Send: Error sending data: System.IO.IOException: Unable to write data to the transport connection: An existing connection was forcibly closed by the remote host..
 ---> System.Net.Sockets.SocketException (10054): An existing connection was forcibly closed by the remote host.
   at System.Net.Sockets.NetworkStream.Write(Byte[] buffer, Int32 offset, Int32 size)
   --- End of inner exception stack trace ---
   at System.Net.Sockets.NetworkStream.Write(Byte[] buffer, Int32 offset, Int32 size)
   at System.IO.BufferedStream.Flush()
   at System.IO.BinaryWriter.Flush()
   at Microsoft.VisualStudio.TestPlatform.CommunicationUtilities.LengthPrefixCommunicationChannel.Send(String data).
TpTrace Warning: 0 : 6860, 7, 2021/06/26, 19:34:34.417, 8580070372, vstest.console.dll, ProxyOperationManager: Failed to end session: Microsoft.VisualStudio.TestPlatform.CommunicationUtilities.Interfaces.CommunicationException: Unable to send data over channel.
 ---> System.IO.IOException: Unable to write data to the transport connection: An existing connection was forcibly closed by the remote host..
 ---> System.Net.Sockets.SocketException (10054): An existing connection was forcibly closed by the remote host.
   at System.Net.Sockets.NetworkStream.Write(Byte[] buffer, Int32 offset, Int32 size)
   --- End of inner exception stack trace ---
   at System.Net.Sockets.NetworkStream.Write(Byte[] buffer, Int32 offset, Int32 size)
   at System.IO.BufferedStream.Flush()
   at System.IO.BinaryWriter.Flush()
   at Microsoft.VisualStudio.TestPlatform.CommunicationUtilities.LengthPrefixCommunicationChannel.Send(String data)
   --- End of inner exception stack trace ---
   at Microsoft.VisualStudio.TestPlatform.CommunicationUtilities.LengthPrefixCommunicationChannel.Send(String data)
   at Microsoft.VisualStudio.TestPlatform.CommunicationUtilities.TestRequestSender.EndSession()
   at Microsoft.VisualStudio.TestPlatform.CrossPlatEngine.Client.ProxyOperationManager.Close()
TpTrace Warning: 0 : 6860, 7, 2021/06/26, 19:34:34.417, 8580071127, vstest.console.dll, ProxyOperationManager: Timed out waiting for test host to exit. Will terminate process.

I think it’s this issue: https://github.com/microsoft/vstest/issues/2379

https://github.com/microsoft/vstest/blob/d10bcbb28cc3999bcc12758a41a04b998eb9595b/src/Microsoft.TestPlatform.CrossPlatEngine/Client/ProxyOperationManager.cs#L211-L215

The process is busy doing something, then we impatiently kill it after 100ms, AND WE DON’T TELL THE USER WE KILLED IT. All the user sees is:

The active test run was aborted. Reason: Test host process crashed

So then the user is sent on a wild goose chase for 4 days trying to figure out various ways to deploy procdump.exe to the CI agent, find out that it doesn’t work anyway because we only explicitly pass StackOverflowException and AccessViolationException as arguments to procdump and ignore other types of exceptions.

We killed the process, we know it, but we don’t log it, don’t publish the dump, don’t have any MSBuild errors, leaving the user helpless, frustrated, blocked, not knowing how to proceed.

1reaction
sttobiacommented, Aug 21, 2023

I can see it fails in xUnit.GetAvailableRunnerReporters because it cannot load a module. I can see the name of the module. Sent you an email.

Quick update from my side: Looking at the procdump files, it looked like an issue with a specific DLL file in my build folder. After letting it rest for a while and then updating all packages at some point (including xunit IIRC), the problem was completely gone. In hindsight the process just crashing, and the dump not really helping out either, was really frustrating. Nevertheless, thank you for the support @nohwnd!

Read more comments on GitHub >

github_iconTop Results From Across the Web

DevOps Test host process crashed
MS hosted agent (ubuntu-latest) started crashing when trying to run `dotnet test`. The error message does not have any details or ...
Read more >
Azure Devop TestThe active test run was aborted. Reason: ...
Reason: Test host process crashed : Unhandled Exception. We have one PR that fails due to our tests sets. They only fail with...
Read more >
How to Fix Common Host Process for Windows Tasks ...
Method 4: Run the Windows Memory Diagnostic tool · Press the Windows + R keys on your keyboard to bring up the Run...
Read more >
Windows-based computer freeze troubleshooting
Learn how to troubleshoot computer freeze issues on Windows-based computers and servers. Also, you can learn how to diagnose, identify, ...
Read more >
Diagnosing an ASP.NET Core hard crash
I execute all Zoom-specific code on the WPF Dispatcher thread. The immediate error came from code like the following. You can ignore the...
Read more >

github_iconTop Related Medium Post

No results found

github_iconTop Related StackOverflow Question

No results found

github_iconTroubleshoot Live Code

Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start Free

github_iconTop Related Reddit Thread

No results found

github_iconTop Related Hackernoon Post

No results found

github_iconTop Related Tweet

No results found

github_iconTop Related Dev.to Post

No results found

github_iconTop Related Hashnode Post

No results found