Function App host restarting but can't see why
Investigative information
Please provide the following:
- Timestamp: 2019-03-12T03:00:00.000000Z until 2019-03-12T12:00:00.0000000Z
- Function App version (1.0 or 2.0): 2.0
- Function App name:
- Function name(s) (as appropriate):
- Invocation ID: 27ea1788-840a-4e61-99f8-b4584fecf4b4
- Region: Australia Southeast
Repro steps
Currently, when we kick off a large Durable Function orchestration, we expect the activities and sub-orchestrations to finish, return to the ‘root’ orchestration, and complete.
What we are observing is that while the majority of the processes have completed, we end up getting to a point where the host seems to constantly restart. We might then get a couple of trace messages to indicate some processing and activity, no error messages, then another cycle of host initialization.
Looking through our Log Analytics exceptions doesn’t yield any errors, and there doesn’t seem to be anything apparent in the traces section either.
The following query yields 167 records for the time period:
traces | where message contains "Initializing Host." and timestamp >= todatetime('2019-03-12T03:00:00.000000Z') and timestamp <= todatetime('2019-03-12T12:00:00.0000000Z')
These messages appear to be spread across the running cloud_RoleInstances:
traces | where message contains "Initializing Host." and timestamp >= todatetime('2019-03-12T03:00:00.000000Z') and timestamp <= todatetime('2019-03-12T12:00:00.0000000Z') | summarize count() by cloud_RoleInstance

Exception query yields no results 😦
exceptions | where timestamp >= todatetime('2019-03-12T03:00:00.000000Z') and timestamp <= todatetime('2019-03-12T12:00:00.0000000Z')
Looking at the ‘Diagnose and solve problems’ blade for the Function App and going to ‘Function App Down or Reporting Errors’ shows no errors, but it does show a number of functions that have not completed.

Expected behavior
The Function host should not constantly restart, and it should finish processing.
Actual behavior
Function host machines seem to be ‘restarting’ and going through their initialization phase.
Known workarounds
None known
Related information
- Function SDK: 1.0.26
- Language: C#
- We are currently running our Function app on an App Service Plan (scaled to 3 instances of B3)
- We have disabled applicationInsights sampling to ensure that we are getting all logs to flow through.
- We heavily use Durable Function orchestrations calling sub orchestrations and activities.
<PackageReference Include="ExifLib.Standard" Version="1.7.0" />
<PackageReference Include="Magick.NET-Q16-AnyCPU" Version="7.10.1" />
<PackageReference Include="Microsoft.Azure.CognitiveServices.Vision.ComputerVision" Version="3.3.0" />
<PackageReference Include="Microsoft.Azure.CognitiveServices.Vision.Face" Version="2.3.0-preview" />
<PackageReference Include="Microsoft.Azure.WebJobs.Extensions.CosmosDB" Version="3.0.3" />
<PackageReference Include="Microsoft.Azure.WebJobs.Extensions.DurableTask" Version="1.7.1" />
<PackageReference Include="Microsoft.Azure.WebJobs.Extensions.ServiceBus" Version="3.0.3" />
<PackageReference Include="Microsoft.Azure.WebJobs.Extensions.Storage" Version="3.0.3" />
<PackageReference Include="Microsoft.NET.Sdk.Functions" Version="1.0.26" />
<PackageReference Include="morelinq" Version="3.1.0" />
<PackageReference Include="PhotoSauce.MagicScaler" Version="0.9.1" />
<PackageReference Include="ProjNET4GeoAPI" Version="1.4.1" />
<PackageReference Include="SixLabors.ImageSharp" Version="1.0.0-beta0005" />
<PackageReference Include="Soze.Common" Version="1.0.26" />
Issue Analytics
- State:
- Created 5 years ago
- Reactions:1
- Comments:8 (1 by maintainers)
@fabiocav after much analysis from the team, we’ve tracked down the offending piece of code.
log.LogWarning($"Exception AnalyseImage() - {JsonConvert.SerializeObject(e, new JsonSerializerSettings { ReferenceLoopHandling = ReferenceLoopHandling.Serialize })}");
I’m guessing that was a naive implementation on our side (we have since removed it). The difficulty was that this piece of code caused the whole Function Host to effectively crash there and then. It took us a while to find where this was failing, as we were also processing other messages off Service Bus and weren’t seeing any consistency in where errors (if any) were being thrown.
We ‘think’ log messages were actually being generated, but could not be written to Log Analytics before the Function Host died. Hence, when we queried Log Analytics, we could never find the errors that we were looking for.
Is there a way for the Function Host to die more gracefully (if it should die at all), so that people can diagnose these situations more easily?
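For anyone hitting the same thing, here is a minimal sketch of a safer way to log the exception than the call quoted above. One plausible cause of the crash, not confirmed in this thread, is that ReferenceLoopHandling.Serialize keeps following reference loops in the exception's object graph until the stack is exhausted; a StackOverflowException cannot be caught and terminates the process. The sketch passes the exception to the logger directly instead of serializing it. The class name, both method names, and the maxDepth parameter are hypothetical and only for illustration; LogWarning(Exception, string, params object[]) is the standard Microsoft.Extensions.Logging extension method.

```csharp
using System;
using System.Text;
using Microsoft.Extensions.Logging;

// Hypothetical helper class; not part of the original project.
public static class ExceptionLogging
{
    // Flattens an exception chain into a short string.
    // The walk is bounded by maxDepth, so it always terminates,
    // unlike serializing the full object graph with reference loops enabled.
    public static string Summarize(Exception e, int maxDepth = 5)
    {
        var sb = new StringBuilder();
        var depth = 0;
        for (var current = e; current != null && depth < maxDepth; current = current.InnerException, depth++)
        {
            if (depth > 0)
            {
                sb.Append(" <- ");
            }
            sb.Append(current.GetType().Name).Append(": ").Append(current.Message);
        }
        return sb.ToString();
    }

    // Passes the exception object to the logger so the logging provider
    // records the type, message, and stack trace itself.
    public static void LogAnalyseImageFailure(ILogger log, Exception e)
    {
        log.LogWarning(e, "Exception AnalyseImage() - {Summary}", Summarize(e));
    }
}
```

Inside the catch block this would be called as ExceptionLogging.LogAnalyseImageFailure(log, e). The bounded summary is optional; passing the exception as the first argument to LogWarning is the part that avoids serializing the object graph.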
I am experiencing this exact issue as well with a NodeJS-based app. I have an Azure service bus trigger configured and that part works fine. Looking at the streaming log I see my app get the message and start its execution. But shortly after it starts the job host restarts, and I see messages like this in the log:
The job host just keeps re-initializing, presumably pausing the execution of my function already in progress. Eventually my function completes but quite a while later for what should be a fairly quick operation. I have looked through Application Insights and I see nothing that would indicate why it’s happening. My function is able to complete its work within the 10 minute max timeout I’ve specified in host.json.
Definitely seems like a real issue.