Reactor Netty in a Spring Boot WebFlux application generates metrics with CancelledServerWebExchangeException and outcome UNKNOWN when there are no issues
We have a number of Spring Boot WebFlux services in our project, and almost all of them show the same issue. We use Prometheus for metrics and track the success of requests. However, in those services between 1% and 20% of the HTTP server request metrics have outcome=UNKNOWN with exception=CancelledServerWebExchangeException, while nothing else in the server responses indicates a problem and there is no sign that clients cancel that many requests. Examples:
http_server_requests_seconds_count{exception="CancelledServerWebExchangeException",method="GET",outcome="UNKNOWN",platform="UNKNOWN",status="401",uri="UNKNOWN",} 87.0
http_server_requests_seconds_count{exception="CancelledServerWebExchangeException",method="GET",outcome="UNKNOWN",platform="UNKNOWN",status="200",uri="UNKNOWN",} 110.0
I reproduced this locally with a basic WebFlux application template and a single controller, bombarding it with ab (https://httpd.apache.org/docs/2.4/programs/ab.html): ab -n 15000 -c 50 http://localhost:8080/v1/hello
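A minimal sketch of what such a reproduction app might look like. The class names and the response body are assumptions for illustration, not the original reproduction code; only the /v1/hello path comes from the ab command above. It assumes spring-boot-starter-webflux and spring-boot-starter-actuator are on the classpath.

```kotlin
import org.springframework.boot.autoconfigure.SpringBootApplication
import org.springframework.boot.runApplication
import org.springframework.web.bind.annotation.GetMapping
import org.springframework.web.bind.annotation.RequestMapping
import org.springframework.web.bind.annotation.RestController
import reactor.core.publisher.Mono

// Hypothetical application class; any Spring Initializr WebFlux template would do.
@SpringBootApplication
class DemoApplication

// Hypothetical controller exposing the endpoint targeted by the ab command.
@RestController
@RequestMapping("/v1")
class HelloController {

    // Returns a constant body so that any cancellation seen in the metrics
    // cannot be attributed to slow or failing business logic.
    @GetMapping("/hello")
    fun hello(): Mono<String> = Mono.just("hello")
}

fun main(args: Array<String>) {
    runApplication<DemoApplication>(*args)
}
```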
I tried replacing Netty with Tomcat, and these metric entries no longer appeared.
While this doesn't seem to cause direct issues for services running in production, it still undermines the correctness of our metrics and alerts. We could ignore all the UNKNOWN outcomes, but then we can't know whether they come from actual server/client cancellations or just from this Netty issue.
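For reference, a sketch of how that server swap is usually done in a Gradle Kotlin DSL build (build.gradle.kts). This assumes a Gradle build with the Spring Boot dependency management plugin; the original project may well have used Maven or a different layout.

```kotlin
dependencies {
    // Keep WebFlux, but drop the Reactor Netty server starter it pulls in by default.
    implementation("org.springframework.boot:spring-boot-starter-webflux") {
        exclude(group = "org.springframework.boot", module = "spring-boot-starter-reactor-netty")
    }
    // Run the same reactive application on Tomcat instead.
    implementation("org.springframework.boot:spring-boot-starter-tomcat")
}
```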
Someone already had this issue in the past but it was never resolved: https://stackoverflow.com/questions/69913027/webflux-cancelledserverwebexchangeexception-appears-in-metrics-for-seemingly-no
Versions used: Spring Boot 2.7.2 and 2.6.2, Kotlin 1.7.10, JVM 17
@MBalciunas Thanks - I’m reaching out to the Security team internally. To me it’s somehow expected to have additional latency there as credentials checks are expected to take time/CPU resources. But the cancellation behavior I’m not familiar with in this case. I’ll report back here as soon as I know more.
I don’t think we’re making progress here; we’re changing the “load test” infrastructure with each comment and we’re spending time on something that will not be useful to your problem. In my opinion, there are only three ways out of this:
CancelledServerWebExchangeException are just noise to your application and that they're a distraction; in this case, you can contribute a MeterFilter in your application to filter those out.
In the meantime, I'm closing this issue as I can't justify spending more time on this right now.
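A minimal sketch of the MeterFilter option mentioned above, assuming the default Spring Boot 2.x metric name http.server.requests and the exception tag shown in the sample metrics. The configuration class and bean names are hypothetical. Spring Boot applies MeterFilter beans to the auto-configured meter registry.

```kotlin
import io.micrometer.core.instrument.config.MeterFilter
import org.springframework.context.annotation.Bean
import org.springframework.context.annotation.Configuration

// Hypothetical configuration class; proxyBeanMethods = false avoids the need
// for the class to be open when the kotlin-spring compiler plugin is not applied.
@Configuration(proxyBeanMethods = false)
class MetricsFilterConfig {

    // Denies registration of HTTP server request meters tagged with the
    // cancellation exception, so they never reach the Prometheus endpoint.
    @Bean
    fun dropCancelledExchangeMetrics(): MeterFilter =
        MeterFilter.deny { id ->
            // Assumed default metric name; adjust if it was customized.
            id.name == "http.server.requests" &&
                id.getTag("exception") == "CancelledServerWebExchangeException"
        }
}
```

Note that this drops every meter carrying that tag, including genuine client cancellations, so it only makes sense if those are treated as noise as well.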