Reactor Netty in a Spring Boot WebFlux application generates metrics with CancelledServerWebExchangeException and outcome UNKNOWN when there are no issues


We have a number of Spring Boot WebFlux services in our project, and almost all of them have the same issue. We use Prometheus for metrics and track the success of requests. However, in those services between 1% and 20% of the HTTP server request metrics have outcome=UNKNOWN with exception=CancelledServerWebExchangeException, while there is no other indication of problems in server responses and no indication that clients cancel that many requests. Examples:

http_server_requests_seconds_count{exception="CancelledServerWebExchangeException",method="GET",outcome="UNKNOWN",platform="UNKNOWN",status="401",uri="UNKNOWN",} 87.0
http_server_requests_seconds_count{exception="CancelledServerWebExchangeException",method="GET",outcome="UNKNOWN",platform="UNKNOWN",status="200",uri="UNKNOWN",} 110.0
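To put a number on the "1% to 20%" share, one option is to compute it from a Prometheus scrape. The following is only a minimal sketch: the class name `CancelledShare` and the third sample line (the successful requests) are hypothetical and not taken from the issue, and the parser assumes each sample line ends with the value and carries no trailing timestamp.

```java
import java.util.List;

public class CancelledShare {

    private static final String METRIC = "http_server_requests_seconds_count{";
    private static final String CANCELLED_TAG =
            "exception=\"CancelledServerWebExchangeException\"";

    /** Returns cancelled request count divided by total request count (0.0 if no samples). */
    static double cancelledShare(List<String> lines) {
        double cancelled = 0.0;
        double total = 0.0;
        for (String raw : lines) {
            String line = raw.trim();
            if (!line.startsWith(METRIC)) {
                continue; // skip comments and unrelated metrics
            }
            // Assumption: the sample value is the last space-separated token.
            double value = Double.parseDouble(line.substring(line.lastIndexOf(' ') + 1));
            total += value;
            if (line.contains(CANCELLED_TAG)) {
                cancelled += value;
            }
        }
        return total == 0.0 ? 0.0 : cancelled / total;
    }

    public static void main(String[] args) {
        List<String> scrape = List.of(
                // The first two lines are the examples quoted in the issue.
                "http_server_requests_seconds_count{exception=\"CancelledServerWebExchangeException\",method=\"GET\",outcome=\"UNKNOWN\",platform=\"UNKNOWN\",status=\"401\",uri=\"UNKNOWN\",} 87.0",
                "http_server_requests_seconds_count{exception=\"CancelledServerWebExchangeException\",method=\"GET\",outcome=\"UNKNOWN\",platform=\"UNKNOWN\",status=\"200\",uri=\"UNKNOWN\",} 110.0",
                // Hypothetical line, made up for illustration only.
                "http_server_requests_seconds_count{exception=\"None\",method=\"GET\",outcome=\"SUCCESS\",platform=\"UNKNOWN\",status=\"200\",uri=\"/v1/hello\",} 803.0");
        System.out.println("cancelled share: " + cancelledShare(scrape));
    }
}
```

In a real setup the same ratio would more likely be expressed as a Prometheus query over the scraped series; the Java version is only meant to make the arithmetic explicit.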

I successfully reproduced this locally with a basic WebFlux application template and a single controller, load-testing it with ab (https://httpd.apache.org/docs/2.4/programs/ab.html): ab -n 15000 -c 50 http://localhost:8080/v1/hello

I tried substituting Tomcat for Netty, and these metrics no longer appeared.

While this doesn’t seem to cause direct issues for services running in production, it still undermines the correctness of the metrics and alerts. We can ignore all the UNKNOWN outcomes, but then we can’t tell whether they come from actual server/client cancellations or just from this Netty issue.

Someone already had this issue in the past but it was never resolved: https://stackoverflow.com/questions/69913027/webflux-cancelledserverwebexchangeexception-appears-in-metrics-for-seemingly-no

Versions used: Spring Boot 2.7.2 and 2.6.2, Kotlin 1.7.10, JVM 17

Issue Analytics

  • State: closed
  • Created: 10 months ago
  • Comments: 14 (7 by maintainers)

Top GitHub Comments

1 reaction
bclozel commented, Nov 28, 2022

@MBalciunas Thanks - I’m reaching out to the Security team internally. To me it’s somehow expected to have additional latency there as credentials checks are expected to take time/CPU resources. But the cancellation behavior I’m not familiar with in this case. I’ll report back here as soon as I know more.

0 reactions
bclozel commented, Dec 5, 2022

I don’t think we’re making progress here; we’re changing the “load test” infrastructure with each comment and we’re spending time on something that will not be useful to your problem. In my opinion, there are only three ways out of this:

  1. you consider that those CancelledServerWebExchangeException are just noise to your application and that they’re a distraction; in this case, you can contribute a MeterFilter in your application to filter those out.
  2. you manage, under certain conditions, to find that something in this chain doesn’t behave as it should: Netty (sending a channel event about client disconnecting) -> Reactor Netty (cancelling the reactive pipeline) -> Spring Boot (turning this signal into a metric). If you’re using a local load test setup you might find a bug in one of those or even in your local TCP stack; chances are quite low here and even if you do, this might not be the problem you’re seeing in production.
  3. you manage to collect data from production about clients disconnecting (by enabling debug logs for Netty or Reactor Netty, or by capturing network traffic) and really understand why you’re seeing this in production.
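
For the first option, a MeterFilter could look roughly like the sketch below. It is not code from the maintainers: the class and method names are hypothetical, and it assumes the metric is registered under Spring Boot's default meter name `http.server.requests` with the `exception` tag shown in the metrics quoted in the issue.

```java
import io.micrometer.core.instrument.config.MeterFilter;

public class CancelledExchangeFilter {

    // Tag value observed in the metrics quoted in this issue.
    private static final String CANCELLED = "CancelledServerWebExchangeException";

    /**
     * Builds a filter that rejects http.server.requests meters tagged with the
     * cancellation exception; every other meter passes through unchanged.
     * Assumption: the default Spring Boot meter name is in use.
     */
    public static MeterFilter denyCancelledExchanges() {
        return MeterFilter.deny(id ->
                "http.server.requests".equals(id.getName())
                        && CANCELLED.equals(id.getTag("exception")));
    }
}
```

In a Spring Boot application, returning this filter from a `@Bean` method should be enough for it to be applied to the auto-configured meter registries. Note the trade-off raised in the issue: the filter also drops cancellations caused by clients that really did disconnect, so it hides the signal rather than explaining it.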

In the meantime, I’m closing this issue as I can’t justify spending more time on this right now.


