Exceptions.throwIfFatal / throwIfJvmFatal should log fatal exceptions

Related issues: #3036, #2521, #2917, #1118

The combination of unrecoverable Error subclasses and reactive/asynchronous applications can easily lead to applications being left in semi-working state with no visible errors being logged.

Motivation

In an imperative, blocking application, an unrecoverable Error subclass would lead to the main application thread terminating with a stack trace. However, in reactive/asynchronous applications, the Error is often propagated up to a thread in some thread pool, which upon termination is peacefully returned to the pool. Unless some layer of the application catches Throwables and logs the error, it will be invisible to the end users and application maintainers. On the other hand, Reactor sinks/subscriptions will be left in a “hanging” state – neither working, nor terminated – leading to hard to catch and hard to debug situations.

The specific scenario that caused me to look into this was a minor library version incompatibility that resulted in java.lang.NoSuchMethodError being thrown in some stream mapping logic. While unrecoverable from both Java perspective (because it’s an Error) and from Reactor throwIfJvmFatal perspective (because it’s a LinkageErrror), this type of issue tends to be isolated to a particular code path and perhaps does not need to lead to application shutdown. At minimum, it would be good to have consistent logging of unrecoverable errors.

Desired solution

In order of the amount of work involved.

Logging fatal errors as part of Exceptions.throwIfJvmFatal.
While Reactor is doing the right thing by not turning thrown Error into stream errors, it would be good if the downstream code could recover if desired. This would require that FluxSink is left in a state in which it’s capable of terminating – in the reactorFluxSinkUnrecoverableError() test below, it would mean sink.error(e); results in the proper stream termination (although the exception passed in to the sink probably needs to be wrapped in something that won’t re-trigger throwIfJvmFatal behavior).

Considered alternatives

Leave everything as is; not handling Error subclasses works as intended. While true, it does lead to developers spending a lot of time troubleshooting – see related issues. And that’s only the people who did figure out what was going wrong! It’s a valid option though.

Additional context

The following two tests demonstrate the difference in behavior between normal exceptions being handled and converted to stream errors, and unrecoverable errors leading to a hanging stream.

  // this test passes
 @Test
  void reactorFluxSinkRecoverableException() {

    Flux<String> testFlux = Flux.create(sink -> {
      sink.onCancel(() -> System.out.println("*** canceled"));
      sink.onDispose(() -> System.out.println("*** disposed"));
      try {
        sink.next("some value");
        sink.complete();
      } catch(Exception e) {
        // This point is never reached because Reactor correctly converts exception into stream error.
        System.out.println("Exception in `next` processing caught, handled: " + e.getMessage());
        sink.error(e);
      }

    }).map(value -> {
      System.out.println("received value: " + value);
      throw new RuntimeException("normal exception");
    });

    StepVerifier.create(testFlux)
            .verifyErrorMessage("normal exception");
  }

  // this test hangs
  @Test
  void reactorFluxSinkUnrecoverableError() {

    Flux<String> testFlux = Flux.create(sink -> {
      sink.onCancel(() -> System.out.println("*** canceled"));
      sink.onDispose(() -> System.out.println("*** disposed"));
      try {
        sink.next("some value");
        sink.complete();
      } catch(Throwable e) {
        System.out.println("Throwable in `next` processing caught, handled: " + e.getMessage());
        // this will have no effect; a terminal state is no longer reachable for the stream
        sink.error(e);
      }
    }).map(value -> {
      System.out.println("received value: " + value);
      throw new LinkageError("unrecoverable error");
    });

    // This stream will hang.
    StepVerifier.create(testFlux)
            .verifyErrorMessage("unrecoverable error");
  }

Issue Analytics

State:
Created a year ago
Reactions:1
Comments:7 (5 by maintainers)

Top GitHub Comments

1reaction

simonbaslecommented, Jul 29, 2022

@schananas if you have pieces of code that are susceptible to OOM but you’re confident you can recover from these, then you are the best suited to implement that. I’d argue only you can take the responsibility to wrap such code in a try-catch block, because Reactor certainly can’t identify these specific occurrences. I doubt even you can, but that’s another story (see this informative stackoverflow answer for some context).

with the introduction of isFatal is the associated PR, it could open up a path of hooking a Predicate but as explained above that Predicate would apply too broadly (filter by exception Class, message…). Nothing better than user knowledge to pinpoint “I’m about to decode a potentially huge object, let’s try/catch”. We won’t be pursuing that path for now.

1reaction

simonbaslecommented, Jul 18, 2022

I’ll take that one and add logging to Exceptions.throwIfFatal

Top Results From Across the Web

reactor.core.Exceptions.throwIfJvmFatal java code examples

Throws a particular Throwable only if it belongs to a set of "fatal" error varieties native to the JVM. These varieties are as...

When logging when is an error fatal? - Stack Overflow

Non-fatal errors would include: A server where a single session fails for some reason but you can still service other clients. An intermittent ......

Index (reactor-core 3.5.0)

Log an exception (throwable) at the DEBUG level with an accompanying message. ... is considered by Reactor as Fatal and would be thrown...

reactor.core.Exceptions Maven / Gradle / Ivy

Exceptions maven / gradle build tool code. ... exceptions. Instances create by this method can be detected using the * {@link #isMultiple(Throwable)} check....

There is nothing Goish about log.Fatal | by Sindre Myren

A program might be killed for any reason, so we can't rely on defer. Therefore, isn't it acceptable that defers doesn't always run...