io.grpc.Context getting cancelled
See original GitHub issueDescribe the bug Seeing grpc context timeouts with 1.4.1 and 1.5.3.
Steps to reproduce No specific recipe to share at this point. But this appears to be related to few other issues with Google client APIs.
What did you expect to see? GRPC context timeout not getting affected by Otel instrumentation.
What did you see instead?
CANCELLED: io.grpc.Context was cancelled without error.
Stack: io.grpc.Status.asRuntimeException(Status.java:535),
io.grpc.stub.ClientCalls$StreamObserverToCallListenerAdapter.onClose(ClientCalls.java:478),
interceptor-for-ClientCall.Listener<RespT>: onClose()
interceptor-for-ForwardingClientCall.SimpleForwardingClientCall<ReqT, RespT>: onClose()
...(truncated)...
(TracingClientInterceptor.java:139),
(DelayedClientCall.java:463),
(DelayedClientCall.java:427),
(DelayedClientCall.java:460),
(ClientCallImpl.java:553),
(ClientCallImpl.java:68),
(ClientCallImpl.java:212),
(ContextRunnable.java:37),
What version are you using? Tried with both 1.4.1 and 1.5.3 with the same result. Issue does not happen without instrumentation.
Environment JDK11
Issue Analytics
- State:
- Created 2 years ago
- Reactions:1
- Comments:35 (28 by maintainers)
Top Results From Across the Web
gRPC: Random CANCELLED exception on RPC calls
gRPC creates a cancellable context for each inbound RPC. When that RPC completes (basically, when the server responds with a onCompleted()/Â ...
Read more >io.grpc.StatusRuntimeException: CANCELLED - Google Groups
This sounds like auto-cancellation. When a RPC completes on the server, the server's io.grpc.Context is cancelled. That propagates to any outgoing RPCs. This...
Read more >io.grpc.Context.cancellationCause java code examples
If the context is not yet cancelled will always return null. The cancellation cause is provided for informational purposes only and implementations should ......
Read more >Context.CancellableContext (grpc-all 1.51.0 API)
A context which inherits cancellation from its parent but which can also be independently cancelled and which will propagate cancellation to its descendants ......
Read more >gRPC - Timeouts & Cancellation - Tutorialspoint
gRPC supports canceling of requests both from the client as well as the server side. The Client can specify during runtime the amount...
Read more >
Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start Free
Top Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
đ Hi all, I was able to recreate the bug. I have a reproducer project available in this repo: ryandens/otel-grpc-context
Iâm hopeful Iâll have time in the next few days to investigate and determine the source of the issue, but figured Iâd give someone a starting point if they beat me to it. I was able to confirm that this error is not present in 1.1.0 but is present in 1.2.0.
After looking into this, I have a hypothesis that the instrumentation is working as intended - but in some sense, it is working too well and perhaps we do need to change it somehow.
When looking at this type of code pattern inside a gRPC handler
It is in strict terms, âincorrectâ gRPC code - gRPC context hasnât been propagated into the executor. This means that deadline is not propagated as itâs supposed to be, and if using OpenCensus (gRPCâs native instrumentation), the trace would be broken. Similary, if using OpenTelemetryâs gRPC library instrumentation, the trace would be broken due to the lack of context propagation.
For this code to become correct, it needs to be something like
Executor executor = Context.currentContextExecutor(Executors.newFixedThreadPool());. With deadline correctly propagated into the threadpool, this would cause the backend request to fail with the cancellation exception, no OpenTelemetry involved. Presumably you would callContext.fork()in the business logic to fix this business logic bug. Note that with such a change, adding OpenTelemetry library instrumentation would also just work - we sync the context between gRPC and OpenTelemetry because we want people to be able to add in the library instrumentation interceptors and have applications using gRPC context propagation to work out of the box, especially to help with OpenCensus migration.Our problem here is that the javaagent is able to automatically propagate context for threadpool executors. Because we sync OpenTelemetry and gRPC context, this means that the gRPC deadline is also automatically propagated - so code that did not fail due to deadline issues will now fail. The early return pattern we see in the repro is a trival way of reproducing such code, but even traditional synchronous gRPC calls can fail if the client used
.withDeadline()and the deadline got propagated to a location where it wasnât intended. Usually it is intended though.So the javaagent is making the app âbetterâ by propagating deadline to more places it should be đ But we still have a rule that the javaagent canât introduce exceptions to code that previously didnât. I think I have reasoned out a workaround for the javaagent and letâs see how it goes.