"Specified group generation id is not valid" after broker maintenance, consumer stops receiving events
Hi, we are having an issue similar to https://github.com/tulios/kafkajs/issues/1009, but it happens after broker maintenance.
We have consumers running in parallel on different machines, with a heartbeat check triggered on eachBatch.
We consume multiple topics, with a specific instance of our service per topic.
All of this works fine, but we have had issues (twice already) when brokers go down for maintenance.
Some of the instances (and thus some of the topics) stop consuming events, but neither throw errors nor crash (if they crashed, we would respawn them and everything would be OK).
We do see the error message:
[Consumer] Crash: KafkaJSNonRetriableError: Specified group generation id is not valid
But the process doesn’t actually crash. The instance goes stale: it won’t consume any new messages or trigger the heartbeat. If we restart the instance, it consumes all pending traffic (provided the offset is still current).
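One possible workaround for the "logged a crash but kept running" state is to listen for the consumer's crash event and let the process die, so the supervisor respawns it. The sketch below is only an illustration: `exitOnFatalCrash`, `CrashEmitter` and `CrashEvent` are hypothetical names, and it assumes a KafkaJS version whose `consumer.events.CRASH` payload carries a `restart` flag next to `error`. The structural types stand in for the real consumer so that the snippet needs no imports; a real KafkaJS consumer may need a cast to fit them.

```typescript
// Assumed shape of the crash event: only the fields used here.
interface CrashEvent {
  payload: { error: Error; groupId?: string; restart?: boolean };
}

// Assumed minimal shape of a consumer: an event-name table and `on`.
interface CrashEmitter {
  events: { CRASH: string };
  on(eventName: string, listener: (event: CrashEvent) => void): unknown;
}

// Hypothetical helper: invoke `onFatal` when the consumer reports a crash
// that it is not going to recover from by itself.
function exitOnFatalCrash(
  consumer: CrashEmitter,
  onFatal: (error: Error) => void,
): void {
  consumer.on(consumer.events.CRASH, ({ payload }) => {
    // `restart === false` means the consumer will stay stopped.
    // A missing flag is treated as "unknown" and left alone.
    if (payload.restart === false) {
      onFatal(payload.error);
    }
  });
}
```

In a service this might be wired up as `exitOnFatalCrash(consumer, () => process.exit(1))`, so that a non-retriable error such as the one above ends the process instead of leaving it stale.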
Odd thing is some of the topics keep working fine after the maintenance, so the overall system seems to be “up” unless we check each specific topic.
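Because a stale instance looks healthy from the outside, a per-instance staleness check can make the problem visible. The following is a minimal sketch, not something KafkaJS provides: `StalenessWatchdog` is a hypothetical class, and the threshold is whatever silence you consider abnormal for the topic.

```typescript
// Hypothetical watchdog: remembers when the consumer last showed signs of
// life and reports whether it has been silent for too long.
class StalenessWatchdog {
  private readonly maxSilenceMs: number;
  private readonly now: () => number;
  private lastBeat: number;

  // `now` is injectable so the clock can be faked in tests.
  constructor(maxSilenceMs: number, now: () => number = () => Date.now()) {
    this.maxSilenceMs = maxSilenceMs;
    this.now = now;
    this.lastBeat = this.now();
  }

  // Call whenever the consumer makes progress (e.g. on each heartbeat).
  beat(): void {
    this.lastBeat = this.now();
  }

  // True if no beat has been recorded within the allowed window.
  isStale(): boolean {
    return this.now() - this.lastBeat > this.maxSilenceMs;
  }
}
```

Assuming the heartbeat check already runs in eachBatch, `beat()` could be called there (or from a `consumer.events.HEARTBEAT` listener), with `isStale()` polled from a timer or health endpoint that exits the process so it gets respawned.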
Ran into this as well, proposed fix: https://github.com/tulios/kafkajs/pull/1474
I have pretty much the same thing. I have a connection to 11 topics, and when I start receiving messages I see the logs below,
followed by the message that the consumer has been stopped. Increasing the heartbeat interval and sessionTimeout didn’t help.