UnknownMemberIdError causes consumer to lock up
Describe the bug
When multiple consumers running in Kubernetes join a group using faust (manual commits), the group gets stuck in a constant rebalance with the logs below. The logs are tailed from all the consumers. As you can see, 2 consumers get -1 as the generation and the other 2 get the correct value.
The ones that get -1 are the ones that receive the following error during the rebalance:
February 25th 2021, 12:19:40.754 OffsetCommit failed for group app due to group error ([Error 25] UnknownMemberIdError: app), will rejoin
February 25th 2021, 12:19:40.770 Successfully synced group app with generation -1
February 25th 2021, 12:19:40.770 Successfully synced group app with generation -1
February 25th 2021, 12:19:40.770 Successfully synced group app with generation 372
February 25th 2021, 12:19:40.770 Successfully synced group app with generation 372
February 25th 2021, 12:19:40.757 OffsetCommit failed for group app due to group error ([Error 25] UnknownMemberIdError: app), will rejoin
February 25th 2021, 12:19:40.754 OffsetCommit failed for group app due to group error ([Error 25] UnknownMemberIdError: app), will rejoin
February 25th 2021, 12:19:40.753 OffsetCommit failed for group app due to group error ([Error 25] UnknownMemberIdError: app), will rejoin
February 25th 2021, 12:19:40.752 Elected group leader – performing partition assignments using faust
February 25th 2021, 12:19:40.752 OffsetCommit failed for group app due to group error ([Error 25] UnknownMemberIdError: app), will rejoin
February 25th 2021, 12:19:40.751 Joined group ‘app’ (generation 372) with member_id faust-0.4.8rc6-30b24070-fbc9-4acd-9a45-9c1e05ce992e
February 25th 2021, 12:19:40.748 Joined group ‘app’ (generation 372) with member_id faust-0.4.8rc6-f5f75be5-1b2c-431f-8d63-b198069eb0a5
February 25th 2021, 12:19:40.748 Joined group ‘app’ (generation 372) with member_id faust-0.4.8rc6-72990a08-0a25-43e8-89a1-27175c9ebcbe
February 25th 2021, 12:19:40.748 Joined group ‘app’ (generation 372) with member_id faust-0.4.8rc6-90829dc4-4bc5-4d31-bf7b-1e5bc6a71aa5
Expected behaviour
February 25th 2021, 12:19:40.752 OffsetCommit failed for group app due to group error ([Error 25] UnknownMemberIdError: app), will rejoin
This error should not happen, and a generation of -1 should not happen either.
Environment:
- aiokafka version: 0.7.0
- kafka-python version: 2.0.2
- Kafka Broker version: 2.5.1
Reproducible example
# Add a short Python script or Docker configuration that can reproduce the issue.
Issue Analytics
- State:
- Created 3 years ago
- Reactions:4
- Comments:10 (1 by maintainers)
ods pwilczynskiclearcode
Hello, I hit this error again.
aiokafka==0.7.2 python==3.9.12
We use 2 pods for consumption. We only have load at 9 am, for 1 hour every day. enable_auto_commit=False (set by the faust-streaming framework).
Heartbeat called _maybe_leave_group()
Logs from pod1 (pod1 continued to process messages):
Logs from pod2 (pod2 locked up):
@sam-orca: Workaround: a parallel thread monitors a periodic task and restarts the service when the task has not run within its TTL (which means the main thread is locked up).
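A minimal sketch of that workaround, using only the standard library. The `Watchdog` class, its parameters, and the `on_stall` callback are hypothetical names for illustration; they are not part of aiokafka or faust. The periodic task is assumed to call `beat()` on every run, and the watchdog thread calls `on_stall` once no beat has arrived within `ttl` seconds.

```python
import threading
import time


class Watchdog:
    """Hypothetical helper: calls on_stall() if beat() is not called within ttl seconds."""

    def __init__(self, ttl, on_stall, check_interval=1.0):
        self.ttl = ttl
        self.on_stall = on_stall
        self.check_interval = check_interval
        self._last_beat = time.monotonic()
        self._stop = threading.Event()
        # Daemon thread, so it never keeps the process alive on its own.
        self._thread = threading.Thread(target=self._run, daemon=True)

    def beat(self):
        # Called by the periodic task to signal the main loop is still alive.
        self._last_beat = time.monotonic()

    def start(self):
        self._thread.start()

    def stop(self):
        self._stop.set()

    def _run(self):
        # wait() returns False on timeout, True once stop() has been called.
        while not self._stop.wait(self.check_interval):
            if time.monotonic() - self._last_beat > self.ttl:
                self.on_stall()
                return
```

In a faust service the periodic task could be a timer on the app's event loop that calls `watchdog.beat()`, and `on_stall` could exit the process (for example `lambda: os._exit(1)`) so that Kubernetes restarts the pod. Both choices are assumptions about the deployment, not something the thread above spells out.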