UnknownMemberIdError causes consumer to lock up


Describe the bug

When we have multiple consumers running in Kubernetes join a group using faust (manual commits), we run into an issue where the group is stuck in a constant rebalance with the following logs. These logs are tailed from all the consumers. As you can see, 2 consumers get -1 as the generation and the other 2 get the correct value. The ones that get -1 are the ones that receive the following error during rebalance:

February 25th 2021, 12:19:40.754 OffsetCommit failed for group app due to group error ([Error 25] UnknownMemberIdError: app), will rejoin

February 25th 2021, 12:19:40.770 Successfully synced group app with generation -1
February 25th 2021, 12:19:40.770 Successfully synced group app with generation -1
February 25th 2021, 12:19:40.770 Successfully synced group app with generation 372
February 25th 2021, 12:19:40.770 Successfully synced group app with generation 372
February 25th 2021, 12:19:40.757 OffsetCommit failed for group app due to group error ([Error 25] UnknownMemberIdError: app), will rejoin
February 25th 2021, 12:19:40.754 OffsetCommit failed for group app due to group error ([Error 25] UnknownMemberIdError: app), will rejoin
February 25th 2021, 12:19:40.753 OffsetCommit failed for group app due to group error ([Error 25] UnknownMemberIdError: app), will rejoin
February 25th 2021, 12:19:40.752 Elected group leader -- performing partition assignments using faust
February 25th 2021, 12:19:40.752 OffsetCommit failed for group app due to group error ([Error 25] UnknownMemberIdError: app), will rejoin
February 25th 2021, 12:19:40.751 Joined group 'app' (generation 372) with member_id faust-0.4.8rc6-30b24070-fbc9-4acd-9a45-9c1e05ce992e
February 25th 2021, 12:19:40.748 Joined group 'app' (generation 372) with member_id faust-0.4.8rc6-f5f75be5-1b2c-431f-8d63-b198069eb0a5
February 25th 2021, 12:19:40.748 Joined group 'app' (generation 372) with member_id faust-0.4.8rc6-72990a08-0a25-43e8-89a1-27175c9ebcbe
February 25th 2021, 12:19:40.748 Joined group 'app' (generation 372) with member_id faust-0.4.8rc6-90829dc4-4bc5-4d31-bf7b-1e5bc6a71aa5

Expected behaviour

February 25th 2021, 12:19:40.752 OffsetCommit failed for group app due to group error ([Error 25] UnknownMemberIdError: app), will rejoin

This error should not happen, and the consumer should not end up with -1 as its generation either.

Environment (please complete the following information):

  • aiokafka version: 0.7.0
  • kafka-python version: 2.0.2
  • Kafka broker version: 2.5.1
  • Other information (Confluent Cloud version, etc.):

Reproducible example

# Add a short Python script or Docker configuration that can reproduce the issue.
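A minimal sketch of the setup described above, written against aiokafka directly (the broker address, topic name and the processing step are placeholders, and the real deployment runs faust on top of aiokafka, so this is only an approximation). Running several replicas with long idle gaps between messages should approximate the conditions in the report:

import asyncio
from aiokafka import AIOKafkaConsumer

async def main():
    # Placeholder broker/topic/group; the real app uses faust with manual commits.
    consumer = AIOKafkaConsumer(
        "topic1",
        bootstrap_servers="localhost:9092",
        group_id="app",
        enable_auto_commit=False,        # manual commits, as in the report
        max_poll_interval_ms=300000,     # aiokafka default: 5 minutes
    )
    await consumer.start()
    try:
        while True:
            # If no records are consumed for longer than max_poll_interval_ms,
            # the coordinator's heartbeat routine may leave the group, after
            # which a manual commit fails with UnknownMemberIdError.
            batch = await consumer.getmany(timeout_ms=1000)
            for tp, messages in batch.items():
                for msg in messages:
                    pass                 # message processing goes here
            if batch:
                await consumer.commit()  # manual offset commit
    finally:
        await consumer.stop()

asyncio.run(main())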

Issue Analytics

  • State: closed
  • Created: 3 years ago
  • Reactions: 4
  • Comments: 10 (1 by maintainers)

Top GitHub Comments

1 reaction
ostetsenko commented, Jun 8, 2022

@ods @pwilczynskiclearcode

Hello, I've hit this error again.

aiokafka==0.7.2 python==3.9.12

We use 2 pods for consumption. The load arrives only at 9 am, for about 1 hour, every day. enable_auto_commit=False (set by the faust-streaming framework).

The heartbeat routine called _maybe_leave_group(); this is the relevant excerpt from aiokafka/consumer/group_coordinator.py:

# If consumer is idle (no records consumed) for too long we need
# to leave the group
idle_time = self._subscription.fetcher_idle_time
if idle_time < self._max_poll_interval:
    sleep_time = min(
        sleep_time,
        self._max_poll_interval - idle_time)
else:
    await self._maybe_leave_group()

Logs from pod1 (pod1 continued to process messages):

{"name": "aiokafka.consumer.group_coordinator", "levelno": 20, "pathname": "/app/.local/lib/python3.9/site-packages/aiokafka/consumer/group_coordinator.py", "filename": "group_coordinator.py", "exc_info": null, "exc_text": null, "lineno": 359, "message": "LeaveGroup request succeeded", "@timestamp": "2022-06-07T15:35:02.033104", "level": "INFO", "@version": "1"}
{"name": "aiokafka.consumer.group_coordinator", "levelno": 20, "pathname": "/app/.local/lib/python3.9/site-packages/aiokafka/consumer/group_coordinator.py", "filename": "group_coordinator.py", "exc_info": null, "exc_text": null, "lineno": 384, "message": "Revoking previously assigned partitions frozenset({TopicPartition(topic='topic1', partition=0), TopicPartition(topic='topic2', partition=0), ....}) for group group-id-07-06-22", "@timestamp": "2022-06-07T15:35:02.033347", "level": "INFO", "@version": "1"}

Logs from pod2 (pod2 locked up):

{"name": "aiokafka.consumer.group_coordinator", "levelno": 20, "pathname": "/app/.local/lib/python3.9/site-packages/aiokafka/consumer/group_coordinator.py", "filename": "group_coordinator.py", "exc_info": null, "exc_text": null, "lineno": 359, "message": "LeaveGroup request succeeded", "@timestamp": "2022-06-07T15:35:02.033104", "level": "INFO", "@version": "1"}
{"name": "aiokafka.consumer.group_coordinator", "levelno": 20, "pathname": "/app/.local/lib/python3.9/site-packages/aiokafka/consumer/group_coordinator.py", "filename": "group_coordinator.py", "exc_info": null, "exc_text": null, "lineno": 384, "message": "Revoking previously assigned partitions frozenset({TopicPartition(topic='topic1', partition=0), TopicPartition(topic='topic2', partition=0), ....}) for group group-id-07-06-22", "@timestamp": "2022-06-07T15:35:02.033347", "level": "INFO", "@version": "1"}
{"name": "aiokafka.consumer.group_coordinator", "levelno": 40, "pathname": "/app/.local/lib/python3.9/site-packages/aiokafka/consumer/group_coordinator.py", "filename": "group_coordinator.py", "exc_info": null, "exc_text": null, "lineno": 1043, "message": "OffsetCommit failed for group group-id-07-06-22 due to group error ([Error 25] UnknownMemberIdError: group-id-07-06-22), will rejoin", "@timestamp": "2022-06-07T15:35:02.150061", "level": "ERROR", "@version": "1"}
{"name": "aiokafka.consumer.group_coordinator", "levelno": 40, "pathname": "/app/.local/lib/python3.9/site-packages/aiokafka/consumer/group_coordinator.py", "filename": "group_coordinator.py", "exc_info": null, "exc_text": null, "lineno": 1052, "message": "OffsetCommit failed for group group-id-07-06-22 due to group error ([Error 25] UnknownMemberIdError: group-id-07-06-22), will rejoin", "@timestamp": "2022-06-07T15:35:02.150212", "level": "ERROR", "@version": "1"}
Logs from 07/06/2022.
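In other words, both pods sat idle longer than max_poll_interval, the heartbeat routine above sent a LeaveGroup request, and the next manual OffsetCommit was rejected with UnknownMemberIdError; pod1 rejoined and kept processing, pod2 did not. One possible mitigation (a sketch of a workaround, not a fix for the lock-up itself) is to raise max_poll_interval_ms above the longest expected idle window so the idle-leave branch in the excerpt above is never taken; with faust-streaming the same knob should be reachable through its broker settings (I believe the setting is broker_max_poll_interval, but verify against the faust docs):

from aiokafka import AIOKafkaConsumer

# Assumption: the consumer can be idle for almost a full day between the
# daily 9 am batches, so the idle-leave threshold is raised above that.
# Topic, group id and broker address are placeholders.
consumer = AIOKafkaConsumer(
    "topic1",
    bootstrap_servers="localhost:9092",
    group_id="group-id-07-06-22",
    enable_auto_commit=False,
    max_poll_interval_ms=24 * 60 * 60 * 1000,  # 24 h, longer than the idle window
)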
0 reactions
ostetsenko commented, Sep 4, 2022

@sam-orca: Workaround: a parallel thread monitors a periodic task and restarts the service when the task hasn't run within a TTL (i.e. the main thread is locked). A sketch follows below.
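A rough sketch of that watchdog (the TTL, names and restart mechanism are illustrative, not the actual implementation): the consumer loop refreshes a timestamp on every iteration, and a daemon thread terminates the process when the timestamp goes stale, so Kubernetes restarts the pod:

import os
import threading
import time

WATCHDOG_TTL = 120  # seconds without progress before restarting (illustrative)
_last_progress = time.monotonic()

def touch():
    # Call this from the periodic task / consumer loop on every iteration.
    global _last_progress
    _last_progress = time.monotonic()

def _watchdog():
    while True:
        time.sleep(5)
        if time.monotonic() - _last_progress > WATCHDOG_TTL:
            # The main loop looks locked up; exit so the orchestrator
            # (Kubernetes) restarts the service.
            os._exit(1)

threading.Thread(target=_watchdog, daemon=True).start()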
