Heartbeat failed for group <id> because it is rebalancing
Hi,
Our node details are as follows:
- Kafka: 3 nodes (shared between different applications)
- App: 2 nodes (i.e. 2 consumers per topic, 5 topics in total)
- Kafka settings: version 1.0 with default settings, nothing changed
FYI, the broker settings:
broker.id=1
num.network.threads=3
num.io.threads=8
socket.send.buffer.bytes=102400
socket.receive.buffer.bytes=102400
socket.request.max.bytes=104857600
log.dirs=/var/lib/kafka
num.partitions=1
num.recovery.threads.per.data.dir=1
default.replication.factor=3
offsets.topic.replication.factor=3
transaction.state.log.replication.factor=3
transaction.state.log.min.isr=2
log.retention.bytes=1073741824
log.segment.bytes=1073741824
log.retention.check.interval.ms=300000
log.retention.hours=120
zookeeper.connection.timeout.ms=6000
confluent.support.metrics.enable=false
group.initial.rebalance.delay.ms=0
confluent.support.customer.id=anonymous
auto.create.topics.enable=false
Consumers run fine if I run one node with one consumer. The moment I start another consumer from the other node, I immediately get the group rebalancing error for some topics.
I have tried the consumer settings below, but I still see the same error.
consumer = AIOKafkaConsumer(
    topic_name,
    loop=loop,
    bootstrap_servers=BROKERS,
    group_id=group_name,
    # security_protocol='SSL',
    # ssl_context=getKafkaSslContext(),
    value_deserializer=lambda v: v.decode('utf8'),
    enable_auto_commit=True,
    auto_offset_reset=auto_offset_reset,
    consumer_timeout_ms=consumer_timeout_ms,
    retry_backoff_ms=10,
    # heartbeat_interval_ms=50,
    session_timeout_ms=60000,
    # request_timeout_ms=80 * 1000,
    # fetch_max_wait_ms=3000,
    max_poll_records=50,
)
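For reference, here is a minimal sketch of how a consumer like this is typically driven; the start/iterate/stop pattern follows the aiokafka docs, and the topic, group, and broker values are illustrative stand-ins for the placeholders above:

import asyncio
from aiokafka import AIOKafkaConsumer

# Illustrative stand-ins for the placeholders in the snippet above.
topic_name, group_name, BROKERS = "my_topic", "my_group", "localhost:9092"

async def consume():
    consumer = AIOKafkaConsumer(
        topic_name,
        bootstrap_servers=BROKERS,
        group_id=group_name,
        value_deserializer=lambda v: v.decode('utf8'),
    )
    await consumer.start()  # joins the group; joining triggers a rebalance
    try:
        async for msg in consumer:
            print(msg.topic, msg.partition, msg.offset, msg.value)
    finally:
        await consumer.stop()  # leaves the group cleanly instead of timing out

asyncio.run(consume())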
What could be the issue? Does it have anything to do with rebalance.max.retries, rebalance.backoff.ms, or the session timeout?
Thanks, Bala
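On the timing question: rebalance.max.retries and rebalance.backoff.ms are configs of the old Scala consumer; with the newer group protocol that aiokafka speaks, the relevant pair is heartbeat_interval_ms and session_timeout_ms, where the heartbeat interval should be well below the session timeout (roughly a third is the usual guidance). A minimal sketch with illustrative values:

from aiokafka import AIOKafkaConsumer

# Illustrative timing relationship (values are examples, not a recommendation):
# the broker evicts a member and rebalances the group if no heartbeat arrives
# within session_timeout_ms, so heartbeats must be sent much more often.
consumer = AIOKafkaConsumer(
    "my_topic",                   # hypothetical topic
    bootstrap_servers="localhost:9092",
    group_id="my_group",          # hypothetical group
    session_timeout_ms=60000,     # how long the broker waits for a heartbeat
    heartbeat_interval_ms=20000,  # ~1/3 of the session timeout
)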
Yes @tvoinarovskyi, it looks like it was fixed after adding the rebalance listener.
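For context, a rebalance listener in aiokafka is implemented by subclassing ConsumerRebalanceListener and passing it to subscribe(). The sketch below shows the common commit-on-revoke pattern, not the poster's exact code; broker, group, and topic names are illustrative:

from aiokafka import AIOKafkaConsumer
from aiokafka.abc import ConsumerRebalanceListener

class CommitOnRevoke(ConsumerRebalanceListener):
    """Commits offsets when partitions are revoked, just before a rebalance."""

    def __init__(self, consumer):
        self.consumer = consumer

    async def on_partitions_revoked(self, revoked):
        # Flush current offsets before ownership moves, so the next
        # assignee resumes where this consumer stopped.
        await self.consumer.commit()

    async def on_partitions_assigned(self, assigned):
        pass  # new assignment received; nothing extra needed here

consumer = AIOKafkaConsumer(
    bootstrap_servers="localhost:9092",  # illustrative broker address
    group_id="my_group",                 # hypothetical group
    enable_auto_commit=False,            # commit explicitly from the listener
)
consumer.subscribe(["my_topic"], listener=CommitOnRevoke(consumer))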
@pod2metra Are you seeing errors of a similar kind in the logs, or are you having problems consuming messages because it is constantly happening?
If it’s just some messages in the logs, that’s OK; that is how Kafka behaves (at least it did before they introduced some optimizations, which sadly are not implemented in aiokafka). When the Kafka broker sees some members of the group leave or time out, it returns an error to all remaining participants. Usually this error is seen on the next heartbeat, which results in a log message similar to:

Heartbeat failed for group <group_id> because it is rebalancing
That is good: it just signals the consumer to rejoin the group and continue consuming with the new member set.
If you have a different case, please open a new issue. Sorry for the trouble.