Heartbeat failed for group <id> because it is rebalancing

Hi,

Our node details are as follows:

  • Kafka: 3 nodes (shared between different applications)
  • App: 2 nodes (i.e. 2 consumers per topic, 5 topics in total)
  • Kafka settings: version 1.0 with default settings, nothing changed

For reference, the Kafka broker settings:

broker.id=1
num.network.threads=3
num.io.threads=8
socket.send.buffer.bytes=102400
socket.receive.buffer.bytes=102400
socket.request.max.bytes=104857600
log.dirs=/var/lib/kafka
num.partitions=1
num.recovery.threads.per.data.dir=1
default.replication.factor=3
offsets.topic.replication.factor=3
transaction.state.log.replication.factor=3
transaction.state.log.min.isr=2
log.retention.bytes=1073741824
log.segment.bytes=1073741824
log.retention.check.interval.ms=300000
log.retention.hours=120
zookeeper.connection.timeout.ms=6000
confluent.support.metrics.enable=false
group.initial.rebalance.delay.ms=0
confluent.support.customer.id=anonymous
auto.create.topics.enable=false

The consumers run fine if I run one node with one consumer. The moment I start another consumer from the other node, I immediately get the group rebalancing error for some topics.

I have tried the consumer settings below, but I still see the same error.

AIOKafkaConsumer(
    # evgaKafkaName(topic_name),
    topic_name,
    loop=loop,
    # bootstrap_servers=getKafkaBrokers(),
    bootstrap_servers=BROKERS,
    # group_id=evgaKafkaName(group_name),
    group_id=group_name,
    # security_protocol='SSL',
    # ssl_context=getKafkaSslContext(),
    value_deserializer=lambda v: v.decode('utf8'),
    # value_deserializer=deserializer,
    enable_auto_commit=True,
    auto_offset_reset=auto_offset_reset,
    consumer_timeout_ms=consumer_timeout_ms,
    retry_backoff_ms=10,
    # heartbeat_interval_ms=50,
    session_timeout_ms=60000,
    # request_timeout_ms=80 * 1000,
    # fetch_max_wait_ms=3000,
    max_poll_records=50
)

What could be the issue? Does it have anything to do with rebalance.max.retries, rebalance.backoff.ms, or the session timeout?

Thanks, Bala

Issue Analytics

  • State: closed
  • Created: 5 years ago
  • Comments: 9 (4 by maintainers)

Top GitHub Comments

pybala commented, Sep 11, 2018 (1 reaction)

Yes @tvoinarovskyi, it looks like it was fixed after adding the rebalance listener.
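
The listener that was added is not shown in the thread. As a rough illustration only, a rebalance listener in aiokafka might look like the sketch below. It subclasses `ConsumerRebalanceListener` and simply tracks and logs the partitions this consumer owns. The class name `LoggingRebalanceListener`, the logger name, and the `owned` attribute are hypothetical, and the `ImportError` fallback exists only so the sketch can be run without aiokafka installed.

```python
import logging

try:
    from aiokafka import ConsumerRebalanceListener
except ImportError:
    # Fallback (assumption) so the sketch runs without aiokafka installed.
    ConsumerRebalanceListener = object

log = logging.getLogger("rebalance_sketch")  # hypothetical logger name


class LoggingRebalanceListener(ConsumerRebalanceListener):
    """Hypothetical listener that tracks which partitions this consumer owns."""

    def __init__(self):
        self.owned = set()

    async def on_partitions_revoked(self, revoked):
        # Called when a rebalance starts, before partitions are taken away.
        # A real listener might commit offsets or flush pending work here.
        self.owned -= set(revoked)
        log.info("Partitions revoked: %s", sorted(revoked))

    async def on_partitions_assigned(self, assigned):
        # Called after the rebalance completes with the new assignment.
        self.owned |= set(assigned)
        log.info("Partitions assigned: %s", sorted(assigned))
```

To use it, the topic would be subscribed explicitly instead of being passed to the constructor, for example `consumer.subscribe(topics=[topic_name], listener=LoggingRebalanceListener())`.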

tvoinarovskyi commented, Mar 12, 2020 (0 reactions)

@pod2metra Are you seeing errors of a similar kind in the logs, or are you having problems consuming messages because it is constantly happening?

If it’s just some messages in the logs, that’s OK. It’s how Kafka behaves (or at least used to, before they introduced some optimizations that, sadly, are not implemented in aiokafka). When the Kafka broker sees some members of the group leave or time out, it returns an error to all remaining participants. Usually this error is seen on the next heartbeat, which results in a log message similar to:

 Heartbeat failed for group <id> because it is rebalancing

That is fine; it just signals the consumer to rejoin the group and resume consumption with a different member set.

If you have a different case, please open a new issue. Sorry for the trouble.
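
One way to tell occasional log messages apart from a group that rebalances constantly is to count how often the message appears. The sketch below does this with a standard-library logging handler. The class name `RebalanceHeartbeatCounter` is hypothetical, and it is an assumption that aiokafka emits this message under the `aiokafka` logger hierarchy and possibly at INFO level.

```python
import logging


class RebalanceHeartbeatCounter(logging.Handler):
    """Hypothetical handler that counts 'because it is rebalancing' records."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def emit(self, record):
        # Match on the stable tail of the message quoted in this thread.
        if "because it is rebalancing" in record.getMessage():
            self.count += 1


counter = RebalanceHeartbeatCounter()
# Assumption: aiokafka modules log under the "aiokafka" logger hierarchy.
aiokafka_log = logging.getLogger("aiokafka")
# Assumption: the message may be logged at INFO, so let INFO records through.
aiokafka_log.setLevel(logging.INFO)
aiokafka_log.addHandler(counter)
```

A count that grows only when consumers start or stop matches the harmless case described above. A count that keeps climbing while the group membership is supposedly stable suggests the consumers are repeatedly leaving or timing out.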
