Dead loop for connector state

Hello team,

I am using the camel-kafka-s3 connector version 3.3.0 with Apache Camel 3.7.0. I have been using it for more than a year and have never seen this issue before. Our Kafka cluster runs on Strimzi 0.25.0 with Kafka 2.8.0.

I noticed that, because of some issue on the Kafka cluster (most probably an expired custom broker certificate), our Kafka connector went into a dead loop of retrying to connect to the broker. The cluster came back in 10-15 minutes, but the Kafka connector process never returned to a healthy status. Here is what I see in the logs:

2022-02-28 02:11:33,287 WARN [Worker clientId=connect-1, groupId=s3-connector] Didn't reach end of config log quickly enough (org.apache.kafka.connect.runtime.distributed.DistributedHerder) [DistributedHerder-connect-1-1]
java.util.concurrent.TimeoutException: Timed out waiting for future
at org.apache.kafka.connect.util.ConvertingFutureCallback.get(ConvertingFutureCallback.java:106)
at org.apache.kafka.connect.storage.KafkaConfigBackingStore.refresh(KafkaConfigBackingStore.java:441)
at org.apache.kafka.connect.runtime.distributed.DistributedHerder.readConfigToEnd(DistributedHerder.java:1188)
at org.apache.kafka.connect.runtime.distributed.DistributedHerder.tick(DistributedHerder.java:342)
at org.apache.kafka.connect.runtime.distributed.DistributedHerder.run(DistributedHerder.java:316)
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:829)
2022-02-28 02:16:45,296 INFO [Worker clientId=connect-1, groupId=s3-connector] Reading to end of config log; current config state offset: 2233 (org.apache.kafka.connect.runtime.distributed.DistributedHerder) [DistributedHerder-connect-1-1]

The following is the log from the point where the exceptions started:

2022-02-27 19:18:30,203 INFO [AdminClient clientId=adminclient-8] Failed authentication with kafka-cluster-kafka-15.kafka-cluster-kafka-brokers.mdic-strimzi-pro.svc/100.80.52.142 (Token validation failed: Unknown signing key (kid:{keyid}) (ErrId: 26617d8d)) (org.apache.kafka.common.network.Selector) [kafka-admin-client-thread | adminclient-8]
2022-02-27 19:18:30,203 ERROR [AdminClient clientId=adminclient-8] Connection to node 15 (kafka-cluster-kafka-15.kafka-cluster-kafka-brokers.mdic-strimzi-pro.svc/100.80.52.142:9093) failed authentication due to: Token validation failed: Unknown signing key (kid:{keyid}) (ErrId: 26617d8d) (org.apache.kafka.clients.NetworkClient) [kafka-admin-client-thread | adminclient-8]
2022-02-27 19:18:30,203 WARN [AdminClient clientId=adminclient-8] Metadata update failed due to authentication error (org.apache.kafka.clients.admin.internals.AdminMetadataManager) [kafka-admin-client-thread | adminclient-8]
org.apache.kafka.common.errors.SaslAuthenticationException: Token validation failed: Unknown signing key (kid:m45ykTYExWg7_fe2lwAqTCHevnt-1c064bM9vRSw5V4) (ErrId: 26617d8d)
2022-02-27 19:18:30,218 INFO [AdminClient clientId=adminclient-8] Metadata update failed (org.apache.kafka.clients.admin.internals.AdminMetadataManager) [kafka-admin-client-thread | adminclient-8]
org.apache.kafka.common.errors.TimeoutException: Call(callName=fetchMetadata, deadlineMs=1645989510217, tries=1, nextAllowedTryMs=1645989510318) timed out at 1645989510218 after 1 attempt(s)
Caused by: org.apache.kafka.common.errors.TimeoutException: Timed out waiting to send the call. Call: fetchMetadata
2022-02-27 19:18:30,318 INFO [AdminClient clientId=adminclient-8] Metadata update failed (org.apache.kafka.clients.admin.internals.AdminMetadataManager) [kafka-admin-client-thread | adminclient-8]
org.apache.kafka.common.errors.TimeoutException: Call(callName=fetchMetadata, deadlineMs=1645989510317, tries=1, nextAllowedTryMs=1645989510418) timed out at 1645989510318 after 1 attempt(s)
Caused by: org.apache.kafka.common.errors.TimeoutException: Timed out waiting to send the call. Call: fetchMetadata
2022-02-27 19:18:30,418 INFO [AdminClient clientId=adminclient-8] Metadata update failed (org.apache.kafka.clients.admin.internals.AdminMetadataManager) [kafka-admin-client-thread | adminclient-8]
org.apache.kafka.common.errors.TimeoutException: Call(callName=fetchMetadata, deadlineMs=1645989510417, tries=1, nextAllowedTryMs=1645989510518) timed out at 1645989510418 after 1 attempt(s)
Caused by: org.apache.kafka.common.errors.TimeoutException: Timed out waiting to send the call. Call: fetchMetadata
2022-02-27 19:18:30,518 INFO [AdminClient clientId=adminclient-8] Metadata update failed (org.apache.kafka.clients.admin.internals.AdminMetadataManager) [kafka-admin-client-thread | adminclient-8]
org.apache.kafka.common.errors.TimeoutException: Call(callName=fetchMetadata, deadlineMs=1645989510517, tries=1, nextAllowedTryMs=1645989510618) timed out at 1645989510518 after 1 attempt(s)
Caused by: org.apache.kafka.common.errors.TimeoutException: Timed out waiting to send the call. Call: fetchMetadata

I have the following questions:

  1. Why didn’t the connector die and restart the process?
  2. Is there a way we can propagate the Admin client exception in the application?
  3. If not, how can I monitor my application health?
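
On question 3, one possible approach is an external probe against the Kafka Connect REST API, sketched below. This is only a minimal sketch under stated assumptions, not something confirmed in this thread: the base URL and connector name are hypothetical placeholders, and it is not verified here whether the status endpoint actually fails or keeps reporting RUNNING while the herder is stuck reading the config log. The probe treats any timeout, HTTP error, or non-RUNNING state as unhealthy.

```python
import json
import urllib.request

# Assumptions: the Connect worker's REST API is reachable at this address
# (8083 is the default REST port) and the connector name is a placeholder.
CONNECT_URL = "http://localhost:8083"
CONNECTOR_NAME = "s3-sink"


def is_healthy(status: dict) -> bool:
    """Return True only if the connector and every one of its tasks report RUNNING."""
    connector_state = status.get("connector", {}).get("state")
    tasks = status.get("tasks", [])
    return (
        connector_state == "RUNNING"
        and len(tasks) > 0
        and all(task.get("state") == "RUNNING" for task in tasks)
    )


def probe(base_url: str, name: str, timeout_s: float = 5.0) -> bool:
    """Fetch /connectors/<name>/status; any error or timeout counts as unhealthy."""
    url = f"{base_url}/connectors/{name}/status"
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as response:
            return is_healthy(json.load(response))
    except (OSError, ValueError):
        # OSError covers connection failures, HTTP errors and socket timeouts;
        # ValueError covers a response body that is not valid JSON.
        return False


if __name__ == "__main__":
    # Exit code 0 means healthy, 1 means unhealthy, so the script can be
    # wired into an external liveness check.
    raise SystemExit(0 if probe(CONNECT_URL, CONNECTOR_NAME) else 1)
```

A script like this could be run periodically by whatever supervises the Connect process, so that a worker that stops answering gets restarted from outside instead of relying on the worker to exit on its own.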

Thank you.

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Comments: 8 (4 by maintainers)

Top GitHub Comments

1 reaction
scholzj commented, Feb 28, 2022

PS: Overall I think this is more a general problem not related to any particular connector.

0 reactions
ruchirvaninasdaq commented, Jun 21, 2022

Yeah, I am good for now. Thanks.
