Dead loop for connector state
Hello team,
I am using the Camel-kafka-s3 connector version 3.3.0 with Apache Camel 3.7.0. I have been using it for more than a year and have never seen this issue before. Our Kafka cluster runs Strimzi 0.25.0 with Kafka 2.8.0.
I noticed that after an issue on the Kafka cluster (most likely an expired custom broker certificate), our Kafka connector went into a dead loop of retrying to connect to the brokers. The cluster came back within 10-15 minutes, but the Kafka Connect process never returned to a healthy status. Here is what I see in the logs:
2022-02-28 02:11:33,287 WARN [Worker clientId=connect-1, groupId=s3-connector] Didn't reach end of config log quickly enough (org.apache.kafka.connect.runtime.distributed.DistributedHerder) [DistributedHerder-connect-1-1]
java.util.concurrent.TimeoutException: Timed out waiting for future
at org.apache.kafka.connect.util.ConvertingFutureCallback.get(ConvertingFutureCallback.java:106)
at org.apache.kafka.connect.storage.KafkaConfigBackingStore.refresh(KafkaConfigBackingStore.java:441)
at org.apache.kafka.connect.runtime.distributed.DistributedHerder.readConfigToEnd(DistributedHerder.java:1188)
at org.apache.kafka.connect.runtime.distributed.DistributedHerder.tick(DistributedHerder.java:342)
at org.apache.kafka.connect.runtime.distributed.DistributedHerder.run(DistributedHerder.java:316)
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:829)
2022-02-28 02:16:45,296 INFO [Worker clientId=connect-1, groupId=s3-connector] Reading to end of config log; current config state offset: 2233 (org.apache.kafka.connect.runtime.distributed.DistributedHerder) [DistributedHerder-connect-1-1]
Following is the log from where the exception started:
2022-02-27 19:18:30,203 INFO [AdminClient clientId=adminclient-8] Failed authentication with kafka-cluster-kafka-15.kafka-cluster-kafka-brokers.mdic-strimzi-pro.svc/100.80.52.142 (Token validation failed: Unknown signing key (kid:{keyid}) (ErrId: 26617d8d)) (org.apache.kafka.common.network.Selector) [kafka-admin-client-thread | adminclient-8]
2022-02-27 19:18:30,203 ERROR [AdminClient clientId=adminclient-8] Connection to node 15 (kafka-cluster-kafka-15.kafka-cluster-kafka-brokers.mdic-strimzi-pro.svc/100.80.52.142:9093) failed authentication due to: Token validation failed: Unknown signing key (kid:{keyid}) (ErrId: 26617d8d) (org.apache.kafka.clients.NetworkClient) [kafka-admin-client-thread | adminclient-8]
2022-02-27 19:18:30,203 WARN [AdminClient clientId=adminclient-8] Metadata update failed due to authentication error (org.apache.kafka.clients.admin.internals.AdminMetadataManager) [kafka-admin-client-thread | adminclient-8]
org.apache.kafka.common.errors.SaslAuthenticationException: Token validation failed: Unknown signing key (kid:m45ykTYExWg7_fe2lwAqTCHevnt-1c064bM9vRSw5V4) (ErrId: 26617d8d)
2022-02-27 19:18:30,218 INFO [AdminClient clientId=adminclient-8] Metadata update failed (org.apache.kafka.clients.admin.internals.AdminMetadataManager) [kafka-admin-client-thread | adminclient-8]
org.apache.kafka.common.errors.TimeoutException: Call(callName=fetchMetadata, deadlineMs=1645989510217, tries=1, nextAllowedTryMs=1645989510318) timed out at 1645989510218 after 1 attempt(s)
Caused by: org.apache.kafka.common.errors.TimeoutException: Timed out waiting to send the call. Call: fetchMetadata
2022-02-27 19:18:30,318 INFO [AdminClient clientId=adminclient-8] Metadata update failed (org.apache.kafka.clients.admin.internals.AdminMetadataManager) [kafka-admin-client-thread | adminclient-8]
org.apache.kafka.common.errors.TimeoutException: Call(callName=fetchMetadata, deadlineMs=1645989510317, tries=1, nextAllowedTryMs=1645989510418) timed out at 1645989510318 after 1 attempt(s)
Caused by: org.apache.kafka.common.errors.TimeoutException: Timed out waiting to send the call. Call: fetchMetadata
2022-02-27 19:18:30,418 INFO [AdminClient clientId=adminclient-8] Metadata update failed (org.apache.kafka.clients.admin.internals.AdminMetadataManager) [kafka-admin-client-thread | adminclient-8]
org.apache.kafka.common.errors.TimeoutException: Call(callName=fetchMetadata, deadlineMs=1645989510417, tries=1, nextAllowedTryMs=1645989510518) timed out at 1645989510418 after 1 attempt(s)
Caused by: org.apache.kafka.common.errors.TimeoutException: Timed out waiting to send the call. Call: fetchMetadata
2022-02-27 19:18:30,518 INFO [AdminClient clientId=adminclient-8] Metadata update failed (org.apache.kafka.clients.admin.internals.AdminMetadataManager) [kafka-admin-client-thread | adminclient-8]
org.apache.kafka.common.errors.TimeoutException: Call(callName=fetchMetadata, deadlineMs=1645989510517, tries=1, nextAllowedTryMs=1645989510618) timed out at 1645989510518 after 1 attempt(s)
Caused by: org.apache.kafka.common.errors.TimeoutException: Timed out waiting to send the call. Call: fetchMetadata
I have a few questions:
- Why didn't the connector die and restart the process?
- Is there a way we can propagate the AdminClient exception to the application?
- If not, how can I monitor my application's health? (One possible approach is sketched below.)
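For the monitoring question, one option is to poll the Kafka Connect REST API from outside the worker. Below is a minimal sketch, assuming the worker's REST listener is reachable at http://localhost:8083 and the connector is registered as s3-connector (both values are assumptions, adjust them for your deployment). It treats a slow or missing response as unhealthy, checks the connector and task states, and restarts any FAILED task via the standard restart endpoint.

#!/usr/bin/env python3
"""Poll the Kafka Connect REST API and restart FAILED tasks.

Assumptions (adjust for your deployment):
  - the worker's REST listener is reachable at CONNECT_URL
  - the connector is named "s3-connector"
"""
import json
import sys
import urllib.request

CONNECT_URL = "http://localhost:8083"   # assumed worker REST endpoint
CONNECTOR = "s3-connector"              # assumed connector name
TIMEOUT_S = 10                          # fail fast if the worker is unresponsive


def get_json(path):
    # GET a Connect REST resource; a timeout here is itself a health signal.
    with urllib.request.urlopen(f"{CONNECT_URL}{path}", timeout=TIMEOUT_S) as resp:
        return json.loads(resp.read())


def restart_task(task_id):
    # POST /connectors/{name}/tasks/{id}/restart restarts a single task.
    req = urllib.request.Request(
        f"{CONNECT_URL}/connectors/{CONNECTOR}/tasks/{task_id}/restart",
        method="POST",
    )
    urllib.request.urlopen(req, timeout=TIMEOUT_S)


def main():
    try:
        status = get_json(f"/connectors/{CONNECTOR}/status")
    except Exception as exc:
        # The worker did not answer in time -- treat it as unhealthy so an
        # external supervisor (cron alert, probe, etc.) can act on it.
        print(f"UNHEALTHY: status request failed: {exc}")
        sys.exit(1)

    unhealthy = False
    if status["connector"]["state"] != "RUNNING":
        print(f"connector state: {status['connector']['state']}")
        unhealthy = True
    for task in status.get("tasks", []):
        if task["state"] == "FAILED":
            print(f"task {task['id']} FAILED, restarting")
            restart_task(task["id"])
            unhealthy = True

    sys.exit(1 if unhealthy else 0)


if __name__ == "__main__":
    main()

Note that in an incident like the one above, the connector and task states may keep reporting RUNNING while the herder loop is stuck retrying, so the request timeout (treating an unresponsive worker as unhealthy) is doing as much work here as the state checks.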
Thank you.
PS: Overall I think this is more of a general problem, not related to any particular connector.
Yeah, I am good for now. Thanks.