Discussion:
java.lang.IllegalStateException: Correlation id for response () does not match request ()
Mickael Maison
2017-03-06 17:15:39 UTC
Permalink
Hi,

In one of our clusters, some of our clients occasionally see this exception:
java.lang.IllegalStateException: Correlation id for response (4564)
does not match request (4562)
at org.apache.kafka.clients.NetworkClient.correlate(NetworkClient.java:486)
at org.apache.kafka.clients.NetworkClient.parseResponse(NetworkClient.java:381)
at org.apache.kafka.clients.NetworkClient.handleCompletedReceives(NetworkClient.java:449)
at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:269)
at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:229)
at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:134)
at java.lang.Thread.run(Unknown Source)

We've also seen it from consumer poll() and commit()

Usually the response's correlation id is off by just 1 or 2 (like
above) but we've also seen it off by a few hundreds:
java.lang.IllegalStateException: Correlation id for response (742)
does not match request (174)
at org.apache.kafka.clients.NetworkClient.correlate(NetworkClient.java:486)
at org.apache.kafka.clients.NetworkClient.parseResponse(NetworkClient.java:381)
at org.apache.kafka.clients.NetworkClient.handleCompletedReceives(NetworkClient.java:449)
at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:269)
at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.clientPoll(ConsumerNetworkClient.java:360)
at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:224)
at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:192)
at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:163)
at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.commitOffsetsSync(ConsumerCoordinator.java:426)
at org.apache.kafka.clients.consumer.KafkaConsumer.commitSync(KafkaConsumer.java:1059)
at org.apache.kafka.clients.consumer.KafkaConsumer.commitSync(KafkaConsumer.java:1027)

When this happens, all subsequent responses are also shifted:
java.lang.IllegalStateException: Correlation id for response (743)
does not match request (742)
java.lang.IllegalStateException: Correlation id for response (744)
does not match request (743)
java.lang.IllegalStateException: Correlation id for response (745)
does not match request (744)
java.lang.IllegalStateException: Correlation id for response (746)
does not match request (745)
...
It's easy to discard and recreate the consumer instance to recover
however we can't do that with the producer as it occurs in the Sender
thread.

Our cluster and our clients are running Kafka 0.10.0.1.
Under which circumstances would such an error happen ?
Even with logging set to TRACE, we can't spot anything suspicious
shortly before the issue. Is there any data we should try to capture
when this happens ?

Thanks!
Ismael Juma
2017-03-06 17:44:46 UTC
Permalink
Hi Mickael,

This looks to be the same as KAFKA-4669. In theory, this should never
happen and it's unclear when/how it can happen. Not sure if someone has
investigated it in more detail.

Ismael
Post by Mickael Maison
Hi,
java.lang.IllegalStateException: Correlation id for response (4564)
does not match request (4562)
at org.apache.kafka.clients.NetworkClient.correlate(
NetworkClient.java:486)
at org.apache.kafka.clients.NetworkClient.parseResponse(
NetworkClient.java:381)
at org.apache.kafka.clients.NetworkClient.handleCompletedReceives(
NetworkClient.java:449)
at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:269)
at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:229)
at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:134)
at java.lang.Thread.run(Unknown Source)
We've also seen it from consumer poll() and commit()
Usually the response's correlation id is off by just 1 or 2 (like
java.lang.IllegalStateException: Correlation id for response (742)
does not match request (174)
at org.apache.kafka.clients.NetworkClient.correlate(
NetworkClient.java:486)
at org.apache.kafka.clients.NetworkClient.parseResponse(
NetworkClient.java:381)
at org.apache.kafka.clients.NetworkClient.handleCompletedReceives(
NetworkClient.java:449)
at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:269)
at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.
clientPoll(ConsumerNetworkClient.java:360)
at org.apache.kafka.clients.consumer.internals.
ConsumerNetworkClient.poll(ConsumerNetworkClient.java:224)
at org.apache.kafka.clients.consumer.internals.
ConsumerNetworkClient.poll(ConsumerNetworkClient.java:192)
at org.apache.kafka.clients.consumer.internals.
ConsumerNetworkClient.poll(ConsumerNetworkClient.java:163)
at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.
commitOffsetsSync(ConsumerCoordinator.java:426)
at org.apache.kafka.clients.consumer.KafkaConsumer.
commitSync(KafkaConsumer.java:1059)
at org.apache.kafka.clients.consumer.KafkaConsumer.
commitSync(KafkaConsumer.java:1027)
java.lang.IllegalStateException: Correlation id for response (743)
does not match request (742)
java.lang.IllegalStateException: Correlation id for response (744)
does not match request (743)
java.lang.IllegalStateException: Correlation id for response (745)
does not match request (744)
java.lang.IllegalStateException: Correlation id for response (746)
does not match request (745)
...
It's easy to discard and recreate the consumer instance to recover
however we can't do that with the producer as it occurs in the Sender
thread.
Our cluster and our clients are running Kafka 0.10.0.1.
Under which circumstances would such an error happen ?
Even with logging set to TRACE, we can't spot anything suspicious
shortly before the issue. Is there any data we should try to capture
when this happens ?
Thanks!
Aarti Gupta
2017-11-29 23:07:26 UTC
Permalink
https://issues.apache.org/jira/browse/KAFKA-4669?focusedCommentId=16271727&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16271727
Can we reopen this issue ?

We saw this on the consumer in production today. We are on 0.11.01

ERROR c.v.v.h.m.k.KafkaEventConsumerDelegate- Error fetching next record
java.lang.Exception: Error fetching next new record from kafka queue
at
com.vmware.vchs.hybridity.messaging.kafka.KafkaEventConsumerDelegate.getNextEventWithTimeout(KafkaEventConsumerDelegate.java:121)
at
com.vmware.vchs.hybridity.messaging.kafka.KafkaEventConsumerDelegate.getNextEvent(KafkaEventConsumerDelegate.java:64)
at
com.vmware.vchs.hybridity.messaging.adapter.EventListenerAdapter.getNextEvent(EventListenerAdapter.java:76)
at
com.vmware.vchs.hybridity.messaging.adapter.EventListenerAdapter.getNextEventAndAck(EventListenerAdapter.java:94)
at
com.vmware.vchs.hybridity.messaging.adapter.EventListenerAdapter$1.run(EventListenerAdapter.java:125)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.IllegalStateException: Correlation id for response
(386681) does not match request (386680), request header:
{api_key=9,api_version=3,correlation_id=386680,client_id=consumer-36}

at org.apache.kafka.clients.NetworkClient.correlate(NetworkClient.java:752)
at
org.apache.kafka.clients.NetworkClient.parseStructMaybeUpdateThrottleTimeMetrics(NetworkClient.java:561)
at
org.apache.kafka.clients.NetworkClient.handleCompletedReceives(NetworkClient.java:657)
at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:442)
at
org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:232)
at
org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:208)
at
org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:168)
at
org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.fetchCommittedOffsets(ConsumerCoordinator.java:477)
at
org.apache.kafka.clients.consumer.KafkaConsumer.committed(KafkaConsumer.java:1346)
at
com.vmware.vchs.hybridity.messaging.kafka.KafkaEventConsumerDelegate.getNextEventWithTimeout(KafkaEventConsumerDelegate.java:86)
... 5 common frames omitted
Post by Ismael Juma
Hi Mickael,
This looks to be the same as KAFKA-4669. In theory, this should never
happen and it's unclear when/how it can happen. Not sure if someone has
investigated it in more detail.
Ismael
Post by Mickael Maison
Hi,
java.lang.IllegalStateException: Correlation id for response (4564)
does not match request (4562)
at org.apache.kafka.clients.NetworkClient.correlate(
NetworkClient.java:486)
at org.apache.kafka.clients.NetworkClient.parseResponse(
NetworkClient.java:381)
at org.apache.kafka.clients.NetworkClient.handleCompletedReceives(
NetworkClient.java:449)
at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:269)
at org.apache.kafka.clients.producer.internals.Sender.run(
Sender.java:229)
Post by Mickael Maison
at org.apache.kafka.clients.producer.internals.Sender.run(
Sender.java:134)
Post by Mickael Maison
at java.lang.Thread.run(Unknown Source)
We've also seen it from consumer poll() and commit()
Usually the response's correlation id is off by just 1 or 2 (like
java.lang.IllegalStateException: Correlation id for response (742)
does not match request (174)
at org.apache.kafka.clients.NetworkClient.correlate(
NetworkClient.java:486)
at org.apache.kafka.clients.NetworkClient.parseResponse(
NetworkClient.java:381)
at org.apache.kafka.clients.NetworkClient.handleCompletedReceives(
NetworkClient.java:449)
at org.apache.kafka.clients.NetworkClient.poll(
NetworkClient.java:269)
Post by Mickael Maison
at org.apache.kafka.clients.consumer.internals.
ConsumerNetworkClient.
Post by Mickael Maison
clientPoll(ConsumerNetworkClient.java:360)
at org.apache.kafka.clients.consumer.internals.
ConsumerNetworkClient.poll(ConsumerNetworkClient.java:224)
at org.apache.kafka.clients.consumer.internals.
ConsumerNetworkClient.poll(ConsumerNetworkClient.java:192)
at org.apache.kafka.clients.consumer.internals.
ConsumerNetworkClient.poll(ConsumerNetworkClient.java:163)
at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.
commitOffsetsSync(ConsumerCoordinator.java:426)
at org.apache.kafka.clients.consumer.KafkaConsumer.
commitSync(KafkaConsumer.java:1059)
at org.apache.kafka.clients.consumer.KafkaConsumer.
commitSync(KafkaConsumer.java:1027)
java.lang.IllegalStateException: Correlation id for response (743)
does not match request (742)
java.lang.IllegalStateException: Correlation id for response (744)
does not match request (743)
java.lang.IllegalStateException: Correlation id for response (745)
does not match request (744)
java.lang.IllegalStateException: Correlation id for response (746)
does not match request (745)
...
It's easy to discard and recreate the consumer instance to recover
however we can't do that with the producer as it occurs in the Sender
thread.
Our cluster and our clients are running Kafka 0.10.0.1.
Under which circumstances would such an error happen ?
Even with logging set to TRACE, we can't spot anything suspicious
shortly before the issue. Is there any data we should try to capture
when this happens ?
Thanks!
Loading...