Tom Raney
2017-01-27 18:16:05 UTC
After adding a new Kafka node, I ran the kafka-reassign-partitions.sh tool
to redistribute topics onto the new machine. Some of the migrations seemed
to be stuck for over 24 hours, so I cancelled the reassignment by deleting
the ZooKeeper node (/admin/reassign_partitions) and then ran
kafka-preferred-replica-election.sh to try to resolve it. It didn't work.
Now I have partitions in a weird state. For example, one partition lists
broker 1003 as a replica, but it shouldn't be there. The partition
directory on 1003 is still growing, but it is way behind the leader (1004)
and the other in-sync replica on 1001.
Topic: foo Partition: 2 Leader: 1004 Replicas: 1003,1004,1001 Isr: 1004,1001
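To make the state above easier to check across many partitions, here is a minimal Python sketch that parses one partition line of the describe output and lists the replicas missing from the ISR. It is not part of the Kafka tooling; the function names parse_describe_line and out_of_sync_replicas are made up for illustration, and it assumes the single-line "Key: value" layout shown above.

```python
import re

def parse_describe_line(line):
    """Parse one partition line of the describe output into a dict."""
    # Assumption: every field looks like "Key: value" and values contain no spaces.
    fields = dict(re.findall(r"(\w+):\s*(\S+)", line))
    return {
        "topic": fields["Topic"],
        "partition": int(fields["Partition"]),
        "leader": int(fields["Leader"]),
        "replicas": [int(b) for b in fields["Replicas"].split(",")],
        "isr": [int(b) for b in fields["Isr"].split(",")],
    }

def out_of_sync_replicas(info):
    """Return the assigned replicas that are not in the ISR, in assignment order."""
    return [b for b in info["replicas"] if b not in info["isr"]]

line = "Topic: foo Partition: 2 Leader: 1004 Replicas: 1003,1004,1001 Isr: 1004,1001"
info = parse_describe_line(line)

# The first entry in the replica list is the preferred replica.
print("preferred replica:", info["replicas"][0])
print("not in ISR:", out_of_sync_replicas(info))
```

For the line above, this reports 1003 both as the preferred replica and as the only replica outside the ISR.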
When I force a leader election for that partition, it fails because 1003
is not in sync.
kafka.common.StateChangeFailedException: encountered error while electing
leader for partition [foo,2] due to: Preferred replica 1003 for partition
[foo,2] is either not alive or not in the isr. Current leader and ISR:
[{"leader":1004,"leader_epoch":11,"isr":[1004,1001]}].
When I try to reassign with the config...
{"version":1,"partitions":[{"topic":"foo","partition":2,"replicas":[1004,1001]}]}
I see that it doesn't resolve.
Status of partition reassignment:
Reassignment of partition [foo,2] is still in progress
I would expect it to complete, since 1001 is already in the ISR and 1004 is
already the leader.
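In case the exact file contents matter, this is roughly how the reassignment config above can be generated, as a small Python sketch. The helper name build_reassignment is hypothetical; the only thing taken from the real tool is the JSON layout already shown in this message.

```python
import json

def build_reassignment(topic, partition, replicas):
    """Build a one-partition reassignment document in the layout shown above."""
    plan = {
        "version": 1,
        "partitions": [
            {"topic": topic, "partition": partition, "replicas": replicas}
        ],
    }
    # Compact separators reproduce the single-line form used in this message.
    return json.dumps(plan, separators=(",", ":"))

# Target state for the stuck partition: drop 1003, keep 1004 and 1001.
plan_json = build_reassignment("foo", 2, [1004, 1001])
print(plan_json)
```

The printed string can be saved to a file and passed to the reassignment tool as the plan.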
How do I resolve this?