Discussion:
Having a 4 Node Kafka Cluster
Le Cyberian
2017-03-05 21:49:08 UTC
Hi Guys,

I have a scenario where I need to configure a 4 node Kafka cluster along
with a 4 node Zookeeper ensemble as well.

I understand that as a rule of thumb a cluster should have an odd number
of nodes, however in this case I need to do it this way.

For Zookeeper, I understand that a 4 node ensemble can tolerate the failure
of only 1 node, which is the same as a 3 node ensemble, so the 4th node
gives no benefit. How does it affect leader election?

For Kafka it is the same setup. I want to understand whether we really
should do this, and how to achieve fault tolerance somehow.

Thanks!

Lee
Hans Jespersen
2017-03-05 23:10:29 UTC
A 4 node zookeeper ensemble will not even work. It MUST be an odd number of zookeeper nodes to start.

For Kafka you can start with any number of nodes (including 4). Remember that it is the partitions that are replicated, not the entire broker. If a Kafka node crashes, only the partitions that had their leaders on that node need to move to the remaining brokers, and leadership will be rebalanced across the in-sync replicas on the remaining nodes.
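
A quick illustration of replication being per-partition: creating a topic
with 8 partitions and replication factor 3 places 3 copies of each of the
8 partitions across the brokers (the topic name and Zookeeper address are
placeholders, not from the original message):

    bin/kafka-topics.sh --create --zookeeper zk1:2181 \
        --topic events --partitions 8 --replication-factor 3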

-hans

Jens Rantil
2017-03-06 08:20:34 UTC
Hi Hans,
Are you sure about that? If ZooKeeper didn't run with four nodes, that
would mean a running ensemble of three couldn't be live-migrated to other
nodes (because that's done by growing the ensemble and then shrinking it,
in the case of 3-node ensembles). IIRC, you can run four ZooKeeper nodes,
but that means quorum will be three nodes, so there's no added benefit in
terms of availability since you can only lose one node, just like with a
three node cluster.
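
For reference, the quorum arithmetic (majority = floor(N/2) + 1,
tolerable failures = N - majority) works out as:

    nodes   majority   tolerable failures
      3        2               1
      4        3               1
      5        3               2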

Cheers,
Jens
--
Jens Rantil
Backend engineer
Tink AB

Email: ***@tink.se
Phone: +46 708 84 18 32
Web: www.tink.se

Hans Jespersen
2017-03-06 11:20:48 UTC
Jens,

I think you are correct that a 4 node zookeeper ensemble can be made to work, but it will be slightly less resilient than a 3 node ensemble: it can still only tolerate 1 failure (same as a 3 node ensemble), and the likelihood of a node failure is higher because there is 1 more node that could fail.
So it SHOULD be an odd number of zookeeper nodes (not MUST).

-hans
Le Cyberian
2017-03-06 13:38:47 UTC
Hi Guys,

Thank you very much for your reply.

The scenario I have to implement is that I have 2 sites, not datacenters,
so MirrorMaker would not work here.

There will be 4 nodes in total: 2 in Site A and 2 in Site B. The idea is
to have an active-active setup with fault tolerance, so that if one of
the sites goes down, operations continue as normal.

In this case, if I go ahead with a 4 node cluster for both Zookeeper and
Kafka, it will tolerate the failure of only 1 node.

What do you suggest in this case? Because to divide between 2 sites it
needs to be an even number, if that makes sense. Also, if possible, some
help regarding topic partitions and replication factor would be welcome.

I already have Kafka running with quite a few topics that have replication
factor 1 and the single default partition. Is there a way to repartition /
increase the partitions of existing topics when I migrate to the above
setup? I think we can increase the replication factor with the Kafka
partition reassignment tool (see the sketch below).
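
Both operations exist as stock CLI tools; a minimal sketch, with the
topic name, broker ids, and Zookeeper address as placeholders:

    # add partitions to an existing topic
    # (note: this changes the key-to-partition mapping)
    bin/kafka-topics.sh --zookeeper zk1:2181 --alter \
        --topic my-topic --partitions 4

    # raise the replication factor by assigning an extra replica
    # per partition; increase-rf.json would contain, e.g.:
    # {"version":1,"partitions":[
    #   {"topic":"my-topic","partition":0,"replicas":[1,2]}]}
    bin/kafka-reassign-partitions.sh --zookeeper zk1:2181 \
        --reassignment-json-file increase-rf.json --execute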

Thanks a lot for your help and time looking into this.

BR,

Le
Hans Jespersen
2017-03-06 13:50:23 UTC
What do you mean when you say you have "2 sites, not datacenters"? You
should be very careful configuring a stretch cluster across multiple
sites. What is the RTT between the two sites? Why do you think that
MirrorMaker (or Confluent Replicator) would not work between the sites,
and yet a stretch cluster will? That seems wrong.

-hans

/**
* Hans Jespersen, Principal Systems Engineer, Confluent Inc.
* ***@confluent.io (650)924-2670
*/
Le Cyberian
2017-03-06 13:57:59 UTC
Hi Hans,

Thank you for your reply.

It's basically two different server rooms on different floors, connected
by fiber, so it's almost like a local connection between them with no
network latency / lag.

If I use MirrorMaker / Replicator, then I will not be able to use both
clusters at the same time for writes / producers, because the consumers /
producers will request from all of them.

BR,

Lee
Hans Jespersen
2017-03-06 14:10:01 UTC
In that case it's really one cluster. Make sure to set different rack ids for each server room (see the sketch below) so Kafka will ensure that the replicas always span both floors and you don't lose availability of data if a server room goes down.
You will have to configure one additional zookeeper node in each site, which you will only ever start up if a site goes down, because otherwise 2 of 4 zookeeper nodes is not a quorum. Again, you would be better off with 3 nodes, because then you would only have to do this in the site that has the single active node.
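
As a sketch of the rack id setting (the rack names are placeholders),
each broker's server.properties would carry a line like:

    # on the two Site A brokers
    broker.rack=room-a
    # on the two Site B brokers
    broker.rack=room-b

With rack-aware replica assignment, Kafka then spreads each partition's
replicas across both rooms where possible.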

-hans
Alexander Binzberger
2017-03-06 15:16:21 UTC
I agree that this is one cluster, but having one additional ZK node per
site does not help (as far as I understand ZK).

3 out of 6 is also not a majority. So I think you mean 3/5 with a cloned
3rd node. That would mean manually switching the clone over to restore a
majority, which can cause issues again:
1. You have actually built a master/slave ZK with manual switch-over.
2. While switching the clone from room to room you would have downtime.
3. If you switch on both ZK node clones at the same time (by mistake),
you are screwed.
4. If you "switch" clones instead of moving one with all its data on
disk, you create a split brain from which you have to recover first.

So if you lose the connection between the rooms / the rooms get
separated / you lose one room:
* You (might) need manual intervention
* You lose automatic fail-over between the rooms
* You might face a complete outage if your "master" room with the active
3rd node is hit.
Actually this is the same scenario as 2/3 nodes spread over two locations.

What you need is a third cross-connected location for real fault
tolerance; distribute your 3 or 5 ZK nodes over those.
Or live with a possible outage in such a scenario.

Additional hints:
* You can run any number of Kafka brokers on a ZK cluster. In your case
this could be 4 Kafka brokers on 3 ZK nodes.
* You should set topic replication to 2 (can be done at any time) and
some other producer/broker settings to ensure your messages will not get
lost in switch-over cases (see the sketch below).
* The ZK service does not react nicely to a full disk.
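
A minimal sketch of the kind of settings meant here (the values are
illustrative and need tuning per use-case, not prescriptive):

    # producer config
    acks=all
    # broker or per-topic config
    min.insync.replicas=1   # with replication factor 2; raising this to 2
                            # prevents loss but blocks writes whenever one
                            # replica is down
    unclean.leader.election.enable=false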
Le Cyberian
2017-03-06 18:44:58 UTC
Thanks Hans and Alexander for taking the time to respond.

I now understand the risks and the possible outcomes of the desired
setup.

What would be better in your opinion to have failover (active-active)
between both of these server rooms while avoiding the switch to the
clone / 3rd zookeeper?

I mean, even with 5 nodes, having 3 in one server room and 2 in the
other, there would still be a problem with zookeeper majority / leader
election if the server room that has the 3 nodes goes down.

Is there some way to achieve this?

Thanks again!

Lee

Hans Jespersen
2017-03-06 19:22:55 UTC
Is there any way you can find a third rack / server room / power supply nearby, just for the 1 extra zookeeper node? You don't have to put any kafka brokers there, just a single zookeeper. A 3-way split brain from a network partition is less likely, and it's so much cleaner with 3 availability zones because failover would be fully automatic. This is how most people run when deployed in Amazon.

Barring that, I would say the next best thing is 3 zookeepers in one zone and 2 zookeepers in the other zone, so it will auto-failover if the 2-zk zone fails. If the 3-zk zone fails, you can set up a well-tested sequence of manual steps to carefully configure a 3rd zookeeper clone (which matches the id of one of the failed nodes) and still get your system back up and running. If this is not something you have done before, I suggest getting a few days of expert consulting to have someone help you set it up, test it, and document the proper failover and recovery procedures.
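
For illustration, the 3/2 layout might look like this in each node's
zoo.cfg (the hostnames are placeholders); the emergency clone in room B
would be started with the myid of one of the failed room A servers:

    server.1=rooma-zk1:2888:3888
    server.2=rooma-zk2:2888:3888
    server.3=rooma-zk3:2888:3888
    server.4=roomb-zk1:2888:3888
    server.5=roomb-zk2:2888:3888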

-hans
Le Cyberian
2017-03-06 19:37:34 UTC
Hi Hans,

Thank you for your response. I understand. It's not possible to have a
third rack / server room at the moment, as the requirement is redundancy
between these two. I already tried to get one :-/

Is it possible to have a Zookeeper ensemble (3 nodes) in one server room
and the same in the other, with some sort of master-master replication
between them? Would that make sense, if it's possible? Since in this
case both would have the same config, split brain theoretically should
not happen.

I haven't done this Zookeeper 3rd node hack before :) I guess I need to
play around with it for a while to get it properly documented and
functional / tested :)

Thanks again!

Le
Alexander Binzberger
2017-03-07 12:23:36 UTC
A Raspberry Pi or an APU board would be enough for your majority ZK node
in your 3rd room.

One Kafka cluster belongs to one ZK cluster. You could have one ZK/Kafka
cluster per room. There are tools for one-directional copying of Kafka
messages, but bi-directional "replication" is not possible as far as I
know. It does not make much sense I think, but it depends on the use-case.
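
For reference, the stock one-directional copying tool is MirrorMaker; a
minimal invocation (the config file names and the whitelist pattern are
placeholders) looks like:

    bin/kafka-mirror-maker.sh \
        --consumer.config source-cluster.properties \
        --producer.config target-cluster.properties \
        --whitelist ".*"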
Post by Le Cyberian
Hi Han,
Thank you for your response. I understand. Its not possible to have a third
rack/server room at the moment as the requirement is to have redundancy
between both. I tried already to get one :-/
Is it possible to have a Zookeeper Ensemble (3 node) in one server room and
same in the other and have some sort of master-master replication in
between both of them ? would this make sense if its possible ? since in
this case both would have same config and split brain theoretically should
not happen.
I haven't does this Zookeeper 3rd node hack before :) i guess i need to
play around with it for a while to get it proper documented and functional
/ tested :)
Thanks again!
Le
Post by Hans Jespersen
Is there any way you can find a third rack/server room/power supply nearby
just for the 1 extra zookeeper node? You don’t have to put any kafka
brokers there, just a single zookeeper. It’s less likely to have a 3-way
split brain because of a network partition. It’s so much cleaner with 3
availability zones because everything would be automatic failover. This is
how most people run when deployed in Amazon.
Baring that I would say the next best thing would be 3 zookeepers in one
zone and 2 zookeepers in the other zone so it will auto-failover if the 2
zk zone fails. If the 3 zk zone fails you can setup a well tested set of
manual steps to carefully configure a 3rd zookeeper clone (which matches
the id of one of the failed nodes) and still get your system back up and
running. If this is not something you have done before I suggest getting a
few days of expert consulting to have someone help you set it up, test it,
and document the proper failover and recovery procedures.
-hans
Post by Le Cyberian
Thanks Han and Alexander for taking time out and your responses.
I now understand the risks and the possible outcome of having the desired
setup.
What would be better in your opinion to have failover (active-active)
between both of these server rooms to avoid switching to the clone / 3rd
zookeeper.
I mean even if there are 5 nodes having 3 in one server room and 2 in
other
Post by Le Cyberian
still there would be problem related to zookeeper majority leader
election
Post by Le Cyberian
if the server room goes down that has 3 nodes.
is there some way to achieve this ?
Thanks again!
Lee
On Mon, Mar 6, 2017 at 4:16 PM, Alexander Binzberger <
Post by Alexander Binzberger
I agree on this is one cluster but having one additional ZK node per
site
Post by Le Cyberian
Post by Alexander Binzberger
does not help. (as far as I understand ZK)
A 3 out of 6 is also not a majority. So I think you mean 3/5 with a
cloned
Post by Le Cyberian
Post by Alexander Binzberger
3rd one. This would mean manually switching the cloned one for majority
which can cause issues again.
1. You actually build a master/slave ZK with manually switch over.
2. While switching the clone from room to room you would have downtime.
3. If you switch on both ZK node clones at the same time (by mistake)
you
Post by Le Cyberian
Post by Alexander Binzberger
screwed.
4. If you "switch" clones instead of moving it will all data on disk you
generate a split brain from which you have to recover first.
So if you loose the connection between the rooms / the rooms get
separated
Post by Le Cyberian
Post by Alexander Binzberger
* You (might) need manual interaction
* loose automatic fail-over between the rooms
* might face complete outage if your "master" room with the active 3rd
node is hit.
Actually this is the same scenario with 2/3 nodes spread over two locations.
What you need is a third cross connected location for real fault
tolerance
Post by Le Cyberian
Post by Alexander Binzberger
and distribute your 3 or 5 ZK nodes over those.
Or live with a possible outage in such a scenario.
* You can run any number of Kafka brokers on a ZK cluster. In your case
this could be 4 Kafka brokers on 3 ZK nodes.
* You should set topic replication to 2 (can be done at any time) and
some
Post by Le Cyberian
Post by Alexander Binzberger
other producer/broker settings to ensure your messages will not get
lost in
Post by Le Cyberian
Post by Alexander Binzberger
switch over cases.
* ZK service does not react nicely on disk full.
In that case it’s really one cluster. Make sure to set different rack
ids
Post by Le Cyberian
Post by Alexander Binzberger
for each server room so kafka will ensure that the replicas always span
both floors and you don’t loose availability of data if a server room
goes
Post by Le Cyberian
Post by Alexander Binzberger
down.
You will have to configure one addition zookeeper node in each site
which
Post by Le Cyberian
Post by Alexander Binzberger
you will only ever startup if a site goes down because otherwise 2 of 4
zookeeper nodes is not a quorum.Again you would be better with 3 nodes
because then you would only have to do this in the site that has the
single
Post by Le Cyberian
Post by Alexander Binzberger
active node.
-hans
Post by Jens Rantil
Hi Hans,
Thank you for your reply.
Its basically two different server rooms on different floors and they
are
Post by Le Cyberian
Post by Alexander Binzberger
Post by Jens Rantil
connected with fiber connectivity so its almost like a local
connection
Post by Le Cyberian
Post by Alexander Binzberger
Post by Jens Rantil
between them no network latencies / lag.
If i do a Mirror Maker / Replicator then i will not be able to use
them
Post by Le Cyberian
Post by Alexander Binzberger
Post by Jens Rantil
at
the same time for writes./ producers. because the consumers /
producers
Post by Le Cyberian
Post by Alexander Binzberger
Post by Jens Rantil
will request from all of them
BR,
Lee
Post by Hans Jespersen
What do you mean when you say you have "2 sites not datacenters"? You
should be very careful configuring a stretch cluster across multiple
sites. What is the RTT between the two sites? Why do you think that
Mirror Maker (or Confluent Replicator) would not work between the sites
and yet you think a stretch cluster will work? That seems wrong.
-hans
/**
* Hans Jespersen, Principal Systems Engineer, Confluent Inc.
*/
Post by Le Cyberian
Hi Guys,
Thank you very much for your reply.
The scenario I have to implement is that I have 2 sites, not
datacenters, so mirror maker would not work here.
There will be 4 nodes in total: 2 in Site A and 2 in Site B. The idea
is to have an active-active setup along with fault tolerance, so that
if one of the sites goes down, operations carry on as normal.
In this case, if I go ahead with a 4-node cluster of both zookeeper and
kafka, it will give failover tolerance for 1 node only.
What do you suggest in this case? Because to divide between 2 sites it
needs to be an even number, if that makes sense? Also, if possible,
some help regarding partitions per topic and replication factor.
I already have Kafka running with quite a few topics having replication
factor 1 along with 1 default partition. Is there a way to repartition
/ increase partitions of existing topics when I migrate to the above
setup? I think we can increase the replication factor with the Kafka
rebalance tool (see the sketch after this message).
Thanks a lot for your help and time looking into this.
BR,
Le
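(On the repartitioning question: both operations exist in the stock
tooling of that era; a sketch, where the topic name and zookeeper
address are placeholders:)

    # Increase the partition count of an existing topic. Note: existing
    # data is not moved, and key->partition mapping changes for keyed topics.
    bin/kafka-topics.sh --zookeeper zk1:2181 --alter \
      --topic my-topic --partitions 4

    # Increase the replication factor by listing an extra replica per
    # partition in a reassignment plan, e.g. increase-rf.json:
    #   {"version":1,"partitions":[
    #     {"topic":"my-topic","partition":0,"replicas":[1,2]}]}
    bin/kafka-reassign-partitions.sh --zookeeper zk1:2181 \
      --reassignment-json-file increase-rf.json --execute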
Post by Hans Jespersen
Jens,
I think you are correct that a 4 node zookeeper ensemble can be made to
work, but it will be slightly less resilient than a 3 node ensemble,
because it can only tolerate 1 failure (same as a 3 node ensemble) and
the likelihood of node failures is higher because there is 1 more node
that could fail.
So it SHOULD be an odd number of zookeeper nodes (not MUST).
-hans
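(The arithmetic behind this, as a worked example:)

    quorum(N) = floor(N/2) + 1
    N = 3  ->  quorum 2  ->  tolerates 1 failure
    N = 4  ->  quorum 3  ->  tolerates 1 failure
    N = 5  ->  quorum 3  ->  tolerates 2 failures

So the 4th node adds one more thing that can fail without raising the
number of failures the ensemble survives.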
Post by Jens Rantil
Hi Hans,
On Mon, Mar 6, 2017 at 12:10 AM, Hans Jespersen <
Post by Hans Jespersen
A 4 node zookeeper ensemble will not even work. It MUST be an odd
number of zookeeper nodes to start.
Are you sure about that? If Zookeeper doesn't run with four nodes, that
means a running ensemble of three can't be live-migrated to other nodes
(because that's done by increasing the ensemble and then reducing it in
the case of 3-node ensembles). IIRC, you can run four Zookeeper nodes,
but that means quorum will be three nodes, so there's no added benefit
in terms of availability, since you can only lose one node just like
with a three node cluster.
Cheers,
Jens
--
Jens Rantil
Backend engineer
Tink AB
Phone: +46 708 84 18 32
Web: www.tink.se
Facebook <https://www.facebook.com/#!/tink.se> Linkedin
<http://www.linkedin.com/company/2735919?trk=vsrp_
companies_res_photo&trkInfo=VSRPsearchId%3A1057023381369207406670%
2CVSRPtargetId%3A2735919%2CVSRPcmpt%3Aprimary>
Twitter <https://twitter.com/tink>
--
Alexander Binzberger
System Designer - WINGcon AG
Tel. +49 7543 966-119
Registered office: Langenargen
Register court: Ulm, HRB 734260
VAT ID: DE232931635, WEEE ID: DE74015979
Management board: Thomas Ehrle (chair), Fritz R. Paul (deputy), Tobias Treß
Supervisory board: Jürgen Maucher (chair), Andreas Paul (deputy), Martin Sauter
Le Cyberian
2017-03-06 13:59:29 UTC
Permalink
Hi Hans,

Thank you for your reply.

It's basically two different server rooms on different floors,
connected by fiber, so it's almost like a local connection between them
with no network latency / lag.

If I do a Mirror Maker / Replicator setup then I will not be able to
use both sides at the same time for writes / producers, because the
consumers / producers will request from all of them.

I am somehow confused :-/ about what to do in this case.

BR,

Lee
Post by Hans Jespersen
What do you mean when you say you have "2 sites not datacenters"? You
should be very careful configuring a stretch cluster across multiple
sites. What is the RTT between the two sites? Why do you think that
Mirror Maker (or Confluent Replicator) would not work between the sites
and yet you think a stretch cluster will work? That seems wrong.
-hans
/**
* Hans Jespersen, Principal Systems Engineer, Confluent Inc.
*/
Post by Le Cyberian
Hi Guys,
Thank you very much for your reply.
The scenario I have to implement is that I have 2 sites, not
datacenters, so mirror maker would not work here.
There will be 4 nodes in total: 2 in Site A and 2 in Site B. The idea
is to have an active-active setup along with fault tolerance, so that
if one of the sites goes down, operations carry on as normal.
In this case, if I go ahead with a 4-node cluster of both zookeeper and
kafka, it will give failover tolerance for 1 node only.
What do you suggest in this case? Because to divide between 2 sites it
needs to be an even number, if that makes sense? Also, if possible,
some help regarding partitions per topic and replication factor.
I already have Kafka running with quite a few topics having replication
factor 1 along with 1 default partition. Is there a way to repartition
/ increase partitions of existing topics when I migrate to the above
setup? I think we can increase the replication factor with the Kafka
rebalance tool.
Thanks a lot for your help and time looking into this.
BR,
Le
Post by Hans Jespersen
Jens,
I think you are correct that a 4 node zookeeper ensemble can be made to
work, but it will be slightly less resilient than a 3 node ensemble,
because it can only tolerate 1 failure (same as a 3 node ensemble) and
the likelihood of node failures is higher because there is 1 more node
that could fail.
So it SHOULD be an odd number of zookeeper nodes (not MUST).
-hans
Post by Jens Rantil
Hi Hans,
Post by Hans Jespersen
A 4 node zookeeper ensemble will not even work. It MUST be an odd
number of zookeeper nodes to start.
Are you sure about that? If Zookeeper doesn't run with four nodes, that
means a running ensemble of three can't be live-migrated to other nodes
(because that's done by increasing the ensemble and then reducing it in
the case of 3-node ensembles). IIRC, you can run four Zookeeper nodes,
but that means quorum will be three nodes, so there's no added benefit
in terms of availability, since you can only lose one node just like
with a three node cluster.
Cheers,
Jens
--
Jens Rantil
Backend engineer
Tink AB
Phone: +46 708 84 18 32
Web: www.tink.se
Facebook <https://www.facebook.com/#!/tink.se> Linkedin
<http://www.linkedin.com/company/2735919?trk=vsrp_
companies_res_photo&trkInfo=VSRPsearchId%3A1057023381369207406670%
2CVSRPtargetId%3A2735919%2CVSRPcmpt%3Aprimary>
Twitter <https://twitter.com/tink>