Discussion:
Does Kafka send the acks response to the producer after flushing the messages to disk, or while they are still only in memory?
Jiecxy
2017-02-26 09:39:55 UTC
Hi guys,

Does Kafka send the acks response to the producer after flushing the messages to disk, or while they are still held only in memory?
How does Kafka flush the messages? By making a system call such as fsync()?

Thanks
Chen
Guozhang Wang
2017-02-26 18:52:41 UTC
Hello Chen,

Kafka flushes data to disk (i.e. fsync) asynchronously. Based on the
producer's acks setting, the broker returns the produce response to the
producer after the data has been replicated (likely only in memory) on N
partition replicas.
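
For reference, a minimal sketch of how a producer would select the acknowledgement level discussed above. It only builds the settings with java.util.Properties so that it runs without any Kafka dependency; "acks", "bootstrap.servers", "key.serializer" and "value.serializer" are standard producer configuration keys, while the class name, the helper producerProps and the address localhost:9092 are assumptions made for illustration.

```java
import java.util.Properties;

public class ProducerAcksConfig {

    // Builds producer settings for a given acks level:
    //   "0"   - the producer does not wait for any acknowledgement
    //   "1"   - the leader has appended the record to its log
    //   "all" - all in-sync replicas have received the record
    // None of these levels waits for an fsync on the brokers.
    static Properties producerProps(String acks) {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "localhost:9092"); // assumed broker address
        props.setProperty("acks", acks);
        props.setProperty("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        props.setProperty("value.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        return props;
    }

    public static void main(String[] args) {
        Properties props = producerProps("all");
        System.out.println("acks=" + props.getProperty("acks"));
    }
}
```

With the kafka-clients library on the classpath, these properties could be passed to a KafkaProducer constructor; that step is left out here to keep the sketch self-contained.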


Guozhang
Jiecxy
2017-03-01 01:54:42 UTC
Hi Guozhang,

Thanks for your reply. I'm still confused. I checked the source code: Kafka just uses FileChannel.write(buffer) to write the data on the broker, which puts the data in memory rather than on disk. Only when FileChannel.force(true) is called is the data flushed to disk. I understand the ack mode, but it only requires the ISR to receive the data before the response is returned; it doesn't require them to flush the data to disk. Am I right? Is anything wrong?
So my question is: since the flush happens only when it is called explicitly (or when the OS decides to do it), is it still possible for Kafka to lose data (for example, if a broker goes down before the data is flushed to disk)?
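
To make the distinction concrete, here is a minimal sketch of the two FileChannel calls in question. It is not Kafka's code; the class WriteVersusForce and the helpers append and flush are hypothetical names, and the temporary file stands in for a log segment.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class WriteVersusForce {

    // Appends a record and returns the file size as seen through the channel.
    // Once write() returns, the bytes are handed to the operating system, but
    // nothing guarantees that they have reached the storage device yet.
    static long append(FileChannel channel, String record) throws IOException {
        ByteBuffer buffer = ByteBuffer.wrap(record.getBytes(StandardCharsets.UTF_8));
        while (buffer.hasRemaining()) {
            channel.write(buffer);
        }
        return channel.size();
    }

    // Forces pending updates (content and metadata) to the storage device,
    // comparable to an fsync.
    static void flush(FileChannel channel) throws IOException {
        channel.force(true);
    }

    public static void main(String[] args) throws IOException {
        Path segment = Files.createTempFile("segment", ".log");
        try (FileChannel channel = FileChannel.open(segment,
                StandardOpenOption.WRITE, StandardOpenOption.APPEND)) {
            long size = append(channel, "hello");
            System.out.println("bytes written, possibly not yet on disk: " + size);
            flush(channel);
            System.out.println("force(true) returned");
        } finally {
            Files.deleteIfExists(segment);
        }
    }
}
```

A crash of the machine between append and flush is the window I am asking about: the write has succeeded from the application's point of view, yet the data may not have reached the disk.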

Thanks
Chen
Guozhang Wang
2017-03-01 07:10:39 UTC
You are right. If all replicas happen to fail at the same time, then even
with acks=all your acknowledged messages may still be lost. As the Kafka
replication documentation states, with N replicas it can tolerate N-1
concurrent failures; hence if you are really unlucky and get N concurrent
failures, replication will not prevent data loss.

Guozhang