Discussion:
Using GlobalKTable/KeyValueStore for topic cache
Chris Toomey
2018-11-13 23:17:34 UTC
Permalink
We're considering using GlobalKTables / KeyValueStores for locally caching
topic content in services. The topics would be compacted such that only the
latest key/value pair would exist for a given key.

One question that's come up is how to determine, when bootstrapping the
app, when the cache has been populated with the latest content from the
topic (so we start with a "warm" cache). ReadOnlyKeyValueStore has
an approximateNumEntries() method that we could use to see how much we've
got, but trying to figure out how much there is in the topic looks much
more difficult -- the only way I can see via the APIs / code is to use an
AdminClient to get the topic partitions and then the KafkaConsumer to get
the end offsets for those.

Does anyone have experience doing this kind of caching? How did you handle
the bootstrapping issue?

Any thoughts on easier or better ways to determine when the cache is warm?

thx,
Chris
Ryanne Dolan
2018-11-14 00:54:10 UTC
Permalink
Chris, consider using log compaction.

Ryanne
Post by Chris Toomey
We're considering using GlobalKTables / KeyValueStores for locally caching
topic content in services. The topics would be compacted such that only the
latest key/value pair would exist for a given key.
One question that's come up is how to determine, when bootstrapping the
app, when the cache has been populated with the latest content from the
topic (so we start with a "warm" cache). ReadOnlyKeyValueStore has
an approximateNumEntries() method that we could use to see how much we've
got, but trying to figure out how much there is in the topic looks much
more difficult -- the only way I can see via the APIs / code is to use an
AdminClient to get the topic partitions and then the KafkaConsumer to get
the end offsets for those.
Does anyone have experience doing this kind of caching? How did you handle
the bootstrapping issue?
Any thoughts on easier or better ways to determine when the cache is warm?
thx,
Chris
Chris Toomey
2018-11-14 01:49:20 UTC
Permalink
Definitely Ryanne -- that's what I meant by "topics would be compacted".

But that doesn't obviate checking bootstrapping progress.
Post by Ryanne Dolan
Chris, consider using log compaction.
Ryanne
Post by Chris Toomey
We're considering using GlobalKTables / KeyValueStores for locally
caching
Post by Chris Toomey
topic content in services. The topics would be compacted such that only
the
Post by Chris Toomey
latest key/value pair would exist for a given key.
One question that's come up is how to determine, when bootstrapping the
app, when the cache has been populated with the latest content from the
topic (so we start with a "warm" cache). ReadOnlyKeyValueStore has
an approximateNumEntries() method that we could use to see how much we've
got, but trying to figure out how much there is in the topic looks much
more difficult -- the only way I can see via the APIs / code is to use an
AdminClient to get the topic partitions and then the KafkaConsumer to get
the end offsets for those.
Does anyone have experience doing this kind of caching? How did you
handle
Post by Chris Toomey
the bootstrapping issue?
Any thoughts on easier or better ways to determine when the cache is
warm?
Post by Chris Toomey
thx,
Chris
Bill Bejeck
2018-11-14 02:30:26 UTC
Permalink
Hi Chris,

I'm not sure I totally understand your requirements but the
StateRestoreListener (
https://kafka.apache.org/20/javadoc/org/apache/kafka/streams/processor/StateRestoreListener.html)
class provides callbacks when restoring state stores (including
GlobalStores) and may provide what you are looking for.

Thanks,
Bill
Post by Chris Toomey
Definitely Ryanne -- that's what I meant by "topics would be compacted".
But that doesn't obviate checking bootstrapping progress.
Post by Ryanne Dolan
Chris, consider using log compaction.
Ryanne
Post by Chris Toomey
We're considering using GlobalKTables / KeyValueStores for locally
caching
Post by Chris Toomey
topic content in services. The topics would be compacted such that only
the
Post by Chris Toomey
latest key/value pair would exist for a given key.
One question that's come up is how to determine, when bootstrapping the
app, when the cache has been populated with the latest content from the
topic (so we start with a "warm" cache). ReadOnlyKeyValueStore has
an approximateNumEntries() method that we could use to see how much
we've
Post by Ryanne Dolan
Post by Chris Toomey
got, but trying to figure out how much there is in the topic looks much
more difficult -- the only way I can see via the APIs / code is to use
an
Post by Ryanne Dolan
Post by Chris Toomey
AdminClient to get the topic partitions and then the KafkaConsumer to
get
Post by Ryanne Dolan
Post by Chris Toomey
the end offsets for those.
Does anyone have experience doing this kind of caching? How did you
handle
Post by Chris Toomey
the bootstrapping issue?
Any thoughts on easier or better ways to determine when the cache is
warm?
Post by Chris Toomey
thx,
Chris
Chris Toomey
2018-11-14 03:36:31 UTC
Permalink
Thanks Bill. So poking around the code a bit, it looks like perhaps any
kafka streams execution that produces a state store would execute this
"restore state store" operation, even creating a new state store, is that
correct? If so that indeed could be just what I need.
Post by Bill Bejeck
Hi Chris,
I'm not sure I totally understand your requirements but the
StateRestoreListener (
https://kafka.apache.org/20/javadoc/org/apache/kafka/streams/processor/StateRestoreListener.html
)
class provides callbacks when restoring state stores (including
GlobalStores) and may provide what you are looking for.
Thanks,
Bill
Post by Chris Toomey
Definitely Ryanne -- that's what I meant by "topics would be compacted".
But that doesn't obviate checking bootstrapping progress.
Post by Ryanne Dolan
Chris, consider using log compaction.
Ryanne
Post by Chris Toomey
We're considering using GlobalKTables / KeyValueStores for locally
caching
Post by Chris Toomey
topic content in services. The topics would be compacted such that
only
Post by Chris Toomey
Post by Ryanne Dolan
the
Post by Chris Toomey
latest key/value pair would exist for a given key.
One question that's come up is how to determine, when bootstrapping
the
Post by Chris Toomey
Post by Ryanne Dolan
Post by Chris Toomey
app, when the cache has been populated with the latest content from
the
Post by Chris Toomey
Post by Ryanne Dolan
Post by Chris Toomey
topic (so we start with a "warm" cache). ReadOnlyKeyValueStore has
an approximateNumEntries() method that we could use to see how much
we've
Post by Ryanne Dolan
Post by Chris Toomey
got, but trying to figure out how much there is in the topic looks
much
Post by Chris Toomey
Post by Ryanne Dolan
Post by Chris Toomey
more difficult -- the only way I can see via the APIs / code is to
use
Post by Chris Toomey
an
Post by Ryanne Dolan
Post by Chris Toomey
AdminClient to get the topic partitions and then the KafkaConsumer to
get
Post by Ryanne Dolan
Post by Chris Toomey
the end offsets for those.
Does anyone have experience doing this kind of caching? How did you
handle
Post by Chris Toomey
the bootstrapping issue?
Any thoughts on easier or better ways to determine when the cache is
warm?
Post by Chris Toomey
thx,
Chris
Bill Bejeck
2018-11-14 03:50:50 UTC
Permalink
Hi Chris

Yes, for every state store in a kafka streams application, the state
restore listener is executed.

-Bill
Post by Chris Toomey
Thanks Bill. So poking around the code a bit, it looks like perhaps any
kafka streams execution that produces a state store would execute this
"restore state store" operation, even creating a new state store, is that
correct? If so that indeed could be just what I need.
Post by Bill Bejeck
Hi Chris,
I'm not sure I totally understand your requirements but the
StateRestoreListener (
https://kafka.apache.org/20/javadoc/org/apache/kafka/streams/processor/StateRestoreListener.html
Post by Bill Bejeck
)
class provides callbacks when restoring state stores (including
GlobalStores) and may provide what you are looking for.
Thanks,
Bill
Post by Chris Toomey
Definitely Ryanne -- that's what I meant by "topics would be
compacted".
Post by Bill Bejeck
Post by Chris Toomey
But that doesn't obviate checking bootstrapping progress.
Post by Ryanne Dolan
Chris, consider using log compaction.
Ryanne
Post by Chris Toomey
We're considering using GlobalKTables / KeyValueStores for locally
caching
Post by Chris Toomey
topic content in services. The topics would be compacted such that
only
Post by Chris Toomey
Post by Ryanne Dolan
the
Post by Chris Toomey
latest key/value pair would exist for a given key.
One question that's come up is how to determine, when bootstrapping
the
Post by Chris Toomey
Post by Ryanne Dolan
Post by Chris Toomey
app, when the cache has been populated with the latest content from
the
Post by Chris Toomey
Post by Ryanne Dolan
Post by Chris Toomey
topic (so we start with a "warm" cache). ReadOnlyKeyValueStore has
an approximateNumEntries() method that we could use to see how much
we've
Post by Ryanne Dolan
Post by Chris Toomey
got, but trying to figure out how much there is in the topic looks
much
Post by Chris Toomey
Post by Ryanne Dolan
Post by Chris Toomey
more difficult -- the only way I can see via the APIs / code is to
use
Post by Chris Toomey
an
Post by Ryanne Dolan
Post by Chris Toomey
AdminClient to get the topic partitions and then the KafkaConsumer
to
Post by Bill Bejeck
Post by Chris Toomey
get
Post by Ryanne Dolan
Post by Chris Toomey
the end offsets for those.
Does anyone have experience doing this kind of caching? How did you
handle
Post by Chris Toomey
the bootstrapping issue?
Any thoughts on easier or better ways to determine when the cache
is
Post by Bill Bejeck
Post by Chris Toomey
Post by Ryanne Dolan
warm?
Post by Chris Toomey
thx,
Chris
Patrik Kleindl
2018-11-14 06:40:23 UTC
Permalink
Hi Chris

We are using them like you described.
Performance is very good compared to the database used before.
Beware that until https://issues.apache.org/jira/browse/KAFKA-7380
is done the startup will be blocked until all global stores are restored (sequentially).
This can take a little for larger topic and/or multiple global stores.

We are blocking access until they are available although this is not ideal in terms of timeout tuning.

Any ideas are welcome.

Best regards

Patrik
Post by Chris Toomey
We're considering using GlobalKTables / KeyValueStores for locally caching
topic content in services. The topics would be compacted such that only the
latest key/value pair would exist for a given key.
One question that's come up is how to determine, when bootstrapping the
app, when the cache has been populated with the latest content from the
topic (so we start with a "warm" cache). ReadOnlyKeyValueStore has
an approximateNumEntries() method that we could use to see how much we've
got, but trying to figure out how much there is in the topic looks much
more difficult -- the only way I can see via the APIs / code is to use an
AdminClient to get the topic partitions and then the KafkaConsumer to get
the end offsets for those.
Does anyone have experience doing this kind of caching? How did you handle
the bootstrapping issue?
Any thoughts on easier or better ways to determine when the cache is warm?
thx,
Chris
Chris Toomey
2018-11-14 19:51:39 UTC
Permalink
Thanks Bill, the StateRestoreListener is exactly the tool needed for my use
case.

Patrik, thanks for the heads-up on that issue. I guess until it's fixed
that makes it even easier to wait until the cache is warmed :-).

Chris
Post by Bill Bejeck
Hi Chris
We are using them like you described.
Performance is very good compared to the database used before.
Beware that until https://issues.apache.org/jira/browse/KAFKA-7380
is done the startup will be blocked until all global stores are restored (sequentially).
This can take a little for larger topic and/or multiple global stores.
We are blocking access until they are available although this is not ideal
in terms of timeout tuning.
Any ideas are welcome.
Best regards
Patrik
Post by Chris Toomey
We're considering using GlobalKTables / KeyValueStores for locally
caching
Post by Chris Toomey
topic content in services. The topics would be compacted such that only
the
Post by Chris Toomey
latest key/value pair would exist for a given key.
One question that's come up is how to determine, when bootstrapping the
app, when the cache has been populated with the latest content from the
topic (so we start with a "warm" cache). ReadOnlyKeyValueStore has
an approximateNumEntries() method that we could use to see how much we've
got, but trying to figure out how much there is in the topic looks much
more difficult -- the only way I can see via the APIs / code is to use an
AdminClient to get the topic partitions and then the KafkaConsumer to get
the end offsets for those.
Does anyone have experience doing this kind of caching? How did you
handle
Post by Chris Toomey
the bootstrapping issue?
Any thoughts on easier or better ways to determine when the cache is
warm?
Post by Chris Toomey
thx,
Chris
Loading...