Advanced topics for producers.
To eliminate the possibility of duplicate messages, you can set enable.idempotence to true and the broker will detect and discard duplicate messages.
There are over 50 producer configs. The Kafka docs indicate whether each configuration is of high or low importance.
Idempotence is listed as low importance due to its effect on efficiency. However, you can enable it to ensure your messages arrive exactly once, without duplicates, if safety matters more than raw throughput.
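A minimal sketch of what enabling idempotence could look like on the producer side (the bootstrap address is a placeholder):

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.common.serialization.StringSerializer;

Properties props = new Properties();
props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
// The broker uses the producer id and per-partition sequence numbers to drop duplicates
props.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, "true");
// Idempotence requires acks=all, stated here explicitly for clarity
props.put(ProducerConfig.ACKS_CONFIG, "all");

Producer<String, String> producer = new KafkaProducer<>(props);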
Sending batches rather than individual messages is very important in Kafka: larger, compressed batches significantly improve throughput. When multiple records are sent to the same partition, the producer can batch them together. The batch.size config controls the amount of memory used for each batch, and the linger.ms config adds a small delay before sending so batches have a chance to fill up, which helps achieve the best compression and throughput.
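Continuing the Properties object from the idempotence sketch above, the batching settings could look like this (the values are illustrative, not recommendations):

// Collect up to 32 KB of records per partition into one batch
props.put(ProducerConfig.BATCH_SIZE_CONFIG, "32768");
// Wait up to 20 ms for more records so batches have a chance to fill
props.put(ProducerConfig.LINGER_MS_CONFIG, "20");
// Compress each batch as a whole before sending it to the broker
props.put(ProducerConfig.COMPRESSION_TYPE_CONFIG, "snappy");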
Custom serialization can be used to translate your data into a format that can be stored, transmitted, and retrieved. The producer serializes, while the consumer deserializes.
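One way a custom serializer could look, using a hypothetical Alert class purely for illustration; a matching deserializer on the consumer side would reverse the same encoding:

import java.nio.charset.StandardCharsets;
import org.apache.kafka.common.serialization.Serializer;

// Hypothetical domain object, not part of Kafka
class Alert {
    final int id;
    final String message;
    Alert(int id, String message) { this.id = id; this.message = message; }
}

public class AlertSerializer implements Serializer<Alert> {
    @Override
    public byte[] serialize(String topic, Alert alert) {
        if (alert == null) {
            return null;
        }
        // Translate the object into a simple delimited byte format for storage and transmission
        return (alert.id + ":" + alert.message).getBytes(StandardCharsets.UTF_8);
    }
}

// Registered on the producer with:
// props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, AlertSerializer.class.getName());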
If the producer is producing messages faster than the broker can receive them, the messages are held in buffer memory on the producer.
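The size of that buffer, and how long send() may block once it is full, are themselves producer configs; continuing the same Properties object (the values shown are the defaults):

// Total memory the producer may use to buffer records not yet sent to the broker
props.put(ProducerConfig.BUFFER_MEMORY_CONFIG, "33554432");   // 32 MB
// How long send() blocks waiting for buffer space before throwing an exception
props.put(ProducerConfig.MAX_BLOCK_MS_CONFIG, "60000");       // 60 seconds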
Advanced topics for consumers.
// enable.auto.commit=true: offsets are committed automatically during poll()
while (true) {
    ConsumerRecords<String, String> batch = consumer.poll(Duration.ofMillis(100));
    doSomethingSynchronous(batch);
}

// enable.auto.commit=false: commit manually, only after the batch has been processed
List<ConsumerRecord<String, String>> batch = new ArrayList<>();
while (true) {
    consumer.poll(Duration.ofMillis(100)).forEach(batch::add);
    if (isReady(batch)) {
        doSomethingSynchronous(batch);
        consumer.commitSync();
        batch.clear();
    }
}
When a consumer updates its current position in the partition, it's called a commit. To commit, the consumer produces a message to the __consumer_offsets topic recording its offset for each assigned partition. If a consumer crashes, it triggers a rebalance, and the remaining consumers may be assigned a new set of partitions.
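A sketch of committing an explicit position per partition, assuming consumer is an already-constructed KafkaConsumer<String, String>; each entry in the map is what ends up recorded in __consumer_offsets:

import java.time.Duration;
import java.util.HashMap;
import java.util.Map;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;

ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(100));
Map<TopicPartition, OffsetAndMetadata> toCommit = new HashMap<>();
for (ConsumerRecord<String, String> record : records) {
    // The committed offset is the position of the next record to read, hence offset + 1
    toCommit.put(new TopicPartition(record.topic(), record.partition()),
                 new OffsetAndMetadata(record.offset() + 1));
}
consumer.commitSync(toCommit);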
Partition assignment strategies for a topic with 8 partitions could be Round Robin or Range.
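A sketch of selecting the assignment strategy on the consumer (the group id and bootstrap address are placeholders):

import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.RoundRobinAssignor;
import org.apache.kafka.common.serialization.StringDeserializer;

Properties consumerProps = new Properties();
consumerProps.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
consumerProps.put(ConsumerConfig.GROUP_ID_CONFIG, "helloworld-group");
consumerProps.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
consumerProps.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
// Round Robin hands the 8 partitions out one by one across the consumers in the group;
// Range instead gives each consumer a contiguous block of partitions per topic
consumerProps.put(ConsumerConfig.PARTITION_ASSIGNMENT_STRATEGY_CONFIG, RoundRobinAssignor.class.getName());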
Moving partition ownership from one consumer to another is called a rebalance. This allows us to easily and safely add and remove consumers. Outside of adding or removing consumers, we don't want rebalances to happen.
One of the brokers in the Kafka cluster is designated as the consumer group coordinator. Every consumer in the group sends regular heartbeats to that broker to show it is still alive. The coordinator is also responsible for making the appropriate adjustments when a consumer fails or a new consumer joins the group. For example, the first consumer might read messages from partitions on brokers 1 and 3 while another consumer reads from broker 2, with broker 3 acting as the group coordinator for both.
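The timing of those failure checks is configurable; continuing the consumer Properties from the sketch above (the values are illustrative):

// The consumer sends a heartbeat to the group coordinator roughly every 3 seconds
consumerProps.put(ConsumerConfig.HEARTBEAT_INTERVAL_MS_CONFIG, "3000");
// If no heartbeat arrives within this window, the coordinator considers the consumer dead and rebalances
consumerProps.put(ConsumerConfig.SESSION_TIMEOUT_MS_CONFIG, "10000");
// If poll() is not called within this window, the consumer is likewise removed from the group
consumerProps.put(ConsumerConfig.MAX_POLL_INTERVAL_MS_CONFIG, "300000");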
Advanced Kafka topics:
For an example topic named helloworld with 3 partitions, where each of the 3 brokers in the Kafka cluster hosts one of the partitions, there are some design considerations:
There are also topic options that come in the form of command-line argument flags (for example, --partitions and --replication-factor on the kafka-topics.sh tool).
Log Segment vs Compacted Topic: whether old records are removed by deleting whole log segments once they pass the retention settings, or by compaction, which keeps only the latest value for each key.
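A sketch of creating the helloworld topic programmatically with the AdminClient, including the compaction choice via cleanup.policy (the bootstrap address and the replication factor of 3 are assumptions for this sketch; the same options can also be passed as flags to kafka-topics.sh):

import java.util.Collections;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;
import org.apache.kafka.common.config.TopicConfig;

public class CreateHelloWorldTopic {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

        try (AdminClient admin = AdminClient.create(props)) {
            // 3 partitions spread across the 3 brokers; replication factor 3 is assumed here
            NewTopic topic = new NewTopic("helloworld", 3, (short) 3)
                // compact keeps the latest value per key; the default, delete, drops old log segments
                .configs(Map.of(TopicConfig.CLEANUP_POLICY_CONFIG, TopicConfig.CLEANUP_POLICY_COMPACT));
            admin.createTopics(Collections.singletonList(topic)).all().get();
        }
    }
}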