A Kafka cluster is a hub that applications can plug into, capable of processing millions of messages at a time.
It is a publish/subscribe mechanism. The nodes of a Kafka cluster are known as brokers. Kafka can also be used for activity tracking, stream processing, decoupling system components, and more.
| Term | Definition |
|---|---|
| Producer | Sends messages to Kafka |
| Consumer | Retrieves messages from Kafka |
| Stream Processor | Consumes input streams and produces output streams |
| Connector | Connects topics to existing applications or data systems |
Message delivery can follow at least one of these semantics:

- At most once: messages may be lost but are never redelivered.
- At least once: messages are never lost but may be redelivered.
- Exactly once: each message is delivered once and only once.
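The practical difference between the first two semantics comes down to when a consumer commits its position relative to processing. Here is a minimal sketch with no real broker involved; the functions and the `crash_at` flag are made up purely for illustration:

```python
# Toy simulation of delivery semantics: no real Kafka client is used.
# The difference lies in when the consumer "commits" its offset.

def consume_at_most_once(messages, crash_at=None):
    """Commit BEFORE processing: a crash can lose a message."""
    processed, committed = [], 0
    for i, msg in enumerate(messages):
        committed = i + 1          # commit first
        if i == crash_at:
            break                  # crash before processing this message
        processed.append(msg)
    return processed, committed

def consume_at_least_once(messages, crash_at=None):
    """Commit AFTER processing: a crash can cause reprocessing (duplicates)."""
    processed, committed = [], 0
    for i, msg in enumerate(messages):
        if i == crash_at:
            break                  # crash before committing this message
        processed.append(msg)      # process first
        committed = i + 1          # then commit
    return processed, committed

msgs = ["a", "b", "c"]
# Crash while handling "b" (index 1):
p1, c1 = consume_at_most_once(msgs, crash_at=1)    # "b" committed but never processed -> lost
p2, c2 = consume_at_least_once(msgs, crash_at=1)   # "b" not committed -> reprocessed on restart
```

After the crash, the at-most-once consumer restarts at offset 2 and skips `"b"` forever, while the at-least-once consumer restarts at offset 1 and sees `"b"` again.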
Kafka can process millions of messages per second, and different consumers may access the same message. This allows you to move away from batch processing and scale horizontally. Another benefit is being able to use the data in real time instead of leaving it sitting on disk.
Messages in Kafka are organized into topics. Topics are divided into partitions, and each message within a partition receives an incremental ID called the offset.
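The topic/partition/offset relationship can be sketched as a plain data structure. This is an in-memory toy, not a real Kafka client; the key-to-partition hashing only mirrors the general idea of keyed partitioning:

```python
# In-memory toy of a Kafka topic: each partition is an append-only log,
# and each message gets the next incremental offset within its partition.

class Topic:
    def __init__(self, num_partitions):
        self.partitions = [[] for _ in range(num_partitions)]

    def produce(self, key, value):
        # Messages with the same key always land in the same partition.
        p = hash(key) % len(self.partitions)
        partition = self.partitions[p]
        offset = len(partition)        # incremental ID within this partition
        partition.append((offset, key, value))
        return p, offset

topic = Topic(num_partitions=3)
topic.produce("user-1", "login")
topic.produce("user-1", "click")
topic.produce("user-1", "logout")
# All three messages share a partition, receiving offsets 0, 1, 2 in order.
```

Note that offsets are only meaningful within a single partition; two partitions of the same topic each start counting from 0.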
When a message is written, it is appended to the end of a partition, assigned the next offset, and persisted to disk.
A producer is a client that writes data to the cluster so it can eventually be consumed.
A consumer is an application that reads messages. It subscribes to one or more topics and reads messages in the order in which they were produced within each partition.
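A consumer's in-order reading and offset tracking can be illustrated with a toy poll loop. This is not the real consumer API; `poll` and `max_records` are invented names for the sketch:

```python
# Toy consumer: reads a partition (a list of messages) in order,
# starting from its last committed offset.

def poll(partition, committed_offset, max_records=2):
    """Return the next batch of messages and the new offset to commit."""
    batch = partition[committed_offset:committed_offset + max_records]
    return batch, committed_offset + len(batch)

log = ["m0", "m1", "m2", "m3"]
batch1, offset = poll(log, committed_offset=0)   # first two messages, in order
batch2, offset = poll(log, offset)               # resumes exactly where it left off
```

Because the consumer only advances its committed offset, restarting it simply resumes from the last committed position rather than re-reading the whole log.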
A broker receives messages from producers, assigns offsets, and stores the messages on disk. Brokers are designed to operate in a cluster, in which one broker is designated as the controller. Brokers also replicate data across the cluster to provide fault tolerance.
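Replication can be sketched as the leader for a partition copying each appended message to follower brokers, so a follower holds a complete copy of the log if the leader fails. This is an illustrative toy only; real Kafka replication also involves in-sync replica tracking and acknowledgment settings:

```python
# Toy leader/follower replication for a single partition.

class Broker:
    def __init__(self, name):
        self.name = name
        self.log = []

leader = Broker("broker-1")
followers = [Broker("broker-2"), Broker("broker-3")]

def append(message):
    leader.log.append(message)
    for f in followers:            # replicate to every follower
        f.log.append(message)

append("order-created")
append("order-paid")
# If the leader fails, any follower's log is a complete copy,
# so one of them can take over as the new leader.
```

The design choice here is the essence of Kafka's fault tolerance: losing any single broker does not lose the partition's data, because other brokers hold replicas.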