[Tech Simplified] Kafka
Introduction to Kafka
Kafka is a distributed system consisting of servers and clients that communicate over the TCP network protocol. It lets you:
- Publish (write) and subscribe to (read) streams of events from sources like databases, sensors, mobile devices, cloud services, and software applications
- Store these event streams durably for a defined period
- Process event streams in real time as well as retrospectively
- Route the event streams to different destination technologies as needed
Terminology
Event/Message:
- A single record that is being transmitted in Kafka.
Producers:
- Sources that produce events or messages to Kafka.
Consumers:
- Those that subscribe to and react to the messages.
Consumer groups:
- Several consumers are grouped together to consume a given topic. Consumers in the same consumer group share the same group-id value. This ensures that each message is read by only one consumer in the group.
Topics:
- Messages are categorised and stored in topics.
- Think of each of them as a category.
- A topic can contain messages produced by different producers and also consumed by different consumer groups.
- Topics are partitioned, which means a topic's partitions are spread across different Kafka brokers.
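To make the consumer group idea more concrete, here is a minimal Python sketch. It is a toy model, not Kafka's actual assignment logic: it assumes a simple round-robin scheme, and the function name assign_partitions and the consumer names are made up for illustration. The point it shows is that within one group every partition has exactly one owner, so each message is read by only one consumer in that group, while a second group gets its own independent assignment.

```python
from collections import defaultdict


def assign_partitions(partitions, consumers):
    """Toy round-robin assignment of partitions to the consumers of ONE group.

    Hypothetical helper for illustration only; real Kafka uses its own
    assignment strategies on the broker/client side.
    """
    assignment = defaultdict(list)
    for i, partition in enumerate(partitions):
        # Each partition goes to exactly one consumer in the group.
        owner = consumers[i % len(consumers)]
        assignment[owner].append(partition)
    return dict(assignment)


partitions = [0, 1, 2, 3]

# Group A has two consumers, so they split the partitions between them.
group_a = assign_partitions(partitions, ["a1", "a2"])

# Group B has a single consumer, so it reads every partition on its own.
group_b = assign_partitions(partitions, ["b1"])

print(group_a)
print(group_b)
```

Adding a third consumer to group A would simply spread the same four partitions over three owners; no partition would ever be shared by two consumers of the same group.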
Visualisation
Imagine Kafka as a distributed queue.
Kafka is made up of multiple queues, each located on a different server. Each queue represents a partition and contains a subset of the messages belonging to a single topic.
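The picture above can be sketched in a few lines of Python. This is an in-memory toy, not a Kafka client: the Topic class, the "events" topic name, and the sensor keys are all hypothetical. It also assumes messages carry a key and that the partition is chosen by hashing that key; CRC32 is used here only as a stand-in hash, not the one Kafka itself uses.

```python
import zlib


class Topic:
    """Toy topic: a list of queues, one per partition (illustration only)."""

    def __init__(self, name, num_partitions):
        self.name = name
        self.partitions = [[] for _ in range(num_partitions)]

    def partition_for(self, key):
        # Stand-in hash (CRC32); the same key always maps to the same partition.
        return zlib.crc32(key.encode("utf-8")) % len(self.partitions)

    def publish(self, key, value):
        index = self.partition_for(key)
        # Messages are appended, so order is kept within a partition.
        self.partitions[index].append((key, value))
        return index


topic = Topic("events", num_partitions=3)

p1 = topic.publish("sensor-1", "21.5C")
p2 = topic.publish("sensor-1", "21.7C")
topic.publish("sensor-2", "19.0C")

# Both "sensor-1" messages land in the same queue, in the order they were sent.
print(p1 == p2)
print(topic.partitions[p1])
```

Each inner list plays the role of one queue on one server, and together they hold all the messages of the topic.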
A scenario
Imagine many basketball matches are in progress at the same time, and we want to publish their results to a dashboard so fans can view the scores in real time.