Apache Kafka Cheat Sheet

Introduction

Apache Kafka is a distributed event streaming platform used to build real-time data pipelines and streaming applications. It is designed for high-throughput data streams and provides fault tolerance, horizontal scalability, and durable storage. Kafka is widely used across industries for real-time analytics, monitoring, and event-driven architectures. This cheat sheet is a quick reference to the most commonly used Kafka concepts and commands.

Apache Kafka Concepts

Concept | Description
Topic | A named category or feed to which records are published. Topics are split into partitions and can have many producers and consumers.
Partition | A division of a topic that lets its data be spread across multiple brokers. Each partition is an ordered, append-only log, and records are immutable once written.
Broker | A Kafka server that stores data and serves client requests. Multiple brokers form a Kafka cluster.
Producer | An application that writes records to a Kafka topic.
Consumer | An application that reads records from a Kafka topic. Consumers can be part of a consumer group.
Consumer Group | A set of consumers that share the work of reading a topic. Each partition is assigned to exactly one consumer in the group at a time; one consumer may own several partitions, and consumers beyond the partition count stay idle.
ZooKeeper | A coordination service that older Kafka clusters use to store cluster metadata and elect the controller broker, which in turn handles partition leader election. Newer clusters run in KRaft mode, where Kafka manages this metadata itself, and Kafka 4.0 removed ZooKeeper support.
Offset | A sequential number that identifies a record's position within a partition. Consumer groups commit offsets to record how far they have read.
Replication | Copying each partition to multiple brokers (one leader, the others followers) so data stays available and fault tolerant when a broker fails.
Retention | How long (retention.ms) or how much data per partition (retention.bytes) Kafka keeps in a topic before old records become eligible for deletion.
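
To make the consumer-group rule concrete, here is a small bash sketch modelled on range assignment, one of the partition-assignment strategies Kafka ships (a real consumer's strategy is set by its partition.assignment.strategy config). range_assign is a hypothetical helper that only does the arithmetic for a single topic; it does not talk to a cluster, and it takes consumers in argument order, whereas the real assignor sorts member IDs.

```bash
#!/usr/bin/env bash
# Sketch of range-style assignment: one topic's partitions are split into
# contiguous chunks, and each partition goes to exactly one group member.
# Consumers are taken in argument order (hypothetical simplification).
range_assign() {
  local partitions=$1
  shift
  local consumers=("$@")
  local count=${#consumers[@]}
  if (( count == 0 )); then
    echo "usage: range_assign PARTITIONS CONSUMER [CONSUMER ...]" >&2
    return 2
  fi
  local per=$(( partitions / count ))    # base number of partitions per consumer
  local extra=$(( partitions % count ))  # the first $extra consumers get one more
  local i p start len list
  for (( i = 0; i < count; i++ )); do
    start=$(( per * i + (i < extra ? i : extra) ))
    len=$(( per + (i < extra ? 1 : 0) ))
    list=""
    for (( p = start; p < start + len; p++ )); do
      list+="$p "
    done
    list=${list% }                        # drop the trailing space
    echo "${consumers[$i]}: ${list:-idle}"
  done
}

range_assign 3 consumer-a consumer-b
```

With 3 partitions and 2 consumers it prints consumer-a: 0 1 and consumer-b: 2. With more consumers than partitions, the extra consumers print idle, which is why adding consumers beyond the partition count does not add read parallelism.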

Apache Kafka Commands Cheat Sheet

Command | Description
kafka-topics.sh --create | Creates a new topic with the specified settings (partition count, replication factor, topic configs).
kafka-topics.sh --list | Lists all topics in the Kafka cluster.
kafka-topics.sh --describe | Describes a topic, including each partition's leader, replicas, and in-sync replicas.
kafka-topics.sh --delete | Deletes a topic from the Kafka cluster.
kafka-console-producer.sh --topic <topic-name> | Sends data to a Kafka topic using the console producer.
kafka-console-consumer.sh --topic <topic-name> | Reads data from a Kafka topic using the console consumer.
kafka-console-consumer.sh --from-beginning | Reads from the earliest available offset (when the consumer has no committed offset) instead of only new records.
kafka-console-consumer.sh --bootstrap-server <server> | Specifies the Kafka broker(s) to connect to, as a comma-separated host:port list.
kafka-consumer-groups.sh --list | Lists all consumer groups in the Kafka cluster.
kafka-consumer-groups.sh --describe --group <group-name> | Describes a consumer group, including committed offsets and lag per partition.
kafka-consumer-groups.sh --reset-offsets | Resets a consumer group's offsets to a chosen point (e.g., earliest, latest, or a specific offset).
kafka-replica-verification.sh --broker-list <brokers> | Checks that all replicas of the selected topics hold the same data and reports the maximum replica lag.
kafka-acls.sh --add | Adds access control list (ACL) entries that allow or deny principals specific operations on Kafka resources.
kafka-configs.sh --alter | Alters the configuration of a Kafka topic, broker, or client.
kafka-configs.sh --describe | Describes the current configuration of a Kafka topic, broker, or client.
kafka-configs.sh --bootstrap-server <server> | Specifies the Kafka broker(s) to connect to when configuring Kafka resources.
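
The commands in the table are usually chained into a short topic lifecycle. Below is a bash sketch, assuming Kafka's bin/ directory is on PATH, a broker reachable at localhost:9092, and a hypothetical topic name demo-topic. The run helper (also hypothetical) prints each command and only executes it when DRY_RUN=0, so the script can be read or tried safely before pointing it at a cluster.

```bash
#!/usr/bin/env bash
# Sketch: a typical topic lifecycle using the commands in the table.
# Assumptions: Kafka's bin/ directory is on PATH and a broker listens on
# $BOOTSTRAP. With DRY_RUN=1 (the default here) commands are only printed.
BOOTSTRAP="${BOOTSTRAP:-localhost:9092}"
DRY_RUN="${DRY_RUN:-1}"
TOPIC="demo-topic"   # hypothetical topic name

# Print a command, then run it only when DRY_RUN=0.
run() {
  echo "+ $*"
  if [ "$DRY_RUN" = "0" ]; then
    "$@"
  fi
}

run kafka-topics.sh --bootstrap-server "$BOOTSTRAP" --create --topic "$TOPIC" \
  --partitions 3 --replication-factor 1
run kafka-topics.sh --bootstrap-server "$BOOTSTRAP" --list
run kafka-topics.sh --bootstrap-server "$BOOTSTRAP" --describe --topic "$TOPIC"
run kafka-configs.sh --bootstrap-server "$BOOTSTRAP" --describe \
  --entity-type topics --entity-name "$TOPIC"
run kafka-consumer-groups.sh --bootstrap-server "$BOOTSTRAP" --list
run kafka-topics.sh --bootstrap-server "$BOOTSTRAP" --delete --topic "$TOPIC"
```

Saved as, say, lifecycle.sh, it can be run with DRY_RUN=0 against a development cluster; the replication factor of 1 suits a single-broker setup.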

Explanation and Examples of Apache Kafka Commands

kafka-topics.sh --create

Description: Creates a new topic with specified configurations such as partition count, replication factor, and more.
Example:

kafka-topics.sh --create --topic my-topic --partitions 3 --replication-factor 2 --bootstrap-server localhost:9092

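Explanation: This command creates a topic named my-topic with 3 partitions and a replication factor of 2, so each partition is stored on two different brokers; the cluster therefore needs at least two brokers.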

kafka-console-producer.sh --topic <topic-name>

Description: Sends data to a Kafka topic using the console producer.
Example:

kafka-console-producer.sh --topic my-topic --bootstrap-server localhost:9092

Explanation: This command starts a console producer for the topic my-topic. Each line you type is sent as one record; press Ctrl+C to exit.
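
By default the console producer sends each line as a record value with no key. When keys matter (records that share a key go to the same partition under the default partitioner), the console producer can parse a key from each line via its parse.key and key.separator properties. A bash sketch, assuming a broker at localhost:9092 unless BOOTSTRAP is set; to_keyed_lines and send_keyed are hypothetical helpers:

```bash
#!/usr/bin/env bash
# Sketch: send keyed records with the console producer.
# Hypothetical helpers; assumes Kafka's bin/ directory is on PATH.

# Turn "KEY VALUE" argument pairs into "KEY:VALUE" lines, the format the
# console producer parses when parse.key=true and key.separator=: are set.
to_keyed_lines() {
  if (( $# == 0 || $# % 2 != 0 )); then
    echo "usage: to_keyed_lines KEY VALUE [KEY VALUE ...]" >&2
    return 2
  fi
  while (( $# > 0 )); do
    printf '%s:%s\n' "$1" "$2"
    shift 2
  done
}

# Send the pairs to TOPIC; validation happens before the producer starts.
send_keyed() {
  local topic=$1
  shift
  local lines
  lines=$(to_keyed_lines "$@") || return
  printf '%s\n' "$lines" | kafka-console-producer.sh \
    --topic "$topic" \
    --bootstrap-server "${BOOTSTRAP:-localhost:9092}" \
    --property parse.key=true \
    --property key.separator=:
}
```

For example, send_keyed my-topic user1 login user2 logout sends two records keyed user1 and user2. Keys must not contain the separator character.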

kafka-console-consumer.sh --topic <topic-name> --from-beginning

Description: Reads all data from the beginning of the topic using the console consumer.
Example:

kafka-console-consumer.sh --topic my-topic --from-beginning --bootstrap-server localhost:9092

Explanation: This command starts a console consumer that reads every record still retained in my-topic, starting from the earliest offset, and then keeps waiting for new records until you stop it with Ctrl+C.
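
Without --group, the console consumer joins a randomly named throwaway group, so --from-beginning replays the topic on every run. With a named group, committed offsets take over: --from-beginning only applies while the group has no committed offset. The bash sketch below reads a bounded number of records in a named group and prints keys alongside values; consume_keys is a hypothetical wrapper, and the broker address defaults to localhost:9092.

```bash
#!/usr/bin/env bash
# Sketch: read at most MAX records as a member of a named consumer group,
# printing "key:value" for each. consume_keys is a hypothetical wrapper; it
# assumes Kafka's bin/ directory is on PATH.
consume_keys() {
  local topic=$1 group=$2 max=${3:-10}
  if [[ -z "$topic" || -z "$group" || ! "$max" =~ ^[1-9][0-9]*$ ]]; then
    echo "usage: consume_keys TOPIC GROUP [MAX_MESSAGES]" >&2
    return 2
  fi
  # --from-beginning only matters while the group has no committed offset;
  # after that, the group resumes from where it last committed.
  kafka-console-consumer.sh \
    --topic "$topic" \
    --group "$group" \
    --from-beginning \
    --max-messages "$max" \
    --bootstrap-server "${BOOTSTRAP:-localhost:9092}" \
    --property print.key=true \
    --property key.separator=:
}
```

For example, consume_keys my-topic my-group 5 prints up to five records and exits; a second run picks up from the group's committed offsets rather than replaying the whole topic.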

kafka-consumer-groups.sh --describe --group <group-name>

Description: Describes the details of a specific consumer group, including offsets and lag.
Example:

kafka-consumer-groups.sh --describe --group my-group --bootstrap-server localhost:9092

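Explanation: For each partition that my-group consumes, this command shows the committed offset (CURRENT-OFFSET), the latest offset in the partition (LOG-END-OFFSET), the difference between them (LAG), and the consumer instance assigned to it.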
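
The describe command is often paired with --reset-offsets to rewind a group, for example to reprocess a topic. A hedged bash sketch, assuming a broker at localhost:9092; reset_to_earliest is a hypothetical wrapper around --reset-offsets --to-earliest. It defaults to --dry-run, which only prints the planned offsets; passing --execute applies them, and Kafka accepts the reset only while the group has no active members.

```bash
#!/usr/bin/env bash
# Sketch: rewind a consumer group to the earliest retained offsets of a topic.
# reset_to_earliest is a hypothetical wrapper; it assumes Kafka's bin/
# directory is on PATH and defaults to a dry run.
reset_to_earliest() {
  local group=$1 topic=$2 mode=${3:---dry-run}
  if [[ -z "$group" || -z "$topic" ]]; then
    echo "usage: reset_to_earliest GROUP TOPIC [--dry-run|--execute]" >&2
    return 2
  fi
  if [[ "$mode" != "--dry-run" && "$mode" != "--execute" ]]; then
    echo "mode must be --dry-run or --execute" >&2
    return 2
  fi
  # --dry-run prints the planned offsets; --execute commits them.
  kafka-consumer-groups.sh \
    --bootstrap-server "${BOOTSTRAP:-localhost:9092}" \
    --group "$group" \
    --topic "$topic" \
    --reset-offsets --to-earliest \
    "$mode"
}
```

A typical sequence is reset_to_earliest my-group my-topic to preview, then stopping the group's consumers, then reset_to_earliest my-group my-topic --execute, and finally rerunning the describe command to confirm the new offsets and lag.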

kafka-configs.sh --alter

Description: Alters the configuration of a Kafka topic, broker, or client.
Example:

kafka-configs.sh --alter --entity-type topics --entity-name my-topic --add-config retention.ms=604800000 --bootstrap-server localhost:9092

Explanation: This command sets the retention period of the topic my-topic to 7 days (604800000 milliseconds).
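
The value passed to retention.ms is plain arithmetic: 7 days × 24 hours × 60 minutes × 60 seconds × 1000 ms = 604800000. A tiny bash helper (days_to_ms, hypothetical) avoids counting zeros by hand and prints, without running, the matching kafka-configs.sh call:

```bash
#!/usr/bin/env bash
# Convert whole days to milliseconds for the retention.ms topic config.
days_to_ms() {
  if [[ ! "$1" =~ ^[0-9]+$ ]]; then
    echo "usage: days_to_ms WHOLE_DAYS" >&2
    return 2
  fi
  # 10# forces base 10 so inputs like "08" are not read as octal.
  echo $(( 10#$1 * 24 * 60 * 60 * 1000 ))
}

days_to_ms 7   # prints 604800000

# Print (not run) the kafka-configs.sh call for a 3-day retention on my-topic.
echo kafka-configs.sh --alter --entity-type topics --entity-name my-topic \
  --add-config "retention.ms=$(days_to_ms 3)" --bootstrap-server localhost:9092
```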

Conclusion

Apache Kafka is a robust and versatile platform for building real-time streaming data pipelines and applications. This cheat sheet provides a quick reference to the key concepts and commands in Kafka, helping you manage and operate Kafka clusters more effectively. Keep this guide handy as you work with Kafka to ensure smooth and efficient data streaming. Happy coding!
