Apache Kafka Cheat Sheet

Introduction

Apache Kafka is a distributed event-streaming platform used to build real-time data pipelines and streaming applications. It is designed to handle high-throughput data streams and provides fault tolerance, scalability, and durability. Kafka is widely used for real-time analytics, monitoring, and event-driven architectures. This cheat sheet is a quick reference to the most commonly used Kafka concepts and commands.

Apache Kafka Concepts

Concept	Description
Topic	A category or feed name to which records are published. Topics are partitioned and can have multiple consumers.
Partition	A division of a topic that lets data be distributed across multiple brokers. Each partition is an ordered, append-only log, and records are immutable once written.
Broker	A Kafka server that stores data and serves clients. Multiple brokers form a Kafka cluster.
Producer	An application that writes records to a Kafka topic.
Consumer	An application that reads records from a Kafka topic. Consumers can be part of a consumer group.
Consumer Group	A group of consumers that share the work of consuming a topic. Within a group, each partition is assigned to exactly one consumer at a time, while one consumer may own several partitions.
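ZooKeeper	Stores cluster metadata and coordinates the Kafka brokers, handling tasks such as controller election and configuration management. Newer Kafka versions can run without ZooKeeper in KRaft mode, and Kafka 4.0 removes ZooKeeper support entirely.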
Offset	A sequential ID that uniquely identifies a record within a partition. Consumers commit offsets to record how far they have read.
Replication	The process of duplicating partitions across multiple brokers to ensure data availability and fault tolerance.
Retention	How long (retention.ms) or how much data per partition (retention.bytes) Kafka keeps in a topic before older records are deleted.
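To make the consumer-group rule concrete, here is a minimal Java sketch of how the partitions of one topic could be split among the members of a group. It follows the idea behind range-style assignment (sort the members, give each a contiguous block, and give one extra partition to the first members when the count does not divide evenly). The class and method names are hypothetical, and this is not the Kafka client's actual assignor code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of range-style assignment of one topic's partitions
// to the members of a consumer group. Not Kafka's actual assignor code.
public class RangeAssignmentSketch {

    // Returns consumerId -> owned partitions; every partition gets exactly one owner.
    static Map<String, List<Integer>> assign(List<String> consumers, int numPartitions) {
        if (consumers.isEmpty()) {
            throw new IllegalArgumentException("a group needs at least one consumer");
        }
        List<String> sorted = new ArrayList<>(consumers);
        sorted.sort(null); // natural order, so the result is deterministic

        int perConsumer = numPartitions / sorted.size();
        int withExtra = numPartitions % sorted.size(); // the first N consumers get one more

        Map<String, List<Integer>> result = new TreeMap<>();
        int nextPartition = 0;
        for (int i = 0; i < sorted.size(); i++) {
            int count = perConsumer + (i < withExtra ? 1 : 0);
            List<Integer> owned = new ArrayList<>();
            for (int j = 0; j < count; j++) {
                owned.add(nextPartition++);
            }
            result.put(sorted.get(i), owned); // may be empty: an idle consumer
        }
        return result;
    }

    public static void main(String[] args) {
        // 3 partitions, 2 consumers: one consumer owns two partitions.
        System.out.println(assign(List.of("c2", "c1"), 3));
        // 3 partitions, 4 consumers: one consumer sits idle.
        System.out.println(assign(List.of("c1", "c2", "c3", "c4"), 3));
    }
}
```

Running main prints {c1=[0, 1], c2=[2]} and then {c1=[0], c2=[1], c3=[2], c4=[]}. The second case shows why adding more consumers than partitions does not increase parallelism within a group.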

Apache Kafka Commands Cheat Sheet

Command	Description
`kafka-topics.sh --create`	Creates a new topic with specified configurations.
`kafka-topics.sh --list`	Lists all the topics available in the Kafka cluster.
`kafka-topics.sh --describe`	Describes the details of a specific topic, including partition and replica details.
`kafka-topics.sh --delete`	Deletes a topic from the Kafka cluster.
`kafka-console-producer.sh --topic <topic-name>`	Sends data to a Kafka topic using the console producer.
`kafka-console-consumer.sh --topic <topic-name>`	Reads data from a Kafka topic using the console consumer.
`kafka-console-consumer.sh --from-beginning`	Reads all data from the beginning of the topic using the console consumer.
`kafka-console-consumer.sh --bootstrap-server <server>`	Specifies the Kafka broker to connect to when using the console consumer.
`kafka-consumer-groups.sh --list`	Lists all consumer groups available in the Kafka cluster.
`kafka-consumer-groups.sh --describe --group <group-name>`	Describes the details of a specific consumer group, including offsets and lag.
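`kafka-consumer-groups.sh --reset-offsets`	Resets a consumer group's committed offsets to a chosen point (e.g., --to-earliest, --to-latest, --to-offset, or --shift-by). Changes are applied only with --execute (otherwise the tool just prints the planned offsets), and the group must have no active members.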
`kafka-replica-verification.sh --broker-list <brokers>`	Checks that the replicas of each partition hold the same data as their leaders and periodically reports the maximum replica lag.
`kafka-acls.sh --add`	Adds access control list (ACL) entries that allow or deny principals (users) specific operations on Kafka resources such as topics and consumer groups.
`kafka-configs.sh --alter`	Alters the configuration of a Kafka topic, broker, or client.
`kafka-configs.sh --describe`	Describes the current configuration of a Kafka topic, broker, or client.
`kafka-configs.sh --bootstrap-server <server>`	Specifies the Kafka broker to connect to when configuring Kafka resources.
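The --reset-offsets strategies can be modeled in a few lines. The following Java sketch computes the new committed offset for a single partition under --to-earliest, --to-latest, --to-offset, and --shift-by. The class and method names are hypothetical, and the clamping of out-of-range targets to the partition's earliest or latest offset reflects our understanding of the tool's behavior; preview the real result with --dry-run before running with --execute.

```java
// Hypothetical model of how `kafka-consumer-groups.sh --reset-offsets` picks a new
// committed offset for one partition. Strategy names mirror the CLI options
// --to-earliest, --to-latest, --to-offset and --shift-by.
public class OffsetResetSketch {

    enum Strategy { TO_EARLIEST, TO_LATEST, TO_OFFSET, SHIFT_BY }

    // logStart: earliest offset still retained; logEnd: offset the next record will get.
    static long newOffset(Strategy strategy, long value, long current, long logStart, long logEnd) {
        long target;
        switch (strategy) {
            case TO_EARLIEST: target = logStart; break;
            case TO_LATEST:   target = logEnd; break;
            case TO_OFFSET:   target = value; break;
            case SHIFT_BY:    target = current + value; break;
            default: throw new IllegalArgumentException("unknown strategy " + strategy);
        }
        // Assumption: out-of-range targets are clamped to the offsets that actually exist.
        return Math.max(logStart, Math.min(target, logEnd));
    }

    public static void main(String[] args) {
        // Partition retains offsets 100..499; the group has committed 450.
        System.out.println(newOffset(Strategy.SHIFT_BY, -100, 450, 100, 500)); // 350
        System.out.println(newOffset(Strategy.TO_OFFSET, 10, 450, 100, 500));  // 100 (clamped)
        System.out.println(newOffset(Strategy.TO_LATEST, 0, 450, 100, 500));   // 500
    }
}
```

A matching CLI invocation for the first case would be kafka-consumer-groups.sh --bootstrap-server localhost:9092 --group my-group --topic my-topic --reset-offsets --shift-by -100 --dry-run.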

Explanation and Examples of Apache Kafka Commands

kafka-topics.sh --create

Description: Creates a new topic with specified configurations such as partition count, replication factor, and more.
Example:

kafka-topics.sh --create --topic my-topic --partitions 3 --replication-factor 2 --bootstrap-server localhost:9092

Explanation: This command creates a topic named my-topic with 3 partitions and a replication factor of 2. The cluster must have at least two brokers to satisfy a replication factor of 2.
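With 3 partitions, the producer decides which partition each record goes to. For records that have a key, Kafka's default partitioning hashes the serialized key with murmur2 and takes the result modulo the partition count, so all records with the same key land in the same partition and stay in order relative to each other. The Java sketch below re-implements that calculation for String keys encoded as UTF-8 (the encoding StringSerializer uses by default). The class name is hypothetical, records without a key are not modeled, and the real producer remains the authority.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of key-based partition selection, modeled on the
// murmur2-then-modulo scheme Kafka uses for keyed records.
public class KeyPartitionerSketch {

    // 32-bit murmur2 hash with the seed and constants Kafka's variant uses.
    static int murmur2(byte[] data) {
        int length = data.length;
        int seed = 0x9747b28c;
        final int m = 0x5bd1e995;
        final int r = 24;
        int h = seed ^ length;
        int length4 = length / 4;
        for (int i = 0; i < length4; i++) {
            int i4 = i * 4;
            int k = (data[i4] & 0xff) + ((data[i4 + 1] & 0xff) << 8)
                    + ((data[i4 + 2] & 0xff) << 16) + ((data[i4 + 3] & 0xff) << 24);
            k *= m;
            k ^= k >>> r;
            k *= m;
            h *= m;
            h ^= k;
        }
        int tail = length & ~3;
        switch (length % 4) { // mix in the remaining 1..3 bytes
            case 3: h ^= (data[tail + 2] & 0xff) << 16; // fall through
            case 2: h ^= (data[tail + 1] & 0xff) << 8;  // fall through
            case 1: h ^= data[tail] & 0xff;
                    h *= m;
        }
        h ^= h >>> 13;
        h *= m;
        h ^= h >>> 15;
        return h;
    }

    // Clear the sign bit so the modulo result is never negative.
    static int partitionFor(String key, int numPartitions) {
        byte[] bytes = key.getBytes(StandardCharsets.UTF_8);
        return (murmur2(bytes) & 0x7fffffff) % numPartitions;
    }

    public static void main(String[] args) {
        for (String key : new String[] {"user-1", "user-2", "user-1"}) {
            System.out.println(key + " -> partition " + partitionFor(key, 3));
        }
    }
}
```

Both user-1 lines print the same partition. One consequence of the modulo step: adding partitions to an existing topic can move existing keys to different partitions.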

kafka-console-producer.sh --topic <topic-name>

Description: Sends data to a Kafka topic using the console producer.
Example:

kafka-console-producer.sh --topic my-topic --bootstrap-server localhost:9092

Explanation: This command starts a console producer for the topic my-topic. Each line you type is sent as a separate message; press Ctrl+C to exit.
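By default each line becomes a message value with no key. The console producer can also read keys from each line: with --property parse.key=true --property key.separator=:, the text before the first separator becomes the key and the rest becomes the value. The Java sketch below models that split in plain Java; the class name is hypothetical and error handling is simplified (it rejects lines that lack a separator).

```java
// Hypothetical sketch of how a console-producer line could be split into key and value
// when the producer runs with --property parse.key=true --property key.separator=:
public class ConsoleLineSplitSketch {

    // Returns {key, value}; the key is null when key parsing is disabled.
    static String[] split(String line, boolean parseKey, String separator) {
        if (!parseKey) {
            return new String[] {null, line}; // the whole line becomes the message value
        }
        int idx = line.indexOf(separator); // split on the FIRST separator only
        if (idx < 0) {
            throw new IllegalArgumentException("No key separator found in line: " + line);
        }
        return new String[] {line.substring(0, idx), line.substring(idx + separator.length())};
    }

    public static void main(String[] args) {
        String[] kv = split("user-1:logged in", true, ":");
        System.out.println("key=" + kv[0] + ", value=" + kv[1]); // key=user-1, value=logged in
    }
}
```

On the command line, kafka-console-producer.sh --topic my-topic --bootstrap-server localhost:9092 --property parse.key=true --property key.separator=: would send the typed line user-1:logged in with key user-1 and value logged in.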

kafka-console-consumer.sh --topic <topic-name> --from-beginning

Description: Reads all data from the beginning of the topic using the console consumer.
Example:

kafka-console-consumer.sh --topic my-topic --from-beginning --bootstrap-server localhost:9092

Explanation: This command starts a console consumer that reads all messages still retained in my-topic, starting from the earliest available offset, and then keeps waiting for new ones.
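Whether --from-beginning replays the whole topic depends on the consumer group. The console consumer maps --from-beginning to auto.offset.reset=earliest, and that setting only applies when the group has no usable committed offset. A console consumer started without --group gets a newly generated group, so it starts at the earliest retained offset. The Java sketch below models that decision for one partition; the names are hypothetical, only the earliest and latest policies are modeled, and the real consumer fetches these offsets from the brokers.

```java
import java.util.OptionalLong;

// Hypothetical model of where a consumer starts reading one partition.
public class StartOffsetSketch {

    enum ResetPolicy { EARLIEST, LATEST } // two values of auto.offset.reset

    // logStart = earliest retained offset, logEnd = offset the next record will get.
    static long startOffset(OptionalLong committed, ResetPolicy reset, long logStart, long logEnd) {
        if (committed.isPresent()) {
            long c = committed.getAsLong();
            if (c >= logStart && c <= logEnd) {
                return c; // a valid committed position always wins
            }
        }
        // No usable committed offset (new group, or it was removed by retention).
        return reset == ResetPolicy.EARLIEST ? logStart : logEnd;
    }

    public static void main(String[] args) {
        // The partition currently retains offsets 100..499.
        System.out.println(startOffset(OptionalLong.empty(), ResetPolicy.EARLIEST, 100, 500)); // 100
        System.out.println(startOffset(OptionalLong.empty(), ResetPolicy.LATEST, 100, 500));   // 500
        System.out.println(startOffset(OptionalLong.of(250), ResetPolicy.EARLIEST, 100, 500)); // 250
    }
}
```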

kafka-consumer-groups.sh --describe --group <group-name>

Description: Describes the details of a specific consumer group, including offsets and lag.
Example:

kafka-consumer-groups.sh --describe --group my-group --bootstrap-server localhost:9092

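Explanation: For each partition that my-group consumes, this command shows the current (committed) offset, the log-end offset, the lag between them, and the consumer assigned to the partition.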
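The LAG column is simple arithmetic: LOG-END-OFFSET minus CURRENT-OFFSET for each partition. The Java sketch below computes per-partition and total lag from example numbers made up for illustration; the class and method names are hypothetical.

```java
// Hypothetical sketch of the LAG column reported by kafka-consumer-groups.sh --describe.
public class ConsumerLagSketch {

    // Records already written to the partition but not yet committed by the group.
    static long lag(long currentOffset, long logEndOffset) {
        return Math.max(0L, logEndOffset - currentOffset); // this sketch never reports negative lag
    }

    // Sum of per-partition lag for one group on one topic.
    static long totalLag(long[] currentOffsets, long[] logEndOffsets) {
        if (currentOffsets.length != logEndOffsets.length) {
            throw new IllegalArgumentException("one entry per partition expected");
        }
        long total = 0;
        for (int p = 0; p < currentOffsets.length; p++) {
            total += lag(currentOffsets[p], logEndOffsets[p]);
        }
        return total;
    }

    public static void main(String[] args) {
        long[] current = {120, 80, 300}; // CURRENT-OFFSET per partition
        long[] logEnd  = {150, 80, 340}; // LOG-END-OFFSET per partition
        for (int p = 0; p < current.length; p++) {
            System.out.println("partition " + p + " lag=" + lag(current[p], logEnd[p]));
        }
        System.out.println("total lag=" + totalLag(current, logEnd)); // total lag=70
    }
}
```

A lag that keeps growing over successive runs usually means the group consumes more slowly than producers write.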

kafka-configs.sh --alter

Description: Alters the configuration of a Kafka topic, broker, or client.
Example:

kafka-configs.sh --alter --entity-type topics --entity-name my-topic --add-config retention.ms=604800000 --bootstrap-server localhost:9092

Explanation: This command sets the retention period of the topic my-topic to 7 days (604800000 milliseconds).
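The value 604800000 is 7 days expressed in milliseconds (7 × 24 × 60 × 60 × 1000). The Java sketch below does that conversion with java.time.Duration and adds a simplified retention check: time-based retention removes whole log segments once the newest record in a segment is older than retention.ms, so individual records can outlive the configured period until their segment qualifies. The class and method names are hypothetical; the real broker also considers retention.bytes, the active segment, and its periodic cleanup schedule.

```java
import java.time.Duration;

// Hypothetical sketch: converting a retention period to retention.ms and
// checking whether a closed log segment has aged past it.
public class RetentionSketch {

    static long retentionMsForDays(long days) {
        return Duration.ofDays(days).toMillis();
    }

    // A segment whose newest record is older than the retention period can be deleted.
    static boolean eligibleForDeletion(long segmentMaxTimestampMs, long nowMs, long retentionMs) {
        return nowMs - segmentMaxTimestampMs > retentionMs;
    }

    public static void main(String[] args) {
        long retentionMs = retentionMsForDays(7);
        System.out.println("retention.ms=" + retentionMs); // retention.ms=604800000

        long now = System.currentTimeMillis();
        long eightDaysAgo = now - Duration.ofDays(8).toMillis();
        long oneDayAgo = now - Duration.ofDays(1).toMillis();
        System.out.println(eligibleForDeletion(eightDaysAgo, now, retentionMs)); // true
        System.out.println(eligibleForDeletion(oneDayAgo, now, retentionMs));    // false
    }
}
```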

Conclusion

Apache Kafka is a robust and versatile platform for building real-time streaming data pipelines and applications. This cheat sheet provides a quick reference to the key concepts and commands in Kafka, helping you manage and operate Kafka clusters more effectively. Keep this guide handy as you work with Kafka to ensure smooth and efficient data streaming. Happy coding!

Java Guides
