This tutorial provides instructions for setting up a pseudo-distributed multi-broker cluster of Apache Kafka.
Apache Kafka is an open-source, distributed, high-throughput publish-subscribe messaging system, often used in real-time stream processing systems. Apache Kafka can be deployed in the following two schemes -
- Pseudo-distributed multi-broker cluster - all Kafka brokers of a cluster are deployed on a single machine.
- Fully distributed multi-broker cluster - each Kafka broker of a cluster is deployed on a separate machine.
In this tutorial, we will cover the pseudo-distributed multi-broker setup.
Here are the software and hardware requirements to follow the instructions in this tutorial -
- A physical or virtual machine, ideally with 4 GB RAM, 2 CPU cores and 20 GB disk space
- A Linux operating system, as Apache Kafka does not yet officially support Windows
- JDK 8, with JAVA_HOME pointing to it
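Before proceeding, it is worth sanity-checking the JDK requirement. A quick way to do that (assuming java is on your PATH; the version string shown will vary with your installation):

```shell
# verify the JDK is installed and JAVA_HOME is set
java -version
echo "$JAVA_HOME"
```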
Installing Apache Kafka
Installing Apache Kafka is as simple as downloading its binaries and extracting them to your file system.
You can download the latest version of Apache Kafka from the official website. You will see multiple binary downloads built against different Scala versions (2.10 and 2.11). If you are going to use the Scala APIs, download the one matching your Scala version; if you only need the Java APIs, either one will do.
At the time of writing this tutorial, the latest version is 0.10.0.1, so that is the version we will install.
Once you have downloaded the binary archive, extract it to the directory you would like to run it from. In my case, I have extracted it to the /opt/big-data/kafka/kafka_2.11-0.10.0.1 path.
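The download-and-extract step might look like the following. The mirror URL and the target directory here are examples - pick a mirror from the official download page and a path of your choosing:

```shell
# download the Kafka 0.10.0.1 binary built for Scala 2.11
# (example mirror URL - choose one from the official download page)
wget https://archive.apache.org/dist/kafka/0.10.0.1/kafka_2.11-0.10.0.1.tgz

# extract it to the chosen installation directory
mkdir -p /opt/big-data/kafka
tar -xzf kafka_2.11-0.10.0.1.tgz -C /opt/big-data/kafka
```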
Configuring Apache ZooKeeper
We first need to check the configuration for Apache ZooKeeper. If you are using a separate ZooKeeper cluster, skip this step and the next ZooKeeper-related step.
You can check the ZooKeeper configuration from the Kafka home directory (for me - /opt/big-data/kafka/kafka_2.11-0.10.0.1), and look at the following properties -
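One simple way to inspect the configuration file that ships with Kafka (run from the Kafka home directory; you could equally open it in any editor):

```shell
# print the default ZooKeeper configuration bundled with Kafka
cat config/zookeeper.properties
```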
- dataDir - it should point to a directory where you want ZooKeeper to save its data
- clientPort - defaults to 2181; leave it as it is
Here is how these properties are configured in my case -
dataDir=/tmp/zookeeper
# the port at which the clients will connect
clientPort=2181
Starting up Apache ZooKeeper
We can now start ZooKeeper by running the following command from the Kafka home directory -
./bin/zookeeper-server-start.sh -daemon config/zookeeper.properties
You can then verify that ZooKeeper has started: if it is running successfully, you will see a Java process called QuorumPeerMain.
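A convenient way to check is the jps tool that ships with the JDK, which lists running JVM processes. A run might look like this (the process ids will differ on your machine):

```shell
jps
# sample output - look for QuorumPeerMain
# 10956 QuorumPeerMain
# 11002 Jps
```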
Configuring Apache Kafka
We will be creating a cluster of two Kafka instances (brokers) on our machine, and hence will prepare two configuration files. We will mainly focus on the following properties -
- broker.id - the id of the broker, an integer. Each broker in a cluster needs a unique id.
- log.dirs - the directory where you want Kafka to commit its messages. Not to be confused with the usual log files.
- port - the port on which Kafka will accept connections from producers and consumers
- zookeeper.connect - a comma-separated list of ZooKeeper nodes, e.g. hostname1:port1,hostname2:port2. In our case, we will set it to localhost:2181
One properties file is already present in the config directory of our Kafka installation and can be edited from the Kafka home directory.
Here are the values of the relevant properties set in my first broker -
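For example, you could open it as follows (vi is just one choice of editor; use whichever you prefer):

```shell
# open the default broker configuration for editing
vi config/server.properties
```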
broker.id=0
# The port the socket server listens on
port=9092
# A comma separated list of directories under which to store log files
log.dirs=/tmp/kafka-logs-0
zookeeper.connect=localhost:2181
The next step is to create a new properties file by making a copy of the existing one as follows -
cp config/server.properties config/server-1.properties
We will now update server-1.properties as follows -
broker.id=1
# The port the socket server listens on
port=9093
# A comma separated list of directories under which to store log files
log.dirs=/tmp/kafka-logs-1
zookeeper.connect=localhost:2181
Starting up Apache Kafka Cluster
Finally, it is time to start our Apache Kafka brokers. With the two properties files configured, starting Apache Kafka is as simple as executing the following commands -
# start first broker
./bin/kafka-server-start.sh -daemon config/server.properties
# start second broker
./bin/kafka-server-start.sh -daemon config/server-1.properties
You can use the jps command to check whether the two Kafka processes are running -
jps
# sample output - notice the two instances of Kafka
13100 Kafka
13089 Kafka
13155 Jps
10956 QuorumPeerMain
Testing Apache Kafka Cluster
We have successfully started two Kafka brokers on our machine and checked that both processes are running. However, we still need to verify that the cluster is functioning properly. To do that, we will use the utilities shipped with Kafka to create a topic, send messages, and consume them.
We will start by creating a test-topic using the below command -
./bin/kafka-topics.sh --create --zookeeper localhost:2181 --topic test-topic --partitions 1 --replication-factor 1
# you will get below message if it is created successfully
Created topic "test-topic".
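Optionally, you can also inspect the topic with the same utility to see which broker was assigned the partition (the leader shown in the output depends on broker election, so it will vary between runs):

```shell
# describe the topic to see its partition, leader broker and replicas
./bin/kafka-topics.sh --describe --zookeeper localhost:2181 --topic test-topic
```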
We will now send a couple of sample messages to this newly created topic using the Kafka console producer utility. After executing the below command, type your message and press Enter to send it.
./bin/kafka-console-producer.sh --broker-list localhost:9092 --topic test-topic
# type your message and press enter
First test message
Second test message
The next step is to consume these messages using the Kafka console consumer utility with the below command. The messages we sent with the producer will be printed to the console -
./bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic test-topic --from-beginning
# below should be the output of this command
First test message
Second test message
We can hence conclude that our Apache Kafka cluster is ready for our applications.
Thank you for reading this tutorial. If you have any feedback, questions, or concerns, let us know in the comments and we will get back to you as soon as possible.