    Setting up Pseudo Distributed Kafka Cluster
    Published on: 14th September 2016
    Posted By: Amit Kumar

    This tutorial provides instructions for setting up a pseudo-distributed, multi-broker Apache Kafka cluster.

    Abstract


    Apache Kafka is an open-source, distributed, high-throughput publish-subscribe messaging system. It is often used in real-time stream processing systems. Apache Kafka can be deployed in either of the following two modes -

    1. Pseudo-distributed multi-broker cluster - All Kafka brokers of the cluster are deployed on a single machine.
    2. Fully distributed multi-broker cluster - Each Kafka broker of the cluster is deployed on a separate machine.

    This tutorial covers setting up a pseudo-distributed multi-broker cluster.

     

    Pre-requisites


    Here are the software and hardware requirements for following the instructions in this tutorial -

    1. Physical or virtual machine, ideally with 4 GB RAM, 2 CPU cores and 20 GB disk space
    2. Linux operating system, as Apache Kafka does not officially support Windows yet
    3. JDK 8 with JAVA_HOME pointing to it
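
    The third requirement can be checked from the shell before going further. The following is a minimal sketch; check_java_home is a hypothetical helper name used only for illustration, and it only confirms that a java executable exists under JAVA_HOME, not that it is JDK 8.

```shell
# Sketch: verify that JAVA_HOME points at a usable Java installation.
# check_java_home is a hypothetical helper name, not part of Kafka.
check_java_home() {
  if [ -z "${JAVA_HOME:-}" ]; then
    echo "JAVA_HOME is not set" >&2
    return 1
  fi
  if [ ! -x "$JAVA_HOME/bin/java" ]; then
    echo "No java executable under $JAVA_HOME/bin" >&2
    return 1
  fi
  echo "Using Java from $JAVA_HOME"
}
```

    For example, running check_java_home && "$JAVA_HOME/bin/java" -version prints the Java version only when the variable is set correctly.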

     

    Installing Apache Kafka


    Installing Apache Kafka is as simple as downloading its binaries and extracting them to your file system.

    You can download the latest version of Apache Kafka from the official website. You will see multiple binary downloads for different Scala versions (2.10 and 2.11). If you are going to use the Scala APIs, download the one that matches your Scala version. For the Java APIs, either download will do.

    At the time of writing, the latest version is 0.10.0.1, so we will install this version.

    Once you have downloaded the binary archive, extract it to the directory you would like to run Kafka from. In my case, I extracted it to /opt/big-data/kafka/kafka_2.11-0.10.0.1.
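
    The extraction step can be sketched as a small shell function. This is only an illustration: extract_kafka is a hypothetical helper name, and it assumes the download is a gzip-compressed tar archive.

```shell
# extract_kafka ARCHIVE TARGET_DIR: unpack a downloaded Kafka archive.
# Hypothetical helper; assumes ARCHIVE is a gzip-compressed tar file.
extract_kafka() (
  archive="$1"
  target="$2"
  if [ ! -f "$archive" ]; then
    echo "archive not found: $archive" >&2
    exit 1
  fi
  # Create the target directory if needed, then unpack into it.
  mkdir -p "$target" && tar -xzf "$archive" -C "$target"
)
```

    With the paths used in this tutorial, the call would look like extract_kafka kafka_2.11-0.10.0.1.tgz /opt/big-data/kafka, where the archive file name is an assumption based on the version above.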

     

    Configuring Apache ZooKeeper


    We first need to check the configuration of Apache ZooKeeper. If you are using a separate ZooKeeper cluster, skip this step and the next one, which starts ZooKeeper.

    You can check the ZooKeeper configuration by running the following command from the Kafka home directory (for me, /opt/big-data/kafka/kafka_2.11-0.10.0.1) -

    vi config/zookeeper.properties

    Then check the following properties -

    1. dataDir - The directory where ZooKeeper saves its data
    2. clientPort - Defaults to 2181. Leave it as it is.

    Here is how these properties are configured in my case -

    dataDir=/tmp/zookeeper
    # the port at which the clients will connect
    clientPort=2181
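
    If you would rather read these values from a script than open the file in an editor, something like the following could work. It is a rough sketch: get_prop is a hypothetical helper name, and it only handles simple key=value lines.

```shell
# get_prop FILE KEY: print the value of KEY from a Java-style properties file.
# Hypothetical helper; handles plain key=value lines and skips comment lines.
get_prop() (
  file="$1"
  key="$2"
  # Drop comments, find the line whose key matches exactly,
  # and print everything after the first "=".
  grep -v '^[[:space:]]*#' "$file" |
    awk -F= -v k="$key" '$1 == k { sub(/^[^=]*=/, ""); print; exit }'
)
```

    For example, get_prop config/zookeeper.properties clientPort prints the configured client port, and prints nothing if the property is absent.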

     

    Starting up Apache ZooKeeper


    We can now start ZooKeeper by running the following command from the Kafka home directory -

    ./bin/zookeeper-server-start.sh -daemon config/zookeeper.properties

    You can then use the command below to verify that it has started -

    jps

    You will see a process called QuorumPeerMain if ZooKeeper has started successfully - 

    10956 QuorumPeerMain

     

    Configuring Apache Kafka


    We will create a cluster of two Kafka instances (brokers) on our machine and will therefore prepare two configuration files. We will mainly focus on the following properties -

    1. broker.id - The ID of the broker, an integer. Each broker in a cluster must have a unique ID.
    2. log.dirs - The directory where Kafka stores its messages (the commit log). Not to be confused with ordinary application log files.
    3. port - The port on which Kafka accepts connections from producers and consumers
    4. zookeeper.connect - Comma-separated list of ZooKeeper nodes, e.g. hostname1:port1,hostname2:port2. In our case, we will set it to localhost:2181

    One properties file is already present in the config directory of our Kafka installation and can be edited using the command below from the Kafka home directory -

    vi config/server.properties

    Here are the values of the relevant properties for my first broker -

    broker.id=0
    # The port the socket server listens on
    port=9092
    # A comma seperated list of directories under which to store log files
    log.dirs=/tmp/kafka-logs-0
    zookeeper.connect=localhost:2181

     

    The next step is to create a new properties file by copying the existing one as follows -

    cp config/server.properties config/server-1.properties

    We will now edit server-1.properties and set its properties as follows -

    broker.id=1
    # The port the socket server listens on
    port=9093
    # A comma seperated list of directories under which to store log files
    log.dirs=/tmp/kafka-logs-1
    zookeeper.connect=localhost:2181
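
    The copy-and-edit step above can also be scripted, which helps if you later want a third or fourth broker. The following is a sketch under a few assumptions: set_prop and make_broker_config are hypothetical helper names, the log directory follows the /tmp/kafka-logs-<id> pattern used in this tutorial, and values must not contain the characters | or &.

```shell
# set_prop FILE KEY VALUE: replace KEY's value, or append KEY=VALUE if absent.
# Hypothetical helper; the "." in a key is treated loosely as a regex wildcard.
set_prop() (
  file="$1"
  key="$2"
  value="$3"
  if grep -q "^${key}=" "$file"; then
    # Rewrite through a temp file, since "sed -i" differs across platforms.
    sed "s|^${key}=.*|${key}=${value}|" "$file" > "$file.tmp" &&
      mv "$file.tmp" "$file"
  else
    echo "${key}=${value}" >> "$file"
  fi
)

# make_broker_config BASE OUT ID PORT: derive a broker config from BASE.
make_broker_config() (
  base="$1"
  out="$2"
  id="$3"
  port="$4"
  cp "$base" "$out" || exit 1
  set_prop "$out" broker.id "$id"
  set_prop "$out" port "$port"
  set_prop "$out" log.dirs "/tmp/kafka-logs-$id"
)
```

    For the second broker in this tutorial, the call would be make_broker_config config/server.properties config/server-1.properties 1 9093. The zookeeper.connect value is copied unchanged from the base file.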

     

    Starting up Apache Kafka Cluster


    Finally, it is time to start our Apache Kafka brokers. Now that both properties files are configured, starting Apache Kafka is as simple as running the following commands -

    # start first broker
    ./bin/kafka-server-start.sh -daemon config/server.properties
    
    # start second broker
    ./bin/kafka-server-start.sh -daemon config/server-1.properties
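
    With more than two brokers, the same command can be wrapped in a loop. The sketch below assumes it is run from the Kafka home directory; start_brokers and the DRY_RUN variable are hypothetical names introduced here for illustration, not Kafka features.

```shell
# start_brokers CONFIG...: start one Kafka broker per configuration file.
# Hypothetical helper; set DRY_RUN=1 to print the commands instead of running them.
start_brokers() {
  for cfg in "$@"; do
    if [ "${DRY_RUN:-0}" = "1" ]; then
      echo "./bin/kafka-server-start.sh -daemon $cfg"
    else
      ./bin/kafka-server-start.sh -daemon "$cfg"
    fi
  done
}
```

    For the two brokers configured above, the call would be start_brokers config/server.properties config/server-1.properties.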

    You can use the jps command to check that two Kafka processes are running, as follows -

    jps
    
    # sample output - notice, two instances of Kafka
    13100 Kafka
    13089 Kafka
    13155 Jps
    10956 QuorumPeerMain
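
    The same check can be automated by counting matching lines in the jps output. This is a small sketch; count_jps_procs is a hypothetical helper name, and it assumes the two-column "pid name" output shown above.

```shell
# count_jps_procs NAME: read jps output on stdin and count the processes
# whose name (second column) is exactly NAME. Hypothetical helper.
count_jps_procs() {
  awk -v name="$1" '$2 == name { n++ } END { print n + 0 }'
}
```

    Running jps | count_jps_procs Kafka should print 2 once both brokers are up. The same function can be used with QuorumPeerMain to check ZooKeeper.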

     

    Testing Apache Kafka Cluster


    We have successfully started two Kafka brokers on our machine and checked that two processes are running. However, we still need to check that the cluster is functioning properly. To do that, we will use the utilities that ship with Kafka to create a topic, send messages and consume them.

    We will start by creating a topic named test-topic using the command below -

    ./bin/kafka-topics.sh --create --zookeeper localhost:2181 --topic test-topic --partitions 1 --replication-factor 1
    
    # you will get below message if it is created successfully
    Created topic "test-topic".

    We will now send a couple of sample messages to this newly created topic using the Kafka console producer utility. After running the command below, type each message and press Enter to send it.

    ./bin/kafka-console-producer.sh --broker-list localhost:9092 --topic test-topic
    
    # type your message and press enter
    First test message
    Second test message

    The next step is to consume these messages with the Kafka console consumer utility using the command below. The messages that we sent with the producer will be printed on the console once the command runs successfully -

    ./bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic test-topic --from-beginning
    
    # below should be output of this command
    First test message
    Second test message

    We can therefore conclude that our Apache Kafka cluster is ready for our applications.

     

    Thank you for reading through the tutorial. If you have any feedback, questions or concerns, please let us know through the comments and we will get back to you as soon as possible.
