    Setting up Apache ZooKeeper Cluster
    Published on: 13th April 2015

    This tutorial provides step-by-step instructions to configure and start up an Apache ZooKeeper 3.4.6 multi-node cluster (also known as an ensemble).

    Abstract

    Apache ZooKeeper, at its core, provides an API that lets you manage your application state in a highly concurrent, distributed and read-dominant environment. It is optimized for, and performs well in, scenarios where read operations greatly outnumber write operations.

    This article assumes that you already have a basic understanding of the technical architecture and components of Apache ZooKeeper. If you are new to Apache ZooKeeper, we strongly recommend reading the article Introduction to Apache ZooKeeper first.

    Pre-requisites

    The first thing we need in order to install an Apache ZooKeeper cluster is multiple machines. In this tutorial, we will be using the following virtual machines:

    Parameter Name      Virtual Machine 1        Virtual Machine 2
    Name                VM1                      VM2
    IP Address          192.168.111.130          192.168.111.132
    Operating System    Ubuntu 14.04.1 (64-bit)  Ubuntu 14.04.1 (64-bit)
    CPU Cores           4                        4
    RAM                 6 GB                     6 GB

    Apart from the above machines, please ensure that the following prerequisites are met so that you can follow this article without any issues:

    1. JDK 6 or higher installed on all the virtual machines
    2. JAVA_HOME variable set to the path where JDK is installed
    3. Root access on all the virtual machines, as all the steps should ideally be performed by the root user
    4. An updated /etc/hosts file on both virtual machines containing the IP address and hostname of the other virtual machine. E.g. /etc/hosts on VM1 will need to have the IP address of VM2 along with its hostname (VM2). In my case, the additional line in the VM1 hosts file looks like 192.168.111.132 VM2.
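    The prerequisites above can be checked on each virtual machine with a short shell sketch such as the one below. The helpers check_java_home and add_host_entry are hypothetical names used only for illustration, and the hosts file path is passed as a parameter so that the logic can be tried on a scratch copy before it touches /etc/hosts.

```shell
#!/usr/bin/env bash
# Sketch of prerequisite checks; the helper names are hypothetical.

# Succeeds if JAVA_HOME is set and contains an executable bin/java.
check_java_home() {
  if [ -z "${JAVA_HOME:-}" ]; then
    echo "JAVA_HOME is not set" >&2
    return 1
  fi
  if [ ! -x "$JAVA_HOME/bin/java" ]; then
    echo "No executable java found under $JAVA_HOME/bin" >&2
    return 1
  fi
}

# Appends "IP HOSTNAME" to the given hosts file unless a line starting
# with that IP already lists that hostname (safe to run repeatedly).
add_host_entry() {
  local hosts_file=$1 ip=$2 name=$3
  if awk -v ip="$ip" -v name="$name" '
      $1 == ip { for (i = 2; i <= NF; i++) if ($i == name) found = 1 }
      END { exit !found }' "$hosts_file" 2>/dev/null; then
    return 0
  fi
  printf '%s %s\n' "$ip" "$name" >> "$hosts_file"
}
```

    For example, running add_host_entry /etc/hosts 192.168.111.132 VM2 as root on VM1 would append the line described in item 4, and running it a second time leaves the file unchanged.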

    Installing Apache ZooKeeper

    The first step is to download the Apache ZooKeeper binaries on both virtual machines. In this article, we will be installing Apache ZooKeeper 3.4.6 to set up the cluster; it can be downloaded from the official Apache ZooKeeper download page.

    Once the binaries have been downloaded on the virtual machines, extract the archive to the directory where you would like ZooKeeper to be installed. We will refer to this directory as $ZooKeeper_Base_Dir throughout this tutorial.
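    A minimal sketch of the extract step follows. It assumes the downloaded archive is a gzipped tarball such as zookeeper-3.4.6.tar.gz and that /opt is the install root; both are illustrative choices, and extract_zookeeper is a hypothetical helper that prints the directory to use as $ZooKeeper_Base_Dir.

```shell
#!/usr/bin/env bash
# Sketch: unpack a ZooKeeper release archive and print its base directory.
# Assumption (illustrative): a gzipped tarball whose entries all sit under
# one top-level folder, e.g. zookeeper-3.4.6/.

extract_zookeeper() {
  local tarball=$1 install_root=$2 top_dir
  mkdir -p "$install_root" || return 1
  # The first archive entry names the top-level folder (e.g. zookeeper-3.4.6/).
  top_dir=$(tar -tzf "$tarball" | head -n 1 | cut -d/ -f1)
  [ -n "$top_dir" ] || return 1
  tar -xzf "$tarball" -C "$install_root" || return 1
  # Print the path to use as $ZooKeeper_Base_Dir.
  printf '%s/%s\n' "$install_root" "$top_dir"
}
```

    Assuming the release is still available from the Apache archive at https://archive.apache.org/dist/zookeeper/zookeeper-3.4.6/zookeeper-3.4.6.tar.gz, you could download it with wget and then run export ZooKeeper_Base_Dir=$(extract_zookeeper zookeeper-3.4.6.tar.gz /opt).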

    Configuring Multi-node Cluster (Ensemble)

    Once Apache ZooKeeper has been extracted on all the virtual machines, the next step is to configure it. The diagram below depicts the deployment architecture that we will be setting up:

    We don't need to designate any node as the leader during configuration, as the leader is elected automatically by the ZooKeeper service. Therefore, the configuration file will be the same on all the nodes. The first part of the configuration involves creating or updating a file called zoo.cfg in the $ZooKeeper_Base_Dir/conf directory with the following contents:

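    ZooKeeper Configuration - $ZooKeeper_Base_Dir/conf/zoo.cfg
    # Basic time unit in milliseconds; initLimit and syncLimit below are expressed in ticks
    tickTime=2000

    # Replace the value of dataDir with the directory where you would like ZooKeeper to save its data (snapshots and the myid file)
    dataDir=<$ZooKeeper_Base_Dir/data>

    # Replace the value of dataLogDir with the directory where you would like ZooKeeper to write its transaction logs
    dataLogDir=<$ZooKeeper_Base_Dir/logs>

    # Port on which clients connect to this server
    clientPort=2181
    # Ticks a follower may take to connect and sync to the leader (10 x 2000 ms = 20 s)
    initLimit=10
    # Ticks a follower may lag behind the leader before it is dropped (5 x 2000 ms = 10 s)
    syncLimit=5
    # One line per ensemble member: server.<id>=<host>:<follower port>:<leader election port>
    server.1=192.168.111.130:2888:3888
    server.2=192.168.111.132:2888:3888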

    The first thing you need to do in the above zoo.cfg file is replace the values of dataDir and dataLogDir with the directories where you would like ZooKeeper to save its data and its transaction logs, respectively. Now, let's discuss some of the important parts of the above configuration.
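    One way to make this replacement repeatable on both virtual machines is to generate zoo.cfg from a script. The sketch below is hypothetical: write_zoo_cfg is not part of ZooKeeper, and it assumes you want the data and transaction-log directories under the base directory, mirroring the placeholders above. It overwrites any existing conf/zoo.cfg.

```shell
#!/usr/bin/env bash
# Sketch: write the zoo.cfg above for one node, with dataDir and dataLogDir
# pointing at real directories under the ZooKeeper base directory.
# Assumption: <base>/data and <base>/logs are acceptable locations.

write_zoo_cfg() {
  local base_dir
  mkdir -p "$1/conf" "$1/data" "$1/logs" || return 1
  # Resolve to an absolute path so zoo.cfg contains no relative paths.
  base_dir=$(cd "$1" && pwd) || return 1
  # The heredoc expands $base_dir now, so the file holds literal paths.
  cat > "$base_dir/conf/zoo.cfg" <<EOF
tickTime=2000
dataDir=$base_dir/data
dataLogDir=$base_dir/logs
clientPort=2181
initLimit=10
syncLimit=5
server.1=192.168.111.130:2888:3888
server.2=192.168.111.132:2888:3888
EOF
}
```

    Running write_zoo_cfg "$ZooKeeper_Base_Dir" on each virtual machine produces the same file on both nodes, since no node needs leader-specific settings.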

    The clientPort property, as the name suggests, is the port on which clients connect to the ZooKeeper service.
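    Clients are usually given the address of every ensemble member rather than a single server, so that they can reconnect to another server if one goes down. As an illustration, the hypothetical helper zk_connect_string below builds such a host:port list from the server.x entries and clientPort in zoo.cfg; it assumes every server uses the same clientPort and that keys and values are written without spaces around "=".

```shell
#!/usr/bin/env bash
# Sketch: build a client connection string from a zoo.cfg file.
# Assumptions: one shared clientPort and key=value lines without spaces.

# Prints host1:clientPort,host2:clientPort,... in the order the
# server.N lines appear in the file.
zk_connect_string() {
  awk -F'=' '
    $1 == "clientPort" { port = $2 }
    $1 ~ /^server\.[0-9]+$/ { split($2, parts, ":"); hosts[++n] = parts[1] }
    END {
      sep = ""
      for (i = 1; i <= n; i++) { printf "%s%s:%s", sep, hosts[i], port; sep = "," }
      print ""
    }' "$1"
}
```

    From $ZooKeeper_Base_Dir/bin, the result can be passed to the bundled command-line client, e.g. ./zkCli.sh -server "$(zk_connect_string ../conf/zoo.cfg)", which for the configuration above expands to 192.168.111.130:2181,192.168.111.132:2181.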

    Next, let's talk about the last two entries, which follow the server.x=hostname:nnnn:mmmm format. Firstly, there are two port numbers, nnnn (2888) and mmmm (3888). Followers use the first port to connect to the leader, and the second port is used for leader election. Secondly, x in server.x denotes the id of the node, and each server.x line must have a unique id. Each server is assigned its id by creating a file named myid, one for each server, which resides in that server's data directory, as specified by the configuration parameter dataDir.

    The myid file consists of a single line containing only the text of that machine's id. So myid of server 1 would contain the text 1 and nothing else. The id must be unique within the ensemble and should have a value between 1 and 255.
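    Creating myid by hand amounts to writing 1 into <dataDir>/myid on VM1 and 2 into <dataDir>/myid on VM2. The sketch below automates that by looking up the machine's IP address in the server.x lines; server_id_for_ip and write_myid are hypothetical helpers, and the sketch assumes the server.x entries use IP addresses (as above) and that the dataDir placeholder has already been replaced with a real path.

```shell
#!/usr/bin/env bash
# Sketch: derive this node's id from zoo.cfg and write dataDir/myid.
# Assumptions: server.x hosts are IP addresses and dataDir holds a real path.

# Prints N for the server.N line whose host equals the given IP;
# returns non-zero if there is no such line.
server_id_for_ip() {
  awk -F'=' -v ip="$2" '
    $1 ~ /^server\.[0-9]+$/ {
      split($2, parts, ":")
      if (parts[1] == ip) { sub(/^server\./, "", $1); print $1; found = 1; exit }
    }
    END { exit !found }' "$1"
}

# Writes the id found for the given IP into <dataDir>/myid.
write_myid() {
  local cfg=$1 ip=$2 id data_dir
  id=$(server_id_for_ip "$cfg" "$ip") || { echo "No server.x entry for $ip" >&2; return 1; }
  data_dir=$(awk -F'=' '$1 == "dataDir" { print $2 }' "$cfg")
  [ -n "$data_dir" ] || { echo "dataDir is not set in $cfg" >&2; return 1; }
  mkdir -p "$data_dir" || return 1
  echo "$id" > "$data_dir/myid"
}
```

    On VM1, write_myid "$ZooKeeper_Base_Dir/conf/zoo.cfg" 192.168.111.130 would write 1 to the myid file in dataDir.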

    Starting Up Multi-node Cluster (Ensemble)

    Once everything is set up, the next step is to start the cluster. On each virtual machine, go to the bin directory of Apache ZooKeeper and execute the following command:

    $ZooKeeper_Base_Dir/bin on all machines
    ./zkServer.sh start
    
    

    You can execute the following command to check the status of Apache ZooKeeper:

    $ZooKeeper_Base_Dir/bin on all machines
    ./zkServer.sh status
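    When the ensemble is healthy, the status output of each server includes a line such as Mode: leader or Mode: follower, and exactly one server should report leader. The small filter below extracts that role; zk_mode is a hypothetical helper, and it assumes the "Mode:" line format printed by zkServer.sh in the 3.4.x releases.

```shell
#!/usr/bin/env bash
# Sketch: extract the server role from `zkServer.sh status` output.
# Assumption: the output contains a line such as "Mode: follower".

# Reads status output on stdin and prints the role (e.g. leader,
# follower or standalone); returns non-zero if no Mode line is found.
zk_mode() {
  awk -F': *' '$1 == "Mode" { print $2; found = 1 } END { exit !found }'
}
```

    Run ./zkServer.sh status | zk_mode on each virtual machine; it exits with a non-zero status when no Mode line is present, for example when the server is not running.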
    
    
    Stopping Multi-node Cluster (Ensemble)

    To stop Apache ZooKeeper, execute the following command on all the virtual machines:

    $ZooKeeper_Base_Dir/bin on all machines
    ./zkServer.sh stop
    
    

    Thank you for reading through the tutorial. If you have any feedback, questions or concerns, please share them with us in the comments and we shall get back to you as soon as possible.
