Hadoop installation

  • October 14, 2018

Hadoop: Setting up a Single Node Cluster.

This document describes how to set up and configure a single-node Hadoop installation so that you can quickly perform simple operations using Hadoop MapReduce and the Hadoop Distributed File System (HDFS).

Supported Platforms

  • Linux is supported as a development and production platform. Hadoop has been demonstrated on GNU/Linux clusters with 2000 nodes.
  • Windows is also a supported platform, but the following steps are for Linux only. To set up Hadoop on Windows, see the official Hadoop wiki page.


  • Java must be installed. Apache Hadoop versions 2.7 and later require Java 7; Hadoop is built and tested on both OpenJDK and Oracle (HotSpot) JDK/JRE. Earlier versions (2.6 and earlier) support Java 6. Recommended Java versions are described in the official documentation.
$ sudo add-apt-repository ppa:openjdk-r/ppa  
$ sudo apt-get update   
$ sudo apt-get install openjdk-7-jdk  
  • ssh and rsync must be installed, and sshd must be running, to use the Hadoop scripts that manage remote Hadoop daemons.
$ sudo apt-get install ssh
$ sudo apt-get install rsync

Hadoop Installation & Configuration

I recommend creating a dedicated non-root account for Hadoop work, such as a hadoop user in a hadoop group. To create the account, use the following commands.

$ sudo groupadd hadoop
$ sudo useradd -m -g hadoop -s /bin/bash hadoop
$ sudo passwd hadoop

After creating the account, it is also required to set up key-based SSH to the account itself. To do this, execute the following commands.

$ su - hadoop
$ ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
$ chmod 0600 ~/.ssh/authorized_keys

Let's verify key-based login. The command below should not ask for a password, but the first time it will prompt to add the host's RSA key to the list of known hosts.

$ ssh localhost
$ exit

Hadoop Distribution

In this step, download the latest Hadoop release archive using the commands below. You can also select an alternate download mirror to increase download speed.

$ cd ~
$ wget http://mirrors.gigenet.com/apache/hadoop/common/hadoop-3.1.1/hadoop-3.1.1.tar.gz
$ tar xzf hadoop-3.1.1.tar.gz
$ mv hadoop-3.1.1 hadoop


Then point HADOOP_HOME at the extracted directory:

$ export HADOOP_HOME=/home/hadoop/hadoop

Environmental Variables & Configuration

To run the Hadoop distribution you have to define the JAVA_HOME variable (adjust the path to match the JDK installed on your system):

$ export JAVA_HOME=/usr/lib/jvm/java-8-oracle

Then edit the file $HADOOP_HOME/etc/hadoop/hadoop-env.sh to define some parameters:

$ vi $HADOOP_HOME/etc/hadoop/hadoop-env.sh

and add the following lines to hadoop-env.sh:

export JAVA_HOME=/usr/lib/jvm/java-8-oracle
export HADOOP_HOME=/home/hadoop/hadoop
export HADOOP_OPTS="$HADOOP_OPTS -Djava.library.path=$HADOOP_HOME/lib/native"
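The exports above apply only to the current shell session (and, via hadoop-env.sh, to Hadoop's own scripts). Optionally, to make the variables available in every login session of the hadoop user, you can append them to ~/.bashrc — a minimal sketch, assuming the same paths used above:

```shell
# Persist the Hadoop environment variables for future login sessions.
# The JAVA_HOME and HADOOP_HOME paths are the ones assumed in this guide;
# adjust them to match your system.
cat >> ~/.bashrc <<'EOF'
export JAVA_HOME=/usr/lib/jvm/java-8-oracle
export HADOOP_HOME=/home/hadoop/hadoop
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
EOF
```

Adding $HADOOP_HOME/bin and $HADOOP_HOME/sbin to PATH also lets you run commands such as hdfs and start-dfs.sh without changing into those directories first.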

Configuration Files

Hadoop has many configuration files, which need to be configured according to the requirements of your Hadoop infrastructure. Let's start with the configuration for a basic Hadoop single-node cluster setup. First, navigate to the location below:

$ cd $HADOOP_HOME/etc/hadoop

Edit core-site.xml
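The original configuration snippet is missing here. For a single-node (pseudo-distributed) setup, the official guide uses a minimal core-site.xml along these lines:

```xml
<configuration>
  <property>
    <!-- Default filesystem URI; the NameNode listens here -->
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
```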


Edit hdfs-site.xml
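The original snippet is missing here as well. With only one node there is nowhere to replicate blocks to, so a minimal hdfs-site.xml sets the replication factor to 1:

```xml
<configuration>
  <property>
    <!-- Single node, so keep only one replica of each block -->
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
```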


Edit yarn-site.xml
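The snippet is missing here too. A minimal yarn-site.xml for a single-node setup enables the shuffle auxiliary service that MapReduce jobs depend on:

```xml
<configuration>
  <property>
    <!-- Enable the shuffle service MapReduce jobs rely on -->
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>
```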


Edit mapred-site.xml
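No snippet survives here either; for a single-node setup, mapred-site.xml typically tells MapReduce to run on YARN:

```xml
<configuration>
  <property>
    <!-- Run MapReduce jobs on the YARN resource manager -->
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
```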


Format Namenode

First, from $HADOOP_HOME/bin/, format the NameNode using the following command. Check the output to make sure that the storage directory has been successfully formatted.

$ hdfs namenode -format

Start Hadoop Cluster

Let's start your Hadoop cluster using the scripts provided by Hadoop. Just navigate to your $HADOOP_HOME/sbin directory and execute the scripts one by one.

$ cd $HADOOP_HOME/sbin/

Now run start-dfs.sh script.

$ ./start-dfs.sh

Sample output:

Starting namenodes on [localhost]
Starting datanodes
Starting secondary namenodes [smart]
2018-05-02 18:00:32,565 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

Now run start-yarn.sh script.

$ ./start-yarn.sh

Sample output:

Starting resourcemanager
Starting nodemanagers

Access Hadoop Services in Browser

The Hadoop NameNode web interface starts on port 9870 by default. Access your server on port 9870 in your favorite web browser.


Now access port 8042 (the NodeManager web interface) for information about the node and its running applications.


Access port 9864 to get details about your Hadoop DataNode.


Test Hadoop Single Node Setup

Create the required HDFS directories using the following commands (run from $HADOOP_HOME):

$ bin/hdfs dfs -mkdir /user
$ bin/hdfs dfs -mkdir /user/hadoop

Copy all files from the local file system directory /var/log/apache2 to the Hadoop distributed file system using the command below:

$ bin/hdfs dfs -put /var/log/apache2 logs

Browse the Hadoop distributed file system using the file browser in the NameNode web interface. You will see a logs folder listed under /user/hadoop. Click on the folder name to open it and you will find all the log files there.


Now copy the logs directory from the Hadoop distributed file system back to the local file system:

$ bin/hdfs dfs -get logs /tmp/logs
$ ls -l /tmp/logs/
