Install Hadoop 2.2.0 on Ubuntu Linux 13.04 (Single-Node Cluster)
This tutorial explains how to install Hadoop 2.2.0/2.3.0/2.4.0/2.4.1 on Ubuntu 13.04/13.10/14.04 (Single-Node Cluster). This setup does not require an additional user for Hadoop. All files related to Hadoop will be stored inside the `~/hadoop` directory.
-
Install a JRE. If you want the Oracle JRE, follow this post.
-
Install SSH:
sudo apt-get install openssh-server
Generate an SSH key:
ssh-keygen -t rsa -P ""
Enable this SSH key:
cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys
(Optional) Disable SSH login from remote addresses by setting, in /etc/ssh/sshd_config:
ListenAddress 127.0.0.1
Test local connection:
ssh localhost
If ok, then exit:
exit
Otherwise debug :)
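If you expect to re-run this setup, the key steps above can be made idempotent so an existing key pair is never overwritten and the public key is appended only once. A minimal sketch, where SSH_DIR is a hypothetical variable defaulting to ~/.ssh:

```shell
# Idempotent SSH key setup (sketch): generate a key only if none exists,
# and append it to authorized_keys only if it is not already there.
SSH_DIR="${SSH_DIR:-$HOME/.ssh}"
mkdir -p "$SSH_DIR" && chmod 700 "$SSH_DIR"
[ -f "$SSH_DIR/id_rsa" ] || ssh-keygen -q -t rsa -P "" -f "$SSH_DIR/id_rsa"
touch "$SSH_DIR/authorized_keys" && chmod 600 "$SSH_DIR/authorized_keys"
# -x matches the whole line, -F treats the key as a fixed string
grep -qxF "$(cat "$SSH_DIR/id_rsa.pub")" "$SSH_DIR/authorized_keys" \
  || cat "$SSH_DIR/id_rsa.pub" >> "$SSH_DIR/authorized_keys"
```

Running it a second time changes nothing, which is the point.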
-
Download Hadoop 2.2.0 (or newer releases)
-
Unpack, rename and move to the home directory:
tar xvf hadoop-2.2.0.tar.gz
mv hadoop-2.2.0 ~/hadoop
-
Create HDFS directory:
mkdir -p ~/hadoop/data/namenode
mkdir -p ~/hadoop/data/datanode
-
In file
~/hadoop/etc/hadoop/hadoop-env.sh
add (after the comment "The java implementation to use."):
export JAVA_HOME="$(dirname "$(readlink /etc/alternatives/java)")/../"
export HADOOP_COMMON_LIB_NATIVE_DIR="$HOME/hadoop/lib"
export HADOOP_OPTS="$HADOOP_OPTS -Djava.library.path=$HOME/hadoop/lib"
(Note: a tilde inside double quotes is not expanded by the shell, so these paths use $HOME rather than ~.)
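A shell quoting detail that matters for these exports: a tilde inside double quotes is passed through literally, so a quoted ~/hadoop/lib would reach Hadoop unexpanded, while $HOME expands normally. A quick demonstration:

```shell
# Inside double quotes the tilde stays literal...
echo "~/hadoop/lib"
# ...while $HOME is expanded to an absolute path.
echo "$HOME/hadoop/lib"
```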
-
In file
~/hadoop/etc/hadoop/core-site.xml
add (inside the <configuration> tag):
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://localhost:9000</value>
</property>
(fs.defaultFS is the current name of the deprecated fs.default.name property; the old name still works in Hadoop 2.x.)
-
In file
~/hadoop/etc/hadoop/hdfs-site.xml
add (inside the <configuration> tag):
<property>
  <name>dfs.replication</name>
  <value>1</value>
</property>
<property>
  <name>dfs.namenode.name.dir</name>
  <value>${user.home}/hadoop/data/namenode</value>
</property>
<property>
  <name>dfs.datanode.data.dir</name>
  <value>${user.home}/hadoop/data/datanode</value>
</property>
-
In file
~/hadoop/etc/hadoop/yarn-site.xml
add (inside the <configuration> tag):
<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle</value>
</property>
<property>
  <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
  <value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
-
Create file
~/hadoop/etc/hadoop/mapred-site.xml
from the bundled template:
cp ~/hadoop/etc/hadoop/mapred-site.xml.template ~/hadoop/etc/hadoop/mapred-site.xml
and add (inside the <configuration> tag):
<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
</property>
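If you prefer to script this edit instead of opening an editor, the whole file can be written non-interactively with a here-document. A sketch, where CONF is a hypothetical variable standing in for ~/hadoop/etc/hadoop (note it overwrites any existing mapred-site.xml):

```shell
# Write a complete mapred-site.xml in one step.
CONF="${CONF:-$HOME/hadoop/etc/hadoop}"
mkdir -p "$CONF"
# The quoted 'EOF' delimiter prevents the shell from expanding anything
# inside the document.
cat > "$CONF/mapred-site.xml" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
EOF
```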
-
Add Hadoop binaries to PATH:
echo 'export PATH=$PATH:~/hadoop/bin:~/hadoop/sbin' >> ~/.bashrc
source ~/.bashrc
(Single quotes keep $PATH literal in ~/.bashrc; with double quotes, the current value of PATH would be frozen into the file.)
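The quoting here is worth a second look: with double quotes, $PATH is expanded at the moment you run the echo, baking today's value into ~/.bashrc; with single quotes, the literal string $PATH is written and expansion is deferred until the file is sourced. A quick comparison:

```shell
# Double quotes expand $PATH immediately; single quotes keep it literal.
expanded="export PATH=$PATH:~/hadoop/bin"
literal='export PATH=$PATH:~/hadoop/bin'
echo "$literal"
```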
-
Format HDFS:
hdfs namenode -format
-
Start Hadoop:
start-dfs.sh && start-yarn.sh
If you get the warning:
WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
it is because you are running on a 64-bit platform while the bundled Hadoop native library is 32-bit. This is not a big issue. If you want to fix it (optional), check this.
-
Check status:
jps
Expected output (PIDs will differ!):
10969 DataNode
11745 NodeManager
11292 SecondaryNameNode
10708 NameNode
11483 ResourceManager
13096 Jps
N.B. The old JobTracker has been replaced by the ResourceManager.
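The same check can be scripted by scanning the jps output for each expected daemon. A sketch, shown here against a hypothetical sample matching the output above; in real use you would set sample="$(jps)":

```shell
# Hypothetical jps output (PIDs are illustrative).
sample='10969 DataNode
11745 NodeManager
11292 SecondaryNameNode
10708 NameNode
11483 ResourceManager
13096 Jps'
for d in NameNode DataNode SecondaryNameNode ResourceManager NodeManager; do
  # grep -w matches whole words, so "NameNode" does not accidentally
  # match inside "SecondaryNameNode".
  if printf '%s\n' "$sample" | grep -qw "$d"; then
    echo "$d: running"
  else
    echo "$d: NOT running"
  fi
done
```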
-
Access web interfaces:
- Cluster status:
http://localhost:8088
- HDFS status:
http://localhost:50070
- Secondary NameNode status:
http://localhost:50090
-
Test Hadoop:
hadoop jar ~/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.2.0-tests.jar TestDFSIO -write -nrFiles 20 -fileSize 10
Check the results and remove files:
hadoop jar ~/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.2.0-tests.jar TestDFSIO -clean
And:
hadoop jar ~/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar pi 2 5
-
Stop Hadoop:
stop-dfs.sh && stop-yarn.sh
Posted on November 23, 2013