Install Hadoop 2.2.0 on Ubuntu Linux 13.04 (Single-Node Cluster)
This tutorial explains how to install Hadoop 2.2.0/2.3.0/2.4.0/2.4.1 on Ubuntu 13.04/13.10/14.04 (Single-Node Cluster). This setup does not require an additional user for Hadoop. All files related to Hadoop will be stored inside the `~/hadoop` directory.
- Install a JRE. If you want the Oracle JRE, follow this post.
- Install SSH:

  ```
  sudo apt-get install openssh-server
  ```

  Generate an SSH key:

  ```
  ssh-keygen -t rsa -P ""
  ```

  Enable this SSH key:

  ```
  cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys
  ```

  (Optional) Disable SSH login from remote addresses by setting the following in /etc/ssh/sshd_config:

  ```
  ListenAddress 127.0.0.1
  ```
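  A change to sshd_config only takes effect once the SSH daemon rereads it. On the Ubuntu releases covered here the service is named ssh (worth double-checking on your system):

  ```
  sudo service ssh restart
  ```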
  Test the local connection:

  ```
  ssh localhost
  ```

  If it works, exit:

  ```
  exit
  ```

  Otherwise, debug :)
- Download Hadoop 2.2.0 (or newer releases).
- Unpack, rename, and move to your home directory:

  ```
  tar xvf hadoop-2.2.0.tar.gz
  mv hadoop-2.2.0 ~/hadoop
  ```
- Create the HDFS directories:

  ```
  mkdir -p ~/hadoop/data/namenode
  mkdir -p ~/hadoop/data/datanode
  ```
- In the file ~/hadoop/etc/hadoop/hadoop-env.sh, add (after the comment "The java implementation to use."):

  ```
  export JAVA_HOME="`dirname $(readlink /etc/alternatives/java)`/../"
  export HADOOP_COMMON_LIB_NATIVE_DIR="$HOME/hadoop/lib"
  export HADOOP_OPTS="$HADOOP_OPTS -Djava.library.path=$HOME/hadoop/lib"
  ```

  $HOME is used here instead of ~ because the tilde is not expanded inside double quotes.
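  A quick sanity check (not part of the setup itself) that the backtick expression actually resolves to a JVM directory on your machine:

  ```
  readlink /etc/alternatives/java
  echo "`dirname $(readlink /etc/alternatives/java)`/../"
  ```

  The echoed path should sit inside a JVM installation such as /usr/lib/jvm/....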
- In the file ~/hadoop/etc/hadoop/core-site.xml, add (inside the <configuration> tag):

  ```
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
  ```
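  Once the Hadoop binaries are on your PATH (a later step), you can confirm that Hadoop picked up the value with hdfs getconf, e.g.:

  ```
  hdfs getconf -confKey fs.default.name
  # expected: hdfs://localhost:9000
  ```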
- In the file ~/hadoop/etc/hadoop/hdfs-site.xml, add (inside the <configuration> tag):

  ```
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>${user.home}/hadoop/data/namenode</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>${user.home}/hadoop/data/datanode</value>
  </property>
  ```

  The replication factor is 1 because this is a single-node cluster; ${user.home} is expanded by Hadoop from the corresponding Java system property, so these values point at the directories created earlier.
- In the file ~/hadoop/etc/hadoop/yarn-site.xml, add (inside the <configuration> tag):

  ```
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
  ```
- Create the file ~/hadoop/etc/hadoop/mapred-site.xml from the bundled template:

  ```
  cp ~/hadoop/etc/hadoop/mapred-site.xml.template ~/hadoop/etc/hadoop/mapred-site.xml
  ```

  And add (inside the <configuration> tag):

  ```
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  ```
- Add the Hadoop binaries to your PATH:

  ```
  echo 'export PATH=$PATH:$HOME/hadoop/bin:$HOME/hadoop/sbin' >> ~/.bashrc
  source ~/.bashrc
  ```

  Single quotes keep $PATH from being expanded when the line is written, so your current PATH is not hardcoded into ~/.bashrc.
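  A quick check that the shell now finds the tools:

  ```
  which hdfs
  hadoop version
  ```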
- Format HDFS:

  ```
  hdfs namenode -format
  ```
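  If formatting succeeded, the NameNode directory configured above should now contain metadata (the exact layout may vary slightly across Hadoop versions):

  ```
  ls ~/hadoop/data/namenode/current
  # expect files such as VERSION and an fsimage
  ```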
- Start Hadoop:

  ```
  start-dfs.sh && start-yarn.sh
  ```

  If you get the warning:

  ```
  WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
  ```

  it is because you are running on a 64-bit system while the bundled Hadoop native library is 32-bit. This is not a big issue. If you want to fix it (optional), check this.
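  After the daemons come up, you can check that HDFS has left safe mode (this may take a few seconds):

  ```
  hdfs dfsadmin -safemode get
  # expected: Safe mode is OFF
  ```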
- Check the status:

  ```
  jps
  ```

  Expected output (PIDs may change!):

  ```
  10969 DataNode
  11745 NodeManager
  11292 SecondaryNameNode
  10708 NameNode
  11483 ResourceManager
  13096 Jps
  ```

  N.B. The old JobTracker has been replaced by the ResourceManager.
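  If one of the daemons is missing, its log is the place to look. Hadoop writes logs under $HADOOP_HOME/logs by default; the file name below follows the usual hadoop-<user>-<daemon>-<host>.log pattern, so adjust it for your user, daemon, and hostname:

  ```
  ls ~/hadoop/logs
  tail -n 50 ~/hadoop/logs/hadoop-$USER-namenode-$(hostname).log
  ```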
- Access the web interfaces:
  - Cluster status: http://localhost:8088
  - HDFS status: http://localhost:50070
  - Secondary NameNode status: http://localhost:50090
- Test Hadoop:

  ```
  hadoop jar ~/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.2.0-tests.jar TestDFSIO -write -nrFiles 20 -fileSize 10
  ```

  Check the results, then remove the test files:

  ```
  hadoop jar ~/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.2.0-tests.jar TestDFSIO -clean
  ```

  And run the pi example:

  ```
  hadoop jar ~/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar pi 2 5
  ```
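  For an even simpler smoke test of HDFS itself (the file used here is arbitrary; any small local file will do):

  ```
  hdfs dfs -mkdir -p /user/$USER
  hdfs dfs -put ~/hadoop/etc/hadoop/core-site.xml /user/$USER/
  hdfs dfs -ls /user/$USER
  hdfs dfs -rm /user/$USER/core-site.xml
  ```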
- Stop Hadoop:

  ```
  stop-dfs.sh && stop-yarn.sh
  ```
Posted on November 23, 2013