[Linux] Install Hadoop 2.2.0 on Ubuntu Linux 13.04 (Single-Node Cluster)

This tutorial explains how to install Hadoop 2.2.0/2.3.0/2.4.0/2.4.1 on Ubuntu 13.04/13.10/14.04 (Single-Node Cluster). This setup does not require an additional user for Hadoop. All files related to Hadoop will be stored inside the ~/hadoop directory.

  • Install a JRE. If you want the Oracle JRE, follow this post.

  • Install SSH:
        sudo apt-get install openssh-server
    Generate an SSH key:
        ssh-keygen -t rsa -P ""
    Enable the SSH key:
        cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys
    (Optional) Disable SSH login from remote addresses by setting in /etc/ssh/sshd_config:
        ListenAddress 127.0.0.1
    Test the local connection:
        ssh localhost
    If OK, then exit:
        exit
    Otherwise debug :)
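    If you edited /etc/ssh/sshd_config, restart the SSH daemon so the new ListenAddress takes effect (on these Ubuntu releases the service is called ssh):
        sudo service ssh restart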

  • Download Hadoop 2.2.0 (or newer versions)
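    For example, from the shell (the Apache archive keeps old releases; any mirror works just as well):
        wget http://archive.apache.org/dist/hadoop/common/hadoop-2.2.0/hadoop-2.2.0.tar.gz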

  • Unpack it, rename it, and move it to your home directory:
        tar xvf hadoop-2.2.0.tar.gz
        mv hadoop-2.2.0 ~/hadoop
  • Create the HDFS directories:
        mkdir -p ~/hadoop/data/namenode
        mkdir -p ~/hadoop/data/datanode
  • In file ~/hadoop/etc/hadoop/hadoop-env.sh insert (after the comment "The java implementation to use."):
        export JAVA_HOME="`dirname $(readlink /etc/alternatives/java)`/../"
        export HADOOP_COMMON_LIB_NATIVE_DIR="$HOME/hadoop/lib"
        export HADOOP_OPTS="$HADOOP_OPTS -Djava.library.path=$HOME/hadoop/lib"
    ($HOME is used instead of ~ because the tilde is not expanded inside double quotes.)
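    As a quick sanity check (assuming /etc/alternatives/java is a symlink into the JVM's bin directory, as update-alternatives normally sets it up), you can print what the JAVA_HOME expression resolves to:
        readlink /etc/alternatives/java
        echo "`dirname $(readlink /etc/alternatives/java)`/../"
    The first command should print a path ending in /bin/java; the second prints the value JAVA_HOME will receive.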
  • In file ~/hadoop/etc/hadoop/core-site.xml (inside the <configuration> tag):
        <property>
          <name>fs.default.name</name>
          <value>hdfs://localhost:9000</value>
        </property>
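    In Hadoop 2.x the fs.default.name key is deprecated in favour of fs.defaultFS; the old key still works, but the equivalent modern form is:
        <property>
          <name>fs.defaultFS</name>
          <value>hdfs://localhost:9000</value>
        </property>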
  • In file ~/hadoop/etc/hadoop/hdfs-site.xml (inside the <configuration> tag):
        <property>
          <name>dfs.replication</name>
          <value>1</value>
        </property>
        <property>
          <name>dfs.namenode.name.dir</name>
          <value>${user.home}/hadoop/data/namenode</value>
        </property>
        <property>
          <name>dfs.datanode.data.dir</name>
          <value>${user.home}/hadoop/data/datanode</value>
        </property>
  • In file ~/hadoop/etc/hadoop/yarn-site.xml (inside the <configuration> tag):
        <property>
          <name>yarn.nodemanager.aux-services</name>
          <value>mapreduce_shuffle</value>
        </property>
        <property>
          <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
          <value>org.apache.hadoop.mapred.ShuffleHandler</value>
        </property>
  • Create file ~/hadoop/etc/hadoop/mapred-site.xml from its template:
        cp ~/hadoop/etc/hadoop/mapred-site.xml.template ~/hadoop/etc/hadoop/mapred-site.xml
    and insert (inside the <configuration> tag):
        <property>
          <name>mapreduce.framework.name</name>
          <value>yarn</value>
        </property>
  • Add the Hadoop binaries to your PATH:
        echo 'export PATH=$PATH:~/hadoop/bin:~/hadoop/sbin' >> ~/.bashrc
        source ~/.bashrc
    (Single quotes keep $PATH from being expanded at echo time, so ~/.bashrc gets the literal line.)
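    To check that the new entries are picked up (a quick sanity check):
        which hdfs
        which start-dfs.sh
    Both should print paths under ~/hadoop.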
  • Format HDFS:
        hdfs namenode -format
  • Start Hadoop:
        start-dfs.sh && start-yarn.sh
    If you get the warning:
        WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    it is because you are running on a 64-bit system while the bundled Hadoop native library is 32-bit. This is not a big issue. If you want to fix it (optional), check this.
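    To confirm the architecture mismatch (a quick check; it assumes the native library sits in ~/hadoop/lib/native, its location in the 2.2.0 binary tarball):
        file ~/hadoop/lib/native/libhadoop.so.1.0.0
    If file reports "ELF 32-bit" on a 64-bit machine, that explains the warning.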

  • Check the status:
        jps
    Expected output (PIDs may change!):
        10969 DataNode
        11745 NodeManager
        11292 SecondaryNameNode
        10708 NameNode
        11483 ResourceManager
        13096 Jps
    N.B. The old JobTracker has been replaced by the ResourceManager.

  • Access web interfaces:
    • Cluster status: http://localhost:8088
    • HDFS status: http://localhost:50070
    • Secondary NameNode status: http://localhost:50090
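    On a headless machine you can probe these from the shell instead (just a quick check; any HTTP client will do):
        curl -s -o /dev/null -w "%{http_code}\n" http://localhost:50070
    A 200 means the NameNode web UI is up.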

  • Test Hadoop:
        hadoop jar ~/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.2.0-tests.jar TestDFSIO -write -nrFiles 20 -fileSize 10
    Check the results and remove the files:
        hadoop jar ~/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.2.0-tests.jar TestDFSIO -clean
    And:
        hadoop jar ~/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar pi 2 5
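    For a test that exercises HDFS end to end, the classic wordcount example also works (a sketch: the local file name and HDFS paths here are arbitrary):
        echo "hello hadoop hello world" > /tmp/input.txt
        hdfs dfs -mkdir -p /user/$USER/wordcount
        hdfs dfs -put /tmp/input.txt /user/$USER/wordcount/
        hadoop jar ~/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar wordcount /user/$USER/wordcount /user/$USER/wordcount-out
        hdfs dfs -cat /user/$USER/wordcount-out/part-r-00000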
  • Stop Hadoop:
        stop-dfs.sh && stop-yarn.sh

Some of these steps are taken from this tutorial.


9 comments • Published 26 November 2013 • Last modified 23 July 2014
1. Vineeth Rajendran - 11 March 2014 @ 01:51
Thanks Emilio. Your tutorial helped me get going on Hadoop installation. I'm new to this world of UNIX and Linux. I've always hated it for being convoluted. :)
But I love the concept of Hadoop and want to put it to use for some of my ideas.

Thanks Again.

Regards,
Vineeth
2. ercoppa - 11 March 2014 @ 14:48
Hi Vineeth, I'm glad it helped :)
3. Varun - 5 April 2014 @ 17:09
Hi Emilio,

I will be very much thankful to you, if you can provide answers to my questions.

1) After untar the hadoop 2.2.0, it shows many folders inside, In this article you have specified to set the path for the following variables :-
export HADOOP_COMMON_LIB_NATIVE_DIR="~/hadoop/lib"
export HADOOP_OPTS="$HADOOP_OPTS -Djava.library.path=~/hadoop/lib"

there are many lib folders inside many subfolders of hadoop main folder, please specify the exact path

Also you have given this path ~/hadoop/etc/hadoop/core-site.xml for making changes, but in actual I am using the following path :-
/home/adminuser/hdfs/hadoop-dist/target/hadoop-3.0.0-SNAPSHOT/etc/hadoop/core-site.xml

Is this ok?

Likewise I am modifying the configuration settings for other xml files:-
/home/adminuser/hdfs/hadoop-dist/target/hadoop-3.0.0-SNAPSHOT/etc/hadoop/hadoop-env.sh
/home/adminuser/hdfs/hadoop-dist/target/hadoop-3.0.0-SNAPSHOT/etc/hadoop/yarn-site.xml
/home/adminuser/hdfs/hadoop-dist/target/hadoop-3.0.0-SNAPSHOT/etc/hadoop/mapred-site.xml
/home/adminuser/hdfs/hadoop-dist/target/hadoop-3.0.0-SNAPSHOT/etc/hadoop/hdfs-site.xml

likewise in .bashrc file I have set the environment varibales like this:-

export PATH=$PATH:$HADOOP_INSTALL/hadoop-dist/target/hadoop-3.0.0-SNAPSHOT/bin
#export PATH=$PATH:$HADOOP_INSTALL/hadoop-dist/target/hadoop-3.0.0-SNAPSHOT/sbin
export HADOOP_MAPRED_HOME=$HADOOP_INSTALL
export HADOOP_COMMON_HOME=$HADOOP_INSTALL
export HADOOP_HDFS_HOME=$HADOOP_INSTALL
export YARN_HOME=$HADOOP_INSTALL
#export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_INSTALL/lib/native
#export HADOOP_OPTS="-Djava.library.path=$HADOOP_INSTALL/hadoop-tools/hadoop-di$"
#HADOOP VARIABLES END

after doing all this when I execute hadoop version cmd then I get the following error message:-

Error: Could not find or load main class org.apache.hadoop.util.VersionInfo

when I execute hadoop namenode -format cmd then I get the following error message:-

Error: Could not find or load main class org.apache.hadoop.hdfs.server.namenode.NameNode

Please help me in properly configuring hadoop, I am new to Ubuntu 13.10

Regards,
Varun
4. Varun - 5 April 2014 @ 17:13
Hi Emilio,

I forgot to mention the following variables which I have set in .bashrc file:-

export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-i386
export HADOOP_INSTALL=/home/adminuser/hdfs

when I type java -version then the output is:-

java version "1.7.0_51"
OpenJDK Runtime Environment (IcedTea 2.4.4) (7u51-2.4.4-0ubuntu0.13.10.1)
OpenJDK Client VM (build 24.45-b08, mixed mode, sharing)

Regards,
Varun
5. ercoppa - 7 April 2014 @ 11:42
Hi Varun,

> After untar the hadoop 2.2.0, it shows many folders inside

I think you have downloaded Hadoop's source archive (hadoop-2.3.0-src.tar.gz). You can't run Hadoop from that directory. You have to download the Hadoop binary (NOT the source) and unpack it. For instance (2.3.0):

http://mirror.nohup.it/apache/hadoop/common/hadoop-2.3.0/hadoop-2.3.0.tar.gz

After you unzip it, you should get these directories:
--
data include libexec logs README.txt share
bin etc lib LICENSE.txt NOTICE.txt sbin
--

If you use the Hadoop binary (and not its sources), your other problems should be fixed.

P.s. Try using the Oracle JRE (or JDK). OpenJDK may give you some problems (I read about it on the web, never tested it myself).
6. Amrish - 22 April 2014 @ 05:50
Thanks,

It was indeed useful for quick start of a hadoop installation.

Thanks again.
Amrish.
7. ercoppa - 22 April 2014 @ 11:14
@Amrish: Thank you, I'm happy it helped :)
8. Baban Gaigole - 30 May 2014 @ 16:58
Hello everybody. This is one of the "easiest to follow" tutorials I have found. Very neat and precise. I too have set up a multi-node Hadoop cluster inside Oracle Solaris 11.1 using zones. You can have a look at http://hashprompt.blogspot.in/2014/05/multi-node-hadoop-cluster-on-oracle.html
9. shravan - 20 June 2014 @ 14:10
Thank you!!! It's a nice one.