Apache Hadoop Setup

Hadoop 2.x is based on the YARN architecture, which introduces the ResourceManager and the ApplicationMaster. The ResourceManager manages resources across the cluster, while the ApplicationMaster manages each job's life cycle.
Installing Hadoop is quite simple: just untar the Hadoop tarball on the cluster nodes.
Master nodes take up the NameNode and ResourceManager roles, whereas slave nodes take up the DataNode and NodeManager roles. The NameNode and ResourceManager may run on different nodes.
The steps below explain how to set up Hadoop 2.x on a single-node cluster.
Prerequisites:
• Java 6 or later installed
• Dedicated user for Hadoop
• SSH configured
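The prerequisites above can be pre-checked from the shell before starting; this is just a convenience sketch, not part of the official setup:

```shell
# Report whether each prerequisite command is available on this machine.
for cmd in java ssh; do
  if command -v "$cmd" >/dev/null 2>&1; then
    echo "$cmd: found"
  else
    echo "$cmd: MISSING"
  fi
done
```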

Platform:
We are using macOS, but the same steps work on Linux. To install on Windows, we first have to install Cygwin to provide a shell environment.
Download
• Download the tarball from http://hadoop.apache.org/releases.html
• Extract it into /Application/hadoop-2.3.0
Setup Environment
1. $ export HADOOP_HOME=/Application/hadoop-2.3.0
2. $ export PATH=$HADOOP_HOME/bin:$PATH
3. $ export PATH=$HADOOP_HOME/sbin:$PATH
Note: We could also add the above exports to the bash profile to avoid repeating these steps in every new shell.
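For example, the exports can be appended to ~/.bash_profile so they persist across shells (the install path below matches this guide; adjust it if yours differs):

```shell
# Persist the Hadoop environment variables in the shell profile.
cat >> "$HOME/.bash_profile" <<'EOF'
export HADOOP_HOME=/Application/hadoop-2.3.0
export PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH
EOF
# Load them into the current shell.
. "$HOME/.bash_profile"
```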
Create directories
Create the namenode and datanode directories as below:
$ mkdir -p $HADOOP_HOME/data/hdfs/namenode
$ mkdir -p $HADOOP_HOME/data/hdfs/datanode
Change in yarn-site.xml
Change /Application/hadoop-2.3.0/etc/hadoop/yarn-site.xml as below:

<configuration>
<!-- Site specific YARN configuration properties -->
<property>
   <name>yarn.nodemanager.aux-services</name>
   <value>mapreduce_shuffle</value>
</property>
<property>
   <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
   <value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
</configuration>

Change in core-site.xml
Change /Application/hadoop-2.3.0/etc/hadoop/core-site.xml as below:

<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://localhost:9000/</value>
 </property>
</configuration>
Change in hdfs-site.xml
Change /Application/hadoop-2.3.0/etc/hadoop/hdfs-site.xml as below:

<configuration>
<property>
   <name>dfs.replication</name>
   <value>1</value>
 </property>
 <property>
   <name>dfs.namenode.name.dir</name>
   <value>file:/Application/hadoop-2.3.0/data/hdfs/namenode</value>
 </property>
 <property>
   <name>dfs.datanode.data.dir</name>
   <value>file:/Application/hadoop-2.3.0/data/hdfs/datanode</value>
 </property>
</configuration>
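After editing the site files it can be worth checking that each is still well-formed XML. A sketch using xmllint (part of libxml2, preinstalled on macOS and most Linux distributions; the path matches this guide's install location):

```shell
# Fail fast if any edited Hadoop config file is not well-formed XML.
for f in core-site.xml hdfs-site.xml yarn-site.xml; do
  xmllint --noout "/Application/hadoop-2.3.0/etc/hadoop/$f" && echo "$f OK"
done
```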

Change in mapred-site.xml
Change /Application/hadoop-2.3.0/etc/hadoop/mapred-site.xml. If it is not available, create one.

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
   <name>mapreduce.framework.name</name>
   <value>yarn</value>
</property>
</configuration>
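Hadoop 2.x ships a template for this file, so instead of writing it from scratch we can copy the template and then add the property above (the path assumes the install location used throughout this guide):

```shell
# Create mapred-site.xml from the template Hadoop ships with.
cd /Application/hadoop-2.3.0/etc/hadoop
cp mapred-site.xml.template mapred-site.xml
```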

Logging
Update the etc/hadoop/log4j.properties file to customize the Hadoop logging configuration.
Hadoop uses Apache log4j via the Apache Commons Logging framework.
Format namenode

$ hdfs namenode -format
(In Hadoop 2.x, hdfs namenode -format is the current form; hadoop namenode -format still works but is deprecated.)
You will get a message ending like the one below:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at username.local/xx.yyy.zz.aa
************************************************************/
Start HDFS server
Run the jps command to see which Java processes are running:

$ jps
912 Jps

Only jps itself is listed, which means the HDFS NameNode has not been started yet.
Start namenode

$ sh hadoop-daemon.sh start namenode

Start datanode 

$ sh hadoop-daemon.sh start datanode
$ jps
1305 Jps
1238 DataNode
1201 NameNode

Resource Manager:

$ sh yarn-daemon.sh start resourcemanager

Node Manager:

$ sh yarn-daemon.sh start nodemanager

Job History Server:

$ sh mr-jobhistory-daemon.sh start historyserver
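The five start commands above can be wrapped in one convenience script; a sketch, assuming the sbin scripts are on PATH as configured earlier:

```shell
# Start all single-node daemons in dependency order, then show what's running.
hadoop-daemon.sh start namenode
hadoop-daemon.sh start datanode
yarn-daemon.sh start resourcemanager
yarn-daemon.sh start nodemanager
mr-jobhistory-daemon.sh start historyserver
jps
```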

Web interface
Browse HDFS and check its health at http://localhost:50070 in the browser.
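From the command line, the default web UI ports can be probed with curl. Port 50070 (NameNode) is shown above; 8088 (ResourceManager) and 19888 (Job History Server) are the stock Hadoop 2.x defaults, so adjust if you have overridden them:

```shell
# Print the HTTP status of each default web UI ("000" means not reachable).
for port in 50070 8088 19888; do
  code=$(curl -s -o /dev/null -w '%{http_code}' "http://localhost:$port" || true)
  echo "port $port -> $code"
done
```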

Reference
Hadoop Essence: The Beginner's Guide to Hadoop & Hive
