I started by installing Hadoop on a single node, i.e. my own machine. It was a tricky task, as most tutorials make many assumptions. Now that I have completed the install, I can safely say those assumptions are simple ones that anyone familiar with Linux would understand. So I decided to write an installation guide for everyone else.
What follows is my attempt at a comprehensive walkthrough of how to install Hadoop on your local system. Please let me know if I have missed anything.
Prerequisites
1. Linux
The first and foremost requirement is a PC with Linux installed. I used a machine running Ubuntu 9.10. You can also work with Windows, as Hadoop is purely Java based and will run on any OS with a JVM (which covers pretty much all modern operating systems).
2. Sun Java6
Install the Sun Java6 on your Linux machine using:
$ sudo apt-get install sun-java6-bin sun-java6-jre sun-java6-jdk
3. Create a new user "hadoop"
Create a new user hadoop. Though not strictly required, a dedicated user is recommended, as it separates the Hadoop installation from other software and user accounts running on the same machine.
Use the following commands:
$ sudo addgroup hadoop
$ sudo useradd -d /home/hadoop -m hadoop -g hadoop
4. Configure SSH
Install the OpenSSH server on your system:
$ sudo apt-get install openssh-server
Then generate an SSH key for the hadoop user. As the hadoop user do the
following:
$ ssh-keygen -t rsa -P ""
Generating public/private rsa key pair.
Enter file in which to save the key (/home/hadoop/.ssh/id_rsa):
Created directory '/home/hadoop/.ssh'.
Your identification has been saved in /home/hadoop/.ssh/id_rsa.
Your public key has been saved in /home/hadoop/.ssh/id_rsa.pub.
The key fingerprint is:
1a:38:cd:0c:92:f9:8b:33:f3:a9:8e:dd:41:68:04:dc hadoop@paritoshdesktop
The key's randomart image is:
+--[ RSA 2048]----+
|o .              |
| o E             |
|  = .            |
| . + *           |
|  o = = S        |
| . o o o         |
|    = o .        |
|   o * o         |
|..+.+            |
+-----------------+
$
Then enable SSH access to your local machine with this newly created key:
$ cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys
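As an aside (this is the standard OpenSSH requirement, not something specific to Hadoop): if ssh still prompts for a password after this step, it is usually a permissions problem, since sshd ignores key files that are readable by other users. You can tighten them like so:

```shell
# sshd refuses authorized_keys unless the .ssh directory and the
# file itself are private to the owning user
chmod 700 "$HOME/.ssh"
chmod 600 "$HOME/.ssh/authorized_keys"
```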
Test your SSH connection:
$ ssh localhost
The authenticity of host 'localhost (::1)' can't be established.
RSA key fingerprint is 1e:be:bb:db:71:25:e2:d5:b0:a9:87:9a:2c:43:e3:ae.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'localhost' (RSA) to the list of known hosts.
Linux paritoshdesktop 2.6.31-20-generic #58-Ubuntu SMP Fri Mar 12 05:23:09 UTC 2010 i686
$
Now that the prerequisites are complete, let's go ahead with the Hadoop installation.
Install Hadoop from Cloudera
1. Add repository
Create a new file /etc/apt/sources.list.d/cloudera.list with the following
contents, taking care to replace DISTRO with the name of your distribution (find
out by running lsb_release -c):
deb http://archive.cloudera.com/debian DISTRO-cdh3 contrib
deb-src http://archive.cloudera.com/debian DISTRO-cdh3 contrib
2. Add repository key. (optional)
Add the Cloudera Public GPG Key to your repository by executing the following
command:
$ curl -s http://archive.cloudera.com/debian/archive.key | sudo apt-key add -
OK
$
This allows you to verify that you are downloading genuine packages.
Note: You may need to install curl:
$ sudo apt-get install curl
3. Update APT package index.
Simply run:
$ sudo apt-get update
4. Find and install packages.
You may now find and install packages from the Cloudera repository using your favorite APT package manager (e.g. apt-get, aptitude, or dselect). For example:
$ apt-cache search hadoop
$ sudo apt-get install hadoop
Setting up a Hadoop Cluster
Here we will set up a Hadoop cluster on a single node (pseudo-distributed mode).
1. Configuration
Copy the hadoop-0.20 directory to the hadoop home folder:
$ cd /usr/lib/
$ cp -Rf hadoop-0.20 /home/hadoop/
Also, add the following to your .bashrc and .profile:
# Hadoop home dir declaration
HADOOP_HOME=/home/hadoop/hadoop-0.20
export HADOOP_HOME
Then change the following in the different configuration files under the $HADOOP_HOME/conf directory.
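The .bashrc additions can also be appended from the shell. The paths below follow the assumptions of this guide, and putting $HADOOP_HOME/bin on the PATH is an optional convenience I've added so you can run the Hadoop scripts from anywhere:

```shell
# Persist the Hadoop environment for future shells
echo 'export HADOOP_HOME=/home/hadoop/hadoop-0.20' >> "$HOME/.bashrc"
echo 'export PATH=$PATH:$HADOOP_HOME/bin' >> "$HOME/.bashrc"
```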
1.1 hadoop-env.sh
Change the Java home, depending on where your Java is installed. Note that JAVA_HOME must point to the JDK installation directory, not the /usr/bin/java symlink; for the Sun Java 6 packages on Ubuntu this is typically:
# The java implementation to use. Required.
export JAVA_HOME=/usr/lib/jvm/java-6-sun
1.2 core-site.xml
Change your core-site.xml to reflect the following:
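The XML did not survive in this post, so here is a typical single-node core-site.xml. The port 8020 is the Cloudera convention (Apache docs often use 9000), and the hadoop.tmp.dir path is my own assumption; adjust both to your setup:

```xml
<configuration>
  <!-- URI of the default filesystem: the local NameNode -->
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:8020</value>
  </property>
  <!-- Base directory for Hadoop's working data (assumed path) -->
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/hadoop/hadoop-data</value>
  </property>
</configuration>
```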
1.3 mapred-site.xml
Change your mapred-site.xml to reflect the following:
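Again the XML is missing here; the one property a single-node mapred-site.xml needs is the JobTracker address. Port 8021 is the Cloudera convention (Apache docs often use 9001), so treat it as an assumption:

```xml
<configuration>
  <!-- Host and port the JobTracker listens on -->
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:8021</value>
  </property>
</configuration>
```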
1.4 hdfs-site.xml
Change your hdfs-site.xml to reflect the following:
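The XML is missing here too; the usual single-node hdfs-site.xml simply lowers the block replication factor to 1, since there is only one DataNode to replicate to:

```xml
<configuration>
  <!-- Default is 3; a single node can only hold one replica -->
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
```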
2. Format the NameNode
To format the Hadoop Distributed File System (which simply initializes the directory specified by the dfs.name.dir variable), run the command:
$ $HADOOP_HOME/bin/hadoop namenode -format
Running Hadoop
To start Hadoop, run start-all.sh from the $HADOOP_HOME/bin directory:
$ ./start-all.sh
starting namenode, logging to /home/hadoop/hadoop-0.20/bin/../logs/hadoop-hadoop-namenode-paritosh-desktop.out
localhost: starting datanode, logging to /home/hadoop/hadoop-0.20/bin/../logs/hadoop-hadoop-datanode-paritosh-desktop.out
localhost: starting secondarynamenode, logging to /home/hadoop/hadoop-0.20/bin/../logs/hadoop-hadoop-secondarynamenode-paritosh-desktop.out
starting jobtracker, logging to /home/hadoop/hadoop-0.20/bin/../logs/hadoop-hadoop-jobtracker-paritosh-desktop.out
localhost: starting tasktracker, logging to /home/hadoop/hadoop-0.20/bin/../logs/hadoop-hadoop-tasktracker-paritosh-desktop.out
$
To check whether all the processes are running fine, run the following:
$ jps
17736 TaskTracker
17602 JobTracker
17235 NameNode
17533 SecondaryNameNode
17381 DataNode
17804 Jps
$