Hadoop Upgrade from 0.19 to 0.20 (Caltech release)

These notes come from our experience upgrading our Tier3 from osg-hadoop 1.2.x (Hadoop 0.19). They are based on the OSG instructions (see References) and adapted to our topology (Rocks 5.4). For up-to-date instructions, check the OSG documentation first.

References

https://www.opensciencegrid.org/bin/view/Documentation/Release3/HadoopOverview

https://www.opensciencegrid.org/bin/view/Storage/HadoopUpgrade

Pre-upgrade requirements

On Rocks clusters it is better to create users in advance. The upgrade to Hadoop 0.20 requires two new users in addition to the hadoop user (which already exists in 0.19):

  • hdfs
  • mapred

On the frontend node:

    adduser hdfs
    adduser mapred
    #for rocks clusters
    rocks sync users
    
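Optionally, you can confirm from the frontend that the new users propagated to every node; a quick sanity check, assuming the Rocks 5.x rocks run host syntax:

    #verify the new users exist on all nodes (run on the frontend)
    rocks run host command="id hdfs; id mapred"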

On the namenode, take a snapshot of the current filesystem state for later comparison:

    #namespace, block, and location snapshot for post-upgrade comparison
    hadoop fsck / -files -blocks -locations > hdfs-old-fsck.log
    #full recursive listing of the filesystem
    hadoop dfs -lsr / > hdfs-old-lsr.log
    #datanode usage report
    hadoop dfsadmin -report > hdfs-old-report.log
    

On all cluster nodes:

    #umount fuse
    umount /mnt/hadoop
    #shutdown hadoop
    service hadoop stop
    #remove old repo (hadoop-019)
    rpm -ev osg-hadoop-1-2.el5
    
    #install new repo (hadoop-020 Caltech)
    rpm -ihv http://vdt.cs.wisc.edu/hadoop/osg-hadoop-20-3.el5.noarch.rpm
    
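After swapping the repo RPMs it can help to flush yum's metadata cache so the new repository is picked up; an optional sanity check:

    #refresh yum metadata and confirm the new repo is visible
    yum clean all
    yum repolist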

Install new RPMs

On all cluster nodes:

    yum install hadoop-0.20-osg
    
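Optionally confirm the package landed on each node (same package name as in the yum command above):

    rpm -q hadoop-0.20-osg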

Configure hadoop-0.20 (changes compared to 0.19)

At some point in the transition from the 0.20 Caltech release to 0.20 OSG there will be a change in the default user from hadoop to hdfs. We addressed this by exchanging the two users in /etc/passwd at this stage and re-running the rocks sync users command on the frontend.

    vi /etc/passwd
    #exchange the hadoop and hdfs entries (swap their UIDs)
    #add hdfs to the hadoop group:
    vi /etc/group
    #for rocks clusters
    rocks sync users
    
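The same swap can be done with usermod instead of editing the files by hand; a minimal sketch, assuming illustrative UIDs (500 for hadoop, 507 for hdfs; check the real values with id first):

    #swap UIDs so hdfs takes over hadoop's old UID (example UIDs only)
    usermod -u 9999 hadoop    #park hadoop on a free temporary UID
    usermod -u 500 hdfs       #hdfs takes over hadoop's old UID
    usermod -u 507 hadoop     #hadoop moves to hdfs's former UID
    usermod -a -G hadoop hdfs #add hdfs to the hadoop group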

Changes in /etc/sysconfig/hadoop:

    #019# HADOOP_CONF_DIR=/etc/hadoop
    ---
    #020# HADOOP_CONF_DIR=/etc/hadoop/conf
    ---
    #019# HADOOP_USER=hadoop
    ---
    #020# HADOOP_USER=hdfs
    ---
    #020# # The jvm heap size for the namenode.  The rule of thumb is a
    #020# # minimum of 1GB per million hdfs blocks.  With a 128MB block
    #020# # size, this comes out to roughly 1GB per 128TB of storage space.
    #020# HADOOP_NAMENODE_HEAP=2048
    ---
    #019# HADOOP_UMASK=018
    ---
    #020# the HADOOP_UMASK variable is gone in 0.20
    #020# instead edit core-site.xml and set dfs.umaskmode 002-->022
    #020# HADOOP_UMASK=022
    
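For reference, that umask setting in core-site.xml looks like the following property block (a sketch; 0.20 reads the umask from the dfs.umaskmode property, which goes inside the configuration element):

    <property>
      <name>dfs.umaskmode</name>
      <value>022</value>
    </property>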

From /etc/hadoop-0.20/conf.osg, make our own configuration template dir, first on the namenode (or on some test node):

    #create our configuration dir
    mkdir /etc/hadoop-0.20/conf.uprm
    #copy default osg
    cp -p /etc/hadoop-0.20/conf.osg/* /etc/hadoop-0.20/conf.uprm/.
    #break the symbolic link and replace it with the uprm dir
    rm -i /etc/alternatives/hadoop-0.20-conf
    ln -s /etc/hadoop-0.20/conf.uprm /etc/alternatives/hadoop-0.20-conf
    #update conf from /etc/sysconfig/hadoop
    /etc/init.d/hadoop-firstboot start
    
    #reuse the 0.19 hadoop-site.xml (the 0.19 config is saved in /etc/hadoop-0.19); this replaces the core-site.xml generated by hadoop-firstboot
    cp -p /etc/hadoop-0.19/hadoop-site.xml /etc/hadoop-0.20/conf.uprm/core-site.xml
    
    #copy the template to a shared dir to facilitate datanode installations
    cp -pR /etc/hadoop-0.20/conf.uprm /shared-dir/.
    
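The same switch can also be done with the alternatives tool instead of hand-editing the symlink; a sketch, assuming the package registered /etc/hadoop-0.20/conf as the alternatives link (as the hadoop-0.20-conf link name above suggests):

    #register conf.uprm and make it the active configuration
    alternatives --install /etc/hadoop-0.20/conf hadoop-0.20-conf /etc/hadoop-0.20/conf.uprm 50
    alternatives --set hadoop-0.20-conf /etc/hadoop-0.20/conf.uprm
    #confirm which conf dir is active
    alternatives --display hadoop-0.20-conf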

Copy the new configuration onto the datanodes, the secondary namenode, and the namenode (if not done already):

    #clone the uprm conf
    cp -pR /shared-dir/conf.uprm /etc/hadoop-0.20/.

    #break the symbolic link and replace it with the uprm dir
    rm -i /etc/alternatives/hadoop-0.20-conf
    ln -s /etc/hadoop-0.20/conf.uprm /etc/alternatives/hadoop-0.20-conf

    #change permissions on the hadoop data disks
    chown -R hdfs:hadoop $HADOOP_DATADIR
    #or (in this case, disks mounted on /hadoop and /hadoop2)
    chown -R hdfs:hadoop /hadoop/*
    chown -R hdfs:hadoop /hadoop2/*
    
    
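On a Rocks cluster this copy can be scripted from the frontend; a sketch, assuming the datanodes belong to the compute appliance (adjust the host selection to your cluster):

    #push the conf dir to every compute node
    for n in $(rocks list host compute | awk -F: 'NR>1 {print $1}'); do
        scp -pr /shared-dir/conf.uprm $n:/etc/hadoop-0.20/
    done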

Run the upgrade

On the namenode:

    su hdfs -s /bin/sh -c "/usr/bin/hadoop-daemon start namenode -upgrade"
    
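You can watch the namenode log while the upgrade runs; the path below is an assumption (the log directory and file name depend on the osg-hadoop packaging):

    #watch the upgrade in the namenode log (adjust the path to your install)
    tail -f /var/log/hadoop/hadoop-hdfs-namenode-$(hostname).log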

On the datanodes and the secondary namenode:

    #start service
    service hadoop start
    #or
    #/etc/init.d/hadoop start
    #remount fuse
    mount /mnt/hadoop
    
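A quick optional check that the fuse mount came back:

    #confirm the fuse mount is live
    df -h /mnt/hadoop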

Follow the upgrade progress on the namenode:

    hadoop dfsadmin -upgradeProgress status
    
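Once the status reports the upgrade is complete, you can take fresh snapshots with the same commands used before the upgrade and diff them against the old logs; a sketch reusing the file names from above:

    #compare the namespace before and after the upgrade
    hadoop dfs -lsr / > hdfs-new-lsr.log
    hadoop dfsadmin -report > hdfs-new-report.log
    diff hdfs-old-lsr.log hdfs-new-lsr.log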

Upgrade done

Once the upgrade is done, you can verify your data as explained in the references. If you are sure that everything is fine, you can finalize the upgrade with this last step (there is NO RETURN after this). You can wait days or weeks before taking this step.

    hadoop dfsadmin -finalizeUpgrade