These notes come from our experience upgrading our Tier3 from osg-hadoop 1.2.x (Hadoop 0.19) to Hadoop 0.20. They are based on the OSG instructions (see the references below) and adapted to our topology (Rocks 5.4). For up-to-date instructions check the OSG documentation first.
https://www.opensciencegrid.org/bin/view/Documentation/Release3/HadoopOverview
https://www.opensciencegrid.org/bin/view/Storage/HadoopUpgrade
On Rocks clusters it is better to create the users in advance. The Hadoop upgrade to 0.20 requires two new users in addition to the hadoop user (which already exists in 0.19).
On the frontend node:
adduser hdfs
adduser mapred
#for rocks clusters
rocks sync users
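To check that the new accounts propagated with consistent UIDs across the cluster, something like the following can be run from the frontend (a sketch, assuming the standard compute appliance name of a Rocks install):
#verify hdfs and mapred exist with the same UID/GID on every node
rocks run host compute command="id hdfs; id mapred"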
On the namenode:
hadoop fsck / -files -blocks -locations > hdfs-old-fsck.log
hadoop dfs -lsr / > hdfs-old-lsr.log
hadoop dfsadmin -report > hdfs-old-report.log
On all cluster nodes:
#umount fuse
umount /mnt/hadoop
#shutdown hadoop
service hadoop stop
#remove old repo (hadoop-019)
rpm -ev osg-hadoop-1-2.el5
#install new repo (hadoop-020 Caltech)
rpm -ihv http://vdt.cs.wisc.edu/hadoop/osg-hadoop-20-3.el5.noarch.rpm
On all cluster nodes:
yum install hadoop-0.20-osg
At some point in the transition from the 020 Caltech packages to the 020 OSG packages the default user changes from hadoop to hdfs.
We addressed this by exchanging the two users in /etc/passwd at this stage and rerunning the rocks sync users command on the frontend.
vi /etc/passwd
#exchange hadoop<-->hdfs
# add hdfs to hadoop group
vi /etc/group
#for rocks clusters
rocks sync users
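If you prefer not to edit /etc/passwd by hand, the UID swap can also be done with usermod and gpasswd. A rough sketch, where 502, 503 and 999 are hypothetical UIDs (check the real ones with id first); note that usermod -u only re-chowns the home directory, the data disks are chowned later in these notes:
#look up the current UIDs
id hadoop
id hdfs
#swap the UIDs through a free temporary UID (502/503/999 are placeholders)
usermod -u 999 hdfs
usermod -u 503 hadoop
usermod -u 502 hdfs
#add hdfs to the hadoop group
gpasswd -a hdfs hadoop
#then rerun rocks sync users on the frontend as above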
Changes in /etc/sysconfig/hadoop:
#019# HADOOP_CONF_DIR=/etc/hadoop
---
#020# HADOOP_CONF_DIR=/etc/hadoop/conf
---
#019# HADOOP_USER=hadoop
---
#020# HADOOP_USER=hdfs
---
#020# # The jvm heap size for the namenode. The rule of thumb is a
#020# # minimum of 1GB per million hdfs blocks. With a 128MB block
#020# # size, this comes out to roughly 1GB per 128TB of storage space.
#020# HADOOP_NAMENODE_HEAP=2048
---
#019# HADOOP_UMASK=018
---
#020# this definition is gone in 020
#020# better: edit core-site.xml and set dfs.umaskmode from 002 to 022 (a snippet is shown below)
#020# HADOOP_UMASK=022
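For reference, the umask setting mentioned above would look roughly like this in core-site.xml (a sketch; double-check the property name against the conf.osg template shipped with the 020 packages):
<property>
  <name>dfs.umaskmode</name>
  <value>022</value>
</property>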
Starting from /etc/hadoop-0.20/conf.osg, make our own configuration template dir, on the namenode first (or on some test node):
#create our configuration dir
mkdir /etc/hadoop-0.20/conf.uprm
#copy default osg
cp -p /etc/hadoop-0.20/conf.osg/* /etc/hadoop-0.20/conf.uprm/.
#break the symbolic link and replace it with the uprm dir (an alternatives-based variant is sketched after this block)
rm -i /etc/alternatives/hadoop-0.20-conf
ln -s /etc/hadoop-0.20/conf.uprm /etc/alternatives/hadoop-0.20-conf
#update conf from /etc/sysconfig/hadoop
/etc/init.d/hadoop-firstboot start
#reuse the 019 hadoop-site.xml (the 019 config is saved in /etc/hadoop-0.19); this replaces the core-site.xml generated by hadoop-firstboot
#(020 normally splits the config into core-site.xml, hdfs-site.xml and mapred-site.xml; keeping everything in core-site.xml works since all daemons read it)
cp -p /etc/hadoop-0.19/hadoop-site.xml /etc/hadoop-0.20/conf.uprm/core-site.xml
#copy the template to a shared dir to facilitate datanode installations
cp -pR /etc/hadoop-0.20/conf.uprm /shared-dir/.
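The symlink swap above can also be done through the alternatives system that owns /etc/alternatives/hadoop-0.20-conf instead of removing the link by hand. A sketch; the link path and priority are assumptions, check the current setup with alternatives --display hadoop-0.20-conf first:
alternatives --install /etc/hadoop-0.20/conf hadoop-0.20-conf /etc/hadoop-0.20/conf.uprm 50
alternatives --set hadoop-0.20-conf /etc/hadoop-0.20/conf.uprm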
Copy the new configuration onto the datanodes, the secondary namenode (and the namenode if not done already). A way to push these steps from the frontend is sketched after this block.
#clone the uprm conf
cp -pR /shared-dir/conf.uprm /etc/hadoop-0.20/.
#break the symbolic link and replace it with the uprm dir
rm -i /etc/alternatives/hadoop-0.20-conf
ln -s /etc/hadoop-0.20/conf.uprm /etc/alternatives/hadoop-0.20-conf
#change ownership of the hadoop data disks
chown -R hdfs:hadoop $HADOOP_DATADIR
#or (in our case the disks are mounted on /hadoop and /hadoop2)
chown -R hdfs:hadoop /hadoop/*
chown -R hdfs:hadoop /hadoop2/*
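On a Rocks cluster the per-node steps above can also be pushed from the frontend instead of logging into each datanode. A rough sketch, assuming the compute appliance name and the same /shared-dir as above:
rocks run host compute command="cp -pR /shared-dir/conf.uprm /etc/hadoop-0.20/"
rocks run host compute command="rm -f /etc/alternatives/hadoop-0.20-conf; ln -s /etc/hadoop-0.20/conf.uprm /etc/alternatives/hadoop-0.20-conf"
rocks run host compute command="chown -R hdfs:hadoop /hadoop/* /hadoop2/*"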
On the namenode:
su hdfs -s /bin/sh -c "/usr/bin/hadoop-daemon start namenode -upgrade"
On the datanodes and the secondary namenode:
#start service
service hadoop start
#or
#/etc/init.d/hadoop start
mount /mnt/hadoop
Follow the upgrade progress on the namenode:
hadoop dfsadmin -upgradeProgress status
Once the upgrade is done, you can verify your data as explained in the references.
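The verification described in the references amounts to re-running the pre-upgrade commands and comparing against the logs saved on the namenode before the upgrade; a sketch (expect some noise in the report, e.g. usage numbers and timestamps):
hadoop dfs -lsr / > hdfs-new-lsr.log
hadoop dfsadmin -report > hdfs-new-report.log
hadoop fsck / -files -blocks -locations > hdfs-new-fsck.log
diff hdfs-old-lsr.log hdfs-new-lsr.log
diff hdfs-old-report.log hdfs-new-report.log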
If you are sure that everything is fine you can finalize the upgrade with this last step (there is NO RETURN after this).
You can wait days or weeks before attempting this step.
hadoop dfsadmin -finalizeUpgrade