Hadoop Upgrade from 0.19 to 0.20 (Caltech release)

These notes come from our experience upgrading our Tier3 from osg-hadoop 1.2.x (Hadoop 0.19). They are based on the OSG instructions (see References) and adapted to our topology (Rocks 5.4). For up-to-date instructions, check the OSG documentation first.

References

https://www.opensciencegrid.org/bin/view/Documentation/Release3/HadoopOverview

https://www.opensciencegrid.org/bin/view/Storage/HadoopUpgrade

Pre-upgrade requirements

On Rocks clusters it is better to create users in advance. The upgrade to Hadoop 0.20 requires two new users in addition to the hadoop user (which already exists in 0.19):

  • hdfs
  • mapred

On the frontend node:

    adduser hdfs
    adduser mapred
    #for rocks clusters
    rocks sync users
    
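Optionally, you can confirm from the frontend that the new users propagated to every node; a quick sanity check, assuming the Rocks 5.x rocks run host syntax:

    #verify the new users exist on all nodes (run on the frontend)
    rocks run host command="id hdfs; id mapred"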

On the namenode, take a snapshot of the current filesystem state for later comparison:

    #namespace, block, and location snapshot for post-upgrade comparison
    hadoop fsck / -files -blocks -locations > hdfs-old-fsck.log
    #full recursive listing of the filesystem
    hadoop dfs -lsr / > hdfs-old-lsr.log
    #datanode usage report
    hadoop dfsadmin -report > hdfs-old-report.log
    

On all cluster nodes:

    #umount fuse
    umount /mnt/hadoop
    #shutdown hadoop
    service hadoop stop
    #remove old repo (hadoop-019)
    rpm -ev osg-hadoop-1-2.el5
    
    #install new repo (hadoop-020 Caltech)
    rpm -ihv http://vdt.cs.wisc.edu/hadoop/osg-hadoop-20-3.el5.noarch.rpm
    
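After swapping the repo RPMs it can help to flush yum's metadata cache so the new repository is picked up; an optional sanity check:

    #refresh yum metadata and confirm the new repo is visible
    yum clean all
    yum repolist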

Install new RPMs

On all cluster nodes:

    yum install hadoop-0.20-osg
    
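Optionally confirm the package landed on each node (same package name as in the yum command above):

    rpm -q hadoop-0.20-osg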

Configure hadoop-0.20 (changes compared to 0.19)

At some point in the transition from the 0.20 Caltech release to 0.20 OSG there will be a change in the default user from hadoop to hdfs. We addressed this by exchanging the two users in /etc/passwd at this stage and re-running the rocks sync users command on the frontend.

    vi /etc/passwd
    #exchange the hadoop and hdfs entries (swap their UIDs)
    #add hdfs to the hadoop group:
    vi /etc/group
    #for rocks clusters
    rocks sync users
    
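The same swap can be done with usermod instead of editing the files by hand; a minimal sketch, assuming illustrative UIDs (500 for hadoop, 507 for hdfs; check the real values with id first):

    #swap UIDs so hdfs takes over hadoop's old UID (example UIDs only)
    usermod -u 9999 hadoop    #park hadoop on a free temporary UID
    usermod -u 500 hdfs       #hdfs takes over hadoop's old UID
    usermod -u 507 hadoop     #hadoop moves to hdfs's former UID
    usermod -a -G hadoop hdfs #add hdfs to the hadoop group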

Changes in /etc/sysconfig/hadoop:

    #019# HADOOP_CONF_DIR=/etc/hadoop
    ---
    #020# HADOOP_CONF_DIR=/etc/hadoop/conf
    ---
    #019# HADOOP_USER=hadoop
    ---
    #020# HADOOP_USER=hdfs
    ---
    #020# # The jvm heap size for the namenode.  The rule of thumb is a
    #020# # minimum of 1GB per million hdfs blocks.  With a 128MB block
    #020# # size, this comes out to roughly 1GB per 128TB of storage space.
    #020# HADOOP_NAMENODE_HEAP=2048
    ---
    #019# HADOOP_UMASK=018
    ---
    #020# the HADOOP_UMASK variable is gone in 0.20
    #020# instead edit core-site.xml and set dfs.umaskmode 002-->022
    #020# HADOOP_UMASK=022
    
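For reference, that umask setting in core-site.xml looks like the following property block (a sketch; 0.20 reads the umask from the dfs.umaskmode property, which goes inside the configuration element):

    <property>
      <name>dfs.umaskmode</name>
      <value>022</value>
    </property>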

From /etc/hadoop-0.20/conf.osg, make our own configuration template dir, first on the namenode (or on some test node):

    #create our configuration dir
    mkdir /etc/hadoop-0.20/conf.uprm
    #copy default osg
    cp -p /etc/hadoop-0.20/conf.osg/* /etc/hadoop-0.20/conf.uprm/.
    #break the symbolic link and replace it with the uprm dir
    rm -i /etc/alternatives/hadoop-0.20-conf
    ln -s /etc/hadoop-0.20/conf.uprm /etc/alternatives/hadoop-0.20-conf
    #update conf from /etc/sysconfig/hadoop
    /etc/init.d/hadoop-firstboot start
    
    #reuse the 0.19 hadoop-site.xml (the 0.19 config is saved in /etc/hadoop-0.19); this replaces the core-site.xml generated by hadoop-firstboot
    cp -p /etc/hadoop-0.19/hadoop-site.xml /etc/hadoop-0.20/conf.uprm/core-site.xml
    
    #copy the template to a shared dir to facilitate datanode installations
    cp -pR /etc/hadoop-0.20/conf.uprm /shared-dir/.
    
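The same switch can also be done with the alternatives tool instead of hand-editing the symlink; a sketch, assuming the package registered /etc/hadoop-0.20/conf as the alternatives link (as the hadoop-0.20-conf link name above suggests):

    #register conf.uprm and make it the active configuration
    alternatives --install /etc/hadoop-0.20/conf hadoop-0.20-conf /etc/hadoop-0.20/conf.uprm 50
    alternatives --set hadoop-0.20-conf /etc/hadoop-0.20/conf.uprm
    #confirm which conf dir is active
    alternatives --display hadoop-0.20-conf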

Copy the new configuration onto the datanodes, the secondary namenode, and the namenode (if not done already):

    #clone the uprm conf
    cp -pR /shared-dir/conf.uprm /etc/hadoop-0.20/.

    #break the symbolic link and replace it with the uprm dir
    rm -i /etc/alternatives/hadoop-0.20-conf
    ln -s /etc/hadoop-0.20/conf.uprm /etc/alternatives/hadoop-0.20-conf

    #change permissions on the hadoop data disks
    chown -R hdfs:hadoop $HADOOP_DATADIR
    #or (in this case, disks mounted on /hadoop and /hadoop2)
    chown -R hdfs:hadoop /hadoop/*
    chown -R hdfs:hadoop /hadoop2/*
    
    
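On a Rocks cluster this copy can be scripted from the frontend; a sketch, assuming the datanodes belong to the compute appliance (adjust the host selection to your cluster):

    #push the conf dir to every compute node
    for n in $(rocks list host compute | awk -F: 'NR>1 {print $1}'); do
        scp -pr /shared-dir/conf.uprm $n:/etc/hadoop-0.20/
    done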

Run the upgrade

On the namenode:

    su hdfs -s /bin/sh -c "/usr/bin/hadoop-daemon start namenode -upgrade"
    
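You can watch the namenode log while the upgrade runs; the path below is an assumption (the log directory and file name depend on the osg-hadoop packaging):

    #watch the upgrade in the namenode log (adjust the path to your install)
    tail -f /var/log/hadoop/hadoop-hdfs-namenode-$(hostname).log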

On the datanodes and the secondary namenode:

    #start service
    service hadoop start
    #or
    #/etc/init.d/hadoop start
    #remount fuse
    mount /mnt/hadoop
    
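A quick optional check that the fuse mount came back:

    #confirm the fuse mount is live
    df -h /mnt/hadoop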

Follow the upgrade progress on the namenode:

    hadoop dfsadmin -upgradeProgress status
    
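Once the status reports the upgrade is complete, you can take fresh snapshots with the same commands used before the upgrade and diff them against the old logs; a sketch reusing the file names from above:

    #compare the namespace before and after the upgrade
    hadoop dfs -lsr / > hdfs-new-lsr.log
    hadoop dfsadmin -report > hdfs-new-report.log
    diff hdfs-old-lsr.log hdfs-new-lsr.log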

Upgrade done

Once the upgrade is done, you can verify your data as explained in the references. If you are sure that everything is fine, you can finalize the upgrade with this last step (there is NO RETURN after this). You can wait days or weeks before taking this step.

    hadoop dfsadmin -finalizeUpgrade