
To meet the requirements of our company's applications, the Hadoop cluster was upgraded from the original 1.0 release to CDH5. Here is another cluster installation experience, shared for anyone who needs it.

I. Machine Preparation

    The Linux version is CentOS 5.8, x86_64; if your Linux version is 6.x, you can also follow the steps below;
Five machines were prepared for this installation:
192.168.32.70(master),192.168.32.71(slave1),192.168.32.72(slave2),192.168.32.73(slave3),192.168.32.79(slave4);
Change the HOSTNAME in /etc/sysconfig/network to an easy-to-remember name; you can also leave it unchanged, as long as it is convenient for you.
Edit the /etc/hosts file (on all five machines; a quick connectivity check follows the list):
192.168.32.70 master
192.168.32.71 slave1
192.168.32.72 slave2
192.168.32.73 slave3
192.168.32.79 slave4
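After editing the hosts file, a quick check from the master confirms that every node resolves and is reachable by name (a minimal sketch; adjust the hostnames if you kept different ones):

    for h in master slave1 slave2 slave3 slave4; do
        ping -c 1 "$h" > /dev/null && echo "$h OK" || echo "$h unreachable"
    done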

II. Environment Preparation

1. Set up passwordless SSH
> On every machine, run ssh-keygen -t rsa and press Enter through all the prompts;
> On the master, run: cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys;
> scp the file to the other machines:
    scp ~/.ssh/authorized_keys root@slave1:~/.ssh/
    scp ~/.ssh/authorized_keys root@slave2:~/.ssh/
                  ......
    scp ~/.ssh/authorized_keys root@slave4:~/.ssh/
> Test whether passwordless login works:


    [root@master hadoop-conf]# ssh slave1
    Last login: Wed Sep 24 16:07:12 2014 from master
    [root@slave1 ~]#
If you are not prompted for a password, it worked.
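To save typing, the scp step above can also be pushed out in one loop (a sketch that assumes root SSH access to each slave; each slave will ask for its password once, before the key takes effect):

    for h in slave1 slave2 slave3 slave4; do
        scp ~/.ssh/authorized_keys root@$h:~/.ssh/
    done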
2. Install JDK 7
> Download the jdk-7u51-linux-x64.rpm package from the official site;
>rpm -ivh jdk-7u51-linux-x64.rpm
> Add the environment variables;
vi /etc/profile
Add:
     JAVA_HOME=/usr/java/latest
     PATH=$PATH:$JAVA_HOME/bin
     CLASSPATH=$JAVA_HOME/lib:$JAVA_HOME/jre/lib
     export JAVA_HOME CLASSPATH
> Run source for it to take effect;
source /etc/profile
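A quick check that the JDK and the environment variables are in place (expected values assume the default RPM install path):

    java -version      # should report java version "1.7.0_51"
    echo $JAVA_HOME    # should print /usr/java/latest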
3. Create the hadoop user
    groupadd hdfs
    useradd hadoop -g hdfs
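You can verify the account afterwards; its primary group should be hdfs:

    id hadoop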

III. Install CDH5

1. Download the RPM package
> cd into /data/tools/ (my usual directory for storing software; you can choose any directory you like);
    wget "http://archive.cloudera.com/cdh5/one-click-install/redhat/5/x86_64/cloudera-cdh-5-0.x86_64.rpm"   ---------如果你的Linux版本是6.x这里改为6即可,下同;
    yum --nogpgcheck localinstall cloudera-cdh-5-0.x86_64.rpm
> Import the Cloudera repository GPG key;
    rpm --import http://archive.cloudera.com/cdh5/redhat/5/x86_64/cdh/RPM-GPG-KEY-cloudera
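To confirm that the repository is usable, listing one of the CDH packages should now succeed (the exact version string depends on the current CDH5 release):

    yum list available hadoop-hdfs-namenode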
2. Install the packages
> On master, install NN (NameNode), NM (NodeManager), DN (DataNode), MR (MapReduce), and hadoop-client:
yum clean all; yum install hadoop-hdfs-namenode
yum clean all; yum install hadoop-yarn-nodemanager hadoop-hdfs-datanode hadoop-mapreduce
yum clean all; yum install hadoop-client
> On slave1, install RM (ResourceManager), NM, DN, MR, and hadoop-client:
yum clean all; yum install hadoop-yarn-resourcemanager
yum clean all; yum install hadoop-yarn-nodemanager hadoop-hdfs-datanode hadoop-mapreduce
yum clean all; yum install hadoop-client
> On slave2, slave3, and slave4, install NM, DN, MR, and hadoop-client:
yum clean all; yum install hadoop-yarn-nodemanager hadoop-hdfs-datanode hadoop-mapreduce
yum clean all; yum install hadoop-client
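Since slave2, slave3, and slave4 install exactly the same packages, the yum commands can also be pushed out from the master over the SSH trust set up earlier (a convenience sketch; it assumes the Cloudera repository from step 1 has already been set up on each slave, and -y answers the prompts):

    for h in slave2 slave3 slave4; do
        ssh root@$h "yum clean all; yum -y install hadoop-yarn-nodemanager hadoop-hdfs-datanode hadoop-mapreduce hadoop-client"
    done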
3. Create directories (my machines have only one disk, cache1; if you have more, create a set of directories per disk)
DN:
mkdir -p /data/cache1/dfs/dn
mkdir -p /data/cache1/dfs/mapred/local
chown -R hdfs:hadoop /data/cache1/dfs/dn
chown -R mapred:hadoop /data/cache1/dfs/mapred/local
NN:
mkdir -p /data/cache1/dfs/nn
chown -R hdfs:hadoop /data/cache1/dfs/nn
chmod 700 /data/cache1/dfs/nn
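The DN directories must exist on every machine that runs a DataNode, i.e. all five nodes here, so the same commands can be replayed over SSH (a sketch; the hdfs and mapred users are created when the CDH packages are installed):

    for h in slave1 slave2 slave3 slave4; do
        ssh root@$h "mkdir -p /data/cache1/dfs/dn /data/cache1/dfs/mapred/local"
        ssh root@$h "chown -R hdfs:hadoop /data/cache1/dfs/dn"
        ssh root@$h "chown -R mapred:hadoop /data/cache1/dfs/mapred/local"
    done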
4. Edit the configuration files
Edit the configuration files on the master, then scp them to each slave;
1) /etc/hadoop/conf/core-site.xml (the IP in fs.defaultFS is the NN address):


    [root@master conf]# cat core-site.xml
    <?xml version="1.0"?>
    <!--
    Licensed to the Apache Software Foundation (ASF) under one or more
    contributor license agreements. See the NOTICE file distributed with
    this work for additional information regarding copyright ownership.
    The ASF licenses this file to You under the Apache License, Version 2.0
    (the "License"); you may not use this file except in compliance with
    the License. You may obtain a copy of the License at
    http://www.apache.org/licenses/LICENSE-2.0
    Unless required by applicable law or agreed to in writing, software
    distributed under the License is distributed on an "AS IS" BASIS,
    WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    See the License for the specific language governing permissions and
    limitations under the License.
    -->
    <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
    <configuration>
      <property>
        <name>fs.defaultFS</name>
        <value>hdfs://192.168.32.70:8020</value>
      </property>
      <property>
        <name>dfs.replication</name>
        <value>1</value>
      </property>
    </configuration>
2) /etc/hadoop/conf/hdfs-site.xml:


    [root@master conf]# cat /etc/hadoop/conf/hdfs-site.xml
    <?xml version="1.0"?>
    <!--
    Licensed to the Apache Software Foundation (ASF) under one or more
    contributor license agreements. See the NOTICE file distributed with
    this work for additional information regarding copyright ownership.
    The ASF licenses this file to You under the Apache License, Version 2.0
    (the "License"); you may not use this file except in compliance with
    the License. You may obtain a copy of the License at
    http://www.apache.org/licenses/LICENSE-2.0
    Unless required by applicable law or agreed to in writing, software
    distributed under the License is distributed on an "AS IS" BASIS,
    WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    See the License for the specific language governing permissions and
    limitations under the License.
    -->
    <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
    <configuration>
      <property>
        <name>dfs.name.dir</name>
        <value>/var/lib/hadoop-hdfs/cache/hdfs/dfs/name</value>
      </property>
      <property>
        <name>dfs.datanode.data.dir</name>
        <value>/data/cache1/dfs/dn/</value>
      </property>
    </configuration>
3) /etc/hadoop/conf/yarn-site.xml (the IPs in the yarn.resourcemanager.* addresses are the machine running the RM, 192.168.32.71 in this example):


    [root@master conf]# cat yarn-site.xml
    <?xml version="1.0"?>
    <!--
    Licensed to the Apache Software Foundation (ASF) under one or more
    contributor license agreements. See the NOTICE file distributed with
    this work for additional information regarding copyright ownership.
    The ASF licenses this file to You under the Apache License, Version 2.0
    (the "License"); you may not use this file except in compliance with
    the License. You may obtain a copy of the License at
    http://www.apache.org/licenses/LICENSE-2.0
    Unless required by applicable law or agreed to in writing, software
    distributed under the License is distributed on an "AS IS" BASIS,
    WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    See the License for the specific language governing permissions and
    limitations under the License.
    -->
    <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
    <configuration>
      <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
      </property>
      <property>
        <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
        <value>org.apache.hadoop.mapred.ShuffleHandler</value>
      </property>
      <property>
        <name>yarn.log-aggregation-enable</name>
        <value>true</value>
      </property>
      <property>
        <description>List of directories to store localized files in.</description>
        <name>yarn.nodemanager.local-dirs</name>
        <value>/var/lib/hadoop-yarn/cache/${user.name}/nm-local-dir</value>
      </property>
      <property>
        <name>yarn.resourcemanager.address</name>
        <value>192.168.32.71:8032</value>
      </property>
      <property>
        <name>yarn.resourcemanager.scheduler.address</name>
        <value>192.168.32.71:8030</value>
      </property>
      <property>
        <name>yarn.resourcemanager.webapp.address</name>
        <value>0.0.0.0:8088</value>
      </property>
      <property>
        <name>yarn.resourcemanager.resource-tracker.address</name>
        <value>192.168.32.71:8031</value>
      </property>
      <property>
        <name>yarn.resourcemanager.admin.address</name>
        <value>192.168.32.71:8033</value>
      </property>
      <property>
        <description>Where to store container logs.</description>
        <name>yarn.nodemanager.log-dirs</name>
        <value>/var/log/hadoop-yarn/containers</value>
      </property>
      <property>
        <description>Where to aggregate logs to.</description>
        <name>yarn.nodemanager.remote-app-log-dir</name>
        <value>/var/log/hadoop-yarn/apps</value>
      </property>
      <property>
        <description>Classpath for typical applications.</description>
        <name>yarn.application.classpath</name>
        <value>
          $HADOOP_CONF_DIR,
          $HADOOP_COMMON_HOME/*,$HADOOP_COMMON_HOME/lib/*,
          $HADOOP_HDFS_HOME/*,$HADOOP_HDFS_HOME/lib/*,
          $HADOOP_MAPRED_HOME/*,$HADOOP_MAPRED_HOME/lib/*,
          $HADOOP_YARN_HOME/*,$HADOOP_YARN_HOME/lib/*
        </value>
      </property>
    </configuration>
4) /etc/hadoop/conf/hadoop-env.sh:


    [root@master conf]# cat hadoop-env.sh
    # Set Hadoop-specific environment variables here.
    # The only required environment variable is JAVA_HOME. All others are
    # optional. When running a distributed configuration it is best to
    # set JAVA_HOME in this file, so that it is correctly defined on
    # remote nodes.
    # The maximum amount of heap to use, in MB. Default is 1000.
    #export HADOOP_HEAPSIZE=
    #export HADOOP_NAMENODE_INIT_HEAPSIZE=""
    # Extra Java runtime options. Empty by default.
    export HADOOP_OPTS="-Djava.net.preferIPv4Stack=true ${HADOOP_OPTS}"
    # Command specific options appended to HADOOP_OPTS when specified
    export HADOOP_NAMENODE_OPTS="-Dsecurity.audit.logger=INFO,DRFAS -Dhdfs.audit.logger=INFO,DRFAAUDIT ${HADOOP_NAMENODE_OPTS}"
    HADOOP_JOBTRACKER_OPTS="-Dsecurity.audit.logger=INFO,DRFAS -Dmapred.audit.logger=INFO,MRAUDIT -Dmapred.jobsummary.logger=INFO,JSA ${HADOOP_JOBTRACKER_OPTS}"
    HADOOP_TASKTRACKER_OPTS="-Dsecurity.audit.logger=ERROR,console -Dmapred.audit.logger=ERROR,console ${HADOOP_TASKTRACKER_OPTS}"
    HADOOP_DATANODE_OPTS="-Dsecurity.audit.logger=ERROR,DRFAS ${HADOOP_DATANODE_OPTS}"
    export HADOOP_SECONDARYNAMENODE_OPTS="-Dsecurity.audit.logger=INFO,DRFAS -Dhdfs.audit.logger=INFO,DRFAAUDIT ${HADOOP_SECONDARYNAMENODE_OPTS}"
    # The following applies to multiple commands (fs, dfs, fsck, distcp etc)
    export HADOOP_CLIENT_OPTS="-Xmx128m ${HADOOP_CLIENT_OPTS}"
    #HADOOP_JAVA_PLATFORM_OPTS="-XX:-UsePerfData ${HADOOP_JAVA_PLATFORM_OPTS}"
    # On secure datanodes, user to run the datanode as after dropping privileges
    export HADOOP_SECURE_DN_USER=hdfs
    # Where log files are stored. $HADOOP_HOME/logs by default.
    export HADOOP_LOG_DIR=/var/local/hadoop/logs
    # Where log files are stored in the secure data environment.
    export HADOOP_SECURE_DN_LOG_DIR=$HADOOP_LOG_DIR
    # The directory where pid files are stored. /tmp by default.
    export HADOOP_PID_DIR=/var/local/hadoop/pid
    export HADOOP_SECURE_DN_PID_DIR=$HADOOP_PID_DIR
    # A string representing this instance of hadoop. $USER by default.
    export HADOOP_IDENT_STRING=$USER
    export JAVA_HOME=/usr/java/latest
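Note that hadoop-env.sh above points the log and pid directories at /var/local/hadoop, which does not exist by default; a hedged sketch of creating it on each node (assuming the hadoop group created by the CDH packages, so the hdfs/yarn/mapred daemons can write to it):

    mkdir -p /var/local/hadoop/logs /var/local/hadoop/pid
    chgrp -R hadoop /var/local/hadoop
    chmod -R 775 /var/local/hadoop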
5) Edit the /etc/hadoop/conf/slaves file;
Add the slaves:
slave1
slave2
slave3
slave4
6) scp the configuration to each slave;
scp -r /etc/hadoop/conf/* root@slave1:/etc/hadoop/conf/
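And to push it to every slave in one loop:

    for h in slave1 slave2 slave3 slave4; do
        scp -r /etc/hadoop/conf/* root@$h:/etc/hadoop/conf/
    done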

IV. Startup

1) Start the NN (master)
/etc/init.d/hadoop-hdfs-namenode init
/etc/init.d/hadoop-hdfs-namenode start
2) Start slave1 (DN and NM, plus the RM)
/etc/init.d/hadoop-hdfs-datanode start
/etc/init.d/hadoop-yarn-nodemanager start
/etc/init.d/hadoop-yarn-resourcemanager start
3) Start the DNs (slave2/slave3/slave4)
/etc/init.d/hadoop-hdfs-datanode start
/etc/init.d/hadoop-yarn-nodemanager start
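Once the daemons are up, a quick check from the master (a sketch; dfsadmin must be run against the running NameNode, here as the hdfs user):

    sudo -u hdfs hdfs dfsadmin -report    # should list all five DataNodes as live
    ps -ef | grep [j]ava                  # shows which Hadoop daemons are running on this node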

V. Verification

http://192.168.32.70:50070    (NameNode web UI)
http://192.168.32.71:8088/cluster   (the ResourceManager UI, comparable to the Hadoop 1.0 JobTracker page that used to live on port 50030)

VI. Problems Encountered During Installation and Their Solutions

When starting the NN, the following error is reported: log4j:ERROR Could not find value for key log4j.appender.DRFAAUDIT;
Solution: add the following to /etc/hadoop/conf/log4j.properties:
log4j.appender.DRFAAUDIT=org.apache.log4j.ConsoleAppender
log4j.appender.DRFAAUDIT.layout=org.apache.log4j.PatternLayout
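After adding these two lines, restart the NameNode so the new appender is picked up:

    /etc/init.d/hadoop-hdfs-namenode restart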

VII. Summary

Installation-wise, Hadoop 1.0 and the CDH5 version installed here differ quite a bit, but fortunately it is not that troublesome: take it step by step, and ask around when you run into problems.