Part 1: Environment Preparation
Assume the cluster consists of five machines, named master, slave1, slave2, slave3, and slave4.
1. Create a user account named hadoop on every machine.
2. Set each machine's hostname in /etc/sysconfig/network.
For example, on the master machine:
NETWORKING=yes
HOSTNAME=master    (the name is arbitrary; pick something easy to remember)
On slave1:
NETWORKING=yes
HOSTNAME=slave1
On slave2:
... and so on for the remaining machines.
After editing the file, remember to run hostname master (or hostname slave1, etc., with the new name) on the corresponding machine so the change takes effect without a reboot, as in the sketch below.
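For example, on master (a minimal sketch; substitute the right name on each machine):
# rewrite the HOSTNAME line in /etc/sysconfig/network
sed -i 's/^HOSTNAME=.*/HOSTNAME=master/' /etc/sysconfig/network
# apply the new name to the running system without a reboot
hostname master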
3. Edit /etc/hosts on every machine so that all machines can resolve one another by name. Note that the master and every slave must all be updated, and the hosts file should have identical contents on all machines, for example:
192.168.30.60  master
192.168.30.61  slave1
192.168.30.62  slave2
192.168.30.63  slave3
192.168.30.64  slave4
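A quick way to confirm that every name resolves and every host is reachable (a small sketch using the hostnames above; run it on each machine):
for h in master slave1 slave2 slave3 slave4; do
    # getent consults /etc/hosts; ping confirms the host is up
    getent hosts $h && ping -c 1 -W 2 $h >/dev/null && echo "$h ok"
done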

4. Set up passwordless ssh login
Hadoop uses ssh to log in to each node and start its services, so make sure all of the nodes in the cluster can reach one another over the network.
Make sure ssh is installed on every machine.
(1) Log in to the master machine as the hadoop user.
(2) Run ssh-keygen -t rsa and press Enter at every prompt (do not type any characters). This generates the private key id_rsa and the public key id_rsa.pub under /home/hadoop/.ssh.
Example contents of id_rsa.pub:
ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQEA3XYLxqxNfltkbKuCpJJDTuQekVJ0L3XA6dLoLQpPLbZxJNQ7DsogcMYM9opg+R1baTMvm1Cbj/cfIwELHPSRLFjN7E6x9S7PWnS2tObXosBNZ/eo6+eZiAF0h0LL+1Rsfsne2cP3amhdztbudSzm1ezLRPBLNUh0FKwDjbgnK2ZZy49h6vCvOZRKJPQf+B3xTSTbix/omalecCdYc1bCFvifOy1pgWVchKSQsynN0V901dA7CAfIjsAKc4DfyGcdoFNFp+POz6+q4AiYUmO+QTh7wPRa2vTg6FRlaaqvTUfnep6prFSVPe/Jh6dt6yyH0k7sIPDIl/kca6cZX0YgNw== hadoop@master
(3) Append the public key id_rsa.pub to authorized_keys:
cat /home/hadoop/.ssh/id_rsa.pub >>/home/hadoop/.ssh/authorized_keys
(4) Copy authorized_keys to each of the slave machines:
scp /home/hadoop/.ssh/authorized_keys hadoop@192.168.30.61:/home/hadoop/.ssh/
scp /home/hadoop/.ssh/authorized_keys hadoop@192.168.30.62:/home/hadoop/.ssh/
... and so on. First make sure the .ssh directory exists on each slave; if it does not, create it by hand.
(5) Set directory permissions (on all machines):
chmod 750 /home/hadoop
chmod 750 /home/hadoop/.ssh
chmod 644 /home/hadoop/.ssh/authorized_keys
(6) Verify that ssh works
On the master machine, run ssh slave1.
If it logs you in without asking for a password, the setup succeeded.
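Steps (3) through (6) can also be scripted from master. A minimal sketch, run as the hadoop user (it assumes the slave IPs above; you will type the hadoop password once per slave, after which logins are passwordless):
for ip in 192.168.30.61 192.168.30.62 192.168.30.63 192.168.30.64; do
    # create .ssh on the slave if it is missing, then push the key file
    ssh hadoop@$ip 'mkdir -p ~/.ssh && chmod 750 ~/.ssh'
    scp /home/hadoop/.ssh/authorized_keys hadoop@$ip:/home/hadoop/.ssh/
    # verify that passwordless login now works
    ssh hadoop@$ip hostname
done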

5. Install the JDK
This is the same as any ordinary JDK installation:
download a recent JDK, run the installer, set the environment variables, and so on.
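For example, after unpacking the JDK, append the following to /etc/profile on every machine (the install path below is only an example; use wherever you actually installed it):
# JDK environment; adjust JAVA_HOME to your install path
export JAVA_HOME=/usr/java/jdk1.6.0_45
export PATH=$JAVA_HOME/bin:$PATH
Then run source /etc/profile and confirm with java -version on each machine.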

Part 2: Installing Hadoop
1. Get the CDH3 yum repository and install Hadoop
(1)wget -c http://archive.cloudera.com/redhat/cdh/cdh3-repository-1.0-1.noarch.rpm
(2) yum --nogpgcheck localinstall cdh3-repository-1.0-1.noarch.rpm   // this installs the cloudera-cdh3.repo file
(3) rpm --import http://archive.cloudera.com/redhat/cdh/RPM-GPG-KEY-cloudera  // import the RPM GPG key
(4)yum install hadoop-0.20
(5) yum install hadoop-0.20-namenode    (install on the machine that will be the namenode; it is configured in /etc/hadoop/conf/core-site.xml, covered below)
        yum install hadoop-0.20-datanode     (install on all of the slave machines; it can also be installed on the namenode machine so the namenode doubles as a datanode)
        yum install hadoop-0.20-jobtracker   (install on the machine that will be the jobtracker; its address is configured in /etc/hadoop/conf/mapred-site.xml, covered below)
        yum install hadoop-0.20-tasktracker  (install on every datanode machine)
        Each role gets its own service package: any machine running a datanode also needs a tasktracker, and the namenode machine can serve as a datanode as well.
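If root can ssh to the slaves, the two slave-side packages can be installed in one loop rather than machine by machine. A sketch (root ssh access is an assumption; otherwise just run the two yum commands on each slave by hand):
for h in slave1 slave2 slave3 slave4; do
    # every datanode also runs a tasktracker
    ssh root@$h 'yum -y install hadoop-0.20-datanode hadoop-0.20-tasktracker'
done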
2. Edit the configuration files (HDFS side)
// the slaves configuration file only needs to be set up on the namenode
cat /etc/hadoop/conf/slaves
192.168.30.61
192.168.30.62
192.168.30.63
192.168.30.64
cat /etc/hadoop/conf/masters 
192.168.30.60
3. Edit the /etc/hadoop/conf/hdfs-site.xml configuration file:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>

<property>
  <name>dfs.replication</name>
  <value>1</value>
</property>

<property>
  <name>dfs.permissions</name>
  <value>false</value>
</property>

<!-- Immediately exit safemode as soon as one DataNode checks in.
     On a multi-node cluster, these configurations must be removed. -->
<property>
  <name>dfs.safemode.extension</name>
  <value>0</value>
</property>

<property>
  <name>dfs.safemode.min.datanodes</name>
  <value>1</value>
</property>

<!--
<property>
  specify this so that running 'hadoop namenode -format' formats the right dir
  <name>dfs.name.dir</name>
  <value>/var/lib/hadoop-0.20/cache/hadoop/dfs/name</value>
</property>
-->

<!-- add by dongnan -->

<property>
  <name>dfs.data.dir</name>
  <value>/data/dfs/data</value>
</property>

<property>
  <name>hadoop.tmp.dir</name>
  <value>/data/dfs/tmp</value>
</property>

<property>
  <name>dfs.datanode.max.xcievers</name>
  <value>200000</value>
</property>

</configuration>
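The directories named in dfs.data.dir and hadoop.tmp.dir must exist and be writable by the daemon account on every datanode before the services start. A minimal sketch (ownership by the hadoop account is an assumption based on the user created earlier; some CDH3 packages run the datanode as a dedicated hdfs user instead):
# run as root on each datanode
mkdir -p /data/dfs/data /data/dfs/tmp
chown -R hadoop:hadoop /data/dfs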
4. Edit the /etc/hadoop/conf/core-site.xml configuration file. fs.default.name must point at the namenode machine by a name every node can resolve; with the /etc/hosts entries above, that is master:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>

<property>
  <name>fs.default.name</name>
  <value>hdfs://master:8020</value>
</property>

</configuration>
5. Edit /etc/hadoop/conf/mapred-site.xml. mapred.job.tracker points at the jobtracker machine (slave1, 192.168.30.61, in this cluster):
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>

<property>
  <name>mapred.job.tracker</name>
  <value>192.168.30.61:9001</value>
</property>

<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx1024m -XX:+UseConcMarkSweepGC</value>
</property>

<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>1</value>
</property>

<property>
  <name>mapred.tasktracker.reduce.tasks.maximum</name>
  <value>1</value>
</property>

<property>
  <name>mapred.local.dir</name>
  <value>/data1/hdfs/</value>
  <description>The local directory where MapReduce stores intermediate
  data files. May be a comma-separated list of directories on different
  devices in order to spread disk i/o. Directories that do not exist
  are ignored.</description>
</property>

<property>
  <name>mapreduce.jobtracker.staging.root.dir</name>
  <value>/user</value>
</property>

<property>
  <name>mapred.system.dir</name>
  <value>/mapred/system</value>
</property>

<property>
  <name>io.sort.mb</name>
  <value>256</value>
  <description>The total amount of buffer memory to use while sorting
  files, in megabytes. By default, gives each merge stream 1MB, which
  should minimize seeks.</description>
</property>

<property>
  <name>io.sort.factor</name>
  <value>64</value>
</property>

<property>
  <name>mapred.max.map.failures.percent</name>
  <value>10</value>
</property>

<property>
  <name>mapred.job.reuse.jvm.num.tasks</name>
  <value>1</value>
  <description>Number of tasks to run per JVM. The default is 1; if set
  to -1, there is no limit.</description>
</property>

<property>
  <name>mapred.reduce.parallel.copies</name>
  <value>64</value>
</property>

<!--
<property>
  <name>job.end.notification.url</name>
  <value>http://182.61.128.18:50030/test_url.jsp?jobid=$jobId&amp;jobStatus=$jobStatus</value>
  <description>URL to call when a job finishes; $jobId and $jobStatus
  are substituted with the job's id and final status.</description>
</property>
-->

<!--
<property>
  <name>mapred.jobtracker.taskScheduler</name>
  <value>org.apache.hadoop.mapred.CapacityTaskScheduler</value>
</property>

<property>
  <name>mapred.queue.names</name>
  <value>default,ca</value>
</property>
-->

</configuration>
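Before starting the daemons for the first time, HDFS must be formatted on the namenode machine. A minimal sketch (run it as the user the namenode daemon runs as; that is the hadoop account in this walkthrough, though some CDH3 packages use a dedicated hdfs user):
# one-time step; re-running it destroys existing HDFS metadata
sudo -u hadoop hadoop namenode -format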

6. Start the corresponding Hadoop services
[root@master ~]# /etc/init.d/hadoop-0.20-namenode start      (1 namenode)
[root@slave1 /]# /etc/init.d/hadoop-0.20-datanode start      (4 datanodes)
[root@slave2 /]# /etc/init.d/hadoop-0.20-datanode start
[root@slave1 /]# /etc/init.d/hadoop-0.20-tasktracker start   (4 tasktrackers, one on each datanode)
[root@slave1 /]# /etc/init.d/hadoop-0.20-jobtracker start    (1 jobtracker)
Start the services that match each machine's role.
7. That's it; the installation is complete. Check the web UIs:
http://192.168.30.60:50070/                 (namenode)
http://192.168.30.61:50030/jobtracker.jsp   (jobtracker)
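To verify the cluster end to end, run a quick smoke test from the namenode (a sketch; the examples jar path below is the usual CDH3 package location, adjust it if yours differs):
# should report 4 live datanodes
hadoop dfsadmin -report
# run a tiny MapReduce job: estimate pi with 2 maps, 10 samples each
hadoop jar /usr/lib/hadoop-0.20/hadoop-examples.jar pi 2 10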