Learning Hadoop: Setting up a Single Node Cluster

Hadoop Notes · 2015/12/18 · Hadoop, CentOS


I started learning Hadoop back in 2014, but only by reading material, installing Hadoop instances and clusters, and running a few tests; it was never systematic. I have now decided to record the learning process systematically in written form. When compiling notes I prefer not to copy other people's tutorials at length: I write down only the core content, my own testing process, and my own understanding. The material I draw on is cited in the "References" section at the end of each article.


Searching Baidu Baike for the keyword "Hadoop" and reading its "Hadoop" entry is a reasonable entry point; my study of Hadoop starts from there.


#Prerequisites

Hadoop supports both the Linux and Windows platforms. Here I install and study it on Linux, using a CentOS 6.7 virtual machine.

Install JDK 1.7 and configure its environment variables.

Install ssh and make sure the sshd service is running.
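
Before moving on, it is worth verifying the environment. The sketch below (my own addition, not part of the Hadoop guide) checks that a JDK is on the PATH and whether JAVA_HOME is set, and reports what it finds; it can run on any host:

```shell
#!/bin/sh
# Pre-flight check before installing Hadoop (a sketch; nothing here is
# Hadoop-specific, so it is safe to run anywhere).

# Is a JDK on the PATH?
if command -v java >/dev/null 2>&1; then
    JAVA_STATUS=$(java -version 2>&1 | head -n 1)
else
    JAVA_STATUS="java not found on PATH"
fi
echo "java:      $JAVA_STATUS"

# Is JAVA_HOME set? Hadoop's scripts read it from hadoop-env.sh, but
# having it in the shell environment helps when debugging.
JH=${JAVA_HOME:-"<not set>"}
echo "JAVA_HOME: $JH"
```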


#Install ssh and rsync

On CentOS the OpenSSH packages are named openssh-server and openssh-clients (there is no package literally called "ssh"):

[root@centos1 ~]# yum install openssh-server openssh-clients

[root@centos1 ~]# yum install rsync


#Installation and configuration

Download hadoop-2.7.1.tar.gz from http://hadoop.apache.org/releases.html and extract it to /data/hadoop-2.7.1.


Check JAVA_HOME:

[root@centos1 ~]# echo $JAVA_HOME

/usr/java/jdk1.7.0_79


[root@centos1 ~]# cd /data/hadoop-2.7.1/etc/hadoop

[root@centos1 hadoop]# vi hadoop-env.sh

# The java implementation to use.

#export JAVA_HOME=${JAVA_HOME}

export JAVA_HOME=/usr/java/jdk1.7.0_79


#Check the Hadoop version

[root@centos1 ~]# cd /data/hadoop-2.7.1/bin

[root@centos1 bin]# ./hadoop version

Hadoop 2.7.1

Subversion https://git-wip-us.apache.org/repos/asf/hadoop.git -r 15ecc87ccf4a0228f35af08fc56de536e6ce657a

Compiled by jenkins on 2015-06-29T06:04Z

Compiled with protoc 2.5.0

From source with checksum fc0a1a23fc1868e4d5ee7fa2b28a58a

This command was run using /data/hadoop-2.7.1/share/hadoop/common/hadoop-common-2.7.1.jar


#Standalone Operation

By default Hadoop is configured to run as a single Java process in non-distributed mode, operating directly on the local filesystem, which is useful for debugging:

[root@centos1 hadoop-2.7.1]# mkdir input

[root@centos1 hadoop-2.7.1]# cp etc/hadoop/*.xml input

[root@centos1 hadoop-2.7.1]# bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.1.jar grep input output 'dfs[a-z.]+'

[root@centos1 hadoop-2.7.1]# cat output/*
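
What the example job computes can be approximated in plain shell, which helps make sense of the output: it extracts every match of the regex dfs[a-z.]+ from the input files and counts the occurrences of each distinct match. The sample file below is made up for the demonstration, standing in for the *.xml files copied into input:

```shell
#!/bin/sh
# Plain-shell approximation of the grep example job: extract all matches
# of dfs[a-z.]+ and count each distinct match, most frequent first.
INPUT=$(mktemp -d)
cat > "$INPUT/sample.xml" <<'EOF'
<name>dfs.replication</name>
<name>dfs.name.dir</name>
<name>dfs.replication</name>
EOF

RESULT=$(grep -ohE 'dfs[a-z.]+' "$INPUT"/*.xml | sort | uniq -c | sort -rn)
echo "$RESULT"
# For this sample: 2 dfs.replication, 1 dfs.name.dir
```

The real job writes similar (count, match) pairs into the output directory.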


#Pseudo-Distributed Operation

Hadoop can also run on a single node in pseudo-distributed mode, where each daemon runs in a separate Java process.

#Edit core-site.xml to set fs.defaultFS, the NameNode address

[root@centos1 hadoop-2.7.1]# vi etc/hadoop/core-site.xml 

<configuration>

    <property>

        <name>fs.defaultFS</name>

        <value>hdfs://centos1:9000</value>

    </property>

</configuration>
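
Note that fs.defaultFS uses the hostname centos1, which must resolve on this machine or the NameNode cannot bind to it. A sketch of the check, demonstrated on a temporary copy so it is safe to run anywhere; on the real host the target is /etc/hosts, and 192.168.1.10 is a placeholder for this host's actual IP:

```shell
#!/bin/sh
# The hostname in fs.defaultFS (centos1) must resolve locally.
# Demonstrated on a temporary copy of the hosts file; on the real host
# the target is /etc/hosts and 192.168.1.10 is a placeholder IP.
HOSTS_FILE=$(mktemp)
printf '127.0.0.1 localhost\n' > "$HOSTS_FILE"

if ! grep -qw 'centos1' "$HOSTS_FILE"; then
    echo '192.168.1.10 centos1' >> "$HOSTS_FILE"
fi
cat "$HOSTS_FILE"
```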


#Edit hdfs-site.xml to set the replication factor to 1 (dfs.replication is the number of replicas kept per block, not the number of slave nodes)

[root@centos1 hadoop-2.7.1]# vi etc/hadoop/hdfs-site.xml

<configuration>

    <property>

        <name>dfs.replication</name>

        <value>1</value>

    </property>

</configuration>


#Set up passwordless ssh to localhost

[root@centos1 hadoop-2.7.1]# ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa

[root@centos1 hadoop-2.7.1]# cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys

[root@centos1 hadoop-2.7.1]# export HADOOP_PREFIX=/data/hadoop-2.7.1

[root@centos1 hadoop-2.7.1]# ssh localhost
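
If `ssh localhost` still prompts for a password after the key has been appended, the usual culprit on CentOS is file permissions: sshd ignores authorized_keys unless ~/.ssh is mode 700 and the file itself is mode 600. A sketch of the fix, shown on a scratch directory so it is safe to run anywhere (on the real host use ~/.ssh instead of $SSH_DIR, and the key line is a placeholder):

```shell
#!/bin/sh
# sshd requires strict permissions on the key material: the directory
# must be 0700 and authorized_keys 0600. Demonstrated on a scratch
# directory; on the real host use ~/.ssh instead of $SSH_DIR.
SSH_DIR=$(mktemp -d)
echo 'ssh-dss AAAA... root@centos1' > "$SSH_DIR/id_dsa.pub"   # placeholder key

cat "$SSH_DIR/id_dsa.pub" >> "$SSH_DIR/authorized_keys"
chmod 700 "$SSH_DIR"
chmod 600 "$SSH_DIR/authorized_keys"
ls -ld "$SSH_DIR" "$SSH_DIR/authorized_keys"
```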


#Execution

1. Format the filesystem:

[root@centos1 hadoop-2.7.1]# bin/hdfs namenode -format
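
A caution: re-running the format command wipes the NameNode metadata, and with it every file reference in HDFS. A small guard sketch, assuming the default name directory derived from hadoop.tmp.dir (/tmp/hadoop-root/dfs/name when running as root; adjust if dfs.namenode.name.dir is set):

```shell
#!/bin/sh
# Guard against accidentally re-formatting an existing NameNode.
# /tmp/hadoop-root/dfs/name is the 2.x default for the root user
# (derived from hadoop.tmp.dir); adjust if dfs.namenode.name.dir is set.
NAME_DIR=${NAME_DIR:-/tmp/hadoop-root/dfs/name}
if [ -d "$NAME_DIR/current" ]; then
    MSG="NameNode already formatted at $NAME_DIR; refusing to format again"
else
    MSG="no existing metadata at $NAME_DIR; safe to run: bin/hdfs namenode -format"
fi
echo "$MSG"
```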


2. Start the NameNode and DataNode daemons:

[root@centos1 hadoop-2.7.1]# sbin/start-dfs.sh

[root@centos1 hadoop-2.7.1]# jps

53665 NameNode

54961 Jps

53783 DataNode


3. Browse the web interface for the NameNode; by default it is available at:

NameNode - http://localhost:50070/
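
A quick scripted reachability check for the UI (my own sketch, assuming curl is installed; it reports 000 when no connection could be made, which is also what curl itself prints on a refused connection):

```shell
#!/bin/sh
# Probe the NameNode web UI. curl prints the HTTP status code, or 000
# when no connection could be made; if curl itself is missing we also
# fall back to 000 rather than failing.
UI_URL="http://localhost:50070/"
CODE=$(curl -s -o /dev/null -w '%{http_code}' "$UI_URL" 2>/dev/null)
CODE=${CODE:-000}
echo "NameNode UI at $UI_URL returned HTTP $CODE"
```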


4. Make the HDFS directories required to execute MapReduce jobs:

[root@centos1 hadoop-2.7.1]# bin/hdfs dfs -mkdir /user

[root@centos1 hadoop-2.7.1]# bin/hdfs dfs -mkdir /user/root

[root@centos1 hadoop-2.7.1]# bin/hdfs dfs -mkdir /user/root/input


5. Copy the input files (here, the files under etc/hadoop) into the distributed filesystem:

[root@centos1 hadoop-2.7.1]# bin/hdfs dfs -put etc/hadoop input


6. Run one of the provided examples:

[root@centos1 hadoop-2.7.1]# bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.1.jar grep input output 'dfs[a-z.]+'


7. Examine the output files: copy them from the distributed filesystem to the local filesystem and inspect them:

[root@centos1 hadoop-2.7.1]# bin/hdfs dfs -get output output

[root@centos1 hadoop-2.7.1]# cat output/*


8. When you're done, stop the daemons:

[root@centos1 hadoop-2.7.1]# sbin/stop-dfs.sh


#YARN on a Single Node

A MapReduce job can also be run on YARN in pseudo-distributed mode by setting a few extra parameters and starting the ResourceManager and NodeManager daemons in addition to the HDFS daemons.

1. Configure the following parameters:

[root@centos1 hadoop-2.7.1]# vi etc/hadoop/mapred-site.xml

<configuration>

    <property>

        <name>mapreduce.framework.name</name>

        <value>yarn</value>

    </property>

</configuration>

[root@centos1 hadoop-2.7.1]# vi etc/hadoop/yarn-site.xml

<configuration>

    <property>

        <name>yarn.nodemanager.aux-services</name>

        <value>mapreduce_shuffle</value>

    </property>

</configuration>


2. Start the ResourceManager and NodeManager daemons:

[root@centos1 hadoop-2.7.1]# sbin/start-yarn.sh

[root@centos1 hadoop-2.7.1]# jps

56479 Jps

56386 NodeManager

56170 ResourceManager


3. Browse the web interface for the ResourceManager; by default it is available at:

ResourceManager - http://localhost:8088/


4. Run a MapReduce job (this was already done above in the pseudo-distributed section).


5. When you're done, stop the daemons:

[root@centos1 hadoop-2.7.1]# sbin/stop-yarn.sh


This concludes the initial study and testing of the Hadoop distributed filesystem.


#References

http://hadoop.apache.org/

http://blog.fens.me/hadoop-family-roadmap/



