
Hadoop 2.7.1 HA Environment Setup (HDFS)

Written by donaldhan, 2019-01-13 17:02

Introduction

To be honest, I first came into contact with big data back in July 2015, but at the time my fundamentals were too weak, so I set big data aside to shore them up. Three years have slipped by since then; whether that was the right choice I will not debate here. My fundamentals are no longer the problem, so I am picking big data back up. These days many people choose to work hard in the hope of achieving financial freedom. For most of us, hard work does not guarantee financial freedom, but at least it keeps food on the table. I recently read an article, "Your way of thinking determines your level", which makes a lot of sense. The levels, from high to low, are:

  • Spirit

Leaders and thinkers: they live to change the world.

  • Identity

The awakened: "Because I am XX, I will do XX."

  • BVR (beliefs, values, rules)

Strategists: "What matters most?"

  • Capability

Tacticians: "There are always more solutions than problems."

  • Behavior

Doers: "I am just not working hard enough."

  • Environment

Complainers: "It is all your fault."

Ordinary people climb from environment toward spirit; the exceptional work from spirit down to environment. I am currently at the behavior level, and the next stage is capability, the tactician. Since I am still an ordinary person, I would like to operate like the exceptional ones, but I have not yet positioned myself. Until I do, I will start from action. This article uses Hadoop 2.7.1; for details see hadoop-doc. To learn about HDFS you can read the Hadoop基础知识 post, which only covers version 1.0; for 2.0, consult the official site.

Contents

Environment Preparation

System Configuration

  • Computer 1 (Lenovo), Windows 7 64-bit, 8 GB RAM; a VM on this machine runs the nameNode system.
  • Computer 1 (Lenovo), Windows 7 64-bit, 8 GB RAM; a VM on this machine runs the secondlyNameNode (standby NameNode) system.
  • Computer 1 (Lenovo), Windows 7 64-bit, 8 GB RAM; a VM on this machine runs the resourceManager system.
  • Virtual machine: VMware 12.0
  • Hadoop 2.7.1
  • Zookeeper 3.4.12

Note that the VM memory and CPU allocation must be generous; otherwise running MR jobs will take forever.

Cluster Plan

Ideally the JournalServer would be a dedicated machine; the hosts listed in the slaves file act as JournalNodes (which store the NameNode's edit metadata). ZooKeeper is configured on the JournalNode hosts as well as on the active and standby NameNodes.

| Hostname | IP | Installed software | Running processes |
| --- | --- | --- | --- |
| nameNode | 192.168.5.131 | JDK, Hadoop, ZooKeeper | NameNode, DFSZKFailoverController, DataNode, JobHistoryServer, NodeManager, JournalNode, QuorumPeerMain |
| secondlyNameNode | 192.168.5.132 | JDK, Hadoop, ZooKeeper | NameNode, DFSZKFailoverController, DataNode, NodeManager, JournalNode, QuorumPeerMain |
| resourceManager | 192.168.5.133 | JDK, Hadoop, ZooKeeper | DataNode, NodeManager, JournalNode, QuorumPeerMain, ResourceManager |

Note: in Hadoop 2.0 an HA cluster usually consists of two NameNodes, one active and one standby. The active NameNode serves client requests, while the standby NameNode does not; it only mirrors the active NameNode's state so that it can take over quickly on failure. Hadoop 2.0 offers two official HDFS HA solutions, one based on NFS and one based on QJM; here we use the simpler QJM approach. In this scheme the active and standby NameNodes share edit-log metadata through a group of JournalNodes, and a write is considered successful once it reaches a majority of them, so an odd number of JournalNodes is usually configured. A ZooKeeper ensemble is also set up for the ZKFC (DFSZKFailoverController) to drive failover: when the active NameNode goes down, the standby NameNode is automatically promoted to active.
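Once the cluster described below is running, a quick way to observe this failover machinery is the `hdfs haadmin` tool. This is a hedged sketch; it assumes the NameNode IDs nm and snm configured in hdfs-site.xml later in this post:

```
# query which NameNode currently holds which role
hdfs haadmin -getServiceState nm     # typically prints "active"
hdfs haadmin -getServiceState snm    # typically prints "standby"
```

With automatic failover enabled, stopping the active NameNode (or its ZKFC) should flip these roles; the appendix also shows `hdfs haadmin -transitionToActive --forcemanual nm` for the case where both nodes end up in standby.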

1. Set the hostnames of the three machines: on nameNode, secondlyNameNode and resourceManager, edit /etc/hostname (vim /etc/hostname) and set the hostname to nameNode, secondlyNameNode and resourceManager respectively, as shown below:

donaldhan@nameNode:~$ cat /etc/hostname
nameNode

2. On nameNode, secondlyNameNode and resourceManager, edit /etc/hosts (vim /etc/hosts) and map the hostnames nameNode, secondlyNameNode and resourceManager to their IP addresses, as shown below:

donaldhan@nameNode:~$ cat /etc/hosts
127.0.0.1	localhost
192.168.5.131  nameNode
192.168.5.132 secondlyNameNode
192.168.5.133 resourceManager
192.168.5.131 ns
# The following lines are desirable for IPv6 capable hosts
#::1     ip6-localhost ip6-loopback
#fe00::0 ip6-localnet
#ff00::0 ip6-mcastprefix
#ff02::1 ip6-allnodes
#ff02::2 ip6-allrouters
donaldhan@nameNode:~$

Note that the hosts file contains this extra line:

192.168.5.131 ns

This is because our HDFS HA nameservice is called ns. With the highly available HDFS I could not reach HDFS with the hdfs command from the namenode, so I added this line; it also needs to be added on secondlyNameNode and resourceManager, since they double as datanodes. Otherwise the following exception is thrown when running an MR job:

-mkdir: java.net.UnknownHostException: ns
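As a quick sanity check that the ns nameservice resolves on a client, the commands below can be used. This is a hedged sketch; it assumes the core-site.xml and hdfs-site.xml settings configured later in this post have already been distributed to the node:

```
# list the NameNode IDs registered under the ns nameservice
hdfs getconf -confKey dfs.ha.namenodes.ns      # expect: nm,snm

# address HDFS explicitly through the nameservice URI
hdfs dfs -ls hdfs://ns/
```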
  1. Verify that the machines can ping each other.
  2. Install SSH and generate a key pair on nameNode (you can simply copy ~/.ssh to secondlyNameNode and resourceManager so that all nodes share the same key pair):
                      ssh-keygen  -t  dsa  -P  ''  -f  ~/.ssh/id_dsa
                      cat  ~/.ssh/id_dsa.pub  >>  ~/.ssh/authorized_keys
    

    Copy the public key to secondlyNameNode and resourceManager and do the same there (ideally share one key pair):

                  scp  -r  /donaldhan/.ssh   donaldhan@secondlyNameNode:/root/
                  scp  -r  /donaldhan/.ssh/id_dsa.pub  donaldhan@secondlyNameNode:/donaldhan/.ssh/id_dsa.pub
                  scp  -r  /donaldhan/.ssh/id_dsa.pub  donaldhan@resourceManager:/donaldhan/.ssh/id_dsa.pub
    

    Verify that ssh secondlyNameNode and ssh resourceManager work in both directions without a password (see the loop sketch below). If the slaves file includes the local host itself, also run:

      ssh nameNode
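A compact way to check passwordless access to all three hosts at once; a hedged sketch assuming the hostnames and user from this post:

```
# each command should print the remote hostname without prompting for a password
for h in nameNode secondlyNameNode resourceManager; do
  ssh donaldhan@"$h" hostname
done
```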
    

scp commands:

// scp from source to destination(local)
scp donaldhan@nameNode:/donaldhan/.ssh/id_dsa.pub  ~/.ssh/data_dsa.pub
// scp from source(local) to destination
 scp  -r  /donaldhan/.ssh/id_dsa.pub  donaldhan@resourceManager:/donaldhan/.ssh/id_dsa.pub

Note: scp only works once SSH connectivity is in place. Common errors:

  • Please contact your system administrator. Add correct host key in /root/.ssh/known_hosts to get rid of this message. Offending ECDSA key in /root/.ssh/known_hosts:2 remove with: ssh-keygen -f "/root/.ssh/known_hosts" -R sname

Solution: run ssh-keygen -f "/root/.ssh/known_hosts" -R sname, or delete line 2 of /root/.ssh/known_hosts.

  • Warning: the ECDSA host key for 'sname' differs from the key for the IP address '192.168.32.138' Offending key for IP in /root/.ssh/known_hosts:2 Matching host key in /root/.ssh/known_hosts:5 Are you sure you want to continue connecting (yes/no)? yes Welcome to Ubuntu 15.10 (GNU/Linux 4.2.0-16-generic x86_64) Documentation: https://help.ubuntu.com/ 82 packages can be updated. 42 updates are security updates. Last login: Fri Dec 11 22:30:00 2015 from 192.168.32.138

Solution: delete line 2 of /root/.ssh/known_hosts.

  • Your id_dsa has mode 755 and cannot be used.

Solution: chmod 700 ~/.ssh/id_dsa (tighten the private key file permissions).

5. Disable IPv6:

1. cat /proc/sys/net/ipv6/conf/all/disable_ipv6 prints 0 if IPv6 is enabled and 1 if it is disabled.

2. Add the following lines to /etc/sysctl.conf and reboot:

#disable IPv6
net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
net.ipv6.conf.lo.disable_ipv6 = 1

3. sudo vim /etc/default/grub

4. Change GRUB_CMDLINE_LINUX_DEFAULT="quiet splash" to GRUB_CMDLINE_LINUX_DEFAULT="ipv6.disable=1 quiet splash".

5. Save (wq) and run sudo update-grub.

6. Restart the network service; IPv6 is now disabled. You can check with

               ip a | grep inet6

  If it prints nothing, IPv6 has been disabled successfully.

For more details see: disabling IPv6 on Linux.

Installing and Configuring the ZooKeeper Cluster

  1. Extract the ZooKeeper tarball into /bdp/zookeeper:
     tar -zxvf zookeeper-3.4.12.tar.gz
     mv zookeeper-3.4.12 /bdp/zookeeper/zookeeper-3.4.12

  2. Edit the ZooKeeper configuration zoo.cfg in /bdp/zookeeper/zookeeper-3.4.12/conf, as shown below (the listing is truncated; see the note after it for the ensemble entries):
donaldhan@nameNode:/bdp/zookeeper/zookeeper-3.4.12/conf$ cat zoo.cfg
# The number of milliseconds of each tick
tickTime=2000
# The number of ticks that the initial
# synchronization phase can take
initLimit=10
# The number of ticks that can pass between
# sending a request and getting an acknowledgement
syncLimit=5
# the directory where the snapshot is stored.
# do not use /tmp for storage, /tmp here is just
# example sakes.
dataDir=/bdp/zookeeper/data
# the port at which the clients will connect
clientPort=2181
# the maximum number of client connections.
# increase this if you need to handle more clients
#maxClientCnxns=60
#
# Be sure to read the maintenance section of the
# administrator guide before turning on autopurge.
#
# http://zookeeper.apache.org/doc/current/zookeeperAdmin.html#sc_maintenance
#
# The number of snapshots to retain in dataDir
#autopurge.snapRetainCount=3
# Purge task interval in hours
# Set to "0" to disable auto purge feature

  3. Create a tmp directory under /bdp/zookeeper/zookeeper-3.4.12: mkdir /bdp/zookeeper/zookeeper-3.4.12/tmp
  4. Create an empty file myid in /bdp/zookeeper/zookeeper-3.4.12/tmp and write 1 into it: vim /bdp/zookeeper/zookeeper-3.4.12/tmp/myid
  5. Copy the configured ZooKeeper to secondlyNameNode and resourceManager: scp -r /bdp/zookeeper/zookeeper-3.4.12 donaldhan@secondlyNameNode:/bdp/zookeeper/zookeeper-3.4.12 and scp -r /bdp/zookeeper/zookeeper-3.4.12 donaldhan@resourceManager:/bdp/zookeeper/zookeeper-3.4.12
  6. On secondlyNameNode and resourceManager, change myid to 2 and 3 respectively (see the sketch after this list).
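A hedged sketch of the same myid setup as shell commands, assuming IDs 1/2/3 matching the server.N entries sketched above:

```
# on nameNode
mkdir -p /bdp/zookeeper/zookeeper-3.4.12/tmp
echo 1 > /bdp/zookeeper/zookeeper-3.4.12/tmp/myid

# after copying the zookeeper directory to the other two nodes
ssh donaldhan@secondlyNameNode "echo 2 > /bdp/zookeeper/zookeeper-3.4.12/tmp/myid"
ssh donaldhan@resourceManager  "echo 3 > /bdp/zookeeper/zookeeper-3.4.12/tmp/myid"
```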

Installing and Configuring the Hadoop Cluster

  1. Extract the Hadoop tarball into /bdp/hadoop:
    tar -zxvf hadoop-2.7.1.tar.gz -C /bdp/hadoop
    
  2. Configure the Hadoop environment variables in ~/.bashrc, as shown below:
    # private path
    export JAVA_HOME=/usr/share/jdk1.8.0_191
    export JRE_HOME=${JAVA_HOME}/jre
    export CLASS_PATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib
    export HADOOP_HOME=/bdp/hadoop/hadoop-2.7.1
    export ZOOKEEPER_HOME=/bdp/zookeeper/zookeeper-3.4.12
    export PATH=${JAVA_HOME}/bin:${HADOOP_HOME}/bin:${HADOOP_HOME}/sbin:${ZOOKEEPER_HOME}/bin:${PATH}
    export HADOOP_MAPRED_HOME=${HADOOP_HOME}
    export HADOOP_COMMON_HOME=${HADOOP_HOME}
    export HADOOP_HDFS_HOME=${HADOOP_HOME}
    export HADOOP_COMMON_LIB_NATIVE_DIR=${HADOOP_HOME}/lib/native
    export YARN_HOME=${HADOOP_HOME}
    export HADOOP_OPTS="-Djava.library.path=${HADOOP_HOME}/lib/native"
    

    All Hadoop 2.7.1 configuration files live in /bdp/hadoop/hadoop-2.7.1/etc/hadoop (cd /bdp/hadoop/hadoop-2.7.1/etc/hadoop).
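    After sourcing ~/.bashrc, a quick hedged check that the PATH and variables are picked up:

    source ~/.bashrc
    hadoop version        # should report Hadoop 2.7.1
    which hdfs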

  3. Edit hadoop-env.sh and set the JDK home directory:
    export JAVA_HOME=/usr/share/jdk1.8.0_191
    
  4. Edit core-site.xml:

```
donaldhan@nameNode:/bdp/hadoop/hadoop-2.7.1/etc/hadoop$ cat core-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
  <property><name>fs.defaultFS</name><value>hdfs://ns</value></property>
  <property><name>hadoop.tmp.dir</name><value>/bdp/hadoop/tmp</value></property>
  <property><name>hadoop.proxyuser.hadoop.hosts</name><value>*</value></property>
  <property><name>hadoop.proxyuser.hadoop.groups</name><value>*</value></property>
  <property><name>ha.zookeeper.quorum</name><value>nameNode:2181,secondlyNameNode:2181,resourceManager:2181</value></property>
</configuration>
donaldhan@nameNode:/bdp/hadoop/hadoop-2.7.1/etc/hadoop$
```

5. Edit hdfs-site.xml:

```
donaldhan@nameNode:/bdp/hadoop/hadoop-2.7.1/etc/hadoop$ cat hdfs-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property><name>dfs.nameservices</name><value>ns</value></property>
  <property><name>dfs.ha.namenodes.ns</name><value>nm,snm</value></property>
  <property><name>dfs.namenode.rpc-address.ns.nm</name><value>nameNode:8020</value></property>
  <property><name>dfs.namenode.http-address.ns.nm</name><value>nameNode:50070</value></property>
  <property><name>dfs.namenode.rpc-address.ns.snm</name><value>secondlyNameNode:8020</value></property>
  <property><name>dfs.namenode.http-address.ns.snm</name><value>secondlyNameNode:50070</value></property>
  <property><name>dfs.namenode.name.dir</name><value>/bdp/hadoop/dfs/name</value></property>
  <property><name>dfs.datanode.data.dir</name><value>/bdp/hadoop/dfs/data</value></property>
  <property><name>dfs.replication</name><value>3</value></property>
  <property><name>dfs.webhdfs.enabled</name><value>true</value></property>
  <property><name>dfs.namenode.shared.edits.dir</name><value>qjournal://nameNode:8485;secondlyNameNode:8485;resourceManager:8485/ns</value></property>
  <property><name>dfs.journalnode.edits.dir</name><value>/bdp/hadoop/journal</value></property>
  <property><name>dfs.ha.automatic-failover.enabled</name><value>true</value></property>
  <property><name>dfs.client.failover.proxy.provider.ns</name><value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value></property>
  <property>
    <name>dfs.ha.fencing.methods</name>
    <value>
      sshfence
      shell(/bin/true)
    </value>
  </property>
  <property><name>dfs.ha.fencing.ssh.private-key-files</name><value>/home/donaldhan/.ssh/id_dsa</value></property>
  <property><name>dfs.ha.fencing.ssh.connect-timeout</name><value>30000</value></property>
</configuration>
donaldhan@nameNode:/bdp/hadoop/hadoop-2.7.1/etc/hadoop$
```

6. Edit mapred-site.xml:

```
donaldhan@nameNode:/bdp/hadoop/hadoop-2.7.1/etc/hadoop$ cat mapred-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property><name>mapreduce.framework.name</name><value>yarn</value></property>
  <property><name>mapreduce.jobhistory.address</name><value>nameNode:10020</value></property>
  <property><name>mapreduce.jobhistory.webapp.address</name><value>nameNode:19888</value></property>
  <property><name>mapreduce.jobhistory.intermediate-done-dir</name><value>/history/indone</value></property>
  <property><name>mapreduce.jobhistory.done-dir</name><value>/history/done</value></property>
</configuration>
donaldhan@nameNode:/bdp/hadoop/hadoop-2.7.1/etc/hadoop$
```

7. Edit yarn-site.xml:

```
donaldhan@nameNode:/bdp/hadoop/hadoop-2.7.1/etc/hadoop$ cat yarn-site.xml
<?xml version="1.0"?>
<configuration>
  <property><name>yarn.resourcemanager.hostname</name><value>resourceManager</value></property>
  <property><name>yarn.resourcemanager.address</name><value>${yarn.resourcemanager.hostname}:8032</value></property>
  <property><name>yarn.resourcemanager.scheduler.address</name><value>${yarn.resourcemanager.hostname}:8030</value></property>
  <property><name>yarn.resourcemanager.resource-tracker.address</name><value>${yarn.resourcemanager.hostname}:8031</value></property>
  <property><name>yarn.resourcemanager.admin.address</name><value>${yarn.resourcemanager.hostname}:8033</value></property>
  <property><name>yarn.resourcemanager.webapp.address</name><value>${yarn.resourcemanager.hostname}:8088</value></property>
  <property><name>yarn.nodemanager.aux-services</name><value>mapreduce_shuffle</value></property>
</configuration>
```
8. Edit slaves
slaves lists the worker nodes. HDFS is started from nameNode and YARN from resourceManager, so the slaves file on nameNode determines where the DataNodes run, and the slaves file on resourceManager determines where the NodeManagers run.

```
donaldhan@nameNode:/bdp/hadoop/hadoop-2.7.1/etc/hadoop$ cat slaves
nameNode
secondlyNameNode
resourceManager
```

9. Copy the configured Hadoop to secondlyNameNode and resourceManager:

scp -r /bdp/hadoop/hadoop-2.7.1/etc donaldhan@secondlyNameNode:/bdp/hadoop/hadoop-2.7.1
scp -r /bdp/hadoop/hadoop-2.7.1/etc donaldhan@resourceManager:/bdp/hadoop/hadoop-2.7.1

Note: *the following operations must be performed strictly in order.*
10. Start the ZooKeeper cluster (start zk on nameNode, secondlyNameNode and resourceManager),
 starting them in the order nameNode, secondlyNameNode, resourceManager.

./zkServer.sh start     # start the ZooKeeper node
./zkServer.sh status    # check the ZooKeeper state


12. Start the JournalNodes (on nameNode, secondlyNameNode and resourceManager).
Running the command below on nameNode alone is enough; note that it is hadoop-daemons.sh (plural), not hadoop-daemon.sh.

hadoop-daemons.sh start journalnode

Run jps on each node in turn to check that a JournalNode process has appeared.
13. Format HDFS; run the format command on nameNode:

hdfs namenode -format ns

Formatting creates the directories configured in hdfs-site.xml by

dfs.namenode.name.dir  /bdp/hadoop/dfs/name
dfs.datanode.data.dir  /bdp/hadoop/dfs/data

Copy the generated directory to secondlyNameNode and resourceManager under /bdp/hadoop/hadoop-2.7.1:

scp -r /bdp/hadoop/hadoop-2.7.1/etc donaldhan@secondlyNameNode:/bdp/hadoop/hadoop-2.7.1
scp -r /bdp/hadoop/hadoop-2.7.1/etc donaldhan@resourceManager:/bdp/hadoop/hadoop-2.7.1

Note: *do not casually delete the directories produced by formatting; otherwise startup will report inconsistency errors.*
14. Format ZK; run the format command on nameNode:

hdfs zkfc -formatZK
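A hedged check that the format created the HA znode in ZooKeeper, assuming the default parent znode /hadoop-ha:

```
# connect to any ensemble member and look for the nameservice znode
zkCli.sh -server nameNode:2181
# inside the zkCli prompt:
ls /hadoop-ha          # should list [ns]
```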

15. Start HDFS by running start-dfs.sh on nameNode:

start-dfs.sh

16. Start the history server:

hdfs dfs -mkdir /history
hdfs dfs -mkdir /history/indone
hdfs dfs -mkdir /history/done
mr-jobhistory-daemon.sh start historyserver

17. Sync the namenode's metadata to the standby NameNode.
On secondlyNameNode, first synchronize the metadata from the active namenode by running:

donaldhan@sname:/bdp/hadoop/hadoop-2.7.1/etc/hadoop$ hdfs namenode -bootstrapStandby

18. Start the standby NameNode:

donaldhan@sname:/bdp/hadoop/hadoop-2.7.1/etc/hadoop$ hadoop-daemon.sh start namenode

19. Start YARN.
Run start-yarn.sh on resourceManager:

start-yarn.sh

start-yarn.sh is run on resourceManager because the NameNode and the ResourceManager are kept on separate machines for performance reasons: both consume a lot of resources, so they are split apart and each must be started on its own machine.
At this point the HA cluster is up.
20. The Hadoop 2.7.1 configuration is now complete; you can check the deployment through a browser.
namenode: http://namenode:50070
Processes:

donaldhan@nameNode:/bdp/hadoop/hadoop-2.7.1$ jps
2691 ZooKeeperMain
10851 JobHistoryServer
2404 QuorumPeerMain
10742 DFSZKFailoverController
10999 NodeManager
10281 DataNode
10141 NameNode
10525 JournalNode
14190 Jps


secondlynamenode: http://secondlynamenode:50070
Processes:

donaldhan@secondlyNamenode:/bdp/zookeeper/zookeeper-3.4.12/bin$ jps
7249 NameNode
7459 JournalNode
2420 QuorumPeerMain
9828 Jps
7349 DataNode
7595 DFSZKFailoverController
7757 NodeManager

resourcemanager: http://resourcemanager:8088

Processes:

donaldhan@resourceManager:/bdp/hadoop/dfs$ jps
8208 NodeManager
8066 ResourceManager
2227 QuorumPeerMain
10313 Jps
7950 JournalNode
7823 DataNode
donaldhan@resourceManager:/bdp/hadoop/dfs$

jobhistoryserver: http://namenode:19888


## Running a job

1. hdfs  dfs  -mkdir /user
2. hdfs  dfs  -mkdir /user/donaldhan
3. hdfs  dfs  -mkdir /input
4. hdfs  dfs  -put  etc/hadoop/*.xml  /input
5. hadoop jar  share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.1.jar grep /input  /output 'dfs[a-z.]+'
6. hdfs  dfs  -get /output   output  // into the current local directory
7. cat   output/*   // view the result

Shutting down Hadoop
On resourceManager:

stop-yarn.sh

On nameNode:

mr-jobhistory-daemon.sh stop historyserver
stop-dfs.sh
hadoop dfsadmin -safemode leave
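If you also want to bring ZooKeeper down (not covered by the steps above), a hedged sketch, run on each of the three nodes:

```
./zkServer.sh stop
```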

Note: if anything goes wrong in the steps above, check the relevant log files.

## Summary

We prepared the environment, configured ZooKeeper, configured Hadoop, formatted HDFS, and finally ran a simple job. If something goes wrong along the way, check permissions first, then the network, and finally the configuration. In this article the NameNode is highly available but YARN is not; for ResourceManager (RM) HA see:
http://hadoop.apache.org/docs/stable/hadoop-yarn/hadoop-yarn-site/ResourceManagerHA.html。

## Appendix
Strategies for handling related exceptions:
1. org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.StandbyException): Operation category READ is not supported in state standby  
Cause:  
After a power loss, the Hadoop HA cluster reported this exception; the web UI showed that both NameNodes had ended up in standby, so one of them has to be switched to active.  
Solution:

hdfs haadmin -transitionToActive --forcemanual nm


2. org.apache.hadoop.ipc.Client: Retrying connect to server: amrm/192.168.32.132:8032. Already tried 8 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
telnet: Unable to connect to remote host: Connection refused  (Ubuntu 15.10)
Cause: the network connection cannot be established.
Solution:
Check whether the host can be pinged and whether the port is open. If ping works and the port is open, inspect the listening sockets with
 netstat -ntulp
and make sure the Local Address is 0.0.0.0 or 192.168.32.137.
If it is not, fix the address mapping in /etc/hosts.
3. org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.StandbyException): Operation category WRITE is not supported in state standby

Cause: the NameNode is in standby state.

4. org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: hdfs://sname:9000/user/donaldhan/grep-temp-1382738569  
This happens because the intermediate files produced by the map phase are written under /user/${username} on HDFS; create that directory:

hdfs dfs -mkdir /user
hdfs dfs -mkdir /user/${username}

5. java.io.IOException: Unknown Job job_1450012188054_0001
	at org.apache.hadoop.mapreduce.v2.hs.HistoryClientService$HSClientProtocolHandler.verifyAndGetJob(HistoryClientService.java:218)
	at org.apache.hadoop.mapreduce.v2.hs.HistoryClientService$HSClientProtocolHandler.getCounters(HistoryClientService.java:232)
	at org.apache.hadoop.mapreduce.v2.api.impl.pb.service.MRClientProtocolPBServiceImpl.getCounters(MRClientProtocolPBServiceImpl.java:159)
	at org.apache.hadoop.yarn.proto.MRClientProtocol$MRClientProtocolService$2.callBlockingMethod(MRClientProtocol.java:281)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2045)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2043)

Solution:

hdfs dfs -chmod -R 777 /history

6. donaldhan@nameNode:/bdp/hadoop$ hdfs zkfc -formatZK
Exception in thread "main" org.apache.hadoop.HadoopIllegalArgumentException: HA is not enabled for this namenode.
	at org.apache.hadoop.hdfs.tools.DFSZKFailoverController.create(DFSZKFailoverController.java:121)
	at org.apache.hadoop.hdfs.tools.DFSZKFailoverController.main(DFSZKFailoverController.java:177)

HA is not enabled; check the HA configuration in hdfs-site.xml.

7. mkdir: Couldn't create proxy provider null

DFSZKFailoverController is not running or the HDFS formatting went wrong; check the configuration in hdfs-site.xml.

8. Formatting the NameNode fails:

```
19/01/06 22:44:05 ERROR namenode.FSNamesystem: FSNamesystem initialization failed.
java.io.IOException: Invalid configuration: a shared edits dir must not be specified if HA is not enabled.
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.<init>(FSNamesystem.java:762)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.<init>(FSNamesystem.java:697)
	at org.apache.hadoop.hdfs.server.namenode.NameNode.format(NameNode.java:984)
	at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1429)
	at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1554)
19/01/06 22:44:05 INFO namenode.FSNamesystem: Stopping services started for active state
19/01/06 22:44:05 INFO namenode.FSNamesystem: Stopping services started for standby state
19/01/06 22:44:05 WARN namenode.NameNode: Encountered exception during format:
java.io.IOException: Invalid configuration: a shared edits dir must not be specified if HA is not enabled.
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.<init>(FSNamesystem.java:762)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.<init>(FSNamesystem.java:697)
	at org.apache.hadoop.hdfs.server.namenode.NameNode.format(NameNode.java:984)
	at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1429)
	at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1554)
19/01/06 22:44:05 ERROR namenode.NameNode: Failed to start namenode.
java.io.IOException: Invalid configuration: a shared edits dir must not be specified if HA is not enabled.
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.<init>(FSNamesystem.java:762)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.<init>(FSNamesystem.java:697)
	at org.apache.hadoop.hdfs.server.namenode.NameNode.format(NameNode.java:984)
	at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1429)
	at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1554)
```

Cause: the contents of etc/hadoop/slaves are wrong: the hostname corresponding to nameNode is missing (this assumes you configured it in /etc/hosts; if you have not, you can put the nameNode server's IP in this file instead).

Solution: add the nameNode hostname to etc/hadoop/slaves. If that still does not work, recheck the configuration parameters.

9. 19/01/06 23:45:44 WARN namenode.NameNode: Encountered exception during format: org.apache.hadoop.hdfs.qjournal.client.QuorumException: Unable to check if JNs are ready for formatting. 1 exceptions thrown: 192.168.5.132:8485: Call From nameNode/192.168.5.131 to secondlyNameNode:8485 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused at org.apache.hadoop.hdfs.qjournal.client.QuorumException.create(QuorumException.java:81) at org.apache.hadoop.hdfs.qjournal.client.QuorumCall.rethrowException(QuorumCall.java:223) at org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager.hasSomeData(QuorumJournalManager.java:232) at org.apache.hadoop.hdfs.server.common.Storage.confirmFormat(Storage.java:900) at org.apache.hadoop.hdfs.server.namenode.FSImage.confirmFormat(FSImage.java:184) at org.apache.hadoop.hdfs.server.namenode.NameNode.format(NameNode.java:987) at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1429) at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1554) 19/01/06 23:45:44 ERROR namenode.NameNode: Failed to start namenode. org.apache.hadoop.hdfs.qjournal.client.QuorumException: Unable to check if JNs are ready for formatting. 1 exceptions thrown: 192.168.5.132:8485: Call From nameNode/192.168.5.131 to secondlyNameNode:8485 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused at org.apache.hadoop.hdfs.qjournal.client.QuorumException.create(QuorumException.java:81) at org.apache.hadoop.hdfs.qjournal.client.QuorumCall.rethrowException(QuorumCall.java:223) at org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager.hasSomeData(QuorumJournalManager.java:232) at org.apache.hadoop.hdfs.server.common.Storage.confirmFormat(Storage.java:900) at org.apache.hadoop.hdfs.server.namenode.FSImage.confirmFormat(FSImage.java:184) at org.apache.hadoop.hdfs.server.namenode.NameNode.format(NameNode.java:987) at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1429) at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1554) 19/01/06 23:45:44 INFO util.ExitUtil: Exiting with status 1 19/01/06 23:45:44 INFO namenode.NameNode: SHUTDOWN_MSG: /********************

Solution: make sure the Hadoop files and configuration on secondlyNameNode are complete.

10. 19/01/07 00:02:06 INFO ipc.Client: Retrying connect to server: resourceManager/192.168.5.133:8485. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 19/01/07 00:02:06 INFO ipc.Client: Retrying connect to server: nameNode/192.168.5.131:8485. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 19/01/07 00:02:06 INFO ipc.Client: Retrying connect to server: secondlyNameNode/192.168.5.132:8485. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 19/01/07 00:02:07 INFO ipc.Client: Retrying connect to server: resourceManager/192.168.5.133:8485. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 19/01/07 00:02:07 INFO ipc.Client: Retrying connect to server: nameNode/192.168.5.131:8485. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 19/01/07 00:02:07 INFO ipc.Client: Retrying connect to server: secondlyNameNode/192.168.5.132:8485. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 19/01/07 00:02:08 INFO ipc.Client: Retrying connect to server: resourceManager/192.168.5.133:8485. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 19/01/07 00:02:08 INFO ipc.Client: Retrying connect to server: nameNode/192.168.5.131:8485. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 19/01/07 00:02:08 INFO ipc.Client: Retrying connect to server: secondlyNameNode/192.168.5.132:8485. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 19/01/07 00:02:09 INFO ipc.Client: Retrying connect to server: resourceManager/192.168.5.133:8485. Already tried 3 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 19/01/07 00:02:09 INFO ipc.Client: Retrying connect to server: secondlyNameNode/192.168.5.132:8485. Already tried 3 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 19/01/07 00:02:09 INFO ipc.Client: Retrying connect to server: nameNode/192.168.5.131:8485. Already tried 3 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 19/01/07 00:02:10 INFO ipc.Client: Retrying connect to server: resourceManager/192.168.5.133:8485. Already tried 4 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 19/01/07 00:02:10 INFO ipc.Client: Retrying connect to server: secondlyNameNode/192.168.5.132:8485. Already tried 4 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 19/01/07 00:02:10 INFO ipc.Client: Retrying connect to server: nameNode/192.168.5.131:8485. Already tried 4 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 19/01/07 00:02:11 INFO ipc.Client: Retrying connect to server: secondlyNameNode/192.168.5.132:8485. 
Already tried 5 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 19/01/07 00:02:11 INFO ipc.Client: Retrying connect to server: nameNode/192.168.5.131:8485. Already tried 5 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 19/01/07 00:02:11 INFO ipc.Client: Retrying connect to server: resourceManager/192.168.5.133:8485. Already tried 5 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 19/01/07 00:02:12 INFO ipc.Client: Retrying connect to server: nameNode/192.168.5.131:8485. Already tried 6 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 19/01/07 00:02:12 INFO ipc.Client: Retrying connect to server: secondlyNameNode/192.168.5.132:8485. Already tried 6 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 19/01/07 00:02:12 INFO ipc.Client: Retrying connect to server: resourceManager/192.168.5.133:8485. Already tried 6 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 19/01/07 00:02:13 INFO ipc.Client: Retrying connect to server: nameNode/192.168.5.131:8485. Already tried 7 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 19/01/07 00:02:13 INFO ipc.Client: Retrying connect to server: secondlyNameNode/192.168.5.132:8485. Already tried 7 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 19/01/07 00:02:13 INFO ipc.Client: Retrying connect to server: resourceManager/192.168.5.133:8485. Already tried 7 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 19/01/07 00:02:14 INFO ipc.Client: Retrying connect to server: nameNode/192.168.5.131:8485. Already tried 8 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 19/01/07 00:02:14 INFO ipc.Client: Retrying connect to server: secondlyNameNode/192.168.5.132:8485. Already tried 8 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 19/01/07 00:02:14 INFO ipc.Client: Retrying connect to server: resourceManager/192.168.5.133:8485. Already tried 8 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 19/01/07 00:02:15 INFO ipc.Client: Retrying connect to server: nameNode/192.168.5.131:8485. Already tried 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 19/01/07 00:02:15 INFO ipc.Client: Retrying connect to server: secondlyNameNode/192.168.5.132:8485. Already tried 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 19/01/07 00:02:15 INFO ipc.Client: Retrying connect to server: resourceManager/192.168.5.133:8485. Already tried 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 19/01/07 00:02:15 WARN namenode.NameNode: Encountered exception during format: org.apache.hadoop.hdfs.qjournal.client.QuorumException: Unable to check if JNs are ready for formatting. 
1 exceptions thrown: 192.168.5.132:8485: Call From nameNode/192.168.5.131 to secondlyNameNode:8485 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused at org.apache.hadoop.hdfs.qjournal.client.QuorumException.create(QuorumException.java:81) at org.apache.hadoop.hdfs.qjournal.client.QuorumCall.rethrowException(QuorumCall.java:223) at org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager.hasSomeData(QuorumJournalManager.java:232) at org.apache.hadoop.hdfs.server.common.Storage.confirmFormat(Storage.java:900) at org.apache.hadoop.hdfs.server.namenode.FSImage.confirmFormat(FSImage.java:184) at org.apache.hadoop.hdfs.server.namenode.NameNode.format(NameNode.java:987) at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1429) at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1554) 19/01/07 00:02:15 ERROR namenode.NameNode: Failed to start namenode. org.apache.hadoop.hdfs.qjournal.client.QuorumException: Unable to check if JNs are ready for formatting. 1 exceptions thrown: 192.168.5.132:8485: Call From nameNode/192.168.5.131 to secondlyNameNode:8485 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused at org.apache.hadoop.hdfs.qjournal.client.QuorumException.create(QuorumException.java:81) at org.apache.hadoop.hdfs.qjournal.client.QuorumCall.rethrowException(QuorumCall.java:223) at org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager.hasSomeData(QuorumJournalManager.java:232) at org.apache.hadoop.hdfs.server.common.Storage.confirmFormat(Storage.java:900) at org.apache.hadoop.hdfs.server.namenode.FSImage.confirmFormat(FSImage.java:184) at org.apache.hadoop.hdfs.server.namenode.NameNode.format(NameNode.java:987) at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1429) at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1554) 19/01/07 00:02:15 INFO util.ExitUtil: Exiting with status 1 19/01/07 00:02:15 INFO namenode.NameNode: SHUTDOWN_MSG: /********************

Cause: the JournalNodes were not started.