Hadoop 2.7.1 HA Environment Setup (HDFS)
Written by donaldhan, 2019-01-13 17:02

## Introduction
I am a bit embarrassed to admit that I first came into contact with big data back in July 2015, but at the time my fundamentals were not solid enough, so I set big data aside to shore them up. Three years flew by; whether that was the right choice I will not debate here. My fundamentals are no longer lacking, so I am picking big data back up. These days many people choose to work hard in the hope of achieving financial freedom. For most of us, hard work does not necessarily bring financial freedom, but at least it keeps food on the table. I recently read an article, 你的思考方式决定你的层次 (your way of thinking determines your level), which makes a lot of sense. The levels, from high to low, are:
- Spirit
  Leaders and thinkers: they live to change the world
- Identity
  The awakened: because I am XX, I will do XX
- BVR (beliefs, values, rules)
  Strategists: what matters most?
- Capability
  Tacticians: there are always more solutions than problems
- Behavior
  Doers: I am just not working hard enough yet
- Environment
  Complainers: it is all your fault
Ordinary people work their way up from environment to spirit; extraordinary people go from spirit down to environment. I am still at the doer stage, and the next stage is capability, i.e. the tactician. Since we are still ordinary people, I would like to be like the extraordinary ones, but I have not positioned myself yet. Until I have, I had better start with action. This article uses Hadoop 2.7.1; for details see hadoop-doc. For an introduction to HDFS, see Hadoop基础知识 (Hadoop basics), which only covers 1.x; for 2.x, consult the official site.
## Environment Preparation
System configuration:
- Computer 1 (Lenovo), 64-bit Windows 7, 8 GB RAM; the nameNode VM runs on this machine.
- Computer 2 (Lenovo), 64-bit Windows 7, 8 GB RAM; the standby NameNode (secondlyNameNode) VM runs on this machine.
- Computer 3 (Lenovo), 64-bit Windows 7, 8 GB RAM; the resourceManager VM runs on this machine.
- VM software: VMware 12.0
- Hadoop2.7.1
- ZooKeeper 3.4.12

Note that the VMs need enough memory and CPU; otherwise running MR jobs will take forever.
Cluster planning
Ideally the JournalNodes would run on dedicated machines; here they run on the hosts listed in the slaves file and store the NameNode's edit metadata. ZooKeeper, the active NameNode, and the standby NameNode are configured alongside the JournalNodes.
| Hostname | IP | Installed software | Running processes |
| --- | --- | --- | --- |
| nameNode | 192.168.5.131 | JDK, Hadoop, ZooKeeper | NameNode, DFSZKFailoverController, DataNode, JobHistoryServer, NodeManager, JournalNode, QuorumPeerMain |
| secondlyNameNode | 192.168.5.132 | JDK, Hadoop, ZooKeeper | NameNode, DFSZKFailoverController, DataNode, NodeManager, JournalNode, QuorumPeerMain |
| resourceManager | 192.168.5.133 | JDK, Hadoop, ZooKeeper | DataNode, NodeManager, JournalNode, QuorumPeerMain, ResourceManager |
Note: in Hadoop 2.x an HDFS HA deployment typically has two NameNodes, one active and one standby. The active NameNode serves client requests, while the standby NameNode does not; it only keeps its state synchronized with the active NameNode so that it can take over quickly if the active one fails. Hadoop 2.x officially offers two HDFS HA solutions, one based on NFS and one based on QJM; here we use the simpler QJM. With QJM, the active and standby NameNodes share edit metadata through a group of JournalNodes: a write is considered successful once it reaches a majority of the JournalNodes, so an odd number of JournalNodes is normally configured. A ZooKeeper cluster is also configured for the ZKFC (DFSZKFailoverController) failover controllers: when the active NameNode goes down, the standby NameNode is automatically switched to the active state.
1. Configure the hostnames of the three machines: on nameNode, secondlyNameNode, and resourceManager, edit /etc/hostname (vim /etc/hostname) and set the hostname to nameNode, secondlyNameNode, and resourceManager respectively, as shown below:
donaldhan@nameNode:~$ cat /etc/hostname
nameNode
2. On nameNode, secondlyNameNode, and resourceManager, edit /etc/hosts (vim /etc/hosts) and add the hostname-to-IP mappings for nameNode, secondlyNameNode, and resourceManager, as shown below:
donaldhan@nameNode:~$ cat /etc/hosts
127.0.0.1 localhost
192.168.5.131 nameNode
192.168.5.132 secondlyNameNode
192.168.5.133 resourceManager
192.168.5.131 ns
# The following lines are desirable for IPv6 capable hosts
#::1 ip6-localhost ip6-loopback
#fe00::0 ip6-localnet
#ff00::0 ip6-mcastprefix
#ff02::1 ip6-allnodes
#ff02::2 ip6-allrouters
donaldhan@nameNode:~$
Note that the hosts file contains this extra line:
192.168.5.131 ns
This is because our HDFS HA nameservice is named ns. With HA enabled I could not access HDFS with the hdfs command on the NameNode, so I added this line; add the same line on secondlyNameNode and resourceManager as well, since they also act as DataNodes. Otherwise MR jobs fail with the following exception:
-mkdir: java.net.UnknownHostException: ns
3. Verify that the machines can ping each other.
4. Install SSH and generate a key pair on nameNode (you can later copy ~/.ssh to secondlyNameNode and resourceManager so that all nodes share the same key pair):
ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
Copy the public key to secondlyNameNode and resourceManager and repeat the same steps there (ideally share one key pair):
scp -r /donaldhan/.ssh donaldhan@secondlyNameNode:/root/
scp -r /donaldhan/.ssh/id_dsa.pub donaldhan@secondlyNameNode:/donaldhan/.ssh/id_dsa.pub
scp -r /donaldhan/.ssh/id_dsa.pub donaldhan@resourceManager:/donaldhan/.ssh/id_dsa.pub
Check that ssh secondlyNameNode and ssh resourceManager work in both directions without a password. If the slaves file includes the node itself, also run:
ssh nameNode
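To confirm that passwordless SSH works from nameNode to every node listed in slaves, a small sketch using the three hostnames above:

```
# Run on nameNode: each command should print the remote hostname
# without prompting for a password.
for host in nameNode secondlyNameNode resourceManager; do
  ssh "$host" hostname
done
```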
scp command reference:
// scp from source to destination(local)
scp donaldhan@nameNode:/donaldhan/.ssh/id_dsa.pub ~/.ssh/data_dsa.pub
// scp from source(local) to destination
scp -r /donaldhan/.ssh/id_dsa.pub donaldhan@resourceManager:/donaldhan/.ssh/id_dsa.pub
Note: scp only works when SSH connectivity is already in place. Related errors:
- Please contact your system administrator. Add correct host key in /root/.ssh/known_hosts to get rid of this message. Offending ECDSA key in /root/.ssh/known_hosts:2 remove with: ssh-keygen -f "/root/.ssh/known_hosts" -R sname
Resolution: run ssh-keygen -f "/root/.ssh/known_hosts" -R sname, or delete line 2 of /root/.ssh/known_hosts.
- Warning: the ECDSA host key for 'sname' differs from the key for the IP address '192.168.32.138' Offending key for IP in /root/.ssh/known_hosts:2 Matching host key in /root/.ssh/known_hosts:5 Are you sure you want to continue connecting (yes/no)? yes Welcome to Ubuntu 15.10 (GNU/Linux 4.2.0-16-generic x86_64) Documentation: https://help.ubuntu.com/ 82 packages can be updated. 42 updates are security updates. Last login: Fri Dec 11 22:30:00 2015 from 192.168.32.138
Resolution: delete line 2 of /root/.ssh/known_hosts.
- The private key ~/.ssh/id_dsa has mode 755 and cannot be used.
Resolution: chmod 700 ~/.ssh/id_dsa (tighten the private key file permissions).
5. Disable IPv6:
   1. cat /proc/sys/net/ipv6/conf/all/disable_ipv6 prints 0 if IPv6 is enabled and 1 if it is disabled.
   2. Add the following lines to /etc/sysctl.conf and reboot:
#disable IPv6
net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
net.ipv6.conf.lo.disable_ipv6 = 1
   3. sudo vim /etc/default/grub
   4. Change GRUB_CMDLINE_LINUX_DEFAULT="quiet splash" to GRUB_CMDLINE_LINUX_DEFAULT="ipv6.disable=1 quiet splash".
   5. Save with :wq, then run sudo update-grub.
   6. Restart the networking service; IPv6 is now disabled. You can check with
ip a | grep inet6
If the command prints nothing, IPv6 has been disabled successfully.
For more details, see: linux关闭IP6 (disabling IPv6 on Linux).
## Install and Configure the ZooKeeper Cluster
- Extract the ZooKeeper tarball to /bdp/zookeeper
tar -zxvf zookeeper-3.4.12.tar.gz -C /bdp/zookeeper
- Edit the ZooKeeper configuration zoo.cfg in /bdp/zookeeper/zookeeper-3.4.12/conf, as shown below:
donaldhan@nameNode:/bdp/zookeeper/zookeeper-3.4.12/conf$ cat zoo.cfg
# The number of milliseconds of each tick
tickTime=2000
# The number of ticks that the initial
# synchronization phase can take
initLimit=10
# The number of ticks that can pass between
# sending a request and getting an acknowledgement
syncLimit=5
# the directory where the snapshot is stored.
# do not use /tmp for storage, /tmp here is just
# example sakes.
dataDir=/bdp/zookeeper/data
# the port at which the clients will connect
clientPort=2181
# the maximum number of client connections.
# increase this if you need to handle more clients
#maxClientCnxns=60
#
# Be sure to read the maintenance section of the
# administrator guide before turning on autopurge.
#
# http://zookeeper.apache.org/doc/current/zookeeperAdmin.html#sc_maintenance
#
# The number of snapshots to retain in dataDir
#autopurge.snapRetainCount=3
# Purge task interval in hours
# Set to "0" to disable auto purge feature
- Create the data directory configured as dataDir in zoo.cfg: mkdir -p /bdp/zookeeper/data
- Create an empty file named myid in /bdp/zookeeper/data and write 1 into it (vim /bdp/zookeeper/data/myid).
- Copy the configured ZooKeeper to secondlyNameNode and resourceManager:
scp -r /bdp/zookeeper/zookeeper-3.4.12 donaldhan@secondlyNameNode:/bdp/zookeeper/zookeeper-3.4.12
scp -r /bdp/zookeeper/zookeeper-3.4.12 donaldhan@resourceManager:/bdp/zookeeper/zookeeper-3.4.12
- On secondlyNameNode and resourceManager, change myid to 2 and 3 respectively (the values map to the server entries in zoo.cfg; see the sketch below).
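The stock zoo.cfg does not yet describe the ensemble: for a three-node cluster it also needs server entries mapping each myid to a host. A minimal sketch, assuming the three hosts above and ZooKeeper's default peer/election ports 2888/3888:

```
# Append to /bdp/zookeeper/zookeeper-3.4.12/conf/zoo.cfg on all three nodes.
# Format: server.<myid>=<host>:<peer port>:<leader election port>
server.1=nameNode:2888:3888
server.2=secondlyNameNode:2888:3888
server.3=resourceManager:2888:3888
```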
## Install and Configure the Hadoop Cluster
1. Extract the Hadoop tarball to /bdp/hadoop
tar -zxvf hadoop-2.7.1.tar.gz -C /bdp/hadoop
2. Configure the Hadoop-related environment variables in ~/.bashrc, as shown below:
# private path
export JAVA_HOME=/usr/share/jdk1.8.0_191
export JRE_HOME=${JAVA_HOME}/jre
export CLASSPATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib
export HADOOP_HOME=/bdp/hadoop/hadoop-2.7.1
export ZOOKEEPER_HOME=/bdp/zookeeper/zookeeper-3.4.12
export PATH=${JAVA_HOME}/bin:${HADOOP_HOME}/bin:${HADOOP_HOME}/sbin:${ZOOKEEPER_HOME}/bin:${PATH}
export HADOOP_MAPRED_HOME=${HADOOP_HOME}
export HADOOP_COMMON_HOME=${HADOOP_HOME}
export HADOOP_HDFS_HOME=${HADOOP_HOME}
export HADOOP_COMMON_LIB_NATIVE_DIR=${HADOOP_HOME}/lib/native
export YARN_HOME=${HADOOP_HOME}
export HADOOP_OPTS="-Djava.library.path=${HADOOP_HOME}/lib/native"
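After editing, reload ~/.bashrc and sanity-check the variables (a quick sketch):

```
source ~/.bashrc
echo $HADOOP_HOME   # should print /bdp/hadoop/hadoop-2.7.1
hadoop version      # should report Hadoop 2.7.1
```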
All Hadoop 2.7.1 configuration files live in /bdp/hadoop/hadoop-2.7.1/etc/hadoop (cd /bdp/hadoop/hadoop-2.7.1/etc/hadoop).
3. Edit hadoop-env.sh and set the JDK home directory:
export JAVA_HOME=/usr/share/jdk1.8.0_191
4. Edit core-site.xml
donaldhan@nameNode:/bdp/hadoop/hadoop-2.7.1/etc/hadoop$ cat core-site.xml <?xml version="1.0" encoding="UTF-8"?>
donaldhan@nameNode:/bdp/hadoop/hadoop-2.7.1/etc/hadoop$
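The key settings in core-site.xml are the default filesystem pointing at the ns nameservice and the ZooKeeper quorum used by ZKFC. A minimal sketch of the <configuration> body, assuming the data directory /bdp/hadoop/tmp (the path is an assumption; adjust it to the actual layout on these machines):

```
<configuration>
  <!-- Clients address the HA nameservice "ns" instead of a single NameNode host -->
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://ns</value>
  </property>
  <!-- Base directory for Hadoop data (assumed path) -->
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/bdp/hadoop/tmp</value>
  </property>
  <!-- ZooKeeper ensemble used by the ZKFC failover controllers -->
  <property>
    <name>ha.zookeeper.quorum</name>
    <value>nameNode:2181,secondlyNameNode:2181,resourceManager:2181</value>
  </property>
</configuration>
```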
5. Edit hdfs-site.xml
donaldhan@nameNode:/bdp/hadoop/hadoop-2.7.1/etc/hadoop$ cat hdfs-site.xml <?xml version="1.0" encoding="UTF-8"?> <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
donaldhan@nameNode:/bdp/hadoop/hadoop-2.7.1/etc/hadoop$
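hdfs-site.xml defines the ns nameservice, its two NameNodes, the JournalNode quorum, and automatic failover. A sketch of the essential properties; the NameNode IDs nn1/nn2, RPC port 9000, the JournalNode edits directory, and the fencing key path are assumptions and must match what is actually used:

```
<configuration>
  <!-- Logical nameservice; must match fs.defaultFS and the "ns" entry in /etc/hosts -->
  <property><name>dfs.nameservices</name><value>ns</value></property>
  <!-- The two NameNodes of the nameservice (IDs nn1/nn2 are assumed here) -->
  <property><name>dfs.ha.namenodes.ns</name><value>nn1,nn2</value></property>
  <property><name>dfs.namenode.rpc-address.ns.nn1</name><value>nameNode:9000</value></property>
  <property><name>dfs.namenode.http-address.ns.nn1</name><value>nameNode:50070</value></property>
  <property><name>dfs.namenode.rpc-address.ns.nn2</name><value>secondlyNameNode:9000</value></property>
  <property><name>dfs.namenode.http-address.ns.nn2</name><value>secondlyNameNode:50070</value></property>
  <!-- JournalNode quorum through which the NameNodes share edit logs -->
  <property>
    <name>dfs.namenode.shared.edits.dir</name>
    <value>qjournal://nameNode:8485;secondlyNameNode:8485;resourceManager:8485/ns</value>
  </property>
  <!-- Local directory where each JournalNode stores edits (assumed path) -->
  <property><name>dfs.journalnode.edits.dir</name><value>/bdp/hadoop/journal</value></property>
  <!-- Let ZKFC perform automatic failover -->
  <property><name>dfs.ha.automatic-failover.enabled</name><value>true</value></property>
  <!-- Client-side proxy that resolves which NameNode is currently active -->
  <property>
    <name>dfs.client.failover.proxy.provider.ns</name>
    <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
  </property>
  <!-- Fence the old active NameNode over SSH during a failover -->
  <property><name>dfs.ha.fencing.methods</name><value>sshfence</value></property>
  <property><name>dfs.ha.fencing.ssh.private-key-files</name><value>/home/donaldhan/.ssh/id_dsa</value></property>
</configuration>
```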
6. Edit mapred-site.xml
donaldhan@nameNode:/bdp/hadoop/hadoop-2.7.1/etc/hadoop$ cat mapred-site.xml <?xml version="1.0"?> <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
donaldhan@nameNode:/bdp/hadoop/hadoop-2.7.1/etc/hadoop$
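mapred-site.xml mainly needs to select YARN as the execution framework and point at the JobHistoryServer that is started on nameNode later in this guide. A sketch, assuming the default history RPC port 10020 and the /history directories created below:

```
<configuration>
  <!-- Run MapReduce jobs on YARN -->
  <property><name>mapreduce.framework.name</name><value>yarn</value></property>
  <!-- JobHistoryServer runs on nameNode; 10020 is the default RPC port, 19888 the web UI -->
  <property><name>mapreduce.jobhistory.address</name><value>nameNode:10020</value></property>
  <property><name>mapreduce.jobhistory.webapp.address</name><value>nameNode:19888</value></property>
  <!-- HDFS directories for in-progress and finished job history files -->
  <property><name>mapreduce.jobhistory.intermediate-done-dir</name><value>/history/indone</value></property>
  <property><name>mapreduce.jobhistory.done-dir</name><value>/history/done</value></property>
</configuration>
```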
7. Edit yarn-site.xml
donaldhan@nameNode:/bdp/hadoop/hadoop-2.7.1/etc/hadoop$ cat yarn-site.xml <?xml version="1.0"?>
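The ResourceManager runs on the resourceManager host, and every NodeManager needs the MapReduce shuffle auxiliary service. A minimal sketch of the <configuration> body (YARN RM HA is not enabled here, matching the summary at the end of this article):

```
<configuration>
  <!-- Single, non-HA ResourceManager on the resourceManager host -->
  <property><name>yarn.resourcemanager.hostname</name><value>resourceManager</value></property>
  <!-- Shuffle service required by MapReduce on every NodeManager -->
  <property><name>yarn.nodemanager.aux-services</name><value>mapreduce_shuffle</value></property>
</configuration>
```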
8. Edit slaves
The slaves file lists the worker nodes. HDFS is started on nameNode and YARN on resourceManager, so the slaves file on nameNode specifies where the DataNodes run, while the slaves file on resourceManager specifies where the NodeManagers run.
donaldhan@nameNode:/bdp/hadoop/hadoop-2.7.1/etc/hadoop$ cat slaves
nameNode
secondlyNameNode
resourceManager
9. Copy the configured Hadoop to secondlyNameNode and resourceManager
scp -r /bdp/hadoop/hadoop-2.7.1/etc donaldhan@secondlyNameNode:/bdp/hadoop/hadoop-2.7.1
scp -r /bdp/hadoop/hadoop-2.7.1/etc donaldhan@resourceManager:/bdp/hadoop/hadoop-2.7.1
Note: *the following steps must be performed strictly in this order*
10. Start the ZooKeeper cluster (on nameNode, secondlyNameNode, and resourceManager)
Start it on nameNode, secondlyNameNode, and resourceManager in that order:
./zkServer.sh start    (start the ZooKeeper server)
./zkServer.sh status   (check the ZooKeeper server status)
11. Start the JournalNodes (on nameNode, secondlyNameNode, and resourceManager)
// Running this once on nameNode is enough; note it is hadoop-daemons.sh (plural), not hadoop-daemon.sh
hadoop-daemons.sh start journalnode
Run jps on each node and check that a JournalNode process has appeared.
12. Format HDFS; run the format command on nameNode:
hdfs namenode -format ns
Formatting creates the data directory configured in core-site.xml; copy it to secondlyNameNode and resourceManager under /bdp/hadoop/hadoop-2.7.1:
scp -r /bdp/hadoop/hadoop-2.7.1/etc donaldhan@secondlyNameNode:/bdp/hadoop/hadoop-2.7.1
scp -r /bdp/hadoop/hadoop-2.7.1/etc donaldhan@resourceManager:/bdp/hadoop/hadoop-2.7.1
Note: *do not casually delete the directories created by formatting, otherwise startup will fail with inconsistency errors*
13. Format the ZKFC state in ZooKeeper; run the format command on nameNode:
hdfs zkfc -formatZK
14. Start HDFS; run start-dfs.sh on nameNode:
start-dfs.sh
15. Start the JobHistoryServer
hdfs dfs -mkdir /history
hdfs dfs -mkdir /history/indone
hdfs dfs -mkdir /history/done
mr-jobhistory-daemon.sh start historyserver
16. Synchronize the NameNode metadata to the standby NameNode
On secondlyNameNode, first synchronize the metadata from the active NameNode by running:
donaldhan@sname:/bdp/hadoop/hadoop-2.7.1/etc/hadoop$ hdfs namenode -bootstrapStandby
17. Start the standby NameNode
donaldhan@sname:/bdp/hadoop/hadoop-2.7.1/etc/hadoop$ hadoop-daemon.sh start namenode
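At this point one NameNode should be active and the other standby, which can be verified with haadmin. A quick sketch; nn1/nn2 are the NameNode IDs assumed in the hdfs-site.xml sketch above, so substitute whatever IDs are actually configured:

```
hdfs haadmin -getServiceState nn1   # expected: active
hdfs haadmin -getServiceState nn2   # expected: standby
```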
18. Start YARN
Run start-yarn.sh on resourceManager:
start-yarn.sh
start-yarn.sh is run on resourceManager because the NameNode and the ResourceManager are kept on separate machines for performance reasons: both consume a lot of resources, so they are split apart and have to be started on their respective hosts.
At this point the HA cluster is up.
19. The Hadoop 2.7.1 configuration is now complete; you can check whether the deployment succeeded through the browser:
namenode:http://namenode:50070
Processes:
donaldhan@nameNode:/bdp/hadoop/hadoop-2.7.1$ jps
2691 ZooKeeperMain
10851 JobHistoryServer
2404 QuorumPeerMain
10742 DFSZKFailoverController
10999 NodeManager
10281 DataNode
10141 NameNode
10525 JournalNode
14190 Jps
secondlynamenode:http://secondlynamenode:50070
Processes:
donaldhan@secondlyNamenode:/bdp/zookeeper/zookeeper-3.4.12/bin$ jps
7249 NameNode
7459 JournalNode
2420 QuorumPeerMain
9828 Jps
7349 DataNode
7595 DFSZKFailoverController
7757 NodeManager
resourcemanager:http://resourcemanager:8088
Processes:
donaldhan@resourceManager:/bdp/hadoop/dfs$ jps
8208 NodeManager
8066 ResourceManager
2227 QuorumPeerMain
10313 Jps
7950 JournalNode
7823 DataNode
donaldhan@resourceManager:/bdp/hadoop/dfs$
jobhistoryserver: http://namenode:19888
## Running a Job
1. hdfs dfs -mkdir /user
2. hdfs dfs -mkdir /user/donaldhan
3. hdfs dfs -mkdir /input
4. hdfs dfs -put etc/hadoop/*.xml /input
5. hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.1.jar grep /input /output 'dfs[a-z.]+'
6. hdfs dfs -get /output output    // fetch the results into the current directory
7. cat output/*    // view the results
## Shutting Down Hadoop
On resourceManager:
stop-yarn.sh
On nameNode:
mr-jobhistory-daemon.sh stop historyserver
stop-dfs.sh
hadoop dfsadmin -safemode leave
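The steps above leave the ZooKeeper ensemble running. If it should also be brought down, the following can be run on each of nameNode, secondlyNameNode, and resourceManager (a small sketch):

```
zkServer.sh stop     # stop the local ZooKeeper server
zkServer.sh status   # optionally confirm it is no longer running
```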
Note: if anything goes wrong in the steps above, check the relevant log files.
## Summary
We prepared the environment, configured ZooKeeper, configured Hadoop, formatted HDFS, and finally ran a simple job. If anything goes wrong along the way, check permissions first, then the network, and finally the configuration. In this article the NameNode is highly available, but YARN is not; for RM (ResourceManager) HA see:
http://hadoop.apache.org/docs/stable/hadoop-yarn/hadoop-yarn-site/ResourceManagerHA.html
## Appendix
Strategies for handling related exceptions:
1. org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.StandbyException): Operation category READ is not supported in state standby
Cause:
After a power failure, the Hadoop HA cluster reported org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.StandbyException): Operation category READ is not supported in state standby. The web UIs showed that both NameNodes had ended up in the standby state, so one NameNode has to be switched to active.
Resolution:
hdfs haadmin -transitionToActive --forcemanual nm
2. org.apache.hadoop.ipc.Client: Retrying connect to server: amrm/192.168.32.132:8032. Already tried 8 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
telnet: Unable to connect to remote host: Connection refused Ubuntu 15.10
Cause: the network connection cannot be established.
Resolution:
Check whether the host can be pinged and whether the port is open. If ping works and the port should be open, inspect the listening sockets with:
netstat -ntulp
Make sure the Local Address is 0.0.0.0 or the host's LAN IP (192.168.32.137 in this case).
If it is not, fix the address mappings in /etc/hosts.
3. org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.StandbyException): Operation category WRITE is not supported in state standby
Cause: the NameNode is in the standby state.
4. org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: hdfs://sname:9000/user/donaldhan/grep-temp-1382738569
This happens because the intermediate files produced by the job are written under /user/${username} on HDFS; create those directories first:
hdfs dfs -mkdir /user
hdfs dfs -mkdir /user/${username}
5. java.io.IOException: Unknown Job job_1450012188054_0001
at org.apache.hadoop.mapreduce.v2.hs.HistoryClientService$HSClientProtocolHandler.verifyAndGetJob(HistoryClientService.java:218)
at org.apache.hadoop.mapreduce.v2.hs.HistoryClientService$HSClientProtocolHandler.getCounters(HistoryClientService.java:232)
at org.apache.hadoop.mapreduce.v2.api.impl.pb.service.MRClientProtocolPBServiceImpl.getCounters(MRClientProtocolPBServiceImpl.java:159)
at org.apache.hadoop.yarn.proto.MRClientProtocol$MRClientProtocolService$2.callBlockingMethod(MRClientProtocol.java:281)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2045)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2043)
Resolution:
hdfs dfs -chmod -R 777 /history
6. donaldhan@nameNode:/bdp/hadoop$ hdfs zkfc -formatZK
Exception in thread "main" org.apache.hadoop.HadoopIllegalArgumentException: HA is not enabled for this namenode.
at org.apache.hadoop.hdfs.tools.DFSZKFailoverController.create(DFSZKFailoverController.java:121)
at org.apache.hadoop.hdfs.tools.DFSZKFailoverController.main(DFSZKFailoverController.java:177)
HA is not enabled; check the HA settings in hdfs-site.xml.
7. mkdir: Couldn't create proxy provider null
DFSZKFailoverController is not running, or HDFS was formatted incorrectly; check the configuration in hdfs-site.xml.
8. 19/01/06 22:44:05 ERROR namenode.FSNamesystem: FSNamesystem initialization failed.
java.io.IOException: Invalid configuration: a shared edits dir must not be specified if HA is not enabled.
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.
Cause: the contents of etc/hadoop/slaves are wrong: the hostname for nameNode is missing (assuming the hostname is mapped in /etc/hosts; if it is not, put nameNode's IP address in this file instead).
Resolution: add nameNode's hostname to etc/hadoop/slaves. If that still does not work, check the configuration parameters.
9. 19/01/06 23:45:44 WARN namenode.NameNode: Encountered exception during format: org.apache.hadoop.hdfs.qjournal.client.QuorumException: Unable to check if JNs are ready for formatting. 1 exceptions thrown: 192.168.5.132:8485: Call From nameNode/192.168.5.131 to secondlyNameNode:8485 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused at org.apache.hadoop.hdfs.qjournal.client.QuorumException.create(QuorumException.java:81) at org.apache.hadoop.hdfs.qjournal.client.QuorumCall.rethrowException(QuorumCall.java:223) at org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager.hasSomeData(QuorumJournalManager.java:232) at org.apache.hadoop.hdfs.server.common.Storage.confirmFormat(Storage.java:900) at org.apache.hadoop.hdfs.server.namenode.FSImage.confirmFormat(FSImage.java:184) at org.apache.hadoop.hdfs.server.namenode.NameNode.format(NameNode.java:987) at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1429) at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1554) 19/01/06 23:45:44 ERROR namenode.NameNode: Failed to start namenode. org.apache.hadoop.hdfs.qjournal.client.QuorumException: Unable to check if JNs are ready for formatting. 1 exceptions thrown: 192.168.5.132:8485: Call From nameNode/192.168.5.131 to secondlyNameNode:8485 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused at org.apache.hadoop.hdfs.qjournal.client.QuorumException.create(QuorumException.java:81) at org.apache.hadoop.hdfs.qjournal.client.QuorumCall.rethrowException(QuorumCall.java:223) at org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager.hasSomeData(QuorumJournalManager.java:232) at org.apache.hadoop.hdfs.server.common.Storage.confirmFormat(Storage.java:900) at org.apache.hadoop.hdfs.server.namenode.FSImage.confirmFormat(FSImage.java:184) at org.apache.hadoop.hdfs.server.namenode.NameNode.format(NameNode.java:987) at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1429) at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1554) 19/01/06 23:45:44 INFO util.ExitUtil: Exiting with status 1 19/01/06 23:45:44 INFO namenode.NameNode: SHUTDOWN_MSG: /********************
Resolution: make sure the Hadoop files and configuration on secondlyNameNode are complete.
10. 19/01/07 00:02:06 INFO ipc.Client: Retrying connect to server: resourceManager/192.168.5.133:8485. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 19/01/07 00:02:06 INFO ipc.Client: Retrying connect to server: nameNode/192.168.5.131:8485. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 19/01/07 00:02:06 INFO ipc.Client: Retrying connect to server: secondlyNameNode/192.168.5.132:8485. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 19/01/07 00:02:07 INFO ipc.Client: Retrying connect to server: resourceManager/192.168.5.133:8485. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 19/01/07 00:02:07 INFO ipc.Client: Retrying connect to server: nameNode/192.168.5.131:8485. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 19/01/07 00:02:07 INFO ipc.Client: Retrying connect to server: secondlyNameNode/192.168.5.132:8485. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 19/01/07 00:02:08 INFO ipc.Client: Retrying connect to server: resourceManager/192.168.5.133:8485. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 19/01/07 00:02:08 INFO ipc.Client: Retrying connect to server: nameNode/192.168.5.131:8485. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 19/01/07 00:02:08 INFO ipc.Client: Retrying connect to server: secondlyNameNode/192.168.5.132:8485. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 19/01/07 00:02:09 INFO ipc.Client: Retrying connect to server: resourceManager/192.168.5.133:8485. Already tried 3 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 19/01/07 00:02:09 INFO ipc.Client: Retrying connect to server: secondlyNameNode/192.168.5.132:8485. Already tried 3 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 19/01/07 00:02:09 INFO ipc.Client: Retrying connect to server: nameNode/192.168.5.131:8485. Already tried 3 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 19/01/07 00:02:10 INFO ipc.Client: Retrying connect to server: resourceManager/192.168.5.133:8485. Already tried 4 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 19/01/07 00:02:10 INFO ipc.Client: Retrying connect to server: secondlyNameNode/192.168.5.132:8485. Already tried 4 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 19/01/07 00:02:10 INFO ipc.Client: Retrying connect to server: nameNode/192.168.5.131:8485. Already tried 4 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 19/01/07 00:02:11 INFO ipc.Client: Retrying connect to server: secondlyNameNode/192.168.5.132:8485. 
Already tried 5 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 19/01/07 00:02:11 INFO ipc.Client: Retrying connect to server: nameNode/192.168.5.131:8485. Already tried 5 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 19/01/07 00:02:11 INFO ipc.Client: Retrying connect to server: resourceManager/192.168.5.133:8485. Already tried 5 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 19/01/07 00:02:12 INFO ipc.Client: Retrying connect to server: nameNode/192.168.5.131:8485. Already tried 6 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 19/01/07 00:02:12 INFO ipc.Client: Retrying connect to server: secondlyNameNode/192.168.5.132:8485. Already tried 6 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 19/01/07 00:02:12 INFO ipc.Client: Retrying connect to server: resourceManager/192.168.5.133:8485. Already tried 6 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 19/01/07 00:02:13 INFO ipc.Client: Retrying connect to server: nameNode/192.168.5.131:8485. Already tried 7 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 19/01/07 00:02:13 INFO ipc.Client: Retrying connect to server: secondlyNameNode/192.168.5.132:8485. Already tried 7 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 19/01/07 00:02:13 INFO ipc.Client: Retrying connect to server: resourceManager/192.168.5.133:8485. Already tried 7 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 19/01/07 00:02:14 INFO ipc.Client: Retrying connect to server: nameNode/192.168.5.131:8485. Already tried 8 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 19/01/07 00:02:14 INFO ipc.Client: Retrying connect to server: secondlyNameNode/192.168.5.132:8485. Already tried 8 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 19/01/07 00:02:14 INFO ipc.Client: Retrying connect to server: resourceManager/192.168.5.133:8485. Already tried 8 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 19/01/07 00:02:15 INFO ipc.Client: Retrying connect to server: nameNode/192.168.5.131:8485. Already tried 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 19/01/07 00:02:15 INFO ipc.Client: Retrying connect to server: secondlyNameNode/192.168.5.132:8485. Already tried 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 19/01/07 00:02:15 INFO ipc.Client: Retrying connect to server: resourceManager/192.168.5.133:8485. Already tried 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS) 19/01/07 00:02:15 WARN namenode.NameNode: Encountered exception during format: org.apache.hadoop.hdfs.qjournal.client.QuorumException: Unable to check if JNs are ready for formatting. 
1 exceptions thrown: 192.168.5.132:8485: Call From nameNode/192.168.5.131 to secondlyNameNode:8485 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused at org.apache.hadoop.hdfs.qjournal.client.QuorumException.create(QuorumException.java:81) at org.apache.hadoop.hdfs.qjournal.client.QuorumCall.rethrowException(QuorumCall.java:223) at org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager.hasSomeData(QuorumJournalManager.java:232) at org.apache.hadoop.hdfs.server.common.Storage.confirmFormat(Storage.java:900) at org.apache.hadoop.hdfs.server.namenode.FSImage.confirmFormat(FSImage.java:184) at org.apache.hadoop.hdfs.server.namenode.NameNode.format(NameNode.java:987) at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1429) at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1554) 19/01/07 00:02:15 ERROR namenode.NameNode: Failed to start namenode. org.apache.hadoop.hdfs.qjournal.client.QuorumException: Unable to check if JNs are ready for formatting. 1 exceptions thrown: 192.168.5.132:8485: Call From nameNode/192.168.5.131 to secondlyNameNode:8485 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused at org.apache.hadoop.hdfs.qjournal.client.QuorumException.create(QuorumException.java:81) at org.apache.hadoop.hdfs.qjournal.client.QuorumCall.rethrowException(QuorumCall.java:223) at org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager.hasSomeData(QuorumJournalManager.java:232) at org.apache.hadoop.hdfs.server.common.Storage.confirmFormat(Storage.java:900) at org.apache.hadoop.hdfs.server.namenode.FSImage.confirmFormat(FSImage.java:184) at org.apache.hadoop.hdfs.server.namenode.NameNode.format(NameNode.java:987) at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1429) at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1554) 19/01/07 00:02:15 INFO util.ExitUtil: Exiting with status 1 19/01/07 00:02:15 INFO namenode.NameNode: SHUTDOWN_MSG: /********************
Cause: the JournalNodes were not started.