
Setting Up a Highly Available Hive Environment

Written by donaldhan, 2020-02-22 20:57

Introduction

Hive is a data warehouse tool built on Hadoop. It maps structured data files onto database tables and provides a simple SQL query capability, translating SQL statements into MapReduce jobs. Its main advantage is a low learning curve: simple MapReduce-style statistics can be expressed with SQL-like statements, with no need to write a dedicated MapReduce application, which makes it well suited to data warehouse analytics. Compared with hand-written MapReduce in Java, Hive offers faster development, lower staffing costs, scalability (the cluster can be grown freely), and extensibility (user-defined functions).

To make Hive highly available, one option is to load-balance HiveServer2 with HAProxy. That approach has drawbacks: HAProxy is itself a single point of failure, and although Keepalived can make HAProxy highly available, the configuration is cumbersome and requires deploying two more components, increasing operational complexity. Another option in the Hadoop ecosystem is Knox. However, ZooKeeper is already an indispensable part of any big-data platform and comes with its own high-availability guarantees, so this article describes how to use ZooKeeper to implement HiveServer2 HA.


Cluster Planning

Three machines in total: nameNode, secondlyNameNode, resourceManager.

We will configure and start HiveServer2 on all three machines.

The Hive version used is apache-hive-2.3.4.

For installing the Hadoop cluster and ZooKeeper on the three machines, see the earlier article:

Hadoop 2.7.1 HA environment setup (HDFS): https://donaldhan.github.io/hadoop/2019/01/13/Hadoop2.7.1-HA%E7%8E%AF%E5%A2%83%E6%90%AD%E5%BB%BA-hdfs.html

Configuring the Cluster

Download the Packages

First, download apache-hive-2.3.4. Since our metastore uses MySQL, also download the MySQL JDBC driver JAR; note that the minimum supported MySQL version is 5.6.7. Once downloaded, upload everything to the nameNode machine and scp it to the other two machines:

donaldhan@nameNode:/bdp$ scp -r ./hive/ donaldhan@resourceManager:/bdp/
haproxy-1.7.9.tar.gz                                                                                                                                                      100% 1707KB   1.7MB/s   00:00    
mysql-connector-java-5.1.41.jar                                                                                                                                           100%  970KB 969.5KB/s   00:00    
mysql-server_5.7.11-1ubuntu15.10_amd64.deb-bundle.tar                                                                                                                     100%  180MB  25.7MB/s   00:07    
apache-hive-2.3.4-bin.tar.gz                                                                                                                                              100%  221MB  14.8MB/s   00:15    
donaldhan@nameNode:/bdp$ 

Extract the Hive package; repeat the same steps on the other two machines:

donaldhan@secondlyNamenode:/bdp/hive$ tar -xvf apache-hive-2.3.4-bin.tar.gz 
donaldhan@secondlyNamenode:/bdp/hive$ ls
apache-hive-2.3.4-bin  apache-hive-2.3.4-bin.tar.gz  haproxy-1.7.9.tar.gz  mysql-connector-java-5.1.41.jar  mysql-server_5.7.11-1ubuntu15.10_amd64.deb-bundle.tar

Copy the MySQL JAR into Hive's lib Directory

The metastore uses this JDBC driver:

donaldhan@nameNode:/bdp/hive$ cp mysql-connector-java-5.1.41.jar  apache-hive-2.3.4-bin/lib/
donaldhan@nameNode:/bdp/hive$ ls -al apache-hive-2.3.4-bin/lib | grep mysql
-rw-rw-r--  1 donaldhan donaldhan   992805 Feb 13 22:00 mysql-connector-java-5.1.41.jar
-rw-r--r--  1 donaldhan donaldhan     7954 Oct 25  2018 mysql-metadata-storage-0.9.2.jar
donaldhan@nameNode:/bdp/hive$ 

Install MySQL

donaldhan@nameNode:/bdp/hive$ tar -xvf mysql-server_5.7.11-1ubuntu15.10_amd64.deb-bundle.tar -C  mysql-server-5.7.11/
mysql-community-server_5.7.11-1ubuntu15.10_amd64.deb
libmysqlclient-dev_5.7.11-1ubuntu15.10_amd64.deb
libmysqld-dev_5.7.11-1ubuntu15.10_amd64.deb
mysql-client_5.7.11-1ubuntu15.10_amd64.deb
mysql-server_5.7.11-1ubuntu15.10_amd64.deb
mysql-community-client_5.7.11-1ubuntu15.10_amd64.deb
mysql-common_5.7.11-1ubuntu15.10_amd64.deb
mysql-community-test_5.7.11-1ubuntu15.10_amd64.deb
mysql-community-source_5.7.11-1ubuntu15.10_amd64.deb
mysql-community_5.7.11-1ubuntu15.10_amd64.changes
libmysqlclient20_5.7.11-1ubuntu15.10_amd64.deb
mysql-testsuite_5.7.11-1ubuntu15.10_amd64.deb
donaldhan@nameNode:/bdp/hive$ ls
apache-hive-2.3.4-bin  apache-hive-2.3.4-bin.tar.gz  haproxy-1.7.9.tar.gz  mysql-connector-java-5.1.41.jar  mysql-server-5.7.11  mysql-server_5.7.11-1ubuntu15.10_amd64.deb-bundle.tar
donaldhan@nameNode:/bdp/hive$ cd mysql-server-5.7.11/
donaldhan@nameNode:/bdp/hive/mysql-server-5.7.11$ ls
libmysqlclient20_5.7.11-1ubuntu15.10_amd64.deb    mysql-common_5.7.11-1ubuntu15.10_amd64.deb            mysql-community-source_5.7.11-1ubuntu15.10_amd64.deb
libmysqlclient-dev_5.7.11-1ubuntu15.10_amd64.deb  mysql-community_5.7.11-1ubuntu15.10_amd64.changes     mysql-community-test_5.7.11-1ubuntu15.10_amd64.deb
libmysqld-dev_5.7.11-1ubuntu15.10_amd64.deb       mysql-community-client_5.7.11-1ubuntu15.10_amd64.deb  mysql-server_5.7.11-1ubuntu15.10_amd64.deb
mysql-client_5.7.11-1ubuntu15.10_amd64.deb        mysql-community-server_5.7.11-1ubuntu15.10_amd64.deb  mysql-testsuite_5.7.11-1ubuntu15.10_amd64.deb
donaldhan@nameNode:/bdp/hive/mysql-server-5.7.11$ 
sudo dpkg -i {libmysqlclient20,libmysqlclient-dev,libmysqld-dev}_*.deb
sudo dpkg -i mysql-{common,community-client,client,community-server,server}_*.deb
donaldhan@nameNode:/bdp/hive/mysql-server-5.7.11$ dpkg -l | grep mysql
ii  libmysqlclient-dev                            5.7.11-1ubuntu15.10                        amd64        MySQL development headers
ii  libmysqlclient20:amd64                        5.7.11-1ubuntu15.10                        amd64        MySQL shared client libraries
ii  libmysqld-dev                                 5.7.11-1ubuntu15.10                        amd64        MySQL embedded server library
ii  mysql-client                                  5.7.11-1ubuntu15.10                        amd64        MySQL Client meta package depending on latest version
ii  mysql-common                                  5.7.11-1ubuntu15.10                        amd64        MySQL configuration for client and server
ii  mysql-community-client                        5.7.11-1ubuntu15.10                        amd64        MySQL Client and client tools
rc  mysql-community-server                        5.7.11-1ubuntu

Note: my MySQL package installation failed (details in the appendix), so I used a local MySQL instance rather than one on the cluster; this does not affect the cluster setup.

Configure the MySQL Host

donaldhan@nameNode:/bdp/hive/mysql-server-5.7.11$ cat /etc/hosts
127.0.0.1	localhost
192.168.5.135  nameNode
192.168.5.136 secondlyNameNode
192.168.5.137 resourceManager
192.168.5.135 ns
192.168.3.106 mysqldb

Configure the Global HIVE_HOME Path

donaldhan@nameNode:/bdp/hive/apache-hive-2.3.4-bin/conf$ tail -n -6 ~/.bashrc 
export HADOOP_COMMON_LIB_NATIVE_DIR=${HADOOP_HOME}/lib/native
export YARN_HOME=${HADOOP_HOME}
export HADOOP_OPTS="-Djava.library.path=${HADOOP_HOME}/lib/native"
export HIVE_HOME=/bdp/hive/apache-hive-2.3.4-bin
export PATH=${JAVA_HOME}/bin:${HADOOP_HOME}/bin:${HADOOP_HOME}/sbin:${ZOOKEEPER_HOME}/bin:${HBASE_HOME}/bin:${HIVE_HOME}/bin:${PATH}

donaldhan@nameNode:/bdp/hive/apache-hive-2.3.4-bin/conf$ 

Configure the Hive Environment Variables

donaldhan@nameNode:/bdp/hive/apache-hive-2.3.4-bin/conf$ cp hive-env.sh.template hive-env.sh
donaldhan@nameNode:/bdp/hive/apache-hive-2.3.4-bin/conf$ vim hive-env.sh
donaldhan@nameNode:/bdp/hive/apache-hive-2.3.4-bin/conf$ tail -n 5 hive-env.sh
# export HIVE_AUX_JARS_PATH=
HADOOP_HOME=/bdp/hadoop/hadoop-2.7.1
HIVE_CONF_DIR=/bdp/hive/apache-hive-2.3.4-bin/conf
HIVE_AUX_JARS_PATH=/bdp/hive/apache-hive-2.3.4-bin/lib

Configure hive-site.xml

donaldhan@nameNode:/bdp/hive/apache-hive-2.3.4-bin/conf$ cp hive-default.xml.template hive-site.xml
donaldhan@nameNode:/bdp/hive/apache-hive-2.3.4-bin/conf$ vim hive-site.xml 

The contents are as follows:

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?><!--
   Licensed to the Apache Software Foundation (ASF) under one or more
   contributor license agreements.  See the NOTICE file distributed with
   this work for additional information regarding copyright ownership.
   The ASF licenses this file to You under the Apache License, Version 2.0
   (the "License"); you may not use this file except in compliance with
   the License.  You may obtain a copy of the License at

       http://www.apache.org/licenses/LICENSE-2.0

   Unless required by applicable law or agreed to in writing, software
   distributed under the License is distributed on an "AS IS" BASIS,
   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
   See the License for the specific language governing permissions and
   limitations under the License.
--><configuration>
  <!-- WARNING!!! This file is auto generated for documentation purposes ONLY! -->
  <!-- WARNING!!! Any changes you make to this file will be ignored by Hive.   -->
  <!-- WARNING!!! You must make your changes in hive-site.xml instead.         -->
  <!-- Hive Execution Parameters -->
 <property>
    <name>hive.exec.scratchdir</name>
    <value>/user/hive/tmp</value>
    <description>HDFS root scratch dir for Hive jobs which gets created with write all (733) permission. For each connecting user, an HDFS scratch dir: ${hive.exec.scratchdir}/&lt;username&gt; is created, with ${hive.scratch.dir.permission}.</description>
  </property>
 <property>
    <name>hive.exec.local.scratchdir</name>
    <value>/bdp/hive/jobslog/${system:user.name}</value>
    <description>Local scratch space for Hive jobs</description>
  </property>
  <property>
    <name>hive.downloaded.resources.dir</name>
    <value>/bdp/hive/jobslog/${hive.session.id}_resources</value>
    <description>Temporary local directory for added resources in the remote file system.</description>
  </property>
<!-- warehouse config -->
  <property>
    <name>hive.metastore.warehouse.dir</name>
    <value>/user/hive/warehouse</value>
    <description>location of default database for the warehouse</description>
  </property>

  <!-- metastore config -->
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://mysqldb:3306/hive_db?createDatabaseIfNotExist=true</value>
    <description>
      JDBC connect string for a JDBC metastore.
      To use SSL to encrypt/authenticate the connection, provide database-specific SSL flag in the connection URL.
      For example, jdbc:postgresql://myhost/db?ssl=true for postgres database.
    </description>
  </property>
<property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
    <description>Driver class name for a JDBC metastore</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>root</value>
    <description>Username to use against metastore database</description>
  </property>
 <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>123456</value>
    <description>password to use against metastore database</description>
  </property>
  <!-- HiveServer2 HA config -->
<property>
    <name>hive.zookeeper.quorum</name>
    <value>nameNode:2181,secondlyNameNode:2181,resourceManager:2181</value>
    <description>
      List of ZooKeeper servers to talk to. This is needed for:
      1. Read/write locks - when hive.lock.manager is set to
      org.apache.hadoop.hive.ql.lockmgr.zookeeper.ZooKeeperHiveLockManager,
      2. When HiveServer2 supports service discovery via Zookeeper.
      3. For delegation token storage if zookeeper store is used, if
      hive.cluster.delegation.token.store.zookeeper.connectString is not set
      4. LLAP daemon registry service
      5. Leader selection for privilege synchronizer
    </description>
  </property>
<property>
    <name>hive.server2.support.dynamic.service.discovery</name>
    <value>true</value>
    <description>Whether HiveServer2 supports dynamic service discovery for its clients. To support this, each instance of HiveServer2 currently uses ZooKeeper to register itself, when it is brought up. JDBC/ODBC clients should use the ZooKeeper ensemble: hive.zookeeper.quorum in their connection string.</description>
  </property>
  <property>
    <name>hive.server2.zookeeper.namespace</name>
    <value>hiveserver2_zk</value>
    <description>The parent node in ZooKeeper used by HiveServer2 when supporting dynamic service discovery.</description>
  </property>

  <property>
    <name>hive.server2.zookeeper.publish.configs</name>
    <value>true</value>
    <description>Whether we should publish HiveServer2's configs to ZooKeeper.</description>
</property>
  <property>
    <name>hive.zookeeper.client.port</name>
    <value>2181</value>
  </property>

<!-- thrift config -->
<property>
    <name>hive.server2.thrift.bind.host</name>
<!-- on each machine, set this to that machine's own hostname -->
    <value>nameNode</value>
    <description>Bind host on which to run the HiveServer2 Thrift service.</description>
  </property>

  <property>
    <name>hive.server2.thrift.port</name>
    <value>10000</value>
    <description>Port number of HiveServer2 Thrift interface when hive.server2.transport.mode is 'binary'.</description>
  </property>

  <property>
    <name>hive.server2.thrift.http.port</name>
    <value>10001</value>
    <description>Port number of HiveServer2 Thrift interface when hive.server2.transport.mode is 'http'.</description>
  </property>
<!-- must match the proxy user configured in Hadoop's core-site.xml -->
<property>
    <name>hive.server2.thrift.client.user</name>
    <value>hadoop</value>
    <description>Username to use against thrift client</description>
  </property>
  <property>
    <name>hive.server2.thrift.client.password</name>
    <value>123456</value>
    <description>Password to use against thrift client</description>
  </property>
<!-- whether to run Hive operations as the connecting user -->
<property>
    <name>hive.server2.enable.doAs</name>
    <value>false</value>
    <description>
      Setting this property to true will have HiveServer2 execute
      Hive operations as the user making the calls to it.
    </description>
  </property>
  <!-- java.io config -->
<property>
    <name>system:java.io.tmpdir</name>
    <value>/bdp/hive/javaio</value>
</property>
<property>
    <name>system:user.name</name>
    <value>${user.name}</value>
</property>
</configuration>

Note: when copying this configuration to the secondlyNamenode and resourceManager machines, the following property must be changed per host (a loop that automates the change is sketched after these snippets):

<property>
    <name>hive.server2.thrift.bind.host</name>
    <value>secondlyNamenode</value>
    <description>Bind host on which to run the HiveServer2 Thrift service.</description>
  </property>

<property>
    <name>hive.server2.thrift.bind.host</name>
    <value>resourceManager</value>
    <description>Bind host on which to run the HiveServer2 Thrift service.</description>
  </property>
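
If you prefer not to edit each copy by hand, the bind host can be rewritten remotely. This is only a sketch: it assumes passwordless SSH (as used for the scp steps) and that the literal value nameNode appears in hive-site.xml only inside the hive.server2.thrift.bind.host property.

# rewrite the Thrift bind host to each node's own hostname
for host in secondlyNameNode resourceManager; do
  ssh donaldhan@$host \
    "sed -i 's|<value>nameNode</value>|<value>$host</value>|' /bdp/hive/apache-hive-2.3.4-bin/conf/hive-site.xml"
done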

Change the Log Directories

donaldhan@nameNode:/bdp/hive/apache-hive-2.3.4-bin/conf$ cp hive-log4j2.properties.template hive-log4j2.properties
donaldhan@nameNode:/bdp/hive/apache-hive-2.3.4-bin/conf$ vim hive-log4j2.properties
# modify the following property
property.hive.log.dir = /bdp/hive/log

donaldhan@nameNode:/bdp/hive/apache-hive-2.3.4-bin/conf$ cp hive-exec-log4j2.properties.template hive-exec-log4j2.properties
donaldhan@nameNode:/bdp/hive/apache-hive-2.3.4-bin/conf$ vim hive-exec-log4j2.properties
# modify the following property
property.hive.log.dir = /bdp/hive/exelog

Copy the Config Files to the Other Two Machines

The files involved are .bashrc, hive-env.sh, hive-site.xml, hive-log4j2.properties, and hive-exec-log4j2.properties (a compact loop equivalent to the transcript below is sketched after it).

donaldhan@nameNode:/bdp/hive/apache-hive-2.3.4-bin/conf$ scp /home/donaldhan/.bashrc donaldhan@secondlyNameNode:/home/donaldhan/
.bashrc                                       100% 4512     4.4KB/s   00:00    
donaldhan@nameNode:/bdp/hive/apache-hive-2.3.4-bin/conf$ scp /home/donaldhan/.bashrc donaldhan@resourceManager:/home/donaldhan/
.bashrc                                       100% 4512     4.4KB/s   00:00    
donaldhan@nameNode:/bdp/hive/apache-hive-2.3.4-bin/conf$ scp /bdp/hive/apache-hive-2.3.4-bin/conf/hive-env.sh donaldhan@secondlyNameNode:/bdp/hive/apache-hive-2.3.4-bin/conf/
hive-env.sh                                   100% 2509     2.5KB/s   00:00    
donaldhan@nameNode:/bdp/hive/apache-hive-2.3.4-bin/conf$ scp /bdp/hive/apache-hive-2.3.4-bin/conf/hive-env.sh donaldhan@resourceManager:/bdp/hive/apache-hive-2.3.4-bin/conf/
hive-env.sh                                   100% 2509     2.5KB/s   00:00    
donaldhan@nameNode:/bdp/hive/apache-hive-2.3.4-bin/conf$ scp /bdp/hive/apache-hive-2.3.4-bin/conf/hive-site.xml  donaldhan@resourceManager:/bdp/hive/apache-hive-2.3.4-bin/conf/
hive-site.xml                                 100% 4824     4.7KB/s   00:00    
donaldhan@nameNode:/bdp/hive/apache-hive-2.3.4-bin/conf$ scp /bdp/hive/apache-hive-2.3.4-bin/conf/hive-site.xml  donaldhan@secondlyNameNode:/bdp/hive/apache-hive-2.3.4-bin/conf/
hive-site.xml                                 100% 4824     4.7KB/s   00:00    
donaldhan@nameNode:/bdp/hive/apache-hive-2.3.4-bin/conf$ scp /bdp/hive/apache-hive-2.3.4-bin/conf/hive-log4j2.properties  donaldhan@secondlyNameNode:/bdp/hive/apache-hive-2.3.4-bin/conf/
hive-log4j2.properties                        100% 2900     2.8KB/s   00:00    
donaldhan@nameNode:/bdp/hive/apache-hive-2.3.4-bin/conf$ scp /bdp/hive/apache-hive-2.3.4-bin/conf/hive-log4j2.properties  donaldhan@resourceManager:/bdp/hive/apache-hive-2.3.4-bin/conf/
hive-log4j2.properties                        100% 2900     2.8KB/s   00:00    
donaldhan@nameNode:/bdp/hive/apache-hive-2.3.4-bin/conf$ scp /bdp/hive/apache-hive-2.3.4-bin/conf/hive-exec-log4j2.properties   donaldhan@resourceManager:/bdp/hive/apache-hive-2.3.4-bin/conf/
hive-exec-log4j2.properties                   100% 2252     2.2KB/s   00:00    
donaldhan@nameNode:/bdp/hive/apache-hive-2.3.4-bin/conf$ scp /bdp/hive/apache-hive-2.3.4-bin/conf/hive-exec-log4j2.properties   donaldhan@secondlyNameNode:/bdp/hive/apache-hive-2.3.4-bin/conf/
hive-exec-log4j2.properties                   100% 2252     2.2KB/s   00:00    
donaldhan@nameNode:/bdp/hive/apache-hive-2.3.4-bin/conf$ 
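
The repeated scp invocations above can equivalently be collapsed into a loop. A sketch, run from the conf directory, assuming identical directory layout and passwordless SSH on every host:

# copy the shell profile and Hive config files to the other two nodes
for host in secondlyNameNode resourceManager; do
  scp /home/donaldhan/.bashrc donaldhan@$host:/home/donaldhan/
  scp hive-env.sh hive-site.xml hive-log4j2.properties hive-exec-log4j2.properties \
      donaldhan@$host:/bdp/hive/apache-hive-2.3.4-bin/conf/
done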

Enable WebHDFS in Hadoop

Configure hdfs-site.xml:

<property>
<name>dfs.webhdfs.enabled</name>
<value>true</value>
</property>
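
Once HDFS is running with this setting, WebHDFS can be sanity-checked with a plain REST call. A sketch, assuming the active NameNode serves HTTP on the Hadoop 2.x default port 50070:

# list the HDFS root directory over the WebHDFS REST API
curl -i "http://nameNode:50070/webhdfs/v1/?op=LISTSTATUS"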

Configure the Hadoop Proxy User

Configure core-site.xml. This setting is important: the hadoop segment in the property names is the concrete username used to log in to HDFS. If it is wrong, the hadoop user will be denied access when connecting to Hive.

<property>
     <name>hadoop.proxyuser.hadoop.hosts</name>
     <value>*</value>
   </property>
   <property>
    <name>hadoop.proxyuser.hadoop.groups</name>
    <value>*</value>
</property>
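
Proxy-user changes in core-site.xml normally require a NameNode restart to take effect; on a running cluster they can usually be reloaded instead. A sketch using the stock dfsadmin subcommand:

# reload proxy-user (superuser group) settings without restarting the NameNode
hdfs dfsadmin -refreshSuperUserGroupsConfiguration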

Start the Hadoop Cluster

First start ZooKeeper on every node:

zkServer.sh start

Then start the Hadoop cluster:

start-dfs.sh

Check the running processes:

donaldhan@nameNode:/bdp/hadoop/hadoop-2.7.1/etc$ jps
3650 NameNode
4051 JournalNode
4389 Jps
3800 DataNode
3433 QuorumPeerMain
4271 DFSZKFailoverController

Create the Hive Warehouse and Job Scratch Directories

donaldhan@nameNode:/bdp/hadoop/hadoop-2.7.1/etc$ hdfs dfs -mkdir -p /user/hive/warehouse
donaldhan@nameNode:/bdp/hadoop/hadoop-2.7.1/etc$ hdfs dfs -mkdir -p /user/hive/tmp
donaldhan@nameNode:/bdp/hadoop/hadoop-2.7.1/etc$ hdfs dfs -chmod 777 /user/hive/tmp 
donaldhan@nameNode:/bdp/hadoop/hadoop-2.7.1/etc$ hdfs dfs -chmod 777 /user/hive/warehouse
donaldhan@nameNode:/bdp/hadoop/hadoop-2.7.1/etc$ hdfs dfs -ls /user/hive/
Found 2 items
drwxrwxrwx   - donaldhan supergroup          0 2020-02-22 22:59 /user/hive/tmp
drwxrwxrwx   - donaldhan supergroup          0 2020-02-22 22:59 /user/hive/warehouse
donaldhan@nameNode:/bdp/hadoop/hadoop-2.7.1/etc$ 

Initialize the Metastore Schema

donaldhan@nameNode:/bdp/hive/apache-hive-2.3.4-bin/bin$ schematool -dbType mysql  -initSchema
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/bdp/hive/apache-hive-2.3.4-bin/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/bdp/hadoop/hadoop-2.7.1/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Metastore connection URL:	 jdbc:mysql://mysqldb:3306/hive_db?createDatabaseIfNotExist=true
Metastore Connection Driver :	 com.mysql.jdbc.Driver
Metastore connection User:	 root
Starting metastore schema initialization to 2.3.0
Initialization script hive-schema-2.3.0.mysql.sql
Initialization script completed
schemaTool completed

Inspect the Metastore Schema

donaldhan@nameNode:/bdp/hive/apache-hive-2.3.4-bin/bin$ schematool -dbType mysql -info
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/bdp/hive/apache-hive-2.3.4-bin/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/bdp/hadoop/hadoop-2.7.1/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Metastore connection URL:	 jdbc:mysql://mysqldb:3306/hive_db?createDatabaseIfNotExist=true
Metastore Connection Driver :	 com.mysql.jdbc.Driver
Metastore connection User:	 root
Hive distribution version:	 2.3.0
Metastore schema version:	 2.3.0
schemaTool completed

Start HiveServer2

donaldhan@nameNode:/bdp/hive/apache-hive-2.3.4-bin/bin$ ./hiveserver2 
2020-02-23 11:11:32: Starting HiveServer2
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/bdp/hive/apache-hive-2.3.4-bin/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/bdp/hadoop/hadoop-2.7.1/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]

Check the startup log:

donaldhan@nameNode:/bdp/hive/log$ tail -f hive.log
2020-02-23T11:11:08,679  INFO [Thread-1] server.HiveServer2: Web UI has stopped
2020-02-23T11:11:08,680  INFO [Thread-1] server.HiveServer2: Server instance removed from ZooKeeper.
...
vletContextHandler{/logs,file:/bdp/hive/log/}
2020-02-23T11:11:51,809  INFO [main] server.AbstractConnector: Started SelectChannelConnector@0.0.0.0:10002
2020-02-23T11:11:51,813  INFO [main] server.HiveServer2: Web UI has started on port 10002
2020-02-23T11:11:51,813  INFO [main] http.HttpServer: Started HttpServer[hiveserver2] on port 10002

The metastore schema tables can now be seen in the hive_db database in MySQL:

aux_table, bucketing_cols, cds, columns_v2, compaction_queue, completed_compactions, completed_txn_components, database_params, db_privs, dbs, delegation_tokens, func_ru, funcs, global_privs, hive_locks, idxs, index_params, key_constraints, master_keys, next_compaction_queue_id, next_lock_id, next_txn_id, notification_log, notification_sequence, nucleus_tables, part_col_privs, part_col_stats, part_privs, partition_events, partition_key_vals, partition_keys, partition_params, partitions, role_map, roles, sd_params, sds, sequence_table, serde_params, serdes, skewed_col_names, skewed_col_value_loc_map, skewed_string_list, skewed_string_list_values, skewed_values, sort_cols, tab_col_stats, table_params, tbl_col_privs, tbl_privs, tbls, txn_components, txns, type_fields, types, version, write_set

A detailed E/R diagram of the metastore tables can be found via the link below:

All the metadata for Hive tables and partitions are accessed through the Hive Metastore. Metadata is persisted using JPOX ORM solution (Data Nucleus) so any database that is supported by it can be used by Hive. Most of the commercial relational databases and many open source databases are supported. See the list of supported databases in section below. You can find an E/R diagram for the metastore here.

AdminManual Metastore Administration:https://cwiki.apache.org/confluence/display/Hive/AdminManual+Metastore+Administration

Checking the processes again, a new RunJar process has appeared; the start was successful.

donaldhan@nameNode:/bdp/hadoop/hadoop-2.7.1/etc$ jps
3650 NameNode
4051 JournalNode
4708 RunJar
4888 Jps
3800 DataNode
3433 QuorumPeerMain
4271 DFSZKFailoverController
donaldhan@nameNode:/bdp/hadoop/hadoop-2.7.1/etc$ 

You can also visit the HiveServer2 Web UI:

HiveServer2 Web UI: http://namenode:10002/

There you can view Hive sessions, queries, configuration, logs, and more.
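
A quick reachability check from the shell (a sketch; it should print 200 once the Web UI is up):

curl -s -o /dev/null -w "%{http_code}\n" http://namenode:10002/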

Start HiveServer2 on the secondlyNamenode and resourceManager machines as well.

Then inspect the contents of the hiveserver2_zk path in ZooKeeper:

donaldhan@secondlyNamenode:~$ zkCli.sh 
Connecting to localhost:2181
WatchedEvent state:SyncConnected type:None path:null
[zk: localhost:2181(CONNECTED) 0] ls
[zk: localhost:2181(CONNECTED) 1] ls /
[zookeeper, hadoop-ha, hbase, hiveserver2_zk]
[zk: localhost:2181(CONNECTED) 2] ls /hiveserver2_zk
[serverUri=nameNode:10000;version=2.3.4;sequence=0000000001]
[zk: localhost:2181(CONNECTED) 12] ls /hiveserver2_zk
[serverUri=secondlyNameNode:10000;version=2.3.4;sequence=0000000009, serverUri=resourceManager:10000;version=2.3.4;sequence=0000000010, serverUri=nameNode:10000;version=2.3.4;sequence=0000000007]
[zk: localhost:2181(CONNECTED) 13] 

At this point HiveServer2 is running on all three machines and registered with ZooKeeper.

The CLI

Hive provides two ways to run ETL: the Hive command line (hive1) and the beeline client (hive2). The difference is that hive2 removes hive1's single point of failure, making a highly available Hive cluster possible, and beeline's output is more intuitive and readable than hive1's (similar to the MySQL shell).

The command-line approach enters interactive mode via the hive command and runs HQL statements to produce results. It is effectively a thick-client model: the client machine needs a JRE and the Hive distribution installed.

To use hive2, first start the service with the following command:

hiveserver2

The beeline client, by contrast, is a thin-client model: it accesses the Hive data warehouse over JDBC by way of the Hive Thrift service.

HiveThrift (HiveServer) is one of Hive's components, designed for lightweight, cross-language access to the Hive warehouse. It exists in two incompatible versions, HiveServer and HiveServer2, so take care to distinguish them; the difference shows up in the server startup parameters and in the jdbc:hive vs. jdbc:hive2 connection URLs.

The apache-hive-2.3.4 installed here supports both hive1 (the standalone Hive command mode) and hive2; below we try both, starting with the Hive CLI.

HIVE CLI

donaldhan@nameNode:/bdp/hive/apache-hive-2.3.4-bin$ bin/hive
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/bdp/hive/apache-hive-2.3.4-bin/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/bdp/hadoop/hadoop-2.7.1/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]

Logging initialized using configuration in file:/bdp/hive/apache-hive-2.3.4-bin/conf/hive-log4j2.properties Async: true
Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
hive> select 1;
OK
1
Time taken: 13.003 seconds, Fetched: 1 row(s)
hive> 
-- list databases
hive> show databases;
OK
default
Time taken: 8.316 seconds, Fetched: 1 row(s)
-- create a database
hive> create database test;
OK
Time taken: 0.617 seconds
hive> show databases;
OK
default
test
Time taken: 0.028 seconds, Fetched: 2 row(s)
-- show database details; without an explicit location, a folder named <dbname>.db is created under the warehouse directory (HDFS: /user/hive/warehouse)
hive> 
    > desc database extended test;
OK
test		hdfs://ns/user/hive/warehouse/test.db	donaldhan	USER	
Time taken: 0.671 seconds, Fetched: 1 row(s)
-- switch database
hive> use test;
OK
Time taken: 0.078 seconds
-- create a table
hive> create table student(
    > id int,
    > name string,
    > tel string,
    > age int
    > )
    > row format delimited fields terminated by '\t';
OK
Time taken: 0.925 seconds
-- list all tables; without an explicit location, a folder named after the table is created under the database directory on HDFS
hive> show tables;
OK
emp
student
Time taken: 0.068 seconds, Fetched: 2 row(s)
-- describe a table
hive> desc student;
OK
id                  	int                 	                    
name                	string              	                    
tel                 	string              	                    
age                 	int                 	                    
Time taken: 0.189 seconds, Fetched: 4 row(s)
-- describe a table in detail
hive> desc extended student;
OK
id                  	int                 	                    
name                	string              	                    
tel                 	string              	                    
age                 	int                 	                    
	 	 
Detailed Table Information	Table(tableName:student, dbName:test, owner:donaldhan, createTime:1582452310, lastAccessTime:0, retention:0, sd:StorageDescriptor(cols:[FieldSchema(name:id, type:int, comment:null), FieldSchema(name:name, type:string, comment:null), FieldSchema(name:tel, type:string, comment:null), FieldSchema(name:age, type:int, comment:null)], location:hdfs://ns/user/hive/warehouse/test.db/student, inputFormat:org.apache.hadoop.mapred.TextInputFormat, outputFormat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat, compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null, serializationLib:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, parameters:{serialization.format=	, field.delim=
Time taken: 0.381 seconds, Fetched: 6 row(s)

Since new development is based on HiveServer2, the Hive CLI will be deprecated in favor of the new command-line tool, the Beeline CLI. Let's take a look at how Beeline is used.

Beeline CLI

donaldhan@nameNode:/bdp/hive/apache-hive-2.3.4-bin/conf$ beeline 
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/bdp/hive/apache-hive-2.3.4-bin/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/bdp/hadoop/hadoop-2.7.1/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Beeline version 2.3.4 by Apache Hive
beeline>  !connect jdbc:hive2://namenode:10000 hadoop 123456
Connecting to jdbc:hive2://namenode:10000
Connected to: Apache Hive (version 2.3.4)
Driver: Hive JDBC (version 2.3.4)
Transaction isolation: TRANSACTION_REPEATABLE_READ
0: jdbc:hive2://namenode:10000> show databases;
+----------------+
| database_name  |
+----------------+
| default        |
| test           |
+----------------+
2 rows selected (1.694 seconds)
0: jdbc:hive2://namenode:10000> use test;
No rows affected (0.366 seconds)
0: jdbc:hive2://namenode:10000> show tables;
+-----------+
| tab_name  |
+-----------+
| emp       |
| student   |
+-----------+
2 rows selected (0.323 seconds)
0: jdbc:hive2://namenode:10000> desc student;
+-----------+------------+----------+
| col_name  | data_type  | comment  |
+-----------+------------+----------+
| id        | int        |          |
| name      | string     |          |
| tel       | string     |          |
| age       | int        |          |
+-----------+------------+----------+
4 rows selected (0.59 seconds)
0: jdbc:hive2://namenode:10000> !quit
Closing: 0: jdbc:hive2://namenode:10000
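
The same connection also works non-interactively, which is handy for scripting. A sketch using Beeline's standard -u/-n/-p/-e options:

beeline -u "jdbc:hive2://namenode:10000" -n hadoop -p 123456 -e "show databases;"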

Beeline in Cluster Mode

The command:

!connect jdbc:hive2://nameNode:2181,secondlyNameNode:2181,resourceManager:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2_zk

Then enter the username and password:

beeline> !connect jdbc:hive2://nameNode:2181,secondlyNameNode:2181,resourceManager:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2_zk
Connecting to jdbc:hive2://nameNode:2181,secondlyNameNode:2181,resourceManager:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2_zk
Enter username for jdbc:hive2://nameNode:2181,secondlyNameNode:2181,resourceManager:2181/: hadoop
Enter password for jdbc:hive2://nameNode:2181,secondlyNameNode:2181,resourceManager:2181/: ******
20/02/23 21:27:14 [main]: INFO jdbc.HiveConnection: Connected to nameNode:10000
Connected to: Apache Hive (version 2.3.4)
Driver: Hive JDBC (version 2.3.4)
Transaction isolation: TRANSACTION_REPEATABLE_READ
0: jdbc:hive2://nameNode:2181,secondlyNameNod> show databases;
+----------------+
| database_name  |
+----------------+
| default        |
| test           |
+----------------+
2 rows selected (3.584 seconds)
0: jdbc:hive2://nameNode:2181,secondlyNameNod> 

HiveServer2 Clients: https://cwiki.apache.org/confluence/display/Hive/HiveServer2+Clients
Hive DDL, DML, and SQL operations: https://blog.csdn.net/HG_Harvey/article/details/77488314
LanguageManual DDL: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL
LanguageManual DML: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DML

Connecting via JDBC

HiveServer2Clients-JDBC:https://cwiki.apache.org/confluence/display/Hive/HiveServer2+Clients#HiveServer2Clients-JDBC
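
Until then, the ZooKeeper-discovery JDBC URL used above can already be exercised from the shell. A sketch with the credentials configured earlier:

beeline -u "jdbc:hive2://nameNode:2181,secondlyNameNode:2181,resourceManager:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2_zk" \
  -n hadoop -p 123456 -e "select 1;"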

We will cover the programmatic JDBC approach in a later article. With that, our Hive HA environment is complete, and we have run some simple DDL and DML through both the Hive CLI and Beeline.

Summary

We built a ZooKeeper-based highly available Hive environment on three machines and tried some simple DDL and DML with both the Hive CLI and Beeline. During setup, remember to initialize the metastore first. Also, hive.server2.thrift.bind.host must be set to each machine's own hostname; if every instance binds the same host, HiveServer2 degenerates into single-node mode. Most importantly, the user in hive.server2.thrift.client.user must match the proxy user configured in Hadoop's core-site.xml.

References

Hive Administrator Documentation: https://cwiki.apache.org/confluence/display/Hive/Home#Home-AdministratorDocumentation

Apache Hive TM: https://cloud.tencent.com/developer/article/1561886
Hive quick-start tutorial 2, Hive architecture: https://www.jianshu.com/p/eeb65dcfcc6a
Using Hive: https://www.jianshu.com/p/7bf9a390d7e6
Hive tutorial: https://www.yiibai.com/hive/
HIVE Metastore: http://www.pianshen.com/article/8317243978/
The story of the Hive Metastore: https://zhuanlan.zhihu.com/p/100585524
Structure of the Hive MetaStore: https://www.jianshu.com/p/420ddb3bde7f
Hive Metastore principles and configuration: https://blog.csdn.net/qq_40990732/article/details/80914873
Why Hive enables a Metastore: https://blog.csdn.net/qq_35440040/article/details/82462269

Hive HA usage and configuring HA with HAProxy: https://www.aboutyun.com/thread-10938-1-1.html

HAProxy usage explained in detail (Chinese documentation): http://www.ttlsa.com/linux/haproxy-study-tutorial/

HiveMetaStore high-availability (HA) configuration (when one instance fails, clients connect to another): https://blog.csdn.net/rotkang/article/details/78683626

HiveServer2 high-availability (HA) configuration: http://lxw1234.com/archives/2016/05/675.htm
HiveServer2 HA configuration: https://www.jianshu.com/p/3dfa4b4e7ce0
How to implement HiveServer2 HA with ZooKeeper: https://cloud.tencent.com/developer/article/1078331
Building highly available Hive HA and integrating an HBase development environment: https://blog.csdn.net/pysense/article/details/102987186

Hive 3.1.1 HA cluster setup notes (integrated with ZooKeeper): https://blog.csdn.net/liuhuabing760596103/article/details/89175063

Hive cluster deployment: https://www.alongparty.cn/hive-cluster-deployment.html

CentOS 7.6 + Hadoop 3.1.2 (HA) + Zookeeper 3.4.13 + HBase 1.4.9 (HA) + Hive 2.3.4 + Spark 2.4.0 (HA) HA cluster setup: https://mshk.top/2019/03/centos-hadoop-zookeeper-hbase-hive-spark-high-availability/

Dynamic HA Provider Configuration: https://cwiki.apache.org/confluence/display/KNOX/Dynamic+HA+Provider+Configuration

Knox: http://knox.apache.org/

Knox gateway URL mappings (from the Knox documentation):

WebHDFS

Gateway: https://{gateway-host}:{gateway-port}/{gateway-path}/{cluster-name}/webhdfs
Cluster: http://{webhdfs-host}:50070/webhdfs

WebHCat (Templeton)

Gateway: https://{gateway-host}:{gateway-port}/{gateway-path}/{cluster-name}/templeton
Cluster: http://{webhcat-host}:50111/templeton

Oozie

Gateway: https://{gateway-host}:{gateway-port}/{gateway-path}/{cluster-name}/oozie
Cluster: http://{oozie-host}:11000/oozie

HBase

Gateway: https://{gateway-host}:{gateway-port}/{gateway-path}/{cluster-name}/hbase
Cluster: http://{hbase-host}:8080

Hive JDBC

Gateway: jdbc:hive2://{gateway-host}:{gateway-port}/;ssl=true;sslTrustStore={gateway-trust-store-path};trustStorePassword={gateway-trust-store-password};transportMode=http;httpPath={gateway-path}/{cluster-name}/hive
Cluster: http://{hive-host}:10001/cliservice

Starting HiveServer2 in the Background and Connecting a Client

Start the service in the background; running this on each Hive node is sufficient:

nohup hiveserver2 -hiveconf hive.root.logger=DEBUG,console  1> hive.log 2>&1 &

For client access, type the following command in beeline to log in (a variant I have not tried myself):

!connect jdbc:hive2://nameNode:2181,secondlyNameNode:2181,resourceManager:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2_zk hadoop "123456"

Full Configuration

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?><!--
   Licensed to the Apache Software Foundation (ASF) under one or more
   contributor license agreements.  See the NOTICE file distributed with
   this work for additional information regarding copyright ownership.
   The ASF licenses this file to You under the Apache License, Version 2.0
   (the "License"); you may not use this file except in compliance with
   the License.  You may obtain a copy of the License at

       http://www.apache.org/licenses/LICENSE-2.0

   Unless required by applicable law or agreed to in writing, software
   distributed under the License is distributed on an "AS IS" BASIS,
   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
   See the License for the specific language governing permissions and
   limitations under the License.
--><configuration>
  <!-- WARNING!!! This file is auto generated for documentation purposes ONLY! -->
  <!-- WARNING!!! Any changes you make to this file will be ignored by Hive.   -->
  <!-- WARNING!!! You must make your changes in hive-site.xml instead.         -->
  <!-- Hive Execution Parameters -->
 <property>
    <name>hive.exec.scratchdir</name>
    <value>/user/hive/tmp</value>
    <description>HDFS root scratch dir for Hive jobs which gets created with write all (733) permission. For each connecting user, an HDFS scratch dir: ${hive.exec.scratchdir}/&lt;username&gt; is created, with ${hive.scratch.dir.permission}.</description>
  </property>
 <property>
    <name>hive.exec.local.scratchdir</name>
    <value>/bdp/hive/jobslog/${system:user.name}</value>
    <description>Local scratch space for Hive jobs</description>
  </property>
  <property>
    <name>hive.downloaded.resources.dir</name>
    <value>/bdp/hive/jobslog/${hive.session.id}_resources</value>
    <description>Temporary local directory for added resources in the remote file system.</description>
  </property>
  <property>
    <name>hive.metastore.warehouse.dir</name>
    <value>/user/hive/warehouse</value>
    <description>location of default database for the warehouse</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://mysqldb:3306/hive_db?createDatabaseIfNotExist=true</value>
    <description>
      JDBC connect string for a JDBC metastore.
      To use SSL to encrypt/authenticate the connection, provide database-specific SSL flag in the connection URL.
      For example, jdbc:postgresql://myhost/db?ssl=true for postgres database.
    </description>
  </property>
<property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
    <description>Driver class name for a JDBC metastore</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>root</value>
    <description>Username to use against metastore database</description>
  </property>
 <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>123456</value>
    <description>password to use against metastore database</description>
  </property>
<property>
    <name>hive.zookeeper.quorum</name>
    <value>nameNode:2181,secondlyNameNode:2181,resourceManager:2181</value>
    <description>
      List of ZooKeeper servers to talk to. This is needed for:
      1. Read/write locks - when hive.lock.manager is set to
      org.apache.hadoop.hive.ql.lockmgr.zookeeper.ZooKeeperHiveLockManager,
      2. When HiveServer2 supports service discovery via Zookeeper.
      3. For delegation token storage if zookeeper store is used, if
      hive.cluster.delegation.token.store.zookeeper.connectString is not set
      4. LLAP daemon registry service
      5. Leader selection for privilege synchronizer
    </description>
  </property>
<property>
    <name>hive.server2.support.dynamic.service.discovery</name>
    <value>true</value>
    <description>Whether HiveServer2 supports dynamic service discovery for its clients. To support this, each instance of HiveServer2 currently uses ZooKeeper to register itself, when it is brought up. JDBC/ODBC clients should use the ZooKeeper ensemble: hive.zookeeper.quorum in their connection string.</description>
  </property>
  <property>
    <name>hive.server2.zookeeper.namespace</name>
    <value>hiveserver2_zk</value>
    <description>The parent node in ZooKeeper used by HiveServer2 when supporting dynamic service discovery.</description>
  </property>
  <property>
    <name>hive.server2.zookeeper.publish.configs</name>
    <value>true</value>
    <description>Whether we should publish HiveServer2's configs to ZooKeeper.</description>
</property>
<property>
<name>hive.zookeeper.client.port</name>
<value>2181</value>
</property>
<property>
    <name>hive.server2.thrift.bind.host</name>
    <value>nameNode</value>
    <description>Bind host on which to run the HiveServer2 Thrift service.</description>
  </property>

  <property>
    <name>hive.server2.thrift.port</name>
    <value>10000</value>
    <description>Port number of HiveServer2 Thrift interface when hive.server2.transport.mode is 'binary'.</description>
  </property>

  <property>
    <name>hive.server2.thrift.http.port</name>
    <value>10001</value>
    <description>Port number of HiveServer2 Thrift interface when hive.server2.transport.mode is 'http'.</description>
  </property>
<property>
    <name>hive.server2.thrift.client.user</name>
    <value>hadoop</value>
    <description>Username to use against thrift client</description>
  </property>
  <property>
    <name>hive.server2.thrift.client.password</name>
    <value>123456</value>
    <description>Password to use against thrift client</description>
  </property>
  <property>
    <name>hive.server2.enable.doAs</name>
    <value>false</value>
    <description>
      Setting this property to true will have HiveServer2 execute
      Hive operations as the user making the calls to it.
    </description>
  </property>
  <property>
    <name>system:java.io.tmpdir</name>
    <value>/bdp/hive/javaio</value>
  </property>
  <property>
    <name>system:user.name</name>
    <value>${user.name}</value>
  </property>
</configuration>

MySQL Installation Fails on Missing Shared-Library Packages

The offline MySQL installation fails because of missing shared-library packages. The details:

donaldhan@nameNode:/bdp/hive/mysql-server-5.7.11$ sudo dpkg -i  mysql-community-server_5.7.11-1ubuntu15.10_amd64.deb
(Reading database ... 160808 files and directories currently installed.)
Preparing to unpack mysql-community-server_5.7.11-1ubuntu15.10_amd64.deb ...
.
Unpacking mysql-community-server (5.7.11-1ubuntu15.10) over (5.7.11-1ubuntu15.10) ...
dpkg: dependency problems prevent configuration of mysql-community-server:
 mysql-community-server depends on mysql-client (= 5.7.11-1ubuntu15.10); however:
  Package mysql-client is not installed.
 mysql-community-server depends on libaio1 (>= 0.3.93); however:
  Package libaio1 is not installed.
 mysql-community-server depends on libmecab2v5 (>= 0.996-1.1ubuntu1); however:
  Package libmecab2v5 is not installed.

dpkg: error processing package mysql-community-server (--install):
 dependency problems - leaving unconfigured
Processing triggers for ureadahead (0.100.0-19) ...
Processing triggers for systemd (225-1ubuntu9) ...
Processing triggers for man-db (2.7.4-1) ...
Errors were encountered while processing:
 mysql-community-server

When this happens, first install libaio1 and then libmecab2v5. However, no libmecab2v5 package exists for Ubuntu 15.10; this is a bug in the MySQL offline bundle for that release.

donaldhan@nameNode:/bdp/hive/mysql-server-5.7.11$ apt-get install libaio1
E: Could not open lock file /var/lib/dpkg/lock - open (13: Permission denied)
E: Unable to lock the administration directory (/var/lib/dpkg/), are you root?
donaldhan@nameNode:/bdp/hive/mysql-server-5.7.11$ ^C
donaldhan@nameNode:/bdp/hive/mysql-server-5.7.11$ sudo apt-get install ^C
donaldhan@nameNode:/bdp/hive/mysql-server-5.7.11$ sudo apt-get install libmecab2v5
Reading package lists... Done
Building dependency tree       
Reading state information... Done
Package libmecab2v5 is not available, but is referred to by another package.
This may mean that the package is missing, has been obsoleted, or
is only available from another source

E: Package 'libmecab2v5' has no installation candidate
donaldhan@nameNode:/bdp/hive/mysql-server-5.7.11$ sudo apt-get update 
E: Could not get lock /var/lib/apt/lists/lock - open (11: Resource temporarily unavailable)
E: Unable to lock directory /var/lib/apt/lists/
donaldhan@nameNode:/bdp/hive/mysql-server-5.7.11$ sudo apt-get upgrade
Reading package lists... Done
Building dependency tree       
Reading state information... Done
You might want to run 'apt-get -f install' to correct these.
The following packages have unmet dependencies:
 mysql-community-client : Depends: libaio1 (>= 0.3.93) but it is not installed
 mysql-community-server : Depends: libaio1 (>= 0.3.93) but it is not installed
                          Depends: libmecab2v5 (>= 0.996-1.1ubuntu1) but it is not installable
E: Unmet dependencies. Try using -f.
donaldhan@nameNode:/bdp/hive/mysql-server-5.7.11$ sudo apt-get upgrade -f 

Reinstalling still failed; the underlying cause is covered in the references below:

mysql-bug:https://bugs.mysql.com/bug.php?id=79798
mysql-community-server-depends-on-libmecab2-however-libmecab2-is-not-installed:https://www.fatalerrors.org/a/mysql-community-server-depends-on-libmecab2-however-libmecab2-is-not-installed.html

The root cause: the libmecab2v5 package that it depends on no longer exists; it has been renamed to libmecab2.

Repairing the installation:

donaldhan@nameNode:/bdp/hive/mysql-server-5.7.11$ sudo apt-get -f install
Reading package lists... Done
Building dependency tree       
Reading state information... Done
Correcting dependencies... Done
The following package was automatically installed and is no longer required:
  libdbusmenu-gtk4
Use 'apt-get autoremove' to remove it.
The following packages will be REMOVED:
  mysql-community-server mysql-server
0 upgraded, 0 newly installed, 2 to remove and 10 not upgraded.
2 not fully installed or removed.
After this operation, 136 MB disk space will be freed.
Do you want to continue? [Y/n] y
(Reading database ... 160904 files and directories currently installed.)
Removing mysql-server (5.7.11-1ubuntu15.10) ...
Removing mysql-community-server (5.7.11-1ubuntu15.10) ...
Processing triggers for man-db (2.7.4-1) ...
donaldhan@nameNode:/bdp/hive/mysql-server-5.7.11$ sudo dpkg -l|grep mysql
ii  libmysqlclient-dev                            5.7.11-1ubuntu15.10                        amd64        MySQL development headers
ii  libmysqlclient20:amd64                        5.7.11-1ubuntu15.10                        amd64        MySQL shared client libraries
ii  libmysqld-dev                                 5.7.11-1ubuntu15.10                        amd64        MySQL embedded server library
ii  mysql-client                                  5.7.11-1ubuntu15.10                        amd64        MySQL Client meta package depending on latest version
ii  mysql-common                                  5.7.11-1ubuntu15.10                        amd64        MySQL configuration for client and server
ii  mysql-community-client                        5.7.11-1ubuntu15.10                        amd64        MySQL Client and client tools
rc  mysql-community-server                        5.7.11-1ubuntu15.10                        amd64        MySQL Server and server tools
donaldhan@nameNode:/bdp/hive/mysql-server-5.7.11$ 

MySQL still would not start.

The only option left was to uninstall MySQL:

sudo rm /var/lib/mysql/ -R
sudo rm /etc/mysql/ -R
sudo apt-get autoremove mysql* --purge
sudo apt-get remove apparmor # select Yes in this step
sudo apt-get install mysql-server mysql-common # Reenter password

Online installation produced the same error.

Trying mysql-apt-config did not resolve it either:

how-can-i-install-mysql-5-7-9-to-ubuntu-14-04:https://serverfault.com/questions/752063/how-can-i-install-mysql-5-7-9-to-ubuntu-14-04

package=mysql-apt-config_0.8.11-1_all.deb
wget http://dev.mysql.com/get/$package
sudo dpkg -i $package
sudo apt-get update
sudo apt-get install mysql-community-server mysql-server

Related package and download path for mysql-apt-config_0.6.0-1:

package=mysql-apt-config_0.6.0-1_all.deb
http://dev.mysql.com/get/mysql-apt-config_0.6.0-1_all.deb

Reinstalling libmecab2 did not help either:

 donaldhan@nameNode:/bdp/hive$ sudo apt-get install libmecab2
Reading package lists... Done
Building dependency tree       
Reading state information... Done
libmecab2 is already the newest version.
0 upgraded, 0 newly installed, 0 to remove and 32 not upgraded.
donaldhan@nameNode:/bdp/hive$ 
$ sudo apt-get install software-properties-common
$ sudo add-apt-repository -y ppa:ondrej/mysql-5.7
$ sudo apt-get update
$ sudo apt-get install mysql-server

For the lock-file errors, remove the corresponding lock files.
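
A sketch of the removal; this is only safe when no other apt/dpkg process is actually holding the locks:

# remove stale apt/dpkg lock files
sudo rm /var/lib/dpkg/lock
sudo rm /var/lib/apt/lists/lock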

Couldn't create directory /bdb/hive/local/jobslog/

 2020-02-22T23:07:31,704  WARN [main] server.HiveServer2: Error starting HiveServer2 on attempt 5, will retry in 60000ms
java.lang.RuntimeException: Error applying authorization policy on hive configuration: Couldn't create directory /bdb/hive/local/jobslog/46fb3bec-497a-4029-8c10-13e409c15214_resources
        at org.apache.hive.service.cli.CLIService.init(CLIService.java:117) ~[hive-service-2.3.4.jar:2.3.4]
        at org.apache.hive.service.CompositeService.init(CompositeService.java:59) ~[hive-service-2.3.4.jar:2.3.4]
        at org.apache.hive.service.server.HiveServer2.init(HiveServer2.java:142) ~[hive-service-2.3.4.jar:2.3.4]
        at org.apache.hive.service.server.HiveServer2.startHiveServer2(HiveServer2.java:607) [hive-service-2.3.4.jar:2.3.4]
        at org.apache.hive.service.server.HiveServer2.access$700(HiveServer2.java:100) [hive-service-2.3.4.jar:2.3.4]
        at org.apache.hive.service.server.HiveServer2$StartOptionExecutor.execute(HiveServer2.java:855) [hive-service-2.3.4.jar:2.3.4]
        at org.apache.hive.service.server.HiveServer2.main(HiveServer2.java:724) [hive-service-2.3.4.jar:2.3.4]
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_191]
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_191]
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_191]
        at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_191]
        at org.apache.hadoop.util.RunJar.run(RunJar.java:221) [hadoop-common-2.7.1.jar:?]
        at org.apache.hadoop.util.RunJar.main(RunJar.java:136) [hadoop-common-2.7.1.jar:?]
Caused by: java.lang.RuntimeException: Couldn't create directory /bdb/hive/local/jobslog/46fb3bec-497a-4029-8c10-13e409c15214_resources
        at org.apache.hadoop.hive.ql.util.ResourceDownloader.ensureDirectory(ResourceDownloader.java:116) ~[hive-exec-2.3.4.jar:2.3.4]
        at org.apache.hadoop.hive.ql.util.ResourceDownloader.<init>(ResourceDownloader.java:47) ~[hive-exec-2.3.4.jar:2.3.4]
        at org.apache.hadoop.hive.ql.session.SessionState.<init>(SessionState.java:397) ~[hive-exec-2.3.4.jar:2.3.4]
        at org.apache.hadoop.hive.ql.session.SessionState.<init>(SessionState.java:370) ~[hive-exec-2.3.4.jar:2.3.4]
        at org.apache.hive.service.cli.CLIService.applyAuthorizationConfigPolicy(CLIService.java:127) ~[hive-service-2.3.4.jar:2.3.4]
        at org.apache.hive.service.cli.CLIService.init(CLIService.java:114) ~[hive-service-2.3.4.jar:2.3.4]

Solution

The directory named in the error does not exist on the local filesystem; creating it fixes the problem.
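
For example (the path is taken from the error message above; run this on every HiveServer2 node):

mkdir -p /bdb/hive/local/jobslog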

Metastore Schema Not Initialized

2020-02-22T23:19:57,488  WARN [main] metastore.MetaStoreDirectSql: Self-test query [select "DB_ID" from "DBS"] failed; direct SQL is disabled
javax.jdo.JDODataStoreException: Error executing SQL query "select "DB_ID" from "DBS"".
	at org.datanucleus.api.jdo.NucleusJDOHelper.getJDOExceptionForNucleusException(NucleusJDOHelper.java:543) ~[datanucleus-api-jdo-4.2.4.jar:?]
	at org.datanucleus.api.jdo.JDOQuery.executeInternal(JDOQuery.java:391) ~[datanucleus-api-jdo-4.2.4.jar:?]
	at org.datanucleus.api.jdo.JDOQuery.execute(JDOQuery.java:216) ~[datanucleus-api-jdo-4.2.4.jar:?]

Solution

Initialize the metastore schema:

schematool -dbType mysql -initSchema

To verify, fetch the schema info:

schematool -dbType mysql -info

Hive Schema Tool:https://cwiki.apache.org/confluence/display/Hive/Hive+Schema+Tool

java.lang.NumberFormatException: For input string: "2181

 [main] zookeeper.ZooKeeper: Initiating client connection, connectString=      nameNode:2181,secondlyNameNode:2181,resourceManager:2181
    sessionTimeout=1200000 watcher=org.apache.curator.ConnectionState@4776e209
2020-02-23T11:01:57,485 ERROR [main] imps.CuratorFrameworkImpl: Background exception was not retry-able or retry gave up
java.lang.NumberFormatException: For input string: "2181
   "
	at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65) ~[?:1.8.0_191]
	at java.lang.Integer.parseInt(Integer.java:580) ~[?:1.8.0_191]

Solution

Check that hive.zookeeper.quorum is configured correctly; as the stack trace shows, the value here contained a stray newline and trailing spaces after 2181.

<property>
    <name>hive.zookeeper.quorum</name>
    <value>nameNode:2181,secondlyNameNode:2181,resourceManager:2181</value>
  </property>
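
A quick way to spot stray whitespace or line breaks inside the value (cat -A makes tabs and line ends visible):

grep -A 1 '<name>hive.zookeeper.quorum</name>' /bdp/hive/apache-hive-2.3.4-bin/conf/hive-site.xml | cat -A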

WARN [main] server.HiveServer2: Error starting HiveServer2 on attempt 1, will retry in 60000ms

java.lang.RuntimeException: Error applying authorization policy on hive configuration: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.StandbyException): Operation category READ is not supported in state standby at org.apache.hadoop.hdfs.server.namenode.ha.StandbyState.checkOperation(StandbyState.java:87) at org.apache.hadoop.hdfs.server.namenode.NameNode$NameNodeHAContext.checkOperation(NameNode.java:1774)

Solution

Cause: my Hadoop cluster runs in HA mode and the nameNode host was in standby state at the time, while Thrift was bound to the hostname nameNode. Restarting Hadoop (so that nameNode became active again) resolved it.

User is not allowed to impersonate anonymous

User is not allowed to impersonate anonymous (state=08S01,code=0) org.apache.hadoop.security.authorize.AuthorizationException

Solution

This is mainly caused by a mismatch between Hadoop's proxy-user configuration and Hive's Thrift user-validation settings.

<!-- must match the proxy user configured in Hadoop's core-site.xml -->
<property>
    <name>hive.server2.thrift.client.user</name>
    <value>hadoop</value>
    <description>Username to use against thrift client</description>
  </property>
<!-- whether to run Hive operations as the connecting user -->
<property>
    <name>hive.server2.enable.doAs</name>
    <value>false</value>
    <description>
      Setting this property to true will have HiveServer2 execute
      Hive operations as the user making the calls to it.
    </description>
  </property>
  <property>
    <name>system:java.io.tmpdir</name>
    <value>/bdp/hive/javaio</value>
  </property>
  <property>
    <name>system:user.name</name>
    <value>${user.name}</value>
  </property>

And in core-site.xml:

<property>
     <name>hadoop.proxyuser.hadoop.hosts</name>
     <value>*</value>
   </property>
   <property>
    <name>hadoop.proxyuser.hadoop.groups</name>
    <value>*</value>
</property>

Use the username hadoop consistently in both places.

User is not allowed to impersonate anonymous: https://stackoverflow.com/questions/52994585/user-is-not-allowed-to-impersonate-anonymous-state-08s01-code-0-org-apache-had
Accessing Hive through HiveServer2 with beeline (configuration and operation): https://blog.csdn.net/wqhlmark64/article/details/77894026
Hadoop's proxy-user mechanism: https://blog.csdn.net/u012948976/article/details/49904675

java.net.URISyntaxException: Relative path in absolute URI

Error: java.io.IOException: java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative path in absolute URI: ${system:user.name%7D (state=,code=0)

Solution

Define the system: properties referenced by the configuration in hive-site.xml so that placeholders like ${system:user.name} resolve:

  <property>
    <name>system:java.io.tmpdir</name>
    <value>/bdp/hive/javaio</value>
  </property>
  <property>
    <name>system:user.name</name>
    <value>${user.name}</value>
  </property>

java.net.URISyntaxException when starting HIVE :https://stackoverflow.com/questions/27099898/java-net-urisyntaxexception-when-starting-hive

Unknown command: connect

beeline> !connect jdbc:hive2://nameNode:2181,secondlyNameNode:2181,resourceManager:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2_zk
Unknown command: connect jdbc:hive2://nameNode:2181,secondlyNameNode:2181,resourceManager:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2_zk
beeline> show databases;
No current connection

This usually means the command contains a stray space or a full-width (Chinese) character where an ASCII character was expected.