Standalone HIVE Environment Setup
Written by donaldhan, 2020-02-25 22:00
Introduction
In the previous article on building a highly available HIVE environment, we set up a Zookeeper-based Hive HA environment on three machines and tried some simple DDL and DML through the HIVE CLI and Beeline. A few points from that configuration are worth repeating: the metastore must be initialized first; the hive.server2.thrift.bind.host property must be set, on each machine, to that machine's own hostname (with only a single host, HiveServer2 runs in single-point mode); and, most importantly, the user in hive.server2.thrift.client.user must match the proxy user name in Hadoop's core-site.xml. Today we build a standalone version, laying the groundwork for the DDL and DML articles that follow.
Environment Preparation
Since this is a standalone build, we use a single machine, pseduoDisHadoop, with Hadoop 2.7.1 and HIVE apache-hive-2.3.4. Hadoop is set up in pseudo-distributed mode; see the article below for how to build it. MySQL serves as the Hive metastore.
Hadoop 2.7.1 pseudo-distributed environment setup: https://www.iteye.com/blog/donald-draper-2317446
Copy the MySQL JDBC driver JAR into HIVE's lib directory
The metastore uses MySQL, so the MySQL JDBC driver must be on Hive's classpath:
donaldhan@pseduoDisHadoop:/bdp/hive$ cp mysql-connector-java-5.1.41.jar apache-hive-2.3.4-bin/lib/
donaldhan@pseduoDisHadoop:/bdp/hive$ ls -al apache-hive-2.3.4-bin/lib | grep mysql
-rw-rw-r-- 1 donaldhan donaldhan 992805 Feb 13 22:00 mysql-connector-java-5.1.41.jar
-rw-r--r-- 1 donaldhan donaldhan 7954 Oct 25 2018 mysql-metadata-storage-0.9.2.jar
donaldhan@pseduoDisHadoop:/bdp/hive$
Install MySQL
donaldhan@pseduoDisHadoop:/bdp/hive$ tar -xvf mysql-server_5.7.11-1ubuntu15.10_amd64.deb-bundle.tar -C mysql-server-5.7.11/
mysql-community-server_5.7.11-1ubuntu15.10_amd64.deb
libmysqlclient-dev_5.7.11-1ubuntu15.10_amd64.deb
libmysqld-dev_5.7.11-1ubuntu15.10_amd64.deb
mysql-client_5.7.11-1ubuntu15.10_amd64.deb
mysql-server_5.7.11-1ubuntu15.10_amd64.deb
mysql-community-client_5.7.11-1ubuntu15.10_amd64.deb
mysql-common_5.7.11-1ubuntu15.10_amd64.deb
mysql-community-test_5.7.11-1ubuntu15.10_amd64.deb
mysql-community-source_5.7.11-1ubuntu15.10_amd64.deb
mysql-community_5.7.11-1ubuntu15.10_amd64.changes
libmysqlclient20_5.7.11-1ubuntu15.10_amd64.deb
mysql-testsuite_5.7.11-1ubuntu15.10_amd64.deb
donaldhan@pseduoDisHadoop:/bdp/hive$ ls
apache-hive-2.3.4-bin apache-hive-2.3.4-bin.tar.gz haproxy-1.7.9.tar.gz mysql-connector-java-5.1.41.jar mysql-server-5.7.11 mysql-server_5.7.11-1ubuntu15.10_amd64.deb-bundle.tar
donaldhan@pseduoDisHadoop:/bdp/hive$ cd mysql-server-5.7.11/
donaldhan@pseduoDisHadoop:/bdp/hive/mysql-server-5.7.11$ ls
libmysqlclient20_5.7.11-1ubuntu15.10_amd64.deb mysql-common_5.7.11-1ubuntu15.10_amd64.deb mysql-community-source_5.7.11-1ubuntu15.10_amd64.deb
libmysqlclient-dev_5.7.11-1ubuntu15.10_amd64.deb mysql-community_5.7.11-1ubuntu15.10_amd64.changes mysql-community-test_5.7.11-1ubuntu15.10_amd64.deb
libmysqld-dev_5.7.11-1ubuntu15.10_amd64.deb mysql-community-client_5.7.11-1ubuntu15.10_amd64.deb mysql-server_5.7.11-1ubuntu15.10_amd64.deb
mysql-client_5.7.11-1ubuntu15.10_amd64.deb mysql-community-server_5.7.11-1ubuntu15.10_amd64.deb mysql-testsuite_5.7.11-1ubuntu15.10_amd64.deb
donaldhan@pseduoDisHadoop:/bdp/hive/mysql-server-5.7.11$
sudo dpkg -i mysql-common_*.deb
sudo dpkg -i {libmysqlclient20,libmysqlclient-dev,libmysqld-dev}_*.deb
sudo dpkg -i mysql-{community-client,client}_*.deb
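Since the metastore database runs on the host machine (see the hosts entry in the next step), only the common, library, and client packages are strictly needed here. A quick sanity check that the client installed correctly (my addition, not part of the original run):
mysql --version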
Configure the MySQL host
donaldhan@pseduoDisHadoop:/bdp/hive/mysql-server-5.7.11$ cat /etc/hosts
127.0.0.1 localhost
192.168.5.139 pseduoDisHadoop
192.168.3.106 mysqldb
*Note: I am using the MySQL instance on the host machine (mysqldb above). The locally installed MySQL had problems; see the previous article for the details.
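Before moving on, it is worth confirming this machine can actually reach the metastore database (a quick connectivity check, assuming the root account that hive-site.xml will use later):
mysql -h mysqldb -u root -p -e "SELECT VERSION();"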
Configure the global HIVE_HOME path
donaldhan@pseduoDisHadoop:/bdp/hive/apache-hive-2.3.4-bin/conf$ tail -n -6 ~/.bashrc
export HADOOP_COMMON_LIB_NATIVE_DIR=${HADOOP_HOME}/lib/native
export YARN_HOME=${HADOOP_HOME}
export HADOOP_OPTS="-Djava.library.path=${HADOOP_HOME}/lib/native"
export HIVE_HOME=/bdp/hive/apache-hive-2.3.4-bin
export PATH=${JAVA_HOME}/bin:${HADOOP_HOME}/bin:${HADOOP_HOME}/sbin:${ZOOKEEPER_HOME}/bin:${HBASE_HOME}/bin:${HIVE_HOME}/bin:${PATH}
donaldhan@pseduoDisHadoop:/bdp/hive/apache-hive-2.3.4-bin/conf$
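After editing ~/.bashrc, reload it so the new variables take effect in the current shell, and confirm the hive command resolves:
source ~/.bashrc
which hive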
Configure Hive environment variables
donaldhan@pseduoDisHadoop:/bdp/hive/apache-hive-2.3.4-bin/conf$ cp hive-env.sh.template hive-env.sh
donaldhan@pseduoDisHadoop:/bdp/hive/apache-hive-2.3.4-bin/conf$ vim hive-env.sh
donaldhan@pseduoDisHadoop:/bdp/hive/apache-hive-2.3.4-bin/conf$ tail -n 5 hive-env.sh
# export HIVE_AUX_JARS_PATH=
HADOOP_HOME=/bdp/hadoop/hadoop-2.7.1
HIVE_CONF_DIR=/bdp/hive/apache-hive-2.3.4-bin/conf
HIVE_AUX_JARS_PATH=/bdp/hive/apache-hive-2.3.4-bin/lib
Configure hive-site.xml
donaldhan@pseduoDisHadoop:/bdp/hive/apache-hive-2.3.4-bin/conf$ cp hive-default.xml.template hive-site.xml
donaldhan@pseduoDisHadoop:/bdp/hive/apache-hive-2.3.4-bin/conf$ vim hive-site.xml
The resulting hive-site.xml:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?><!--
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
--><configuration>
<!-- Hive Execution Parameters -->
<property>
<name>hive.exec.scratchdir</name>
<value>/user/hive/tmp</value>
<description>HDFS root scratch dir for Hive jobs which gets created with write all (733) permission. For each connecting user, an HDFS scratch dir: ${hive.exec.scratchdir}/<username> is created, with ${hive.scratch.dir.permission}.</description>
</property>
<property>
<name>hive.exec.local.scratchdir</name>
<value>/bdp/hive/jobslog/${system:user.name}</value>
<description>Local scratch space for Hive jobs</description>
</property>
<property>
<name>hive.downloaded.resources.dir</name>
<value>/bdp/hive/jobslog/${hive.session.id}_resources</value>
<description>Temporary local directory for added resources in the remote file system.</description>
</property>
<!-- warehouse config -->
<property>
<name>hive.metastore.warehouse.dir</name>
<value>/user/hive/warehouse</value>
<description>location of default database for the warehouse</description>
</property>
<!-- metastore config -->
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://mysqldb:3306/hive_db?createDatabaseIfNotExist=true</value>
<description>
JDBC connect string for a JDBC metastore.
To use SSL to encrypt/authenticate the connection, provide database-specific SSL flag in the connection URL.
For example, jdbc:postgresql://myhost/db?ssl=true for postgres database.
</description>
</property>
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
<description>Driver class name for a JDBC metastore</description>
</property>
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>root</value>
<description>Username to use against metastore database</description>
</property>
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>123456</value>
<description>password to use against metastore database</description>
</property>
<!-- thrift config -->
<property>
<name>hive.server2.thrift.bind.host</name>
<!-- On a different machine, bind to that machine's own hostname -->
<value>pseduoDisHadoop</value>
<description>Bind host on which to run the HiveServer2 Thrift service.</description>
</property>
<property>
<name>hive.server2.thrift.port</name>
<value>10000</value>
<description>Port number of HiveServer2 Thrift interface when hive.server2.transport.mode is 'binary'.</description>
</property>
<property>
<name>hive.server2.thrift.http.port</name>
<value>10001</value>
<description>Port number of HiveServer2 Thrift interface when hive.server2.transport.mode is 'http'.</description>
</property>
<!-- This must match the proxy user in Hadoop's core-site.xml -->
<property>
<name>hive.server2.thrift.client.user</name>
<value>hadoop</value>
<description>Username to use against thrift client</description>
</property>
<property>
<name>hive.server2.thrift.client.password</name>
<value>123456</value>
<description>Password to use against thrift client</description>
</property>
<!-- false: run Hive operations as the HiveServer2 process user rather than the connecting user -->
<property>
<name>hive.server2.enable.doAs</name>
<value>false</value>
<description>
Setting this property to true will have HiveServer2 execute
Hive operations as the user making the calls to it.
</description>
</property>
<!-- java.io config -->
<property>
<name>system:java.io.tmpdir</name>
<value>/bdp/hive/javaio</value>
</property>
<property>
<name>system:user.name</name>
<value>${user.name}</value>
</property>
</configuration>
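A note on the last two properties: they are not standard Hive settings but a widely used workaround. Because this file was copied from hive-default.xml.template, many of the remaining default values reference ${system:java.io.tmpdir} and ${system:user.name}, and without concrete definitions Hive 2.x fails at startup with "Relative path in absolute URI" errors. You can see how many values are affected with a quick grep:
grep -c 'system:java.io.tmpdir' hive-site.xml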
Change the log directories
donaldhan@pseduoDisHadoop:/bdp/hive/apache-hive-2.3.4-bin/conf$ cp hive-log4j2.properties.template hive-log4j2.properties
donaldhan@pseduoDisHadoop:/bdp/hive/apache-hive-2.3.4-bin/conf$ vim hive-log4j2.properties
Modify the following property:
property.hive.log.dir = /bdp/hive/log
donaldhan@pseduoDisHadoop:/bdp/hive/apache-hive-2.3.4-bin/conf$ cp hive-exec-log4j2.properties.template hive-exec-log4j2.properties
donaldhan@pseduoDisHadoop:/bdp/hive/apache-hive-2.3.4-bin/conf$ vim hive-exec-log4j2.properties
Modify the following property:
property.hive.log.dir = /bdp/hive/exelog
Start the Hadoop Cluster
Start the cluster:
start-dfs.sh
Check the running processes:
donaldhan@pseduoDisHadoop:/bdp/hadoop/hadoop-2.7.1$ jps
2354 NameNode
2883 Jps
2730 SecondaryNameNode
2494 DataNode
Create the HIVE warehouse and job scratch directories
donaldhan@pseduoDisHadoop:/bdp/hadoop/hadoop-2.7.1/etc$ hdfs dfs -mkdir -p /user/hive/warehouse
donaldhan@pseduoDisHadoop:/bdp/hadoop/hadoop-2.7.1/etc$ hdfs dfs -mkdir -p /user/hive/tmp
donaldhan@pseduoDisHadoop:/bdp/hadoop/hadoop-2.7.1/etc$ hdfs dfs -chmod 777 /user/hive/tmp
donaldhan@pseduoDisHadoop:/bdp/hadoop/hadoop-2.7.1/etc$ hdfs dfs -chmod 777 /user/hive/warehouse
donaldhan@pseduoDisHadoop:/bdp/hadoop/hadoop-2.7.1/etc$ hdfs dfs -ls /user/hive/
Found 2 items
drwxrwxrwx - donaldhan supergroup 0 2020-02-22 22:59 /user/hive/tmp
drwxrwxrwx - donaldhan supergroup 0 2020-02-22 22:59 /user/hive/warehouse
donaldhan@pseduoDisHadoop:/bdp/hadoop/hadoop-2.7.1/etc$
Initialize the Metastore Schema
donaldhan@pseduoDisHadoop:/bdp/hive/apache-hive-2.3.4-bin/bin$ schematool -dbType mysql -initSchema
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/bdp/hive/apache-hive-2.3.4-bin/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/bdp/hadoop/hadoop-2.7.1/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Metastore connection URL: jdbc:mysql://mysqldb:3306/hive_db?createDatabaseIfNotExist=true
Metastore Connection Driver : com.mysql.jdbc.Driver
Metastore connection User: root
Starting metastore schema initialization to 2.3.0
Initialization script hive-schema-2.3.0.mysql.sql
Initialization script completed
schemaTool completed
Inspect the Metastore Schema
donaldhan@pseduoDisHadoop:/bdp/hive/apache-hive-2.3.4-bin/bin$ schematool -dbType mysql -info
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/bdp/hive/apache-hive-2.3.4-bin/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/bdp/hadoop/hadoop-2.7.1/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Metastore connection URL: jdbc:mysql://mysqldb:3306/hive_db?createDatabaseIfNotExist=true
Metastore Connection Driver : com.mysql.jdbc.Driver
Metastore connection User: root
Hive distribution version: 2.3.0
Metastore schema version: 2.3.0
schemaTool completed
Start HiveServer2
donaldhan@pseduoDisHadoop:/bdp/hive/apache-hive-2.3.4-bin/bin$ ./hiveserver2
2020-02-23 11:11:32: Starting HiveServer2
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/bdp/hive/apache-hive-2.3.4-bin/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/bdp/hadoop/hadoop-2.7.1/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
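Started this way, hiveserver2 runs in the foreground and occupies the terminal. For day-to-day use you may prefer to push it into the background (a common pattern, not part of the original transcript; the output path is arbitrary):
nohup $HIVE_HOME/bin/hiveserver2 > /bdp/hive/log/hiveserver2.out 2>&1 &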
Check the startup log
donaldhan@pseduoDisHadoop:/bdp/hive/log$ tail -f hive.log
2020-02-23T11:11:08,679 INFO [Thread-1] server.HiveServer2: Web UI has stopped
2020-02-23T11:11:08,680 INFO [Thread-1] server.HiveServer2: Server instance removed from ZooKeeper.
...
vletContextHandler{/logs,file:/bdp/hive/log/}
2020-02-23T11:11:51,809 INFO [main] server.AbstractConnector: Started SelectChannelConnector@0.0.0.0:10002
2020-02-23T11:11:51,813 INFO [main] server.HiveServer2: Web UI has started on port 10002
2020-02-23T11:11:51,813 INFO [main] http.HttpServer: Started HttpServer[hiveserver2] on port 10002
In the hive_db database in MySQL, you can now see the metastore schema tables:
aux_table, bucketing_cols, cds, columns_v2, compaction_queue, completed_compactions, completed_txn_components, database_params, db_privs, dbs, delegation_tokens, func_ru, funcs, global_privs, hive_locks, idxs, index_params, key_constraints, master_keys, next_compaction_queue_id, next_lock_id, next_txn_id, notification_log, notification_sequence, nucleus_tables, part_col_privs, part_col_stats, part_privs, partition_events, partition_key_vals, partition_keys, partition_params, partitions, role_map, roles, sd_params, sds, sequence_table, serde_params, serdes, skewed_col_names, skewed_col_value_loc_map, skewed_string_list, skewed_string_list_values, skewed_values, sort_cols, tab_col_stats, table_params, tbl_col_privs, tbl_privs, tbls, txn_components, txns, type_fields, types, version, write_set
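You can spot-check these tables directly from the MySQL client, for example to confirm the schema version and the registered databases (table names as listed above; depending on MySQL's lower_case_table_names setting they may appear in upper case):
mysql -h mysqldb -u root -p -D hive_db -e "SELECT SCHEMA_VERSION FROM version; SELECT DB_ID, NAME, DB_LOCATION_URI FROM dbs;"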
The E/R diagram of these metastore tables can be found via the link below. As the Hive wiki puts it:
All the metadata for Hive tables and partitions are accessed through the Hive Metastore. Metadata is persisted using JPOX ORM solution (Data Nucleus) so any database that is supported by it can be used by Hive. Most of the commercial relational databases and many open source databases are supported. See the list of supported databases in section below. You can find an E/R diagram for the metastore here.
AdminManual Metastore Administration:https://cwiki.apache.org/confluence/display/Hive/AdminManual+Metastore+Administration
Checking the processes again, there is a new RunJar process, which means HiveServer2 started successfully:
donaldhan@pseduoDisHadoop:/bdp/hadoop/hadoop-2.7.1$ jps
2354 NameNode
3714 RunJar
2730 SecondaryNameNode
3820 Jps
2494 DataNode
You can also visit the HiveServer2 Web UI at http://pseduoDisHadoop:10002/ to view HIVE sessions, queries, configuration, logs, and more.
Command-Line Clients
HIVE CLI
donaldhan@pseduoDisHadoop:~$ hive
...
using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
hive> show databases;
OK
default
Time taken: 11.368 seconds, Fetched: 1 row(s)
hive> create database test;
OK
Time taken: 0.586 seconds
hive> show databases;
OK
default
test
Time taken: 0.045 seconds, Fetched: 2 row(s)
hive>
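As a quick smoke test before the next article's deeper dive (an illustrative throwaway table, not from the original session):
hive -e "use test; create table if not exists smoke_test (id int, name string); show tables; drop table smoke_test;"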
Beeline CLI
donaldhan@pseduoDisHadoop:~$ beeline
...
beeline> !connect jdbc:hive2://pseduoDisHadoop:10000
Connecting to jdbc:hive2://pseduoDisHadoop:10000
Enter username for jdbc:hive2://pseduoDisHadoop:10000: hadoop
Enter password for jdbc:hive2://pseduoDisHadoop:10000: ******
Connected to: Apache Hive (version 2.3.4)
Driver: Hive JDBC (version 2.3.4)
Transaction isolation: TRANSACTION_REPEATABLE_READ
0: jdbc:hive2://pseduoDisHadoop:10000> show databases;
+----------------+
| database_name |
+----------------+
| default |
| test |
+----------------+
2 rows selected (3.638 seconds)
0: jdbc:hive2://pseduoDisHadoop:10000> desc database extended test;
+----------+----------+----------------------------------------------------+-------------+-------------+-------------+
| db_name | comment | location | owner_name | owner_type | parameters |
+----------+----------+----------------------------------------------------+-------------+-------------+-------------+
| test | | hdfs://pseduoDisHadoop:9000/user/hive/warehouse/test.db | donaldhan | USER | |
+----------+----------+----------------------------------------------------+-------------+-------------+-------------+
1 row selected (0.385 seconds)
0: jdbc:hive2://pseduoDisHadoop:10000>
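Beeline can also connect non-interactively, which is handy for scripting (same credentials as the interactive session above):
beeline -u jdbc:hive2://pseduoDisHadoop:10000 -n hadoop -p 123456 -e "show databases;"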
Summary
With that, the standalone setup is complete. In the next article we will work through DDL and DML.
Appendix
Call From java.net.UnknownHostException: ubuntu
hdfs dfs -mkdir -p /user/hive/warehouse mkdir: Call From java.net.UnknownHostException: ubuntu: ubuntu: Name or service not known to pseduoDisHadoop:9000 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
Resolution
This is caused by a hostname binding problem. Change the hostname as follows and restart Hadoop:
donaldhan@ubuntu:/bdp/hadoop/hadoop-2.7.1$ cat /etc/hostname
pseduoDisHadoop
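On a systemd-based Ubuntu you can make the same change without editing the file by hand (it takes effect immediately; also make sure /etc/hosts maps the new name, as shown earlier):
sudo hostnamectl set-hostname pseduoDisHadoop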
Hadoop access-permission problems after Beeline authentication
If you run into Hadoop permission problems after authenticating through Beeline, you can adjust the corresponding Hadoop configuration as follows. I have not hit this myself so far.
Enable webhdfs in Hadoop's hdfs-site.xml:
<property>
<name>dfs.webhdfs.enabled</name>
<value>true</value>
</property>
Configure the Hadoop proxy user
In core-site.xml. This configuration is important: the "hadoop" segment in the property name is the actual user name used to log in to HDFS. If it is wrong, access will be denied when we connect to Hive as the hadoop user.
<property>
<name>hadoop.proxyuser.hadoop.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.hadoop.groups</name>
<value>*</value>
</property>
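If you change these proxy-user settings on a running cluster, they can be reloaded without a full restart (restarting HDFS also works):
hdfs dfsadmin -refreshSuperUserGroupsConfiguration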