Notes from the Developer Academy course "Distributed Database HBase Quick Start: HBase & MR Integration, Official Examples". The notes follow the course closely so readers can pick up the material quickly.
Course address: https://developer.aliyun.com/learning/course/101/detail/1754
HBase & MR Integration: Official Examples
Contents
I. MapReduce
II. Official HBase-MapReduce
I. MapReduce
Through HBase's Java API we can run MapReduce jobs alongside HBase operations — for example, using MapReduce to import data from the local file system into an HBase table, or reading raw data out of HBase and then analyzing it with MapReduce.
II. Official HBase-MapReduce
1. View the jars that HBase MapReduce tasks need on the classpath
$ bin/hbase mapredcp
2. Import the environment variables
(1) Import them for the current session only (temporary; run on the command line):
$ export HBASE_HOME=/opt/module/hbase-1.3.1
$ export HADOOP_HOME=/opt/module/hadoop-2.7.2
$ export HADOOP_CLASSPATH=`${HBASE_HOME}/bin/hbase mapredcp`
(2) To make them permanent, configure them in /etc/profile:
export HBASE_HOME=/opt/module/hbase-1.3.1
export HADOOP_HOME=/opt/module/hadoop-2.7.2
Then add the following to hadoop-env.sh (note: it must go after the for loop):
export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:/opt/module/hbase/lib/*
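The placement matters because hadoop-env.sh builds HADOOP_CLASSPATH inside a for loop over the capacity-scheduler jars; the HBase line has to come after that loop so it extends the loop's result rather than being clobbered by it. A minimal stand-alone sketch of the same append-or-initialize pattern (the two jar paths are placeholders, not real files):

```shell
# Placeholder jars standing in for $HADOOP_HOME/contrib/capacity-scheduler/*.jar
HADOOP_CLASSPATH=""
for f in /tmp/fake-a.jar /tmp/fake-b.jar; do
  if [ "$HADOOP_CLASSPATH" ]; then
    HADOOP_CLASSPATH="$HADOOP_CLASSPATH:$f"   # append when already set
  else
    HADOOP_CLASSPATH="$f"                     # first jar initializes it
  fi
done
# The HBase line from the tutorial, placed after the loop:
HADOOP_CLASSPATH="$HADOOP_CLASSPATH:/opt/module/hbase/lib/*"
echo "$HADOOP_CLASSPATH"
# prints /tmp/fake-a.jar:/tmp/fake-b.jar:/opt/module/hbase/lib/*
```

If the export were placed before the loop, the jars collected by the loop would still be appended, but any earlier reset of HADOOP_CLASSPATH in the file would silently drop the HBase jars — hence the "after the for loop" note.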
[atguigu@hadoop102 hbase]$ bin/hbase mapredcp
(SLF4J multiple-bindings warnings omitted)
/opt/module/hbase/lib/zookeeper-3.4.6.jar:/opt/module/hbase/lib/netty-all-4.0.23.Final.jar:/opt/module/hbase/lib/hbase-client-1.3.1.jar:/opt/module/hbase/lib/metrics-core-2.2.0.jar:/opt/module/hbase/lib/hbase-prefix-tree-1.3.1.jar:/opt/module/hbase/lib/hbase-common-1.3.1.jar:/opt/module/hbase/lib/protobuf-java-2.5.0.jar:/opt/module/hbase/lib/guava-12.0.1.jar:/opt/module/hbase/lib/htrace-core-3.1.0-incubating.jar:/opt/module/hbase/lib/hbase-protocol-1.3.1.jar:/opt/module/hbase/lib/hbase-hadoop-compat-1.3.1.jar:/opt/module/hbase/lib/hbase-server-1.3.1.jar
3. Run the official MapReduce tasks
-- Case 1: count how many rows are in the student table
$ /opt/module/hadoop-2.7.2/bin/yarn jar lib/hbase-server-1.3.1.jar rowcounter student
-- Case 2: use MapReduce to import local data into HBase
1) Create a tsv-format file locally: fruit.tsv (fields separated by tabs)
1001 Apple Red
1002 Pear Yellow
1003 Pineapple Yellow
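ImportTsv splits fields on tab characters by default, so the three columns above must be separated by real tabs, not spaces. A small sketch that writes the file with printf (so the delimiters are unambiguous) and sanity-checks it before uploading; the file name matches the tutorial, everything else is plain local shell:

```shell
# Write fruit.tsv with literal \t separators (ImportTsv's default delimiter).
printf '1001\tApple\tRed\n1002\tPear\tYellow\n1003\tPineapple\tYellow\n' > fruit.tsv

# Sanity checks: 3 data rows, 3 tab-separated fields per row.
wc -l < fruit.tsv                                    # row count, should be 3
awk -F'\t' 'NF != 3 { exit 1 }' fruit.tsv && echo "fields ok"
```

If a row has the wrong number of fields, ImportTsv would count it under "Bad Lines" in the job counters, so catching delimiter mistakes locally saves a cluster round trip.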
2) Create the HBase table
hbase(main):001:0> create 'fruit', 'info'
3) Create an input_fruit directory in HDFS and upload the fruit.tsv file:
$ /opt/module/hadoop-2.7.2/bin/hdfs dfs -mkdir /input_fruit/
$ /opt/module/hadoop-2.7.2/bin/hdfs dfs -put fruit.tsv /input_fruit/
4) Run the MapReduce job to load the data into HBase's fruit table:
$ /opt/module/hadoop-2.7.2/bin/yarn jar lib/hbase-server-1.3.1.jar importtsv \
-Dimporttsv.columns=HBASE_ROW_KEY,info:name,info:color fruit /input_fruit
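The -Dimporttsv.columns list pairs positionally with the tsv fields: the field in the HBASE_ROW_KEY position becomes the row key, and each remaining field becomes a cell in the named column of that row. A local awk sketch of that mapping, printing the equivalent HBase shell put commands (purely illustrative — ImportTsv writes Puts directly and does not go through the shell):

```shell
# Map one tsv line the way importtsv.columns=HBASE_ROW_KEY,info:name,info:color does:
# field 1 -> row key, field 2 -> info:name, field 3 -> info:color.
printf '1001\tApple\tRed\n' | awk -F'\t' '{
  printf "put \047fruit\047, \047%s\047, \047info:name\047, \047%s\047\n",  $1, $2
  printf "put \047fruit\047, \047%s\047, \047info:color\047, \047%s\047\n", $1, $3
}'
# prints:
# put 'fruit', '1001', 'info:name', 'Apple'
# put 'fruit', '1001', 'info:color', 'Red'
```

This is why the counter output below shows Map output records=3: one Put per input line, with two cells each.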
5) Use the scan command to check the imported data:
hbase(main):001:0> scan 'fruit'
Counters from the ImportTsv job (abridged):
	Total time spent by all maps in occupied slots (ms)=5222
	Total time spent by all reduces in occupied slots (ms)=0
	Total time spent by all map tasks (ms)=5222
	Total vcore-milliseconds taken by all map tasks=5222
	Total megabyte-milliseconds taken by all map tasks=534732
Map-Reduce Framework
	Map input records=3
	Map output records=3
	Input split bytes=96
	Spilled Records=0
	Failed Shuffles=0
	Merged Map outputs=0
	GC time elapsed (ms)=272
	CPU time spent (ms)=2510
	Physical memory (bytes) snapshot=222109696
	Virtual memory (bytes) snapshot=3001012224
	Total committed heap usage (bytes)=152043520
ImportTsv
	Bad Lines=0
File Input Format Counters
	Bytes Read=54
File Output Format Counters
	Bytes Written=0