Hadoop MapReduce Programming: Computing Extremes

Introduction:

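This time we do not compute just a maximum (to see how a maximum is computed, refer to "Hadoop MapReduce Programming: Computing the Maximum"); we compute both a maximum and a minimum. Implementing the Mapper and Reducer is still very simple. The extra work is in the output: to tell the maximum and the minimum apart and emit both at once, we need to define our own output value type as well as our own output format.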

Test Data

The data format is as follows:

SG 253654006139495 253654006164392 619850464
KG 253654006225166 253654006252433 743485698
UZ 253654006248058 253654006271941 570409379
TT 253654006282019 253654006286839 23236775
BE 253654006276984 253654006301435 597874033
BO 253654006293624 253654006315946 498265375
SR 253654006308428 253654006330442 484613339
SV 253654006320312 253654006345405 629640166
LV 253654006330384 253654006359891 870680704
FJ 253654006351709 253654006374468 517965666

The text data above is stored one record per line, and each line has 4 fields:

  1. Country code
  2. Start time
  3. End time
  4. Random cost/weight estimate

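Fields are separated by spaces. The goal is to compute, for each country (identified by its country code), the extremes of the cost estimate, that is, its maximum and its minimum.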

Implementation

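First, let's work out what we need to implement to meet these requirements.
The Mapper's input types are K1 => LongWritable and V1 => Text, where K1 is the byte offset of the line in the text file and V1 is the content of that line. Its output types are K2 => Text and V2 => LongWritable, where K2 is the parsed country code (a string) and V2 is the cost estimate (a number).
The Reducer's input is <K2, list<V2>>. For its output we choose K3 => Text, the country code, and V3, a type of our own that holds both the maximum and the minimum.
In addition, the Reducer phase produces the final output, which is written to HDFS. Besides the type of the output object, we therefore also need an output format that describes how the final data is written to HDFS.
For the Mapper, you can refer to "Hadoop MapReduce Programming: Computing the Maximum". To compute the extremes, we need to implement the following: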

  1. The Reducer output type
  2. The Reducer output specification (output format)
  3. The Mapper
  4. The Reducer
  5. The Job configuration
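To make the K1/V1 → K2/V2 → K3/V3 flow described above concrete, here is a minimal in-memory sketch that uses plain JDK types instead of Hadoop classes. `FlowSketch` and `MinMax` are hypothetical names; the `TreeMap` only imitates the sorting and grouping that Hadoop's shuffle performs, and the sketch assumes every 4-field line carries a numeric cost.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical in-memory model of the job's data flow (no Hadoop classes):
 * (K1 offset, V1 line) -> map -> (K2 code, V2 cost) -> shuffle -> (K2, list<V2>) -> reduce -> (K3 code, V3 min/max)
 */
public class FlowSketch {

    /** Stand-in for the custom V3 type (Extremum in this article). */
    public static final class MinMax {
        final long min;
        final long max;

        MinMax(long min, long max) {
            this.min = min;
            this.max = max;
        }

        @Override
        public String toString() {
            return min + "\t" + max;
        }
    }

    /** Map: one (K1, V1) pair -> zero or one (K2, V2) pair, collected into the shuffle buffer. */
    static void map(long offset, String line, Map<String, List<Long>> shuffle) {
        String[] fields = line.trim().split("\\s+");
        if (fields.length == 4) {
            long cost = Long.parseLong(fields[3]); // assumes a numeric cost field
            // Shuffle: group V2 values by K2; the TreeMap mimics Hadoop's sorted keys
            shuffle.computeIfAbsent(fields[0], k -> new ArrayList<>()).add(cost);
        }
    }

    /** Reduce: (K2, list<V2>) -> (K3, V3). */
    static MinMax reduce(String code, List<Long> costs) {
        long min = Long.MAX_VALUE;
        long max = Long.MIN_VALUE;
        for (long cost : costs) {
            min = Math.min(min, cost);
            max = Math.max(max, cost);
        }
        return new MinMax(min, max);
    }

    public static Map<String, MinMax> run(List<String> lines) {
        Map<String, List<Long>> shuffle = new TreeMap<>();
        long offset = 0;                      // K1: byte offset of each line
        for (String line : lines) {
            map(offset, line, shuffle);
            offset += line.length() + 1;      // assumes ASCII text with '\n' line endings
        }
        Map<String, MinMax> output = new TreeMap<>();
        for (Map.Entry<String, List<Long>> group : shuffle.entrySet()) {
            output.put(group.getKey(), reduce(group.getKey(), group.getValue()));
        }
        return output;
    }
}
```

Only the Reducer's output type (V3) is custom; K1, V1, K2, V2 and K3 all map onto Hadoop's built-in Writable types.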

Below we explain each part in detail:

  • Reducer output type

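The Reducer's output value type needs two fields, the maximum and the minimum, so that the final output shows both extremes for the same country code. We define the Extremum class to represent them: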

package org.shirdrn.kodz.inaction.hadoop.extremum;

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Writable;

public class Extremum implements Writable {

    private LongWritable maxValue;
    private LongWritable minValue;

    public Extremum() {
        super();
        maxValue = new LongWritable(0);
        minValue = new LongWritable(0);
    }

    public Extremum(long min, long max) {
        super();
        this.minValue = new LongWritable(min);
        this.maxValue = new LongWritable(max);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        minValue.readFields(in);
        maxValue.readFields(in);
    }

    @Override
    public void write(DataOutput out) throws IOException {
        minValue.write(out);
        maxValue.write(out);
    }

    public LongWritable getMaxValue() {
        return maxValue;
    }

    public void setMaxValue(LongWritable maxValue) {
        this.maxValue = maxValue;
    }

    public LongWritable getMinValue() {
        return minValue;
    }

    public void setMinValue(LongWritable minValue) {
        this.minValue = minValue;
    }

    @Override
    public String toString() {
        StringBuilder builder = new StringBuilder();
        builder.append(minValue.get()).append("\t").append(maxValue.get());
        return builder.toString();
    }
}

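Our custom type must implement Hadoop's Writable interface so that it can be handled by Hadoop's serialization mechanism and finally written to HDFS. The interface defines two methods, write and readFields, for serialization and deserialization respectively; both must process the fields in the same order.
This custom type wraps two fields, the maximum and the minimum.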
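The Writable contract can be checked without a cluster by a JDK-only round trip. In the sketch below, `MinMaxWritableSketch` is a hypothetical stand-in for Extremum that stores plain longs; it relies on LongWritable serializing its value as an 8-byte long, which is what `DataOutput.writeLong` produces, so the two fields take 16 bytes. The no-arg constructor mirrors Extremum's, since Hadoop creates Writable instances reflectively before calling readFields.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

/** Hypothetical JDK-only stand-in for Extremum following the same write/readFields contract. */
public class MinMaxWritableSketch {
    private long min;
    private long max;

    public MinMaxWritableSketch() {}  // needed so an empty instance can be filled by readFields

    public MinMaxWritableSketch(long min, long max) {
        this.min = min;
        this.max = max;
    }

    public void write(DataOutput out) throws IOException {
        out.writeLong(min);  // the field order here ...
        out.writeLong(max);
    }

    public void readFields(DataInput in) throws IOException {
        min = in.readLong(); // ... must match the field order here
        max = in.readLong();
    }

    public long getMin() { return min; }
    public long getMax() { return max; }

    private static byte[] toBytes(MinMaxWritableSketch value) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try {
            value.write(new DataOutputStream(bytes));
        } catch (IOException e) {
            throw new UncheckedIOException(e); // not expected with an in-memory buffer
        }
        return bytes.toByteArray();
    }

    /** Serializes the value and reads it back into a fresh instance. */
    public static MinMaxWritableSketch roundTrip(MinMaxWritableSketch value) {
        MinMaxWritableSketch copy = new MinMaxWritableSketch();
        try {
            copy.readFields(new DataInputStream(new ByteArrayInputStream(toBytes(value))));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return copy;
    }

    public static int serializedSize(MinMaxWritableSketch value) {
        return toBytes(value).length;
    }
}
```

Swapping the two `readLong` calls would still compile and run, but would silently exchange minimum and maximum, which is why the field order matters.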

  • Reducer output specification

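The Reducer's output is, in effect, the output of the whole job. We define the ExtremumOutputFormat class to describe the Reducer's output specification: using Hadoop's built-in TextOutputFormat as a reference, we override its getRecordWriter method to control how our results are written. ExtremumOutputFormat is implemented as follows: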

package org.shirdrn.kodz.inaction.hadoop.extremum;

import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UnsupportedEncodingException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.io.compress.GzipCodec;
import org.apache.hadoop.mapreduce.RecordWriter;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
import org.apache.hadoop.util.ReflectionUtils;

public class ExtremumOutputFormat extends TextOutputFormat<Text, Extremum> {

    @Override
    public RecordWriter<Text, Extremum> getRecordWriter(TaskAttemptContext job)
            throws IOException, InterruptedException {

        Configuration conf = job.getConfiguration();
        boolean isCompressed = getCompressOutput(job); // whether to compress the output
        // Field separator; "mapred.textoutputformat.separator" is the Hadoop 1.x key, TAB by default
        String fieldSeparator = conf.get("mapred.textoutputformat.separator", "\t");
        CompressionCodec codec = null;
        String extension = "";
        if (isCompressed) {
            Class<? extends CompressionCodec> codecClass = getOutputCompressorClass(job, GzipCodec.class);
            codec = (CompressionCodec) ReflectionUtils.newInstance(codecClass, conf);
            extension = codec.getDefaultExtension();
        }
        // Task-specific output file such as part-r-00000 (plus the codec extension if compressed)
        Path file = getDefaultWorkFile(job, extension);
        FileSystem fs = file.getFileSystem(conf);
        FSDataOutputStream fileOut = fs.create(file, false);
        if (!isCompressed) {
            return new ExtremumRecordWriter(fileOut, fieldSeparator);
        } else {
            DataOutputStream out = new DataOutputStream(codec.createOutputStream(fileOut));
            return new ExtremumRecordWriter(out, fieldSeparator);
        }
    }

    public static class ExtremumRecordWriter extends RecordWriter<Text, Extremum> {

        private static final String CHARSET = "UTF-8";
        protected DataOutputStream out;
        private final byte[] fieldSeparator;
        private static final byte[] NEWLINE;
        static {
            try {
                NEWLINE = "\n".getBytes(CHARSET);
            } catch (UnsupportedEncodingException uee) {
                throw new IllegalArgumentException("can't find " + CHARSET + " encoding.");
            }
        }

        public ExtremumRecordWriter(DataOutputStream out) {
            this(out, "\t");
        }

        public ExtremumRecordWriter(DataOutputStream out, String fieldSeparator) {
            super();
            this.out = out;
            try {
                this.fieldSeparator = fieldSeparator.getBytes(CHARSET);
            } catch (UnsupportedEncodingException e) {
                throw new IllegalArgumentException("can't find " + CHARSET + " encoding.");
            }
        }

        @Override
        public synchronized void close(TaskAttemptContext context) throws IOException,
                InterruptedException {
            out.close();
        }

        @Override
        public synchronized void write(Text key, Extremum value) throws IOException,
                InterruptedException {
            if (key != null) {
                // Text.getBytes() returns the backing array, which can be longer than the
                // content, so only the first getLength() bytes are written
                out.write(key.getBytes(), 0, key.getLength());
                out.write(fieldSeparator);
            }
            if (value != null) {
                // Encode explicitly as UTF-8 instead of relying on the platform default charset
                out.write(value.getMinValue().toString().getBytes(CHARSET));
                out.write(fieldSeparator);
                out.write(value.getMaxValue().toString().getBytes(CHARSET));
            }
            out.write(NEWLINE);
        }
    }
}

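In getRecordWriter, ExtremumOutputFormat returns an ExtremumRecordWriter instance, which is what actually writes the results. Each output line has three columns, "country code, minimum, maximum", separated by the field separator (TAB by default).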
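The bytes that ExtremumRecordWriter emits for one record can be sketched with plain JDK streams. `ExtremumLineSketch.formatRecord` is a hypothetical helper that mirrors the writer's logic (key, separator, minimum, separator, maximum, newline, all as UTF-8); it is an illustration, not code from the job.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

/** Hypothetical JDK-only sketch of one line written by ExtremumRecordWriter.write(). */
public class ExtremumLineSketch {

    public static byte[] formatRecord(String code, long min, long max, String separator) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            byte[] sep = separator.getBytes(StandardCharsets.UTF_8);
            if (code != null) {                       // a null key is skipped, like in the writer
                out.write(code.getBytes(StandardCharsets.UTF_8));
                out.write(sep);
            }
            out.write(Long.toString(min).getBytes(StandardCharsets.UTF_8));
            out.write(sep);
            out.write(Long.toString(max).getBytes(StandardCharsets.UTF_8));
            out.write('\n');                          // NEWLINE
        } catch (IOException e) {
            throw new UncheckedIOException(e);        // not expected with an in-memory buffer
        }
        return buffer.toByteArray();
    }
}
```

With a different separator (the listing reads it from `mapred.textoutputformat.separator`), only the delimiter between the three columns changes.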

  • Mapper implementation

The Mapper simply parses a line of text and extracts the country code and the cost estimate. Here is our ExtremunGlobalCostMapper class:

package org.shirdrn.kodz.inaction.hadoop.extremum;

import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class ExtremunGlobalCostMapper extends
        Mapper<LongWritable, Text, Text, LongWritable> {

    // Output objects are reused across map() calls; context.write() serializes them immediately
    private LongWritable costValue = new LongWritable(1);
    private Text code = new Text();

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // a line, such as 'SG 253654006139495 253654006164392 619850464'
        String line = value.toString().trim();
        // "\\s+" tolerates repeated whitespace between fields
        String[] array = line.split("\\s+");
        if (array.length == 4) {
            String countryCode = array[0];
            String strCost = array[3];
            long cost = 0L;
            try {
                cost = Long.parseLong(strCost);
            } catch (NumberFormatException e) {
                cost = 0L; // malformed cost: treated as 0 and skipped below
            }
            if (cost != 0) {
                code.set(countryCode);
                costValue.set(cost);
                context.write(code, costValue);
            }
        }
    }
}
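The Mapper's `array.length == 4` check depends on how the line is split. This small JDK-only check (the class name `SplitSketch` is hypothetical) shows that splitting on `"\\s"` (one whitespace character) yields an empty field when two spaces are adjacent, so such a line would be dropped silently, while `"\\s+"` on a trimmed line still yields 4 fields.

```java
/** Hypothetical JDK-only check of how two split patterns treat repeated whitespace. */
public class SplitSketch {

    /** Number of fields String.split produces for the given regex. */
    public static int fieldCount(String line, String regex) {
        return line.split(regex).length;
    }

    public static void main(String[] args) {
        String clean = "SG 253654006139495 253654006164392 619850464";
        String doubleSpace = "SG  253654006139495 253654006164392 619850464";
        System.out.println(fieldCount(clean, "\\s"));               // 4
        System.out.println(fieldCount(doubleSpace, "\\s"));         // 5: empty field between the spaces
        System.out.println(fieldCount(doubleSpace.trim(), "\\s+")); // 4
    }
}
```

The `trim()` matters as well: `String.split` keeps a leading empty string when the input starts with whitespace, so `" SG 1 2 3".split("\\s+")` still has 5 elements.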

  • Reducer implementation

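The Reducer is not complicated either. The point to note is that once the minimum and maximum have been computed, they are wrapped in an instance of a type that Hadoop does not define, here our Extremum class. Our ExtremumGlobalCostReducer class looks like this: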

package org.shirdrn.kodz.inaction.hadoop.extremum;

import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class ExtremumGlobalCostReducer extends
        Reducer<Text, LongWritable, Text, Extremum> {

    @Override
    protected void reduce(Text key, Iterable<LongWritable> values,
            Context context) throws IOException, InterruptedException {
        // Start from the opposite ends of the long range so that any value,
        // including a negative one, updates both extremes
        long max = Long.MIN_VALUE;
        long min = Long.MAX_VALUE;
        for (LongWritable current : values) {
            long cost = current.get(); // copy the primitive out of the Writable
            if (cost > max) {
                max = cost;
            }
            if (cost < min) {
                min = cost;
            }
        }
        Extremum extremum = new Extremum(min, max);
        context.write(key, extremum);
    }
}
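The Reducer reads each value with `get()` and keeps only primitives rather than the LongWritable objects themselves. That matters because Hadoop's reduce-side iterator reuses a single value instance and refills it on every `next()`. The JDK-only simulation below uses hypothetical `MutableLong` and `reusing(...)` stand-ins for LongWritable and the framework's iterator to show the difference.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/** Hypothetical JDK-only simulation of a reduce-side iterator that reuses one value object. */
public class ReuseSketch {

    /** Stand-in for LongWritable. */
    static final class MutableLong {
        long value;
    }

    /** Iterable that hands out one shared object, refilled on each next(). */
    static Iterable<MutableLong> reusing(long... values) {
        return () -> new Iterator<MutableLong>() {
            private final MutableLong shared = new MutableLong();
            private int i = 0;

            @Override
            public boolean hasNext() {
                return i < values.length;
            }

            @Override
            public MutableLong next() {
                shared.value = values[i++];
                return shared;
            }
        };
    }

    /** Wrong: keeps references, so every element ends up holding the last value. */
    static List<MutableLong> collectReferences(Iterable<MutableLong> values) {
        List<MutableLong> kept = new ArrayList<>();
        for (MutableLong v : values) {
            kept.add(v);
        }
        return kept;
    }

    /** Right: reads the primitive immediately, as the Reducer above does. */
    static long[] minMax(Iterable<MutableLong> values) {
        long min = Long.MAX_VALUE;
        long max = Long.MIN_VALUE;
        for (MutableLong v : values) {
            min = Math.min(min, v.value);
            max = Math.max(max, v.value);
        }
        return new long[] {min, max};
    }
}
```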

  • Job configuration

The Job must be configured with the output value type and the output format we defined. The configuration logic of our MapReduce program lives in the ExtremumCostDriver class:

package org.shirdrn.kodz.inaction.hadoop.extremum;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

public class ExtremumCostDriver {

    public static void main(String[] args) throws IOException,
            InterruptedException, ClassNotFoundException {

        Configuration conf = new Configuration();
        String[] otherArgs = new GenericOptionsParser(conf, args)
                .getRemainingArgs();
        if (otherArgs.length != 2) {
            System.err.println("Usage: extremumcost <in> <out>");
            System.exit(2);
        }

        // Hadoop 1.x constructor; on Hadoop 2.x and later, Job.getInstance(conf, "extremum cost") is preferred
        Job job = new Job(conf, "extremum cost");

        job.setJarByClass(ExtremumCostDriver.class);
        job.setMapperClass(ExtremunGlobalCostMapper.class);
        job.setReducerClass(ExtremumGlobalCostReducer.class);

        // Map output types (K2, V2) differ from the final output types (K3, V3), so both are set
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(LongWritable.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Extremum.class);
        job.setOutputFormatClass(ExtremumOutputFormat.class);

        // Two reduce tasks, hence two output files: part-r-00000 and part-r-00001
        job.setNumReduceTasks(2);

        FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
        FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));

        int exitFlag = job.waitForCompletion(true) ? 0 : 1;
        System.exit(exitFlag);
    }
}

Make sure to set the correct key and value output types for each phase, as well as our output format class. We also configure 2 Reduce tasks, so the job produces 2 result files.
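Which of the two files a country code lands in is decided by the default partitioner. The sketch below re-implements, for illustration only, what we understand Hadoop's HashPartitioner to do with a Text key: hash the key's UTF-8 bytes with the 31-based recurrence of `WritableComparator.hashBytes` (starting from 1), then compute `(hash & Integer.MAX_VALUE) % numReduceTasks`. `PartitionSketch` is a hypothetical name, and the sketch does not use Hadoop itself. Its predictions agree with the output files listed further down (AD and BA in part-r-00000, AE and CA in part-r-00001).

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical JDK-only re-implementation of default hash partitioning for a Text key. */
public class PartitionSketch {

    /** Same recurrence as WritableComparator.hashBytes: start at 1, multiply by 31, add each byte. */
    static int textHash(String key) {
        int hash = 1;
        for (byte b : key.getBytes(StandardCharsets.UTF_8)) {
            hash = 31 * hash + b;
        }
        return hash;
    }

    /** HashPartitioner's formula: clear the sign bit, then take the remainder. */
    public static int partition(String key, int numReduceTasks) {
        return (textHash(key) & Integer.MAX_VALUE) % numReduceTasks;
    }

    public static void main(String[] args) {
        for (String code : new String[] {"AD", "AE", "BA", "CA"}) {
            System.out.println(code + " -> part-r-0000" + partition(code, 2));
        }
        // AD -> part-r-00000, AE -> part-r-00001, BA -> part-r-00000, CA -> part-r-00001
    }
}
```

Because each key maps to exactly one reduce task, a country code never appears in both output files.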

Running the Program

Here are the steps to run the program:

  • Compile the code (I used Maven directly) and package it into a jar file
shirdrn@SYJ:~/programs/eclipse-jee-juno/workspace/kodz-all/kodz-hadoop/target/classes$ jar -cvf global-extremum-cost.jar -C ./ org
  • Copy the generated jar file to the NameNode machine
scp shirdrn@172.0.8.212:~/programs/eclipse-jee-juno/workspace/kodz-all/kodz-hadoop/target/classes/global-extremum-cost.jar ./
  • Upload the data file to be processed
xiaoxiang@ubuntu3:/opt/stone/cloud/hadoop-1.0.3$ bin/hadoop fs -copyFromLocal /opt/stone/cloud/dataset/data_10m /user/xiaoxiang/datasets/cost/
  • Run our MapReduce job to compute the extremes
xiaoxiang@ubuntu3:/opt/stone/cloud/hadoop-1.0.3$ bin/hadoop jar global-extremum-cost.jar org.shirdrn.kodz.inaction.hadoop.extremum.ExtremumCostDriver /user/xiaoxiang/datasets/cost /user/xiaoxiang/output/extremum/cost
  • Run output

The console output during the run is shown below:

13/03/22 21:38:46 INFO input.FileInputFormat: Total input paths to process : 1
13/03/22 21:38:46 INFO util.NativeCodeLoader: Loaded the native-hadoop library
13/03/22 21:38:46 WARN snappy.LoadSnappy: Snappy native library not loaded
13/03/22 21:38:46 INFO mapred.JobClient: Running job: job_201303111631_0012
13/03/22 21:38:47 INFO mapred.JobClient: map 0% reduce 0%
13/03/22 21:39:03 INFO mapred.JobClient: map 21% reduce 0%
13/03/22 21:39:06 INFO mapred.JobClient: map 28% reduce 0%
13/03/22 21:39:18 INFO mapred.JobClient: map 48% reduce 4%
13/03/22 21:39:21 INFO mapred.JobClient: map 57% reduce 9%
13/03/22 21:39:34 INFO mapred.JobClient: map 78% reduce 14%
13/03/22 21:39:37 INFO mapred.JobClient: map 85% reduce 19%
13/03/22 21:39:49 INFO mapred.JobClient: map 100% reduce 23%
13/03/22 21:39:52 INFO mapred.JobClient: map 100% reduce 28%
13/03/22 21:40:01 INFO mapred.JobClient: map 100% reduce 30%
13/03/22 21:40:04 INFO mapred.JobClient: map 100% reduce 33%
13/03/22 21:40:07 INFO mapred.JobClient: map 100% reduce 66%
13/03/22 21:40:10 INFO mapred.JobClient: map 100% reduce 100%
13/03/22 21:40:15 INFO mapred.JobClient: Job complete: job_201303111631_0012
13/03/22 21:40:15 INFO mapred.JobClient: Counters: 29
13/03/22 21:40:15 INFO mapred.JobClient: Job Counters
13/03/22 21:40:15 INFO mapred.JobClient: Launched reduce tasks=2
13/03/22 21:40:15 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=98141
13/03/22 21:40:15 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
13/03/22 21:40:15 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
13/03/22 21:40:15 INFO mapred.JobClient: Launched map tasks=7
13/03/22 21:40:15 INFO mapred.JobClient: Data-local map tasks=7
13/03/22 21:40:15 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=111222
13/03/22 21:40:15 INFO mapred.JobClient: File Output Format Counters
13/03/22 21:40:15 INFO mapred.JobClient: Bytes Written=4339
13/03/22 21:40:15 INFO mapred.JobClient: FileSystemCounters
13/03/22 21:40:15 INFO mapred.JobClient: FILE_BYTES_READ=260079964
13/03/22 21:40:15 INFO mapred.JobClient: HDFS_BYTES_READ=448913653
13/03/22 21:40:15 INFO mapred.JobClient: FILE_BYTES_WRITTEN=390199096
13/03/22 21:40:15 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=4339
13/03/22 21:40:15 INFO mapred.JobClient: File Input Format Counters
13/03/22 21:40:15 INFO mapred.JobClient: Bytes Read=448912799
13/03/22 21:40:15 INFO mapred.JobClient: Map-Reduce Framework
13/03/22 21:40:15 INFO mapred.JobClient: Map output materialized bytes=130000084
13/03/22 21:40:15 INFO mapred.JobClient: Map input records=10000000
13/03/22 21:40:15 INFO mapred.JobClient: Reduce shuffle bytes=116610358
13/03/22 21:40:15 INFO mapred.JobClient: Spilled Records=30000000
13/03/22 21:40:15 INFO mapred.JobClient: Map output bytes=110000000
13/03/22 21:40:15 INFO mapred.JobClient: CPU time spent (ms)=121520
13/03/22 21:40:15 INFO mapred.JobClient: Total committed heap usage (bytes)=1763442688
13/03/22 21:40:15 INFO mapred.JobClient: Combine input records=0
13/03/22 21:40:15 INFO mapred.JobClient: SPLIT_RAW_BYTES=854
13/03/22 21:40:15 INFO mapred.JobClient: Reduce input records=10000000
13/03/22 21:40:15 INFO mapred.JobClient: Reduce input groups=233
13/03/22 21:40:15 INFO mapred.JobClient: Combine output records=0
13/03/22 21:40:15 INFO mapred.JobClient: Physical memory (bytes) snapshot=1973850112
13/03/22 21:40:15 INFO mapred.JobClient: Reduce output records=233
13/03/22 21:40:15 INFO mapred.JobClient: Virtual memory (bytes) snapshot=4880068608
13/03/22 21:40:15 INFO mapred.JobClient: Map output records=10000000
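The "Map output bytes" counter can be checked with a little arithmetic. Assuming Text serializes as a variable-length length prefix (one byte for lengths up to 127) followed by its UTF-8 bytes, and LongWritable as an 8-byte long, each (country code, cost) pair takes 1 + 2 + 8 = 11 bytes, and 11 × 10,000,000 map output records gives exactly the 110,000,000 bytes reported. A JDK-only sketch of that calculation (the class name is hypothetical):

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

/** Hypothetical JDK-only estimate of one serialized (Text, LongWritable) map output pair. */
public class MapOutputSizeSketch {

    public static int pairSize(String code, long cost) {
        byte[] utf8 = code.getBytes(StandardCharsets.UTF_8);
        if (utf8.length > 127) {
            throw new IllegalArgumentException("sketch only models a one-byte length prefix");
        }
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeByte(utf8.length); // Text: length prefix (one byte for short strings)
            out.write(utf8);            // Text: UTF-8 bytes
            out.writeLong(cost);        // LongWritable: 8 bytes
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.size();
    }

    public static void main(String[] args) {
        int perRecord = pairSize("SG", 619850464L);
        long records = 10_000_000L;              // "Map output records"
        System.out.println(perRecord);           // 11
        System.out.println(perRecord * records); // 110000000, the "Map output bytes" counter
    }
}
```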
  • Verify the output
xiaoxiang@ubuntu3:/opt/stone/cloud/hadoop-1.0.3$ bin/hadoop fs -cat /user/xiaoxiang/output/extremum/cost/part-r-00000 | wc -l
121
xiaoxiang@ubuntu3:/opt/stone/cloud/hadoop-1.0.3$ bin/hadoop fs -cat /user/xiaoxiang/output/extremum/cost/part-r-00001 | wc -l
112
xiaoxiang@ubuntu3:/opt/stone/cloud/hadoop-1.0.3$ bin/hadoop fs -cat /user/xiaoxiang/output/extremum/cost/part-r-00000
AD 43328 999974516
AF 11795 999996180
AL 11148 999998489
AR 33649 999953989
AT 5051 999999909
AZ 3726 999996557
BA 89066 999949773
BE 28187 999925057
BG 64672 999971528
BI 50687 999978516
BM 14127 999991925
BO 61786 999995482
BS 52428 999980931
BW 78739 999935985
BY 39216 999998496
CD 51328 999978139
CF 5084 999995342
CH 17996 999997524
CL 359 999967083
CN 17985 999975367
CR 17435 999971685
CV 16166 999990543
CX 74232 999987579
CZ 1400 999993908
DE 1676 999985416
DK 84072 999963312
DM 57727 999941706
DO 9162 999945597
ER 60594 999980425
ET 21273 999987033
FI 19268 999966243
FK 39074 999966573
FM 9493 999972146
FO 47655 999988472
GB 12629 999970658
GD 7713 999996318
GF 42623 999982024
GH 9904 999941039
GL 16512 999948726
GN 2779 999951804
GP 64525 999904645
GR 44233 999999672
GT 159 999972984
HK 24464 999970084
HU 23165 999997568
ID 19812 999994762
IL 14553 999982184
IN 25241 999914991
IR 5173 999986780
IT 19399 999997239
JM 30194 999982209
JO 5149 999977276
KH 5010 999975644
KN 597 999991068
KP 61560 999967939
KR 13295 999992162
KZ 5565 999992835
LA 4002 999989151
LC 8509 999962233
LI 54653 999986863
LK 12556 999989876
LS 6702 999957706
LU 17627 999999823
LY 41618 999992365
MD 8494 999996042
MH 1050 999989668
ML 30858 999990079
MN 4236 999969051
MP 8422 999995234
MR 14023 999982303
MT 91203 999982604
MV 15186 999961206
MX 15807 999978066
MZ 14800 999981189
NA 7673 999961177
NC 9467 999961053
NE 1802 999990091
NG 5189 999985037
NI 29440 999965733
NO 20466 999993122
NU 17175 999987046
PA 51054 999924435
PE 10286 999981176
PG 119327 999987347
PK 1041 999954268
PM 1435 999998975
PW 40353 999991278
PY 9586 999985509
RE 5640 999952291
RO 7139 999994148
RS 1342 999999923
RU 11319 999894985
RW 815 999980184
SB 15446 999972832
SD 14060 999963744
SH 1276 999983638
SL 35087 999999269
SN 117358 999990278
SR 12974 999975964
ST 3796 999980447
SV 20563 999999945
SX 41237 999903445
SZ 18993 999992537
TC 46396 999969540
TG 58484 999977640
TK 11880 999971131
TM 25103 999958998
TO 40829 999947915
TW 5437 999975092
UZ 2620 999982762
VA 774 999975548
VC 9514 999991495
VE 12156 999997971
VG 32832 999949690
VI 895 999990063
VU 1375 999953162
WF 62709 999947666
YT 29640 999994707
ZA 8399 999998692
ZM 25699 999973392
ZW 77984 999928087
xiaoxiang@ubuntu3:/opt/stone/cloud/hadoop-1.0.3$ bin/hadoop fs -cat /user/xiaoxiang/output/extremum/cost/part-r-00001
AE 1870 999938630
AG 7701 999991085
AI 8609 999989595
AM 16681 999976746
AO 17126 999989628
AQ 40493 999995031
AS 3096 999935982
AU 13311 999937089
AW 1734 999965784
BB 27334 999987345
BD 27928 999992272
BF 20745 999999220
BH 4980 999994900
BJ 11385 999977886
BN 33440 999986630
BR 56601 999989947
BT 84033 999977488
BZ 27250 999975972
CA 54534 999978275
CC 753 999968311
CG 10644 999788112
CI 17263 999998864
CK 7710 999968719
CM 12402 999998369
CO 7616 999999167
CU 30200 999976352
CW 6597 999987713
CY 5686 999982925
DJ 30348 999997438
DZ 31633 999973610
EC 15342 999920447
EE 3834 999949534
EG 60885 999980522
ES 540 999949155
FJ 14016 999990686
FR 14682 999988342
GA 54051 999982099
GE 25641 999991970
GI 9387 999995295
GM 12637 999967823
GQ 13033 999988635
GU 68211 999919056
GW 34868 999962551
GY 4633 999999881
HN 18264 999972628
HR 18192 999986688
HT 2732 999970913
IE 23339 999996686
IM 47842 999987831
IO 2457 999968575
IQ 75 999990126
IS 24153 999973585
JP 8463 999983684
KE 45373 999996012
KG 5037 999991556
KI 1146 999994328
KM 41461 999989895
KW 20441 999924295
KY 23539 999977105
LB 15169 999963014
LR 76821 999897202
LT 346 999999688
LV 17147 999945411
MA 22225 999922726
MC 4131 999978886
MG 15255 999996602
MK 21709 999968900
MM 33329 999987977
MO 19139 999977975
MQ 57391 999913110
MS 38541 999974690
MU 51654 999988632
MW 16153 999991903
MY 1575 999995010
NF 9028 999989399
NL 22375 999949789
NP 15809 999972410
NR 11740 999956464
NZ 4921 999998214
OM 20128 999967428
PF 8544 999959978
PH 58869 999981534
PL 7105 999996619
PR 34529 999906386
PT 50257 999993404
QA 33588 999995061
SA 54320 999973822
SC 64674 999973271
SE 13044 999972256
SG 20304 999977637
SI 17186 999980580
SK 7472 999998152
SM 5636 999941188
SO 52133 999973175
SY 36761 999988858
TD 98111 999999303
TH 11173 999968746
TJ 19322 999983666
TN 40668 999963035
TP 40353 999986796
TR 4495 999995112
TT 32136 999984435
TV 40558 999971989
TZ 5852 999992734
UA 12571 999970993
UG 4142 999976267
UM 5465 999998377
US 52833 999912229
UY 12782 999989662
VN 45107 999974393
WS 19808 999970242
YE 3291 999984650

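As you can see, the results are as expected: the two files contain 121 + 112 = 233 lines, matching the "Reduce output records=233" counter, and each line gives a country code followed by its minimum and maximum cost.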
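Checking 233 lines by eye is error-prone, so here is a hypothetical JDK-only checker for the output format (the class name and its rules are our own assumptions): every line must have three TAB-separated columns with min <= max, and no country code may appear in more than one part file. Fed with the lines of part-r-00000 and part-r-00001, read with their TAB separators intact, it should return 233.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical checker for "code<TAB>min<TAB>max" lines spread over several part files. */
public class OutputCheckSketch {

    /** Returns the number of valid records; throws IllegalStateException on the first violation. */
    public static int check(List<List<String>> partFiles) {
        Set<String> seen = new HashSet<>();
        int records = 0;
        for (List<String> lines : partFiles) {
            for (String line : lines) {
                String[] f = line.split("\t");
                if (f.length != 3) {
                    throw new IllegalStateException("expected 3 columns: " + line);
                }
                long min = Long.parseLong(f[1]);
                long max = Long.parseLong(f[2]);
                if (min > max) {
                    throw new IllegalStateException("min > max: " + line);
                }
                if (!seen.add(f[0])) { // each key should go to exactly one reducer
                    throw new IllegalStateException("duplicate key: " + f[0]);
                }
                records++;
            }
        }
        return records;
    }
}
```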
