HDFS Pseudo-Distributed Deployment

From the Releases Archive, choose the version to deploy; here we use Release 3.2.2 (available) as the example.

Reference: Hadoop: Setting up a Single Node Cluster.

I. Deployment

1. Software requirements

Note: a name written as COMPONENT-number in upper case, e.g. SPARK-2908, refers to an issue filed against that component.

2. Extract the tarball

Download: hadoop-3.2.2.tar.gz

After downloading, upload the archive to the Linux host with the rz command.
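
If rz is not available, the archive can also be copied over with scp; a minimal sketch, assuming the same host name and target directory used in this guide:

# Run on the machine where hadoop-3.2.2.tar.gz was downloaded (host and path are assumptions).
scp hadoop-3.2.2.tar.gz hadoop@hadoop001:/home/hadoop/software/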

[hadoop@hadoop001 ~]$ ll software/
total 805204
-rw-r--r--. 1 hadoop hadoop 395448622 Nov 21 10:03 hadoop-3.2.2.tar.gz

Extract Hadoop into the app directory and create a symbolic link:

[hadoop@hadoop001 ~]$ tar -xzvf software/hadoop-3.2.2.tar.gz -C app/
[hadoop@hadoop001 ~]$ cd app
[hadoop@hadoop001 app]$
[hadoop@hadoop001 app]$ ln -s /home/hadoop/app/hadoop-3.2.2 hadoop
[hadoop@hadoop001 app]$ ll
total 2
lrwxrwxrwx. 1 hadoop hadoop 29 Nov 25 16:28 hadoop -> /home/hadoop/app/hadoop-3.2.2
drwxr-xr-x. 9 hadoop hadoop 4096 Jan 3 2021 hadoop-3.2.2

3. Inspect the directory layout

[hadoop@hadoop001 app]$ cd hadoop/
[hadoop@hadoop001 hadoop]$ ll
total 216
drwxr-xr-x. 2 hadoop hadoop 4096 Jan 3 2021 bin #executable scripts
drwxr-xr-x. 3 hadoop hadoop 4096 Jan 3 2021 etc #configuration files
drwxr-xr-x. 2 hadoop hadoop 4096 Jan 3 2021 include
drwxrwxr-x. 2 hadoop hadoop 4096 Nov 21 10:14 input
drwxr-xr-x. 3 hadoop hadoop 4096 Jan 3 2021 lib
drwxr-xr-x. 4 hadoop hadoop 4096 Jan 3 2021 libexec
-rw-rw-r--. 1 hadoop hadoop 150569 Dec 5 2020 LICENSE.txt
drwxrwxr-x. 2 hadoop hadoop 4096 Nov 21 10:48 logs
-rw-rw-r--. 1 hadoop hadoop 21943 Dec 5 2020 NOTICE.txt
drwxr-xr-x. 3 hadoop hadoop 4096 Nov 21 11:01 output
-rw-rw-r--. 1 hadoop hadoop 1361 Dec 5 2020 README.txt
drwxr-xr-x. 3 hadoop hadoop 4096 Jan 3 2021 sbin #start/stop scripts
drwxr-xr-x. 4 hadoop hadoop 4096 Jan 3 2021 share

Most big data projects share a similar layout after extraction: bin for command scripts, etc for configuration files, and sbin for start/stop scripts.

4. Manually set the Java environment variable (required)

[hadoop@hadoop001 hadoop]$ vi etc/hadoop/hadoop-env.sh 
# The java implementation to use. By default, this environment
# variable is REQUIRED on ALL platforms except OS X!
# export JAVA_HOME=/usr/java/latest
export JAVA_HOME=/usr/java/jdk1.8.0_45

5. Run bin/hadoop to view the usage information

[hadoop@hadoop001 hadoop]$ bin/hadoop
Usage: hadoop [OPTIONS] SUBCOMMAND [SUBCOMMAND OPTIONS]
or hadoop [OPTIONS] CLASSNAME [CLASSNAME OPTIONS]
where CLASSNAME is a user-provided Java class

OPTIONS is none or any of:

buildpaths attempt to add class files from build tree
--config dir Hadoop config directory
--debug turn on shell script debug mode
--help usage information
hostnames list[,of,host,names] hosts to use in slave mode
hosts filename list of hosts to use in slave mode
loglevel level set the log4j level for this command
workers turn on worker mode

SUBCOMMAND is one of:

Admin Commands:

daemonlog get/set the log level for each daemon

Client Commands:

archive create a Hadoop archive
checknative check native Hadoop and compression libraries availability
classpath prints the class path needed to get the Hadoop jar and the required libraries
conftest validate configuration XML files
credential interact with credential providers
distch distributed metadata changer
distcp copy file or directories recursively
dtutil operations related to delegation tokens
envvars display computed Hadoop environment variables
fs run a generic filesystem user client
gridmix submit a mix of synthetic job, modeling a profiled from production load
jar <jar> run a jar file. NOTE: please use "yarn jar" to launch YARN applications, not
this command.
jnipath prints the java.library.path
kdiag Diagnose Kerberos Problems
kerbname show auth_to_local principal conversion
key manage keys via the KeyProvider
rumenfolder scale a rumen input trace
rumentrace convert logs into a rumen trace
s3guard manage metadata on S3
trace view and modify Hadoop tracing settings
version print the version

Daemon Commands:

kms run KMS, the Key Management Server

SUBCOMMAND may print help when invoked w/o parameters or with -h.

6. Edit the configuration files for pseudo-distributed mode

  • Prerequisite change: /etc/hosts

Find the machine's IP address with ifconfig:

[hadoop@hadoop001 hadoop]$ ifconfig
eth0 Link encap:Ethernet HWaddr 00:0C:29:E2:5A:5E
inet addr:XXX.XXX.XXX.XXX    # this is the IP address
inet6 addr: fea2::24c:29fs:fee2:5a7e/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:2742 errors:0 dropped:0 overruns:0 frame:0
TX packets:3033 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:267707 (261.4 KiB) TX bytes:1724144 (1.6 MiB)

Edit /etc/hosts with vi to map the hostname to this IP (append a new entry at the end; do not modify the existing lines):

[hadoop@hadoop001 hadoop]$ cat /etc/host
cat: /etc/host: No such file or directory
[hadoop@hadoop001 hadoop]$ cat /etc/hosts
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
XXX.XXX.XXX.XXX hadoop001
  • etc/hadoop/core-site.xml
[hadoop@hadoop001 hadoop]$ vi etc/hadoop/core-site.xml 
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>

My hostname is hadoop001, so:

<property>
  <name>fs.defaultFS</name>
  <value>hdfs://hadoop001:9000</value>
</property>

Personal note: while you are here, you can also add the following to relocate the tmp directory:

<property>
  <name>hadoop.tmp.dir</name>
  <value>/home/hadoop/tmp/hadoop-${user.name}</value>
</property>
  • etc/hadoop/hdfs-site.xml
[hadoop@hadoop001 hadoop]$ vi etc/hadoop/hdfs-site.xml 
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>

Personal note: you can also add the following (my hostname is hadoop001):

<property>
  <name>dfs.namenode.secondary.http-address</name>
  <value>hadoop001:9868</value>
</property>
<property>
  <name>dfs.namenode.secondary.https-address</name>
  <value>hadoop001:9869</value>
</property>

More personal additions:

  • etc/hadoop/workers
[hadoop@hadoop001 hadoop]$ vi etc/hadoop/workers
hadoop001
  • etc/hadoop/hadoop-env.sh
[hadoop@hadoop001 hadoop]$ vi etc/hadoop/hadoop-env.sh
# Where pid files are stored. /tmp by default.
# export HADOOP_PID_DIR=/tmp
export HADOOP_PID_DIR=/home/hadoop/tmp

7. Set up passwordless SSH

Run ssh localhost to check whether you can already connect to localhost without a password.

  • Success:
[hadoop@hadoop001 hadoop]$ ssh localhost
Last login: Sat Oct 9 17:05:54 2021 from localhost
  • Failure:
[hadoop@hadoop001 hadoop]$ ssh localhost
hadoop@localhost's password:
Permission denied, please try again.

Run ssh-keygen, then press Enter twice; if prompted Overwrite (y/n)?, type y and press Enter.

[hadoop@hadoop001 hadoop]$ ssh-keygen
Generating public/private rsa key pair.
Enter file in which to save the key (/home/hadoop/.ssh/id_rsa):
/home/hadoop/.ssh/id_rsa already exists.
Overwrite (y/n)? y
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /home/hadoop/.ssh/id_rsa.
Your public key has been saved in /home/hadoop/.ssh/id_rsa.pub.
The key fingerprint is:
9c:1c:53:05:cd:dc:23:1b:51:62:06:b0:92:57:66:da hadoop@hadoop001
The key's randomart image is:
+--[ RSA 2048]----+
| ..BB*+ . |
| . O o*.o |
| o * E +. |
| = + . |
| S |
| |
| |
| |
| |
+-----------------+
[hadoop@hadoop001 hadoop]$ ll ~/.ssh/
total 16
-rw-------. 1 hadoop hadoop 796 Nov 21 10:34 authorized_keys
-rw-------. 1 hadoop hadoop 1675 Nov 25 17:04 id_rsa
-rw-r--r--. 1 hadoop hadoop 398 Nov 25 17:04 id_rsa.pub
-rw-r--r--. 1 hadoop hadoop 798 Nov 21 10:29 known_hosts

Append the public key to authorized_keys and fix its permissions:

[hadoop@hadoop001 hadoop]$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
[hadoop@hadoop001 hadoop]$ chmod 600 ~/.ssh/authorized_keys

Test:

[hadoop@hadoop001 hadoop]$ ssh localhost
Last login: Thu Nov 25 16:57:28 2021 from localhost

II. Execution

On my system, HADOOP_HOME and related variables are already configured in the environment:

export HADOOP_HOME=/home/hadoop/app/hadoop
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export HADOOP_YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export PATH=${HADOOP_HOME}/bin:${HADOOP_HOME}/sbin:$PATH
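
After reloading the shell profile that holds these exports (the profile file is an assumption, e.g. ~/.bashrc), a quick check confirms that the commands resolve to this installation:

# Reload the profile and verify the Hadoop commands on the PATH.
source ~/.bashrc
echo $HADOOP_HOME   # expect /home/hadoop/app/hadoop
which hdfs          # expect /home/hadoop/app/hadoop/bin/hdfs
hadoop version      # prints the Hadoop 3.2.2 version banner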

Running locally

The following walks through running a MapReduce job locally:

1. Format the filesystem

[hadoop@hadoop001 hadoop]$ hdfs namenode -format
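
A rough sanity check after formatting is to look for the freshly written NameNode metadata; a minimal sketch, assuming the default dfs.namenode.name.dir of ${hadoop.tmp.dir}/dfs/name together with the hadoop.tmp.dir set earlier:

# The name directory derives from hadoop.tmp.dir=/home/hadoop/tmp/hadoop-${user.name} (an assumption about defaults).
ls /home/hadoop/tmp/hadoop-hadoop/dfs/name/current/
# A successful format creates VERSION and an initial fsimage file here.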

2. Start the NameNode and DataNode daemons

The hadoop daemon log output is written to the $HADOOP_LOG_DIR directory (defaults to $HADOOP_HOME/logs).

[hadoop@hadoop001 hadoop]$ start-dfs.sh 
Starting namenodes on [hadoop001]
Starting datanodes
Starting secondary namenodes [hadoop001]

After starting, check with the jps command:

[hadoop@hadoop001 hadoop]$ jps
6530 Jps
6355 SecondaryNameNode
6197 DataNode
6089 NameNode
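
As the note above says, the daemon logs go to $HADOOP_LOG_DIR (here $HADOOP_HOME/logs). If a daemon is missing from jps, its log is the first place to look; a minimal sketch, assuming the default log file naming of hadoop-<user>-<daemon>-<hostname>.log:

# Show the tail of the DataNode log to find the startup error (file name is an assumption based on the defaults).
tail -n 50 $HADOOP_HOME/logs/hadoop-hadoop-datanode-hadoop001.log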

Personal note: after running jps I found the DataNode was missing (not running). The likely cause is that I had formatted the NameNode too many times, so the clusterID stored by the DataNode no longer matched the NameNode's. The fix commonly suggested online is to make the clusterID in dfs/data/current/VERSION match the one in dfs/name/current/VERSION (a sketch for checking this directly follows step 3 below). My own fix was as follows:

  1. Locate the temporary data directory, i.e. the hadoop.tmp.dir path configured in core-site.xml

    [hadoop@hadoop001 hadoop]$ cat etc/hadoop/core-site.xml 
    <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/hadoop/tmp/hadoop-${user.name}</value>
    </property>
  2. Delete everything under /home/hadoop/tmp and re-format the NameNode

    [hadoop@hadoop001 hadoop]$ rm -rf tmp/*
    [hadoop@hadoop001 hadoop]$ hdfs namenode -format
  3. Restart HDFS; the problem is resolved

    [hadoop@hadoop001 hadoop]$ start-dfs.sh 
    Starting namenodes on [hadoop001]
    Starting datanodes
    Starting secondary namenodes [hadoop001]
    2021-11-25 18:17:14,032 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    [hadoop@hadoop001 hadoop]$ jps
    14148 SecondaryNameNode
    13991 DataNode
    13883 NameNode
    14270 Jps
    [hadoop@hadoop001 hadoop]$
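
Instead of wiping the data directory, the clusterID mismatch described above can also be inspected (and fixed) directly in the VERSION files; a minimal sketch, with the storage paths assumed to follow from the hadoop.tmp.dir configured earlier:

# Compare the clusterID recorded by the NameNode and by the DataNode (paths are assumptions).
grep clusterID /home/hadoop/tmp/hadoop-hadoop/dfs/name/current/VERSION
grep clusterID /home/hadoop/tmp/hadoop-hadoop/dfs/data/current/VERSION
# If they differ, copy the NameNode's clusterID into the DataNode's VERSION file, then restart HDFS.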

3. Access the NameNode through a browser

  • NameNode - http://localhost:9870/
  • Hadoop 2.x used port 50070; current versions use port 9870
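
A quick way to confirm the web UI is reachable without a browser is to probe the port from the shell; a minimal sketch using the hostname configured above:

# Expect 200 from the NameNode web UI (port 9870 in Hadoop 3.x).
curl -s -o /dev/null -w "%{http_code}\n" http://hadoop001:9870/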

4. Create HDFS directories for running MapReduce jobs

[hadoop@hadoop001 hadoop]$ hdfs dfs  -ls /
[hadoop@hadoop001 hadoop]$ hdfs dfs -mkdir /user
[hadoop@hadoop001 hadoop]$ hdfs dfs -ls /
Found 1 items
drwxr-xr-x - hadoop supergroup 0 2021-11-25 19:58 /user
[hadoop@hadoop001 hadoop]$ hdfs dfs -mkdir /user/hadoop
[hadoop@hadoop001 hadoop]$ hdfs dfs -ls /user
Found 1 items
drwxr-xr-x - hadoop supergroup 0 2021-11-25 19:59 /user/hadoop

5. Copy the input files into HDFS

[hadoop@hadoop001 hadoop]$ hdfs dfs  -mkdir input
[hadoop@hadoop001 hadoop]$ hdfs dfs -put etc/hadoop/*.xml input
[hadoop@hadoop001 hadoop]$ hdfs dfs -ls /user/hadoop
Found 1 items
drwxr-xr-x - hadoop supergroup 0 2021-11-25 20:05 /user/hadoop/input
[hadoop@hadoop001 hadoop]$ hdfs dfs -ls /user/hadoop/input
Found 9 items
-rw-r--r-- 1 hadoop supergroup 9213 2021-11-25 20:05 /user/hadoop/input/capacity-scheduler.xml
-rw-r--r-- 1 hadoop supergroup 975 2021-11-25 20:05 /user/hadoop/input/core-site.xml
-rw-r--r-- 1 hadoop supergroup 11392 2021-11-25 20:05 /user/hadoop/input/hadoop-policy.xml
-rw-r--r-- 1 hadoop supergroup 1068 2021-11-25 20:05 /user/hadoop/input/hdfs-site.xml
-rw-r--r-- 1 hadoop supergroup 620 2021-11-25 20:05 /user/hadoop/input/httpfs-site.xml
-rw-r--r-- 1 hadoop supergroup 3518 2021-11-25 20:05 /user/hadoop/input/kms-acls.xml
-rw-r--r-- 1 hadoop supergroup 682 2021-11-25 20:05 /user/hadoop/input/kms-site.xml
-rw-r--r-- 1 hadoop supergroup 758 2021-11-25 20:05 /user/hadoop/input/mapred-site.xml
-rw-r--r-- 1 hadoop supergroup 690 2021-11-25 20:05 /user/hadoop/input/yarn-site.xml
[hadoop@hadoop001 hadoop]$

Note that in the first command the path is given simply as input rather than /user/hadoop/input: relative paths in HDFS are resolved under the current user's home directory (/user/<username>).
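
For example, once /user/hadoop exists, the following two commands refer to the same HDFS directory:

# Relative HDFS paths are resolved under /user/<current user>.
hdfs dfs -ls input
hdfs dfs -ls /user/hadoop/input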

6. Run the MapReduce job

[hadoop@hadoop001 hadoop]$ hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.2.2.jar grep input output 'dfs[a-z.]+'
2021-11-25 20:10:12,608 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2021-11-25 20:10:13,702 INFO impl.MetricsConfig: Loaded properties from hadoop-metrics2.properties
2021-11-25 20:10:13,807 INFO impl.MetricsSystemImpl: Scheduled Metric snapshot period at 10 second(s).
2021-11-25 20:10:13,807 INFO impl.MetricsSystemImpl: JobTracker metrics system started
2021-11-25 20:10:14,497 INFO input.FileInputFormat: Total input files to process : 9
......
File Input Format Counters
Bytes Read=232
File Output Format Counters
Bytes Written=90

7. Check the results

  • View directly on HDFS:

    [hadoop@hadoop001 hadoop]$ hdfs dfs  -ls /user/hadoop/output
    Found 2 items
    -rw-r--r-- 1 hadoop supergroup 0 2021-11-25 20:10 /user/hadoop/output/_SUCCESS
    -rw-r--r-- 1 hadoop supergroup 90 2021-11-25 20:10 /user/hadoop/output/part-r-00000
    [hadoop@hadoop001 hadoop]$ hdfs dfs -cat output/*
    cat: `output/output': No such file or directory
    1 dfsadmin
    1 dfs.replication
  • Fetch the output to the local filesystem and view it:

    [hadoop@hadoop001 hadoop]$ hdfs dfs -get output output
    [hadoop@hadoop001 hadoop]$ cat output/*
    1 dfsadmin
    1 dfs.replication

8. Stop the services

[hadoop@hadoop001 hadoop]$ stop-dfs.sh 
Stopping namenodes on [hadoop001]
Stopping datanodes
Stopping secondary namenodes [hadoop001]
[hadoop@hadoop001 hadoop]$

Running on YARN

To run MapReduce jobs on YARN, a few parameters need to be set and the ResourceManager and NodeManager daemons must be started. The steps below assume that steps 1–4 of the local execution above have already been completed.

1. Edit the configuration files

  • etc/hadoop/mapred-site.xml

    <configuration>
      <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
      </property>
      <property>
        <name>mapreduce.application.classpath</name>
        <value>$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*:$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*</value>
      </property>
    </configuration>
  • etc/hadoop/yarn-site.xml

    <configuration>
      <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
      </property>
      <property>
        <name>yarn.nodemanager.env-whitelist</name>
        <value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_HOME,PATH,LANG,TZ,HADOOP_MAPRED_HOME</value>
      </property>
    </configuration>

    Personal note: HADOOP_CONF_DIR is not configured yet.
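
    If HADOOP_CONF_DIR is wanted as well, it would simply point at this installation's configuration directory; a minimal sketch (adding it beside the other exports is an assumption):

    export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop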

2. Start the ResourceManager and NodeManager daemons

[hadoop@hadoop001 hadoop]$ start-yarn.sh 
Starting resourcemanager
Starting nodemanagers
[hadoop@hadoop001 hadoop]$ jps
20592 Jps
18579 NameNode
20392 ResourceManager
18843 SecondaryNameNode
18686 DataNode
20495 NodeManager

3. Access the ResourceManager through a browser

  • ResourceManager - http://localhost:8088/

4. Run the MapReduce job

[hadoop@hadoop001 hadoop]$ hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.2.2.jar grep input output 'dfs[a-z.]+'
2021-11-26 08:15:32,639 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2021-11-26 08:15:34,024 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
2021-11-26 08:15:35,285 INFO mapreduce.JobResourceUploader: Disabling Erasure Coding for path: /tmp/hadoop-yarn/staging/hadoop/.staging/job_1637885511879_0001
2021-11-26 08:15:36,280 INFO input.FileInputFormat: Total input files to process : 9
2021-11-26 08:15:36,379 INFO mapreduce.JobSubmitter: number of splits:9
2021-11-26 08:15:36,974 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1637885511879_0001
2021-11-26 08:15:36,976 INFO mapreduce.JobSubmitter: Executing with tokens: []
2021-11-26 08:15:37,319 INFO conf.Configuration: resource-types.xml not found
2021-11-26 08:15:37,320 INFO resource.ResourceUtils: Unable to find 'resource-types.xml'.
2021-11-26 08:15:37,878 INFO impl.YarnClientImpl: Submitted application application_1637885511879_0001
2021-11-26 08:15:37,945 INFO mapreduce.Job: The url to track the job: http://hadoop001:8088/proxy/application_1637885511879_0001/
2021-11-26 08:15:37,945 INFO mapreduce.Job: Running job: job_1637885511879_0001
2021-11-26 08:15:55,539 INFO mapreduce.Job: Job job_1637885511879_0001 running in uber mode : false
2021-11-26 08:15:55,550 INFO mapreduce.Job: map 0% reduce 0%
2021-11-26 08:16:43,204 INFO mapreduce.Job: map 44% reduce 0%
2021-11-26 08:16:44,369 INFO mapreduce.Job: map 67% reduce 0%
2021-11-26 08:17:05,488 INFO mapreduce.Job: map 100% reduce 0%
2021-11-26 08:17:06,495 INFO mapreduce.Job: map 100% reduce 100%
2021-11-26 08:17:07,508 INFO mapreduce.Job: Job job_1637885511879_0001 completed successfully
2021-11-26 08:17:07,638 INFO mapreduce.Job: Counters: 55
File System Counters
FILE: Number of bytes read=128
FILE: Number of bytes written=2350649
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=29993
HDFS: Number of bytes written=232
HDFS: Number of read operations=32
HDFS: Number of large read operations=0
HDFS: Number of write operations=2
HDFS: Number of bytes read erasure-coded=0
Job Counters
Killed map tasks=1
Launched map tasks=10
Launched reduce tasks=1
Data-local map tasks=10
Total time spent by all maps in occupied slots (ms)=333545
Total time spent by all reduces in occupied slots (ms)=18937
Total time spent by all map tasks (ms)=333545
Total time spent by all reduce tasks (ms)=18937
Total vcore-milliseconds taken by all map tasks=333545
Total vcore-milliseconds taken by all reduce tasks=18937
Total megabyte-milliseconds taken by all map tasks=341550080
Total megabyte-milliseconds taken by all reduce tasks=19391488
Map-Reduce Framework
Map input records=781
Map output records=4
Map output bytes=114
Map output materialized bytes=176
Input split bytes=1077
Combine input records=4
Combine output records=4
Reduce input groups=4
Reduce shuffle bytes=176
Reduce input records=4
Reduce output records=4
Spilled Records=8
Shuffled Maps =9
Failed Shuffles=0
Merged Map outputs=9
GC time elapsed (ms)=6070
CPU time spent (ms)=7920
Physical memory (bytes) snapshot=1882726400
Virtual memory (bytes) snapshot=27148435456
Total committed heap usage (bytes)=1269469184
Peak Map Physical memory (bytes)=211144704
Peak Map Virtual memory (bytes)=2715578368
Peak Reduce Physical memory (bytes)=106315776
Peak Reduce Virtual memory (bytes)=2720391168
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=28916
File Output Format Counters
Bytes Written=232
2021-11-26 08:17:07,682 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory hdfs://hadoop001:9000/user/hadoop/output already exists
at org.apache.hadoop.mapreduce.lib.output.FileOutputFormat.checkOutputSpecs(FileOutputFormat.java:164)
at org.apache.hadoop.mapreduce.JobSubmitter.checkSpecs(JobSubmitter.java:277)
at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:143)
at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1565)
at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1562)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1762)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1562)
at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1583)
at org.apache.hadoop.examples.Grep.run(Grep.java:94)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
at org.apache.hadoop.examples.Grep.main(Grep.java:103)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:71)
at org.apache.hadoop.util.ProgramDriver.run(ProgramDriver.java:144)
at org.apache.hadoop.examples.ExampleDriver.main(ExampleDriver.java:74)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at org.apache.hadoop.util.RunJar.run(RunJar.java:323)
at org.apache.hadoop.util.RunJar.main(RunJar.java:236)
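
The FileAlreadyExistsException at the end is expected here: the grep example runs a second (sort) job whose output directory is output, and that directory is still present from the earlier local run. Removing it before re-running avoids the error; a minimal sketch:

# Clear the leftover output directory, then re-submit the job.
hdfs dfs -rm -r output
hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.2.2.jar grep input output 'dfs[a-z.]+'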

The job can also be tracked in the browser via the ResourceManager UI.

5. Stop the services

[hadoop@hadoop001 hadoop]$ stop-yarn.sh 
Stopping nodemanagers
localhost: WARNING: nodemanager did not stop gracefully after 5 seconds: Trying to kill with kill -9
Stopping resourcemanager
[hadoop@hadoop001 hadoop]$ jps