Modifying spark-env.sh when deploying Spark

Scenario

When deploying Spark, only the following environment variable was added:

export SPARK_HOME=/home/hadoop/app/spark

Spark can be launched successfully:

[hadoop@hadoop001 conf]$ spark-shell --master local[2]
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
22/01/11 01:06:45 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Spark context Web UI available at http://hadoop001:4040
Spark context available as 'sc' (master = local[2], app id = local-1641834407938).
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/ '_/
   /___/ .__/\_,_/_/ /_/\_\   version 3.2.0
      /_/

Using Scala version 2.12.15 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_45)
Type in expressions to have them evaluated.
Type :help for more information.

However, the textFile(path) method reads from the local file system by default rather than from HDFS:

// read from the local file system
sc.textFile("file:///path")
// read from HDFS
sc.textFile("hdfs://hostname:port/path")
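
For example, in this state the scheme has to be spelled out explicitly in spark-shell. The paths and the NameNode address below are hypothetical and only illustrate the two forms:

// Hypothetical paths and NameNode address, shown only to illustrate the two schemes.
// With no HADOOP_CONF_DIR configured, an unqualified path falls back to the local file system.
val localRdd = sc.textFile("file:///home/hadoop/data/wordcount.txt") // local file system
val hdfsRdd  = sc.textFile("hdfs://hadoop001:8020/wordcount.txt")    // HDFS, fully qualified
println(localRdd.count())
println(hdfsRdd.count())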


Solution

Modify the spark-env.sh file (under $SPARK_HOME/conf):

export HADOOP_CONF_DIR=${HADOOP_HOME}/etc/hadoop
export YARN_CONF_DIR=${HADOOP_HOME}/etc/hadoop


After this change, textFile(path) accesses HDFS by default and the hdfs:// prefix is no longer needed.
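
With HADOOP_CONF_DIR pointing at the Hadoop configuration directory, Spark picks up fs.defaultFS from core-site.xml, so an unqualified path now resolves against HDFS. A minimal sketch in spark-shell, assuming a file /wordcount.txt already exists on HDFS (hypothetical path):

// HADOOP_CONF_DIR is set, so fs.defaultFS from core-site.xml supplies the default scheme.
// The path below is hypothetical; it now resolves against HDFS, not the local file system.
val rdd = sc.textFile("/wordcount.txt")
println(rdd.count())

// The local file system is still reachable with an explicit file:// prefix.
val localRdd = sc.textFile("file:///home/hadoop/data/wordcount.txt")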