Why testing a custom-built Spark on the server appeared to have no effect

Background

Spark is already deployed on the server, with the corresponding environment variables configured. To meet a new requirement, I compiled the Spark source in IDEA, exported a tgz package, and unpacked it onto the server. To tell the two installations apart, the pre-existing Spark is called S1 and the newly built one S2, and S2's Welcome banner text was modified.

Demonstration

S1's location on the server, with the environment variable pointing at it:

[hadoop@hadoop001 ~]$ echo $SPARK_HOME
/home/hadoop/app/spark

S2's location:

[hadoop@hadoop001 spark-3.2.0-bin-custom-spark]$ pwd
/home/hadoop/source/spark-3.2.0-bin-custom-spark

To test S2, I ran the following command from the S2 directory:

[hadoop@hadoop001 spark-3.2.0-bin-custom-spark]$ ./bin/spark-shell

(Screenshot: spark-shell starts and prints S1's default Welcome banner.)

The Welcome text shown was S1's default, so it was actually S1 that started.

I then tried starting S2 by its absolute path:

(Screenshot: spark-shell launched by absolute path, again printing S1's default Welcome banner.)

Same result: it was still S1 that started.
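Before reading the launcher scripts, a quick way to confirm which installation actually came up is to inspect the running JVM from another terminal. A minimal sketch (nothing here is Spark-specific; the brackets in the grep pattern just stop grep from matching its own process):

echo "$SPARK_HOME"        # the directory the launcher scripts will use
ps -ef | grep '[j]ava'    # the JVM command line shows which install's jars were loaded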

Cause

Reading the spark-shell script:

#!/usr/bin/env bash

#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

#
# Shell script for starting the Spark Shell REPL

cygwin=false
case "$(uname)" in
CYGWIN*) cygwin=true;;
esac

# Enter posix mode for bash
set -o posix

if [ -z "${SPARK_HOME}" ]; then
source "$(dirname "$0")"/find-spark-home
fi

export _SPARK_CMD_USAGE="Usage: ./bin/spark-shell [options]

The excerpt is cut off partway through the usage string, but the relevant logic is already visible. The shebang runs the script under whatever bash /usr/bin/env finds. The script then checks ${SPARK_HOME}: if it is already set, that value wins, and everything downstream runs out of that directory (spark-shell ultimately hands off to "${SPARK_HOME}"/bin/spark-submit). Only when ${SPARK_HOME} is unset does it source find-spark-home, which derives SPARK_HOME from the script's own location. Since ${SPARK_HOME} pointed at S1, every invocation of S2's spark-shell, even by absolute path, ended up launching S1.
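For intuition, the fallback branch behaves roughly like the sketch below. This is a simplification, not the actual contents of find-spark-home (the real script also handles pip-installed layouts), and the hand-off to spark-submit happens further down in spark-shell:

# Simplified model: if SPARK_HOME is unset, derive it from this script's location...
if [ -z "${SPARK_HOME}" ]; then
  export SPARK_HOME="$(cd "$(dirname "$0")/.." && pwd)"
fi
# ...then hand off to spark-submit under whichever SPARK_HOME won
"${SPARK_HOME}/bin/spark-submit" --class org.apache.spark.repl.Main --name "Spark shell" "$@"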

Fix

So the fix is to either remove the ${SPARK_HOME} environment variable or point it at the version you actually want to run.

Verify with echo $SPARK_HOME that the variable is now empty or points at the expected path.
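For example, a one-liner that makes an unset variable visible (plain bash parameter expansion; the fallback text is arbitrary):

echo "${SPARK_HOME:-<unset or empty>}"   # prints the fallback text once the variable is gone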

There are a few ways to remove the environment variable (a combined sketch follows the list):

  • unset VAL: temporary; only affects the current shell session.
  • export -n VAL: removes the named variable from the export list. The variable itself is not deleted; it simply stops being passed into the environment of subsequently executed commands.
  • Edit the config file (by default ~/.bash_profile): takes effect only after logging out and reconnecting.
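A minimal sketch of all three options (the sed pattern assumes SPARK_HOME is set on its own line in ~/.bash_profile; adjust to your setup):

# Option 1: gone for the current session only
unset SPARK_HOME

# Option 2: keep the shell variable, but stop exporting it to child processes
export -n SPARK_HOME

# Option 3: delete the line from the profile; takes effect on the next login
sed -i '/SPARK_HOME/d' ~/.bash_profile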

Testing again. Note in the transcript below that after editing ~/.bash_profile, source does not clear the variable: re-sourcing only re-runs what remains in the file, and it cannot un-export a value the current shell already holds. The old path only disappears after logging out and back in:

[hadoop@hadoop001 spark-3.2.0-bin-custom-spark]$ vi ~/.bash_profile 
[hadoop@hadoop001 spark-3.2.0-bin-custom-spark]$ source ~/.bash_profile
[hadoop@hadoop001 spark-3.2.0-bin-custom-spark]$ echo $SPARK_HOME
/home/hadoop/app/spark
[hadoop@hadoop001 spark-3.2.0-bin-custom-spark]$ exit
logout
[root@hadoop001 ~]# su - hadoop
[hadoop@hadoop001 ~]$ echo $SPARK_HOME

[hadoop@hadoop001 ~]$ source/spark-3.2.0-bin-custom-spark/bin/spark-shell
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
22/01/21 19:51:27 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Spark context Web UI available at http://hadoop001:4040
Spark context available as 'sc' (master = local[*], app id = local-1642794689485).
Spark session available as 'spark'.
Welcome to new Spark
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/ '_/
   /___/ .__/\_,_/_/ /_/\_\   version 3.2.0
      /_/

Using Scala version 2.12.14 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_45)
Type in expressions to have them evaluated.
Type :help for more information.

scala>
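The modified Welcome banner confirms that S2 is now the one starting. If S2 should become the default on this machine, the other option from the fix section is to re-point the variable instead of deleting it; a sketch using this post's paths:

# In ~/.bash_profile: point SPARK_HOME at the custom build instead of removing it
export SPARK_HOME=/home/hadoop/source/spark-3.2.0-bin-custom-spark
export PATH="$SPARK_HOME/bin:$PATH"

After reconnecting, a plain spark-shell will then launch S2 from anywhere.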