Spark can run without Hadoop: as long as the results (including intermediate results) do not need to be stored on HDFS, and the cluster manager is not YARN, Hadoop is not required.
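For example, a job can be submitted to the standalone master built below and read its input from the local filesystem (the `file://` scheme) instead of HDFS. A minimal sketch; the class name, jar path, and input path are placeholders:

```bash
# Hypothetical submission to the standalone master (no Hadoop involved):
# the application reads a local file rather than an HDFS path.
spark-submit \
  --master spark://master:7077 \
  --class com.example.WordCount \
  /opt/apps/wordcount.jar \
  file:///opt/data/input.txt
```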
Version plan
| Component | Version |
| --- | --- |
| hadoop | 2.7.7 |
| spark | 2.1.0 |
| scala | 2.11.8 |
| zk | 3.4.13 |
| java | 1.8.0 |
| kafka | 2.12-2.1.0 |
| mongoDB | 4.2.0-rc2 |
Kafka and MongoDB are used in later chapters; their versions are listed here for reference.
Ports used
| Port | Purpose |
| --- | --- |
| 8080 | Spark web UI |
| 7077 | master URL port |
| 6066 | REST URL port |
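Once the cluster is running (section 5), it is easy to confirm these ports are open on the master; ZooKeeper's client port 2181 also appears, since the masters store recovery state in ZooKeeper. A minimal check, assuming the iproute2 `ss` utility is available:

```bash
# List listening TCP sockets for the Spark ports and ZooKeeper's 2181;
# run on the master node after the cluster has been started.
ss -tlnp | grep -E ':(8080|7077|6066|2181)\b'
```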
1. Cluster environment plan
| IP | Hostname | Master | Worker | ZK |
| --- | --- | --- | --- | --- |
| 172.*.*.6 | master | Y | N | Y |
| 172.*.*.7 | slave1 | N | Y | Y |
| 172.*.*.8 | slave2 | N | Y | Y |
| 172.*.*.9 | slave3 | N | Y | Y |
2. Set the hostnames
Set 172.*.*.6 to master:

```bash
vi /etc/sysconfig/network
HOSTNAME=master
# takes effect after a reboot, or apply immediately for this session:
hostname master
```

Set 172.*.*.7 to slave1:

```bash
vi /etc/sysconfig/network
HOSTNAME=slave1
# takes effect after a reboot, or apply immediately for this session:
hostname slave1
```

Set 172.*.*.8 to slave2:

```bash
vi /etc/sysconfig/network
HOSTNAME=slave2
# takes effect after a reboot, or apply immediately for this session:
hostname slave2
```

Set 172.*.*.9 to slave3:

```bash
vi /etc/sysconfig/network
HOSTNAME=slave3
# takes effect after a reboot, or apply immediately for this session:
hostname slave3
```
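Note that `/etc/sysconfig/network` is the RHEL/CentOS 6 convention; on systemd-based distributions (CentOS 7 and later) the equivalent persistent change is:

```bash
# systemd equivalent: sets the hostname and persists it across reboots
hostnamectl set-hostname master
```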
3. Map IPs to hostnames
Configure on each of the four nodes (.6, .7, .8, and .9):
```bash
vi /etc/hosts
172.16.14.6 master
172.16.14.7 slave1
172.16.14.8 slave2
172.16.14.9 slave3
```
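A quick sanity check that name resolution works (hostnames per the table in section 1), run from every node:

```bash
# Resolve and reach each peer by hostname
for h in master slave1 slave2 slave3; do
  ping -c 1 "$h" > /dev/null && echo "$h OK" || echo "$h FAILED"
done
```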
4. Configure passwordless SSH
```bash
# Generate a key pair on each node
ssh-keygen -t rsa

# Append each node's public key to authorized_keys on master
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys

# authorized_keys must have 600 permissions
chmod 600 ~/.ssh/authorized_keys

# The resulting authorized_keys looks like this
# (this sample was captured on three nodes; slave3's key would appear the same way)
[root@localhost .ssh]# cat authorized_keys
ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQEAtEvxRj/3xPCtnO38Gy4Y/Y4gj6XX5s+G2hwG5xx19PiDQEKeW3BYUDE616OVdecStBo3X+0Plr2ioirI/3WGlUkm0todr/irpksy0MTpvsjCNUnCWGUHGFMUmrcw1LSiNLhoOSS02AcIq+hw3QJO0w0Wo0EN8xcOhrYwuAByoVv3CvqWd/2Vce2rNOXxLNSmc9tR0Dl3ZqOAq+2a55GM7cETj+eiexDeF5zEVJ2vykQdH3+sZ2XLrQu4WXOMn70xFosk7E1lwJ14QLy6lpfRcWnB1JVKJx9mglze6v3U35g59Vu/LP7t3ebW+dJIOD3/Attb5HcvN8MNfQVOX3JD4w== root@master
ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQEAuU9KJmlmYCx7D+vfMCl2Fj/kz1mfWBrChco0jmZtbygpYY8MUSjmfnsC/wefWKMnFtEruJb+RrgBLxVY6lNzvVKXh+iVPhrjubzj54FoZjepR+1EEznIvwkKa+Y4fkcSJjmcSq/Wvjvz34j3/wVoa1qZtbQing+GzC8Xt0y5rQ6fD1gzD4Oniu43fHAeQDxpo2cVNnTdO2HEe56ZfhIctVRP63rc2CoEuD7d0Ea2WhV0Uruqri/ZKFHVAQQqQ7z/jdCgzTdTXJ5t5hpyeaK8+mYhUKEyOF3xrACW1Is6grUjhbjUxTLt2y2Ytw1d5voFxCUJ6MQcy91KFE/9Lfefyw== root@slave1
ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQEArucVUJdZBYXJD0r9WiX6VnR5S3F7BhoR7hB8UTkXs+WRJGEX9E44yjH+BjIJAPn2v/XwOCdqzSZrGPzLL/BG+XRhGN5NGmdplv8xI3C93hC5kZewRHrHlcAG5Kv4mcHlU+ugcWiyQbIaQvLaFXaq48ZVQHYrzXrz3ZT6QDpsaZtSeW4Z4KWeFmL+AwNyAqxK0nxYXR1zNQJ1r0IdApKmP1WNvbcblB2UKx5G7VMxOs62WY0R9LGdJK6Mmmr5QPlWlpn/g5vXlBvgD80pM6iixFAyz8q19aMQjErTWuULNvX8tdcm+StJV52N8EsiuNMOs+xLVO7L00yxZRtwrXKGgQ== root@slave2

# Push master's authorized_keys to the workers
scp ~/.ssh/authorized_keys root@slave1:~/.ssh/
scp ~/.ssh/authorized_keys root@slave2:~/.ssh/
scp ~/.ssh/authorized_keys root@slave3:~/.ssh/

# Verify passwordless remote login
ssh slave1
ssh slave2
ssh slave3
```
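As an aside, `ssh-copy-id` (shipped with OpenSSH) automates the append-and-chmod steps above, at the cost of typing each node's password once:

```bash
# Alternative: push the local public key to every node in one step
for h in master slave1 slave2 slave3; do
  ssh-copy-id root@"$h"
done
```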
5. Cluster configuration
master configuration (spark-env.sh)
```bash
export SCALA_HOME=/opt/middleware/scala-2.11.8
export JAVA_HOME=/usr/local/jdk
# SPARK_MASTER_IP is deprecated since Spark 2.0; SPARK_MASTER_HOST supersedes it
export SPARK_MASTER_IP=master
export SPARK_MASTER_HOST=master
export SPARK_MASTER_PORT=7077
export SPARK_WORKER_CORES=2
export SPARK_WORKER_MEMORY=3g
# ZooKeeper-based master recovery. Note that spark.deploy.zookeeper.dir is a
# znode path inside ZooKeeper (default /spark), not a local filesystem
# directory -- the value below merely mirrors the ZK install path.
export SPARK_MASTER_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER -Dspark.deploy.zookeeper.url=master:2181,slave1:2181,slave2:2181 -Dspark.deploy.zookeeper.dir=/opt/middleware/zookeeper-3.4.13"
```
slave1 / slave2 configuration (spark-env.sh)

Identical to the master's spark-env.sh above; copy the same file to every node. (Per the plan in section 1, slave3 is also a worker and takes the same file.)
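With the configuration distributed, the cluster can be brought up with the standard standalone scripts. A minimal sketch, assuming `SPARK_HOME` points at the Spark 2.1.0 install on each node:

```bash
# On master: start the master daemon
$SPARK_HOME/sbin/start-master.sh

# On each worker (slave1/slave2/slave3): register against the master URL
$SPARK_HOME/sbin/start-slave.sh spark://master:7077

# The web UI on port 8080 should now list the registered workers
curl -s http://master:8080 | grep -i worker
```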