Hadoop install

1. Environment

WSL + Hadoop

  1. A dedicated Hadoop user needs to be created.
sudo useradd -m hadoop -s /bin/bash  # create the hadoop user, with /bin/bash as its shell
sudo passwd hadoop                   # set a password for the hadoop user
sudo adduser hadoop sudo             # grant the hadoop user administrator privileges
su - hadoop                          # switch to the hadoop user
sudo apt-get update                  # refresh the package lists

2. Configure the Java Environment

First install the JDK; downloading jdk-8u261-linux-x64.tar.gz requires registering an account.

Create a directory to hold the JDK:

su  # switch to the superuser; the hadoop user has no permission to create directories here
mkdir /usr/lib/jvm

Extract it into /usr/lib/jvm; adjust the source path to wherever your archive actually lives. Mine is on the F drive.

sudo tar zxvf /mnt/f/jdk-8u261-linux-x64.tar.gz -C /usr/lib/jvm

Rename the extracted directory jdk1.8.0_261 to java.

cd /usr/lib/jvm
mv jdk1.8.0_261 java

Switch back to the hadoop user and edit its configuration file to add the Java environment variables.

su hadoop
vim ~/.bashrc

Add the following to it:

#Java Environment
export JAVA_HOME=/usr/lib/jvm/java
export JRE_HOME=${JAVA_HOME}/jre
export CLASSPATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib
export PATH=${JAVA_HOME}/bin:$PATH

Reload the profile to apply the environment variables, then check that Java is configured correctly:

source ~/.bashrc 
java -version 

On success you should see something like:

hadoop@LAPTOP-PJ3DJQFQ:~$ java -version
java version "1.8.0_261"
Java(TM) SE Runtime Environment (build 1.8.0_261-b12)
Java HotSpot(TM) 64-Bit Server VM (build 25.261-b12, mixed mode)

3. Configure Hadoop

Download: hadoop-3.2.1.tar.gz

Extract Hadoop into /usr/local:

sudo tar -zxvf  /mnt/f/dowload/hadoop-3.2.1.tar.gz -C /usr/local

Rename the directory (from within /usr/local) and hand its ownership to the hadoop user.

cd /usr/local
sudo mv hadoop-3.2.1 hadoop
sudo chown -R hadoop ./hadoop

Add the environment variables.

vim ~/.bashrc

Append the following:

#Hadoop Environment
export HADOOP_HOME=/usr/local/hadoop
export CLASSPATH=$($HADOOP_HOME/bin/hadoop classpath):$CLASSPATH
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin

Reload the profile and verify:

source ~/.bashrc
hadoop version

Output like the following indicates the configuration succeeded:

hadoop@LAPTOP-PJ3DJQFQ:~$ hadoop version
Hadoop 3.2.1
Source code repository https://gitbox.apache.org/repos/asf/hadoop.git -r b3cbbb467e22ea829b3808f4b7b01d07e0bf3842
Compiled by rohithsharmaks on 2019-09-10T15:56Z
Compiled with protoc 2.5.0
From source with checksum 776eaf9eee9c0ffc370bcbc1888737
This command was run using /usr/local/hadoop/share/hadoop/common/hadoop-common-3.2.1.jar
To run HDFS in pseudo-distributed mode, edit the two configuration files under /usr/local/hadoop/etc/hadoop:

vim core-site.xml
vim hdfs-site.xml
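The post doesn't show the contents of these files; a minimal pseudo-distributed setup commonly looks like the following (the property names are standard Hadoop 3.x, but the hadoop.tmp.dir path is an assumption matching the install location above):

```xml
<!-- core-site.xml -->
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/usr/local/hadoop/tmp</value>
  </property>
</configuration>
```

```xml
<!-- hdfs-site.xml -->
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
```

dfs.replication is set to 1 because a single-node cluster has nowhere to place additional replicas.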

4. Configure SSH

Before installing ssh, uninstall the ssh that ships with WSL, because the built-in version has some problems. Then reinstall:

sudo apt-get remove openssh-server
sudo apt-get remove openssh-client

sudo apt-get install openssh-server
sudo apt-get install openssh-client

WSL and Windows share one set of ports, and port 22 is already occupied by a TCP listener on the Windows side.

So the sshd port needs to be changed; 23 works, for example:

sudo sed -i '/Port /c Port 23' /etc/ssh/sshd_config
sudo sed -i '/ListenAddress 0.0.0.0/c ListenAddress 0.0.0.0' /etc/ssh/sshd_config
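sed's c (change) command replaces each matching line outright, which both uncomments and rewrites it. A self-contained demo on a throwaway file (the two lines mimic the sshd_config entries edited above; the temp-file path is arbitrary):

```shell
# Demo of sed's 'c' command on a scratch copy of the two sshd_config lines.
cfg=$(mktemp)
printf '#Port 22\n#ListenAddress 0.0.0.0\n' > "$cfg"

sed -i '/Port /c Port 23' "$cfg"                                 # uncomment and change the port
sed -i '/ListenAddress 0.0.0.0/c ListenAddress 0.0.0.0' "$cfg"   # uncomment ListenAddress

result=$(cat "$cfg")
rm -f "$cfg"
echo "$result"
# Port 23
# ListenAddress 0.0.0.0
```

Because c replaces the whole line, the pattern matches the commented form (`#Port 22`) just as well as the active one.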

In my case the change still did not take effect (sshd only rereads its config after a restart, e.g. sudo service ssh restart), so the workaround I used was to pass the port explicitly:

ssh localhost -p 23

The connection succeeds:

hadoop@LAPTOP-PJ3DJQFQ:~$ ssh localhost -p 23
hadoop@localhost's password:
Welcome to Ubuntu 20.04.1 LTS (GNU/Linux 4.4.0-18362-Microsoft x86_64)

 * Documentation:  https://help.ubuntu.com
 * Management:     https://landscape.canonical.com
 * Support:        https://ubuntu.com/advantage

  System information as of Wed Sep 16 08:56:18 CST 2020

  System load:    0.52      Processes:              17
  Usage of /home: unknown   Users logged in:        0
  Memory usage:   52%       IPv4 address for eth0:  10.251.235.217
  Swap usage:     0%        IPv4 address for wifi2: 192.168.137.1


30 updates can be installed immediately.
5 of these updates are security updates.
To see these additional updates run: apt list --upgradable


Last login: Wed Sep 16 08:51:50 2020 from 127.0.0.1

This still asks for a password; we can switch to passwordless login. First exit the ssh session, then go into ~/.ssh and generate a key pair.

exit 
cd ~/.ssh/ 
ssh-keygen -t rsa

After running ssh-keygen -t rsa, press Enter three times: the first accepts the default key location, and the second and third confirm an empty passphrase.

Output like the following indicates success.

hadoop@LAPTOP-PJ3DJQFQ:~/.ssh$ ssh-keygen -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (/home/hadoop/.ssh/id_rsa):
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /home/hadoop/.ssh/id_rsa
Your public key has been saved in /home/hadoop/.ssh/id_rsa.pub
The key fingerprint is:
SHA256:5WRr+wG0Vrb4/bRFKrA5gsWFbL5oj119L169+ITFJCQ hadoop@LAPTOP-PJ3DJQFQ
The key's randomart image is:
+---[RSA 3072]----+
|           E .   |
|       . .  o    |
|        + * o. . |
|       + B * .+  |
|        S X .  o.|
|       + + O .oo.|
|      + o * =.+.=|
|     . + o o ++=+|
|      . o   .oo=o|
+----[SHA256]-----+

The commands below append the public key to authorized_keys, enabling login without a password.

hadoop@LAPTOP-PJ3DJQFQ:~/.ssh$ touch authorized_keys
hadoop@LAPTOP-PJ3DJQFQ:~/.ssh$ chmod 600 authorized_keys
hadoop@LAPTOP-PJ3DJQFQ:~/.ssh$ cat ./id_rsa.pub >> ./authorized_keys
hadoop@LAPTOP-PJ3DJQFQ:~/.ssh$ ssh localhost -p 23
Welcome to Ubuntu 20.04.1 LTS (GNU/Linux 4.4.0-18362-Microsoft x86_64)

 * Documentation:  https://help.ubuntu.com
 * Management:     https://landscape.canonical.com
 * Support:        https://ubuntu.com/advantage

  System information as of Wed Sep 16 09:02:55 CST 2020

  System load:    0.52      Processes:              17
  Usage of /home: unknown   Users logged in:        0
  Memory usage:   48%       IPv4 address for eth0:  10.251.235.217
  Swap usage:     0%        IPv4 address for wifi2: 192.168.137.1


30 updates can be installed immediately.
5 of these updates are security updates.
To see these additional updates run: apt list --upgradable


Last login: Wed Sep 16 08:56:18 2020 from 127.0.0.1
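For reference, the same key setup can be scripted non-interactively using ssh-keygen's -N (passphrase) and -f (key file) flags. This sketch writes into a scratch directory purely for illustration; on a real setup the files go in ~/.ssh as above:

```shell
# Non-interactive version of the key setup above, against a scratch
# directory instead of the real ~/.ssh.
demo=$(mktemp -d)
ssh-keygen -t rsa -N '' -f "$demo/id_rsa" -q      # empty passphrase, no prompts
cat "$demo/id_rsa.pub" >> "$demo/authorized_keys" # authorize the new key
chmod 600 "$demo/authorized_keys"                 # sshd refuses group/world-readable files
ls "$demo"                                        # authorized_keys  id_rsa  id_rsa.pub
```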

But when Hadoop launches its daemons over ssh, it still uses port 22 by default, so it also needs to be told to use 23, e.g. in hadoop-env.sh:

export HADOOP_SSH_OPTS="-p 23"

Even with this added, startup still fails; the Java environment variable must also be set for Hadoop.

vim /usr/local/hadoop/etc/hadoop/hadoop-env.sh

Add the following line to it:

export JAVA_HOME=/usr/lib/jvm/java

Running again produces this warning:

2020-09-16 21:22:10,169 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform… using builtin-java classes where applicable

Since it is only WARN level, it can be ignored. Alternatively, add the following to hadoop-env.sh and the warning disappears.

export HADOOP_OPTS="-Djava.library.path=${HADOOP_HOME}/lib/native"

Format HDFS once with hdfs namenode -format, then start Hadoop.

hadoop@LAPTOP-PJ3DJQFQ:/usr/local/hadoop/etc/hadoop$ start-dfs.sh
hadoop@LAPTOP-PJ3DJQFQ:/usr/local/hadoop/etc/hadoop$ jps
3654 DataNode
4328 Jps
3802 SecondaryNameNode
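Note that the jps listing above shows DataNode and SecondaryNameNode but no NameNode, which typically means HDFS was not formatted before the first start. A small sketch that checks the expected daemons, run here against the sample output above (on a live node you would use jps_out=$(jps) instead):

```shell
# Check which expected HDFS daemons appear in a jps listing.
expected="NameNode DataNode SecondaryNameNode"
jps_out="3654 DataNode
4328 Jps
3802 SecondaryNameNode"

status=$(for d in $expected; do
  if echo "$jps_out" | grep -qw "$d"; then   # -w: whole word, so NameNode won't match SecondaryNameNode
    echo "$d: running"
  else
    echo "$d: MISSING"
  fi
done)
echo "$status"
# NameNode: MISSING
# DataNode: running
# SecondaryNameNode: running
```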

Tagged with Big Data

Posted September 14, 2020
