Hadoop Installation#

Template Machine Configuration#

First, install VirtualBox (vbox) and download a CentOS image.

Install the minimal system and cancel the EFI partition, boot EFI, and root (/) partition.

Configure the network.

This step differs slightly from the video, most likely because a different image was used, so the generated network interface names are different.

Network Configuration#

First, modify the network configuration of the CentOS virtual machine as follows:

vi /etc/sysconfig/network-scripts/ifcfg-enp0s3

The virtual machine uses a bridged network card, so set the following fields in this file:

GATEWAY is the gateway address of your network.

IPADDR is the IP address of the virtual machine.

DNS1 is the DNS server; optionally, it can be set to the gateway address.

The gateway of the workshop is 192.168.8.1, so the physical machine's IP address is 192.168.8.x.

The virtual machine must also use a 192.168.8.x address, that is, an address on the same subnet as the physical machine (but not the same address).
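For reference, a static configuration matching this description might look like the following. This is a sketch, not the author's exact file: the key names are the standard ifcfg ones, while the address 192.168.8.101 and the netmask are assumptions chosen to fit the 192.168.8.x network.

```shell
# /etc/sysconfig/network-scripts/ifcfg-enp0s3 (illustrative values)
TYPE=Ethernet
# Use a fixed address instead of DHCP
BOOTPROTO=static
NAME=enp0s3
DEVICE=enp0s3
# Bring the interface up at boot
ONBOOT=yes
# Assumed VM address; it must be unused on your network
IPADDR=192.168.8.101
# Assumed /24 network
NETMASK=255.255.255.0
# The workshop gateway
GATEWAY=192.168.8.1
# DNS server; here the gateway address is reused
DNS1=192.168.8.1
```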

If you switch from this network to the dormitory network, the virtual machine may not be able to access the Internet. It is recommended to configure multiple network cards according to the dormitory network.

After the configuration is complete, restart the network service.

systemctl restart network

Use ip addr to view the current configuration.

If the restart hangs and finally fails with an error, your network configuration is most likely wrong.
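A quick check before restarting can catch the most common mistakes. The helper below is a hypothetical sketch (check_ifcfg is not a system command): it only tests whether the keys a static configuration typically needs are present, not whether their values are right.

```shell
# check_ifcfg FILE: report each expected key that FILE does not define.
# Returns 0 when all keys are present, 1 otherwise.
check_ifcfg() {
    file=$1
    missing=0
    for key in BOOTPROTO ONBOOT IPADDR GATEWAY; do
        if ! grep -q "^${key}=" "$file"; then
            echo "missing: $key"
            missing=1
        fi
    done
    return "$missing"
}

# Usage:
#   check_ifcfg /etc/sysconfig/network-scripts/ifcfg-enp0s3
```

If the restart still fails, systemctl status network shows the error reported by the service.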

In the VirtualBox settings of the virtual machine, open the network configuration and set network card 1 to a bridged network card. For the name, choose the network card of your computer that is connected to the Internet.

My network card is the AX201.

Once the network is set up correctly, the physical machine can ping the virtual machine.

Connect SSH#

Please make sure you have configured the network of the virtual machine.

After the configuration is complete, enter the virtual machine's IP address, the root user name, and the password in your SSH client to connect.

At this point, the network configuration is complete, and you can follow the tutorial. The subsequent cluster network configuration should be similar to this.

Install screenfetch (optional)#

# Get the file
wget -O screenfetch https://git.io/vaHfR

# Add execute permission
chmod +x screenfetch

screenfetch displays the system configuration. Run it with ./screenfetch.

System Update (optional)#

sudo yum check-update
sudo yum update

Install necessary software packages#

yum install -y epel-release
yum install -y psmisc nc net-tools rsync vim lrzsz ntp libzstd openssl-static tree iotop git

Disable Firewall#

systemctl stop firewalld
systemctl disable firewalld

Modify the hosts file#

vim /etc/hosts

192.168.8.101  neko01
192.168.8.102  neko02
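When the same entries have to be added on several machines, appending them with a small script avoids typos. The sketch below assumes a hypothetical helper, add_host; it appends an entry only if the host name is not already present, and takes the file as an optional parameter so it can be tried on a copy first.

```shell
# add_host IP NAME [FILE]: append "IP  NAME" to FILE unless NAME is already listed.
# FILE defaults to /etc/hosts.
add_host() {
    ip=$1
    name=$2
    file=${3:-/etc/hosts}
    # Match NAME as a whole whitespace-delimited word on a non-comment line
    if grep -Eq "^[^#]*[[:space:]]${name}([[:space:]]|\$)" "$file"; then
        echo "$name already present in $file"
    else
        printf '%s  %s\n' "$ip" "$name" >> "$file"
    fi
}

# Usage (as root):
#   add_host 192.168.8.101 neko01
#   add_host 192.168.8.102 neko02
```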

Create a regular user and set privileges#

Add a user

useradd maomao
passwd maomao

Grant the user root privileges through sudo

vim /etc/sudoers

Press Shift+G to jump to the end of the file

Add an entry for the new user below the root entry, then save and exit with :wq! (the file is read-only, so the ! is needed to force the write)
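The source does not show the exact line, so the following is an assumption: a common form grants the user full sudo rights, optionally with NOPASSWD so that sudo does not prompt for a password. The hypothetical helper sudoers_entry simply prints such a line for a given user.

```shell
# sudoers_entry USER: print a sudoers rule granting USER full sudo rights.
# NOPASSWD: is optional; drop it if sudo should still ask for a password.
sudoers_entry() {
    printf '%s ALL=(ALL) NOPASSWD:ALL\n' "$1"
}

sudoers_entry maomao
# prints: maomao ALL=(ALL) NOPASSWD:ALL
```

After editing, visudo -c checks the sudoers syntax.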

Create directories in /opt#

cd /opt
mkdir module
mkdir software

Run ls to check the result

Install JDK and configure environment variables#

It is recommended to install FinalShell, which comes with built-in file transfer for uploading files to the virtual machine.

1. Import the installation packages#

jdk-8u212-linux-x64.tar.gz

hadoop-3.1.3.tar.gz

2. Install JDK#

Extract the archive

With tar, -zcvf creates a gzip-compressed archive and -zxvf extracts one; -C sets the target directory

tar -zxvf jdk-8u212-linux-x64.tar.gz -C ../module

If you forgot -C, the files are extracted into the current directory; move them with mv

mv jdk1.8.0_212/ ../module/
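To see the two flag sets and -C side by side, the following sketch packs and unpacks a throwaway directory under a temporary path. All names here (tar_roundtrip, demo, demo.tar.gz, out) are made up for illustration, and v is left out to keep the output quiet.

```shell
# tar_roundtrip: pack a small directory, unpack it elsewhere, print the file content.
tar_roundtrip() {
    work=$(mktemp -d)                 # scratch directory, removed at the end
    mkdir -p "$work/demo" "$work/out"
    echo "hello" > "$work/demo/file.txt"

    # -z gzip, -c create, -f archive name; -C changes directory before packing
    tar -zcf "$work/demo.tar.gz" -C "$work" demo

    # -z gzip, -x extract, -f archive name; -C chooses where the files land
    tar -zxf "$work/demo.tar.gz" -C "$work/out"

    cat "$work/out/demo/file.txt"
    rm -rf "$work"
}

tar_roundtrip
```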

3. Add environment variables#

Modifying this file directly is not recommended; open it only to see how it works

vim /etc/profile

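How it works: /etc/profile sources every readable .sh file in /etc/profile.d, so custom variables belong in a separate file there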

for i in /etc/profile.d/*.sh /etc/profile.d/sh.local ; do
    if [ -r "$i" ]; then
        if [ "${-#*i}" != "$-" ]; then
            . "$i"
        else
            . "$i" >/dev/null
        fi
    fi
done
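The test "${-#*i}" != "$-" in that loop is a compact way to ask whether the shell is interactive: $- holds the current option flags, and stripping everything up to the first i changes the string only if an i is present. Interactive shells source each script normally; otherwise its output is sent to /dev/null. The sketch below applies the same expansion to a flag string passed as an argument (flags_mode is a made-up name), so the behaviour can be tried without starting a new shell.

```shell
# flags_mode FLAGS: classify a "$-"-style flag string the way /etc/profile does.
flags_mode() {
    flags=$1
    # ${flags#*i} strips the shortest prefix ending in "i"; without an "i" nothing changes
    if [ "${flags#*i}" != "$flags" ]; then
        echo "interactive"
    else
        echo "non-interactive"
    fi
}

flags_mode himBH   # typical flags of an interactive bash session
flags_mode hB      # typical flags when bash runs a script
```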

Create environment variables

cd /etc/profile.d

Create a file

sudo touch java.sh
sudo vim java.sh

# Configure JDK environment
# Declare JAVA_HOME variable
JAVA_HOME=/opt/module/jdk1.8.0_212
# Declare PATH variable
PATH=$PATH:$JAVA_HOME/bin

# Export PATH and JAVA_HOME so that child processes inherit them
export JAVA_HOME PATH

Reload the profile file

source /etc/profile

Test

[maomao@nekopara profile.d]$ java -version

java version "1.8.0_212"
Java(TM) SE Runtime Environment (build 1.8.0_212-b10)
Java HotSpot(TM) 64-Bit Server VM (build 25.212-b10, mixed mode)

Configure Hadoop#

First, extract the Hadoop archive into /opt/module.

If you are working as a regular user, you need to elevate privileges with sudo.
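The extraction command itself is not shown here; by analogy with the JDK step it would be tar -zxvf hadoop-3.1.3.tar.gz -C ../module. The sketch below wraps that in a hypothetical helper, extract_to, which falls back to sudo only when the target directory is not writable by the current user.

```shell
# extract_to ARCHIVE DEST: unpack a .tar.gz into DEST, using sudo only if needed.
extract_to() {
    archive=$1
    dest=$2
    if [ -w "$dest" ]; then
        tar -zxf "$archive" -C "$dest"
    else
        # DEST is not writable for this user, so elevate privileges
        sudo tar -zxf "$archive" -C "$dest"
    fi
}

# Usage (paths assume the /opt layout created earlier):
#   extract_to /opt/software/hadoop-3.1.3.tar.gz /opt/module
```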

Configure the bin and sbin directories of Hadoop into the environment variables#

The script in /etc/profile.d (java.sh in the previous step) then looks like this:

# Configure JDK environment
# Declare the JAVA_HOME variable
JAVA_HOME=/opt/module/jdk1.8.0_212

# Configure Hadoop environment
# Declare the HADOOP_HOME variable
HADOOP_HOME=/opt/module/hadoop-3.1.3

# Append the JDK bin directory and the Hadoop bin and sbin directories to PATH
PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin

# Export PATH, JAVA_HOME, and HADOOP_HOME so that child processes inherit them
export JAVA_HOME PATH HADOOP_HOME
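One side effect of the PATH=$PATH:... form is that every source /etc/profile in the same shell appends the same directories again. This is harmless, but PATH keeps growing. A possible guard, assuming a hypothetical helper named path_append, adds a directory only when it is not already listed:

```shell
# path_append DIR: add DIR to PATH unless it is already one of its entries.
path_append() {
    case ":$PATH:" in
        *":$1:"*) ;;                # already present, nothing to do
        *) PATH="$PATH:$1" ;;
    esac
}

# Usage inside the profile script, in place of the single PATH= line:
#   path_append "$JAVA_HOME/bin"
#   path_append "$HADOOP_HOME/bin"
#   path_append "$HADOOP_HOME/sbin"
```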

Reload the profile file

source /etc/profile

Verify

[root@nekopara profile.d]# hadoop version

Hadoop 3.1.3
Source code repository https://gitbox.apache.org/repos/asf/hadoop.git -r ba631c436b806728f8ec2f54ab1e289526c90579
Compiled by ztang on 2019-09-12T02:47Z
Compiled with protoc 2.5.0
From source with checksum ec785077c385118ac91aadde5ec9799
This command was run using /opt/module/hadoop-3.1.3/share/hadoop/common/hadoop-common-3.1.3.jar

If you see similar output, the environment variables are configured correctly.
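If a command is reported as not found, a quick way to narrow it down is to ask the shell where it resolves each name. The helper below (check_cmds, a made-up name) uses the standard command -v lookup:

```shell
# check_cmds NAME...: show where each command resolves on PATH, or that it is missing.
# Returns 0 when every command was found, 1 otherwise.
check_cmds() {
    rc=0
    for cmd in "$@"; do
        if location=$(command -v "$cmd"); then
            echo "$cmd -> $location"
        else
            echo "$cmd not found on PATH"
            rc=1
        fi
    done
    return "$rc"
}

# Usage:
#   check_cmds java hadoop
```

A missing entry usually means the profile script was not reloaded in the current shell, or that JAVA_HOME or HADOOP_HOME points to the wrong directory.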