Configuration of Hadoop By Ansible

Aman Rathi
3 min read · Dec 12, 2020


Apache Hadoop is a collection of open-source software utilities that facilitates using a network of many computers to solve problems involving massive amounts of data and computation. Hadoop consists of a namenode and a large number of datanodes that handle all the data storage, and configuring each datanode manually is a time-consuming task. So we use a configuration management tool to configure the nodes; here we are using Ansible.

We will create a role in Ansible that dynamically fetches the IPs of the nodes and configures them automatically. Before moving further, let's have a look at the road map of the role.

Road Map of Role

  1. Download the Hadoop and JDK software
  2. Install the software
  3. Copy the modified configuration files
  4. Create a directory for data storage
  5. Format the namenode (once)
  6. Start the services
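
The steps above map onto a standard Ansible role layout. A sketch of what the role's directory tree might look like (assuming the usual `ansible-galaxy init` structure and the role name `ansible-hadoop` used in the playbook at the end; the exact file names are assumptions):

```text
ansible-hadoop/
├── tasks/
│   └── main.yml            # the six tasks described below
└── templates/
    ├── namenode/
    │   ├── core-site.xml   # Jinja2 templates, one set per node type
    │   └── hdfs-site.xml
    └── datanode/
        ├── core-site.xml
        └── hdfs-site.xml
```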

Let's move through it step by step.

Step 1. Downloading the software

First we download the software onto our nodes from publicly available locations. Here I am downloading the JDK from my Amazon S3 bucket and Hadoop from the Apache archive.

- name: downloading the files
  get_url:
    url: "{{ item }}"
    dest: /home/ec2-user/
  loop:
    - https://archive.apache.org/dist/hadoop/core/hadoop-1.2.1/hadoop-1.2.1-1.x86_64.rpm
    - https://lalitbucket6033.s3.ap-south-1.amazonaws.com/jdk-8u171-linux-x64.rpm

Step 2. Installing the software

After downloading, we install the software on both the datanodes and the namenode.

- name: installing the software
  command:
    cmd: "{{ item }}"
    warn: no
  loop:
    - rpm -ivh /home/ec2-user/jdk-8u171-linux-x64.rpm --force
    - rpm -ivh /home/ec2-user/hadoop-1.2.1-1.x86_64.rpm --force

Step 3. Copying the edited files

This task copies the edited configuration files, with separate versions for the namenode and the datanodes. You can refer to the GitHub link below for these files, where I used Jinja2 templating to modify them.

- name: copying the files
  template:
    src: ../templates/{{ node_name }}/{{ item }}
    dest: /etc/hadoop/{{ item }}
  loop:
    - core-site.xml
    - hdfs-site.xml
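
For illustration, a minimal `core-site.xml` template for Hadoop 1.x could look like the sketch below. The variable `master_ip` is hypothetical (not from the original role) and stands in for however the namenode's address is fetched:

```xml
<!-- templates/datanode/core-site.xml — illustrative sketch only -->
<configuration>
    <property>
        <!-- Hadoop 1.x property pointing clients and datanodes at the namenode;
             master_ip is an assumed variable resolved from the inventory -->
        <name>fs.default.name</name>
        <value>hdfs://{{ master_ip }}:9001</value>
    </property>
</configuration>
```

The `hdfs-site.xml` template would similarly parameterize `dfs.name.dir` (on the namenode) or `dfs.data.dir` (on the datanodes) with the `direc_name` variable used later in the playbook.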

Step 4. Creating the directory for data storage

This task creates a directory: on the datanodes it is used to store the data blocks, and on the namenode it holds the metadata that tracks the datanodes' details.

- name: creating the directory
  file:
    path: "{{ direc_name }}"
    state: directory
    mode: '0755'

Step 5. Formatting the namenode

The namenode needs to be formatted the very first time, so that it is ready to store information about the datanodes.

- name: formatting namenode
  shell: "echo Y | hadoop namenode -format"
  args:
    warn: no
  when: node_name == "namenode"

Step 6. Starting the services

This is the final task, which starts the Hadoop daemon on each node.

- name: starting the services
  command: sudo hadoop-daemon.sh start "{{ node_name }}"
  args:
    warn: no
  ignore_errors: yes

Now let's deploy this role by creating a simple playbook. I launched two datanodes with the tag (name = datanode) and one namenode with the tag (name = namenode).

- hosts: tag_name_namenode
  vars:
    direc_name: /home/ec2-user/nn
    node_name: namenode
  tasks:
    - name: configuring namenode
      include_role:
        name: ansible-hadoop

- hosts: tag_name_datanode
  vars:
    direc_name: /home/ec2-user/nn
    node_name: datanode
  tasks:
    - name: configuring datanode
      include_role:
        name: ansible-hadoop
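
The `tag_name_namenode` and `tag_name_datanode` groups come from a dynamic inventory that turns EC2 tags into host groups. The original post does not show that configuration; a sketch using the `amazon.aws.aws_ec2` inventory plugin (the region and file name are assumptions) could look like this:

```yaml
# aws_ec2.yml — assumed dynamic-inventory config, not from the original article
plugin: amazon.aws.aws_ec2
regions:
  - ap-south-1
keyed_groups:
  # builds groups such as tag_name_namenode from each instance's "name" tag
  - key: tags.name
    prefix: tag_name
```

Older setups from this era often used the legacy `ec2.py` inventory script instead, which produces the same `tag_name_*` group naming.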

Output

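The original post showed the result as screenshots. One way to check the cluster from the namenode is Hadoop 1.x's `hadoop dfsadmin -report` command; wrapping it in extra tasks like the following is my assumption, not part of the original role:

```yaml
- name: reporting cluster status
  command: hadoop dfsadmin -report
  register: report
  when: node_name == "namenode"

- name: showing the report
  debug:
    var: report.stdout_lines
  when: node_name == "namenode"
```

If the datanodes registered successfully, the report lists them under "Datanodes available".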
That's all for this article. If you have any queries or suggestions, feel free to comment below.

Thank You!!
