Configuring Hadoop with Ansible
Apache Hadoop is a collection of open-source software utilities that facilitates using a network of many computers to solve problems involving massive amounts of data and computation. A Hadoop cluster consists of a namenode and a large number of datanodes, which handle all the data storage, and configuring each datanode manually is a time-consuming task. So we use a configuration management tool to configure the nodes; here we are using Ansible.
We will create an Ansible role that dynamically fetches the IPs of the nodes and configures them automatically. Before moving further, let's have a look at the road map of the role.
Road Map of Role
- Download the Hadoop and JDK software
- Install the software
- Copy the modified configuration files
- Create a directory for data storage
- Format the namenode (once)
- Start the services
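A role covering these tasks can be laid out as follows; the directory names here are assumptions based on the template paths used later in the article:

```
ansible-hadoop/
├── tasks/
│   └── main.yml
└── templates/
    ├── namenode/
    │   ├── core-site.xml
    │   └── hdfs-site.xml
    └── datanode/
        ├── core-site.xml
        └── hdfs-site.xml
```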
Let's move through it step by step.
Step 1. Downloading the software
Here we first download the software onto our nodes from publicly available sources. I'm downloading the JDK from my Amazon S3 bucket and Hadoop from the Apache archive site.
- name: downloading the file
  get_url:
    url: "{{ item }}"
    dest: /home/ec2-user/
  loop:
    - https://archive.apache.org/dist/hadoop/core/hadoop-1.2.1/hadoop-1.2.1-1.x86_64.rpm
    - https://lalitbucket6033.s3.ap-south-1.amazonaws.com/jdk-8u171-linux-x64.rpm
Step 2. Installing the software
After downloading, we install both packages on the namenode and the datanodes.
- name: installing softwares
  command: "{{ item }}"
  args:
    warn: no
  loop:
    - rpm -ivh /home/ec2-user/jdk-8u171-linux-x64.rpm --force
    - rpm -ivh /home/ec2-user/hadoop-1.2.1-1.x86_64.rpm --force
Step 3. Copying the edited files
This task copies the edited configuration files, picking the right versions for the namenode and the datanodes separately. You can refer to the GitHub link below for these files, where I used Jinja2 templating to parameterize them.
- name: copying the files
  template:
    src: "../templates/{{ node_name }}/{{ item }}"
    dest: "/etc/hadoop/{{ item }}"
  loop:
    - core-site.xml
    - hdfs-site.xml
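For reference, a minimal core-site.xml template for a datanode might look like the sketch below. The namenode_ip variable and the port are assumptions; your actual templates (see the GitHub link) may differ:

```xml
<configuration>
  <!-- points every node at the namenode; namenode_ip is an assumed variable -->
  <property>
    <name>fs.default.name</name>
    <value>hdfs://{{ namenode_ip }}:9001</value>
  </property>
</configuration>
```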
Step 4. Creating the directory for data storage
This task creates a directory on each node: on the datanodes it stores the actual data, and on the namenode it holds the metadata table used to track the datanodes.
- name: creating directory
  file:
    path: "{{ direc_name }}"
    state: directory
    mode: '0755'
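The same direc_name variable can then be referenced from the hdfs-site.xml template. A sketch, assuming the Hadoop 1.x property names dfs.name.dir and dfs.data.dir:

```xml
<configuration>
  <!-- namenode stores metadata in dfs.name.dir, datanodes store blocks in dfs.data.dir -->
  <property>
    <name>{{ 'dfs.name.dir' if node_name == 'namenode' else 'dfs.data.dir' }}</name>
    <value>{{ direc_name }}</value>
  </property>
</configuration>
```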
Step 5. Formatting the namenode
The namenode needs to be formatted once, the very first time, so that it is ready to store metadata about the datanodes.
- name: formatting namenode
  shell: "echo Y | hadoop namenode -format"
  args:
    warn: no
  when: node_name == "namenode"
Step 6. Starting the nodes
This is the final task, which starts the Hadoop daemon (namenode or datanode) on each machine.
- name: starting the services
  command: sudo hadoop-daemon.sh start "{{ node_name }}"
  args:
    warn: no
  ignore_errors: yes
Now let's deploy this role by creating a simple playbook. I launched 2 datanodes with the tag name=datanode and 1 namenode with the tag name=namenode.
- hosts: tag_name_namenode
  vars:
    direc_name: /home/ec2-user/nn
    node_name: namenode
  tasks:
    - name: configuring namenode
      include_role:
        name: ansible-hadoop

- hosts: tag_name_datanode
  vars:
    direc_name: /home/ec2-user/nn
    node_name: datanode
  tasks:
    - name: configuring datanode
      include_role:
        name: ansible-hadoop
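The tag_name_namenode and tag_name_datanode groups are generated from the instance tags by the AWS EC2 dynamic inventory; a hand-written static inventory for the same setup would look roughly like this (the IPs are placeholders):

```
[tag_name_namenode]
3.110.xx.xx

[tag_name_datanode]
13.233.xx.xx
65.0.xx.xx
```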
Output
That's all for this article. If you have any queries or suggestions, feel free to comment below.
Thank You!!