Hadoop Cluster Management

Manage Nodes

Add Data Node in Hadoop Cluster

You can scale a running cluster without stopping any Hadoop services. To add data nodes to a running cluster, follow these steps:

Step 1: Install the Big Data Agent on each machine that you want to add as a data node.

Step 2: On the management page of the corresponding cluster, click Add Node(s) under the Manage Nodes drop-down menu.

Datanode addition dialog

Step 3: Provide the IP address or host name of each machine that you want to add as a data node. To add more data nodes, click ADD NODE. Then click Next for validation.

Datanode details dialog

Step 4: After successful validation, click Next to select the configuration type.

Default: All default properties are set for all cluster nodes.
Recommended: The Cluster Manager automatically sets configuration properties based on the hardware specification of the nodes, such as RAM capacity.

Cluster configuration dialog

You can also modify the recommended properties.

Modify recommended configuration dialog

Step 5: Once the properties are verified, click Create. The Cluster Manager then transfers the packages to the data nodes and starts the services automatically.

Cluster creation after configuration dialog

Package transfer dialog

Datanode status dialog
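
Before entering addresses in Step 3, it can help to pre-check that each value is a well-formed IP address or host name. The following is a minimal sketch of such a check, not the validation that the Cluster Manager itself performs. The function name `classify_node_address` and the sample values are hypothetical, and the check is purely syntactic (it does not test whether the machine is reachable or has the agent installed).

```python
import ipaddress
import re

# One DNS label: 1-63 letters, digits, or hyphens, not starting or ending with a hyphen.
_LABEL = re.compile(r"^(?!-)[A-Za-z0-9-]{1,63}(?<!-)$")


def classify_node_address(value):
    """Return 'ip', 'hostname', or 'invalid' for a node address string."""
    value = value.strip()
    try:
        ipaddress.ip_address(value)  # accepts IPv4 and IPv6 literals
        return "ip"
    except ValueError:
        pass
    labels = value.split(".")
    looks_numeric = value.replace(".", "").isdigit()  # e.g. a malformed IPv4 address
    if 0 < len(value) <= 253 and not looks_numeric and all(_LABEL.match(l) for l in labels):
        return "hostname"
    return "invalid"


if __name__ == "__main__":
    # Hypothetical addresses, for illustration only.
    for candidate in ["10.0.0.5", "datanode4.example.com", "bad_host", "999.1.1.1"]:
        print(candidate, "->", classify_node_address(candidate))
```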

Remove Data node from a Hadoop cluster

To remove a data node from a Hadoop cluster, follow these steps:

Step 1: On the management page of the corresponding cluster, stop the services of the data node by selecting the Stop Node option of that node.

Stop node dialog

Step 2: After all services on the node have stopped, the Remove option is displayed. Select Remove as shown in the image below.

Cluster removal dialog

NOTE

Stopping the services manually and removing the node as described above is not recommended, because it can cause data loss. To avoid data loss, decommission the data node properly before removing it.
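
This guide does not describe how decommissioning is performed in the Cluster Manager. In stock Apache Hadoop, a data node is decommissioned by listing its host name in the file referenced by the `dfs.hosts.exclude` property and then running `hdfs dfsadmin -refreshNodes`, so that HDFS re-replicates the node's blocks before it is taken out of service. The sketch below covers only the first part, maintaining the exclude file. The helper name `add_to_exclude_file`, the file path, and the host names are hypothetical.

```python
from pathlib import Path


def add_to_exclude_file(exclude_path, hostnames):
    """Add hostnames to an HDFS exclude file, one per line, skipping duplicates.

    Returns the full list of hosts now present in the file.
    """
    path = Path(exclude_path)
    hosts = path.read_text().split() if path.exists() else []
    for host in hostnames:
        if host not in hosts:
            hosts.append(host)
    path.write_text("\n".join(hosts) + "\n")
    return hosts


# After updating the file on the name node, the standard Hadoop command
#   hdfs dfsadmin -refreshNodes
# asks the name node to re-read it, and
#   hdfs dfsadmin -report
# shows the decommissioning status of each data node.
```

For example, `add_to_exclude_file("/path/to/dfs.exclude", ["datanode4.example.com"])` would be followed by the refresh command. The node should be stopped and removed only after it is reported as decommissioned.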

Add Journal Node into Hadoop Cluster

The journal node service runs on the first three nodes to provide name node High Availability. When a data node that runs the journal node service fails, the cluster is no longer in High Availability mode. In that situation, follow these steps to add a data node with journal node configuration:

Step 1: Install the Big Data Agent on the machine that you want to add as a journal node in the cluster.

Step 2: On the management page of the corresponding cluster, click Add Journal Node under the Manage Nodes menu.

Adding journalnode dialog

Step 3: Enter the IP address or host name of the machine, select the node type as Data Node (Journal Node), and click Done.

Journalnode details dialog

NOTE

Adding the journal node will restart the Hadoop cluster.

Add Name Node

When the active name node fails, the standby name node becomes the active name node. You can then add a new standby name node to the cluster to maintain name node High Availability.

Step 1: Install the Big Data Agent on the machine that you want to add as the standby name node.

Step 2: On the management page of the corresponding cluster, click Add Journal Node under the Manage Nodes menu.

Addition of journalnode dialog

Step 3: Enter the IP address or host name of the machine, select the node type as Name Node, and click Done.

Namenode details dialog

Step 4: The Cluster Manager configures the Hadoop cluster with the additional name node and starts the services of the Hadoop cluster.

Cluster configuration dialog

NOTE

Adding the name node will restart the Hadoop cluster.

Add Data Node or Journal Node in Linux Secure Cluster

To add a DataNode or JournalNode to a Kerberos-enabled secure cluster that was created using a Cluster Manager installed on a Linux machine, execute the setspn command on the Active Directory machine as described below.

NOTE

This manual command execution is required only when the secure cluster was created with a Cluster Manager installed on a Linux machine. For other clusters, and for a Cluster Manager installed on Windows, this manual step is not required.

When adding data nodes to the secure cluster through the Linux Cluster Manager:

Generate the command for each data node you are adding by replacing the host name, and execute it on the Active Directory machine.

setspn -s HTTP/[HostName] web_[OUName]

When adding a journal node to the secure cluster through the Linux Cluster Manager:

Generate the commands for the journal node you are adding by replacing the host name.

Step 1: Remove the existing SPN entry, if one was already added for the host.

setspn -d HTTP/[HostName] web_[OUName]

Step 2: Execute the setspn command for the host.

setspn -s HTTP/[HostName] web_[OUName]

NOTE

OUName - Organizational Unit name in Active Directory.
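
When several nodes are added at once, the setspn lines above can be generated rather than typed by hand. The following is a minimal sketch that only builds the command strings in the format shown in this section; it does not run them. The function name `setspn_commands`, the host names, and the OU name are hypothetical.

```python
def setspn_commands(hostnames, ou_name, journal_node=False):
    """Build setspn command lines for the given hosts.

    For a journal node, a delete (-d) command is emitted before the add (-s)
    command, mirroring Step 1 and Step 2 above.
    """
    account = f"web_{ou_name}"
    commands = []
    for host in hostnames:
        spn = f"HTTP/{host}"
        if journal_node:
            commands.append(f"setspn -d {spn} {account}")
        commands.append(f"setspn -s {spn} {account}")
    return commands


if __name__ == "__main__":
    # Hypothetical host names and OU name, for illustration only.
    for line in setspn_commands(["datanode4.example.com", "datanode5.example.com"], "HadoopOU"):
        print(line)
    for line in setspn_commands(["journal3.example.com"], "HadoopOU", journal_node=True):
        print(line)
```

The printed lines are then executed on the Active Directory machine, as described above.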

Manage Hadoop Services

To stop or start all Hadoop services, use the Services drop-down menu.

Stopping services dialog

Ecosystem

By default, all ecosystem and supporting services such as Sqoop, Pig, Hive, Spark, HBase, IPython, and Oozie are available on the name nodes. HiveServer2, Oozie Server, HBase Thrift Server, IPython, and Spark Thrift Server run on the name nodes by default. To add and manage ecosystem components on other nodes, use the action menu of each node as shown in the image below.

Package installation dialog

Cluster status dialog

Add Existing Cluster

A cluster that has already been created can be managed using the Managing Existing Cluster option on the Cluster Manager home page.

Adding existing cluster dialog

Click the Managing Existing Cluster option, enter a user-defined name and the IP address or host name of a name node, and click Done.

Managing cluster dialog

After successful validation of the Name node, a cluster entry is added to manage the cluster.

Cluster validation dialog

Log Aggregation

The Collect Logs option allows you to collect all types of application and Hadoop logs in a single click. It is useful for issue reporting and internal log analysis.

Click Collect Logs on the Cluster Manager home page. It downloads the logs from all cluster nodes as a single ZIP file.

Cluster logs dialog

Backup and Restore

In addition to the protection against data loss and the fault tolerance that the replication factor provides within a cluster, you can back up HDFS and HBase table data to another cluster or to Azure Blob storage. The Cluster Manager also provides an option to restore from backup data.

NOTE

Backup jobs are scheduled using the Oozie server that is running in the cluster.

Step 1: Click the Backup and Restore option under the Option menu on the management page.

Backup and restore dialog

Step 2: Select HBase or HDFS backup as required, select the backup type, provide the required fields, and click Done.

HBase backup dialog

HBase backup details dialog

Based on the provided details, an Oozie coordinator job is submitted.

Oozie job status dialog

While the scheduled backup job is running, you can suspend or kill it, perform a restore operation, and view its history.

Backup operation dialog

Python Package Manager

Python and IPython are supported for Spark, and Python is configured on every cluster node by default. You can manage Python packages using the Python Package Manager option.

Step 1: Click the Python Package Manager option under the Option menu on the management page.

Python package manager dialog

Step 2: On this page, you can install, uninstall, and update individual or multiple packages at a time.

Install python package dialog

Step 3: With the install package option, you can install a package from the web or from a local file.

Install package option dialog

Customization of Hadoop and all Hadoop ecosystem configuration files

During cluster creation, you can use the advanced configuration to set Hadoop properties. Similarly, you can edit the Hadoop and Hadoop ecosystem configuration files after cluster creation.

Step 1: Click Configuration under the Administration options.

Administration options dialog

Step 2: Update existing properties or add new properties to the Hadoop and ecosystem configuration files, either for an individual node or for the whole cluster. Then review the changes and restart the services.

Cluster review dialog

NOTE

Some configuration changes may require a service restart.
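
For context, Hadoop configuration files such as hdfs-site.xml store each setting as a `<property>` element with `<name>` and `<value>` children under a `<configuration>` root. The sketch below shows what "update existing or add new" means at the file level. It is an illustration only, not how the Cluster Manager applies changes, and the Configuration page remains the intended way to edit these files. The helper name `set_property` and the sample values are hypothetical; `dfs.replication` is a standard HDFS property.

```python
import xml.etree.ElementTree as ET


def set_property(site_xml, name, value):
    """Update the named property in a Hadoop *-site.xml string, or add it if missing."""
    root = ET.fromstring(site_xml)
    for prop in root.findall("property"):
        if prop.findtext("name") == name:
            value_el = prop.find("value")
            if value_el is None:  # property present but without a <value> element
                value_el = ET.SubElement(prop, "value")
            value_el.text = value
            break
    else:
        # No matching property was found, so append a new one.
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    sample = (
        "<configuration>"
        "<property><name>dfs.replication</name><value>3</value></property>"
        "</configuration>"
    )
    updated = set_property(sample, "dfs.replication", "2")       # update existing
    updated = set_property(updated, "dfs.hosts.exclude", "/path/to/dfs.exclude")  # add new
    print(updated)
```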