Troubleshooting Hadoop cluster

Cluster deployment in Azure with 3.2.0.20 setup fails due to Security Update

The May 2018 security update in Azure changed the procedure for connecting through Remote Desktop (RDP) to virtual machines created from Azure Marketplace images. To work around this, apply the following change on the local machine where Cluster Manager is installed. You can deploy clusters in Azure with Syncfusion Big Data Cluster Manager 3.2.0.20 only after applying this workaround.

Change Local Group Policy on the local machine
  1. Open the Local Group Policy Editor by running gpedit.msc from the Run window.
    Run dialog
  2. Open the Encryption Oracle Remediation setting under Credentials Delegation, as shown in the image below.
    Encryption Oracle Remediation dialog
  3. Set the Encryption Oracle Remediation policy to Enabled and the Protection Level to Vulnerable.
    Encryption Oracle Remediation dialog

For more information, see the following URL:
https://blogs.technet.microsoft.com/mckittrick/unable-to-rdp-to-virtual-machine-credssp-encryption-oracle-remediation/

NOTE

This workaround is required for cluster deployment in Azure until the next Big Data Platform release.

Ports are not available

If you get a "ports are not available" error on the validation page, make sure that no Hadoop services are running on the node and that the required ports are not in use by another program.

Cluster validation dialog

If the ports are in use by an existing Hadoop service or another program, stop it manually and refresh the node as shown in the image below. To stop a Hadoop service manually, use Task Manager to end all Java processes that belong to the Hadoop services.

Cluster restart dialog
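
Before refreshing the node, it can help to confirm which ports are still taken. The following Python sketch tries to bind to each port and reports the ones that fail. It is only an illustration: the function names are hypothetical, and the ports you pass in should be the ones reported on the validation page.

```python
import socket

def port_is_free(port, host="0.0.0.0"):
    """Return True if a TCP socket can be bound to host:port, else False.

    Note: on Linux, binding to ports below 1024 without root rights also
    fails, so such ports are reported as not free.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            return False
    return True

def busy_ports(ports, host="0.0.0.0"):
    """Return the ports from the given list that cannot be bound."""
    return [p for p in ports if not port_is_free(p, host)]
```

Calling busy_ports with the list of required ports returns the ones that are still in use. An empty list suggests the node is ready to be refreshed.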

Big Data agent is not installed or running

You will get this error if the agent is not installed or not running on the node, or if Cluster Manager cannot communicate with the agent service, which runs on port 60008 by default.

Cluster validation dialog

Follow these steps to make sure the agent service is running properly.

  • Open the Run dialog and type services.msc.
  • Make sure the service “Syncfusion Big Data Agent” is running. If it is not running, start the service and use the refresh option on the Cluster Manager validation page to validate again.

Services window
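
If the service is running but the error persists, a connectivity check from the Cluster Manager machine can show whether the agent port is reachable. The sketch below is a minimal example that assumes the default port 60008 mentioned above. The function name is hypothetical, and a successful connection only shows that something is listening on that port, not that it is the agent.

```python
import socket

AGENT_PORT = 60008  # default agent port stated in this section

def agent_reachable(host, port=AGENT_PORT, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolved host names.
        return False
```

Run agent_reachable with the host name of each node. A False result for a node whose service is started usually points to a firewall rule or a name resolution problem.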

DNS and reverse DNS are not getting resolved

Hadoop services require a properly configured network so that DNS and reverse DNS lookups work for communication between cluster nodes. If the network is not configured properly, you will get this error.

Cluster validation dialog

By default, Windows resolves host names within the same network. If you have a different network configuration, such as VLAN settings, you have to configure the DNS server properly. Make sure all nodes are in the same network and reachable by host name.
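
To check a node by hand, you can resolve its host name to an IP address and then resolve that address back to a name. The Python sketch below does both lookups with the standard socket module. The function name and the matching rule (compare the short host name, ignoring case and domain suffix) are assumptions made for this example, not the check that Cluster Manager performs.

```python
import socket

def check_dns(hostname, forward=socket.gethostbyname, reverse=socket.gethostbyaddr):
    """Resolve hostname to an IP, then resolve that IP back to a name.

    Returns (ip, reverse_name, ok). ok is True when the reverse lookup
    returns the same short host name (DNS names are case-insensitive).
    The resolver functions are parameters so they can be replaced in tests.
    """
    try:
        ip = forward(hostname)
    except OSError:
        return None, None, False      # forward lookup failed
    try:
        reverse_name = reverse(ip)[0]
    except OSError:
        return ip, None, False        # reverse lookup failed

    def short(name):
        return name.split(".")[0].lower()

    return ip, reverse_name, short(reverse_name) == short(hostname)
```

Run check_dns on every node for every other node's host name. Any result whose last element is False identifies a pair of nodes that needs a DNS or hosts-file fix.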

Insufficient disk space

We recommend a minimum of 50 GB of free space on each cluster node. At least 2 GB is needed for the Hadoop packages. If a node does not have enough free space, you will get this error on the validation page. Free up the required disk space and retry.

Cluster validation dialog
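
The free space on a node can be checked ahead of validation. This Python sketch uses shutil.disk_usage and the 50 GB recommendation from above. The function names are hypothetical, and the path to check (for example, the drive where the platform is installed) is something you need to supply.

```python
import shutil

MIN_FREE_GB = 50  # recommended minimum free space per node, from this section

def free_space_gb(path):
    """Return the free space, in GB, on the drive or volume that holds path."""
    return shutil.disk_usage(path).free / (1024 ** 3)

def has_enough_space(path, min_free_gb=MIN_FREE_GB):
    """Return True if the volume holding path has at least min_free_gb free."""
    return free_space_gb(path) >= min_free_gb
```

For example, has_enough_space("C:\\") on a Windows node returns False when less than 50 GB is free on drive C.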

Version mismatch

Cluster Manager and the Big Data Agent on each node must be the same version for proper cluster management and monitoring. If the build versions of Cluster Manager and the agent on any node differ, you will get this error on the cluster validation page. Make sure the build versions are the same and retry.

Cluster validation dialog

Multiple types of network

You can set up a cluster on either a wired or a wireless network, but not on a mix of both. If any node is connected to both a wired and a wireless network, you will get this error on the cluster validation page. Make sure no node uses mixed network types and retry.

Cluster validation dialog

Upgrade PowerShell version

To create a cluster in automatic mode, PowerShell 3.0 or higher must be installed on all nodes. You will get this error if a node has a lower PowerShell version.

Alert notification dialog

To create a secure cluster, PowerShell 3.0 or higher must be installed on the machine where Cluster Manager is installed.

Windows 8, Windows Server 2012, and later versions include a sufficiently high PowerShell version by default. On Windows 7 or Windows Server 2008 machines, you need to upgrade PowerShell to a higher version.

OS | Default PowerShell version | Packages to install, in this order
Windows 7 | PowerShell 2.0 | Windows 7 Service Pack 1, then Windows Management Framework 3.0
Windows Server 2008 R2 | PowerShell 2.0 | Windows Server 2008 R2 Service Pack 1, then Windows Management Framework 3.0
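
To find out which PowerShell version a machine has, you can read $PSVersionTable.PSVersion.Major. The Python sketch below wraps that call and compares the result with the 3.0 minimum. The function names are hypothetical, and the script only checks the machine it runs on, so it has to be run on each node separately.

```python
import shutil
import subprocess

MIN_MAJOR = 3  # minimum PowerShell major version required by this section

def parse_major(output):
    """Extract the major version number from the command output, e.g. '2'."""
    return int(output.strip().split(".")[0])

def powershell_major_version():
    """Return the local PowerShell major version, or None if it cannot be read."""
    exe = shutil.which("powershell")
    if exe is None:
        return None  # PowerShell is not on the PATH
    result = subprocess.run(
        [exe, "-NoProfile", "-Command", "$PSVersionTable.PSVersion.Major"],
        capture_output=True, text=True, check=False,
    )
    try:
        return parse_major(result.stdout)
    except ValueError:
        return None  # unexpected or empty output

def meets_minimum(major, minimum=MIN_MAJOR):
    """Return True only when a version was found and it is high enough."""
    return major is not None and major >= minimum
```

If meets_minimum(powershell_major_version()) returns False on a Windows 7 or Windows Server 2008 R2 machine, install the packages from the table above.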

Pseudo node cluster - HBase services fail to run

In a pseudo node cluster, if the HMaster and HRegionServer services are not running, make sure the host name uses the same case (upper/lower) in the etc/hosts file and in the ‘hbase.zookeeper.quorum’ property in hbase-site.xml. If the case differs, the internal ZooKeeper service of HBase fails to start, which causes the HBase services to shut down.
For example, in the following case the internal ZooKeeper fails to start:

Host name entry in "C:\Windows\System32\drivers\etc\hosts": 172.16.150.3 machine1
Value of property 'hbase.zookeeper.quorum' in hbase-site.xml: Machine1
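
The comparison described above can be automated. The following Python sketch takes the contents of the hosts file and of hbase-site.xml as text and lists quorum hosts that match a hosts-file entry only when case is ignored. The function names are hypothetical, and the sketch assumes the quorum value is a comma-separated list of plain host names.

```python
import xml.etree.ElementTree as ET

def hosts_names(hosts_text):
    """Collect every host name listed in hosts-file text, preserving case."""
    names = set()
    for line in hosts_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if line:
            names.update(line.split()[1:])    # first field is the IP address
    return names

def quorum_hosts(hbase_site_xml):
    """Return the hosts listed in the hbase.zookeeper.quorum property."""
    root = ET.fromstring(hbase_site_xml)
    for prop in root.iter("property"):
        if prop.findtext("name", "").strip() == "hbase.zookeeper.quorum":
            value = prop.findtext("value", "")
            return [h.strip() for h in value.split(",") if h.strip()]
    return []

def case_mismatches(hosts_text, hbase_site_xml):
    """Return (quorum_host, hosts_file_name) pairs that differ only by case."""
    names = hosts_names(hosts_text)
    lowered = {n.lower(): n for n in names}
    return [
        (host, lowered[host.lower()])
        for host in quorum_hosts(hbase_site_xml)
        if host not in names and host.lower() in lowered
    ]
```

With the example values above, case_mismatches reports the pair Machine1 and machine1. Changing either entry so that both use the same case clears the mismatch.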

Cluster Manager UI is not rendering properly in Internet Explorer

Make sure that no security enhancement that restricts JavaScript is enabled, and that Compatibility View is not enabled by default for internally hosted sites in IE.

Collect Logs does not gather any logs and Configuration page fails to load

If Cluster Manager is installed on a machine that is not one of the cluster nodes, the Collect Logs and Configuration features might not work properly.

The likely root cause is that the cluster nodes cannot reach the Cluster Manager machine to upload the log files and configuration files. To resolve this, add the host information of the Cluster Manager machine to “etc/hosts” on all nodes that belong to the cluster.

Oozie server not running on Ubuntu 16.04

The latest version of JDK 7 does not include time zone data on Ubuntu 16.04. Because of this, the Oozie server fails to start with a message like “Invalid Time Zone: UTC”.

Solution:

Update the time zone data for JDK 7 on Ubuntu 16.04 by running the following commands:

 sudo apt-add-repository ppa:justinludwig/tzdata
 sudo apt-get update
 sudo apt-get install tzdata-java*

Update Oozie share lib jar files

Changes made in Oozie’s share lib folder (/user/share/lib/ on your HDFS), such as adding a new jar file, are not reflected in the Oozie server until the server is restarted or the share lib is updated.

Use the following command to update the Oozie share lib without restarting the server.

Windows:

 ..\SDK\Oozie\bin> oozie.cmd admin -sharelibupdate -oozie http://<host>:11000/oozie

Linux:

 ../SDK/Oozie/bin$ ./oozie admin -sharelibupdate -oozie http://<host>:11000/oozie