Syncfusion Hadoop cluster in Microsoft Azure virtual machines

You can create and deploy a Syncfusion Hadoop cluster on Microsoft Azure virtual machines in minutes. The Syncfusion Big Data Cluster Manager lets you manage the resources in Microsoft Azure, with options to shut down, restart, and destroy the virtual machines as required.

The Azure account used to create the cluster must have Admin privileges. For detailed steps, refer to “Create Active Directory user with Admin privileges” under the prerequisites for Azure deployment.

Create cluster on Azure

Step 1: Log in to the cluster manager application, navigate to the Azure tab, and click “CREATE”.

Step 2: Select the cluster type.

Step 3: Enter the Active Directory credentials, web client ID, secret key, and Azure AD tenant ID, and click “Next”.

Step 4: Select an Azure subscription under your account, enter a unique resource group name, select the required region, data node count, and virtual machine size, and click “Next”.

NOTE

All resources created for the Azure deployment, such as DNS, public IP, storage account, VNet, availability set, load balancer, and virtual machines, are placed under the specified resource group.
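Because every resource lands in the resource group entered in Step 4, it can help to check the name before starting the wizard. The sketch below applies Azure's published resource group naming rules as we understand them (1 to 90 characters; letters, digits, underscores, hyphens, periods, and parentheses; no trailing period) plus a case-insensitive uniqueness check. The function name and the `existing` list are hypothetical; the cluster manager may validate differently, so treat this as a pre-check only.

```python
from typing import Iterable

# Characters allowed besides letters and digits (assumption: based on Azure's
# resource group naming rules; verify against current Azure documentation).
_EXTRA_ALLOWED = set("_-.()")


def is_valid_resource_group_name(name: str, existing: Iterable[str] = ()) -> bool:
    """Return True if `name` looks like a valid, unused resource group name."""
    if not 1 <= len(name) <= 90:
        return False
    if name.endswith("."):
        return False
    if not all(ch.isalnum() or ch in _EXTRA_ALLOWED for ch in name):
        return False
    # Resource group names are compared case-insensitively within a subscription.
    taken = {e.lower() for e in existing}
    return name.lower() not in taken
```

For example, `is_valid_resource_group_name("hadoop-rg", existing=["Hadoop-RG"])` returns `False` because the name is already taken.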

Step 5: Select one of the following two storage types:

  • HDFS
    The Hadoop Distributed File System (HDFS) is a distributed file system designed to run on commodity hardware. With this option, HDFS uses the local disks of the virtual machines.

  • Azure Blob
    Windows Azure Storage Blob (WASB) is an extension built on top of the HDFS APIs. It enables multiple clusters and other applications to access a single storage container at the same time. Azure Blob storage has two access tiers, hot and cool, chosen according to how often the data is accessed. You can configure Blob storage with Hadoop in either of the following ways:

    1. Let the application create a new blob storage account, or select an existing storage account from your subscription.
    2. Let the application create a new container, or select an existing container that may already contain data.
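Once a storage account and container are chosen, Hadoop addresses the data through WASB URIs of the form `wasb[s]://<container>@<account>.blob.core.windows.net/<path>`. The helper below is a minimal sketch that builds such a URI; the account, container, and path values are hypothetical examples, and `wasbs` (TLS) is assumed as the default.

```python
def wasb_uri(account: str, container: str, path: str = "", secure: bool = True) -> str:
    """Build a WASB URI for a path inside an Azure Blob container."""
    scheme = "wasbs" if secure else "wasb"  # wasbs = WASB over TLS
    path = path.lstrip("/")                 # avoid a double slash after the host
    return f"{scheme}://{container}@{account}.blob.core.windows.net/{path}"
```

A URI such as `wasb_uri("mystorageacct", "hadoopdata", "/input")` can then be passed to commands like `hadoop fs -ls`, assuming the cluster has the account key configured (for example through the hadoop-azure property `fs.azure.account.key.<account>.blob.core.windows.net`).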

Step 6: Enter the username and password for the virtual machines and the cluster manager application, and click “Create cluster”.

NOTE

For a secure cluster, you need to provide the Azure AD Domain Services details and select the VNet configured with Azure AD Domain Services in the next wizard step.

Step 7: After deployment, the Azure VM details and the link to the hosted cluster manager are updated on the Azure page.

NOTE

The time taken to deploy a Hadoop cluster in the Azure VM environment varies with the region and the number of nodes.


Scale Azure cluster

You can add Data Nodes dynamically to a running Azure cluster.

Step 1: Click “Add Data Nodes” on the VM details page of an Azure cluster.

Step 2: Fill in all the required fields in the “Add Data node” dialog and click “Create”.
This creates the required resources and virtual machines and then adds them to the cluster as Data Nodes.

Manage cluster and resources on Azure

  • You can stop cluster creation at any time during the deployment stage by using the “Cancel” option. Selecting “Cancel” destroys all the resources created under the given resource group.

  • You can shut down, restart, or completely destroy the cluster as required.

Azure Scheduler

You can start and stop the Azure virtual machines hosting the Hadoop cluster at scheduled intervals.

The following scheduler types are supported:

  • Once
  • Daily
  • Weekly
  • Monthly

Create a new Scheduler

After creating an Azure cluster, go to “View Details” for the cluster and click “Schedule”.

In the Scheduler wizard, select the type, schedule mode, time zone, start date, end date, and cluster duration. Then click Submit to create a scheduler that automatically starts and stops the Azure virtual machines running the Syncfusion Hadoop cluster.

The ‘Partial’ schedule mode lets you schedule selected data nodes to start and stop as needed. The cluster then has maximum compute power only when it is needed, for example when a large job that requires all data nodes to be running is scheduled. Storage accounts of ‘Blob’ type support access tiers: you can choose to switch the access tier to Cool while the cluster is shut down and to Hot while it is running, to reduce cost.
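To make the wizard fields concrete, the sketch below shows how a schedule type, start date, end date, and cluster duration could expand into power-on and power-off times. It is a hypothetical illustration, not the cluster manager's actual scheduler logic: in particular, the assumption that a monthly schedule falling on a day the month lacks (for example the 31st) is clamped to the month's last day is ours. Pass datetimes already expressed in the time zone selected in the wizard.

```python
import calendar
from datetime import datetime, timedelta


def schedule_windows(kind, start, end, duration):
    """Yield (power_on, power_off) pairs for a hypothetical VM schedule.

    kind:     "Once", "Daily", "Weekly" or "Monthly"
    start:    first power-on time; end: latest allowed power-on time
    duration: how long the cluster stays up after each power-on
    """
    if kind not in ("Once", "Daily", "Weekly", "Monthly"):
        raise ValueError(f"unknown schedule type: {kind}")
    n = 0
    current = start
    while current <= end:
        yield current, current + duration
        if kind == "Once":
            return
        n += 1
        # Offsets are computed from `start` each time so runs never drift.
        if kind == "Daily":
            current = start + timedelta(days=n)
        elif kind == "Weekly":
            current = start + timedelta(weeks=n)
        else:  # Monthly: same day of month, clamped to the month's last day
            months = start.month - 1 + n
            year, month = start.year + months // 12, months % 12 + 1
            day = min(start.day, calendar.monthrange(year, month)[1])
            current = start.replace(year=year, month=month, day=day)
```

For example, a Daily schedule from 1 January 09:00 to 3 January 09:00 with an 8-hour duration yields three windows, each powering off at 17:00.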

Manage an existing Scheduler

You can edit, disable, enable and delete an existing Scheduler whenever required.

NOTE

The Azure Scheduler feature is supported only in selected regions. To schedule a Hadoop cluster in Azure VMs through the cluster manager application, select a region that supports the Scheduler when creating the Azure cluster. Regions that support Azure Scheduler are indicated by a Scheduler icon.

Offer ID

An Offer ID is required to get the rate card details for billing.

Steps to get Offer ID

Step 1: Open the Subscriptions page.

Step 2: Click the subscription for which the Offer ID is required.

Step 3: The Offer ID for the subscription is shown at the bottom of the page.
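For context on why the Offer ID matters, the Azure Billing RateCard REST API takes it as the `OfferDurableId` filter value (for example `MS-AZR-0003P`, the Pay-As-You-Go offer). The sketch below only builds such a request URL; whether the cluster manager calls this exact API is an assumption, the `api-version` value should be checked against current Azure documentation, and the request must be sent with an `Authorization: Bearer` header carrying an Azure Resource Manager access token.

```python
from urllib.parse import quote

# Assumption: API version documented for the RateCard API; verify before use.
RATECARD_API_VERSION = "2016-08-31-preview"


def ratecard_url(subscription_id, offer_id, currency="USD", locale="en-US", region="US"):
    """Build a RateCard request URL for a subscription and Offer ID."""
    flt = (f"OfferDurableId eq '{offer_id}' and Currency eq '{currency}' "
           f"and Locale eq '{locale}' and RegionInfo eq '{region}'")
    return ("https://management.azure.com/subscriptions/"
            f"{subscription_id}/providers/Microsoft.Commerce/RateCard"
            f"?api-version={RATECARD_API_VERSION}&$filter={quote(flt)}")  # encode spaces and quotes
```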

Port details and procedure to connect client application

Connect client application to Azure ecosystem services

A Syncfusion Hadoop cluster on Microsoft Azure is closed to direct external access. You connect to its ecosystem services through the load balancer.

What is load balancer?

The load balancer gives you control over how inbound communication is managed. This communication can include traffic that’s initiated from Internet hosts or virtual machines in other cloud services or virtual networks. This control is represented by an endpoint (also called an input endpoint).
An endpoint listens on a public port and forwards traffic to an internal port. You can map the same ports for an internal or external endpoint or use a different port for them.

For example, you can have a web server configured to listen on port 81 while the public endpoint maps port 80.
The creation of a public endpoint triggers the creation of a load balancer instance.
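The endpoint idea above reduces to a lookup table from public (frontend) port to an internal (backend) VM and port. The sketch below models that forwarding rule with the port 80 to port 81 example; the class, function, and VM names are hypothetical, and a real Azure load balancer also handles health probes and distribution across multiple backends, which this model ignores.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LoadBalancingRule:
    frontend_port: int  # public port exposed by the load balancer
    backend_port: int   # port the service listens on inside the VM
    backend_vm: str     # hypothetical VM name


def resolve(rules, public_port):
    """Return the (vm, internal_port) a connection on `public_port` is forwarded to."""
    for rule in rules:
        if rule.frontend_port == public_port:
            return rule.backend_vm, rule.backend_port
    raise LookupError(f"no endpoint listens on public port {public_port}")


rules = [LoadBalancingRule(frontend_port=80, backend_port=81, backend_vm="web-vm")]
```

Here `resolve(rules, 80)` returns `("web-vm", 81)`, matching the web server example.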

Connect ecosystem services

For example, the IPython server on NameNode1 runs on port 14003, which is mapped to port 8002 in the load balancer. You can connect to the IPython server as follows:
https://samplefqdnname.eastus2.cloudapp.azure.com:8002
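Before opening a client, it can be useful to confirm that a load balancer port accepts TCP connections. The minimal sketch below uses only the Python standard library; the FQDN in the commented example is the sample name from this section and must be replaced with your cluster's FQDN. A successful connection shows only that the port is reachable, not that the service behind it is healthy.

```python
import socket


def endpoint_reachable(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port can be opened within `timeout`."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, or DNS lookup failed
        return False


# Example: IPython server on NameNode1, exposed on load balancer port 8002.
# print(endpoint_reachable("samplefqdnname.eastus2.cloudapp.azure.com", 8002))
```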

Where can I find the FQDN of an Azure cluster in the cluster manager application?

You can get the FQDN from the cluster manager as follows:
Step 1: Log in to the cluster manager application and navigate to the Azure tab to see the list of clusters created in Azure. Click View Details in the grid row of the cluster you want to connect to.

Step 2: On the View Details page, you can get the FQDN from the Azure details grid.

Port details

The following table shows the port mappings in the load balancer.

NN1 – NameNode1

NN2 – NameNode2

Service               Actual port   Load balancer port (NN1, NN2)
IPython server        14003         8002, 8003
HttpFS                14002         8000, 8001
Hive Thrift           10000         8004, 8005
HBase REST service    14004         8006, 8007
Spark SQL             10001         8008, 8009
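Client scripts can encode the load balancer port table as a dictionary and derive addresses from it. The sketch below does that and also builds an HttpFS URL using the WebHDFS REST path `/webhdfs/v1` with the `LISTSTATUS` operation. The helper names are hypothetical, HTTPS is assumed to match the IPython example URL, and the `user.name` parameter only works with simple authentication; a Kerberos-enabled cluster requires SPNEGO authentication instead.

```python
from urllib.parse import quote, urlencode

# Service -> (NN1 load balancer port, NN2 load balancer port), per the port table.
LB_PORTS = {
    "IPython server": (8002, 8003),
    "HttpFS": (8000, 8001),
    "Hive Thrift": (8004, 8005),
    "HBase REST service": (8006, 8007),
    "Spark SQL": (8008, 8009),
}


def lb_port(service: str, namenode: int = 1) -> int:
    """Return the load balancer port for a service on NameNode1 or NameNode2."""
    if namenode not in (1, 2):
        raise ValueError("namenode must be 1 or 2")
    return LB_PORTS[service][namenode - 1]


def service_address(fqdn: str, service: str, namenode: int = 1) -> str:
    return f"{fqdn}:{lb_port(service, namenode)}"


def httpfs_liststatus_url(fqdn: str, hdfs_path: str, user: str, namenode: int = 1) -> str:
    """Build a WebHDFS LISTSTATUS URL served by HttpFS (assumes HTTPS)."""
    query = urlencode({"op": "LISTSTATUS", "user.name": user})
    return f"https://{service_address(fqdn, 'HttpFS', namenode)}/webhdfs/v1{quote(hdfs_path)}?{query}"
```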

Prerequisites for Azure deployment

  • An Active Directory user with Admin privileges for the Azure subscription.
  • Programmatic deployment enabled for the Syncfusion Big Data Platform image.
  • A web application for Azure automation.
  • A VNet with Azure Active Directory Domain Services enabled (for Kerberos-enabled clusters only).

The steps to create these prerequisites are as follows.

Create Active Directory user with Admin privileges

Step 1: Sign in to the Microsoft Azure portal (https://portal.azure.com) using your Microsoft account.

Step 2: Select the Active Directory service in the Azure portal and navigate to the existing default directory.

Step 3: Under the ‘All users’ tab, add a new user with the “Global administrator” role and note down the username and password. The password shown here is temporary, so sign in to the portal again with the new user credentials and change it.

NOTE

Ensure that the same value is entered in both the ‘Name’ and ‘User name’ fields.

Step 4: Select ‘Subscriptions’ in the Azure portal and select the subscription to which you want to add the user.

Step 5: Select the ‘Access control (IAM)’ option from the left panel and click ‘Add’ in the access control window.

Step 6: In the Add permission panel on the right, set ‘Role’ to ‘Owner’, select the newly created user, and save.

Enable programmatic deployment

Step 1: Sign in to the Microsoft Azure portal (https://portal.azure.com) using the Microsoft account for your Azure subscription.

Step 2: Browse to Marketplace from the left panel of the dashboard and search for “Syncfusion Big Data Platform” in the Virtual machines category.

Step 3: Select the virtual machine, go to “Configure Programmatic Deployment”, enable it for your subscriptions, and save the changes.

Create Web application for Azure automation

Create Web application

Step 1: Navigate to the Azure portal and select “Azure Active Directory”.

Step 2: Select the Active Directory that you want to use for creating the new application.
If you have more than one Active Directory, you usually create the application in the directory where your subscription resides.
You can grant access to resources in your subscription only to applications in the same directory as the subscription.

Step 3: To view the applications in your directory, click ‘Enterprise applications’.

Step 4: If you have not created an application in that directory before, click ‘New application’ to create one.

Step 5: Select the application type “Application you’re developing”; this opens the app registration page.

Step 6: Provide a name for the application, select the ‘Web app / API’ application type, and enter a sign-on URI. (The sign-on URI is normally used for authentication, which is not implemented here; it only needs to be a valid URI.) Then click ‘Create’.

Get client ID from the application and add permission

Step 1: After creating the web application, go to ‘Azure Active Directory’ -> ‘App registrations’ and select your application.

Step 2: Copy the Application ID. This is the web client ID required during cluster creation.

Step 3: Go to ‘Settings’ -> ‘Owners’, click ‘Add owner’, and select the user.

Step 4: In the same settings page, go to ‘Required permissions’ and click ‘Add’.

Step 5: Select ‘Windows Azure Service Management API’ and select the delegated permission.

Step 6: Save the changes.

Step 7: Select ‘Subscriptions’ in the Azure portal and select the subscription you used when adding the user.

Step 8: Select the ‘Access control (IAM)’ option from the left panel and click ‘Add’ in the access control window.

Step 9: In the Add permission panel on the right, set ‘Role’ to ‘Owner’, select the newly created application, and save.

Get authentication key from web application

The tenant ID and application secret key are required to deploy a cluster in Azure.

Follow these steps to get the application secret key:

Step 1: To generate an authentication key for the web client application, select Settings in the web application you created.

Step 2: Select Keys in Settings panel.

Step 3: Enter the description, duration, and value, and select Save. The key is generated from the information provided.

Copy the value from the VALUE column; this is the application secret key used for cluster creation. You cannot retrieve this key later.

Get tenant ID

Follow these steps to get the tenant ID:

Step 1: Select Azure Active Directory.

Step 2: Select Properties.

Step 3: Copy the value of Directory ID. This is the tenant ID used for cluster creation.
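The tenant ID, web client ID (Application ID), and secret key collected above are the three values an application needs for the OAuth 2.0 client credentials flow against Azure AD. The sketch below builds, but does not send, such a token request for the Azure AD v1 endpoint with Azure Resource Manager as the resource; it is a way to sanity-check the values, not necessarily how the cluster manager uses them. The GUID checks are an added assumption that both IDs are in standard GUID form.

```python
import uuid
from urllib.parse import urlencode
from urllib.request import Request


def token_request(tenant_id: str, client_id: str, client_secret: str) -> Request:
    """Build a client-credentials token request for Azure Resource Manager."""
    # Tenant and application IDs are GUIDs; uuid.UUID raises ValueError otherwise.
    uuid.UUID(tenant_id)
    uuid.UUID(client_id)
    body = urlencode({
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
        "resource": "https://management.azure.com/",
    }).encode()
    return Request(
        f"https://login.microsoftonline.com/{tenant_id}/oauth2/token",
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )
```

Sending the request with `urllib.request.urlopen` and reading `access_token` from the JSON response confirms that the three values work together; an error response usually points to a wrong secret or tenant.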

Create VNet and enable the Active Directory Domain Service (for Kerberos-enabled clusters only)

Azure Kerberos Cluster – Configure Azure Domain Service

Step 1: Select ‘Azure AD Domain Services’ and click Add to enable the domain service for the default Active Directory.

Step 2: Enter the name of the resource group in which to configure the Active Directory Domain Service.

Step 3: Select the virtual network that connects the domain service with the other VM resources.

NOTE

To create a new virtual network in that region, use the ‘Create new’ option.

Step 4: Click ‘Add members’ and select existing users from the default Active Directory to configure the ‘AAD DC Administrators’ group.

Click OK to move to the Summary page of the wizard.

Step 5: On the Summary page, review the configuration settings for the managed domain. You can go back to any previous step to make changes if necessary. Then click OK.

You can see a notification that shows the progress of your Azure AD Domain Services deployment. Click the notification to see detailed progress of the deployment.

After deployment, you can find the DNS server IP.

Step 6: Select the ‘Secure LDAP’ option from the left panel and turn on the ‘Secure LDAP’ toggle that appears on the right.

Step 7: Create a self-signed certificate in PowerShell with the following commands, replacing domainname.onmicrosoft.com with your managed domain name.

$lifetime = Get-Date

New-SelfSignedCertificate -Subject *.domainname.onmicrosoft.com `
    -NotAfter $lifetime.AddDays(365) -KeyUsage DigitalSignature, KeyEncipherment `
    -Type SSLServerAuthentication -DnsName *.domainname.onmicrosoft.com

Step 8: Export the certificate created above as a password-protected .pfx file, and upload it here along with its password. Click ‘Save’ to enable secure LDAP for the domain service, then proceed to create the Kerberos cluster.

Step 9: Select the Active Directory service in the left panel and navigate to the existing default directory.

Step 10: Under the ‘All users’ tab, add a new user in the AAD DC Administrators group and note down the username and password. The password shown here is temporary, so sign in to the portal with the new user credentials and reset it. These credentials are used during the cluster creation process.

NOTE

Ensure that the same value is entered in both the ‘Name’ and ‘User name’ fields.

Step 11: Sign in to the Azure AD Access Panel with the newly created user credentials.

Step 12: In the top-right corner, select the username, choose Profile from the menu, and update the password. The user is now ready to use.

Step 13: Before starting a secure cluster deployment, ensure that the Azure Active Directory Domain Service network has no peering with any other network.

NOTE

Check this every time before recreating an Azure Kerberos cluster. If you reuse the Active Directory Domain Service to recreate a Kerberos cluster, you must also delete the organizational unit in the NameNode1 virtual machine.

Steps to delete organizational unit

Step 1: Sign in to the NameNode1 virtual machine using the Active Directory domain username and password that were used when creating the Kerberos cluster.

Step 2: Open Active Directory Users and Computers and delete the organizational unit ‘AzureClu’.