Syncfusion Big Data Platform Sandbox

Syncfusion Big Data Platform Sandbox is a user-friendly Apache Hadoop environment available in the Azure Marketplace, so it can be accessed from anywhere.

Prerequisite

Microsoft Azure account - You need an existing Azure account, or you can create a free Azure account by following this link.

Steps

Start by logging in to the Azure Portal with your Azure account: https://portal.azure.com/
Search for the Syncfusion Big Data Sandbox image in the Azure Marketplace by following the steps below, or navigate directly to the image creation prompt by using this URL:
Syncfusion Big Data Platform Sandbox

Step 1: In the upper-left corner, click New (+) > See all, as shown in the screenshot below.

Azure-Sandbox dialog

Step 2: Search for “Syncfusion Big Data Sandbox” in the Marketplace filter and select the Syncfusion sandbox.

Azure-Sandbox dialog

Step 3: In the Syncfusion Big Data Sandbox blade, the deployment model is fixed to Resource Manager. Click Create.

Azure-Sandbox deployment dialog

Step 4: In the Basics blade, provide the virtual machine details: name, login username, and login password. Select the subscription and location, then select an existing resource group or type a name for a new one. Click OK to continue to the next section.

Azure-Sandbox machine details dialog

Step 5: Select a VM size from the list of sizes and click Select to continue.

Azure-Sandbox dialog

NOTE

Syncfusion sandbox requires a minimum of 14 GB RAM and 2 cores, for example Standard_DS11_v2.
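The minimum in the note above can be expressed as a simple check when comparing candidate sizes. The following is a minimal sketch, not part of the Syncfusion tooling: the `VmSize` class and `meets_sandbox_minimum` function are hypothetical names, and the core and RAM figures are hand-entered assumptions that should be verified against the size list shown in the portal.

```python
from dataclasses import dataclass

# Minimums stated in the note above.
MIN_RAM_GB = 14
MIN_CORES = 2


@dataclass(frozen=True)
class VmSize:
    """Hypothetical record describing one Azure VM size."""
    name: str
    cores: int
    ram_gb: float


def meets_sandbox_minimum(size: VmSize) -> bool:
    """Return True if the size has at least the minimum cores and RAM."""
    return size.cores >= MIN_CORES and size.ram_gb >= MIN_RAM_GB


# Hand-entered figures (assumptions); confirm them in the portal's size list.
candidates = [
    VmSize("Standard_DS11_v2", cores=2, ram_gb=14),
    VmSize("Standard_DS1_v2", cores=1, ram_gb=3.5),
]

for size in candidates:
    verdict = "meets the minimum" if meets_sandbox_minimum(size) else "too small"
    print(f"{size.name}: {verdict}")
```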

Step 6: In the Settings blade, the VM has default names for the storage account, virtual network, and security group. You can change them manually if needed, then click OK to continue.

Azure-Sandbox storage account dialog

NOTE

Creating the VM with a security group affects remote connections to the Hadoop cluster. Either add the required remote access port numbers to the Azure security group, or create the VM with the security group set to None.

Step 7: Click OK when the Summary of Syncfusion Big Data Sandbox shows the "Validation passed" message.

Azure-Sandbox validation dialog

Step 8: The Purchase blade contains the offer details of the sandbox image. After reading the ‘Terms of use’ section, click Purchase to start the deployment.

Azure-Sandbox deployment dialog

Step 9: While Azure creates the virtual machine, you can track the progress by clicking Virtual Machines on the left. When the VM has been created, the status changes to Running.

Azure-Sandbox status dialog

Step 10: On the blade for the virtual machine, click Connect. Log in over RDP to start working with the Syncfusion sandbox.

Overview

Syncfusion Big Data Sandbox lets you access the Hadoop environment in the following ways:

  • Syncfusion Big Data Platform
  • Syncfusion Big Data Studio
  • Connect Hadoop Cluster on On-Premises Machine

Syncfusion Big Data Platform

The Syncfusion Big Data Platform includes a complete production environment that helps you run Hadoop jobs in a scalable manner on a full cluster. It lets you manage and monitor Hadoop clusters from anywhere.

  • Open the Cluster Manager application at http://localhost:81 in a browser to see the running Hadoop pseudo-node cluster. You can also access the application from anywhere through the public IP address of the virtual machine, i.e. http://<publicip>:81.
    To learn more about the public IP, click here.

Azure-Sandbox cluster validation dialog

  • Click the cluster name to navigate to the cluster management page, where you can see the supported Hadoop ecosystem and the running services of the Syncfusion sandbox image.

Azure-Sandbox cluster validation dialog

  • You can view job statuses, such as running and failed jobs, in the Cluster Manager application.

Azure-Sandbox cluster status dialog

  • The Monitoring page in the Cluster Manager application shows the status of the cluster, such as heap size used, running threads, capacity, and other details.

Azure-Sandbox cluster Monitoring dialog

  • Using the Cluster Manager application, you can easily perform the following operations:

    • Start and stop the Hadoop services.
    • Backup and Restore operations in HDFS and HBase.
    • Job submission in Oozie.
    • Job submission in Sqoop.

To learn more, click here.
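The Cluster Manager address described above follows the pattern http://<publicip>:81. As a small illustration, the sketch below builds that URL from an IPv4 address and rejects malformed input. The function name `cluster_manager_url` is hypothetical, and 203.0.113.10 is a documentation-only placeholder, not a real sandbox address.

```python
import ipaddress

CLUSTER_MANAGER_PORT = 81  # port given for the Cluster Manager in the text above


def cluster_manager_url(public_ip: str, port: int = CLUSTER_MANAGER_PORT) -> str:
    """Build http://<publicip>:<port> after validating the IPv4 address."""
    # IPv4Address raises a ValueError subclass if the string is malformed.
    address = ipaddress.IPv4Address(public_ip)
    return f"http://{address}:{port}"


# 203.0.113.10 is a placeholder; use the public IP shown on the VM's Overview page.
print(cluster_manager_url("203.0.113.10"))
```

Validating the address first means a typo in the IP is reported immediately instead of surfacing later as a browser timeout.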

Big Data Management Studio

Big Data Management Studio provides a user-friendly Hadoop environment on a Windows machine. In the sandbox, this application starts automatically when the machine starts and connects to the Azure pseudo-node cluster running on the internal IP.

Using Big Data Studio, you can easily perform the following operations:

  • Run commands in interactive shells for Hadoop, Pig, Hive, Spark, and HBase.
  • Perform file operations in HDFS, such as create, copy, move, upload, download, and view.

To learn more, click here.

Connect Hadoop Cluster on On-Premises Machine

Install the Big Data Studio application on your on-premises machine from https://www.syncfusion.com/. In the Big Data Studio dashboard, start the application manually by clicking the LAUNCH STUDIO icon.

Click the Add Cluster button in the left-side panel to open a popup. In the popup, select the Azure option, enter the public IP address of the sandbox along with the Cluster Manager username and password, and connect.

Adding cluster dialog

NOTE

To connect the sandbox cluster to an on-premises machine, the sandbox virtual machine needs a public IP address with a DNS name. Also make sure that the virtual machine's security group is set to None; otherwise, add security group rules for the following ports.

Application                            Port number
Syncfusion Big Data Cluster Manager    81, 82
Syncfusion Big Data Agent              60008
Syncfusion Big Data Remote Agent       60006

To access Hadoop and its ecosystem services, add a firewall rule on the Azure virtual machine for each required port, and add an inbound rule to the network security group if one was created.

To learn more about Hadoop port details, visit here.
To learn more about creating firewall inbound rules, visit here.
To learn more about Azure network security groups, visit here.
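After adding the firewall and security group rules, it can help to confirm from the on-premises machine that the ports listed above actually accept connections. The following is a minimal sketch using only the Python standard library. The function names are hypothetical, and a successful TCP connection only shows that the port is reachable, not that the service behind it is healthy.

```python
import socket

# Ports from the application table above.
SANDBOX_PORTS = {
    "Syncfusion Big Data Cluster Manager": [81, 82],
    "Syncfusion Big Data Agent": [60008],
    "Syncfusion Big Data Remote Agent": [60006],
}


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers refused connections, timeouts, and unreachable hosts
        return False


def check_sandbox_ports(host: str, timeout: float = 3.0) -> dict:
    """Map each (application, port) pair to whether the port accepted a connection."""
    return {
        (app, port): port_is_open(host, port, timeout)
        for app, ports in SANDBOX_PORTS.items()
        for port in ports
    }


# Example usage (replace the placeholder with the sandbox's public IP):
#   for (app, port), ok in check_sandbox_ports("203.0.113.10").items():
#       print(f"{app} port {port}: {'open' if ok else 'blocked or closed'}")
```

A port reported as blocked or closed usually points to a missing inbound rule in either the Windows firewall on the VM or the network security group, so both are worth checking.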

Port details of Hadoop and its ecosystem services

The Syncfusion Big Data Hadoop cluster uses the following ports for communication between Hadoop and its ecosystem services.

Ecosystem     Service                       Port No

Hadoop        Name Node Web UI              50070
              Name Node                     9000
              Data Node Web UI              50075
              Data Node                     50010
              Data Node IPC                 50020
              Journal Node                  8480
              Journal Node RPC              8485
              Resource Manager Web UI       8088
              Resource Manager Tracker      8031
              Resource Manager              8032
              Resource Manager Admin        8033
              Node Manager                  8042
              Node Manager Localizer        8040
              Job History Server Web UI     19888
              Job History                   10020
              Job History Admin             10033

Hive          Hive Server 2                 10000
              Meta Store                    9083
              Postgre SQL Server            1527

Zookeeper     Quorum Peer Main              2181
              ZookeeperServer1              3888
              ZookeeperServer2              2888
              ZKFC                          8019

Map reduce    ShuffleR                      8080
              ShuffleD                      13562

Oozie         Rest Service                  11000

Spark         Thrift service                10001

HBase         HMaster                       60000
              HRegion Server                60020
              HBaseThriftServer             10003
              HBaseRestServer Web           8085
              HBase Rest NGINX Server       14004
              HBase Rest Server             10005

IPython       IPython Spark                 10002
              IPython NGINX Server          14002
              SecureIPythonServer           10012

Others        HttpFS NGINX Server           14002
              HttpFS                        14000
              Remote Agent                  60006
              Installer Agent               60008
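When many of these ports need an inbound rule in the network security group, scripting the rule can be less error-prone than entering each port in the portal. The sketch below is one possible approach and not a procedure from Syncfusion: it assumes the Azure CLI (`az`) is installed and logged in, it only builds the command text rather than running it, and the resource group, security group, and rule names are hypothetical placeholders. The dictionary holds only a subset of the port table above.

```python
import shlex

# A subset of the port table above; extend it with the rows you need.
ECOSYSTEM_PORTS = {
    "Hadoop": {"Name Node Web UI": 50070, "Name Node": 9000, "Resource Manager Web UI": 8088},
    "Hive": {"Hive Server 2": 10000, "Meta Store": 9083},
    "Oozie": {"Rest Service": 11000},
    "Spark": {"Thrift service": 10001},
}


def ports_for(ecosystems):
    """Return the sorted, de-duplicated ports used by the named ecosystems."""
    return sorted({port for name in ecosystems for port in ECOSYSTEM_PORTS[name].values()})


def nsg_rule_command(resource_group, nsg_name, rule_name, priority, ports):
    """Build an `az network nsg rule create` command allowing inbound TCP on ports."""
    args = [
        "az", "network", "nsg", "rule", "create",
        "--resource-group", resource_group,
        "--nsg-name", nsg_name,
        "--name", rule_name,
        "--priority", str(priority),
        "--direction", "Inbound",
        "--access", "Allow",
        "--protocol", "Tcp",
        "--destination-port-ranges", *[str(p) for p in ports],
    ]
    return shlex.join(args)  # quotes any argument that needs it


# Hypothetical names; substitute the values from your own deployment.
print(nsg_rule_command("sandbox-rg", "sandbox-nsg", "allow-hive-oozie", 1010,
                       ports_for(["Hive", "Oozie"])))
```

Printing the command instead of executing it leaves room to review the port list before any rule is created. The matching Windows firewall rule on the virtual machine still has to be added separately.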

View public IP of virtual machine

Step 1: Select “Virtual machines” from the left-side blade in the Azure Portal.

Azure-Sandbox VM dialog

Step 2: Select the virtual machine name from the list of virtual machines.

Step 3: Click “Overview” to view the essentials of the virtual machine. The Public IP address section is highlighted in the screenshot below.

Azure-Sandbox validation dialog