Syncfusion Big Data Platform Sandbox

Syncfusion Big Data Platform Sandbox is a user-friendly Apache Hadoop environment, available in the Azure Marketplace, that can be accessed from anywhere.

Prerequisite

Microsoft Azure account - You need an Azure account. If you do not have one, create a free Azure account by following this link.

Steps

Start by logging in to the Azure Portal with your Azure account: https://portal.azure.com/
Find the Syncfusion Big Data Sandbox image in the Azure Marketplace by following the steps below, or go directly to the image creation prompt by using the following link:
Syncfusion Big Data Platform Sandbox

Step 1: In the upper-left corner, click New (+) > See all, as shown in the screenshot below.

Azure-Sandbox dialog

Step 2: Search for “Syncfusion Big Data Sandbox” in the Marketplace filter and select the Syncfusion sandbox.

Azure-Sandbox dialog

Step 3: In the Syncfusion Big Data Sandbox blade, the deployment model is fixed to Resource Manager. Click Create.

Azure-Sandbox deployment dialog

Step 4: In the Basics blade, provide the virtual machine details: name, login username, and login password. Select the subscription and location, then select an existing resource group or type a name for a new one. Click OK to continue to the next section.

Azure-Sandbox machine details dialog

Step 5: Select a VM size from the list of sizes and click Select to continue.

Azure-Sandbox dialog

NOTE

The Syncfusion sandbox requires a minimum of 14 GB RAM and 2 cores, for example Standard_DS11_v2.
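
The minimum stated in the note can be written as a quick check before you pick a size. This is only a minimal sketch: the function name `meets_sandbox_minimum` and the two constants are hypothetical helpers, not part of any Syncfusion or Azure tool. The figures come from the note itself.

```python
# Minimum resources stated in the note above.
MIN_RAM_GB = 14
MIN_CORES = 2


def meets_sandbox_minimum(ram_gb: float, cores: int) -> bool:
    """Return True if a VM size satisfies the sandbox minimum (hypothetical helper)."""
    return ram_gb >= MIN_RAM_GB and cores >= MIN_CORES


# Standard_DS11_v2, the example size named in the note: 14 GB RAM, 2 cores.
print(meets_sandbox_minimum(ram_gb=14, cores=2))  # True
# A smaller size with 7 GB RAM does not qualify.
print(meets_sandbox_minimum(ram_gb=7, cores=2))   # False
```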

Step 6: In the Settings blade, the VM is given default names for the storage account, virtual network, and security group. You can change them manually if needed, then click OK to continue.

Azure-Sandbox storage account dialog

NOTE

Creating the VM with a security group affects remote connections to the Hadoop cluster. Either add the required remote access port numbers to the Azure security group, or create the VM with the security group set to None.

Step 7: Click OK when the Summary of the Syncfusion Big Data Sandbox shows the validation passed message.

Azure-Sandbox validation dialog

Step 8: The Purchase details blade contains the offer details of the sandbox image. After reading the ‘Terms of use’ section, click Purchase to start the deployment.

Azure-Sandbox deployment dialog

Step 9: While Azure creates the virtual machine, you can track the progress by clicking Virtual Machines on the left. When the VM has been created, its status changes to Running.

Azure-Sandbox status dialog

Step 10: On the blade for the virtual machine, click Connect. Start working with the Syncfusion sandbox by logging in over RDP.

Overview

Syncfusion Big Data Sandbox lets you access the Hadoop environment in the following ways:

Syncfusion Big Data Platform
Syncfusion Big Data Studio
Connect Hadoop Cluster on On-Premises Machine

Syncfusion Big Data Platform

The Syncfusion Big Data Platform includes a complete production environment that helps you run Hadoop jobs in a scalable manner on a full cluster. It also lets you manage and monitor Hadoop clusters from anywhere.

Open the Cluster Manager application at http://localhost:81 in a browser to see the Hadoop pseudo-node cluster running. You can also access the application from outside the virtual machine through its public IP address, that is, http://<publicip>:81.
To find the public IP address, see the section View public IP of virtual machine.
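
To confirm that the Cluster Manager answers on port 81 before opening a browser, a small script can build the URL and send one HTTP request. This is a minimal sketch using only the Python standard library; the helper names `cluster_manager_url` and `is_reachable` are hypothetical, and 203.0.113.10 is a placeholder address to be replaced with the public IP of your virtual machine.

```python
import http.client
import urllib.error
import urllib.request

CLUSTER_MANAGER_PORT = 81  # port given in the text above


def cluster_manager_url(host: str = "localhost") -> str:
    """Build the Cluster Manager URL for a host name or public IP address."""
    return f"http://{host}:{CLUSTER_MANAGER_PORT}"


def is_reachable(url: str, timeout: float = 5.0) -> bool:
    """Return True if the URL answers an HTTP request within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered, even if with an error status such as 401 or 404.
        return True
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # Connection refused, timed out, or the reply was not valid HTTP.
        return False


# Placeholder address; substitute the public IP of your virtual machine.
print(cluster_manager_url("203.0.113.10"))  # http://203.0.113.10:81
```

Calling `is_reachable(cluster_manager_url("<publicip>"))` from an on-premises machine returns False when the port is blocked or the service is not running; it cannot tell the two cases apart.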

Azure-Sandbox cluster validation dialog

Click the cluster name to go to the cluster management page, where you can see the supported Hadoop ecosystem and the running services of the Syncfusion sandbox image.

Azure-Sandbox cluster validation dialog

You can view job statuses, such as running and failed jobs, in the Cluster Manager application.

Azure-Sandbox cluster status dialog

The Monitoring page in the Cluster Manager application shows the status of the cluster, such as heap size used, running threads, capacity, and other details.

Azure-Sandbox cluster Monitoring dialog

Using the Cluster Manager application, you can easily perform the following operations:
- Start and stop the Hadoop services.
- Backup and Restore operations in HDFS and HBase.
- Job submission in Oozie.
- Job submission in Sqoop.

To know more click here

Big Data Management Studio

Big Data Management Studio provides a user-friendly Hadoop environment on a Windows machine. In the sandbox, this application starts automatically when the machine starts and connects to the Azure pseudo-node cluster running on the internal IP.

Using Big Data Studio, you can easily perform the following operations:

- Run commands in an interactive shell for Hadoop, Pig, Hive, Spark, and HBase.
- Perform file operations in HDFS, such as create, copy, move, upload, download, and view.

To know more click here

Connect Hadoop Cluster on On-Premises Machine

Install the Big Data Studio application on your on-premises machine from https://www.syncfusion.com/. In the Big Data Studio Dashboard, start the Big Data Studio application manually by clicking the LAUNCH STUDIO icon.

Click the Add Cluster button in the left-side panel to open a popup. In the popup, select the Azure option, enter the public IP address of the sandbox and the Cluster Manager username and password, and then connect.

Adding cluster dialog

NOTE

To connect the sandbox cluster to an on-premises machine, the virtual machine must have a public IP address with a DNS name. Also make sure that the virtual machine security group is set to None; otherwise, add security group rules for the following ports.

Application	Port number
Syncfusion Big Data Cluster Manager	81,82
Syncfusion Big Data Agent	60008
Syncfusion Big Data Remote Agent	60006

To access Hadoop and its ecosystem services, add a firewall rule for each required port in the Azure virtual machine, and add an inbound rule in the network security group if one was created.

To know more about Hadoop port details please visit here
To know more about Firewall inbound rule creation please visit here
To know more about Azure network security group please visit here
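
After adding the firewall and security group rules, you can check from the on-premises machine whether the ports listed in the table above accept TCP connections. The following is a minimal sketch using the Python standard library `socket` module; the helper names `port_is_open` and `check_required_ports` are hypothetical, and the port numbers are copied from the table.

```python
import socket

# Ports copied from the Application / Port number table above.
REQUIRED_PORTS = {
    "Syncfusion Big Data Cluster Manager": [81, 82],
    "Syncfusion Big Data Agent": [60008],
    "Syncfusion Big Data Remote Agent": [60006],
}


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def check_required_ports(host: str, timeout: float = 3.0) -> dict:
    """Map each (application, port) pair to whether the port accepts connections."""
    return {
        (app, port): port_is_open(host, port, timeout)
        for app, ports in REQUIRED_PORTS.items()
        for port in ports
    }
```

Call `check_required_ports("<publicip>")` with the public IP address of the sandbox. A False entry means that the connection failed, which can be caused by a missing security group rule, a missing firewall rule, or a service that is not running.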

Port details of Hadoop and its ecosystem services

The Syncfusion Big Data Hadoop cluster uses the following ports for communication between Hadoop and its ecosystem services.

Ecosystem	Service	Port number
Hadoop	Name Node Web UI	50070
	Name Node	9000
	Data Node Web UI	50075
	Data Node	50010
	Data Node IPC	50020
	Journal Node	8480
	Journal Node RPC	8485
	Resource Manager Web UI	8088
	Resource Manager Tracker	8031
	Resource Manager	8032
	Resource Manager Admin	8033
	Node Manager	8042
	Node Manager Localizer	8040
	Job History Server Web UI	19888
	Job History	10020
	Job History Admin	10033
Hive	Hive Server 2	10000
	Meta Store	9083
	PostgreSQL Server	1527
Zookeeper	Quorum Peer Main	2181
	ZookeeperServer1	3888
	ZookeeperServer2	2888
	ZKFC	8019
MapReduce	ShuffleR	8080
	ShuffleD	13562
Oozie	Rest Service	11000
Spark	Thrift service	10001
HBase	HMaster	60000
	HRegion Server	60020
	HBaseThriftServer	10003
	HBaseRestServer Web	8085
	HBase Rest NGINX Server	14004
	HBase Rest Server	10005
IPython	IPython Spark	10002
	IPython NGINX Server	14002
	SecureIPythonServer	10012
Others	HttpFS NGINX Server	14002
	HttpFS	14000
	Remote Agent	60006
	Installer Agent	60008
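
When you only need a few ecosystems, it can help to collect their ports into one list before creating inbound rules. The sketch below is illustrative only: the `PORTS` dictionary holds a handful of rows copied from the table above (extend it as needed), and `ports_for` is a hypothetical helper name.

```python
# A few rows copied from the table above; extend as needed.
PORTS = {
    "Hadoop": {"Name Node Web UI": 50070, "Name Node": 9000, "Resource Manager Web UI": 8088},
    "Hive": {"Hive Server 2": 10000, "Meta Store": 9083},
    "Oozie": {"Rest Service": 11000},
    "Spark": {"Thrift service": 10001},
}


def ports_for(*ecosystems: str) -> list:
    """Return the sorted, de-duplicated ports for the named ecosystems."""
    selected = set()
    for name in ecosystems:
        selected.update(PORTS[name].values())
    return sorted(selected)


# Ports needed to reach Hive and Spark, as a comma-separated list.
print(",".join(str(p) for p in ports_for("Hive", "Spark")))  # 9083,10000,10001
```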

View public IP of virtual machine

Step 1: Select “Virtual machines” from the left-side blade in the Azure Portal.

Azure-Sandbox VM dialog

Step 2: Select the virtual machine name from the list of virtual machines.

Step 3: Click “Overview” to view the essentials of the virtual machine. The Public IP address section is highlighted in the screenshot below.

Azure-Sandbox validation dialog