Sqoop
Apache Sqoop is a tool for transferring data between HDFS and relational data stores, in either direction. For additional details on Sqoop, refer to the Sqoop website. Big Data Studio offers a simple UI to configure Sqoop jobs.
Big Data Studio supports databases such as MySQL, Microsoft SQL Server, PostgreSQL, and Oracle.
Sqoop JDBC Connectors
Sqoop requires the JDBC connector JAR files to be present in the lib folder (SDK\Sqoop\lib) to execute import and export jobs. Big Data Studio can automatically install the connector JARs from the driver providers (SQL Server, MySQL, and Oracle). Select the required connectors, review the license, and then click the “Install” button to proceed with the installation.
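To illustrate the requirement above, the following Python sketch checks whether a list of file names from the lib folder contains a driver JAR for each database. The file-name patterns are assumptions based on common driver JAR names and are not defined by Big Data Studio; adjust them to the driver versions you actually install.

```python
import fnmatch
import os

# Assumed file-name patterns for common JDBC driver JARs; actual names
# depend on the driver version you install.
CONNECTOR_PATTERNS = {
    "MySQL": ["mysql-connector-*.jar"],
    "SQL Server": ["sqljdbc*.jar", "mssql-jdbc-*.jar"],
    "PostgreSQL": ["postgresql-*.jar"],
    "Oracle": ["ojdbc*.jar"],
}


def missing_connectors(file_names):
    """Return the databases for which no matching JAR name is present."""
    missing = []
    for database, patterns in CONNECTOR_PATTERNS.items():
        found = any(
            fnmatch.fnmatch(name.lower(), pattern)
            for name in file_names
            for pattern in patterns
        )
        if not found:
            missing.append(database)
    return missing


def check_lib_folder(lib_dir):
    """List lib_dir (the Sqoop lib folder) and report missing drivers."""
    return missing_connectors(os.listdir(lib_dir))
```

Calling check_lib_folder with the path to the Sqoop lib folder returns the databases that still need a connector.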
Add connection
Adding a connection to a server is straightforward. Click the “Add” button under the Sqoop connection group, enter the server details in the pop-up dialog, choose the appropriate JDBC driver from the drop-down list, and click the “Save” button.
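The server details entered in the dialog typically end up in a JDBC connection URL. The sketch below shows the usual URL shapes and default ports for the four supported databases. The function name, host, and database values are hypothetical, and the exact URL that Big Data Studio generates is an assumption.

```python
# URL templates and default ports follow the drivers' usual conventions.
# They are assumptions for illustration, not Big Data Studio output.
JDBC_TEMPLATES = {
    "mysql": ("jdbc:mysql://{host}:{port}/{database}", 3306),
    "sqlserver": ("jdbc:sqlserver://{host}:{port};databaseName={database}", 1433),
    "postgresql": ("jdbc:postgresql://{host}:{port}/{database}", 5432),
    # For Oracle, "database" is the service name.
    "oracle": ("jdbc:oracle:thin:@//{host}:{port}/{database}", 1521),
}


def jdbc_url(kind, host, database, port=None):
    """Build a JDBC URL for the given database kind (hypothetical helper)."""
    template, default_port = JDBC_TEMPLATES[kind]
    return template.format(host=host, port=port or default_port, database=database)


print(jdbc_url("mysql", "dbhost", "sales"))
```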
Add Sqoop job
Syncfusion Big Data Studio provides two ways to add a Sqoop job:
- Using command line query
- Using connection details
Using command line query
To add a Sqoop job, click the “Add” button under the Sqoop tab. Then check the “Command Line” checkbox and click the “Next” button.
Provide the Sqoop command query in the query editor and click the “Save & Run” button to run the job immediately. You can also save the job and run it later.
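As a rough idea of what such a command query can look like, the sketch below assembles a single-table Sqoop import command as a string. The flags are standard Sqoop options; the connection URL, user name, password file, table, and target directory are hypothetical values. Whether the query editor expects the leading "sqoop" keyword is not stated here, so treat that as an assumption.

```python
import shlex


def import_command(connect, username, password_file, table, target_dir):
    """Return a Sqoop import command line with shell-safe quoting."""
    args = [
        "sqoop", "import",
        "--connect", connect,
        "--username", username,
        "--password-file", password_file,  # file holding the password
        "--table", table,
        "--target-dir", target_dir,        # HDFS output directory
    ]
    return " ".join(shlex.quote(arg) for arg in args)


print(import_command(
    "jdbc:mysql://dbhost:3306/sales",  # hypothetical connection URL
    "etl_user",
    "/user/etl/db.password",
    "orders",
    "/data/sales/orders",
))
```

Quoting matters for URLs that contain shell metacharacters, such as the semicolon in a SQL Server connection URL.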
Using connection details
To add a Sqoop job, click the “Add” button under the Sqoop tab. Provide the job name, choose the database server connection and the type of job (import or export), and click the “Next” button.
Provide the database name. Check “Import all tables” to import every table, or specify the name of the table to import. Click the “Next” button to move to the next form.
In the final form, provide the HDFS directory location and the number of mappers required. To add arguments, click the link next to the Arguments textbox.
This displays a predefined list of Sqoop arguments, and you can provide input directly.
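The form fields above correspond roughly to standard Sqoop options. The sketch below shows one plausible mapping: a single table uses "import" with "--target-dir", while importing all tables uses "import-all-tables" with "--warehouse-dir". How Big Data Studio actually translates the form into a command is an assumption, and the function name is hypothetical.

```python
def build_import_args(connect, hdfs_dir, mappers, table=None, extra_args=()):
    """Return a Sqoop argument list for an import job (hypothetical helper).

    table=None stands for the "Import all tables" option.
    """
    if mappers < 1:
        raise ValueError("at least one mapper is required")
    if table is None:
        # import-all-tables writes each table under a parent directory.
        args = ["sqoop", "import-all-tables", "--connect", connect,
                "--warehouse-dir", hdfs_dir]
    else:
        args = ["sqoop", "import", "--connect", connect,
                "--table", table, "--target-dir", hdfs_dir]
    args += ["--num-mappers", str(mappers)]
    args += list(extra_args)  # additional arguments chosen by the user
    return args


print(build_import_args("jdbc:mysql://dbhost:3306/sales", "/data/sales", 4))
```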
Click the link next to the Incremental textbox to display the “Incremental Arguments” form.
- Select an incremental type, either append or lastmodified.
- Add the check column, which is the target column to check for incremental changes.
- Add the last value, which is the last imported value in the incremental check column.
- Click “OK” to continue.
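The three values collected in this form map to Sqoop's standard incremental import options. A minimal sketch, assuming the form values are passed through unchanged (the function name and sample values are hypothetical):

```python
def incremental_args(mode, check_column, last_value):
    """Return the Sqoop arguments for an incremental import."""
    if mode not in ("append", "lastmodified"):
        raise ValueError("mode must be 'append' or 'lastmodified'")
    return [
        "--incremental", mode,
        "--check-column", check_column,   # column examined for new rows
        "--last-value", str(last_value),  # highest value already imported
    ]


print(incremental_args("append", "order_id", 1500))
```

With append, Sqoop imports rows whose check column value is greater than the last value; with lastmodified, it imports rows whose timestamp in the check column is more recent than the last value.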
Click the “Save & Run” button to run the job immediately. You can also save the job and run it later.
Similarly, you can create and execute an export job.
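For comparison with an import, an export job reads files from an HDFS directory and writes them to an existing database table. The sketch below lists the standard Sqoop export options; the function name and sample values are hypothetical, and the command Big Data Studio generates may differ.

```python
def build_export_args(connect, table, export_dir, mappers=1):
    """Return a Sqoop argument list for an export job (hypothetical helper)."""
    return [
        "sqoop", "export",
        "--connect", connect,
        "--table", table,            # destination table; must already exist
        "--export-dir", export_dir,  # HDFS directory holding the source files
        "--num-mappers", str(mappers),
    ]


print(build_export_args("jdbc:mysql://dbhost:3306/sales", "orders_copy", "/data/sales/orders"))
```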
Other features
The following features are also available under the Sqoop tab:
- Stop Sqoop job
- Re-run Sqoop job
- Edit/Delete the Sqoop job
- Edit the database connection
Stop Sqoop job
A Sqoop job can be stopped by clicking the “Stop” button under the Sqoop tab while the job is in the ACCEPTED or RUNNING state.
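ACCEPTED and RUNNING match the names of YARN application states. The sketch below encodes the rule above and, as an assumption about what stopping amounts to outside the UI, builds the standard YARN kill command. Big Data Studio's own stop mechanism is not documented here, and the application ID is a made-up example.

```python
# States in which the job can be stopped, as listed above.
STOPPABLE_STATES = {"ACCEPTED", "RUNNING"}


def can_stop(state):
    """Return True if a job in this state can still be stopped."""
    return state.strip().upper() in STOPPABLE_STATES


def yarn_kill_command(application_id):
    """Return the YARN CLI arguments that kill an application."""
    return ["yarn", "application", "-kill", application_id]


if can_stop("RUNNING"):
    print(yarn_kill_command("application_1700000000000_0001"))
```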
Re-run Sqoop job
Click the “Run Job” button under the Sqoop tab to resubmit the job.
Edit/Delete the Sqoop job
Big Data Studio lets you edit a saved or previously executed job by clicking the “Edit” button under the Sqoop tab or in the Sqoop job details browser.
You can also rerun an already submitted job by clicking the “Run Job” button under the Sqoop tab.
Edit the database connection
Big Data Studio also lets you edit or delete connection details by clicking the “Edit/Delete” button in the Sqoop connection list browser or under the Sqoop tab.