Using Job Submission cmdlets
New cmdlets submit jobs to clusters remotely using web services. They work from any client machine where the cmdlets are installed - there is no need to RDP into the cluster to submit jobs. The cmdlets support the following job types:
- MapReduce
- Hive
- Pig
- Streaming MapReduce
Prepare work environment
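Preparing the client environment typically means connecting your PowerShell session to your Azure subscription. A minimal sketch, assuming the classic Azure PowerShell module of this era (module and cmdlet names may differ in newer releases; the subscription name is a placeholder):

```powershell
# Assumes the classic Azure PowerShell module is installed.
Import-Module Azure
Add-AzureAccount                      # sign in interactively
Select-AzureSubscription -SubscriptionName "YourSubscription"  # placeholder name
```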
Submit Hive query
- Submitting a Hive query to a remote cluster consists of two steps. First, you select the cluster to which you will submit jobs by using the “Use” command.
- Then you send Hive queries to the selected cluster, and the query output or errors are returned to the PowerShell console. This is similar to the standard Hadoop console environment (hive.cmd) but works directly from your client machine.
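The cluster-selection step can be sketched as follows, assuming the cluster name shown here is a placeholder for your own:

```powershell
# Select the target cluster for subsequent job submissions.
# "yourclustername" is a placeholder.
Use-AzureHDInsightCluster yourclustername
```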
hive "select * from hivesampletable limit 10"
- The command above produces the following output.
Submitting Hive query..
Started Hive query with Job Id : job_201309260027_0046
Hive query completed Successfully
8 18:54:20 en-US Android Samsung SCH-i500 California United States 13.9204007 0 0
23 19:19:44 en-US Android HTC Incredible Pennsylvania United States NULL 0 0
23 19:19:46 en-US Android HTC Incredible Pennsylvania United States 1.4757422 0 1
23 19:19:47 en-US Android HTC Incredible Pennsylvania United States 0.245968 0 2
28 01:37:50 en-US Android Motorola Droid X Colorado United States 20.3095339 1 1
28 00:53:31 en-US Android Motorola Droid X Colorado United States 16.2981668 0 0
28 00:53:50 en-US Android Motorola Droid X Colorado United States 1.7715228 0 1
28 16:44:21 en-US Android Motorola Droid X Utah United States 11.6755987 2 1
28 16:43:41 en-US Android Motorola Droid X Utah United States 36.9446892 2 0
- Known issue: When the query contains special characters such as ‘%’, submitting it as shown above will fail, because the Templeton web service cannot handle that character correctly. The workaround is to create a script file on the storage account and submit
the query as a file using the -File parameter. For example:
hive -File "wasb://yourcontainer@yourstorageaccount/yourjobfolder/query.hql"
- The same limitation and workaround apply to the other job submission cmdlets described below. We are working on a fix for the next update.
Submit MapReduce job
- Submitting a MapReduce job requires a few more steps. First, we’ll capture the cluster name in a variable.
$clustername = "yourclustername"
- The next step is to create a job definition object that captures the arguments of the job about to be submitted to the cluster. In this example we’ll submit the standard word count job distributed in hadoop-examples.jar. You can copy the hadoop-examples.jar
file from your cluster file system (it is located at "C:\apps\Samples\hadoop-examples.jar") to the “/example/jars/” folder in your default storage account.
$wordCountJob = New-AzureHDInsightMapReduceJobDefinition -JarFile "/example/jars/hadoop-examples.jar" -ClassName "wordcount" -Arguments "/example/data/gutenberg/davinci.txt", "/example/output/WordCount"
- Now we’ll submit the job to the cluster, wait for its completion, and print the output of the job to the console.
$wordCountJob | Start-AzureHDInsightJob -Cluster $clustername `
| Wait-AzureHDInsightJob -WaitTimeoutInSeconds 3600 `
| Get-AzureHDInsightJobOutput -StandardError
- The output file produced by the job is stored in the “/example/output/WordCount” location, as specified by the job arguments.
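One way to retrieve that output from the client machine is with the Azure Storage cmdlets of the classic module. A sketch, assuming your default storage account and container back the cluster’s WASB file system; the account, container, and blob names here are placeholders (the part-r-00000 name reflects typical Hadoop reducer output naming):

```powershell
# Placeholders: replace with your storage account and container names.
$storageAccount = "yourstorageaccount"
$container = "yourcontainer"

# Build a storage context from the account's primary key.
$ctx = New-AzureStorageContext -StorageAccountName $storageAccount `
    -StorageAccountKey (Get-AzureStorageKey $storageAccount).Primary

# Download the job's output file to the local machine.
Get-AzureStorageBlobContent -Container $container `
    -Blob "example/output/WordCount/part-r-00000" `
    -Context $ctx -Destination "C:\wordcount-output.txt"
```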
- In a similar manner you can create job definitions for Pig, Hive, and Streaming jobs using New-AzureHDInsightPigJobDefinition, New-AzureHDInsightHiveJobDefinition, and New-AzureHDInsightStreamingMapReduceJobDefinition, and then submit them to the cluster using the Start-AzureHDInsightJob,
Wait-AzureHDInsightJob, and Get-AzureHDInsightJobOutput cmdlets.
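As a sketch of that pattern, here is a Pig job submission; the inline Pig Latin query and input path are illustrative, not part of the samples shipped with the cluster:

```powershell
# Illustrative Pig Latin query and input path - adjust to your data.
$pigJob = New-AzureHDInsightPigJobDefinition -Query @"
LOGS = LOAD '/example/data/sample.log';
DUMP LOGS;
"@

# Submit, wait for completion, and print the job's standard output.
$pigJob | Start-AzureHDInsightJob -Cluster $clustername `
| Wait-AzureHDInsightJob -WaitTimeoutInSeconds 3600 `
| Get-AzureHDInsightJobOutput -StandardOutput
```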
List jobs running or completed on the cluster
- At any time you can inspect the status of the jobs by using the Get-AzureHDInsightJob cmdlet.
Get-AzureHDInsightJob $clustername | ft -a
- You can also cancel execution of a job using the Stop-AzureHDInsightJob cmdlet.
$creds = Get-Credential
Stop-AzureHDInsightJob $clustername -Credential $creds -JobId $jobid