A fairly common request from customers is the ability to initiate job execution on an HDInsight cluster programmatically. Some of these scenarios include:
- Scheduled execution of a job (every night at midnight, update the recommendation database).
- Incorporating job execution into a larger application (allow a client to configure and kick off web log processing).
- Building end user query tools.
To enable these scenarios, an HDInsight cluster exposes a WebHCat endpoint. WebHCat is a REST API that provides metadata management and remote job submission for the Hadoop cluster. Updated documentation is maintained as part of the Apache Hive project. Note that WebHCat has also been referred to as "Templeton", so expect to see some references to that name.
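Because WebHCat is a plain REST API, a job can in principle be submitted without any client library. The following is a minimal sketch of that idea, not the library's implementation: the class and method names (WebHCatRestSketch, HiveEndpoint, HiveJobFields, SubmitHiveJobAsync) are hypothetical, the templeton/v1/hive path and the execute, statusdir and user.name fields follow the WebHCat documentation, and the HttpClient passed in is assumed to be already configured with the cluster credentials. Depending on the WebHCat version, user.name may need to go in the query string rather than the form body.

```csharp
using System;
using System.Collections.Generic;
using System.Net.Http;
using System.Threading.Tasks;

public static class WebHCatRestSketch
{
    // Resolves the Hive submission resource against the cluster base address.
    // Assumes the base address has no path component of its own.
    public static Uri HiveEndpoint(Uri clusterBase) =>
        new Uri(clusterBase, "templeton/v1/hive");

    // Form fields for a Hive job: the query text, a status/output directory,
    // and the user the job runs as.
    public static Dictionary<string, string> HiveJobFields(
        string query, string statusDir, string userName) =>
        new Dictionary<string, string>
        {
            ["execute"] = query,
            ["statusdir"] = statusDir,
            ["user.name"] = userName,
        };

    // Posts the job and returns the raw JSON reply, which carries the job id.
    // The caller supplies an HttpClient that already handles authentication.
    public static async Task<string> SubmitHiveJobAsync(
        HttpClient http, Uri clusterBase,
        string query, string statusDir, string userName)
    {
        using var body = new FormUrlEncodedContent(
            HiveJobFields(query, statusDir, userName));
        using var response = await http.PostAsync(HiveEndpoint(clusterBase), body);
        response.EnsureSuccessStatusCode();
        return await response.Content.ReadAsStringAsync();
    }
}
```

The URL and form construction are kept in separate helpers so they can be checked without touching a cluster; only SubmitHiveJobAsync performs network I/O.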
WebHCat surfaces the following capabilities:
Using the WebHCat .NET Client
Within a .NET application, you can use the Microsoft.Hadoop.WebClient client library to submit and monitor jobs.
- Create an instance of WebHCatHttpClient, passing in your server configuration, namely:
  - cluster URI: the address of the cluster's WebHCat endpoint
  - username/password: cluster credentials
- Invoke the job type you want. Initially, a basic reply is sent back containing the job id. You can then either poll the job status, or use the WaitForJobToCompleteAsync method to obtain a task that completes when the job finishes.
The following sample code shows how you can do this (in C#):
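// Connect to the cluster's WebHCat endpoint with the cluster credentials.
var httpClient = new WebHCatHttpClient(new Uri("https://yourazurecluster.azurehdinsight.net:563"), "username", "password");
string outputDir = "basichivejob";
// Submit the Hive query; the initial response carries only the job id.
var t1 = httpClient.CreateHiveJob(@"select * from awards;", null, null, outputDir, null);
var response = t1.Result;
response.EnsureSuccessStatusCode();
// Read the JSON body with Json.NET and pull out the job id.
var output = response.Content.ReadAsAsync<JObject>();
string id = output.Result.GetValue("id").ToString();
// Block until the job completes (assumes WaitForJobToCompleteAsync takes the job id).
httpClient.WaitForJobToCompleteAsync(id).Wait();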
CreateHiveJob submits the job and returns a response containing the job id, which is read out using Json.NET. We then wait on the task returned by WaitForJobToCompleteAsync, which blocks until the job actually completes. At this point, we could retrieve the output programmatically, or use any of the standard storage management tooling.
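As an alternative to blocking on the completion task, the polling option mentioned earlier could be sketched roughly as follows. This is a generic helper rather than part of the client library: JobPolling and PollUntilCompleteAsync are hypothetical names, and the isComplete delegate stands in for whatever status check you use (for example, a call that looks up the job by id and inspects its state).

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class JobPolling
{
    // Calls isComplete repeatedly, pausing between attempts, until it
    // reports true or the attempt budget is exhausted. Returns true if the
    // job was seen as complete, false if we gave up first. Cancellation
    // interrupts the pause between attempts.
    public static async Task<bool> PollUntilCompleteAsync(
        Func<Task<bool>> isComplete,
        TimeSpan interval,
        int maxAttempts,
        CancellationToken cancellationToken = default)
    {
        for (int attempt = 0; attempt < maxAttempts; attempt++)
        {
            if (await isComplete())
            {
                return true;
            }

            // No need to wait after the final attempt.
            if (attempt < maxAttempts - 1)
            {
                await Task.Delay(interval, cancellationToken);
            }
        }

        return false;
    }
}
```

Capping the number of attempts keeps a stuck job from holding the caller forever; a scheduled nightly run, for instance, might poll every 30 seconds and report a failure once the budget runs out.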