Oozie Client Overview

Oozie is a workflow scheduler system for managing Apache Hadoop jobs. It lets you create Directed Acyclic Graphs (DAGs) of actions, called workflows. A workflow can be triggered on a recurring basis or when data becomes available. Oozie supports a variety of job types, including Java map-reduce, Streaming map-reduce, Pig, Hive, and Sqoop jobs.
Oozie can be used in a variety of scenarios, including:
  • Composing multiple interdependent jobs into a single execution unit. When map-reduce jobs need to run one after another to produce the output, Oozie can automate this sequencing, including cases where the dependencies form an acyclic graph.
  • Running jobs automatically on a schedule or when data becomes available.
  • Notifying users by email about job status.
The Oozie client library provides a set of convenient .NET classes for submitting and managing Oozie workflows. The client library communicates with Hadoop clusters over HTTP using the Oozie REST APIs.
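
For orientation, the kind of requests such a client wraps can be sketched as plain URLs. The paths below follow the Oozie Web Services API (v1). The class name OozieRestUrls, its method names, and the base URL in the usage note are illustrative assumptions; they are not part of Microsoft.Hadoop.WebClient, which may build its requests differently.

```csharp
using System;

// Hypothetical helper for illustration only; not part of Microsoft.Hadoop.WebClient.
public static class OozieRestUrls
{
    // POST an XML job configuration to this URL to submit a job.
    public static string SubmitJob(string oozieBaseUrl)
    {
        return oozieBaseUrl.TrimEnd('/') + "/v1/jobs";
    }

    // PUT to this URL to start a previously submitted job.
    public static string StartJob(string oozieBaseUrl, string jobId)
    {
        return JobUrl(oozieBaseUrl, jobId) + "?action=start";
    }

    // GET this URL to retrieve job information (including its status) as JSON.
    public static string JobInfo(string oozieBaseUrl, string jobId)
    {
        return JobUrl(oozieBaseUrl, jobId) + "?show=info";
    }

    // Shared prefix for requests that address a single job.
    private static string JobUrl(string oozieBaseUrl, string jobId)
    {
        return oozieBaseUrl.TrimEnd('/') + "/v1/job/" + Uri.EscapeDataString(jobId);
    }
}
```

For example, OozieRestUrls.JobInfo("http://localhost:11000/oozie", "0000001-130409123456789-oozie-hado-W") yields http://localhost:11000/oozie/v1/job/0000001-130409123456789-oozie-hado-W?show=info.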

Using the Oozie .NET Client

The Oozie .NET client is a new addition to the Microsoft.Hadoop.WebClient NuGet package. It resides in the Microsoft.Hadoop.WebClient.OozieClient namespace. Using the classes in this namespace, you can submit and manage Oozie workflows with the following steps:
  • Install the WebClient NuGet package, either through the Package Manager or from the Package Manager Console:
Install-Package Microsoft.Hadoop.WebClient

  • Create an instance of OozieHttpClient, passing in your cluster address and credentials:
var client = new OozieHttpClient(myAzureCluster, myAzureUserName, myAzurePassword);
  • Upload the Oozie workflow definition files to your cluster's ASV or HDFS file system. Oozie workflows are defined as XML files; the Oozie documentation describes their format. The uploader/downloader helper class makes uploading the definition files straightforward:
fileUploaderDownloader.UploadDirectory(new DirectoryInfo(resourcesFullPath), appPath);
  • Create the Oozie workflow job. When submitting the job, you can customize its properties using the OozieJobProperties class:
// Describe the job: user name, ASV storage location, job tracker host, and the application, input, and output paths.
var oozieJobProperties = new OozieJobProperties(myAzureUserName,
    string.Format("asv://{0}@{1}", myAzureStorageContainer, myAzureStorageAccount),
    jobTrackerHost, appPath, inputPath, outputPath);
// Submit the job and read its ID from the response.
var submitJob = client.SubmitJob(oozieJobProperties.ToDictionary());
string id = HttpClientTools.GetTaskResults(submitJob).Value<string>("id");
  • Once you have the ID of the Oozie job, you can start it directly or monitor its status:
client.StartJob(id).Wait();
  • The StartJob method starts the job and returns once the job begins executing. You can then check the status of the job periodically using the GetJobInfo method:
var status = HttpClientTools.GetTaskResults(client.GetJobInfo(id)).Value<string>("status");
  • After the job completes, download the results from the output folder using the WebHDFS client, the uploader/downloader helper class, or other tools.
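
The periodic status check in the steps above could be wrapped in a small polling loop. The following is a minimal sketch, not part of the client library: the class OozieJobPolling and its methods are hypothetical names, and the status source is passed in as a delegate so the sketch does not depend on any particular client signature. It treats SUCCEEDED, KILLED, and FAILED as the terminal workflow job states, as listed in the Oozie documentation.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical helper for illustration only; not part of Microsoft.Hadoop.WebClient.
public static class OozieJobPolling
{
    // Workflow job states after which the job makes no further progress.
    private static readonly HashSet<string> TerminalStates =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase) { "SUCCEEDED", "KILLED", "FAILED" };

    public static bool IsTerminal(string status)
    {
        return status != null && TerminalStates.Contains(status);
    }

    // Calls getStatus up to maxPolls times, waiting pollInterval between calls.
    // Returns the first terminal status seen, or the last observed status
    // (for example RUNNING) if the poll limit is reached first.
    public static async Task<string> WaitForCompletionAsync(
        Func<Task<string>> getStatus, TimeSpan pollInterval, int maxPolls)
    {
        string status = null;
        for (int i = 0; i < maxPolls; i++)
        {
            status = await getStatus();
            if (IsTerminal(status))
            {
                return status;
            }
            if (i < maxPolls - 1)
            {
                await Task.Delay(pollInterval);
            }
        }
        return status;
    }
}
```

With the client from the steps above, the delegate could be something like () => Task.FromResult(HttpClientTools.GetTaskResults(client.GetJobInfo(id)).Value<string>("status")), polled every 30 seconds; both the interval and the poll limit are choices for the caller.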

Last edited Apr 9, 2013 at 4:42 AM by mwinkle, version 4

Comments

yolasca Mar 28, 2014 at 12:13 PM 
I'm getting the same error at the line "string id = HttpClientTools.GetTaskResults(submitJob).Value<string>("id")". I think it is caused by certificates, but I have not managed to get it working yet.

rujain Jan 16, 2014 at 1:17 PM 
My submitJob step is failing. The exception says "One or more errors occurred". Could you help me find out what is failing?