Installation

Prepare work environment

Provision New Cluster

  • Capture the credentials (user name and password) for the cluster in a local variable.
    $creds = Get-Credential
  • Get the storage account key so you can use it later. In this and the following script snippets, replace placeholder names such as yourblobstorage, yourcontainer and yourclustername with the names of your own resources.
    $key1 = (Get-AzureStorageKey yourblobstorage).Primary
  • Create a new cluster. The cmdlet returns a cluster object once provisioning is finished, which takes about 10-15 minutes.
    New-AzureHDInsightCluster -Name yourclustername -Location "North Europe" `
    -DefaultStorageAccountName yourblobstorage.blob.core.windows.net -DefaultStorageAccountKey $key1 `
    -DefaultStorageContainerName "yourcontainer" -Credential $creds -ClusterSizeInNodes 4
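Get-Credential prompts for the user name and password interactively. When the steps above run in an unattended script, the same kind of credential object can be built directly, as in this minimal sketch. The user name admin and the password string are placeholders, not values required by HDInsight, and keeping a plain-text password in a script is only reasonable if the script itself is stored securely.

```powershell
# Build the cluster credential non-interactively instead of calling Get-Credential.
# "admin" and the password string are placeholders; replace them with your own values.
$userName = "admin"
$password = ConvertTo-SecureString -String 'Your-Strong-P@ssw0rd' -AsPlainText -Force

# PSCredential is the same object type that Get-Credential returns,
# so $creds can be passed to -Credential in the snippets above.
$creds = New-Object System.Management.Automation.PSCredential -ArgumentList $userName, $password
```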

Provision New Customized Cluster

You can also provision a cluster and configure it to connect to more than one Azure Blob storage account, or to custom Hive and Oozie metastores. This advanced feature lets you separate the lifetime of your data and metadata from the lifetime of the cluster.
  • First, get the keys of the storage accounts you want to connect the cluster to (two storage accounts in this example).
    $key1 = Get-AzureStorageKey yourblobstorage | %{ $_.Primary }
    $key2 = Get-AzureStorageKey yoursecondblobstorage | %{ $_.Primary }
  • Then create your custom SQL Azure server and databases following this blog post, or get the credentials for existing ones. Capture the credentials for the Oozie and Hive metastore databases in variables.
    $oozieCreds = Get-Credential
    $hiveCreds = Get-Credential
  • With this connection information you can create a cluster configuration object, pipe the additional storage and metastore settings into it, and finally pipe the configuration object into the New-AzureHDInsightCluster cmdlet to create a cluster based on the custom configuration. The $creds variable holds the cluster credentials captured in the previous section.
    New-AzureHDInsightClusterConfig -ClusterSizeInNodes 4 `
         | Set-AzureHDInsightDefaultStorage -StorageAccountName yourblobstorage.blob.core.windows.net -StorageAccountKey $key1 -StorageContainerName "yourcontainer" `
         | Add-AzureHDInsightStorage -StorageAccountName yoursecondblobstorage.blob.core.windows.net -StorageAccountKey $key2 `
         | Add-AzureHDInsightMetastore -SqlAzureServerName "yoursqlserver.database.windows.net" -DatabaseName "yourOozieDatabase" -Credential $oozieCreds -MetastoreType OozieMetastore `
         | Add-AzureHDInsightMetastore -SqlAzureServerName "yoursqlserver.database.windows.net" -DatabaseName "yourHiveDatabase" -Credential $hiveCreds -MetastoreType HiveMetastore `
         | New-AzureHDInsightCluster -Credential $creds -Name yourclustername -Location "North Europe"

Create cluster with custom Hadoop configuration values and shared libraries

Version 0.10 of the cmdlets makes two new capabilities available:
  1. Customizing Hadoop configuration values. The following configuration files are supported:
    1. core-site.xml
    2. hdfs-site.xml
    3. mapred-site.xml
    4. capacity-scheduler.xml
    5. hive-site.xml
    6. oozie-site.xml
  2. Adding shared libraries to the following Hadoop services:
    1. Hive
    2. Oozie
These configuration changes are preserved for the lifetime of the cluster and are not affected by the node reimages that the Azure platform periodically performs for maintenance or error recovery.
  • The new customization options for cluster creation are exposed through the Add-AzureHDInsightConfigValues cmdlet. In the first step we'll add a new configuration value for Hive using an AzureHDInsightHiveConfiguration object. We'll use PowerShell hashtable (dictionary) syntax, in which the key/value pairs for the configuration file are specified inside "@{}" brackets, as shown below.
    $configvalues = New-Object 'Microsoft.WindowsAzure.Management.HDInsight.Cmdlet.DataObjects.AzureHDInsightHiveConfiguration'
    $configvalues.Configuration = @{ "hive.exec.compress.output" = "true" }
  • Next we'll add a shared library (the Avro SerDe, for example) to the Hive shared library folder. This is done through a dedicated container in an Azure Blob storage account: during cluster provisioning, the HDInsight service copies the contents of the container to the Hive shared library folder on the head node and data nodes. It is important to keep the contents of this container unchanged for the lifetime of the cluster, because the HDInsight service requests the files from it again whenever data nodes are reimaged. To keep file versions consistent across the data nodes of the cluster, it is recommended to lock the versions of the files stored in the container. The only way to update shared libraries on a cluster is to recreate the cluster. We'll specify the storage account that holds the libraries using the AdditionalLibraries property of the configuration values object.
    $configvalues.AdditionalLibraries = New-Object 'Microsoft.WindowsAzure.Management.HDInsight.Cmdlet.DataObjects.AzureHDInsightDefaultStorageAccount'
    $configvalues.AdditionalLibraries.StorageAccountName = "yourstorageaccount.blob.core.windows.net"
    $configvalues.AdditionalLibraries.StorageAccountKey = (Get-AzureStorageKey yourstorageaccount).Primary
    $configvalues.AdditionalLibraries.StorageContainerName = "hivelibs"
  • Before submitting the request to create the cluster with custom parameters, we need to capture the credentials for the cluster in a variable.
    $creds = Get-Credential
  • Now we can use the prepared configuration values to create the cluster configuration, add the storage account and cluster size information to it, and submit the cluster creation operation.
    New-AzureHDInsightClusterConfig -ClusterSizeInNodes 4 `
        | Set-AzureHDInsightDefaultStorage -StorageAccountName yourstorageaccount.blob.core.windows.net -StorageAccountKey (Get-AzureStorageKey yourstorageaccount).Primary -StorageContainerName "yourstoragecontainer" `
        | Add-AzureHDInsightConfigValues -Hive $configvalues `
        | New-AzureHDInsightCluster -Credential $creds -Name yourclustername -Location "North Europe"
  • This operation blocks until the cluster is created.
  • Known issue: in the current version of the service, some errors caused by incorrect cluster parameter values are not properly reported back to the user. If you see unexplained errors while using a custom cluster configuration, double-check your parameters. We are working on a fix for this issue in the next update to the service.
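Because some errors caused by incorrect parameter values are not reported clearly, it can help to check the inputs locally before submitting the cluster creation request. The following is a minimal sketch; Test-HDInsightClusterInput is a hypothetical helper, not part of the HDInsight cmdlets. It only checks the general Azure naming rules for storage accounts (3-24 lowercase letters and digits) and blob containers (3-63 lowercase letters, digits and hyphens, with no leading, trailing or consecutive hyphens), a positive node count, and non-empty configuration keys and values. It cannot detect misspelled Hadoop setting names or invalid setting values. The example hashtable also shows that several key/value pairs can be listed inside the same "@{}" brackets.

```powershell
# Hypothetical helper (not part of the HDInsight cmdlets): runs basic local checks on
# cluster parameters and returns a list of problems (nothing if all checks pass).
function Test-HDInsightClusterInput {
    param(
        [string]   $StorageAccountName,   # e.g. "yourstorageaccount.blob.core.windows.net"
        [string[]] $ContainerNames,       # blob containers the configuration refers to
        [int]      $ClusterSizeInNodes,
        [System.Collections.IDictionary] $ConfigValues   # e.g. $configvalues.Configuration
    )
    $problems = @()

    # Storage account names: 3-24 lowercase letters and digits, plus the blob endpoint suffix
    if ($StorageAccountName -cnotmatch '^[a-z0-9]{3,24}\.blob\.core\.windows\.net$') {
        $problems += "Storage account '$StorageAccountName' is not a valid blob endpoint name."
    }

    # Container names: 3-63 characters; lowercase letters, digits and hyphens;
    # no leading, trailing or consecutive hyphens
    foreach ($container in $ContainerNames) {
        if ($container -cnotmatch '^[a-z0-9](?!.*--)[a-z0-9-]{1,61}[a-z0-9]$') {
            $problems += "Container name '$container' is not a valid blob container name."
        }
    }

    if ($ClusterSizeInNodes -lt 1) {
        $problems += "ClusterSizeInNodes must be at least 1."
    }

    # Every configuration entry needs a non-empty key and value
    if ($ConfigValues) {
        foreach ($key in $ConfigValues.Keys) {
            if ([string]::IsNullOrWhiteSpace([string]$key) -or
                [string]::IsNullOrWhiteSpace([string]$ConfigValues[$key])) {
                $problems += "Configuration entry '$key' has an empty name or value."
            }
        }
    }

    return $problems
}

# Example: Hive settings written with the same @{} hashtable syntax as $configvalues.Configuration
$hiveSettings = @{
    "hive.exec.compress.output"       = "true"
    "hive.exec.compress.intermediate" = "true"
}

$issues = Test-HDInsightClusterInput -StorageAccountName "yourstorageaccount.blob.core.windows.net" `
    -ContainerNames "yourstoragecontainer", "hivelibs" -ClusterSizeInNodes 4 -ConfigValues $hiveSettings

if ($issues) { $issues | ForEach-Object { Write-Warning $_ } } else { "No problems found." }
```

In a full script, you would pass $configvalues.Configuration as -ConfigValues and stop before calling New-AzureHDInsightCluster if any problems are returned.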

List Clusters

  • List all clusters in the current subscription.
    Get-AzureHDInsightCluster
  • Show details of a specific cluster in the current subscription.
    Get-AzureHDInsightCluster yourclustername

Delete Cluster

  • Delete a cluster by name from the current subscription.
    Remove-AzureHDInsightCluster yourclustername

Last edited Dec 12, 2013 at 2:29 AM by maxluk, version 14
