Reading documents and images in HDInsight

Oct 23, 2013 at 9:44 AM
I'm trying to read files with the FileStream object:
FileStream fs = File.Open(inputLine, FileMode.Open, FileAccess.Read);

I always get a filename error; I have tried these paths:
hdfs://localhost/path/filename.docx
hdfs://localhost:8020/path/filename.docx
//path/filename.docx

I don't know if FileStream is appropriate for reading files from HDFS. Is it correct to use the FileStream object?

Should I use the WebHDFSClient API?
Or should I use HdfsFileSystem (http://blogs.msdn.com/b/carlnol/archive/2013/02/08/hdinsight-net-hdfs-file-access.aspx)?
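In case it helps, here is the kind of thing I've been sketching against the WebHDFS REST endpoint directly (just a sketch; I'm assuming WebHDFS is enabled on the name node's default HTTP port, 50070, and reusing the placeholder path from above):

    using System;
    using System.Net;

    class WebHdfsRead
    {
        static void Main()
        {
            // The OPEN operation responds with a redirect to a data node;
            // WebClient follows the redirect automatically and returns the file bytes.
            string url = "http://localhost:50070/webhdfs/v1/path/filename.docx?op=OPEN";
            byte[] bytes = new WebClient().DownloadData(url);
            Console.WriteLine("Read {0} bytes", bytes.Length);
        }
    }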

The data will be stored in ASV; would HDFS still work, or do I need to use a Blob storage API?

Thank you,
Eladio
Oct 25, 2013 at 4:20 PM
Edited Oct 25, 2013 at 4:21 PM
Here is some information about using AzCopy:

http://blogs.msdn.com/b/windowsazurestorage/archive/2012/12/03/azcopy-uploading-downloading-files-for-windows-azure-blobs.aspx
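From memory of that post, downloading blobs recursively looks roughly like this (the account, container, and key are placeholders; check the post for the exact option names):

    AzCopy https://<account>.blob.core.windows.net/<container>/example/data C:\output /sourcekey:<StorageAccountKey> /S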

Here is a PowerShell sample:
$subscriptionName = "<WindowsAzureSubscriptionName>"
$storageAccountName = "<WindowsAzureStorageAccountName>"
$containerName = "<WindowsAzureBlobStorageContainerName>"

### Create the storage account context object
Select-AzureSubscription $subscriptionName
$storageAccountKey = Get-AzureStorageKey $storageAccountName | %{ $_.Primary }
$storageContext = New-AzureStorageContext -StorageAccountName $storageAccountName -StorageAccountKey $storageAccountKey

### Download the output to the local computer
Get-AzureStorageBlobContent -Container $containerName -Blob example/data/WordCountOutput/part-r-00000 -Context $storageContext -Force

### Display the output
cat ./example/data/WordCountOutput/part-r-00000 | findstr "there"


Here is a sample using the Microsoft.WindowsAzure.Storage namespace:
    using System;
    using System.IO;
    using Microsoft.WindowsAzure.Storage;
    using Microsoft.WindowsAzure.Storage.Blob;

    // Print the MapReduce job output
    Stream stream = new MemoryStream();

    CloudStorageAccount storageAccount = CloudStorageAccount.Parse(
        "DefaultEndpointsProtocol=https;AccountName=" + storageAccountName +
        ";AccountKey=" + storageAccountKey);
    CloudBlobClient blobClient = storageAccount.CreateCloudBlobClient();
    CloudBlobContainer blobContainer = blobClient.GetContainerReference(containerName);
    CloudBlockBlob blockBlob = blobContainer.GetBlockBlobReference("example/data/WordCountOutput/part-r-00000");

    // Download the blob into memory and rewind before reading
    blockBlob.DownloadToStream(stream);
    stream.Position = 0;

    using (StreamReader reader = new StreamReader(stream))
    {
        Console.WriteLine(reader.ReadToEnd());
    }
Oct 25, 2013 at 11:13 PM
Thank you Mumian for your reply.
Do you think this code is appropriate for the Map methods in a C# MapReduce program?

Thank you,
Coordinator
Oct 30, 2013 at 2:21 AM
Yes, you can use the Blob storage APIs in a C# MapReduce program.
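For example, something along these lines should work inside a mapper (just a sketch; the connection string, container name, and the emitted key/value pair are placeholders, and it assumes the MapperBase class from the Microsoft .NET SDK for Hadoop):

    using System;
    using System.IO;
    using Microsoft.Hadoop.MapReduce;
    using Microsoft.WindowsAzure.Storage;
    using Microsoft.WindowsAzure.Storage.Blob;

    public class BlobReadingMapper : MapperBase
    {
        public override void Map(string inputLine, MapperContext context)
        {
            // Treat each input line as a blob name and fetch that blob directly.
            // In practice you might pass the connection string in through the
            // job configuration instead of hard-coding it.
            CloudStorageAccount account = CloudStorageAccount.Parse(
                "DefaultEndpointsProtocol=https;AccountName=<account>;AccountKey=<key>");
            CloudBlobContainer container =
                account.CreateCloudBlobClient().GetContainerReference("<container>");
            CloudBlockBlob blob = container.GetBlockBlobReference(inputLine);

            using (var stream = new MemoryStream())
            {
                blob.DownloadToStream(stream);
                stream.Position = 0;
                // Emit the blob name and its size as an example key/value pair.
                context.EmitKeyValue(inputLine, stream.Length.ToString());
            }
        }
    }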
Oct 30, 2013 at 5:35 PM
Hi Maxluk,

Should I use the Blob storage API to read files in MapReduce jobs?
I'm asking because using the API seems to go outside the context of the MapReduce job itself.
Do I have to set up all the credentials, etc.?

Thank you,