Environment variable not set: HADOOP_HOME

Apr 22, 2013 at 11:54 PM
I keep getting this error:
"The environment is not suitable:

Environment variable not set: HADOOP_HOME

Environment variable not set: HADOOP_HOME"

while trying to execute a job against HDInsight hosted on Azure.
I followed the configuration guide at: https://hadoopsdk.codeplex.com/wikipage?title=Running%20jobs%20on%20Azure%20HDInsight%20service&referringTitle=Map%2fReduce

Any ideas?

Also, I noticed that you can't run your app from a path that contains spaces (e.g. \Visual Studio 2012\Projects) because it can't find MRRunner. I checked the details, and it was looking for the path \Visual%20Studio%202012\Projects, which is not valid.
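
One plausible cause of the %20 in that path (an assumption, not confirmed against the SDK source) is that a folder location is taken from a file:// URI in its percent-encoded form. The sketch below only illustrates the difference between the encoded and decoded forms using System.Uri; the class name, method name, and sample folder are hypothetical.

```csharp
using System;

public static class PathDecodingSketch
{
    // Converts a file:// URI into a filesystem path, turning %20 back into spaces.
    public static string ToLocalPath(string fileUri)
    {
        return new Uri(fileUri).LocalPath;
    }

    public static void Main()
    {
        // Hypothetical location, chosen only to mirror the path in the report.
        var uri = new Uri("file:///C:/Visual Studio 2012/Projects/App/MRLib");

        // AbsolutePath keeps the percent-encoding, so it is not usable as a file path.
        Console.WriteLine(uri.AbsolutePath);

        // LocalPath decodes the escapes, so the spaces are restored.
        Console.WriteLine(ToLocalPath(uri.AbsoluteUri));
    }
}
```

If the lookup is built this way, switching from the encoded form to the decoded one would be enough to make paths with spaces resolve.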
Apr 23, 2013 at 2:17 AM
It's a bug in the MapReduce NuGet package, and we are working on a fix. In the meantime, here is a workaround.

The problem is that it can't find the MRLib folder, which contains mrrunner and the other files that support the streaming command. The MapReduce NuGet package adds MRLib to your project automatically, but you need to change the settings on the files in that folder so that it is copied to your app's output folder. Set the "Copy To Output Directory" property on all files in the MRLib folder in your solution to "Copy Always". That ensures your app can find those files and submit the job to the Hadoop cluster.
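
To confirm that the workaround took effect, a quick startup check along these lines can help. It is only a sketch: it assumes the files end up in a folder named MRLib directly under the app's output directory, and the class and method names are hypothetical.

```csharp
using System;
using System.IO;

public static class MrLibCheckSketch
{
    // Returns true when an MRLib folder with at least one file sits under baseDirectory.
    public static bool HasMrLib(string baseDirectory)
    {
        string mrLib = Path.Combine(baseDirectory, "MRLib");
        return Directory.Exists(mrLib) && Directory.GetFiles(mrLib).Length > 0;
    }

    public static void Main()
    {
        // The directory the running app was loaded from (the build output folder).
        string outputDir = AppDomain.CurrentDomain.BaseDirectory;

        Console.WriteLine(HasMrLib(outputDir)
            ? "MRLib found in " + outputDir
            : "MRLib missing from " + outputDir + "; check Copy To Output Directory");
    }
}
```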

There is a second problem: it will still fail to locate the files if the app is launched from a directory with spaces in its path. We'll address that issue as well. For now, run it from a folder without spaces.

To track these problems, I created two entries on the issue list:
https://hadoopsdk.codeplex.com/workitem/14
https://hadoopsdk.codeplex.com/workitem/15
Apr 23, 2013 at 10:50 AM
I did this already, but it didn't solve the problem.
EnvironmentUtils.CheckHadoopEnvironment(), which is called from the ExecuteCore method in StreamingJobExecutorBase, checks for environment variables (HADOOP_HOME, for example) that do not exist locally when the job runs on Azure HDInsight.
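
One way the check could avoid this is to skip the local-only variables when the job targets a remote cluster. The following is a rough sketch of that idea, not the SDK's actual code: the class, the method, and the targetsRemoteCluster flag are all hypothetical, and HADOOP_HOME is the only variable name taken from this thread.

```csharp
using System;
using System.Collections.Generic;

public static class EnvironmentCheckSketch
{
    // Variables that only a local Hadoop install needs (only HADOOP_HOME is known from the thread).
    private static readonly string[] LocalOnlyVariables = { "HADOOP_HOME" };

    // Returns the names of required variables that are unset or empty.
    // When the job targets a remote cluster, no local variable is required.
    public static List<string> FindMissingVariables(
        bool targetsRemoteCluster, Func<string, string> readVariable)
    {
        var missing = new List<string>();
        if (targetsRemoteCluster)
        {
            return missing;
        }

        foreach (string name in LocalOnlyVariables)
        {
            if (string.IsNullOrEmpty(readVariable(name)))
            {
                missing.Add(name);
            }
        }
        return missing;
    }

    public static void Main()
    {
        // Remote target: the local HADOOP_HOME check is skipped entirely.
        List<string> missing = FindMissingVariables(true, Environment.GetEnvironmentVariable);
        Console.WriteLine(missing.Count); // prints 0
    }
}
```

Passing the variable lookup in as a delegate keeps the sketch easy to exercise without changing the real environment.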

Also, I found an error in WebHcatMapReduceStreamingExecutor.Execute. It parses exitValue with:
queueResultReader.Value<int>("exitValue");

If the job fails on the cluster, exitValue might be empty (I hit such a case), so the call above crashes.
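
A more defensive way to read the field might look like the sketch below. It assumes the status response is available as a Json.NET JObject (the Value<int> call suggests Json.NET, but that is an assumption) and requires the Newtonsoft.Json package. The class and method names are hypothetical.

```csharp
using System;
using System.Globalization;
using Newtonsoft.Json.Linq;

public static class ExitValueSketch
{
    // Returns the exit value, or null when the field is missing, JSON null,
    // empty, or otherwise not an integer.
    public static int? ReadExitValue(JObject status)
    {
        JToken token = status["exitValue"];
        if (token == null || token.Type == JTokenType.Null)
        {
            return null;
        }

        int parsed;
        bool ok = int.TryParse(
            token.ToString(), NumberStyles.Integer, CultureInfo.InvariantCulture, out parsed);
        return ok ? parsed : (int?)null;
    }

    public static void Main()
    {
        // A completed job reports a numeric exit value.
        Console.WriteLine(ReadExitValue(JObject.Parse("{\"exitValue\": 0}")).HasValue);

        // A failed job may report an empty or null value; no exception is thrown.
        Console.WriteLine(ReadExitValue(JObject.Parse("{\"exitValue\": \"\"}")).HasValue);
        Console.WriteLine(ReadExitValue(JObject.Parse("{\"exitValue\": null}")).HasValue);
    }
}
```

The caller can then treat a null result as "job did not report an exit code" instead of failing while parsing.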