Windows Azure HDInsight Thoughts

2013 November 14
by Brian Mitchell

It’s been a bit over a week since HDInsight Service became generally available. I’ve been kicking the tires and thought I would share some thoughts. Right off the bat I can tell you that PowerShell integration with HDInsight is going to be a huge hit! The ease of use and the responsiveness of the PowerShell environment are absolutely awesome.

What is HDInsight?

HDInsight is Microsoft’s 100% Apache-compatible Hadoop distribution, running on Microsoft technology in Windows Azure.

Why use HDInsight Service?

First and foremost, there is deep integration between HDInsight Service and the Microsoft BI tools your users already know. Second, the PowerShell extensibility makes creating, managing, and shutting down an HDInsight Service cluster so easy a caveman can do it. Third, the development experience with HDInsight means that your developers can reuse their existing .NET skill set in addition to using Java.

Microsoft BI Integration

Need to do some post-MapReduce mashing up of your data? Bring it into Microsoft Excel with Power Query (ETL for the BI masses). In two steps, you’ll be choosing the data from HDInsight that you want to bring into Excel. This just works.

image

Here are the instructions on connecting Excel to Windows Azure HDInsight with Power Query.

PowerShell Management

After you install and configure PowerShell for HDInsight, you can manage your Windows Azure HDInsight environment from your desktop. You can configure an HDInsight cluster, submit Hive and Pig queries, and extract the data to your BI environment, all from the comfort of your corporate environment. It also means you can keep using the tools you use today to manage schedules and handle your operations. The PowerShell toolset surprised me with its ease of use. Here is an example of configuring a cluster.

image

Awesome feedback in PowerShell about the state of your commands:

image
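For readers who cannot see the screenshots, here is a minimal sketch of the workflow described above: create a cluster, run a Hive query, then tear the cluster down. It is not the exact script from the images. The cmdlet names are the ones shipped in the Windows Azure PowerShell module of this era as best I know them, so confirm parameters with Get-Help before relying on it. Every value in angle brackets, the variable names, and the use of the hivesampletable sample table are assumptions for illustration.

```powershell
# Hypothetical placeholder values; replace them with your own.
$subscriptionName   = "<SubscriptionName>"
$storageAccountName = "<StorageAccountName>"
$containerName      = "<ContainerName>"
$clusterName        = "<HDInsightClusterName>"
$location           = "<DataCenterLocation>"
$clusterNodes       = 4   # assumed cluster size

# Select the subscription and look up the primary storage key.
Select-AzureSubscription $subscriptionName
$storageAccountKey = Get-AzureStorageKey $storageAccountName | ForEach-Object { $_.Primary }

# Create the cluster on top of the existing storage account and container.
# Depending on the module version, the cluster admin credentials are either
# prompted for or passed with -Credential.
New-AzureHDInsightCluster -Name $clusterName -Location $location `
    -DefaultStorageAccountName "$storageAccountName.blob.core.windows.net" `
    -DefaultStorageAccountKey $storageAccountKey `
    -DefaultStorageContainerName $containerName `
    -ClusterSizeInNodes $clusterNodes

# Define a Hive job, submit it, wait for it to finish, and print its output.
# hivesampletable is assumed to be present as a sample table on the cluster.
$hiveJobDefinition = New-AzureHDInsightHiveJobDefinition -Query "SELECT * FROM hivesampletable LIMIT 10;"
$hiveJob = Start-AzureHDInsightJob -Cluster $clusterName -JobDefinition $hiveJobDefinition
Wait-AzureHDInsightJob -Job $hiveJob -WaitTimeoutInSeconds 3600
Get-AzureHDInsightJobOutput -Cluster $clusterName -JobId $hiveJob.JobId -StandardOutput

# Delete the cluster when finished so it stops accruing charges.
# The data in the storage account is left in place.
Remove-AzureHDInsightCluster -Name $clusterName
```

Because the data lives in the storage account rather than on the cluster, a script like this can be scheduled to spin a cluster up for a job and remove it afterwards.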

 

Richer Development Experience

Want more control over your environment while still working in Visual Studio? Check out the tutorial “Submit Hive Jobs using HDInsight .NET SDK.” Below is a snippet of what I have going on in my VS environment, with a MapReduce job being submitted. I’ll do some additional posts about the pros and cons of the .NET development experience soon.

 

image
