
Getting Started with HDInsight Service

2012 December 20
by Brian Mitchell

I know a few other people have written about setting up HDInsight, but I wanted to start off my Hadoop-related posts at the beginning and build from there. I’ll do other posts on the positioning of HDInsight in the Microsoft ecosystem, but I was setting up a new cluster today and thought this would be a good time to start blogging about some of the practicalities of using the service. Start by going to https://www.hadooponazure.com and requesting access to set up an account. Once you have done that, we can set up a new cluster. During the preview, HDInsight Service will automatically set up a three-node cluster for you. Just in case it isn’t patently obvious where to click to get a new cluster started, I provided an arrow for you.

On the next screen, you will need to provide additional information such as the name of your cluster and the username and password for the cluster. The interface does a good job of telling you whether your cluster name is available and whether your password meets the complexity requirements. The green bar on the right tells me I’m good to go. Click on Request Cluster when you are ready to proceed. It’ll take a few minutes to deploy the cluster. In the meantime, get a cup of coffee.

Once your cluster is up and running, click Go to Cluster to go to the management console.

Now you are ready to start working with HDInsight. Here is what the management console looks like.

Next, I would highly suggest setting up a storage account on http://windowsazure.com. The reason for this is that your HDInsight cluster only lasts for a limited time, and when it is destroyed, your data goes with it. You only want to pay for the compute time you are actually using, so this makes sense. We need to get used to storing data outside of our HDInsight cluster. One really obvious place is Windows Azure Storage, and HDInsight is set up to take advantage of this highly available storage. In your management console, click on Manage Cluster and then on Set up ASV. You will need your Storage Account Name and Passkey, which you can find at the bottom of the page for your storage account on Windows Azure (see below). Copy those items over to the Configure Azure Blob Storage page and click Save Settings.

Now you can access any data that you place in Azure Storage. I’ll come back to this in another post to show how to add data to this storage and access it from HDInsight. For now, let’s go back to the Management Console and dive into the Interactive Console. The interactive console allows you to manage HDInsight using either JavaScript or Hive commands. Today, we are going to use Hive quickly to see what is preinstalled. Click on the Hive button at the top and type in:

show tables;

Click Evaluate. Hadoop isn’t nearly as fast as SQL Server for trivial queries; it’s designed to handle massively large queries. So learn to be patient with many things Hadoop. Eventually we get back our answer: there is one table preinstalled on Hive, hivesampletable.

Feel free to browse the table:

select * from hivesampletable;
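
Since even small requests take a while, it can help to cap the number of rows returned while you are just looking around. A minimal sketch, assuming Hive's standard LIMIT clause; the row count of 10 is an arbitrary choice, and the comment line is only for the reader and can be left out when pasting into the console:

```sql
-- Return only the first 10 rows instead of the whole table
select * from hivesampletable limit 10;
```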

Or get more information about the table and its structure:
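
A minimal sketch of one way to do that, assuming Hive's standard DESCRIBE statement. The `formatted` variant is optional and should add storage and location details on Hive versions that support it; the comment lines are only for the reader:

```sql
-- List the column names and types of the sample table
describe hivesampletable;

-- Optional: also show storage format, location, and table parameters
describe formatted hivesampletable;
```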


You should be getting the picture that working in Hive is a lot like working with any other database system. In the future, we’ll explore more deeply how to manage data in HDInsight Service and how to integrate it with other Microsoft Business Intelligence Services.

 
