Querying HDInsight Service from SQL Server
There are a few ways to get data into SQL Server from HDInsight Service. You could use Sqoop to push data from HDInsight to SQL Server, use SSIS to pull data from HDInsight to SQL Server, or set up a linked server on SQL Server to Query Hive. In this post, I’m going to explore the latter option. There are a few steps you need to take to get the configuration set up correctly. At a high level you need to:
- Configure HDInsight to allow ODBC connections.
- Setup ODBC Connection on your SQL Server.
- Setup Linked Server on the SQL Server.
- Query to your heart’s desire.
Configure HDInsight to allow ODBC connections.
Notice the little lock on the open ports tile. This indicates that our ODBC connection is locked down. Click on the Open Ports tile.
Toggle the ODBC Server so that the Status is turned on to Open. That’s it, we’ve configured the cluster to accept ODBC connections.
Setup the ODBC Connection on your SQL Server.
From your SQL Server, click on the downloads tile. (On a separate note, notice the lock on the Open Ports tile?)
Click on the appropriate link for your SQL Server. Since this is your SQL Server, I’m assuming the appropriate link for you is the 64 bit version. If not, shame on you – It’s not 2004 anymore. Follow the directions to get it installed.
On your SQL Server, click start and type in ODBC. This will bring up the Data Sources dialog. Go to the System DSN tab and click Add. Configure it with a Data Source Name, Your Host Information, and a Username. I called mine HiveOnAzure so that I can differentiate the @server, @srvproduct, and @datasrc in the linked server command below.
Setup the Linked Server
Go to SQL Server Management Studio and execute the following query. I replaced the contents of User ID and Password with stars. Use your HDInsight username and password for your cluster here.
EXEC master.dbo.sp_addlinkedserver @server = N'HiveDW', @srvproduct=N'HIVE', @provider=N'MSDASQL', @datasrc=N'HiveOnAzure',
@provstr=N'Provider=MSDASQL.1;Persist Security Info=True;User ID=*****; Password=******;'
Now for the good stuff. If you don’t have any data setup in Hive yet, you can always query the sample table:
select * from openquery (hivedw, 'select * from hivesampletable')
I had a table created already, here is my result set:
Now if I need to load a well-known table on SQL from HDInsight, I can do it without having to use Sqoop or access my cluster: