Case Study: Using SQL PDW to load 300GB of raw data per day.
Another SQL Server Parallel Data Warehouse case study has been released. Please take a look at it and let me know what you think.
When Microsoft wanted to add clickstream data to the company’s enterprise data warehouse (EDW), it found that only about one-sixteenth of the information could be loaded into the EDW each day. To provide a richer data set, the Microsoft Information Technology (Microsoft IT) group deployed a clickstream data warehouse by using Microsoft® SQL Server® 2008 R2 Parallel Data Warehouse, which supports multiple instances of SQL Server running on separate compute nodes. The company can now load complete clickstream data and has found that queries are processed 30 times faster.
What’s really exciting about this white paper is seeing an organization that had been struggling to manage its data volume on an SMP SQL Server harness that data and get more fidelity from it using PDW. Prior to PDW, this team was sampling only 1/16 of the data available to them. Additionally, they are now able to store 90 days of the complete data set instead of 7 days of partial data. The result? Many queries are running 30 times faster than before. This is consistent with what I’ve seen at other customers.
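To put those numbers in perspective, here is a rough back-of-envelope sketch in Python. It assumes that the 300GB per day in the title is the complete raw feed, and that the old system kept 1/16 of it for 7 days. Neither assumption is confirmed here, and the function and variable names are mine, so treat the output as an order-of-magnitude illustration of raw volume rather than actual storage figures from the white paper.

```python
# Back-of-envelope sizing from the figures quoted in this post.
# Assumption: 300 GB/day is the complete raw clickstream feed.
RAW_GB_PER_DAY = 300
OLD_SAMPLE_FRACTION = 1 / 16   # share of the data loaded before PDW
OLD_RETENTION_DAYS = 7
NEW_RETENTION_DAYS = 90


def retained_raw_gb(gb_per_day, fraction_loaded, retention_days):
    """Raw gigabytes kept online for a given load fraction and retention window."""
    return gb_per_day * fraction_loaded * retention_days


old_gb = retained_raw_gb(RAW_GB_PER_DAY, OLD_SAMPLE_FRACTION, OLD_RETENTION_DAYS)
new_gb = retained_raw_gb(RAW_GB_PER_DAY, 1.0, NEW_RETENTION_DAYS)
ratio = new_gb / old_gb

print(f"Before PDW: {old_gb:.2f} GB of raw data retained")
print(f"With PDW:   {new_gb:.0f} GB of raw data retained")
print(f"Ratio:      about {ratio:.0f}x more raw data available to query")
```

Under those assumptions, the team went from roughly 131GB of sampled raw data to about 27TB of complete raw data, which makes the faster query times stand out even more.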
Skeptics might think that Microsoft went out and purchased the largest appliance they could in order to get the performance numbers they needed for reference. Please note that this is a single data rack, eight-node Dell PDW appliance. These kinds of performance gains are attainable and within the budget of many organizations. Additionally, this is on a pre-RTM reference architecture, meaning that performance would most likely be better on the reference architectures currently available to customers.
On a personal note, those of you who have been reading the blog might notice that Figure 1 in the white paper looks familiar. I spent a surprising amount of time creating that diagram, and I’m glad to see it’s getting some reuse out there.