Five things that make PDW different
1. Appliance Model
SQL Server 2008 R2 Parallel Data Warehouse ships with the hardware already configured and the software already installed, packed in shock-proof pallets. Once the racks arrive at your data center, it's simply a matter of providing power and Ethernet connections and then making a couple of configuration choices for connectivity. You can be loading data onto the appliance the same day you complete these steps.
The appliance model pays off again later, when you need support. There is one SKU for the appliance. Whenever you call for support, whether the issue is with the hardware or the software, there is one number to call and one SKU to give, no matter how many nodes your appliance has.
2. Index Light Design
It is recommended that you use indexes sparingly in SQL PDW. In a truly analytical environment, where the DBA cannot predict ahead of time which queries will run against the database, indexes add little value. Because queries in a data warehouse are so hard to predict, traditional data warehouses tend to over-index. As a result, indexes often take up more space than the data itself, load batch times grow significantly because of index maintenance, and both logical and physical fragmentation are introduced and must be managed by the DBA. SQL PDW, on the other hand, is a data scan engine. It is designed to lay data out on disk in order so that it can be read back efficiently with sequential scans. Fragmentation is reduced by first bulk loading data into a staging table and then inserting it, in order, into the destination table.
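To make the staging-table advice concrete, here is a toy Python model of a clustered table stored as fixed-size pages. It counts how many mid-table page splits happen when a batch is inserted unordered versus after being sorted in a staging step. The page capacity of four rows, the helper names `insert` and `load`, and the assumption that each new batch's keys sort after the keys already loaded (as with date-keyed fact loads) are all hypothetical. This is a simplification for illustration, not how PDW or SQL Server actually manages pages.

```python
import random
from bisect import bisect_right, insort

PAGE_CAPACITY = 4  # hypothetical rows per page; tiny so the effect is easy to see


def insert(pages, key, capacity=PAGE_CAPACITY):
    """Insert one key into an ordered list of fixed-capacity pages.

    Returns True if the insert forced a page split in the middle of the table."""
    if not pages:
        pages.append([key])
        return False
    firsts = [page[0] for page in pages]
    i = max(bisect_right(firsts, key) - 1, 0)  # page whose key range covers `key`
    page = pages[i]
    if len(page) < capacity:
        insort(page, key)
        return False
    if i == len(pages) - 1 and key >= page[-1]:
        pages.append([key])  # appending past the end: allocate a new page, no split
        return False
    insort(page, key)  # full page in the middle of the table: split it in half
    mid = len(page) // 2
    pages.insert(i + 1, page[mid:])
    del page[mid:]
    return True


def load(pages, keys):
    """Insert keys in the given order; return how many page splits occurred."""
    return sum(insert(pages, key) for key in keys)


existing = []
load(existing, range(1000))  # rows already in the destination, clustered on key

batch = list(range(1000, 2000))  # assumed: new keys all sort after existing ones
random.seed(0)
random.shuffle(batch)  # the raw extract arrives in no particular order

# Direct load: insert the unordered batch straight into the destination.
direct_pages = [page[:] for page in existing]
direct_splits = load(direct_pages, batch)

# Staged load: land the batch in a staging table, order it, then insert.
staged_pages = [page[:] for page in existing]
staged_splits = load(staged_pages, sorted(batch))

print(f"direct: {direct_splits} splits, {len(direct_pages)} pages")
print(f"staged: {staged_splits} splits, {len(staged_pages)} pages")
```

In this model the staged load only ever appends new pages at the end of the table, so every page stays full and in key order, while the direct load splits pages and leaves half-empty ones behind.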
3. All Loads and Queries are Automatically Highly Parallel
We know that all queries are automatically highly parallel in SQL PDW, but what about loading data? In traditional symmetric multiprocessing (SMP) SQL Server, the insert operator of a statement executes serially. When you load a distributed table into SQL PDW, you are actually loading into eight physical tables on each compute node. On a one-data-rack appliance, this means a single load writes to 80 physical tables (eight on each of the ten compute nodes). Instead of a single thread loading the data, 80 threads now load it into SQL PDW. Even better, each of those threads is assigned to its own processor through soft-NUMA. Each of those statements is also guaranteed a minimum amount of memory by Resource Governor on each compute node. Finally, each physical table resides on its own LUN, which reduces I/O contention. All of these advantages come out of the box as part of the default configuration of SQL PDW.
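The fan-out described above can be sketched in Python: each row of a hash-distributed table is routed by its distribution column to one of 80 physical tables, and one worker per table loads its share concurrently. The sketch assumes ten compute nodes (80 tables divided by eight per node) and uses CRC32 as a hypothetical stand-in for PDW's own hash function; the function names are illustrative only. Python threads share the GIL, so this shows the shape of the work rather than the real speedup, and soft-NUMA affinity, Resource Governor memory grants, and per-LUN placement are not modeled.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor

COMPUTE_NODES = 10  # assumed: 80 tables / 8 per node on a one-data-rack appliance
DISTRIBUTIONS_PER_NODE = 8
TOTAL_TABLES = COMPUTE_NODES * DISTRIBUTIONS_PER_NODE  # 80 physical tables


def distribution_for(key):
    """Map a distribution-column value to one of the 80 physical tables.

    CRC32 is only a stand-in; this does not reproduce PDW's own hash function."""
    return zlib.crc32(str(key).encode("utf-8")) % TOTAL_TABLES


def load_physical_table(table_id, rows):
    """Stand-in for one loader thread inserting into its own physical table."""
    node, distribution = divmod(table_id, DISTRIBUTIONS_PER_NODE)
    return (node, distribution), list(rows)


def parallel_load(rows, key_of):
    """Route rows to distributions, then load all 80 tables concurrently."""
    buckets = [[] for _ in range(TOTAL_TABLES)]
    for row in rows:
        buckets[distribution_for(key_of(row))].append(row)
    # One worker per physical table instead of a single serial insert stream.
    with ThreadPoolExecutor(max_workers=TOTAL_TABLES) as pool:
        return dict(pool.map(load_physical_table, range(TOTAL_TABLES), buckets))


rows = [{"order_id": i, "amount": i % 97} for i in range(100_000)]
tables = parallel_load(rows, key_of=lambda row: row["order_id"])
print(len(tables), sum(len(stored) for stored in tables.values()))  # 80 100000
```

Keying the result by `(node, distribution)` makes the "eight tables on each of ten nodes" layout explicit; in the appliance itself, each of those loads runs as a separate statement on its compute node.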
4. Hub and Spoke Architecture
By using the Parallel Data Export feature of PDW, you can easily implement a hub-and-spoke architecture. A hub-and-spoke design lets your enterprise build a centralized enterprise data warehouse while keeping the benefits of independent data marts. An important aspect of this architecture is the ability to separate the management of data from user workloads when necessary. Check out this article for more information.
5. Reduced Administration
A one-data-rack SQL PDW appliance consists of seventeen servers. Making all of those servers easy to manage is one of the top goals for SQL PDW. The first piece is the Administration Console, which shows the hardware and software health of every server in the appliance in a single view. You never have to go digging around on a compute node to find out whether an important component is healthy. From the Administration Console you can also view and manage sessions, queries, locking, backups and restores, and data loads. Finally, the Administration Console lets you watch the performance of the appliance by monitoring Performance Monitor counters on each node.