SQL Server PDW Table Storage Basics
Within a SQL Server 2008 R2 Parallel Data Warehouse appliance, Compute Node storage is designed to boost IO during data reads. Disk IO is notorious as the place where server bottlenecks show up, and SQL Server PDW alleviates the problem by greatly increasing the number of spindles answering any particular query. Each compute node has its own storage array of ten disks set up as RAID 10 pairs. Each RAID 10 pair uses multipathing so that the compute node can read from both drives in the pair for improved performance. Four of those RAID groups are assigned to user data storage, and the fifth RAID group is set aside for transaction logs. During setup of the appliance, nine LUNs are created for each compute node. Additionally, each array has a hot spare so that any disk that fails during use can be replaced quickly.
A single compute node storage array:
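As a quick back-of-the-envelope check of that layout, the constants below are my own reading of the description; in particular, the two-data-LUNs-per-RAID-group split is my assumption to make the counts line up, not something stated in this post:

```python
# Sketch of one compute node's storage array as described above.
# The LUNS_PER_DATA_GROUP split is assumed, not PDW documentation.
DISKS = 10
RAID_PAIRS = DISKS // 2                 # five RAID 10 mirrored pairs
DATA_GROUPS = 4                         # four pairs hold user data
LOG_GROUPS = RAID_PAIRS - DATA_GROUPS   # one pair for transaction logs

LUNS_PER_DATA_GROUP = 2                 # assumed split
data_luns = DATA_GROUPS * LUNS_PER_DATA_GROUP   # 8
log_luns = LOG_GROUPS                           # 1
print(data_luns + log_luns)             # the nine LUNs created per node
```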
Replicated tables use a filegroup that contains eight files spread across the first four RAID groups. The files are all created at the same size and with the same growth rate, which lets us take advantage of the proportional fill algorithm employed by SQL Server: the data in replicated tables is spread evenly across all disks in a compute node. By spreading the tables across all disks, we can have all eight hard drive heads accessing data in parallel to speed up operations.
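To see why equal file sizes and growth rates matter, here is a toy simulation of proportional fill. This is a deliberate simplification of the storage engine's behavior, not its actual code: when every file in the filegroup has the same free space, allocations end up spread evenly.

```python
# Toy model of proportional fill: each new extent goes to the file
# with the most free pages, which approximates how SQL Server weights
# allocations by free space within a filegroup.
def allocate_extents(free_pages, n_extents):
    placed = [0] * len(free_pages)
    free = list(free_pages)
    for _ in range(n_extents):
        target = max(range(len(free)), key=lambda i: free[i])
        placed[target] += 1
        free[target] -= 1
    return placed

# Eight files created at the same size, as in a replicated table's
# filegroup: 80 extents land 10 to a file, perfectly even.
print(allocate_extents([1000] * 8, 80))
```

If the files started at different sizes, the fuller files would receive proportionally fewer allocations, which is exactly why the appliance creates them identically.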
Distributed tables have their rows dispersed across several storage locations, each of which is called a distribution. Each compute node has eight distributions, physically implemented within SQL Server as separate filegroups. If an appliance has eight compute nodes, that appliance will have 64 distributions. Each filegroup is assigned a different LUN within the SAN, effectively spreading the data across that compute node's disks. When a distributed table is created, a distribution column is chosen and a deterministic hash function is applied to its values; that function maps each row to one, and only one, storage location within the appliance.
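The deterministic mapping can be sketched like this. PDW's actual hash function is internal to the engine, so `md5` here is just a stand-in to show the shape of the idea on an 8-node appliance:

```python
# Sketch: hash the distribution column's value to exactly one of the
# appliance's 64 distributions. The real PDW hash is internal; md5 is
# a stand-in used only to illustrate determinism.
import hashlib

COMPUTE_NODES = 8
DISTRIBUTIONS_PER_NODE = 8
TOTAL = COMPUTE_NODES * DISTRIBUTIONS_PER_NODE  # 64 distributions

def distribution_for(key):
    """Return the (node, distribution) pair that owns this key's row."""
    digest = hashlib.md5(str(key).encode()).hexdigest()
    slot = int(digest, 16) % TOTAL
    return divmod(slot, DISTRIBUTIONS_PER_NODE)

# The same key always lands in the same place:
print(distribution_for("customer-42") == distribution_for("customer-42"))
```

Because the function is deterministic, every query that touches a given key knows exactly which distribution to go to, with no lookup table required.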
You may notice that TempDB doesn’t show up in this discussion. That’s because TempDB lives on storage local to the compute node; I’ll discuss that design in another post. The thing that I’m still wrapping my mind around is that this describes just one compute node. When we distribute a table, it will span a minimum of eight of these babies, and thus every distributed table will be broken up into at least 64 physical pieces spread across 64 LUNs. And that’s the minimum. At some point someone will have a 40-node appliance, and a distributed table there will have 320 distributions across the appliance…now that’s spreading out some disk IO.