Reading SQL PDW Query Plans (Part 1)
Querying A Replicated Table
How are my tables designed? Replicated or Distributed?
What Actions are taking place on the Compute Nodes?
Am I causing the results to get stored on the Control Node before streaming back to the client? Do I need to?
Ideally you should already know which tables in your database are replicated and which are distributed. If you don’t, you can quickly check it out in Nexus, where there is a nifty icon next to the table name. Notice the multiple table icon next to the dimension tables in my copy of AdventureWorksDW2008? These icons indicate that the table exists in whole as a replicated table on each compute node. A replicated table is a table that is stored in its entirety on each Compute node. The single table icon next to the fact tables indicates a distributed table. A distributed table is a table whose rows are dispersed across multiple storage locations. Each compute node has eight distributions.
, case when count(*) > 1
from sys.tables tab
join sys.pdw_table_mappings map
on tab.object_id = map.object_id
group by tab.name
Querying from Replicated Table
Let’s take a simple example of querying from a single replicated table, DimCustomer.
The OnOperation can be used to run a query that does not require data movement. A close look at this line tells us that it is being run on a ComputeNode and gives us the logical query being run on against PDW>
The ReturnOperation which is the last step in any query plan. The ReturnOperation does what you think it does, it sends the results back to the requester.