May 13, 2014

Mutiple Groupings in a query and the use of Grouping_ID function

(1) Single Grouping
Most of the time, we group a data set by a certain column(or a combination of columns) and summarize(count/sum/avg/etc) the data for each of the groups. The following query first groups the records in Sales.Orders table using customer IDs, and then it counts for the number of order entries for each customer ID.
     select  custid,  count(*) as NumOfOrders
        from  sales.orders
         group by  custid ;


* Note that in this query we see one occurrence of grouping - grouping the rows by customer id only (Single Grouping)


(2) Multiple Groupings
There are times when you want to summarize the data in a few different levels of grouping within the same query. For instance, you might want to count number of orders for the combination of each customer id & order year as well as customer id as shown in the following query result:  Group by customerid, YEAR(orderdate)  as well as Group By customerid


























>> We can achieve this by running two separate GROUP By queries and combine the result sets.
     select   Custid
              , CAST(YEAR(Orderdate) as VARCHAR(20)) as OrderYear
             ,count(*) as NumOfOrders
        from sales.orders
        group by Custid, YEAR(orderdate)
     UNION ALL
     select  Custid
,'Cust ID Total' as OrderYear
, count(*) NumOfOrders
         from sales.orders
         group by Custid
          ORDER BY Custid, orderYear

>> Or, we can use GROUPING SETS sub-clause and implement the two different groupings within one query.
     select   Custid
             , (YEAR(Orderdate) as OrderYear
              , count(*) as NumOfOrders
from  sales.orders
group by grouping sets(custid
                                                ,(custid, YEAR(orderdate) )  )

Using GROUPING SETS is preferred because the code is a lot simpler and, more importantly, the query yields a better performance. With the option of UNION and two separate Group By groupings, SQL Server fully scans Sales.Orders table twice while it only scans the table once with Grouping Sets option.



(3) GROUPING SETS vs CUBE vs ROLLUP
- Grouping Sets : we manually list up all groupings separated by commas
     select   Custid
              , (YEAR(Orderdate) as OrderYear
               , count(*) as NumOfOrders
from  sales.orders
group by grouping sets(custid)                                   --> group by Custid
                                                , (custid, YEAR(orderdate) )   -->group by custid, YEAR(orderdate)
                                                 , ( )                                              --> Grouping all rows (the entire table)
                                                  )

- Cube: generates all possible combinations of groupings.
     select   Custid
              , (YEAR(Orderdate) as OrderYear
               , count(*) as NumOfOrders
from  sales.orders
group by CUBE ( custid, YEAR(orderdate) )
        --> group by Custid
          --> group by YEAR(orderdate)
        --> group by custid, YEAR(orderdate)
        --> Grouping all rows (the entire table) 
    
- Rollup : Use this option when there is a hierarchy formed by the input elements.
      select   shipcountry
                , shipregion
               , shipcity              
               , count(*) as NumOfOrders 
         from  sales.orders
         group by Rollup( shipcountry, shipregion, shipcity)
        --> group by shipcountry, shipregion, shipcity  
        --> group by shipcountry, shipregion          (rolled up to regional level)
        --> group by shipcountry                            (rolled up to country level) 
        --> group by the entire table                      (rolled up to level of grouping entire table)


(4) The use of Grouping_ID( )
We can use this function to identify the level of grouping done for each of the returned rows. Grouping_ID() function in the following query will return an integer representing the level of grouping for each row.

      select    shipcountry
                  , shipregion
                 , shipcity              
                  , count(*) as NumOfOrders 
                  ,  GROUPING_ID(shipcountry, shipregion, shipcity) as Grp_ID
         from  sales.orders
         group by Rollup( shipcountry, shipregion, shipcity)



The following spreadsheet should give you an idea of how the function calculates its returned integer. Each element gives 0 (zero) if it is part of the grouping. If not, it gives a number. It gives a different number depending on the position of it inside of the function input parenthesis.




 





If the function returns 5, we know that the row should be grouped by Shipregion. 
(5  =  4  for element3 +  0  for element2  +  1  for element1 ). We can list up all the possible integers that can be returned by the function and what each number represents.


(5) Now, we can actually use Group_ID( ) function in such a way that the query result is more readable to the report users. The following query use ROLLUP, and we understand that all the possible integers that can be returned by GROUP_ID( ) in the following query are 0, 1, 3, 7.

select    CASE  WHEN  GROUPING_ID(shipcountry, shipregion, shipcity) = 7 THEN 'GRAND TOTAL' 
                          ELSE shipcountry  END   as ShipCountry
           , CASE  WHEN GROUPING_ID(shipcountry, shipregion, shipcity) = 3 THEN 'Country TOTAL' 

                         ELSE ISNULL(shipregion,'-') END   as shipregion
           , CASE WHEN  GROUPING_ID(shipcountry, shipregion, shipcity) = 1 THEN 'Region TOTAL' 

                         ELSE ISNULL(shipcity,'-')   END   as shipcity
           , COUNT(*)  as NumOfOrders
    from  sales.orders
    group by Rollup( shipcountry, shipregion, shipcity) ;


The query will yield a result as shown below.