Database Notes: 2014

Sep 11, 2014

Database Read-Consistency

It is important for a database to ensure that its users read committed data only in most of cases. Each database has its own way of ensuring the read-consistency.

(1) SQL Server
There is something called "Transaction isolation level" in SQL Server. The transaction isolation level is set to "Read Committed" by default. This default setting enforces that data readers read committed data only.

This default behavior could yield a drawback. If a user is updating records and the update transaction takes a long time to complete, other users won't be able to read the same records right away. The other users' read request will hang until the update transaction completes.

To reduce this type of blocking and waiting, SQL Server gives us another option of transaction isolation level: Read Committed Snapshot(RCSI). It uses tempdb to store the original versions of changed data. With this type of transaction isolation level, data readers can still read committed records while another user changes the same records.

(2) Oracle
There is something called, UNDO, in Oracle database. Undo information is a record of how to undo a change. This undo information is stored in a special tablespace,"UNDO tablespaces." Undo information provides a read-consistency to data readers even if the data that they read is currently being changed by another user (reading the data in the original state).

* A similarity can be observed between SQL Server's RCSI and Oracle's UNDO.

Sep 10, 2014

SQL Server Transaction - ACID Properties and Modes

A transaction is a logical unit of works in a database.

(1) A relational database transaction has ACID properties.

Atomicity : All the changes in a transaction succeed or none of them succeed. There is no partially committed transaction. Either all changes committed or rolled back.
Consistency: All transactions are performed in a way that the database can stay in a consistent state.
Isolation : One transaction occurs in isolation from other transactions, and so the change made in the transaction is not visible to other transaction until it the transaction is committed.
Durability : Databases ensure that all changes made by committed transactions are never lost by using logs that keep track of all the committed changes.

(2) Transaction Modes in SQL Servers

Autocommit mode - a single DML and DDL statement is one transaction, and it will be automatically committed or rolled back (default mode in SQL Server)
Implicit transaction mode - SQL Sever automatically starts a transaction when we issue a DML or DDL statement. We must commit or roll back explicitly to end the transaction (default ANSI/ISO mode/default in Oracle)
Explicit transaction mode - Begin a transaction with "BEGIN TRANSACTION" and end the transaction by commit or roll back.

May 13, 2014

Mutiple Groupings in a query and the use of Grouping_ID function

(1) Single Grouping
Most of the time, we group a data set by a certain column(or a combination of columns) and summarize(count/sum/avg/etc) the data for each of the groups. The following query first groups the records in Sales.Orders table using customer IDs, and then it counts for the number of order entries for each customer ID.
select custid, count(*) as NumOfOrders
from sales.orders
group by custid ;

* Note that in this query we see one occurrence of grouping - grouping the rows by customer id only (Single Grouping)

(2) Multiple Groupings
There are times when you want to summarize the data in a few different levels of grouping within the same query. For instance, you might want to count number of orders for the combination of each customer id & order year as well as customer id as shown in the following query result: Group by customerid, YEAR(orderdate) as well as Group By customerid

>> We can achieve this by running two separate GROUP By queries and combine the result sets.
select Custid
, CAST(YEAR(Orderdate) as VARCHAR(20)) as OrderYear
,count(*) as NumOfOrders
from sales.orders
group by Custid, YEAR(orderdate)
UNION ALL
select Custid
,'Cust ID Total' as OrderYear
, count(*) NumOfOrders
from sales.orders
group by Custid
ORDER BY Custid, orderYear

>> Or, we can use GROUPING SETS sub-clause and implement the two different groupings within one query.
select Custid
, (YEAR(Orderdate) as OrderYear
  , count(*) as NumOfOrders
from sales.orders
group by grouping sets( (custid)
,(custid, YEAR(orderdate) ) )

Using GROUPING SETS is preferred because the code is a lot simpler and, more importantly, the query yields a better performance. With the option of UNION and two separate Group By groupings, SQL Server fully scans Sales.Orders table twice while it only scans the table once with Grouping Sets option.

(3) GROUPING SETS vs CUBE vs ROLLUP
- Grouping Sets : we manually list up all groupings separated by commas
select Custid
, (YEAR(Orderdate) as OrderYear
, count(*) as NumOfOrders
from sales.orders
group by grouping sets( (custid)       --> group by Custid
, (custid, YEAR(orderdate) ) -->group by custid, YEAR(orderdate)
                                                 , ( )    --> Grouping all rows (the entire table)
                                                  )

- Cube: generates all possible combinations of groupings.
select   Custid
, (YEAR(Orderdate) as OrderYear
, count(*) as NumOfOrders
from sales.orders
group by CUBE ( custid, YEAR(orderdate) )
        --> group by Custid
          --> group by YEAR(orderdate)
        --> group by custid, YEAR(orderdate)
        --> Grouping all rows (the entire table)

- Rollup : Use this option when there is a hierarchy formed by the input elements.
   select shipcountry
, shipregion
   , shipcity
, count(*) as NumOfOrders
from sales.orders
group by Rollup( shipcountry, shipregion, shipcity)
--> group by shipcountry, shipregion, shipcity
--> group by shipcountry, shipregion    (rolled up to regional level)
--> group by shipcountry    (rolled up to country level)
--> group by the entire table (rolled up to level of grouping entire table)

(4) The use of Grouping_ID( )
We can use this function to identify the level of grouping done for each of the returned rows. Grouping_ID() function in the following query will return an integer representing the level of grouping for each row.

   select    shipcountry
, shipregion
     , shipcity
    , count(*) as NumOfOrders
, GROUPING_ID(shipcountry, shipregion, shipcity) as Grp_ID
from sales.orders
group by Rollup( shipcountry, shipregion, shipcity)

The following spreadsheet should give you an idea of how the function calculates its returned integer. Each element gives 0 (zero) if it is part of the grouping. If not, it gives a number. It gives a different number depending on the position of it inside of the function input parenthesis.

If the function returns 5, we know that the row should be grouped by Shipregion.
(5 = 4 for element3 + 0 for element2 + 1 for element1 ). We can list up all the possible integers that can be returned by the function and what each number represents.

(5) Now, we can actually use Group_ID( ) function in such a way that the query result is more readable to the report users. The following query use ROLLUP, and we understand that all the possible integers that can be returned by GROUP_ID( ) in the following query are 0, 1, 3, 7.

select    CASE WHEN GROUPING_ID(shipcountry, shipregion, shipcity) = 7 THEN 'GRAND TOTAL'
                          ELSE shipcountry END   as ShipCountry
           , CASE WHEN GROUPING_ID(shipcountry, shipregion, shipcity) = 3 THEN 'Country TOTAL'
                         ELSE ISNULL(shipregion,'-') END   as shipregion
           , CASE WHEN GROUPING_ID(shipcountry, shipregion, shipcity) = 1 THEN 'Region TOTAL'
                         ELSE ISNULL(shipcity,'-')   END   as shipcity
           , COUNT(*) as NumOfOrders
    from sales.orders
    group by Rollup( shipcountry, shipregion, shipcity) ;

The query will yield a result as shown below.

Apr 12, 2014

APPLY operator: a correlated subquery and JOIN blended together.

APPLY operator was first introduced with SQL Server 2005. And here is my own summary about this operator.

Example code

Select S.Supplierid, S.CompanyName as supplier, A.*
     From Production.Suppliers s
             CROSS APPLY ( Select productid, productname, unitprice
                         from Production.Products p
                                               where p.supplierid = s.supplierid
                           order by unitprice, productid
                                               offset 0 rows fetch first 2 rows only ) as A

APPLY operator
(1) It applies a table expression in the right side to each row from a table expression in the left side.
      (correspondingly color-coded in the example code)
(2) And usually a table expression in the right side references to an element from the left table. In other words, the right table expression is correlated to the left table expression.
(3) So APPLY operator seems to have a mixed concept of a correlated subquery and JOIN.
        APPLY operator
     - evaluates the right table expression for each row of the left table and yields a result set
     - and then, joins the evaluated result set back to the row in the left table.
(4) Two different types: Cross Apply and Outer Apply

Re-explained in the context of the provided example code
The left table contains supplier records, and each record has a supplierid.
The right table contains product records.

(1) The Apply operator takes each supplierid from the left table and evaluates the right table for that supplierid, which yields a evaluated result set. The operator now JOIN the result set back to the record of the supplierid in the left table.
(2) The above process is repeated for each supplierid(each row) of the left table(cursor-based operation).
(3) When the whole operation is completed, the result of the example query will show the two products with the lowest unit prices for each suppliers. See the query result below.
(4) **IF a supplier doesn't have any product record in the product table, that supplier will be omitted from the query result. It is because the example query uses CROSS Apply. OUTER Apply will, however, include such a supplier with NULLs in the columns from the right table

** CROSS APPLY vs. OUTER APPLY is similar to INNER JOIN vs. LEFT OUTER JOIN.
* Note : If the right table doesn't reference any element in the left table, the query result will be exactly same as that of CROSS JOIN between the two tables(Cartesian product).

Result of the example query (order by supplierid)

APPLY operator challenged.
Before I got a good grasp of APPLY operator, I was just wondering if I could find a different way to generate the same query result without using APPLY operator. And I tried and came up with the following query. It yields the same result. To be honest, the following query did make more sense to me before I become comfortable with APPLY operator. But I now see the benefits of using APPLY operator, so I will probably use the operator when it is necessary.

Select   S.supplierid, S.companyName as supplier, P.productid, p.productname, p.unitprice
    From   Production.suppliers s
           INNER JOIN ( select ROW_NUMBER() over ( partition by supplierid
                                                                                       order by unitprice, productid) as rownum
                                                 ,supplierid, productid, productname, unitprice
                                            from Production.Products ) P   ON s.supplierid = p.supplierid
    where ISNULL(p.rownum,0) <=2

APPLY w/ Inline Table-valued Function
Apply operator also allows us to use an inline table-valued function as a table expression in the right side. So if we put the right table expression into a table-valued function, we could make our CROSS APPLY query a lot simpler.

(1) Create a table-valued function

   IF OBJECT_ID(N'Production.', N'IF') IS NOT NULL
      DROP FUNCTION Production.GetTop2Product;
   GO

   CREATE FUNCTION Production.GetTop2Product(@supplierid AS INT)
     RETURNS TABLE
   AS
    RETURN

   Select productid, productname, unitprice
       from Production.Products p
       where p.supplierid = @supplierid
       order by unitprice, productid
           offset 0 rows fetch first 2 rows only
   GO

(2) Re-write the query

Select S.Supplierid, S.CompanyName as supplier, A.*
     From Production.Suppliers s
             CROSS APPLY Production.GetTop2Product(s.Supplierid) as A

-------------------------------------------------------------------------------------------------------------
Reference: Querying Microsoft SQL Server 2012 Training Kit by Itzik Ben-Gan, Dejan Sarka, Ron Talmage