Database Notes

Sep 11, 2014

Database Read-Consistency

It is important for a database to ensure that its users read committed data only in most of cases. Each database has its own way of ensuring the read-consistency.

(1) SQL Server
There is something called "Transaction isolation level" in SQL Server. The transaction isolation level is set to "Read Committed" by default. This default setting enforces that data readers read committed data only.

This default behavior could yield a drawback. If a user is updating records and the update transaction takes a long time to complete, other users won't be able to read the same records right away. The other users' read request will hang until the update transaction completes.

To reduce this type of blocking and waiting, SQL Server gives us another option of transaction isolation level: Read Committed Snapshot(RCSI). It uses tempdb to store the original versions of changed data. With this type of transaction isolation level, data readers can still read committed records while another user changes the same records.

(2) Oracle
There is something called, UNDO, in Oracle database. Undo information is a record of how to undo a change. This undo information is stored in a special tablespace,"UNDO tablespaces." Undo information provides a read-consistency to data readers even if the data that they read is currently being changed by another user (reading the data in the original state).

* A similarity can be observed between SQL Server's RCSI and Oracle's UNDO.

Sep 10, 2014

SQL Server Transaction - ACID Properties and Modes

A transaction is a logical unit of works in a database.

(1) A relational database transaction has ACID properties.

Atomicity : All the changes in a transaction succeed or none of them succeed. There is no partially committed transaction. Either all changes committed or rolled back.
Consistency: All transactions are performed in a way that the database can stay in a consistent state.
Isolation : One transaction occurs in isolation from other transactions, and so the change made in the transaction is not visible to other transaction until it the transaction is committed.
Durability : Databases ensure that all changes made by committed transactions are never lost by using logs that keep track of all the committed changes.

(2) Transaction Modes in SQL Servers

Autocommit mode - a single DML and DDL statement is one transaction, and it will be automatically committed or rolled back (default mode in SQL Server)
Implicit transaction mode - SQL Sever automatically starts a transaction when we issue a DML or DDL statement. We must commit or roll back explicitly to end the transaction (default ANSI/ISO mode/default in Oracle)
Explicit transaction mode - Begin a transaction with "BEGIN TRANSACTION" and end the transaction by commit or roll back.

May 13, 2014

Mutiple Groupings in a query and the use of Grouping_ID function

(1) Single Grouping
Most of the time, we group a data set by a certain column(or a combination of columns) and summarize(count/sum/avg/etc) the data for each of the groups. The following query first groups the records in Sales.Orders table using customer IDs, and then it counts for the number of order entries for each customer ID.
select custid, count(*) as NumOfOrders
from sales.orders
group by custid ;

* Note that in this query we see one occurrence of grouping - grouping the rows by customer id only (Single Grouping)

(2) Multiple Groupings
There are times when you want to summarize the data in a few different levels of grouping within the same query. For instance, you might want to count number of orders for the combination of each customer id & order year as well as customer id as shown in the following query result: Group by customerid, YEAR(orderdate) as well as Group By customerid

>> We can achieve this by running two separate GROUP By queries and combine the result sets.
select Custid
, CAST(YEAR(Orderdate) as VARCHAR(20)) as OrderYear
,count(*) as NumOfOrders
from sales.orders
group by Custid, YEAR(orderdate)
UNION ALL
select Custid
,'Cust ID Total' as OrderYear
, count(*) NumOfOrders
from sales.orders
group by Custid
ORDER BY Custid, orderYear

>> Or, we can use GROUPING SETS sub-clause and implement the two different groupings within one query.
select Custid
, (YEAR(Orderdate) as OrderYear
  , count(*) as NumOfOrders
from sales.orders
group by grouping sets( (custid)
,(custid, YEAR(orderdate) ) )

Using GROUPING SETS is preferred because the code is a lot simpler and, more importantly, the query yields a better performance. With the option of UNION and two separate Group By groupings, SQL Server fully scans Sales.Orders table twice while it only scans the table once with Grouping Sets option.

(3) GROUPING SETS vs CUBE vs ROLLUP
- Grouping Sets : we manually list up all groupings separated by commas
select Custid
, (YEAR(Orderdate) as OrderYear
, count(*) as NumOfOrders
from sales.orders
group by grouping sets( (custid)       --> group by Custid
, (custid, YEAR(orderdate) ) -->group by custid, YEAR(orderdate)
                                                 , ( )    --> Grouping all rows (the entire table)
                                                  )

- Cube: generates all possible combinations of groupings.
select   Custid
, (YEAR(Orderdate) as OrderYear
, count(*) as NumOfOrders
from sales.orders
group by CUBE ( custid, YEAR(orderdate) )
        --> group by Custid
          --> group by YEAR(orderdate)
        --> group by custid, YEAR(orderdate)
        --> Grouping all rows (the entire table)

- Rollup : Use this option when there is a hierarchy formed by the input elements.
   select shipcountry
, shipregion
   , shipcity
, count(*) as NumOfOrders
from sales.orders
group by Rollup( shipcountry, shipregion, shipcity)
--> group by shipcountry, shipregion, shipcity
--> group by shipcountry, shipregion    (rolled up to regional level)
--> group by shipcountry    (rolled up to country level)
--> group by the entire table (rolled up to level of grouping entire table)

(4) The use of Grouping_ID( )
We can use this function to identify the level of grouping done for each of the returned rows. Grouping_ID() function in the following query will return an integer representing the level of grouping for each row.

   select    shipcountry
, shipregion
     , shipcity
    , count(*) as NumOfOrders
, GROUPING_ID(shipcountry, shipregion, shipcity) as Grp_ID
from sales.orders
group by Rollup( shipcountry, shipregion, shipcity)

The following spreadsheet should give you an idea of how the function calculates its returned integer. Each element gives 0 (zero) if it is part of the grouping. If not, it gives a number. It gives a different number depending on the position of it inside of the function input parenthesis.

If the function returns 5, we know that the row should be grouped by Shipregion.
(5 = 4 for element3 + 0 for element2 + 1 for element1 ). We can list up all the possible integers that can be returned by the function and what each number represents.

(5) Now, we can actually use Group_ID( ) function in such a way that the query result is more readable to the report users. The following query use ROLLUP, and we understand that all the possible integers that can be returned by GROUP_ID( ) in the following query are 0, 1, 3, 7.

select    CASE WHEN GROUPING_ID(shipcountry, shipregion, shipcity) = 7 THEN 'GRAND TOTAL'
                          ELSE shipcountry END   as ShipCountry
           , CASE WHEN GROUPING_ID(shipcountry, shipregion, shipcity) = 3 THEN 'Country TOTAL'
                         ELSE ISNULL(shipregion,'-') END   as shipregion
           , CASE WHEN GROUPING_ID(shipcountry, shipregion, shipcity) = 1 THEN 'Region TOTAL'
                         ELSE ISNULL(shipcity,'-')   END   as shipcity
           , COUNT(*) as NumOfOrders
    from sales.orders
    group by Rollup( shipcountry, shipregion, shipcity) ;

The query will yield a result as shown below.

Apr 12, 2014

APPLY operator: a correlated subquery and JOIN blended together.

APPLY operator was first introduced with SQL Server 2005. And here is my own summary about this operator.

Example code

Select S.Supplierid, S.CompanyName as supplier, A.*
     From Production.Suppliers s
             CROSS APPLY ( Select productid, productname, unitprice
                         from Production.Products p
                                               where p.supplierid = s.supplierid
                           order by unitprice, productid
                                               offset 0 rows fetch first 2 rows only ) as A

APPLY operator
(1) It applies a table expression in the right side to each row from a table expression in the left side.
      (correspondingly color-coded in the example code)
(2) And usually a table expression in the right side references to an element from the left table. In other words, the right table expression is correlated to the left table expression.
(3) So APPLY operator seems to have a mixed concept of a correlated subquery and JOIN.
        APPLY operator
     - evaluates the right table expression for each row of the left table and yields a result set
     - and then, joins the evaluated result set back to the row in the left table.
(4) Two different types: Cross Apply and Outer Apply

Re-explained in the context of the provided example code
The left table contains supplier records, and each record has a supplierid.
The right table contains product records.

(1) The Apply operator takes each supplierid from the left table and evaluates the right table for that supplierid, which yields a evaluated result set. The operator now JOIN the result set back to the record of the supplierid in the left table.
(2) The above process is repeated for each supplierid(each row) of the left table(cursor-based operation).
(3) When the whole operation is completed, the result of the example query will show the two products with the lowest unit prices for each suppliers. See the query result below.
(4) **IF a supplier doesn't have any product record in the product table, that supplier will be omitted from the query result. It is because the example query uses CROSS Apply. OUTER Apply will, however, include such a supplier with NULLs in the columns from the right table

** CROSS APPLY vs. OUTER APPLY is similar to INNER JOIN vs. LEFT OUTER JOIN.
* Note : If the right table doesn't reference any element in the left table, the query result will be exactly same as that of CROSS JOIN between the two tables(Cartesian product).

Result of the example query (order by supplierid)

APPLY operator challenged.
Before I got a good grasp of APPLY operator, I was just wondering if I could find a different way to generate the same query result without using APPLY operator. And I tried and came up with the following query. It yields the same result. To be honest, the following query did make more sense to me before I become comfortable with APPLY operator. But I now see the benefits of using APPLY operator, so I will probably use the operator when it is necessary.

Select   S.supplierid, S.companyName as supplier, P.productid, p.productname, p.unitprice
    From   Production.suppliers s
           INNER JOIN ( select ROW_NUMBER() over ( partition by supplierid
                                                                                       order by unitprice, productid) as rownum
                                                 ,supplierid, productid, productname, unitprice
                                            from Production.Products ) P   ON s.supplierid = p.supplierid
    where ISNULL(p.rownum,0) <=2

APPLY w/ Inline Table-valued Function
Apply operator also allows us to use an inline table-valued function as a table expression in the right side. So if we put the right table expression into a table-valued function, we could make our CROSS APPLY query a lot simpler.

(1) Create a table-valued function

   IF OBJECT_ID(N'Production.', N'IF') IS NOT NULL
      DROP FUNCTION Production.GetTop2Product;
   GO

   CREATE FUNCTION Production.GetTop2Product(@supplierid AS INT)
     RETURNS TABLE
   AS
    RETURN

   Select productid, productname, unitprice
       from Production.Products p
       where p.supplierid = @supplierid
       order by unitprice, productid
           offset 0 rows fetch first 2 rows only
   GO

(2) Re-write the query

Select S.Supplierid, S.CompanyName as supplier, A.*
     From Production.Suppliers s
             CROSS APPLY Production.GetTop2Product(s.Supplierid) as A

-------------------------------------------------------------------------------------------------------------
Reference: Querying Microsoft SQL Server 2012 Training Kit by Itzik Ben-Gan, Dejan Sarka, Ron Talmage

Dec 16, 2013

Date and Time Functions in SQL Server and Oracle

A list of commonly used date and time functions in SQL Server and Oracle.

Dec 10, 2013

Surrogate keys - system generated primary keys

This post should be a quick reference to how to generate surrogate keys in SQL Server.

(1)   IDENTITY (column property)
: allows the database to generate keys for the column of a numeric types such as INT, BIGINT, SMALLINT, TINYINT.

   CREATE TABLE   guests
       ( GuestId      int    IDENTITY (100, 1)           -- starting from 100 and increment by 1
        , Firstname    nvarchar (200)
         ,Lastname    nvarchar (200)
         ,Email          nvarchar (100) )

   INSERT INTO guests(Firstname, Lastname, Email)
        VALUES('Hoon', 'Lee', 'hoonerLee@hoon.com');

   SELECT * FROM guests;

   Studentid    Firstname    Lastname    Email
   100             Hoon          Lee   hoonerLee@hoon.com

(2) Sequence (database object)
: an independent object that can be shared by all the tables in the entire database. Numeric types.

   Create SEQUENCE dbo.GuestID
       AS BIGINT
       START WITH    1000
       INCREMENT BY   1 ;
   GO

   DECLARE @guestID as INT;
   SELECT @guestID = NEXT VALUE FOR dbo.GuestID;

   SELECT @guestID as GuestID;     ->   10001
   SELECT @guestID as GuestID;     ->   10002
   ...
   SELECT @guestID as GuestID;     ->   10011

(3) Non-sequential GUIDs (Global Unique Identifiers)
:   For UNIQUEIDENTIFIER type of column. Use NEWID() function to get a new GUID in INSERT statement or create a UNIQUEIDENTIFIER column with NEWID() as DEFAULT constraint.

CREATE TABLE students
( studentID    UNIQUEIDENTIFIER NOT NULL
, Name         VARCHAR(30)   );

INSERT students (studentid, name) VALUES ( NEWID() , 'Hooner');

OR

CREATE TABLE students
( studentID    UNIQUEIDENTIFIER DEFAULT   NEWID()
, Name         VARCHAR(30)   );

INSERT students ( name) VALUES ( 'Hooner');

SELECT * FROM students;
studentID                                                                          Name
A5971324-30BF-4781-B2B2-DA8C7B36C18F             Hooner
4F79B592-3CCF-4C11-B85F-DCFFB2BD84D7           Hooner
F4D94A6B-9D5A-4D3C-A590-B91F3D462E2F   Hooner

(4) Sequential GUIDs
:   For UNIQUEIDENTIFIER type of column. Use NEWSEQUENTIALID() function with DEFAULT constraints of table columns to get a new GUID.

** The database generates a GUID that is greater than any GUID previously generated by the function since Windows was started. Whenever Windows restarts, the GUID starts again from a lower range, but it is still globally unique. Becuase GUIDs are generated sequentially, one can easily guess the next new GUID. This can become a privacy/security concern. NEWSEQUENTIALID is a wrapper over the Windows UuidCreateSequential function.

CREATE TABLE students
( studentID      UNIQUEIDENTIFIER DEFAULT     NEWSEQUENTIALID()
, Name             VARCHAR(30)   )

INSERT students ( name) VALUES ( 'Hooner')

select * from students

studentID                                                                    Name
AD27E74A-1862-E311-9B46-0023AE0AD326       Hooner
AE27E74A-1862-E311-9B46-0023AE0AD326       Hooner
AF27E74A-1862-E311-9B46-0023AE0AD326       Hooner
B027E74A-1862-E311-9B46-0023AE0AD326        Hooner
B127E74A-1862-E311-9B46-0023AE0AD326        Hooner
8B48F850-1862-E311-9B46-0023AE0AD326         Hooner

------------------
Reference: Querying Microsoft SQL Server 2012 Training Kit by Itzik Ben-Gan, Dejan Sarka, Ron Talmage
http://technet.microsoft.com

Mar 31, 2013

CHAR vs VARCHAR in terms of performance

CHAR, NCHAR
- fixed type: the system always allocates the indicated size in memory.
- UPDATEs don't need data shifting or physical row expansion.
- If frequent UPDATES operation is anticipated for a column, CHAR/NCHAR types is preferred.

VARCHAR, NVARHCAR
- variable type: the system allocates variable size in memory depending on size of the value.
- Less storage is used
- Thus, the less there is for a query to read.
- Faster query performance.