Aug 7, 2011

Set-based operation Versus Cursor-based operation

We develop a query to retrieve/process data. We could have our processing operation done in row level(one row after another row) or in the level of the whole data set (the whole set at once). We could use a cursor to process each row at a time (Cursor-based operation). The following examples* could help us understand the difference between set-based operation  and  cursor-based operation.


> SET-based operation

UPDATE s                       -- update the entire set of data at once.
   SET StatusCode = 'ACTIVE', ModDate = dbo.DateTrunc('day', GETDATE())
   FROM dbo.Site s
        INNER JOIN dbo.Contact c WITH (NOLOCK) ON s.SiteNo = c.SiteNo


> Cursor-based operation (row-based/serial) 

DECLARE @ls_SiteNo      CHAR(10)
DECLARE crsModule1 CURSOR LOCAL STATIC FORWARD_ONLY FOR
   SELECT s.SiteNo
   FROM dbo.Site s
   INNER JOIN dbo.Contact c WITH (NOLOCK) ON s.SiteNo = c.SiteNo

OPEN crsModule1
FETCH NEXT FROM crsModule1 INTO @ls_SiteNo

WHILE (@@fetch_status = 0 ) BEGIN
     UPDATE dbo.Site
     SET StatusCode = 'ACTIVE', ModDate = dbo.DateTrunc('day', GETDATE())
     WHERE SiteNo = @ls_SiteNo     -- update each row at a time.              
     FETCH NEXT FROM crsModule1 INTO @ls_SiteNo
END
CLOSE crsModule1
DEALLOCATE crsModule1


1) Database engines are optimized for set-based operation. However, there are some cases that a serial operation is the only or better option. Generally speaking, if we can avoid using a cursor, we would be better off.

2) When we deal with a smaller set of data, the difference between the two operations might be very minimal or we wouldn't even notice any difference in performance. But as the volume of data grows bigger, the performance difference will probably become more obvious.

3) Personally speaking, I believe that a set-based operation would be processed still serially inside of the database engine in the end. However, experts still recommend that we use a set-based operation if possible. It seems to me that it is a better idea to let the database engine take care of the serial data operation. It is probably not a good idea for us to manually figure out how to serially process data because the database engines normally know better than we do about the data operation. Just a thought...


-----------------------------------------------------------------
* Thank to Mr. Gord Gray who kindly provided me with these examples to help me understand on this subject.
 

Data Pivoting - how to create a pivot query

1.  Pivoting data
Itzik Ben-Gan describes, "pivoting data is a technique that rotates data from a state of rows to a state of columns" in his article, 'Pivoting Data' (SQL Server Magazine, January 2011). Why do we ever have to rotate data then?  Data pivoting allows us to summarize(aggregate) a given data set and re-organize(pivot) the structure of the original data set for a better report purposes.

2.  Creating a pivot query   
Let's say we have a table, called 'SalesOrderHeader.'  And we need to summarize the data and horizontally show SUM of TotalDue of each YEAR for all the TerritoryIDs.

 SELECT     TerritoryID
                   , TotalDue
                  , YEAR(OrderDate) OrderYear
  FROM      Sales.SalesOrderHeader






















The report is supposed to summarize the data as following












3. A few things to identify before writing a PIVOT query.
- The data is to be grouped by TerritoryID (Grouping element)
TotalDue column is to be aggregated (Aggregation element)
- The distinct values of YEAR(OrderDate) are used as the column headings of the pivoted result (Pivoted element. To be displayed horizontally.). And we need to know the distinct values of YEAR(OrderDate) in advance in order to write our Pivot query.  i.e) 2005, 2006, 2006, 2007

4. Writing a pivot query in SQL Server
4.1) SQL Server with CASE expression.

 SELECT     TerritoryID,
SUM (CASE WHEN YEAR(OrderDate) = 2005  THEN TotalDue ELSE 0 END) as [2005],
SUM (CASE WHEN YEAR(OrderDate) = 2006  THEN TotalDue ELSE 0 END) as [2006],
SUM (CASE WHEN YEAR(OrderDate) = 2007  THEN TotalDue ELSE 0 END) as [2007],
SUM (CASE WHEN YEAR(OrderDate) = 2008  THEN TotalDue ELSE 0 END) as [2008]
       FROM     Sales.SalesOrderHeader
       GROUP BY    TerritoryID
       ORDER BY    TerritoryID;

4.2) SQL Server with PIVOT operator

WITH BaseTable AS
 (
    SELECT     TerritoryID
                      , TotalDue
                      , YEAR(OrderDate) OrderYear
     FROM      Sales.SalesOrderHeader
  )
  SELECT    TerritoryID
    , [2005], [2006], [2007], [2008]
FROM    BaseTable
PIVOT  ( SUM(TotalDue)  FOR  OrderYear  IN ([2005],[2006],[2007],[2008]))  as PvtResult ;

5. Things to remember
- When using PIVOT operator, creating a base table using CTE is preferred. Include the columns needed only: a grouping column, pivoted column and aggregated column. It is because any column that is not pivoted or aggregated will be used as a grouping element.
-  IN list restricts the rows that are pivoted and supplies the pivoted column names. So if you omit [2007] from the above query, summation on TotalDue for year 2007 won't be done even if there are some data for year 2007.
- IN list is NOT DYNAMIC. We can not use a sub-query to populate this IN list. It should be hard-coded.
  (If the IN list is unknown, we can create a dynamic SQL statement. Dynamic pivoting is beyond the scope of this post. )
---------------------------------------------------
Reference: 
-  Querying Microsoft SQL Server 2012 Training Kit by Itzik Ben-Gan, Dejan Sarka, Ron Talmage     http://technet.microsoft.com
-  http://sqlmag.com/t-sql/create-pivoted-tables-3-steps  "Create Pivoted Tables in 3 Steps" by Kathi Kellenberger

Aug 6, 2011

NCHAR, NVARCHAR, NTEXT / NCLOB - supporting Unicode characters

As more companies deploy their database globally, their database needs to be able to handle Unicode. Unicode enables us to represent all the characters that are expressed in most of human written languages.

The following datatypes allow us to handle unicode characters.
   -  NCHAR, NVARCHAR,  NTEXT, NVARCHAR(max), NCLOB

And the following example shows how we can use these datatypes.


CREATE TABLE  unicodeDataTable
  (  text_id       number           Primary KEY,
     uni_text      nvarchar(20)
   );

INSERT INTO unicodeDataTable(text_id, uni_text) VALUES (1,  N'이동훈' );
INSERT INTO unicodeDataTable(text_id, uni_text) VALUES (2,  N'こんにちは' );

SELECT * FROM  unicodeDataTable;

   text_id       uni_text
   1               이동훈
   2               こんにちは


--------------------------------------------------------- 
NTEXT :  (SQL Server)  NText is going to be deprecated. Microsoft recommends that we use NVARCHAr(max).
NCLOB & CLOB  : (Oracle) store up to 8 to 128 terabytes of character data (11g)