Aug 7, 2011

Set-based operation Versus Cursor-based operation

We develop a query to retrieve/process data. We could have our processing operation done in row level(one row after another row) or in the level of the whole data set (the whole set at once). We could use a cursor to process each row at a time (Cursor-based operation). The following examples* could help us understand the difference between set-based operation  and  cursor-based operation.


> SET-based operation

UPDATE s                       -- update the entire set of data at once.
   SET StatusCode = 'ACTIVE', ModDate = dbo.DateTrunc('day', GETDATE())
   FROM dbo.Site s
        INNER JOIN dbo.Contact c WITH (NOLOCK) ON s.SiteNo = c.SiteNo


> Cursor-based operation (row-based/serial) 

DECLARE @ls_SiteNo      CHAR(10)
DECLARE crsModule1 CURSOR LOCAL STATIC FORWARD_ONLY FOR
   SELECT s.SiteNo
   FROM dbo.Site s
   INNER JOIN dbo.Contact c WITH (NOLOCK) ON s.SiteNo = c.SiteNo

OPEN crsModule1
FETCH NEXT FROM crsModule1 INTO @ls_SiteNo

WHILE (@@fetch_status = 0 ) BEGIN
     UPDATE dbo.Site
     SET StatusCode = 'ACTIVE', ModDate = dbo.DateTrunc('day', GETDATE())
     WHERE SiteNo = @ls_SiteNo     -- update each row at a time.              
     FETCH NEXT FROM crsModule1 INTO @ls_SiteNo
END
CLOSE crsModule1
DEALLOCATE crsModule1


1) Database engines are optimized for set-based operation. However, there are some cases that a serial operation is the only or better option. Generally speaking, if we can avoid using a cursor, we would be better off.

2) When we deal with a smaller set of data, the difference between the two operations might be very minimal or we wouldn't even notice any difference in performance. But as the volume of data grows bigger, the performance difference will probably become more obvious.

3) Personally speaking, I believe that a set-based operation would be processed still serially inside of the database engine in the end. However, experts still recommend that we use a set-based operation if possible. It seems to me that it is a better idea to let the database engine take care of the serial data operation. It is probably not a good idea for us to manually figure out how to serially process data because the database engines normally know better than we do about the data operation. Just a thought...


-----------------------------------------------------------------
* Thank to Mr. Gord Gray who kindly provided me with these examples to help me understand on this subject.
 

No comments:

Post a Comment