Apr 19, 2011

Materialized views increase query performance - Why and How?

In data warehouse environments, we usually run long-running queries. Those queries might take hours, so it is very important for DBAs and developers to make an effort to reduce query response time (query performance).

How can we improve query response time? Use indexes, partition the data, and implement parallel execution.
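For instance, an index on a commonly filtered column plus a parallel hint can both help. A minimal sketch, using a hypothetical my_sales table (illustrative names only):

 SQL> CREATE INDEX my_sales_time_idx ON my_sales(time_id);
 SQL> SELECT /*+ PARALLEL(s, 4) */ SUM(amount_sold)
      FROM my_sales s;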

Are these all the options that we have?

The queries that we run in a data warehouse involve very intensive computation: data aggregation, summarization and joins. Using indexes, partitioning tables and parallel execution will definitely help reduce the query response time. But by how much? The queries might still run slowly if they involve expensive table joins and intensive data aggregation and summarization. How can we overcome this problem?

What if we store the result of such a long-running query in a separate table (a summary table)? We can then query the summary table to retrieve the data without running the original, long-running query again. This is great!!!  We can probably use "CREATE TABLE ... AS SELECT ... (CTAS)" to create the summary table.
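For example, a summary table of per-product sales totals might look like this (a minimal sketch against the SH sample schema; the table and column names are just for illustration):

 SQL> CREATE TABLE sales_summary AS
        SELECT prod_id, SUM(amount_sold) AS total_sold
        FROM sh.sales
        GROUP BY prod_id;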

This solution seems to be a great idea at first. However, there are a couple of drawbacks to this approach.  (1) What if the data in the original tables (base tables) changes? The summary table will become out of date. We can't run a query against the summary table and expect to get a result that reflects the current state of our data. We need to manually refresh the summary table: probably drop the table and re-run the CTAS statement. Do we have to do this whenever the original (base) tables are modified? Um... I am not sure about that.

(2) Another inconvenience with this approach is that the end users have to know about the existence and the names of the summary tables in order to use them. They also have to modify their applications so that the applications can use the summary tables.  As a result, this approach leads to the conclusion that using summary tables is not the perfect solution for improving query response time.

What can we do then? To solve this problem, Oracle introduced something called a materialized view (MV). DBAs identify expensive SQL queries and create MVs for those queries. Now when users run the same expensive queries, the Oracle server internally rewrites the user's original query into a new query that makes use of the MV the DBAs created earlier. The rewritten query just accesses the MV to retrieve the data, instead of accessing the original (base) tables and going through the whole expensive summarization and joins. As a result, the users get their query results back much faster. This whole query rewrite process happens behind the scenes. The users don't even have to know that the MV exists.

They would probably just feel that their original query now runs much faster than it used to. They would be HAPPY now.
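Here is a minimal sketch of what creating such an MV might look like, again using the SH sample schema (the MV name is illustrative; with REFRESH ON DEMAND you refresh the MV yourself, e.g. via DBMS_MVIEW.REFRESH, and query rewrite also requires the QUERY_REWRITE_ENABLED parameter to be TRUE, which is the default in recent releases):

 SQL> CREATE MATERIALIZED VIEW sales_by_prod_mv
        BUILD IMMEDIATE
        REFRESH ON DEMAND
        ENABLE QUERY REWRITE
        AS
        SELECT prod_id, SUM(amount_sold) AS total_sold
        FROM sh.sales
        GROUP BY prod_id;

Now a query like "SELECT prod_id, SUM(amount_sold) FROM sh.sales GROUP BY prod_id" can be transparently rewritten to read from sales_by_prod_mv instead.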

Apr 17, 2011

Transportable tablespace - Fastest way to move a large volume of data

In data warehouse environments, we often move data from one system to another. There are a few ways to transport data: we can use flat files (most common and simple), distributed queries, or transportable tablespaces.  Today, I'd like to talk about how to use a transportable tablespace.

Using a transportable tablespace:
- This is the fastest way to move large volumes of data between two Oracle databases.
- Limitations: the source and target databases must use the same character set. (Before Oracle 10g they also had to run on the same platform; since 10g, datafiles can be converted with RMAN for cross-platform transport.)

< Example >
Let's say we have a table called 'sales' in database1 (in the SH schema), and we want to move all the sales records since the year 2000 to database2 (SH schema). The number of records we have to move is very large, so we decided to use a transportable tablespace.


> In the source system

1. Create a tablespace that will hold the table we will create in the next step.
 SQL>    CREATE TABLESPACE temp_sales_tbls
               DATAFILE '/tmp/temp_sales.dbf'
               SIZE 30M REUSE AUTOEXTEND ON;

2. Create a table in the new tablespace; we will store the sales records that we need to transport in this new table. (Note the explicit TO_DATE: comparing TIME_ID with a bare string would rely on implicit conversion and the session's NLS date format.)
 SQL>    CREATE TABLE temp_sales_table NOLOGGING
                TABLESPACE temp_sales_tbls
                AS
                SELECT * FROM sh.sales
                WHERE time_id > TO_DATE('31-DEC-1999', 'DD-MON-YYYY');

3. Alter the tablespace to READ ONLY to prevent any changes being made to it.
 SQL>   ALTER TABLESPACE temp_sales_tbls READ ONLY;
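Before exporting, it is also worth verifying that the tablespace is self-contained (nothing inside it depends on objects outside it). A quick check with the DBMS_TTS package:

 SQL>   EXECUTE DBMS_TTS.TRANSPORT_SET_CHECK('temp_sales_tbls', TRUE);
 SQL>   SELECT * FROM transport_set_violations;

If the second query returns no rows, the tablespace can be transported as is.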

4. Create a directory object that points to the location where the dump file for the Data Pump export will be written.
  SQL>   CREATE DIRECTORY dump_dir AS '/temp/dump/';

5. Run the Data Pump export utility, expdp, to create a dump file that contains the metadata for the tablespace temp_sales_tbls. (A transportable-tablespace export needs full-export privileges, so in practice you may have to run this as a more privileged user such as SYSTEM.)
  $>   expdp sh/sh DIRECTORY=dump_dir DUMPFILE=sales.dmp
         TRANSPORT_TABLESPACES=temp_sales_tbls


> In the target system

6. Now we copy the dump file (sales.dmp) and the datafile (temp_sales.dbf) to the target system.
  - Copy the dump file to where a new directory object will point
     (we are going to create this new directory in step 7),
     e.g. /temp/dump_import_loc/
  - Copy the datafile to where the datafiles of database2 are located,
     e.g. /home/oracle/oradata/

7. Create a directory object pointing to that location.
  SQL> CREATE DIRECTORY dump_import_dir AS '/temp/dump_import_loc/';

8. Run the Data Pump import utility, impdp, to make the datafile (temp_sales.dbf) accessible to database2.
$> impdp sh/sh DIRECTORY=dump_import_dir
      DUMPFILE=sales.dmp LOGFILE=imp.log
      TRANSPORT_DATAFILES='/home/oracle/oradata/temp_sales.dbf'

9. We can verify our work so far.
 SQL>  connect sh/sh
 SQL>  SELECT COUNT(*) FROM temp_sales_table;
 SQL>  SELECT * FROM temp_sales_table;

10. Finally, don't forget that we already have a 'sales' table in the target database. Our original goal is to move the sales records since the year 2000 from the source database to the target database, so we need to insert the imported sales records into the 'sales' table there. In the target database,
  SQL>  connect sh/sh
  SQL>  INSERT /*+ APPEND */ INTO sales
              SELECT * FROM temp_sales_table;
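
11. (Optional) Once the rows are in the 'sales' table, the staging tablespace is no longer needed and can be dropped (or made read/write again with ALTER TABLESPACE ... READ WRITE if you want to keep using it; transported tablespaces arrive read-only):
  SQL>  DROP TABLESPACE temp_sales_tbls INCLUDING CONTENTS AND DATAFILES;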

--------------------------------------
*** This is a simple example. There might be more processes involved in a real-life situation.

Apr 3, 2011

Enhancing database security - Application Contexts & Virtual Private Database

< Scenario >
(1) You are a DBA for your company's database.
(2) Your company's database has a schema called HR where you basically store all the HR-related data.
(3) There is a table called employees in the HR schema. This table contains the name, emp_id, position, salary and commission of every employee in your company.
(4) All the employees can access the employees table. (They are granted 'SELECT' on the table.)
(5) However, for security and data-confidentiality reasons, you (the DBA) need to find a way to limit the data access of each employee so that they can only see their own record. In other words, you should be able to control the data access of each user at the row level.

Issue
Oracle Database provides several security measures: password profiles, authentication methods, system/object privileges, and roles. However, none of these measures provides data access control at the row level. Object privileges can control data access at the object level at most.
Impact
Because all the employees are granted the 'SELECT' object privilege on the employees table, they can view anybody's records, including salary and commission, which we don't want.

Solution
We can make use of an application context and a Virtual Private Database to overcome this issue.
Virtual Private Database (VPD) combines an application context with fine-grained access control. The following steps show how an application context and VPD are implemented. Note that the steps are simplified and don't show every detail.

1. Application Context: When we create an application context, a memory area is allocated to the context. An application context is like an associative array: we add attributes to the context and set the values of those attributes. With regard to VPD, we can store a user's current session information in an application context. Then, based on the session information stored in the context, we can enforce fine-grained data access control.

(1) Make sure the HR user has all the privileges required to create an application context.
- We can just let the HR user control the data access in the HR schema.
- We can grant roles and privileges to HR as follows:
       SQL>  GRANT create session TO HR WITH ADMIN OPTION;
       SQL>  GRANT create any context, drop any context, create role, alter any role, drop any role,
             create table, create procedure, create any trigger, administer database trigger,
             create any directory, alter profile, create profile, drop profile, audit system, alter system,
             grant any object privilege TO HR;
       SQL>  GRANT execute ON DBMS_SESSION TO HR;

(2) Create an application context (as the HR user). Here context_name and package_name are placeholders.
      SQL> CREATE CONTEXT context_name USING package_name;

(3) Create a PL/SQL package that sets the context attributes.
- This package should contain a procedure that sets the context attributes, along these lines:
       ...
       PROCEDURE set_emp_id
       IS
          v_emp_id NUMBER;
       BEGIN
          SELECT employee_id INTO v_emp_id FROM hr.employees
          WHERE email = SYS_CONTEXT('userenv', 'session_user');
          -- pass the variable itself, not a quoted string
          DBMS_SESSION.SET_CONTEXT('context_name', 'attribute_name', v_emp_id);
       EXCEPTION
          WHEN NO_DATA_FOUND THEN
             NULL;  -- a user with no employees row simply gets no context value
       END;
       ...
(4) Create a trigger that calls the PL/SQL package to set the context attributes when a user logs in.
          CREATE OR REPLACE TRIGGER hr_context_logon
          AFTER LOGON ON DATABASE
          BEGIN
                package_name.procedure_name;
          END;

2. Virtual Private Database - fine-grained access control implementation. Create a security policy function that returns a predicate (a WHERE condition).
(1) Grant EXECUTE ON DBMS_RLS to HR, and grant EXEMPT ACCESS POLICY to HR.
(2) Create a policy function as follows (as the HR user). Note the doubled single quotes inside the returned string literal:
...
FUNCTION limit_emp (object_schema IN VARCHAR2, object_name IN VARCHAR2)
   RETURN VARCHAR2
IS
BEGIN
   RETURN 'emp_id = SYS_CONTEXT(''context_name'', ''attribute_name'')';
END;
...
(3) Create the policy (as the HR user). The object_name is the employees table we are protecting:
  SQL>    BEGIN
            DBMS_RLS.ADD_POLICY( object_schema   => 'HR',
                                 object_name     => 'employees',
                                 policy_name     => 'policy_name',
                                 function_schema => 'HR',
                                 policy_function => 'limit_emp',
                                 statement_types => 'select');
          END;
          /

All set...  Now when an employee runs a SELECT statement against the EMPLOYEES table, he or she should see only his or her own record. The security function returns a predicate such as "emp_id = xxx", where xxx is the employee id value that was stored in the application context when the employee logged in.
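
To sanity-check the setup, log in as a test employee account and inspect the context value (a quick sketch; 'context_name' and 'attribute_name' are the same placeholders used above):

  SQL>  SELECT SYS_CONTEXT('context_name', 'attribute_name') FROM dual;
  SQL>  SELECT * FROM hr.employees;   -- should return only the logged-in employee's row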