In data warehouse environments, we usually run long-running queries. Those queries might take hours. So it is very important for DBAs and developers to reduce query response time (query performance).
How can we improve query response time? We can use indexes, partition the data, and implement parallel execution.
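To make these three techniques concrete, here is a minimal sketch in Oracle SQL. The table and column names (`sales`, `customer_id`, `sale_date`, `amount`) are hypothetical, invented just for illustration:

```sql
-- Hypothetical fact table; all names here are illustrative only.
-- 1. Index a commonly filtered column.
CREATE INDEX sales_cust_idx ON sales (customer_id);

-- 2. Range-partition by date so queries scan only the relevant partitions.
CREATE TABLE sales_part (
  sale_id     NUMBER,
  customer_id NUMBER,
  sale_date   DATE,
  amount      NUMBER
)
PARTITION BY RANGE (sale_date) (
  PARTITION p2024_h1 VALUES LESS THAN (DATE '2024-07-01'),
  PARTITION p2024_h2 VALUES LESS THAN (DATE '2025-01-01')
);

-- 3. Request parallel execution with a hint.
SELECT /*+ PARALLEL(s, 4) */ customer_id, SUM(amount)
FROM   sales_part s
GROUP  BY customer_id;
```

Each technique attacks a different cost: indexes reduce the rows touched, partition pruning reduces the data scanned, and parallelism spreads the remaining work across CPUs.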
Are these all the options that we have?
The queries that we run in a data warehouse involve very intensive computation: data aggregation, summarization, and joins. Using indexes, partitioning tables, and parallel execution will definitely help reduce query response time. But by how much? The queries might still run slowly if they involve expensive table joins and intensive aggregation and summarization. How can we overcome this problem?
What if we store the result of such a long-running query in a separate table (a summary table)? We can then query the summary table to retrieve the data without running the original, long-running query again. This is great!!! We can probably use "CREATE TABLE ... AS SELECT ..." (CTAS) to create the summary table.
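A CTAS summary table might look like the following sketch. Again, the table and column names (`sales`, `customers`, `region`) are hypothetical examples, not from any real schema:

```sql
-- Build the summary table once with CTAS; the expensive join and
-- aggregation run a single time here.
CREATE TABLE sales_summary AS
SELECT   c.region, SUM(s.amount) AS total_amount
FROM     sales s
JOIN     customers c ON c.customer_id = s.customer_id
GROUP BY c.region;

-- Later queries read the small, precomputed table directly.
SELECT total_amount FROM sales_summary WHERE region = 'WEST';
```

The second query returns almost instantly because it only scans the handful of precomputed rows.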
This solution seems like a great idea at first. However, there are a couple of drawbacks with this approach. (1) What if the data in the original tables (base tables) changes? The summary table will become out-of-date. We can't run a query against the summary table and expect a result that reflects the current state of our data. We need to manually refresh the summary table: probably drop the table and re-run the CTAS statement. Do we have to do this every time the original (base) tables are modified? Um... I am not sure about that.
(2) Another inconvenience with this approach is that the end users have to know that the summary tables exist, and know their names, in order to use them. They also have to modify their applications so that the applications query the summary tables instead of the base tables. So we are led to the conclusion that summary tables are not the perfect solution for improving query response time.
What can we do then? To solve this problem, Oracle introduced something called materialized views (MVs). DBAs identify expensive SQL queries and create MVs for those queries. Now when the users run the same expensive queries, the Oracle server internally rewrites each user's original query into a new query that makes use of the MV the DBA created earlier. The rewritten query just accesses the MV to retrieve the data, instead of accessing the original (base) tables and going through the whole expensive summarization and joins. As a result, the users get their query results back much faster. This whole query-rewrite process happens behind the scenes. The user doesn't even have to know that the MV exists.
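Here is a sketch of what such an MV might look like, reusing the same hypothetical `sales`/`customers` schema. The `ENABLE QUERY REWRITE` clause is what allows the optimizer to redirect matching queries (rewrite also depends on settings such as the `QUERY_REWRITE_ENABLED` parameter being on):

```sql
-- Materialized view with query rewrite enabled; names are illustrative.
CREATE MATERIALIZED VIEW sales_by_region_mv
BUILD IMMEDIATE            -- populate the MV now
REFRESH COMPLETE ON DEMAND -- refresh manually (e.g. via DBMS_MVIEW.REFRESH)
ENABLE QUERY REWRITE
AS
SELECT   c.region, SUM(s.amount) AS total_amount
FROM     sales s
JOIN     customers c ON c.customer_id = s.customer_id
GROUP BY c.region;

-- The user still writes the original expensive query...
SELECT c.region, SUM(s.amount)
FROM   sales s
JOIN   customers c ON c.customer_id = s.customer_id
GROUP BY c.region;
-- ...and the optimizer can transparently rewrite it to scan
-- sales_by_region_mv instead of the base tables.
```

Unlike the CTAS summary table, the MV also solves the staleness problem: Oracle tracks the MV's relationship to its base tables and provides built-in refresh mechanisms, so no one has to drop and recreate anything by hand.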
They would probably just feel that their original query now runs much faster than it used to. They would be HAPPY now.