 |
|
The Cost-Based Optimizer (CBO)
Oracle Tips by Burleson Consulting |
Oracle’s cost-based optimizer was created to
provide a more sophisticated alternative to rule-based optimization
because Oracle recognized that SQL optimization could be more
effective of the optimizer was aware of details about the data in
the tables and indexes. This data includes:
- Table data
- The number of rows
- The number of physical data blocks
- Index data
- The number of unique values in the index
- The distribution of values within the
index
- The selectivity of the index
- The index clustering factor
While the goal of adding statistical
information was noble, Oracle underestimated the complexity of
creating an optimizer that would always choose the fastest execution
plan. The early releases of the CBO in Oracle7 often made less than
optimal execution decisions, and while the official policy of Oracle
was for customers to use the CBO, Oracle continued to utilize the
RBO for their own Oracle Application products.
The CBO and Statistics
The introduction of the CBO also brought
forth a very important change in SQL theory. Prior to the CBO, the
premise of Oracle was that a single execution plan could be derived
that would always find the optimal execution plan.
When the CBO was introduced, Oracle provided
a new SQL utility called analyze that would compute or
estimate the statistics for an Oracle table or index. Oracle also
recommended that tables and indexes be re-analyzed whenever
important characteristics of the table or index have changed. Hence,
Oracle professionals were quite upset to find that the execution
plans for their SQL changed whenever they re-analyzed their database
objects. By tying the execution plans to the characteristics of the
data, an SQL statement might have a different execution plan every
time new statistics were generated. This made SQL tuning quite
difficult.
Gathering statistics for the CBO
Oracle offers two method for collecting table
and index statistics, the compute and estimate methods. Here is an
example of the syntax:
analyze table
customer compute statistics;
analyze table customer estimate statistics sample 5000 rows;
There has been a great deal of debate about
whether it is better to compute or estimate statistics. Note that
there are documented problems when sampling less than a 25-30%
sample of the rows in a table. This is due to the way Oracle
calculates row counts. Contrary to early documentation, row counts
are done using a average row size verses total occupied blocks and
not a full count. In tables of several million rows, the row counts
can be off by as much as 15% if a sample of less than 25% was used
for the analyze.
The downside to running compute statistics
is that it is very time consuming and resource intensive. The
downside to running estimate statistics is that you need to
be careful to get a statistically valid sample. In experiments using
estimate statistics it has been found that a sample as low as 50
rows will generate a valid value for num_rows in the Remote DBA_tables
view. However, most professional Oracle Remote DBAs will estimate
statistics based on a sample of 5,000 rows. This does not take much
execution time, and generates valid statistics.
The CBO and column histogram
data
The CBO also allows you to generate special
statistics in cases where a data column contains highly skewed data.
The syntax example appears below:
analyze table
customer
estimate statistics
sample 500 rows
for all indexed columns;
Column histogram information is only useful
in cases of low cardinality columns (e.g. bitmapped indexes) where
the data is highly skewed. For example, suppose we have a bitmapped
index on USA state abbreviations. This index will only have 50
distinct values, and we would not need column histograms unless the
data was not evenly distributed.
Let’s take a simple example. Suppose that we
have a state_abbr_idx on our customer table, with customers in all
50 states. The following query is used to fetch the customer list by
state:
select
customer_full_name,
customer_full_address
from
customer
where
customer_full_address.state_abbr = ‘:var1:’;
However, this query should not have a stable
execution plan if the data is skewed. Let’s assume that we have a
disproportional amount of customer in North Carolina, with more than
60% of the rows for customers in North Carolina. To be effective,
the CBO should invoke a full-table scan when the where clause
of the query asks for customers in North Carolina. However, the CBO
should invoke an index range scan is cases where the identical query
asks for customers in Nevada. The use of column histograms will tell
the CBO to consider the distribution of data when formulating the
execution plan.
Also note that the state_abbr is coded by
using a host variable. Normally a query with a host variable will be
located in the library cache as immediately re-usable. However, the
presence of the column histograms will cause the Oracle optimizer to
insert the column values and re-evaluate the query against the
histogram data. This can add significant overhead to this query
because it must be re-parsed every time that it is executed.
Column histograms should only be calculated for
columns with a small number of distinct vales where some column
values represent a disproportional amount of the rows. Histograms
are time consuming to calculate and may cause excessive re-parsing
of the SQL in the library cache. Because all indexed columns on a
table will never be skewed, be careful to NEVER use the all
indexed columns clause when computing histograms.
The CBO and hints
Because of the shortcomings of the CBO,
Oracle began to offer hints that could be placed within an SQL
statement to change the default CBO execution plan. The existence of
hints confirm that the CBO does not always make the best choice of
the execution plan, and we would expect the amount of hints to
decrease as the CBO became more sophisticated. However, up to
Oracle8i, the CBO continued to add new hints, and the list of hints
now contains dozens of hints. When the CBO finally becomes
sophisticated enough to always make the best execution plan, we can
expect hints to become obsolete with the CBO.
Execution plan persistence
Oracle eventually recognized that execution
plans should be static, especially after a SQL tuning professional
has found the optimal execution plan. Until Oracle8i, the SQL tuning
professional has to tune the SQL, locate the SQL source code and add
hints to ensure that the same execution plan was always generated.
With the introduction of Oracle8i, an
exciting new feature was introduced called optimizer plan stability.
Optimizer plan stability allows for the SQL tuning professional to
store an outline for the execution of an SQL statement. This stored
outline has the dual purpose of reducing the parsing time required
to generate the execution plan and to ensure that the same execution
plan is always generated. This topic is so important to SQL tuning
that we have dedicated Chapter 13 to this topic.
This is an excerpt from "Oracle High-Performance SQL Tuning" by
Donald K. Burleson, published by Oracle Press.