 |
|
Cost-Based Table Joins
Oracle Tips by Burleson Consulting |
If we have a query that joins multiple tables
together, we want a look at the wealth of hint options available to
us for changing the method by which the join takes place within
Oracle. As you will remember from earlier chapters, Oracle provides
three basic methods for joining tables together, the nested loop
join (use_nl), the sort merge join (use_merge), and
the hash join (use_hash). In addition to these basic join
techniques, we can also add parallelism to improve table join speeds
for sort merge queries.
One of the shortcomings of the cost-based
optimizer prior to Oracle8i was that when more than four
tables were joined, the cost-based optimizer almost always invoked a
full-table scan on one of the tables. In it is still possible to see
these suboptimal execution plans with the CBO in Oracle8i,
depending upon the complexity of the table joins that you are
performing.
As a general rule, nested loop joins and hash
joins should always use indexes on all of the tables, and a
full-table scan should never be invoked in order to service the
table join, except in the case of a sort merge join. When tuning
cost-based SQL statements that performs table joins, the following
steps are commonly executed:
- Search the existing execution plan for
full-table scans. If you see that the execution plan invokes a
full-table scan against one of the tables, one of the first things
you might want to try is to use an index hint or a rule
hint in order to force all of the tables to use indexes.
- Try timing the execution of the query using
different join techniques. For example, the use_hash hint
may be used in cases where we are performing a not in
subquery and we have enough RAM available to perform a hash join
against the specified tables. If the query retrieves the majority
of blocks in the table as determined by the Boolean conditions in
the where clause, we might consider invoking a use_merge
hint combined with parallel hint in order to improve the
speed of the query.
Once the optimal execution plan has been verified by
using the SQL*Plus set timing on command, we make the change
permanent by creating a stored outline for the SQL statement. The
complete procedure for doing this is fully described in Chapter 13.
Next, let's take a look at tuning cost-based
SQL statements that involve subqueries.
Tuning Cost-Based SQL Statements with Subqueries
When we see a SQL statement that specifies a
subquery, we first need to carefully check the where clause
and determine if the subquery is a noncorrelated subquery or a
correlated subquery. A correlated subquery is a query whereby
the key in the subquery is correlated (using the = operator) with a
column that is selected in the outer query. On the other hand, a
noncorrelated subquery is a query where the subquery executes
independently of the outer query, passing a result set to the outer
query at the end of its execution. Noncorrelated subqueries are
commonly seen when using the IN, NOT IN, EXISTS, and NOT
EXISTS SQL clauses.
TIP: The use_hash hint was originally
designed for equi-joins, such as NOT IN joins, anti-joins,
and joins with equality conditions (such as where cust.cust_nbr =
order.cust_nbr). You should always try the use_hash hint
when tuning these types of queries.
Rewriting Subqueries into Standard Joins
To understand sub-queries, let’s examine
three equivalent SQL queries that produce the same result with
different syntax. In many cases, sub-queries can be structured to
use either of the subquery types, and they can sometimes be replaced
by a standard join. For example, consider a query to display all
students who received an “A” in any class. This can be written as a
standard join, a noncorrelated subquery, or a correlated subquery.
The standard join will always run faster than a subquery, and
individual SQL statements should be carefully evaluated to see if
the subquery can be replaced with a standard join.
A Standard Join
select
*
from
student,
registration
where
student.student_id = registration.student_id
and
registration.grade = 'A';
A Non-correlated Subquery
select
*
from
student
where
student_id =
(select student_id
from registration
where
grade = 'A'
);
A Correlated Subquery
select
*
from
student
where
0 <
(select count(*)
from registration
where
grade = 'A'
and
student_id = student.student_id
);
Here we see the same query result, specified
in three ways. This ability to specify the same query in many
different forms is very important to those tuning SQL statements.
Whenever possible, you want to replace subqueries with a standard
join, and you must be familiar with all of the possible ways to
manually rewrite a query to make the execution plan more efficient.
This is an excerpt from "Oracle High-Performance SQL Tuning" by
Donald K. Burleson, published by Oracle Press.