 |
|
The Semi-Join
Oracle Tips by Burleson Consulting |
A
semi-join is an operation where the EXISTS clause is used with a
subquery. It is called a semi-join because even if duplicate rows
are returned in the subquery, only one set of matching values in the
outer query is returned. In the case of the EXISTS clause, the
subquery is executed, but even if the subquery returned multiple
rows, the semi-join will not duplicate the value referenced in the
outer query, as show in Figure 16-5.
Figure 5: A semi-join
This is an example of a correlated subquery
because the EXISTS references the outer table in the where
clause. Here we display the names of all departments who have
employees who earned more than a $5,000 commission. This is a
semi-join because even though the subquery may return many rows for
each employee with more than $5,000 in commissions, only one
department name will be displayed.
select /*+
first_rows */
dname
from
dept
where
exists
(select
*
from
emp
where
dept.deptno = emp.deptno
and
emp.comm > 5000
)
;
When you examine this query, it is clear that even
though there may be many employees in each department with $5,000
commissions, only one match is required to display the department
name. Of course, if we have an index on the comm column, then
a semi-join would not be necessary, because the query could filter
all emp rows where comm > 5000 and then probe into the
dept table for the names of employees in those departments.
If there is no index on the comm column in emp, then a
semi-join can be used to improve query performance.
As you may remember from previous chapters,
the CBO always invokes a NESTED LOOPS table access method for all
queries with EXISTS clauses when an index is available on both
columns (comm and deptno in this example). However,
since we have no index on comm, we see the following
semi-join plan.
OPERATION
----------------------------------------------------------------------
OPTIONS
OBJECT_NAME
POSITION
------------------------------ ----------------------------
----------
SELECT STATEMENT
1
FILTER
1
TABLE ACCESS
FULL
DEPT
1
TABLE ACCESS
BY INDEX ROWID
EMP
2
INDEX
RANGE SCAN
EMP_DEPTNO
1
This makes sense for a semi-join because the
query only has the deptno index available, and the entire
join set must be filtered for those rows where comm is
greater than $5,000. Next, let’s see how the inefficient nature of
semi-joins makes them candidates for replacement with equi-joins
using the select distinct clause.
Alternative Representation for Semi-Joins
If you examined the query carefully, you may
notice that it could be rewritten as a standard equi-join. The trick
is to eliminate the duplicate rows in the department name, and we
can do this in a standard join by specifying the select distinct
clause. As you may remember from previous chapters, the distinct
clause invokes a sort to eliminate the duplicate rows from the
result set. Here is the rewritten query with an equi-join:
select
distinct /*+ first_rows */
dname
from
dept,
emp
where
dept.deptno = emp.deptno
and
emp.comm > 5000
;
Now, look carefully at the new execution plan
where there is no index on the comm column. Here wee see a
hash join, with full-table scans being executed against both tables.
The presence of the full-table scans indicates that this may not be
the fastest execution plan if these are large tables.
OPERATION
----------------------------------------------------------------------
OPTIONS
OBJECT_NAME
POSITION
------------------------------ ----------------------------
----------
SELECT STATEMENT
6
SORT
UNIQUE
1
HASH JOIN
1
TABLE ACCESS
FULL
EMP
1
TABLE ACCESS
FULL
DEPT
2
This execution plan should raise immediate suspicion
because the query is serviced by performing a full-table scan
against the emp table, and then performing another full-table
scan on the dept table.
With an index on the comm column, we
see a very different execution plan because full-table scans are no
longer required.
OPERATION
----------------------------------------------------------------------
OPTIONS
OBJECT_NAME
POSITION
------------------------------ ----------------------------
----------
SELECT STATEMENT
6
SORT
UNIQUE
1
HASH JOIN
1
TABLE ACCESS
BY INDEX ROWID
EMP
1
INDEX
RANGE SCAN
EMP_COMM
1
TABLE ACCESS
FULL
DEPT
2
Here you see the HASH JOIN table access
method, with the department table acting as the driving table.
Tip: You can usually avoid a semi-join by
ensuring that a column index exists on all relevant predicates in
the where clauses of both the outer and inner queries. Also,
you can often replace a subquery with the exists clause,
replacing it with a standard equi-join using the select distinct
clause.
Next, let’s move away from the theoretical
join types and take a look at Oracle implementation of table
joining.
This is an excerpt from "Oracle High-Performance SQL Tuning" by
Donald K. Burleson, published by Oracle Press.