 |
|
EnterpriseDB: Joining Tables
Oracle Tips by Burleson Consulting
|
The
easiest join to make is a simple join between two tables. Normally,
you will join tables based on keys. We have already discussed
Primary Keys and Foreign Keys. We join on the foreign keys that
relate two tables.
In our
ongoing example, we join the emp table to the dept table by the
foreign key, deptno. Deptno is the primary key of the dept table
and is a foreign key into the emp table.
Let's
refer to a query that we've already seen:
SELECT ename
FROM emp
WHERE deptno IN (
SELECT deptno
FROM dept
WHERE dname = 'RESEARCH');
edb=# SELECT ename
edb-# FROM emp
edb-# WHERE deptno IN (
edb(# SELECT deptno
edb(# FROM dept
edb(# WHERE dname = 'RESEARCH');
ename
-------
SMITH
JONES
SCOTT
ADAMS
FORD
(5
rows)
edb=#
Another way, which is both more obvious and a better practice, to
write this would be to join the emp table to the dept table:
SELECT Employees.ename
FROM emp Employees
JOIN dept Departments
ON Employees.deptno = Departments.deptno
WHERE Departments.dname = 'RESEARCH';
edb=# SELECT ename
edb-# FROM emp Employees
edb-# JOIN dept Departments
edb-# ON Employees.deptno = Departments.deptno
edb-# WHERE dname = 'RESEARCH';
ename
-------
SMITH
JONES
SCOTT
ADAMS
FORD
(5
rows)
edb=#
As you
can see, the two queries return exactly the same information but the
new query is much easier to read and maintain.
The
important thing to note about the above query are the table aliases
and the JOIN and ON keywords.
Always
alias your tables and columns when joining. Always! That is not a
SQL requirement in all cases but it is a best practice. When
someone comes along later to maintain your code (and that someone
may be you), they will thank you for taking a couple of extra
seconds and making those few extra keystrokes.
The
JOIN keyword tells the SQL engine to join with the table that comes
after it. The ON keyword tells the SQL engine what keys are to be
joined. If you omit the ON keyword, you will create a Cartesian
product. A Cartesian product is the product of all the rows in the
first table and all of the rows in the second table. That means
that your output will contain a row for every row in the first table
joined to every row in the second table. As an example, re-run the
query above and remove the ON and WHERE clauses and see how many
rows you get back.
Joining multiple tables (3 or more) is just as easy as joining two.
You need to know which keys you will be using to join and that's
about it.
We
have another table in our sample schema called jobhist. This table
contains a history of our employee's job history. Do a DESC on the
table and see what it looks like:
edb=# desc jobhist
Table "public.jobhist"
Column | Type |
Modifiers
-----------+--------------------------------+-----------
empno | numeric(4,0) | not null
startdate | timestamp(0) without time zone | not null
enddate | timestamp(0) without time zone |
job | character varying(9) |
sal | numeric(7,2) |
comm | numeric(7,2) |
deptno | numeric(2,0) |
chgdesc
| character varying(80) |
Indexes:
"jobhist_pk" PRIMARY KEY,
btree (empno, startdate)
Check
constraints:
"jobhist_date_chk" CHECK (startdate <= enddate)
Foreign-key constraints:
"jobhist_ref_dept_fk"
FOREIGN KEY (deptno)
REFERENCES dept(deptno)
ON DELETE SET NULL
"jobhist_ref_emp_fk" FOREIGN KEY (empno)
REFERENCES emp(empno)
ON DELETE CASCADE
edb=#
The
primary key is the empno and the startdate of that position. Now we
will query the emp, the dept the employee is currently assigned to,
the employee's current job and all of the employee's job history:
SELECT Employees.ename, Departments.dname,
Employees.job,
Job_History.job, Job_History.startdate
FROM emp Employees
JOIN Dept Departments
ON Employees.deptno = Departments.deptno
JOIN jobhist Job_History
ON Employees.empno = Job_History.empno
ORDER BY Employees.ename;
edb=# SELECT Employees.ename, Departments.dname,
Employees.job,
edb-# Job_History.job, Job_History.startdate
edb-# FROM emp Employees
edb-# JOIN Dept Departments
edb-# ON Employees.deptno = Departments.deptno
edb-# JOIN jobhist Job_History
edb-# ON Employees.empno = Job_History.empno
edb-# ORDER BY ename;
ename | dname | job | job |
startdate
--------+------------+-----------+-----------+--------------------
ADAMS | RESEARCH | CLERK | CLERK | 23-MAY-87 00:00:00
ALLEN | SALES | SALESMAN | SALESMAN | 20-FEB-81 00:00:00
BLAKE | SALES | MANAGER | MANAGER | 01-MAY-81 00:00:00
CLARK | ACCOUNTING | MANAGER | MANAGER | 09-JUN-81 00:00:00
FORD
| RESEARCH | ANALYST | ANALYST | 03-DEC-81 00:00:00
JAMES | SALES | CLERK | CLERK | 15-JAN-83 00:00:00
JAMES | SALES | CLERK | CLERK | 03-DEC-81 00:00:00
JONES | RESEARCH | MANAGER | MANAGER | 02-APR-81 00:00:00
KING | ACCOUNTING | PRESIDENT | PRESIDENT | 17-NOV-81 00:00:00
MARTIN | SALES | SALESMAN | SALESMAN | 28-SEP-81 00:00:00
MILLER | ACCOUNTING | CLERK | CLERK | 23-JAN-82 00:00:00
SCOTT | RESEARCH | ANALYST | CLERK | 13-APR-88 00:00:00
SCOTT | RESEARCH | ANALYST | CLERK | 19-APR-87 00:00:00
SCOTT | RESEARCH | ANALYST | ANALYST | 05-MAY-90 00:00:00
SMITH | RESEARCH | CLERK | CLERK | 17-DEC-80 00:00:00
TURNER | SALES | SALESMAN | SALESMAN | 08-SEP-81 00:00:00
WARD | SALES | SALESMAN | SALESMAN | 22-FEB-81 00:00:00
(17
rows)
edb=#
Very
simple stuff, isn't it. Just a JOIN and an ON and you can combine
tables.
If you
notice in the emp table, there is a mgr column. That column is the
empno of the employee's manager. I'm going to restrict our query to
the employee SCOTT and I am going to add the mgr by adding the
emp_clone table.
SELECT Employees.ename, Departments.dname,
Employees.job,
Job_History.job, Job_History.startdate, Clones.ename mgr_name
FROM emp Employees
JOIN Dept Departments
ON Employees.deptno = Departments.deptno
JOIN jobhist Job_History
ON Employees.empno = Job_History.empno
JOIN emp_clone Clones
ON Employees.mgr = Clones.empno
WHERE Employees.ename = 'SCOTT'
ORDER BY Employees.ename;
edb=# SELECT Employees.ename, Departments.dname,
Employees.job,
edb-# Job_History.job, Job_History.startdate, Clones.ename
mgr_name
edb-# FROM emp Employees
edb-# JOIN Dept Departments
edb-# ON Employees.deptno = Departments.deptno
edb-# JOIN jobhist Job_History
edb-# ON Employees.empno = Job_History.empno
edb-# JOIN emp_clone Clones
edb-# ON Employees.mgr = Clones.empno
edb-# WHERE Employees.ename = 'SCOTT'
edb-# ORDER BY Employees.ename;
ename
| dname | job | job | startdate | mgr_name
-------+----------+---------+---------+--------------------+--------
SCOTT | RESEARCH | ANALYST | CLERK | 13-APR-88 00:00:00 | JONES
SCOTT | RESEARCH | ANALYST | ANALYST | 05-MAY-90 00:00:00 | JONES
SCOTT | RESEARCH | ANALYST | CLERK | 19-APR-87 00:00:00 | JONES
(3
rows)
edb=#
From
this we can see that JONES was SCOTT's manager for the three jobs
that SCOTT has held. But how would we get the same information if
we had not created the emp_clone table? We would use what is called
a SELF-JOIN. That is we would join emp to itself.
Take a
look at the modified query:
SELECT Employees.ename, Departments.dname,
Employees.job,
Job_History.job, Job_History.startdate, Clones.ename mgr_name
FROM emp Employees
JOIN Dept Departments
ON Employees.deptno = Departments.deptno
JOIN jobhist Job_History
ON Employees.empno = Job_History.empno
JOIN emp Clones
ON
Employees.mgr = Clones.empno
WHERE Employees.ename = 'SCOTT'
ORDER BY Employees.ename;
edb=# SELECT Employees.ename, Departments.dname,
Employees.job,
edb-# Job_History.job, Job_History.startdate, Clones.ename
mgr_name
edb-# FROM emp Employees
edb-# JOIN Dept Departments
edb-# ON Employees.deptno = Departments.deptno
edb-# JOIN jobhist Job_History
edb-# ON Employees.empno = Job_History.empno
edb-# JOIN emp Clones
edb-# ON Employees.mgr = Clones.empno
edb-# WHERE Employees.ename = 'SCOTT'
edb-# ORDER BY Employees.ename;
ename | dname | job | job | startdate
| mgr_name
-------+----------+---------+---------+--------------------+----------
SCOTT | RESEARCH | ANALYST | CLERK | 13-APR-88 00:00:00 | JONES
SCOTT | RESEARCH | ANALYST | ANALYST | 05-MAY-90 00:00:00 | JONES
SCOTT | RESEARCH | ANALYST | CLERK | 19-APR-87 00:00:00 | JONES
(3
rows)
edb=#
The
only change I made was changing the table emp_clone to the table emp.
Emp is now joined to dept, jobhist and to itself.
If we
run the above query, but change the where clause from ename =
'SCOTT' to ename = 'KING', what happens? We get 0 rows. The reason
that we get 0 rows is because King does not have a manager defined.
When
we define a join, as JOIN/ON, we are really defining INNER JOIN/ON.
An INNER JOIN tells the SQL engine to only return rows where there
is a matching row in each table being joined. Because we joined emp
to itself on mgr, and the mgr column on the KING record is null, we
got 0 rows.
The
way to tell the SQL engine to return rows even if the record is
empty in one of the tables, is to use an OUTER JOIN. Outer join
uses the same syntax as an inner join with the addition of the OUTER
keyword and an equivalency direction (RIGHT or LEFT).
Here I
am re-writing our earlier query but I am changing the join to emp
(Clones) to an outer join:
SELECT Employees.ename, Departments.dname,
Employees.job,
Job_History.job, Job_History.startdate, Clones.ename mgr_name
FROM emp Employees
JOIN Dept Departments
ON Employees.deptno = Departments.deptno
JOIN jobhist Job_History
ON Employees.empno = Job_History.empno
LEFT OUTER JOIN emp_clone Clones
ON Employees.mgr = Clones.empno
WHERE Employees.ename = 'KING'
ORDER BY Employees.ename;
edb=# SELECT Employees.ename, Departments.dname,
Employees.job,
edb-# Job_History.job, Job_History.startdate, Clones.ename
mgr_name
edb-# FROM emp Employees
edb-# JOIN Dept Departments
edb-# ON Employees.deptno = Departments.deptno
edb-# JOIN jobhist Job_History
edb-# ON Employees.empno = Job_History.empno
edb-# LEFT OUTER JOIN emp_clone Clones
edb-# ON Employees.mgr = Clones.empno
edb-# WHERE Employees.ename = 'KING'
edb-# ORDER BY Employees.ename;
ename | dname | job | job |
startdate | mgr_name
-------+------------+-----------+-----------+--------------------+----------
KING | ACCOUNTING | PRESIDENT | PRESIDENT | 17-NOV-81 00:00:00 |
(1
row)
edb=#
Notice
that I used LEFT OUTER JOIN. LEFT OUTER JOIN told the SQL engine
that I wanted to join to emp with the alias Clones and that if the
LEFT table did not have a matching record (in this case Clones.empno),
then return the columns in the record as a NULL.
In the
output, mgr_name is NULL, as expected. If you changed LEFT OUTER
JOIN to RIGHT OUTER JOIN, what happens? Play with the query and
see.
This
is an excerpt from the book "EnterpriseDB:
The Definitive Reference" by Rampant TechPress.