 |
|
EnterpriseDB: Aggregation
Oracle Tips by Burleson Consulting
|
Aggregation is combining multiple rows into a single result.
If we counted many rows and returned a single row of the count, that
would be an aggregate.
If we
were to dig into my pocket again, we would see that: my left = (.25
* 2) + (.10 * 2) + (.01 * 3) = 50 + 20 + 3 = 73 cents or $0.73 and
my right = (.25 * 1) + (.10 * 3) + (.05 * 2) = 25 + 30 + 10 = 65
cents or $0.65.
The
sum of left and right = 73 + 65 = 138. The sum of my left
pocket and my right pocket is 138. Let's prove that by running
a query:
SELECT SUM(amount) sum_amount, COUNT(amount)
cnt_amount, 'LEFT' tb_name
FROM LEFT_TABLE
UNION
SELECT SUM(amount) sum_amount, COUNT(amount) cnt_amount, 'RIGHT'
tb_name
FROM RIGHT_TABLE;
edb=# SELECT SUM(amount) sum_amount, COUNT(amount)
cnt_amount, 'LEFT' tb_name
edb-# FROM LEFT_TABLE
edb-# UNION
edb-# SELECT SUM(amount) sum_amount, COUNT(amount) cnt_amount,
'RIGHT' tb_name
edb-# FROM RIGHT_TABLE;
sum_amount | cnt_amount | tb_name
------------+------------+---------
0.65 |
6 | RIGHT
0.73 |
7 | LEFT
(2 rows)
edb=#
That
gives me a row for each table but how do I get a final sum?
There are several ways to do it but I am going to do it this way:
SELECT SUM(sum_amount) sum_amount, SUM(cnt_amount)
cnt_amount, tb_name
FROM (
SELECT SUM(amount) sum_amount, COUNT(amount) cnt_amount,
'LEFT' tb_name
FROM LEFT_TABLE
UNION
SELECT SUM(amount) sum_amount, COUNT(amount) cnt_amount,
'RIGHT' tb_name
FROM RIGHT_TABLE ) as all_records;
If I
run this, I get an error:
ERROR:
column "all_records.tb_name" must appear in the GROUP BY clause or
be used in an aggregate function
The
aggregate functions require an additional keyword if your select
contains any columns not being aggregated. Think of it this
way, if we are trying to get a single row, which tb_name would we
expect to see in our records?
We can
change our query to this:
SELECT SUM(sum_amount) sum_amount, SUM(cnt_amount)
cnt_amount, tb_name
FROM (
SELECT SUM(amount) sum_amount, COUNT(amount) cnt_amount,
'LEFT' tb_name
FROM LEFT_TABLE
UNION
SELECT SUM(amount) sum_amount, COUNT(amount) cnt_amount,
'RIGHT' tb_name
FROM RIGHT_TABLE ) as all_records
GROUP BY tb_name;
edb=# SELECT SUM(sum_amount) sum_amount,
COUNT(cnt_amount) cnt_amount, tb_name
edb-# FROM (
edb(# SELECT SUM(amount) sum_amount,
edb(#
COUNT(amount) cnt_amount,
edb(#
'LEFT' tb_name
edb(# FROM LEFT_TABLE
edb(# UNION
edb(# SELECT SUM(amount) sum_amount,
edb(#
COUNT(amount) cnt_amount,
edb(#
'RIGHT' tb_name
edb(# FROM RIGHT_TABLE ) as
all_records
edb-# GROUP BY tb_name;
sum_amount | cnt_amount | tb_name
------------+------------+---------
0.65 |
6 | RIGHT
0.73 |
7 | LEFT
(2 rows)
edb=#
The
GROUP BY tells the SQL engine to do a break on each change of
tb_name and return a row for each distinct tb_name. When you
do a GROUP BY, you are implicitly doing a sort.
I am
now doing a SUM of the COUNT. The inner query is counting for
us. If I were to count again, I would get 2 as my result.
Not what I'm looking for.
Of
course, even though that works, we're back to where we were with our
first query. If you remove tb_name from the query (from the
select clause) and remove the GROUP BY altogether, the result will
give us what we've been looking for.
SELECT SUM(sum_amount) sum_amount, SUM(cnt_amount)
cnt_amount
FROM (
SELECT SUM(amount) sum_amount, COUNT(amount) cnt_amount,
'LEFT' tb_name
FROM LEFT_TABLE
UNION
SELECT SUM(amount) sum_amount, COUNT(amount) cnt_amount,
'RIGHT' tb_name
FROM RIGHT_TABLE ) as all_records;
edb=# SELECT SUM(sum_amount) sum_amount,
SUM(cnt_amount) cnt_amount
edb-# FROM (
edb(# SELECT SUM(amount) sum_amount,
COUNT(amount) cnt_amount, 'LEFT' tb_name
edb(# FROM LEFT_TABLE
edb(# UNION
edb(# SELECT SUM(amount) sum_amount,
COUNT(amount) cnt_amount, 'RIGHT' tb_name
edb(# FROM RIGHT_TABLE ) as
all_records;
sum_amount | cnt_amount
------------+------------
1.38 |
13
(1 row)
edb=#
Can
you modify the inner query so that you get the same results but you
don't use a UNION?
The
last SELECT keyword that I am going to show you is the HAVING
keyword. HAVING is like the WHERE clause but it is applied to
the GROUP BY instead of to the select.
For
example, if I wanted to do my earlier query but only return the row
that was equal to 7, I could rewrite it this way:
SELECT SUM(sum_amount) sum_amount, SUM(cnt_amount)
cnt_amount, tb_name
FROM (
SELECT SUM(amount) sum_amount, COUNT(amount) cnt_amount,
'LEFT' tb_name
FROM LEFT_TABLE
UNION
SELECT SUM(amount) sum_amount, COUNT(amount) cnt_amount,
'RIGHT' tb_name
FROM RIGHT_TABLE ) as all_records
GROUP BY tb_name
HAVING SUM(cnt_amount) = 7;
edb=# SELECT SUM(sum_amount) sum_amount,
COUNT(cnt_amount) cnt_amount, tb_name
edb-# FROM (
edb(# SELECT SUM(amount) sum_amount,
edb(#
COUNT(amount) cnt_amount,
edb(#
'LEFT' tb_name
edb(# FROM LEFT_TABLE
edb(# UNION
edb(# SELECT SUM(amount) sum_amount,
edb(#
COUNT(amount) cnt_amount,
edb(#
'RIGHT' tb_name
edb(# FROM RIGHT_TABLE ) as
all_records
edb-# GROUP BY tb_name
edb-# HAVING SUM(cnt_amount) = 7;
sum_amount | cnt_amount | tb_name
------------+------------+---------
0.73 |
7 | LEFT
(1 row)
edb=#
The
SQL engine ran the SELECT and FROM to create the results set, it
then aggregated the results using the GROUP BY and then it applied
the HAVING to the final result set. For performance reasons,
if you have an item that you can restrict via the WHERE or via the
HAVING, it is usually a better practice to restrict it in the WHERE
clause.
That's
it for the SELECT statement. Now we'll start manipulating data
and we're going to begin with the INSERT command.
This
is an excerpt from the book "EnterpriseDB:
The Definitive Reference" by Rampant TechPress.