 |
|
Tuning Oracle SQL Sorting
Oracle Tips by Burleson Consulting |
As a small but very important component of
SQL syntax, sorting is a frequently overlooked aspect of Oracle
tuning. In general, an Oracle database will automatically perform
sorting operations on row data as requested by a create index
or a SQL ORDER BY or GROUP BY statement. In general, Oracle sorting
occurs under the following circumstances:
-
SQL using the ORDER BY clause
-
SQL using the GROUP BY clause
-
When an index is created
-
When a MERGE SORT is invoked by the SQL
optimizer because inadequate indexes exist for a table join
At the time a session is established with
Oracle, a private sort area is allocated in RAM memory for use by
the session for sorting. If the connection is via a dedicated
connection, a Program Global Area (PGA) is allocated according to
the sort_area_size init.ora parameter. For connections
via the multithreaded server, sort space is allocated in the
large_pool. Unfortunately, the amount of memory used in sorting
must be the same for all sessions, and it is not possible to add
additional sort areas for tasks that require large sort operations.
Therefore, the designer must strike a balance between allocating
enough sort area to avoid disk sorts for the large sorting tasks,
and not overwhelming the server RAM memory. We must also remember
that the extra sort area will be allocated and not used by tasks
that do not require intensive sorting. Of course, sorts that cannot
fit into the sort_area_size will be paged out into the TEMP
tablespaces for a disk sort. Disk sorts are about 14,000 times
slower than memory sorts.
As I noted, the size of the private sort area
is determined by the sort_area_size init.ora
parameter. The size for each individual sort is specified by the
sort_area_retained_size init.ora parameter. Whenever a sort
cannot be completed within the assigned space, a disk sort is
invoked using the temporary tablespace for the Oracle instance.
Disk sorts are expensive for several reasons.
First, they are extremely slow when compared to an in-memory sort.
Also, a disk sort consumes resources in the temporary tablespace.
Oracle must also allocate buffer pool blocks to hold the blocks in
the temporary tablespace. In-memory sorts are always preferable to
disk sorts, and disk sorts will surely slow down an individual task
as well as impact concurrent tasks on the Oracle instance. Also,
excessive disk sorting will cause a high value for free buffer
waits, paging other tasks’ data blocks out of the buffer.
The following STATSPACK query uses the
stats$sysstat table. From this table we can get an accurate
picture of memory and disk sorts.
rpt_sorts_alert.sql
set pages
9999;
column mydate heading 'Yr. Mo Dy Hr.' format a16
column sorts_memory format 999,999,999
column sorts_disk format 999,999,999
column ratio format .99999
select
to_char(snap_time,'yyyy-mm-dd HH24') mydate,
newmem.value-oldmem.value sorts_memory,
newdsk.value-olddsk.value sorts_disk,
((newdsk.value-olddsk.value)/(newmem.value-oldmem.value)) ratio
from
perfstat.stats$sysstat oldmem,
perfstat.stats$sysstat newmem,
perfstat.stats$sysstat newdsk,
perfstat.stats$sysstat olddsk,
perfstat.stats$snapshot sn
where
newdsk.snap_id = sn.snap_id
and
olddsk.snap_id = sn.snap_id-1
and
newmem.snap_id = sn.snap_id
and
oldmem.snap_id = sn.snap_id-1
and
oldmem.name = 'sorts (memory)'
and
newmem.name = 'sorts (memory)'
and
olddsk.name = 'sorts (disk)'
and
newdsk.name = 'sorts (disk)'
and
newmem.value-oldmem.value > 0
and
newdsk.value-olddsk.value > 100
;
Here is the output from the script. Here, you can
clearly see the number of memory sorts and disk sorts, and the ratio
of disk to memory sorts.
Yr. Mo Dy
Hr. SORTS_MEMORY SORTS_DISK RATIO
---------------- ------------ ------------ -------
2000-12-20 12 13,166 166 .01261
2000-12-20 16 25,694 223 .00868
2000-12-21 10 99,183 215 .00217
2000-12-21 15 13,662 130 .00952
2000-12-21 16 17,004 192 .01129
2000-12-22 10 18,900 141 .00746
2000-12-22 11 19,487 131 .00672
2000-12-26 12 12,502 147 .01176
2000-12-27 13 20,338 118 .00580
2000-12-27 18 11,032 119 .01079
2000-12-28 16 16,514 205 .01241
2000-12-29 10 17,327 242 .01397
2000-12-29 16 50,874 167 .00328
2001-01-02 08 15,574 108 .00693
2001-01-02 10 39,052 136 .00348
2001-01-03 11 13,193 153 .01160
2001-01-03 13 19,901 104 .00523
2001-01-03 15 19,929 130 .00652
This report can be changed to send an alert when the
number of disk sorts exceeds a predefined threshold, and we can also
modify it to plot average sorts by hour of the day and day of the
week. The script that follows computes average sorts, ordered by
hour of the day:
rpt_avg_sorts_hr.sql
set pages
9999;
column sorts_memory format 999,999,999
column sorts_disk format 999,999,999
column ratio format .99999
select
to_char(snap_time,'HH24'),
avg(newmem.value-oldmem.value) sorts_memory,
avg(newdsk.value-olddsk.value) sorts_disk
from
perfstat.stats$sysstat oldmem,
perfstat.stats$sysstat newmem,
perfstat.stats$sysstat newdsk,
perfstat.stats$sysstat olddsk,
perfstat.stats$snapshot sn
where
newdsk.snap_id = sn.snap_id
and
olddsk.snap_id = sn.snap_id-1
and
newmem.snap_id = sn.snap_id
and
oldmem.snap_id = sn.snap_id-1
and
oldmem.name = 'sorts (memory)'
and
newmem.name = 'sorts (memory)'
and
olddsk.name = 'sorts (disk)'
and
newdsk.name = 'sorts (disk)'
and
newmem.value-oldmem.value > 0
group by
to_char(snap_time,'HH24')
;
Here is the output from the script. We can now take
this data and create a graph in a spreadsheet.
TO
SORTS_MEMORY SORTS_DISK
-- ------------ ------------
00 18,855 11
01 19,546 15
02 10,128 5
03 6,503 8
04 10,410 4
05 8,920 5
06 8,302 7
07 9,124 27
08 13,492 71
09 19,449 55
10 19,812 106
11 17,332 78
12 20,566 76
13 17,130 46
14 19,071 61
15 19,494 68
16 20,701 79
17 19,478 44
18 23,364 29
19 13,626 20
20 11,937 17
21 8,467 7
22 8,432 10
23 11,587 10
Figure 5-2 shows the plot from the output. Here you
see a typical increase in sort activity during the online period of
the day. Sorts rise about 8:00
a.m. and then go down
after 6:00 p.m.
Figure 2: Average
memory sorts by hour of the day
Now, let’s run the script to compute the
averages by the day of the week.
rpt_avg_sorts_dy.sql
set pages
9999;
column sorts_memory format 999,999,999
column sorts_disk format 999,999,999
column ratio format .99999
select
to_char(snap_time,'day') DAY,
avg(newmem.value-oldmem.value) sorts_memory,
avg(newdsk.value-olddsk.value) sorts_disk
from
perfstat.stats$sysstat oldmem,
perfstat.stats$sysstat newmem,
perfstat.stats$sysstat newdsk,
perfstat.stats$sysstat olddsk,
perfstat.stats$snapshot sn
where
newdsk.snap_id = sn.snap_id
and
olddsk.snap_id = sn.snap_id-1
and
newmem.snap_id = sn.snap_id
and
oldmem.snap_id = sn.snap_id-1
and
oldmem.name = 'sorts (memory)'
and
newmem.name = 'sorts (memory)'
and
olddsk.name = 'sorts (disk)'
and
newdsk.name = 'sorts (disk)'
and
newmem.value-oldmem.value > 0
group by
to_char(snap_time,'day')
;
Again, we will take the result set and plot it in a
chart. This time, let’s plot the disk sorts.
DAY
SORTS_MEMORY SORTS_DISK
--------- ------------ ------------
friday 12,545 54
monday 14,352 29
saturday 12,430 2
sunday 13,807 4
thursday 17,042 47
tuesday 15,172 78
wednesday 14,650 43
Figure 5-3 shows the graph. In this database,
the activity pattern on Tuesday shows a large number of disk sorts,
with another smaller spike on Thursdays. For this database, the Remote DBA
may want to pay careful attention to the TEMP tablespaces on these
days, and perhaps issue a alter tablespace TEMP coalesce; to
create continuous extents in the TEMP tablespace.
Figure 3: Average disk sorts by day of the
week
At the risk of being redundant, I need to
reemphasize that the single most important factor in the performance
of any Oracle database is the minimization of disk I/O. Hence, the
tuning of the Oracle sorting remains one of the most important
considerations in the tuning of any Oracle database.
Now, let’s turn our attention to the Oracle
rollback segments and see how we can use STATSPACK to monitor,
identify, and tune your high impact SQL statements.
This is an excerpt from "Oracle High-Performance SQL Tuning" by
Donald K. Burleson, published by Oracle Press.