 |
|
Disk Contention
Oracle Tips by Burleson Consulting
|
Once DBWR has been tuned, the Remote DBA needs to
look at disk contention. Disk contention happens when one or more
users attempt to read the same disk at the same time, or in some
cases, access a different disk through the same controller path at the
same time. Spreading Oracle-related files across several platters or
sets of platters--the more the better--prevents this. The new RAID
options don’t relieve the Remote DBA of file placement concerns. You should
be sure that the RAID volumes are properly set. I had one system where
a system administrator set up multiple RAID5 volumes using two disks
for each volume (a hint: the 5 is a meaningful number for RAID5).
The report in Source 13.17 can be used to
monitor relative-fill I/O efficiency--essentially, how many reads are
being performed per request as a percent. The more times the process
has to access the same datafile to get the same information, the less
efficient the datafile is. This could be caused by co-location of
indexes and tables, a poorly ordered table that is scanned frequently,
or having temporary or undo segments in with data or index segments.
SOURCE 13.17 File efficiency report.
REM
REM NAME :FILE_EFF.SQL
REM PURPOSE :GENERATE FILE IO EFFICIENCIES REPORT
REM USE :FROM STATUS_REPORTS.COM
REM Limitations :MUST BE RUN FROM ORACLE Remote DBA ACCOUNT
REM Revisions:
REM Date Modified By Reason For change
REM 10-JUL-1992 M. AULT INITIAL CREATE
REM 07-JUN-1993 M.AULT Added reads to writes, reformatted
REM 23-Jun-1997 M.Ault kcffio went away, rewrote to use
REM existing views/tables
SET PAGES 58 NEWPAGE 0
SET LINES 131
COLUMN eff FORMAT A6 HEADING '% Eff'
COLUMN rw FORMAT 9,999,999 HEADING 'Phys Block|read/writes'
COLUMN ts FORMAT A22 HEADING 'Tablespace Name'
COLUMN name FORMAT A40 HEADING ‘File Name’
START title132 "FILE IO EFFICIENCY"
BREAK ON ts
DEFINE OUTPUT = 'rep_out/&db/file_io.lis'
SPOOL &OUTPUT
SELECT
f.tablespace_name ts,
f.file_name name,
v.phyreads+v.phywrts rw,
TO_CHAR(DECODE(v.phyblkrd,0,null,
ROUND(100*(v.phyrds+v.phywrts)/(v.phyblkrd+v.phyblkwrt),2))) eff
FROM Remote DBA_data_files f, v$filestat v
WHERE f.file_id=v.file#
ORDER BY 1,file#;
SPOOL OFF
PAUSE Press return to continue
This is a cumulative report that gives
information based on I/O since the Oracle instance was started. The
report generated will list physical block reads and efficiency level
(the efficiency number measures the percent of time Oracle asked for
and got the right block the first time, which is a function of type of
table scan and indexing). An example report is shown in Listing 13.18.
LISTING 13.18 File I/O efficiency report.
Date:
11/09/01 Page: 1
Time: 06:25 PM FILE IO EFFICIENCY SYS
aultdb1 database
Tablespace Phys Block
Disk name File Name read/writes %
Eff
----- ----------- -------------------------------------- -----------
-----
C:\O SYSTEM C:\ORACLE\ORADATA\AULTDB1\SYSTEM01.DBF 3,260
63.74
RBS C:\ORACLE\ORADATA\AULTDB1\RBS01.DBF
1,351 100
USERS C:\ORACLE\ORADATA\AULTDB1\USERS01.DBF 40
100
TEMP C:\ORACLE\ORADATA\AULTDB1\TEMP01.DBF 4,252
85.25
TOOLS C:\ORACLE\ORADATA\AULTDB1\TOOLS01.DBF 891
78.9
INDX C:\ORACLE\ORADATA\AULTDB1\INDX01.DBF 40
100
DRSYS C:\ORACLE\ORADATA\AULTDB1\DR01.DBF 40
100
PERFSTAT C:\ORACLE\ORADATA\AULTDB1\PERFSTAT.DBF 1,206
100
TEST_2K C:\ORACLE\ORADATA\AULTDB1\TEST_2K.DBF 42
100
*****
-----------
sum
11,122
9 rows selected.
Points of interest in Listing 13.34 are:
·
In general, the relatively low efficiency of the SYSTEM
and TOOLS areas. This is due to indexes and
tables being mixed together in the
SYSTEM and TOOLS tablespaces. A classic example, on Oracle’s part, of
“Do what we say, not what we do.”
·
Rollback efficiency should always be 100 percent; if
not, someone is using the rollback area for tables.
·
Index tablespace should always show high efficiencies;
if they don't, then either the indexes are bad or someone is using the
index areas for normal tables.
·
An attempt should be made to even-out I/O. In the above
example, too much I/O is being done on C:\; some of these data files
should be spread to other disks.
·
This report shows total I/O for the time frame beginning
with the Oracle system startup. The results could be stored for two or
more dates and times and then subtracted to show the disk I/O for a
selected period of time. STATSPACK should be used for this type of
measurement.
Running this report before and after doing an
application test run will give you an idea of the disk I/O profile for
the application. This profile, combined with information concerning
the maximum I/O supported by each disk or each controller, can help
the Remote DBA determine how best to split out the application's files
between disks.
In a RAID situation, where tracking disk I/O
can be problematic, I suggest using one of the tools from Quest or
Precise that allow you to track I/O down to the spindle, even in RAID
configurations.
Tuning to Prevent Contention
Contention occurs when a number of users
attempt to access the same resource at the same time. This can occur
for any database object but is most noticeable when the contention is
for rollback segments, redo logs, latches, or locks. You may also
experience contention during the processes involved with the
multithreaded server.
To correct contention, you must first realize
that it is occurring. The procedure called in the script shown in
Source 13.18 can be used to monitor for contention. The procedure
called is a part of the Remote DBA_UTILITIES package, described later in this
chapter, in section 13.8, “Using the Remote DBA_UTILITIES Package.” (Note,
this package is over 1,200 lines long; if you want to look at it,
download it from the Wiley Web site.) The report generated by
this script is shown in Listing 13.19.
SOURCE 13.18 The RUNNING_STATS calling script.
REM
REM NAME : DO_CALSTAT.SQL
REM FUNCTION :Generate calculated statisitics report using
REM FUNCTION :just_statistics procedure
REM USE :FROM STATUS.SQL or SQLPLUS
REM Limitations :
REM Revisions:
REM Date Modified By Reason For change
REM 05-MAY-1992 Mike Ault Initial Creation
REM 23-JUN-1997 Mike Ault Updated to V8
REM
SET PAGES 58 NEWPAGE 0
EXECUTE Remote DBA_utilities.running_stats(TRUE);
START title80 "CALCULATED STATISTICS REPORT"
DEFINE output = rep_out\&db\cal_stat.lis
SPOOL &output
SELECT * FROM Remote DBA_temp;
SPOOL OFF
The Remote DBA_UTILITIES package called in Source
13.18 retrieves contention and database health-related statistics,
then calculates other statistics based upon those it retrieves.
LISTING 13.19 Sample output from running
stats listing.
Date:
11/07/01 Page: 1
Time: 03:35 PM CALCULATED STATISTICS REPORT Remote DBAUTIL
aultdb1
database
NAME
VALUE
-------------------------------- ----------
Startup Date: 07-nov-01 15:22:57
0
CUMMULATIVE HIT RATIO
.982755488
sorts (memory)
4891
sorts (disk)
6
redo log space requests 1
redo log space wait time 25
Rollback Wait %
0
Shared Pool Available
56.4824371
Shared SQL%
44.0445754
Shared Pool Used
31.4790573
Data Dictionary Miss Percent 9.64587019
Library Reload %
.039244203
table fetch by rowid
14930
table scans (long tables)
85
table scans (short tables)
694
table fetch continued row
383
Non-Index Lookups Ratio
.109677419
RECURSIVE CALLS PER USER 35.1794083
SYS UNDO HDR WAIT CONTENTION
0
SYS UNDO BLK WAIT CONTENTION
0
UNDO BLK WAIT CONTENTION
0
UNDO HDR WAIT CONTENTION
0
Free List Contention Ratio
0
library cache
.000266132
cache buffers chains
.001655418
cache buffers lru chain
.009571759
redo writing
.038273117
redo allocation .012726424
FUNCTION
6
LIBRARY
16
TRIGGER
22
TABLE
426
SYNONYM
13226
SEQUENCE
59
PROCEDURE
3
PACKAGE BODY
125
PACKAGE
140
OPERATOR 21
LOB
9
VIEW
165
TYPE BODY
23
TYPE
75
JAVA RESOURCE
16
INDEX
478
JAVA CLASS
847
INDEXTYPE
7
TOTAL ALLOCATED MEG
1106.5
TOTAL USED MEG 520.320313
TOTAL SGA SIZE 260407308
Press enter
to continue
Let’s examine the various statistics gathered
by the do_calst2.sql report and see what each means in respect to the
database.
The first value reported is just the startup
time for the instance. Since many of the following statistics are
cumulative from the time the instance was started, it was deemed
necessary to have a timestamp showing the startup date.
Startup Date: 07-nov-01 15:22:57 0
The next value, the cumulative hit ratio, is
used to get a feel for performance over time. The cumulative hit ratio
is the “average” hit ratio since the database was started, thus it may
be low if taken too soon after startup. Usually, you should look for
the cumulative hit ratio to be in the range of 0.85 to 1.0 for an OLTP
system and 0.6 to 1.0 for a batch system. Generally speaking, if this
value is below the applicable range, an increase in the number of
db_block_buffers is indicated.
CUMMULATIVE
HIT RATIO .982755488
The next two statistics deal with how the
database is performing sorts. Essentially, you want to minimize disk
sorts. Disk sorts are reduced by increasing the size of the
sort_area_size initialization parameter. If disk sorts exceed 1 to
2percent of total sorts, tune your sort areas. If disk sorts are
required, look into using direct writes to speed sort processing to
disks.
sorts (memory) 4891
sorts (disk) 6
The next two statistics deal directly with
redo log contention. If redo log space waits become excessive (into
the hundreds), and wait time is excessive, consider tuning log_buffers
and possibly resizing or increasing the number of redo logs. Actually,
you would increase the number of redo log groups or the size of
individual redo log members. All redo logs should be of equal size. It
is suggested that the redo logs be sized such that loss of the online
redo log will lose only X amount of data, where X is the critical
amount of data for your application (say, an hour’s worth, day’s
worth, ten minutes’ worth, etc.).
redo log space requests
25
redo log space wait time
0
The next statistic, “Rollback Wait %,” tells
how often a process had to wait for a rollback segment. If this value
gets near 1.0, consider rollback segment tuning.
Rollback Wait % 0
The next set of statistics deal with
shared-pool health. The “Shared SQL %” statistic shows the ratio of
total SQL areas in the library cache that can’t be reused against the
total number of SQL areas (by memory). If more than 40 to 50 percent
of your SQL is not reusable, and your used area is greater than 40 to
60 meg, you may suffer performance problems due to shared-pool
thrashing. Shared-pool thrashing can be reduced by automated flushing
or by reducing the shared-pool size, depending on the amount of
non-reusable SQL.
Shared Pool Available 56.4824371
Shared SQL% 44.0445754
Shared Pool Used 31.4790573
The next statistic, “Data Dictionary Miss
Percent,” shows the miss percents for attempts to read values from the
dictionary caches and fails. Dictionary cache misses can be quite
expensive, so if this value exceeds 10 percent, you probably need to
increase the size of your shared pool, because the individual
dictionary caches haven’t been tunable since Oracle 6 (unless you
count undocumented parameters).
Data Dictionary Miss Percent 9.64587019
The next value, “Library Reload %,” shows how
often your library cache has to reload objects that have been aged (or
flushed) out of the shared pool. If this value nears 1 percent, look
at resizing the shared pool in most systems. In systems where the
percentage of reusable SQL is low, this value may be high due to
shared pool thrashing.
Library Reload %
.039244203
The next values deal with table fetch
efficiencies: “table fetch by rowid” shows how many rows were fetched
due to direct statement of rowid in the select or via index lookups
resulting in rowid fetches. This value should be used in concert with
the “table fetch continued rows” statistic, to determine if your
system is experiencing a high level of row/block chaining. If the
ratio between the two values (table fetch continued rows/table fetch
by rowid) times100 reaches whole percentage points, examine frequently
searched tables containing VARCHAR2 or numeric values that are
frequently updated. Columns with LOB, LONG, or LONG RAW values can
also result in chaining if their length exceeds the db_block_size. The
table scans (long tables) tell how many long full table scans were
performed. Ideally, table scans (long tables) should be as small as
possible; I try to keep it in a 10:1 ratio with table scans (short
tables). The table scans (short_tables) are scans of tables that are
less than 2 percent of the value DB_CACHE_SIZE; or, for pre-Oracle9i
databases, the product of DB_BLOCK_SIZE and DB_BLOCK_BUFFERS.
table fetch by rowid
14930
table scans (long tables)
85
table scans (short tables)
694
table fetch continued row
383
The next statistic, “Non-Index Lookups Ratio,”
is a ratio between long table scans and short table scans, and is
based on the assumption that a short table scan will most likely be an
index scan. The short table versus long table scan is based on a short
table being less than 10 percent of the total of db_block_buffers
times db_block_size on most systems. The undocumented initialization
parameter, “_small_table_threshold” will override this 10 percent rule
if it is set for your system. Generally speaking, this value should
always be much less than 1.0. If this value even approaches 0.1, look
at the number of full table scans being performed on your system. It
is suggested that this number be trended so that you know what is
normal for your system and can take action if it changes. For a
well-tuned system, this value should remain relatively stable.
Non-Index Lookups Ratio
.109677419
The next statistic, “RECURSIVE CALLS PER
USER,” shows how many times, on average, a user process had to repeat
a request for information. These repeat calls can be due to dynamic
object extension, reparse of SQL statements, or several other
recursive database actions. Generally speaking, you want this number
as small as possible. On well-tuned systems, I have seen this as low
as 7 to 10 per user or lower.
RECURSIVE CALLS PER USER
35.1794083
The next four statistics deal with UNDO
(rollback) segment contention. Usually these numbers will be at or
near 0 for a well-tuned database. Generally, any value approaching 1
indicates the need for rollback segment resizing/tuning.
SYS UNDO HDR WAIT CONTENTION
0
SYS UNDO BLK WAIT CONTENTION
0
UNDO BLK WAIT CONTENTION
0
UNDO HDR WAIT CONTENTION
0
The next statistic, “Free List Contention
Ratio,” is applicable to Oracle Parallel Server only. If the ratio
reaches whole percentages, then look at major tables and rebuild them
with better freelist and freelist group structure. This is applicable
only if you have freelist groups defined in OPS or RAC, or in Oracle8i
or Oracle9i with freelist groups configured in a normal database.
Free List Contention Ratio
0
The next set of statistics varies in number
from 0 (none shown) to as many as there are latch types in the
database. If there is any latch contention for a specific latch, it
will be shown in this listing. Generally, these will be in the .001 or
less range; if they get into the 0.1 to .01 range, consider tuning the
latch that is showing the large contention value, if possible. It is
normal for redo copy or redo allocation latches to show contention on
RS6000 or single CPU machines. The parallel query latches may also
show contention (even Oracle Support can’t tell why; surprise,
surprise).
library cache
.000266132
cache buffers chains
.001655418
cache buffers lru chain
.009571759
redo writing
.038273117
redo allocation
.012726424
The next set of statistics give general
database object counts. You should be cognizant of the usual values
and be aware if any get out of normal bounds.
FUNCTION
6
LIBRARY
16
TRIGGER
22
TABLE
426
SYNONYM 13226
SEQUENCE
59
PROCEDURE
3
PACKAGE BODY
125
PACKAGE
140
OPERATOR
21
LOB
9
VIEW
165
TYPE BODY
23
TYPE
75
JAVA RESOURCE
16
INDEX
478
JAVA CLASS
847
INDEXTYPE
7
The next two statistics deal with how
allocated filespace is being utilized by the database; they measure
the total sum of datafile sizes against the total size of all
allocated extents in the database. If you see that your used space is
exceeding 90 percent of the total space allocated, then look at a more
detailed report of space usage (which follows later in the report
list) to see which tablespaces may need additional files allocated.
TOTAL ALLOCATED MEG
1106.5
TOTAL USED MEG
520.320313
The final statistic, “TOTAL SGA SIZE,”
calculates a total size of the SGA based on the V$SGA view. This is
strictly an informational statistic.
TOTAL SGA SIZE
260407308
There may be an additional set of statistics
that deal with database block contention. The “waits” statistics tell
how many waits occurred for a given block type, and the “time”
statistics tell how long the waits took for each type of block. The
data statistics deal with database blocks; the undo statistics deal
with rollback segment blocks; and if you get system statistics, they
deal with the system rollback segment. If data block waits or data
block header waits exceed 100, then look at increasing the number of
data base block buffers (note that hit ratio can be satisfactory even
with significant contention, so both statistics should be used to
determine database block buffer health). If undo block waits are
indicated, increase the size of the rollback segment extents; if undo
header waits are indicated, increase the number of rollback segment
extents. If system block or system header waits are indicated, there
may be users other than SYS or SYSTEM assigned to the SYSTEM
tablespace.
data block waits 396
undo header time 0
data block time 324
undo header waits 8
See
Code Depot for Full Scripts
 |
This is an excerpt
from Mike Ault, bestselling author of "Oracle
10g Grid and Real Application Clusters".
You can buy it direct from the publisher for 30%-off and get
instant access to the code depot of Oracle tuning scripts. |
 |
Expert Remote DBA
BC is America's oldest and largest Remote DBA Oracle support
provider. Get real Remote DBA experts, call
BC Remote DBA today. |
 |
|