I'm Feeling SCSI About Disk Performance, What Then?
Oracle Tips by Mike Ault
Sorry for the bad pun
(well, actually I’m not) what can be done with SCSI interfaces? To
tell you the truth, not a lot, however, there are some items which you
may find useful. Most interfaces will buffer commands and issue them
in batches, for example, most SCSI interfaces use a 32 command buffer
that stacks commands until it has 32 of them and then fires them off.
This can be reset in LINUX using options in the modules.conf file for
the SCSI interface module.
In other UNIX flavors
there are many settings which can be changed, but an exact
understanding of the interface and its limitations as well as current
system loads must be had before changing any of the SCSI settings. If
you feel you need to have them checked, ask your SA.
Disk Stress In a
In summary, to determine
if a disk or array is undergoing IO related stress, perform an IO
balance and an IO timing analysis. If the IO timing analysis shows
excessive read or write times investigate the causes. Generally
speaking, poor IO timings will result when:
A single disk exceeds
110 – 150 IO per second
An entire multi-read
capable RAID10 array exceeds #MIRRORS*#DPM*110 IO’s per second
non-multi-read capable RAID10 array exceeds #DPM*110 IO’s per second
If a RAID5 array
exceeds (#DISKS-1)*66 IO’s per second then it will probably
experience poor IO timings
Make sure Oracle is
using direct IO at both the OS and Oracle levels
Make sure your disk
interface is tuned to perform optimally
*DPM=Disks per mirror
Seeing stress from
the Oracle Side
Disk stress will show up
on the Oracle side as excessive read or write times. Filesystem stress
is shown by calculating the IO timings as shown in Figure 16.
Calculate IO timing values for datafiles
col name format a65
col READTIM/PHYRDS heading 'Avg|Read Time' format 9,999.999
col WRITETIM/PHYWRTS heading 'Avg|Write Time' format 9,999.999
set lines 132 pages 45
start title132 'IO Timing Analysis'
from v$filestat f, v$datafile d
and phyrds>0 and phywrts>0
from v$tempstat a, v$tempfile b
and phyrds>0 and phywrts>0
order by 5 desc
Figure 16: IO Timing Report
An example of the output
from Figure 16 is shown in Figure 17:
11/20/05 Page: 1
Time: 11:12 AM IO Timing Analysis
NAME PHYRDS PHYWRTS READTIM/PHYRDS WRITETIM/PHYWRTS
----- -------------- ---------- ------- --------------
13 /dev/raw/raw19 77751 102092 76.8958599
33 /dev/raw/raw35 32948 52764 65.7045041
7 /dev/raw/raw90 245854 556242 57.0748615
54 /dev/raw/raw84 208916 207539 54.5494409
40 /dev/raw/raw38 4743 27065 38.4469745
15 /dev/raw/raw41 3850 7216 35.6272727
12 /dev/raw/raw4 323691 481471 32.5510193
16 /dev/raw/raw50 10917 46483 31.9372538
18 /dev/raw/raw24 3684 4909 30.8045603
23 /dev/raw/raw58 63517 78160 29.8442779
5 /dev/raw/raw91 102783 94639 29.1871516
Figure 17: Example IO Timing Report
As you can see from
Figure 17 we are looking at an example report from a RAW configuration
using single disks. Notice how both read and write times exceed even
the rather large good practice limits of 10-20 milliseconds for a disk
read. However in my experience for reads you should not exceed 5
milliseconds and usually with modern buffered reads, 1-2 milliseconds.
Oracle is more tolerant for write delays since it uses a delayed write
mechanism, so 10-20 milliseconds on writes will normally not cause
significant Oracle waits, however, the smaller you can get read and
write times, the better!
In UNIX or LINUX you
have multiple filesystem options, RAW, JFS, ext2, ext3, reiserFS,
OCFS. You need to use the best performing filesystem usually RAW, ext3
or resierFS. Oracle OCFS is also viable.
RAW is probably the
simplest filesystem to understand. In Unix systems, a disk can be
configured to contain a single contiguous chunk of space, or it can be
configured to contain multiple separate chunks of space. Each chunk of
space is called a partition or section. Typically a partition
would be formatted to contain a filesystem, allowing a hierchical
structure of directories and files to be created. A partition that
does not contain a filesystem is called a raw partition.. A
database extent stored on a raw partition is called a raw extent.
The primary advantage to
using raw extents is the possibility of increased performance compared
to extents stored in Unix files. The performance benefits occur
File system overhead
and address translation is eliminated. Database block addresses map
directly to raw partition addresses.
transfer data directly from the disk controller to shared memory and
write operations transfer directly from shared memory to the disk
controller. The Unix buffer pool is not used, avoiding the necessity
to copy data from a Unix page buffer to shared memory and vice
Although they may provide
increased performance, raw extents have several major disadvantages.
You should not take the decision to use them lightly. Among the
Raw disk partitions are
cumbersome to configure and manage. You have to keep track of what
is on each raw partition yourself. You can easily make mistakes and
destroy the contents of a raw partition.
The number of raw
partitions available on a disk may be quite limited. For example, on
Solaris 2.5, a disk can have at most 8 partitions.
Moving raw partitions
from one disk to another may be difficult or impossible.
Raw partition sizes are
fixed. Allocating more space to a partition requires backing up all
the partitions on a disk, designing a new partition layout, and
restoring all saved data.
You cannot use the same
operating system backup utilities to make backups of files and raw
You may get worse
performance with raw devices than you do with files. The performance
difference is highly dependent on the operating system's
implementation of the file system. Many modern Unix systems, like
Ditical Unix, AIX 4.2, HP-UX 10, Solaris 2.6, and others have highly
advanced filesystem implementations that include many performance
optimizations. Older Unix systems, such as Unix System V Release 4
are somewhat less advanced and raw partitions may provide better
performance on these systems.
You may not be able to
use logical volume managers with raw partitions. Not all Unix
systems support logical volumes composed of raw partitions.
SEE CODE DEPOT FOR FULL SCRIPTS
For more information on this topic, I recommend Don
Burleson's latest book "Oracle
Tuning: The Definitive Reference".
You can buy it direct from the publisher for 50%-off and get
instant access to the code depot of Oracle tuning scripts:
Expert Remote DBA
BC is America's oldest and largest Remote DBA Oracle support
provider. Get real Remote DBA experts, call
BC Remote DBA today.