Disk Speed and Size Selection
Oracle Tips by Mike Ault
This all points to the fact that in order to get the
maximum performance from your disk system you must understand the IO
characteristics (the profile) of your database system, be it Oracle,
SQL Server, Informix, UDB or MySQL. You must tune your disk
architecture to support the expected IO profile and must tune the
database system to take advantage of the disk architecture. For
example, an Oracle database has different IO characteristics depending
on whether it is reading or writing data and what type of read or
write it is doing. Other databases have fixed read/write sizes.
You must determine the IO profile for your database
and then use it to determine the maximum and minimum IO size. The IO
profile will tell you what percentage of IO is large IO and what
percentage is small IO; it will also give you the expected IO rate in
IO/second.
Once you have the IO per second you can determine the
IO capacity (number of drives) needed to support your database.
The first rule of tuning your disk system is:
Size first for IO capacity, then for volume.
Some back-of-the-envelope calculations for the number
of spindles needed to support the IO rate are:
RAID10 with active read/write to all mirrors:
MAX(CEILING(IOR/(NSIOR*M),M),2*M)
Where:
· IOR is the expected maximum IO rate in IO/sec
· NSIOR is the average non-sequential IO rate of the disks
in IO/sec (range of 90-100 for RAID10)
· M is the number of mirrors
(The maximum of the IO rate divided by the average
non-sequential IO rate per disk times the number of mirrors, rounded
up to the nearest multiple of M, or 2*M)
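As a rough illustration, the RAID10 formula above can be transcribed into a few lines of Python. This is only a sketch: the function name and the sample figures (1000 IO/sec, 100 IO/sec per disk) are made up for the example, and CEILING is assumed to behave like the spreadsheet function that rounds up to a multiple of its second argument.

```python
import math


def raid10_spindles(ior, nsior, mirrors=2):
    """Estimate RAID10 spindles: MAX(CEILING(IOR/(NSIOR*M), M), 2*M).

    ior     -- expected maximum IO rate in IO/sec
    nsior   -- average non-sequential IO rate per disk in IO/sec
    mirrors -- number of mirrors (M)
    """
    raw = ior / (nsior * mirrors)
    # CEILING(x, M): round up to the next multiple of M (assumed semantics)
    rounded = math.ceil(raw / mirrors) * mirrors
    # Never go below the minimum configuration of 2*M disks
    return max(rounded, 2 * mirrors)


# Hypothetical workload: 1000 IO/sec on disks that sustain 100 IO/sec
print(raid10_spindles(1000, 100, 2))  # 6
```

For small workloads the 2*M floor takes over, so a 100 IO/sec requirement with two mirrors still yields 4 disks.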
RAID5 assuming 1 parity disk:
MAX((IOR/CNSIOR)+1,3)
Where:
· IOR is the expected maximum IO rate in IO/sec
· CNSIOR is the corrected average non-sequential IO rate
of the disks in IO/sec (range of 60-90 for RAID5)
(The maximum of the IO rate divided by the average
non-sequential IO rate per disk corrected for RAID5 penalties plus 1
disk for the parity disk, or 3 disks)
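The RAID5 estimate can be sketched the same way. The function name and sample numbers below are illustrative only, and rounding the data-disk count up to a whole disk is an assumption added here, since the formula as written can produce a fraction.

```python
import math


def raid5_spindles(ior, cnsior):
    """Estimate RAID5 spindles: MAX((IOR/CNSIOR)+1, 3).

    ior    -- expected maximum IO rate in IO/sec
    cnsior -- corrected non-sequential IO rate per disk in IO/sec
    """
    # Assumption: round the data disks up, since you cannot buy part of a disk
    data_disks = math.ceil(ior / cnsior)
    # One extra disk for parity, and never fewer than 3 disks in total
    return max(data_disks + 1, 3)


# Hypothetical workload: 1000 IO/sec on disks corrected to 75 IO/sec
print(raid5_spindles(1000, 75))  # 15
```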
The correction for the non-sequential IO rate for RAID5
is due to the up to 400% penalty on writes (writes take 4 times longer
than reads on the same drive). In some cases on RAID5 I have seen this
go as high as 6400% (writes take 64 times as long as reads for the
same file) when combined with other problems such as fragmentation.
A case in point: early RAID architectures utilized the
"stripe shallow and wide" mindset, where files were broken into small
pieces and spread over a large number of disks. For example, stripe
units per disk as small as 8K were common. Many systems read in IO
sizes of 64K or larger. This means that 8 disks of the RAID set were
required to satisfy a single IO request; if there were fewer than 8
disks in the set, disks would have to undergo 2 or more IOs to satisfy
the request. This sounds fine if you are talking about a single user
wanting to read a large file from a large group of disks very quickly.
However, what happens when you have 10 or 100 or 1000 concurrent users
all wanting to do the same thing?
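The arithmetic behind the 8K stripe unit example can be sketched as follows. The helper names are hypothetical, and the sketch assumes requests are aligned to stripe unit boundaries, which real workloads will not always be.

```python
import math


def chunks_per_request(io_size_kb, stripe_unit_kb):
    """Number of stripe units a single IO request is split into."""
    return math.ceil(io_size_kb / stripe_unit_kb)


def ios_per_disk(io_size_kb, stripe_unit_kb, disks_in_set):
    """Worst-case IOs any one disk must perform for a single request."""
    chunks = chunks_per_request(io_size_kb, stripe_unit_kb)
    return math.ceil(chunks / disks_in_set)


# A 64K read against an 8K stripe unit is split into 8 pieces
print(chunks_per_request(64, 8))   # 8
# With only 4 disks in the set, each disk has to do 2 IOs
print(ios_per_disk(64, 8, 4))      # 2
# With a 64K stripe unit the same read is a single IO on a single disk
print(chunks_per_request(64, 64))  # 1
```

Every disk tied up by one request is unavailable to other users for that interval, which is why the small stripe unit hurts as concurrency grows.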
Tune for Concurrency
This problem with concurrent access and RAID arrays is
one of the most prevalent in the industry. The ubiquitous IO wait is
usually the predominant wait event in any database system simply
because IO to memory is in the nanosecond range while IO to disk is in
the millisecond range. When you add in blocked access due to
multi-disk IO requests, you get a snowball effect that can cripple
your IO subsystem.
Array manufacturers have begun to recognize this
concurrent access problem and have increased the base stripe unit per
disk to 64K, matching the IO unit for many systems. Of course, systems
such as Sun and Windows now utilize maximum IO sizes of 1 megabyte
or larger, so again the array manufacturers are playing catch-up to
the server manufacturers.
So what is our second rule of tuning disks? Based on
the above information the rule is:
Always ensure that the primary IO size for your
database system is matched to the IO size of the disk array system.
Of course the inverse also holds true:
Always match the stripe unit per disk to the
expected majority IO request from your (database) application.
In the 1990s Paul Chen of the University of California, Berkeley
computer center published a series of papers on tuning disk array
stripe unit size based on expected concurrency. In these papers,
Mr. Chen and his associates determined that the IO speed (as
measured by average seek time) and IO rate (as measured in megabytes
per second) for a disk determine the stripe size for performance in
an array, even when the number of concurrent accesses is not known.
Three formulae were derived from these papers:
For non-RAID5 arrays when concurrency is known:
SU = (S*APT*DTR*(CON-1)*1.024)+.5K
Where:
· SU - Striping unit per disk
· S - Concurrency slope coefficient (~.25)
· APT - Average positioning time (milliseconds)
· DTR - Data transfer rate (Megabyte/sec)
· CON - Number of concurrent users
· 1.024 = 1s/1000ms*1024K/1M (conversion factors for units)
So for a drive that has an average seek time of 5.6 ms
and a transfer rate of 20 Mbyte/second the calculated stripe unit for
a 20 concurrent user base would be:
(.25*5.6*20*(19)*1.024)+.5 = 545K (or ~512K)
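The known-concurrency calculation can be reproduced with a short Python sketch. The function names are invented for the example, and picking the nearest power of two on a logarithmic scale is an assumption about how the ~512K figure was chosen.

```python
import math

UNIT_FACTOR = 1.024  # 1s/1000ms * 1024K/1M


def stripe_unit_known_concurrency(apt_ms, dtr_mb_s, concurrency, slope=0.25):
    """SU in KB: (S*APT*DTR*(CON-1)*1.024) + 0.5K."""
    return slope * apt_ms * dtr_mb_s * (concurrency - 1) * UNIT_FACTOR + 0.5


def nearest_power_of_two(kb):
    """Assumed rounding rule: closest power of two on a log scale."""
    return 2 ** round(math.log2(kb))


su = stripe_unit_known_concurrency(5.6, 20, 20)
print(round(su), nearest_power_of_two(su))  # 545 512
```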
For a system where you don't know the concurrency, the
calculation becomes:
SU = 2/3*APT*DTR*1.024
So for the same drive:
2/3*5.6*20*1.024 = 76.46K, so rounding up gives 128K and rounding down gives 64K
And from Chen's final paper, a formula for RAID5
arrays is:
SU = 0.5*APT*DTR*1.024
So for the same drive:
0.5*5.6*20*1.024 = 57.34K (rounding up, 64K)
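The two simplified calculations can be sketched together. The 2/3 and 0.5 coefficients are read straight off the worked examples above; the function names and the power-of-two helper are assumptions made for illustration.

```python
import math

UNIT_FACTOR = 1.024  # 1s/1000ms * 1024K/1M


def stripe_unit_unknown_concurrency(apt_ms, dtr_mb_s):
    """SU in KB when concurrency is not known: 2/3*APT*DTR*1.024."""
    return (2 / 3) * apt_ms * dtr_mb_s * UNIT_FACTOR


def stripe_unit_raid5(apt_ms, dtr_mb_s):
    """SU in KB for RAID5, per the worked example: 0.5*APT*DTR*1.024."""
    return 0.5 * apt_ms * dtr_mb_s * UNIT_FACTOR


def power_of_two_bounds(kb):
    """Return the powers of two just below and just above a size in KB."""
    exponent = math.log2(kb)
    return 2 ** math.floor(exponent), 2 ** math.ceil(exponent)


unknown = stripe_unit_unknown_concurrency(5.6, 20)
raid5 = stripe_unit_raid5(5.6, 20)
print(round(unknown, 2), power_of_two_bounds(unknown))  # 76.46 (64, 128)
print(round(raid5, 2), power_of_two_bounds(raid5))      # 57.34 (32, 64)
```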
The values for average access time and transfer rate
used in these examples are actually fairly low when compared to more
advanced drives, so the stripe sizes shown above are probably low by a
factor of 2 or more. I say this because while average seek
times drop, the transfer rate increases. For example, on an Ultra3 SCSI
15K drive the spec for average seek may drop to 4.7 ms, while the
transfer rate leaps to 70 Mbyte per second. So the overall value of
the combined factor (APT*DTR) goes from 112 to 329, roughly 2.9 times
the original value (a 194% increase).
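A quick check of the combined factor for the two drives, as a sketch; the function name is made up for the example, and the drive figures are the ones quoted in the text.

```python
def combined_factor(apt_ms, dtr_mb_s):
    """Product of average positioning time (ms) and transfer rate (MB/sec)."""
    return apt_ms * dtr_mb_s


older = combined_factor(5.6, 20)   # drive used in the examples above
newer = combined_factor(4.7, 70)   # Ultra3 SCSI 15K drive from the text
ratio = newer / older

print(f"{older:.0f} -> {newer:.0f}, ratio {ratio:.2f}x")
```

Because every stripe unit formula above scales linearly with this product, a ratio of nearly 3 carries straight through to the recommended stripe size.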
The 100% Myth
Many system administrators are guilty of perpetuating
the 100% myth. This myth states that you don't need more assets (be it
disk, CPU, or memory) until the existing asset is 100% utilized. This
leads to performance issues in the area of disks. Due to disk physics,
the best performance for a disk is at the outer edges. Once you get
towards the inner sectors, performance decreases because of the
distance the head must travel to read the data and other factors. In
the good old days administrators spent much time positioning
frequently used files on the outer edges of disks.
While physically positioning files on disks is
difficult if not impossible in modern RAID systems, you should
endeavor not to fill the disks to 100% of capacity. Some experts say
don't use more than 30% if you want maximum performance, others 50%. I
say it depends on how the system is used, the operating system and the
RAID array system. For example, the Clariion from EMC promises to tune
the placement of files such that frequently used files are in the best
locations.
So, what can we summarize about disk size and speed?
Get the fastest drives you can and plan capacity
based on concurrency requirements as well as IO requirements. The
more, faster disks the better.