The GCS maintains the status of the resources.
It also keeps an inventory of the access
requests for the data blocks. After the blocks
are transferred from one instance to another to
meet requests, the requesting processes need to
be notified that the block is actually
available. Interrupts are therefore used to
signal the arrival or completion of block
transfers. The GCS uses several interrupts to
manage resource allocation. These interrupts
are:
Blocking Interrupt
- When exclusive access is
needed for a requestor, the GCS sends a blocking
interrupt to the process that currently owns the
shared resource, notifying it that a request for
exclusive access is waiting.
Acquisition Interrupt
- When the requested access
(e.g., exclusive) becomes available after an
earlier access mode is released, an acquisition
interrupt is sent to alert the process that
requested the resource.
Block Arrival Interrupt
- When a process requests a block from the GCS,
the request is forwarded to the instance holding
the block. Then the requested block is sent to
the requesting process, and the process informs
the GCS that it has received
the block. This notification is called the
block arrival interrupt.
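As a minimal sketch, the three interrupts
described above can be written as a small routing
table showing who sends each one and who receives
it. The names Interrupt and route are
hypothetical, and treating the GCS as the sender
of the acquisition interrupt is an assumption,
since the text does not name the sender.

```python
from enum import Enum


class Interrupt(Enum):
    BLOCKING = "blocking"
    ACQUISITION = "acquisition"
    BLOCK_ARRIVAL = "block arrival"


def route(interrupt, holder, requestor):
    """Return (sender, receiver) for an interrupt (illustrative only)."""
    if interrupt is Interrupt.BLOCKING:
        # GCS tells the current owner that an exclusive request is waiting.
        return ("GCS", holder)
    if interrupt is Interrupt.ACQUISITION:
        # Assumption: the GCS is the sender; the text only names the receiver.
        return ("GCS", requestor)
    if interrupt is Interrupt.BLOCK_ARRIVAL:
        # The requesting process tells the GCS that the block has arrived.
        return (requestor, "GCS")
    raise ValueError(f"unknown interrupt: {interrupt!r}")
```

For example, route(Interrupt.BLOCK_ARRIVAL, "p1",
"p2") names the requestor "p2" as the sender and
the GCS as the receiver.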
Block requests can be granted to many
processes at the same time, but they follow a
queuing mechanism. The GCS maintains two types
of queues for resource requests: the convert
queue and the granted queue. If the GCS is
unable to grant a resource request immediately,
it places the request in the convert queue,
where it tracks all waiting requests. Once a
resource is granted to the requesting process,
the request is kept in the granted queue, where
the GCS tracks the granted requests.
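The convert and granted queues can be illustrated
with a toy model for a single resource. This is a
sketch under stated assumptions, not the GCS
implementation: the class name ResourceQueues, the
two access modes "S" (shared) and "X" (exclusive),
and the first-in, first-out promotion policy are
all hypothetical.

```python
from collections import deque


class ResourceQueues:
    """Toy model of the two queues for one resource (illustrative only)."""

    def __init__(self):
        self.granted = []        # (process, mode) pairs holding the resource
        self.convert = deque()   # (process, mode) pairs waiting, in arrival order

    def _compatible(self, mode):
        # Assumption: shared coexists with shared; exclusive needs no holders.
        if mode == "S":
            return all(m == "S" for _, m in self.granted)
        return not self.granted

    def request(self, process, mode):
        """Grant at once if possible, else park the request in the convert queue."""
        # New requests wait behind earlier waiters so those are not starved.
        if not self.convert and self._compatible(mode):
            self.granted.append((process, mode))
            return True
        self.convert.append((process, mode))
        return False

    def release(self, process):
        """Drop a holder, then promote waiters from the head of the convert queue."""
        self.granted = [(p, m) for p, m in self.granted if p != process]
        promoted = []
        while self.convert and self._compatible(self.convert[0][1]):
            entry = self.convert.popleft()
            self.granted.append(entry)
            promoted.append(entry[0])
        return promoted
```

With two shared holders in the granted queue, an
exclusive request waits in the convert queue and
moves to the granted queue only after both
holders call release.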
Cache Fusion and Recovery
In the RAC system, whenever there is a node
failure, the instance running on the failed node
crashes and becomes unusable. There can be
several reasons for such a failure. This
section focuses on the changes
that take place in the global cache and how the
recovery of the failed instance is undertaken by
one of the surviving instances.
Recovery Features
Only the cache resources that reside on the
failed nodes or are mastered by the GCS on the
failed nodes need to be rebuilt or re-mastered.
Rebuilding or re-mastering does not mean
reconstructing a block; only the lock ownership
changes, as explained later with examples.
All resources previously mastered at the failed
instance are redistributed across the remaining
instances. These resources are reconstructed at
their new master instance. All other resources
previously mastered at surviving instances
remain unaffected.
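The redistribution rule can be sketched as
follows: only resources mastered at the failed
instance receive a new master, and every other
entry is left alone. The function name remaster
and the round-robin assignment are assumptions
made for illustration; the text does not say how
the new master is chosen.

```python
from itertools import cycle


def remaster(masters, failed, survivors):
    """Return a new resource -> master map after `failed` leaves the cluster."""
    if not survivors:
        raise ValueError("at least one surviving instance is required")
    # Assumption: hand out orphaned resources round-robin over the survivors.
    targets = cycle(sorted(survivors))
    new_masters = {}
    for resource in sorted(masters):
        owner = masters[resource]
        # Resources mastered at surviving instances keep their master.
        new_masters[resource] = next(targets) if owner == failed else owner
    return new_masters
```

Note that only ownership entries change; no data
block is built or copied by this step.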
The cluster manager first detects the node and
instance failure. It communicates the failure
status to the GCS by way of the LMON
process. At this stage, any surviving instance
in the cluster initiates the recovery process.
Remember, instance recovery does not include
restarting the failed instance or recovering
applications that were running on that instance.
Also note that, even after a node failure and
instance loss, the redo log file of the failed
instance is still available to the
recovering instance since the redo log file is
located on the shared cluster file system or
shared raw partition. This is an important
feature of the RAC system.
Because of past images, instance recovery is
performed differently in the RAC implementation.
The SMON process of a surviving instance
performs recovery of the failed instance or
thread. In a stand-alone instance, by
contrast, the foreground process performs
recovery.
Recovery Methodology and Steps
Oracle performs the following steps to recover:
- In the initial phase of recovery, GES
enqueues are reconfigured and the global
resource directory is frozen. All GCS
resource requests and writes are temporarily
halted.
- GCS resources are reconfigured among the
surviving instances. One of the surviving
instances becomes the recovering instance.
The SMON process of the recovering instance
starts a first pass of the redo log read of
the failed instance's redo thread.
- Block resources that need to be recovered
are identified and the global resource
directory is reconstructed. Pending requests
or writes are cancelled or replayed.
- Resources identified in the previous log
read phase are defined as recovery
resources. Buffer space for recovery is
allocated.
- Assuming that there are past images of
blocks to be recovered in other caches in
the cluster, source buffers are requested
from other instances. These source buffers
are the starting point of recovery for a
particular block.
- All resources and enqueues required for
subsequent processing have been acquired and
the global resource directory is now
unfrozen. Any data blocks that are not in
recovery can now be accessed. At this time,
the system is partially available.
- SMON merges the redo threads in SCN order
to ensure that changes are written in an
orderly fashion. This merge is important
for multiple simultaneous failures: if
multiple instances die simultaneously,
neither the past image (PI) buffers nor the
current buffers for a data block can be
found in any surviving instance's cache, so
the redo logs of the failed instances are
merged.
- The second pass of recovery now begins and
redo is applied to the data files. Recovery
resources are released immediately after
each block is recovered, so that more and
more blocks become available as cache
recovery proceeds.
- After all blocks have been recovered and
recovery resources have been released, the
system is available for normal use.
Figure 2.16 shows the basic steps in the
recovery.
Figure 2.16: Online Instance Recovery Steps
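The SCN-ordered merge and the two redo passes in
the steps above can be sketched in a few lines.
This is a simplified model, not Oracle's recovery
code: redo records are assumed to be (scn, block,
value) tuples, each thread is assumed to be
already sorted by SCN, and the function names are
hypothetical. Past images and buffer allocation
are left out.

```python
import heapq


def merge_redo_threads(*threads):
    """Merge per-thread redo records into one stream ordered by SCN."""
    return list(heapq.merge(*threads, key=lambda record: record[0]))


def first_pass(records):
    """Identify the blocks touched by the redo: the recovery set."""
    return {block for _, block, _ in records}


def second_pass(records, datafile):
    """Apply redo in order; list each block once its last record is applied."""
    # Position of the final redo record for every block in the stream.
    last_index = {block: i for i, (_, block, _) in enumerate(records)}
    released = []
    for i, (_, block, value) in enumerate(records):
        datafile[block] = value
        if last_index[block] == i:
            # Assumption: a block is released right after its last change.
            released.append(block)
    return released
```

Because second_pass reports blocks one at a time,
it mirrors the way more blocks become available
while recovery is still in progress.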