The recovery process is done in two passes. The
first pass constructs the recovery set and the
appropriate lock modes after eliminating entries
that are not needed, such as those covered by a
block written record (BWR). This pass uses extra
buffers in the recovering instance’s cache to store
the recovery list. In the second pass, the actual
recovery of the blocks takes place, and redo is
applied to the data files.
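The two passes can be sketched in Python as follows. This is a toy model, not Oracle's implementation: the record layout (dictionaries with kind, block, scn, and value keys) and the function names build_recovery_set and apply_redo are hypothetical. The sketch also assumes that a BWR for a block means every change to that block up to the BWR's SCN is already on disk.

```python
from collections import defaultdict


def build_recovery_set(redo_stream):
    """Pass 1 (sketch): group redo by block, dropping entries a BWR makes unnecessary."""
    recovery_set = defaultdict(list)
    for record in redo_stream:
        block_id, scn = record["block"], record["scn"]
        if record["kind"] == "BWR":
            # Assumption: the block was written to disk at this SCN,
            # so changes at or below it no longer need to be replayed.
            remaining = [r for r in recovery_set[block_id] if r["scn"] > scn]
            if remaining:
                recovery_set[block_id] = remaining
            else:
                recovery_set.pop(block_id, None)
        elif record["kind"] == "CHANGE":
            recovery_set[block_id].append(record)
    return dict(recovery_set)


def apply_redo(recovery_set, datafile):
    """Pass 2 (sketch): replay the surviving changes per block in SCN order."""
    for block_id, records in recovery_set.items():
        for record in sorted(records, key=lambda r: r["scn"]):
            datafile[block_id] = record["value"]
    return datafile


# Hypothetical redo stream: block 1 was written to disk, block 2 was not.
redo = [
    {"kind": "CHANGE", "block": 1, "scn": 10, "value": "a"},
    {"kind": "CHANGE", "block": 2, "scn": 11, "value": "b"},
    {"kind": "BWR", "block": 1, "scn": 10},
    {"kind": "CHANGE", "block": 2, "scn": 12, "value": "c"},
]
recovery_set = build_recovery_set(redo)
datafile = apply_redo(recovery_set, {1: "a", 2: "old"})
```

In this example only block 2 remains in the recovery set after the first pass, so the second pass touches that block alone.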
The following scenarios illustrate how the process
works. Each involves a RAC database with three
instances (A, B, C), of which instance C has failed.
Instance A has taken over the role of recovering
instance, and Instance B is an open, healthy
instance. Each situation is presented as if the
failed instance still existed.
Scenario 1:
Neither the recovering instance (A) nor the open
instance (B) has a lock element, or the lock element
is in NL0 mode. This indicates that the failed
instance held the block in XL0 mode. Therefore, SMON
acquires a lock in XL0 mode, reads the block from
disk, and applies the redo changes. The block is
thus kept in the recovery set. Later, DBWR writes
the recovery buffer out when recovery is completed.
Figure 2.17:
Lock Remastering – (Scenario-1)
Scenario 2:
Instance B has the block buffer in either XL0 or SL0
mode, but the recovering instance (A) does not hold
anything. Since Instance B is holding the lock in a
local role, its copy is more current than the redo
stream. Therefore, no recovery is needed. There is
also no need to write this block to disk.
Figure 2.18:
Lock Remastering – (Scenario-2)
Scenario 3:
Instance B has the block buffer in either XG# mode
or SG# mode (both global), but the recovering
instance (A) does not hold anything. Here, the
resource is in the global role. Therefore, SMON
initiates the write of the current block on Instance
B. No recovery is needed because a current copy of
the block exists in Instance B. The entry is removed
from the recovery set. When the write completes, the
recovery buffer is released and Instance A acquires
NG1.
Figure 2.19:
Lock Remastering – (Scenario-3)
Scenario 4:
The recovering instance (A) does not hold anything
and Instance B has NG1 mode, which indicates that
the failed instance had the more current block,
perhaps in XG0 mode. Therefore, Instance A gets a
consistent-read image of the block, based on SCN,
from Instance B and acquires XG0 mode. It keeps the
block in the recovery list.
Figure 2.20:
Lock Remastering – (Scenario-4)
Scenario 5:
The recovering instance (A) has the lock element in
SL0 or XL0 (both local) and other instances have no
lock elements on this block. This scenario requires
no recovery as the current copy of the buffer is
present in Instance A. It removes the redo entry
from the recovery list.
Figure 2.21:
Lock Remastering – (Scenario-5)
Scenario 6:
The recovering instance (A) has the lock element in
SG# or XG# (both Global). Since it has a global
role, shared or exclusive, the status on the other
open instance is immaterial. Therefore, Instance A
initiates the writing of the current block
to disk. There is no recovery needed and it releases
the buffer from the redo list.
Figure 2.22:
Lock Remastering – (Scenario-6)
Scenario 7:
Instance A has the lock element in NG1 and Instance
B has XG# or SG#. The current block on Instance B is
written to disk, and no recovery is needed.
Figure 2.23:
Lock Remastering – (Scenario-7)
Scenario 8:
Instance A has the lock element in NG1 and Instance
B has the lock in NG0 or NG1 mode. This indicates
that the failed instance was holding the resource in
exclusive mode. Recovery involves getting a
consistent-read (CR) copy from the highest past
image, based on SCN, and applying the redo changes.
Instance B sends the CR block to Instance A. This
block is kept for recovery.
Figure 2.24:
Lock Remastering – (Scenario-8)
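The eight scenarios above amount to a decision table keyed on the lock modes held by Instance A and Instance B. The following Python sketch encodes that table as described in the text. It is illustrative only: the names Decision, holds_global, and classify are hypothetical, a mode of None stands for "no lock element", and any combination the scenarios do not describe raises an error rather than guessing.

```python
from typing import NamedTuple, Optional


class Decision(NamedTuple):
    scenario: int
    action: str
    keep_in_recovery_set: bool


NOTHING = (None, "NL0")   # no lock element, or NL0 mode
LOCAL = ("SL0", "XL0")    # shared or exclusive, local role


def holds_global(mode: Optional[str]) -> bool:
    """True for SG# or XG#, where # is the trailing digit of the mode."""
    return mode is not None and mode[:2] in ("SG", "XG")


def classify(a_mode: Optional[str], b_mode: Optional[str]) -> Decision:
    """Map the modes on recovering instance A and open instance B to a scenario."""
    if holds_global(a_mode):
        # Scenario 6: the status on Instance B is immaterial.
        return Decision(6, "A writes the current block to disk", False)
    if a_mode in LOCAL and b_mode in NOTHING:
        return Decision(5, "no recovery; current copy is in A", False)
    if a_mode in NOTHING:
        if b_mode in NOTHING:
            return Decision(1, "A takes XL0, reads from disk, applies redo", True)
        if b_mode in LOCAL:
            return Decision(2, "no recovery and no write", False)
        if holds_global(b_mode):
            return Decision(3, "B writes the current block; A takes NG1", False)
        if b_mode == "NG1":
            return Decision(4, "A gets a CR image from B and takes XG0", True)
    if a_mode == "NG1":
        if holds_global(b_mode):
            return Decision(7, "B writes the current block", False)
        if b_mode in ("NG0", "NG1"):
            return Decision(8, "B sends a CR block to A; A applies redo", True)
    raise ValueError(f"combination not covered: A={a_mode!r}, B={b_mode!r}")
```

For example, classify(None, "NG1") falls under Scenario 4 and keeps the block in the recovery set, while classify("XG1", "SL0") falls under Scenario 6 regardless of what Instance B holds. Only Scenarios 1, 4, and 8 leave a block in the recovery set for the second pass.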
Thus, after the first pass, the recovering instance
holds locks on every block in the recovery list
(set). Other instances cannot acquire these locks
until the recovery operation is completed. The
second pass then begins, and the redo is applied to
the data files.
During instance recovery, if the recovering instance
dies, a surviving instance, if one exists, acquires
the instance recovery (IR) enqueue and starts
recovery. If a non-recovering instance fails, SMON
aborts recovery and releases the IR enqueue, and the
next live instance reattempts instance recovery.
Conclusion
This chapter has covered the Oracle RAC
architecture and reviewed the components that make
up Oracle 11g RAC. Memory structures, background
processes, cluster ready services, and the physical
and logical structures of the database dispatchers
have been examined. The differences between the
database instance and the database have been
identified. The concept of a thread was explored,
along with how it is extended in a RAC database
system.
This chapter has also explained the nature of cache
fusion, resource coordination, cache-to-cache
transfers, resource management, and lock
conversions. It also covered instance failure and
the associated re-mastering of resources by the
surviving instance.