The Data Warehouse Development Life Cycle
Parallelism And Oracle Data Warehousing
SMP And MPP Processing
SMP, or symmetric multiprocessing, describes an architecture where
many CPUs share a common memory area and I/O buffer. This type of
architecture scales poorly beyond a modest number of CPUs, because
each additional processor must compete for the shared memory and I/O
resources. On the other hand, MPP, or massively parallel processing,
describes an architecture where many independent processors share
nothing: each has its own memory and I/O, and the processors
communicate over a common interconnect. An MPP system can add
processors without forcing them to compete for shared resources, so
performance generally increases as processors are added, provided the
workload can be divided among them.
Some tasks are naturally suited for parallel execution. More common,
however, are tasks that combine components that can be parallelized
with some operations that must remain serial. One of the most common
examples of highly parallel operations is the text search of a very
large database. In this case, thousands of concurrent processes can
search portions of the data, and when all processes have completed,
the query manager can merge and sort the results for presentation.
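The text search described above can be sketched in a few lines of
Python. This is only an illustration of the split, scan, and merge
pattern, not how Oracle implements parallel query: the function names
search_partition and parallel_search, the sample rows, and the worker
count are all assumptions made for the example, and a thread pool
stands in for the independent processes of a real parallel server.

```python
from concurrent.futures import ThreadPoolExecutor

def search_partition(rows, term):
    """Scan one partition and return the rows that contain the term."""
    return [row for row in rows if term in row]

def parallel_search(rows, term, workers=4):
    """Split rows into partitions, scan them concurrently, then merge and sort."""
    if not rows:
        return []
    size = -(-len(rows) // workers)  # ceiling division: rows per partition
    partitions = [rows[i:i + size] for i in range(0, len(rows), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Each worker scans its own partition (the parallel portion).
        partial = pool.map(search_partition, partitions, [term] * len(partitions))
        merged = [row for part in partial for row in part]
    # The merge and sort happen once, after every scan has finished (the serial portion).
    return sorted(merged)

# Hypothetical sample data, for illustration only.
rows = ["oracle warehouse", "parallel query", "star schema", "parallel server"]
matches = parallel_search(rows, "parallel", workers=2)
```

Note that the final merge and sort cannot begin until every partition
has been scanned, so even this highly parallel task ends with a serial
step.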
The most difficult tasks for parallel processing are those that
have many steps that rely on the output from previous steps. But
even these types of processes usually contain some components that
can be run in parallel. For example, consider the tasks involved in
placing an order for a product:
1. Check customer profile.
2. Check customer credit rating.
3. Check customer payment history.
4. Check inventory levels.
5. Calculate the costs for the items.
6. Add sales tax.
7. Decrement the inventory on hand.
8. Prepare a shipping order.
9. Print customer bill.
Now, which of these tasks can be parallelized? It appears that
some operations can run in parallel while others must remain
serial (see Figure 7.5).
Figure 7.5 Parallelism of dependent tasks.
Here, we can see that there are three phases to the
process, where the output of one phase serves as input to the next
phase. Within each phase, we can see a number of tasks that can be
run in parallel. In essence, the process of parallelism requires the
restructuring of linear tasks to identify those tasks that can run
concurrently, while preserving the sequence of tasks that must be
serialized.
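The phased structure can be sketched as follows. The grouping of the
nine order tasks into three phases is an assumption made for this
example, since the text does not spell out which tasks Figure 7.5
places in each phase. The names PHASES, run_task, and run_phases are
likewise hypothetical, and run_task is a placeholder for the real
work.

```python
from concurrent.futures import ThreadPoolExecutor

# Assumed grouping of the nine order tasks into three phases.
PHASES = [
    ["check customer profile", "check customer credit rating",
     "check customer payment history", "check inventory levels"],
    ["calculate item costs", "add sales tax", "decrement inventory"],
    ["prepare shipping order", "print customer bill"],
]

def run_task(name):
    """Placeholder for the real work; returns the task name when done."""
    return name

def run_phases(phases):
    """Run the tasks within each phase concurrently, one phase at a time."""
    completed = []
    for tasks in phases:
        # Leaving the with-block waits for every task in this phase,
        # so the next phase never starts early.
        with ThreadPoolExecutor(max_workers=len(tasks)) as pool:
            completed.append(list(pool.map(run_task, tasks)))
    return completed

log = run_phases(PHASES)
```

The outer loop preserves the required sequence between phases, while
the pool inside it supplies the concurrency within a phase. A task
that in practice depends on another task in the same phase would need
to be moved to a later phase.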