The Data Warehouse Development Life Cycle
Oracle Data Warehouse Design
HIERARCHICAL ATTRIBUTE DESIGN
In our over-simplified example, we
see that level_one can be food or nonfood; level_two
can be animal, vegetable, paper, or toy; and level_three can
be beef, chicken, or "U" for unclassified (in cases where a level
three classification does not apply). The advantage of this
representation of the hierarchy is that this representation will
only add 3 bytes to each fact row. The downside is that queries are
cryptic for end users. End users will have to know the valid values
for each classification level in their queries. For example, to
query for the total sales of non-food items by customers who earn
more than $50,000 per year, would require that they know the proper
flag vales as shown below.
SELECT sum(sale_amount) from FACT
SEE CODE DEPOT FOR FULL SCRIPT
yearly_income_class > 3;
2. Assign a descriptive name
for each level in the hierarchy. This method would assign
descriptive values for each product class, thereby making it easier
for the end users to query the fact table. For example:
food_flag
char(1)
CONSTRAINT food_check
CHECK (food_flag in (Y,N);
animal_or_vegtable_flag char(1)
CONSTRAINT animal_check
CHECK (animal_or_vegetable_flag in (A,V,U);
type_of_meat_flag char(1)
CONSTRAINT meat_check
CHECK (type_of_meat_flag in (F,C,P,B);
. . .
The benefit of this approach
is that the queries can be made very descriptive, for example:
SELECT * from FACT
SEE CODE DEPOT FOR FULL SCRIPT
yearly_income_class = 2;
The downside, of
course, is that there will be many more flags in the fact table,
and, because the values are mutually exclusive at each level, most
of the flags will contain NULL values. There will also be far
more indexes on the fact table, and they will slow down the nightly
batch update process. As data warehouses grow into the terabyte
range, even a few additional bytes can have a substantial disk cost.
Also, the type of front end will also influence the decision about
the type of flags. If your application hides the SQL behind a front
end query tool, then this type of approach has no real advantage.
Again, these physical
attribute representation issues go to the very heart of the data
warehouse, and intelligent up front planning will ensure a sound,
robust system.
This is an excerpt from "High Performance
Data Warehousing".
 |
If you
like Oracle tuning, you may enjoy the book
Oracle Tuning: The Definitive Reference , with
over 900 pages of BC's favorite tuning tips &
scripts.
You can buy it
directly from the publisher and save 30%, and get
instant access to the code depot of Oracle tuning
scripts. |