The CODASYL Network Model
Oracle Tips by Burleson Consulting
A second database type is a network
database. Many computer systems have been implemented using the
network Database Management System (DBMS) specifications developed by
the Conference on Data Systems Languages (CODASYL). Also involved
were two subgroups of CODASYL: the Database Task Group (DBTG), and
the Data Description Language Committee (DDLC).
CODASYL and its subgroups were made up
of volunteer representatives of computer manufacturers and users. CODASYL began
in 1959. The first set of DBMS specifications was produced in 1969. This set
of specifications was revised, and the first real CODASYL DBTG specifications
were issued in 1971.
Many later database management systems drew,
at least in part, on the CODASYL DBTG specifications. From the early CODASYL
DBTG specs, the data model was called a network data model. The model that
CODASYL DBTG developed became the basis for new database systems like IDMS from
Cullinet in 1970.
The CODASYL Data Base Task Group (DBTG)
specifications included a schema definition, a Device Media Control Language
(DMCL) definition, and a Data Manipulation Language (DML) definition. They also included the
concept of a database "area" which referred to the Physical Structure of the
data files. The Logical Structure of the database was defined by a Data
Definition Language (DDL), and a user view of the data was defined by a
subschema.
The Data Manipulation Language (DML) commands
were used to navigate through the linked-list structures that comprised the
database, much the same as object-oriented databases are navigated in C++. The
CODASYL DML verbs included FIND, GET, STORE, MODIFY, and DELETE.
The Data Base Administrator (DBA) functions
included data structure or schema, data integrity, security, and
authorization. A Data Base Manager (DBM) function was also defined, which
included operation, backup/recovery, performance, statistics, and auditing.
The CODASYL model used two data storage methods:
the basic direct access method (BDAM) and linked-list data structures. BDAM
used a hashing algorithm to store and retrieve records. A linked list is a
group of items, each of which points to the next item. The pointers establish
relationships between the items.
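As a rough illustration of hashed storage, the Python sketch below maps a symbolic key to a page and looks the record up again on the same page. It is only a sketch under stated assumptions: the `Area` class, the `calc_page` function, the page count, and the use of CRC-32 are all hypothetical choices for illustration, and none of them reflects the hashing routine of any particular CODASYL product.

```python
import zlib

NUM_PAGES = 8  # hypothetical number of pages in the area


def calc_page(key: str, num_pages: int = NUM_PAGES) -> int:
    """Map a symbolic key to a page number (illustrative hash only)."""
    return zlib.crc32(key.encode("utf-8")) % num_pages


class Area:
    """A toy storage area: a fixed list of pages, each holding (key, record) pairs."""

    def __init__(self, num_pages: int = NUM_PAGES):
        self.pages = [[] for _ in range(num_pages)]

    def store(self, key: str, record: dict) -> None:
        # The hash of the key decides which page receives the record.
        self.pages[calc_page(key, len(self.pages))].append((key, record))

    def obtain_calc(self, key: str):
        # Only the one page chosen by the hash is searched.
        for stored_key, record in self.pages[calc_page(key, len(self.pages))]:
            if stored_key == key:
                return record
        return None


area = Area()
area.store("MS", {"CUST-DESC": "MS"})
customer = area.obtain_calc("MS")
```

The point of the design is that retrieval never scans the whole area: the key alone determines the page to read.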
Because of the many choices that can be made in
the design of a Network database, it is important to review the design with as
many people as possible. Charles Bachman developed a "diagram" that represented
the data structures as required by CODASYL. This diagram method became known as
the Bachman diagram. (Figure 2-6)
Figure 2-6 Bachman diagram layout
The Bachman diagram describes the physical
constructs of all record types in the database. Each rectangle in the diagram
is subdivided into four rows. Row one contains the record name. Row two
contains the numeric record ID (each record type is given a number that is
associated with the record name), the length mode (fixed or variable), the
record length, and the location mode (CALC or VIA). Row three contains the
field serving as the CALC key for CALC records, or the set name for VIA
records. Row four contains the area in which the record is stored. A set type
is shown by a Bachman arrow pointing from the owner record type to the member
record type (see figure 2-7). The set name is the owner name, a hyphen, and
the member name. Each set also specifies its pointers (Next, Prior, and
Owner); the membership option for insertion and retention (MA, OA, MM, or OM);
the set order (First, Last, Next, Prior, or Sorted); and the mode (Chain or
Index).
Figure 2-7 Bachman set type
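The four rows of a Bachman box and the attributes of a set type can be written down as plain data structures. The Python sketch below is one hypothetical way to do so; the class names, field names, and every sample value (record IDs, lengths, area name) are assumptions made for illustration and do not come from any real schema.

```python
from dataclasses import dataclass


@dataclass
class RecordType:
    name: str             # row 1: record name
    record_id: int        # row 2: numeric record ID
    length_mode: str      # row 2: "F" (fixed) or "V" (variable)
    length: int           # row 2: record length
    location_mode: str    # row 2: "CALC" or "VIA"
    key_or_set: str       # row 3: CALC key field, or set name for VIA
    area: str             # row 4: area holding the record


@dataclass
class SetType:
    owner: RecordType
    member: RecordType
    pointers: tuple = ("NEXT", "PRIOR", "OWNER")
    membership: str = "MA"    # one of MA, OA, MM, OM
    order: str = "NEXT"       # FIRST, LAST, NEXT, PRIOR or SORTED
    mode: str = "CHAIN"       # CHAIN or INDEX

    @property
    def name(self) -> str:
        # The set name is the owner name, a hyphen, and the member name.
        return f"{self.owner.name}-{self.member.name}"


# Hypothetical sample values, for illustration only.
customer = RecordType("CUSTOMER", 101, "F", 120, "CALC", "CUST-DESC", "CUSTOMER-AREA")
order = RecordType("ORDER", 102, "F", 80, "VIA", "CUSTOMER-ORDER", "CUSTOMER-AREA")
customer_order = SetType(owner=customer, member=order)
```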
Records whose location mode is "CALC" are stored
using hashing techniques: a symbolic key determines the physical location of
the record. A CODASYL database also allows records to be clustered. Records
whose location mode is "VIA" are stored on the same physical data blocks as
their owner records. Data relationships are established by "sets", which link
related records together. For example, the
ORDER-LINE records are physically clustered near their ITEM records. This is
indicated in the Bachman diagram (figure 2-8) where the ORDER-LINE box shows VIA
as the location mode, and the ORDER-ITEM relationship as the cluster set. The
Bachman diagram is still used today. It is a very useful graphical picture of
the database schema.
Figure 2-8 Bachman diagram example
The CODASYL model combined two data storage
methods to create an engine which can and does process hundreds of transactions
per second. The CODASYL model uses the basic direct access method (BDAM) which
uses a hashing algorithm (sometimes called a CALC algorithm) to quickly store
and retrieve records. CODASYL also employs linked-list data structures that
create embedded pointers in the prefix of each occurrence of a record. These
pointers, called NEXT, PRIOR, and OWNER, are used to establish relationships
between data items and are referenced in the Data Manipulation Language (DML).
For example, the DML command OBTAIN NEXT ORDER WITHIN CUSTOMER-ORDER, would
direct the network database to look in the prefix of the current ORDER record,
and find the NEXT pointer for the CUSTOMER-ORDER set. The database will then
access the record whose address is found at this location.
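The pointer-following step described above can be sketched in Python. This is a simplified model, not the behavior of any specific product: the `Record` class, the `connect`, `obtain_next`, and `orders_for` helpers, and the order numbers are hypothetical, and in-memory object references stand in for database addresses.

```python
class Record:
    def __init__(self, name, data):
        self.name = name
        self.data = data
        # Prefix: one group of pointers per set the record participates in.
        self.prefix = {}


def connect(set_name, owner, member):
    """Link member at the end of the owner's ring for set_name."""
    if set_name not in owner.prefix:
        # An empty set is a ring in which the owner points to itself.
        owner.prefix[set_name] = {"NEXT": owner, "PRIOR": owner}
    last = owner.prefix[set_name]["PRIOR"]
    member.prefix[set_name] = {"NEXT": owner, "PRIOR": last, "OWNER": owner}
    last.prefix[set_name]["NEXT"] = member
    owner.prefix[set_name]["PRIOR"] = member


def obtain_next(set_name, current):
    """Follow the NEXT pointer in current's prefix; None signals end of set."""
    pointers = current.prefix.get(set_name)
    if pointers is None:
        return None
    nxt = pointers["NEXT"]
    # Only member records carry an OWNER pointer, so arriving at a record
    # without one means the ring has led back to the owner.
    return nxt if "OWNER" in nxt.prefix[set_name] else None


def orders_for(customer):
    """Walk the CUSTOMER-ORDER set and collect the order numbers."""
    numbers = []
    current = obtain_next("CUSTOMER-ORDER", customer)
    while current is not None:
        numbers.append(current.data["ORDER-NBR"])
        current = obtain_next("CUSTOMER-ORDER", current)
    return numbers


cust = Record("CUSTOMER", {"CUST-DESC": "MS"})
for nbr in (1201, 1202):  # hypothetical order numbers
    connect("CUSTOMER-ORDER", cust, Record("ORDER", {"ORDER-NBR": nbr}))
order_numbers = orders_for(cust)
```

Each call to `obtain_next` reads only the prefix of the current record, which is why this style of access needs no index lookup once the starting record has been found.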
Two advantages of the CODASYL approach were
performance and the ability to represent complex data relationships. The
following example shows how BDAM is invoked for the OBTAIN CALC CUSTOMER
statement, and how linked lists are used in the statement OBTAIN NEXT ORDER
WITHIN CUSTOMER-ORDER.
To navigate a one-to-many relationship (for
example, to get all of the orders for a customer), a CODASYL programmer would
enter:
    MOVE 'MS' TO CUST-DESC.
    OBTAIN CALC CUSTOMER.
    PERFORM ORDER-LOOP UNTIL END-OF-SET.

    ORDER-LOOP.
        OBTAIN NEXT ORDER WITHIN CUSTOMER-ORDER.
        MOVE ORDER-NBR TO OUT-REC.
        WRITE OUT-REC.
Figure 2-9 Set occurrence diagram
The set occurrence diagram (figure 2-9) is a
visual tool with great potential for use in object-oriented databases.
Relationships between objects become readily apparent, and a programmer can
easily envision the navigation paths. For example, in figure 2-9, you can
easily see that order 1202 is for 24 carrots, 6 oranges, and 8 apples. Now look
at the "item" side of the diagram, and you can easily see which orders include
apples. When you are working with systems that physically link objects, the set
occurrence diagram is an extremely useful visual tool.
The design of the CODASYL network model was very
elaborate, but there were serious problems with implementation. Network
databases, very much like hierarchical databases, are very difficult to
navigate. Mastering the Data Manipulation Language (DML), especially for
complex navigation, required months of training.
Structural changes are like a bad dream with
network databases. Because data relationships are "hard linked" with embedded
pointers, adding an index or a new relationship requires special utility
programs that "sweep" each and every affected record in the database. As
records are located, the prefix is restructured to accommodate the new
pointers. Object-oriented databases will encounter this same problem if a
class hierarchy needs to be modified.
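The cost of such a sweep can be illustrated with a small Python sketch. It is a simplified model under stated assumptions: records are plain dictionaries, the `sweep_add_set` function and the ITEM-ORDER-LINE set name are hypothetical, each member is assumed to reference an existing owner, and only NEXT and OWNER pointers are rebuilt.

```python
def sweep_add_set(records, set_name, owner_type, member_type, owner_key_of):
    """Visit every record and restructure the prefixes affected by a new set.

    Returns the number of records whose prefix had to be rewritten.
    """
    tails = {}    # owner key -> last record currently on that owner's chain
    touched = 0
    # Pass 1: each owner record gets an empty pointer slot for the new set.
    for rec in records:
        if rec["type"] == owner_type:
            rec["prefix"][set_name] = {"NEXT": None}
            tails[rec["key"]] = rec
            touched += 1
    # Pass 2: each member record gets new pointers and joins its owner's chain.
    for rec in records:
        if rec["type"] == member_type:
            owner_key = owner_key_of(rec)
            rec["prefix"][set_name] = {"NEXT": None, "OWNER": owner_key}
            tails[owner_key]["prefix"][set_name]["NEXT"] = rec["key"]
            tails[owner_key] = rec
            touched += 1
    return touched


# Hypothetical sample data, for illustration only.
records = [
    {"type": "ITEM", "key": "I1", "name": "apples", "prefix": {}},
    {"type": "ITEM", "key": "I2", "name": "carrots", "prefix": {}},
    {"type": "CUSTOMER", "key": "C1", "prefix": {}},
    {"type": "ORDER-LINE", "key": "L1", "item": "I1", "prefix": {}},
    {"type": "ORDER-LINE", "key": "L2", "item": "I2", "prefix": {}},
    {"type": "ORDER-LINE", "key": "L3", "item": "I1", "prefix": {}},
]
touched = sweep_add_set(
    records, "ITEM-ORDER-LINE", "ITEM", "ORDER-LINE", lambda rec: rec["item"]
)
```

Even in this toy version, every owner and member record is rewritten, so the work grows with the number of stored records rather than with the size of the schema change.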
Even with these shortcomings, CODASYL databases
were still far superior to any other technology of the day, and many
corporations began to implement their mission-critical systems on IDMS
platforms. However, as soon as relational databases solved their speed problems
and became stable enough to support mission-critical systems, the awkward and
inflexible CODASYL systems were deserted.
CODASYL and the Object Database Management Group (ODMG)
Even though the CODASYL model is more than 25
years old, it is fascinating to note that there is a remarkable similarity
between the CODASYL model and the internal models of today's state-of-the-art
object databases.
Just as the CODASYL model required a CALC key to
uniquely identify a record, object databases require an "Object ID" or OID
(pronounced oy-id) to identify an object.
Unfortunately, relational database models cannot
address the high overhead and potential problems involved in generating object
IDs. Some theoreticians have proposed data models that allow a single field
to contain multiple values, or even another table; Dr. Won Kim's UniSQL
database is one example. In a procedural language such as C++, the problem of
recursion is addressed very elegantly with pointers to structures.
The ODMG standard for object-oriented databases
requires unique object IDs to identify each object, and it has deliberately
not addressed the ability to access a row based on the data contents of the
row.
Many researchers have noted remarkable
similarities between the CODASYL Network Model (NWM), and the requirements for
object-oriented databases. The CODASYL model supports the declaration of
abstract "sets" to relate classes together, and CODASYL also supports the notion
of "currency", whereby a record may be accessed without any reference to its
data attributes. CODASYL databases provide currency tables that allow the
programmer to "remember" the most recently accessed record of each type, and the
most recently accessed record within a set.
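A currency table of this kind can be modeled as two small lookup tables. The Python sketch below is a hypothetical illustration only: the `CurrencyTable` class, its method names, and the sample record values are assumptions, and real CODASYL systems maintain currency internally as a side effect of DML commands.

```python
class CurrencyTable:
    """Remembers the most recently accessed record per record type and per set."""

    def __init__(self):
        self.by_record_type = {}
        self.by_set = {}

    def update(self, record_type, record, sets=()):
        # Every successful access makes the record current of its type...
        self.by_record_type[record_type] = record
        # ...and current of each set it was accessed through or belongs to.
        for set_name in sets:
            self.by_set[set_name] = record

    def current_of_type(self, record_type):
        return self.by_record_type.get(record_type)

    def current_of_set(self, set_name):
        return self.by_set.get(set_name)


currency = CurrencyTable()
# Hypothetical accesses: a customer, then two of that customer's orders.
currency.update("CUSTOMER", {"CUST-DESC": "MS"}, sets=["CUSTOMER-ORDER"])
currency.update("ORDER", {"ORDER-NBR": 1201}, sets=["CUSTOMER-ORDER"])
currency.update("ORDER", {"ORDER-NBR": 1202}, sets=["CUSTOMER-ORDER"])
```

After these calls the customer is still current of its record type, while the last order read is current of both the ORDER type and the CUSTOMER-ORDER set, so a later "next within set" request has a position to continue from without naming any data value.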
Schek and Scholl, two object database
researchers, state, "This shows that some of the essential features of the
object model can be found in the CODASYL; CODASYL records are instances of
abstract types manipulated by a limited set of functions (called FIND's), mostly
for navigational access. CONNECT and DISCONNECT are used to add or remove
objects to or from relationships. Finally, GET retrieves data about the objects
into a pre-defined communications area."
Of all of the existing database models, the
CODASYL network model most closely matches the requirements for object-oriented
databases, and with some refinement (such as the support of "cyclic" data
relationships), the CODASYL model may re-emerge in a new form, as the standard
data model for object-oriented modeling.
Some vendors are already using the CODASYL model
as the architecture for object-oriented databases. For example, the C-Data
Manager from Database Technologies is an object-oriented database and
programming environment which is based on the CODASYL Network Data Model, and
uses ISAM file structures to index its data records.