 |
|
EnterpriseDB SQL: Data Architecture
Oracle Tips by Burleson Consulting
|
An
index in a database is like the little card on the front of a filing
cabinet that says A-C in this drawer, D-F in this drawer, etc. An
index helps programs looking for information find the information
faster. Instead of looking through every drawer possible, it can go
straight to the drawer with the right folder.
Actually, the analogy falls apart somewhat at this point. An index
is really more like a list of names written on a folder. When you
look for a specific name, you know which folder it's in (and
approximately where it is in the folder) but you have to look
through the entire folder to find the specific name. The list will
tell you which folder it's in and will make your search quicker.
The
downside to an index is that every time you add a name to the
folder, you have to update the list. Therefore, adding new records,
or updating or deleting existing records, takes longer. The trade
off to an index is that you can choose to find it as quickly as
possible or you can choose to add, update and delete as quickly as
possible. The more indexes you have, for example: by name, date of
birth, job function, etc, the longer it takes to add or update an
existing record.
That
wasn't painful was it? We've already covered tables, columns and
indexes.
Now,
let's pretend we have a table called EMP. The columns in EMP look
like this:
Column Data Type
------------ -------------
EMPNO NUMBER(4)
ENAME VARCHAR2(10)
JOB VARCHAR2(9)
MGR NUMBER(4)
HIREDATE TIMESTAMP(0)
SAL NUMBER(7,2)
COMM NUMBER(7,2)
DEPTNO NUMBER(2)
Don't
worry about the data types yet. I'll cover those in the next
section. Right now, we're still working on architecture.
The
EMPNO column can hold a 4-digit number that uniquely identifies an
employee. When I say it "uniquely identifies", I mean that the
table can only have one record with a column that has that value.
This is called a primary key.
We
could possibly use ENAME, which is the employee name, as a primary
key but we might have two employees with the same name. That
wouldn't be unique! We could use something like social security
number but not every employee has one. So the column, or columns,
that uniquely identifies a row in a table is called the primary
key. In this case, our primary key is EMPNO.
To
make sure we have a unique value in EMPNO, we can use an object
called a sequence. I will show you how to create and use a sequence
below. A sequence is a number generator. It is user defined. It
can go forward or backwards and it skips one or more numbers between
calls. You can have many sequences defined in a database.
When
we populate EMPNO, we would call this sequence and get a new
number. The database automatically increments and keeps track of
the current value so that we don't have to. Compare this to the old
way of generating unique numbers (by hand on paper) and you can see
how this makes life easier.
A
primary key is always NOT NULL. Keeping this general, a NULL column
means it is an empty column (technically, it means the value is not
known). Before you put data in a column, it will be NULL. When an
empty field on a form, is blank, you could say that its value is
NULL.
By
saying a field must be NOT NULL, that means that you cannot add a
record unless that column has data. This is called a NOT NULL
constraint. Now we know that a primary key has, at least, two
constraints: Unique and NOT NULL.
We
have another table in our database called DEPT. DEPT has the
following columns:
Column Data Type
------------ -------------
DEPTNO NUMBER(2)
DNAME VARCHAR2(14)
LOC VARCHAR2(13)
In
this table, DEPTNO is a sequence that uniquely identifies the
department. DEPTNO is the DEPT table's primary key.
Notice
that EMP
also
has a DEPTNO column. Just because they have the same name does not
mean they are related. You have to know the business use of the
data to know if they are related. In this case, I know they are.
When
the primary key of one table is referenced in another table, it
means there is a relationship between the two tables. In this case,
the EMP
table
has a foreign key relationship with the DEPT table. That just means
that a foreign key (in this case the DEPT table's DEPTNO primary
key) exists in EMP.
This
is also called a referential constraint.
What
that means is that EMP
is
a
child table to DEPT and DEPT is the parent of EMP.
If the
DEPTNO column in EMP
is
defined as NOT NULL, then an employee cannot exist in a DEPTNO that
doesn't exist in the DEPT table.
If the
DEPTNO column in EMP
is
defined as to allow NULLs, then an employee can exist in a NULL
DEPTNO.
In our
case, DEPTNO in EMP
does
allow NULL values. We can create an employee record that is not
assigned to a department. We might want to do that if we had a new
hire who was not yet assigned to a particular position.
In the
diagram below, Figure 3.1, shows the relationship between EMP and
DEPT. This is called an Entity Relationship Diagram (ERD).
Figure
3.1: Entity Relationship Diagram
Let's
see what we've covered since our last summary:
* A
Primary Key uniquely identifies a record.
* A
NULL value is an unknown value.
* For
most purposes, an empty column is also a NULL column.
* A
Primary Key CANNOT be NULL.
* A
Sequence is a database object that generates sequential numbers.
* A
constraint is a restriction defined in the database.
* A
NOT NULL constraint on a column forces that column to have data when
a record is added.
* A
Referential constraint, also called a foreign key constraint,
restricts a row in a table to valid rows in a corresponding table.
Now
we've covered the major concepts of Data Architecture, albeit at a
very general level. I don't intend this small section to make you a
database designer or guru. However, you should now be somewhat
comfortable with the rest of this chapter.
Summary of Data Architecture: Tables have columns and tables can be
related to other tables on foreign keys. A foreign key is usually
the primary key of another table. The relationships between tables
are why some databases are called Relational Databases. Indexes can
make queries go faster but they might slow down updates.
This
is an excerpt from the book "EnterpriseDB:
The Definitive Reference" by Rampant TechPress.