Your next task is to establish keys for each table in the database.
There are four main types of keys:
candidate,
primary,
foreign, and
non-keys. A key's type determines its function
within the table.
Candidate Keys
The first type of key you establish for a table is the candidate
key, which is a field or set of fields that uniquely identifies a single
instance of the table's subject. Each table must have at least one
candidate key. You'll eventually examine the table's pool of available
candidate keys and designate one of them as the official primary key for
the table.
Before you can designate a field as a candidate key, you must make
certain it complies with all of the Elements of a Candidate Key.
These elements constitute a set of guidelines you can use to determine
whether the field is fit to serve as a candidate key. You cannot
designate a field as a candidate key if it fails to conform to any
of these elements.
Elements of a Candidate Key
- It cannot be a multipart field.
- It must contain unique values. This element helps you guard
against duplicating a given record within the table. Duplicate records
are just as bad as duplicate fields, and you must avoid them at all
costs.
- It cannot contain null values. As you already know, a null
value represents the absence of a value. There's absolutely no
way a candidate key field can identify a given record if its value is
null.
- Its value cannot cause a breach of the organization's security
or privacy rules. Values such as passwords and Social Security
Numbers are not suitable for use as a candidate key.
- Its value is not optional in whole or in part. A value that
is optional implies that it may be null at some point. You can infer,
then, that an optional value automatically violates the previous
element and is, therefore, unacceptable.
- It comprises a minimum number of fields necessary to define
uniqueness. You can use a combination of fields (treated as a
single unit) to serve as a candidate key, so long as each field
contributes to defining a unique value. Try to use as few fields as
possible, however, because overly complex candidate keys can
ultimately prove to be difficult to work with and difficult to
understand.
- Its values must uniquely and exclusively identify each record
in the table. This element helps you guard against duplicate
records and ensures that you can accurately reference any of the
table's records from other tables in the database.
- Its value must exclusively identify the value of each field
within a given record. This element ensures that the table's
candidate keys provide the only means of identifying each field value
within the record.
- Its value can be modified only in rare or extreme cases.
You should never change the value of a candidate key unless you have
an absolute and compelling reason to do so. A field is likely to have
difficulty conforming to the previous elements if you can change its
value arbitrarily.
Establishing a candidate key for a table is quite simple: Look for a
field or set of fields that conforms to all of the Elements of a
Candidate Key. You'll probably be able to define more than one candidate
key for a given table. Loading a table with sample data will give you
the means to identify potential candidate keys accurately.
See if you can identify any candidate keys for the table in
Figure 8.1.
FIGURE
8.1 Are there any candidate keys in this table?
You probably identified EMPLOYEE ID, SOCIAL SECURITY NUMBER, EMPLAST
NAME, EMPFIRST NAME and EMPLAST NAME, EMPZIPCODE, and EMPHOME
PHONE as potential candidate keys. But you'll need to examine these
fields more closely to determine which ones are truly eligible to become
candidate keys. Remember that you must automatically disregard any
field(s) failing to conform to even one of the Elements of a
Candidate Key.
Upon close examination, you can draw the following conclusions:
- EMPLOYEE ID is eligible. This field conforms to every
element of a candidate key.
- SOCIAL SECURITY NUMBER is ineligible because it could contain
null values and will most likely compromise the organization's privacy
rules. Contrary to what the sample data shows, this field could
contain a null value. For example, there are many people working in
the United States who do not have Social Security numbers because they
are citizens of other countries.
NOTE
Despite its widespread use in many types of databases, It is recommended that you refrain from using SOCIAL SECURITY
NUMBER as a candidate key (or a primary key, for that matter) in any
of your database structures. In many instances, it doesn't conform
to the Elements of a Candidate Key. You can learn some very
interesting facts about Social Security numbers (which will shed
some light on why they make poor candidate/primary keys) by visiting
the Social Security Adminstration's Web site at http://www.ssa.gov.
- EMPLAST NAME is ineligible because it can contain duplicate
values. As you've learned, the values of a candidate key must be
unique. In this case there can be more than one occurrence of a
particular last name.
- EMPFIRST NAME and EMPLAST NAME are eligible. The
combined values of both fields will supply a unique identifier for a
given record. Although multiple occurrences of a particular first name
or last name will occur, the combination of a given first name and
last name will always be unique.
- EMPZIPCODE is ineligible because it can contain duplicate
values. Many people live in the same zip code area, so the values
in EMPZIPCODE cannot possibly be unique.
- EMPHOME PHONE is ineligible because it can contain duplicate
values and is subject to change. This field will contain duplicate
values for either of these reasons:
- One or more family members work for the organization.
- One or more people share a residence that contains a single
phone line.
You can confidently state that the EMPLOYEES table has two candidate
keys: EMPLOYEE ID and the combination of EMPFIRST NAME and EMPLAST NAME.
Mark candidate keys in your table structures by writing the letters
"CK" next to the name of each field you designate as a candidate key. A
candidate key composed of two or more fields is known as a composite
candidate key, and you'll write "CCK" next to the names of the
fields that make up the key. When you have two or more composite
candidate keys, use a number within the mark to distinguish one from
another. If you had two composite candidate keys, for example, you would
mark one as "CCK1" and the other as "CCK2."
Apply this technique to the candidate keys for the EMPLOYEES table in
Figure 8.1. Figure 8.2
shows how your structure should look when you've completed
FIGURE
8.2 Marking candidate keys in the EMPLOYEES table structure.
Now, try to identify as many candidate keys as you can for the PARTS
table in Figure 8.3.
FIGURE
8.3 Can you identify any candidate keys in the PARTS table?
At first glance, you may believe that PART NAME, MODEL NUMBER, the
combination of PART NAME and MODEL NUMBER, and the combination of
MANUFACTURER and PART NAME are potential candidate keys. After
investigating this theory, however, you come up with the following
results:
- PART NAME is ineligible because it can contain duplicate
values. A given part name will be duplicated when the part is
manufactured in several models. For example, this is the case with
Faust Brake Levers.
- MODEL NUMBER is ineligible because it can contain null values.
A candidate key value must exist for each record in the table. As
you can see, some parts do not have a model number.
- PART NAME and MODEL NUMBER are ineligible because either field
can contain null values. The simple fact that MODEL NUMBER can
contain null values instantly disqualifies this combination of fields.
- MANUFACTURER and PART NAME are ineligible because the values
for these fields seem to be optional. Recall that a candidate key
value cannot be optional in whole or in part. In this instance, you
can infer that entering the manufacturer name is optional when it
appears as a component of the part name; therefore, you cannot
designate this combination of fields as a candidate key.
It's evident that you don't have a single field or set of fields that
qualifies as a candidate key for the PARTS table. This is a problem
because each table must have at least one candidate key.
Fortunately, there is a solution.
Artificial Candidate Keys
When you determine that a table does not contain a candidate key, you
can create and use an artificial (or surrogate)
candidate key. (It's artificial in the sense that it didn't occur
"naturally" in the table; you have to manufacture it.) You establish an
artificial candidate key by creating a new field that conforms to all of
the Elements of a Candidate Key and then adding it to the table; this
field becomes the official candidate key.
You can now solve the problem in the PARTS table. Create an
artificial candidate key called PART NUMBER and assign it to the table.
(The new field will automatically conform to the Elements of a Candidate
Key because you're creating it from scratch.)
Figure 8.4 shows the revised structure of the PARTS table.
FIGURE
8.4 The PARTS table with the artificial candidate key PART NUMBER.
When you've established an artificial candidate key for a table, mark
the field name with a "CK" in the table structure, just as you did for
the EMPLOYEES table in the previous example.
You may also choose to create an artificial candidate key when it
would be a stronger (and thus, more appropriate) candidate key than any
of the existing candidate keys. Assume you're working on an EMPLOYEES
table and you determine that the only available candidate key is the
combination of the EMPFIRST NAME and EMPLAST NAME fields. Although this
may be a valid candidate key, using a single-field candidate key might
prove more efficient and may identify the subject of the table more
easily. Let's say that everyone in the organization is accustomed to
using a unique identification number rather than a name as a means of
identifying an employee. In this instance, you can choose to create a
new field named EMPLOYEE ID and use it as an artificial candidate key.
This is an absolutely acceptable practice—do this without hesitation or
reservation if you believe it's appropriate.
Review the candidate keys you've selected and make absolutely certain
that they thoroughly comply with the Elements of a Candidate Key. Don't
be surprised if you discover that one of them is not a candidate key
after all—incorrectly identifying a field as a candidate key happens
occasionally. When this does occur, just remove the "CK" designator from
the field name in the table structure. Deleting a candidate key won't
pose a problem as long as the table has more than one candidate key. If
you discover, however, that the only candidate key you identified for
the table is not a candidate key, you must establish an
artificial candidate key for the table. After you've defined the new
candidate key, remember to mark its name with a "CK" in the table
structure.
Primary Keys
By now, you've established all the candidate keys that seem
appropriate for every table. Your next task is to establish a primary
key for each table, which is the most important key of all.
- A primary key field exclusively identifies the table
throughout the database structure and helps establish relationships
with other tables.
- A primary key value uniquely identifies a given record
within a table and exclusively represents that record throughout the
entire database. It also helps to guard against duplicate records.
A primary key must conform to the exact same elements as a candidate
key. This requirement is easy to fulfill because you select a primary
key from a table's pool of available candidate keys. The process of
selecting a primary key is somewhat similar to that of a presidential
election. Every four years, several people run for the office of
president of the United States. These individuals are known as
"candidates" and they have all of the qualifications required to become
president. A national election is held, and a single individual from the
pool of available presidential candidates is elected to serve as the
country's official president. Similarly, you identify each qualified
candidate key in the table, run your own election, and select one of
them to become the official primary key of the table. You've already
identified the candidates, so now it's election time!
Assuming that there is no other marginal preference, here are a
couple of guidelines you can use to select an appropriate primary key:
- If you have a simple (single-field) candidate key and a composite
candidate key, choose the simple candidate key. It's always best to
use a candidate key that contains the least number of fields.
- Choose a candidate key that incorporates part of the table name
within its own name. For example, a candidate key with a name such as
SALES INVOICE NUMBER is a good choice for the SALES INVOICES table.
Examine the candidate keys and choose one to serve as the primary key
for the table. The choice is largely arbitrary—you can choose the one
that you believe most accurately identifies the table's subject or the
one that is the most meaningful to everyone in the organization. For
example, consider the EMPLOYEES table again in
Figure 8.5.
FIGURE
8.5 Which candidate key should become the primary key of the
EMPLOYEES table?
Either of the candidate keys you identified within the table could
serve as the primary key. You might decide to choose EMPLOYEE ID if
everyone in the organization is accustomed to using this number as a
means of identifying employees in items such as tax forms and employee
benefits programs. The candidate key you ultimately choose becomes the
primary key of the table and is governed by the Elements of a Primary
Key. These elements are exactly the same as those for the candidate key,
and you should enforce them to the letter. For the sake of clarity, here
are the Elements of a Primary Key:
Elements of a Primary Key
- It cannot be a multipart field.
- It must contain unique values.
- It cannot contain null values.
- Its value cannot cause a breach of the organization's security or
privacy rules.
- Its value is not optional in whole or in part.
- It comprises a minimum number of fields necessary to define
uniqueness.
- Its values must uniquely and exclusively identify each record in
the table.
- Its value must exclusively identify the value of each field within
a given record.
- Its value can be modified only in rare or extreme cases.
Before you finalize your selection of a primary key, it is imperative
that you make absolutely certain that the primary key fully complies
with this particular element:
- Its value must exclusively identify the value of each field within
a given record.
Each field value in a given record should be unique throughout the
entire database (unless it is participating in establishing a
relationship between a pair of tables) and should have only one
exclusive means of identification—the specific primary key value for
that record.
You can determine whether a primary key fully complies with this
element by following these steps:
- Load the table with sample data.
- Select a record for test purposes and note the current primary key
value.
- Examine the value of the first field (the one immediately after
the primary key) and ask yourself this question:
Does this
primary key value exclusively identify the current value of
<fieldname>?
- If the answer is yes, move to the next field and repeat the
question.
- If the answer is no, remove the field from the table,
move to the next field and repeat the question.
- Continue this procedure until you've examined every field value in
the record.
A field value that the primary key does not exclusively
identify indicates that the field itself is unnecessary to the
table's structure; therefore, you should remove the field and reconfirm
that the table complies with the Elements of the Ideal Table. You can
then add the field you just removed to another table structure, if
appropriate, or you can discard it completely because it is truly
unnecessary.
Here's an example of how you might apply this technique to the
partial table structure in Figure 8.6. (Note
that INVOICE NUMBER is the primary key of the table.)
FIGURE
8.6 Does the primary key exclusively identify the value of each
field in
this table?
First, you load the table with sample data. You then select a record
for test purposes—we'll use the third record for this example—and note
the value of the primary key (13002). Now, pose the question above for
each field value in the record.
Does this primary key value exclusively identify the current value of . . .
INVOICE DATE? |
Yes, it does. This invoice number
will always identify the specific date that the invoice was created. |
CUSTFIRST NAME? |
Yes, it does. This invoice number
will always identify the specific first name of the particular
customer who made this purchase. |
CUSTLAST NAME? |
Yes, it does. This invoice number
will always identify the specific last name of the particular
customer who made this purchase. |
EMPFIRST NAME? |
Yes, it does. This invoice number
will always identify the specific first name of the particular
employee who served the customer for this sale. |
EMPLAST NAME? |
Yes, it does. This invoice number
will always identify the specific last name of the particular
employee who served the customer for this sale. |
EMPHOME PHONE? |
No, it doesn't! The invoice
number indirectly identifies the employee's home phone number
via the employee's name. In fact, it is the current value of
both EMPFIRST NAME and EMPLAST NAME that exclusively identifies the
value of EMPHOME PHONE—change the employee's name and you must
change the phone number as well. You should now remove EMPHOME PHONE
from the table for two reasons: The primary key does not exclusively
identify its current value and (as you've probably already
ascertained) it is an unnecessary field. As it turns out, you can
discard this field completely because it is already part of the
EMPLOYEES table structure. |
After you've removed the unnecessary fields you identified during
this test, examine the revised table structure and make sure it complies
with the Elements of the Ideal Table.
The primary key should now exclusively identify the values of the
remaining fields in the table. This means that the primary key is truly
sound and you can designate it as the official primary key for the
table. Remove the "CK" next to the field name in the table structure and
replace it with a "PK." (A primary key composed of two or more fields is
known as a composite primary key, and you mark it with the
letters "CPK.") Figure 8.7 shows the revised
structure of the SALES INVOICE table with INVOICE NUMBER as its primary
key.
FIGURE
8.7 The revised SALES INVOICES table with its new primary key.
As you create a primary key for each table in the database, keep
these two rules in mind:
Rules for Establishing a Primary Key
- Each table must have one—and only one—primary key. Because
the primary key must conform to each of the elements that
govern it, only one primary key is necessary for a particular table.
- Each primary key within the database must be unique—no two
tables should have the same primary key unless one of them is a
subset table. You learned at the beginning of this section that
the primary key exclusively identifies a table throughout the database
structure; therefore, each table must have its own unique
primary key in order to avoid any possible confusion or ambiguity
concerning the table's identity. A subset table is excluded from this
rule because it represents a more specific version of a particular
data table's subject—both tables must share the same primary
key.
Later in the database-design process, you'll learn how to use the
primary key to help establish a relationship between a pair of tables.
Alternate Keys
Now that you've selected a candidate key to serve as the primary key
for a particular table, you'll designate the remaining candidate keys as
alternate keys. These keys can be useful to you in an RDBMS
program because they provide an alternative means of uniquely
identifying a particular record within the table. If you choose to use
an alternate key in this manner, mark its name with "AK" or "CAK"
(composite alternate key) in the table structure; otherwise, remove its
designation as an alternate key and simply return it to the status of a
normal field. You won't be concerned with alternate keys for the
remainder of the database- design process, but you will work with them
once again as you implement the database in an RDBMS program.
(Implementing and using alternate keys in RDBMS programs is beyond the
scope of this work—our only objective here is to designate them as
appropriate. This is in line with the focus of the book, which is the
logical design of a database.)
Figure 8.8 shows the final structure for
the EMPLOYEES table with the proper designation for both the primary key
and the alternate keys.
FIGURE
8.8 The EMPLOYEES table with designated primary and alternate keys.
Non-keys
A non-key is a field that does not serve as a candidate,
primary, alternate, or foreign key. Its sole
purpose is to represent a characteristic of the table's subject, and its
value is determined by the primary key. There is no particular
designation for a non-key, so you don't need to mark it in the table
structure.