MIT5314:
Database Applications
Fall, 2003
What is 2NF and what is an example of it?
Definition: A table is in 2NF, (Second
Normal Form), if it is in
1NF, (First Normal Form), and all non-key attributes are dependent only on the
primary key.
If a partial dependency occurs when a non-key attribute is dependent on only a part of the (composite) key, the definition of 2NF is sometimes phrased as, "A table is in 2NF if it is in 1NF and if it has no partial dependencies."
???What is Normalization???
Normalization is the process of efficiently organizing
data in a database.
The goal is to eliminate redundant data in a database
and to ensure data dependencies make sense.
The basic ideas of normalization are simple enough.
We want to create a design for a relational database files where:
1. All the data necessary for the
purposes that the database is to serve
2. Have as little redundancy as
possible
3. Accommodate multiple values for
types of data that require them
4. Permit efficient updates of the data in
the database
5. Avoid the danger of losing data
unknowingly
???Why do we need normalization???
Normalization is needed to prevent possible corruption
of a database stemming from update anomalies such as
insertion, deletion, and
modification.
Insertion anomaly is a failure to place information
about a new database entry into all the places in the database where
information
for that entry needs to be stored.
Deletion anomaly is a failure to remove information
about an existing database when it is time to remove that entry.
Update anomalies may be additions, deletions or
both.
??? Ok, remind me again. What does 1NF mean ???
First Normal Form (1NF)
A table (relation) is in 1NF
if:
1. There are no duplicated rows in the table
2. Entries in a column, (attribute or field), are of the same kind.
3. Each cell is single-valued or there are no repeating groups or arrays
Example: The following relation is in First Normal Form.
Table 1Order # | Company | Purchase Agent | Item Number | Price |
121 | Best Windows | Tom Jones | A2351 | $ 165.20 |
122 | AAA Construction | James Smith | A5675 | $ 2,553.60 |
123 | Able Doors | Oscar Sanchez | A6689 | $ 56.00 |
124 | Best Windows | Tom Jones | A12356 | $ 4,568.10 |
125 | Don Home Improvement | Don Whatshisname | A2351 | $ 165.20 |
126 | Able Windows | Oscar Sanchez | A7956 | $ 900.50 |
???What is wrong with this???
There are several problems with this table.
1.
Insertion anomaly If you wanted to insert an order you would have to know or look up all
of the information needed. This leaves room for many errors in the entry
process.
2.
Deletion anomaly If you wanted to delete order # 122
you would lose several pieces of information:
The company
name of
AAA Construction.
Their purchase agent's name for the AAA Construction Company.
The price of item number A5675.
2a.
If you just removed one piece of data, say the item number the order would still
exist but
what was the order for?
If you just removed the company name the order
would still exist but who ordered it?
3.
Update anomaly If you wanted to modify order # 122 to Item
Number A2351 you could lose
item number A5675 and its price.
To avoid these problems you should put your database in Second Normal Form!!!
???But how do we make it Second Normal Form???
In the table above, Order # serves as the primary key.
Did you notice that the Company and Price are dependent on the Order #.
The Purchase Agent is only dependent on the Company.
The Price is dependent on the Item Numbers.
!!! The solution is to make 3 separate tables!!!
Table 2Order # | Company | Price |
121 | Best Windows | $ 165.20 |
122 | AAA Construction | $ 2,553.60 |
123 | Able Doors | $ 56.00 |
124 | Best Windows | $ 4,568.10 |
125 | Don Home Improvement | $ 1,234.00 |
126 | Able Windows | $ 900.50 |
In table 2, all of the information included is unique to each order.
Table 3
Company | Purchase Agent |
Best Windows | Tom Jones |
AAA Construction | James Smith |
Able Doors | Oscar Sanchez |
Don's Home Improvement | Don Whatshisname |
In table 3, the Purchase Agent is dependent on the primary key which is the Company.
Table 4
Item Number | Price |
A2351 | $ 165.20 |
A5675 | $ 2,553.60 |
A6689 | $ 56.00 |
A12356 | $ 4,568.10 |
A7956 | $ 900.50 |
In table 4, the the Price is dependent on the Item Number.
!!!Now we have Second Normal Form!!!
???How do we know it is 2NF???
We started with a table in 1NF, and then we broke it up until all non-key attributes were dependent only on the primary key.
Now lets
look at each table and see how the data is related:
In table 2, all of the information included is unique to
each order.
In table 3, the Purchase Agent is dependent on the primary key which is the Company.
In table 4, the the Price is dependent on the Item Number.
!!!So we do have 2NF!!!
???Why is 2NF good???
By creating three separate tables we eliminated the dependency problem found in table 1.
Lets look at the results:
1. If you wanted to insert an order you would not
have to know or look up all of the information needed.
All you would need
is the Company and the Item Number they wanted.
You
could automatically
get an Order #.
The
Purchase Agent and Price could be easily called up.
2. If you wanted to delete order # 122 you would
only be deleting the Order #, Company, and the Price.
You would still have
the Item Number listing, the Company's Purchase Agent, the Item Number,
and Price.
You would also not have order #122 taking up space because it
would be deleted completely.
3. If you wanted to modify Order # 122 to item
number A2351 you would not lose item number A5675
and its price. You would just
be modifying what they ordered.
???Where can I find out More???
http://www.shsu.edu/~csc_tjm/summer2000/cs334/Chapter05/Chapter5.html
http://databases.about.com/library/glossary/bldef-2nf.htm
http://www.utexas.edu/its/windows/database/datamodeling/rm/rm7.html
http://www.cs.jcu.edu.au/Subjects/cp1500/1998/Lecture_Notes/normalisation/2nf.html