YHSCS :: Database Programming :: Lessons

Database Programming :: Lessons :: Normalization

Redundancy and Anomalies

Redundancy in a database is when attribute values are repeated unnecessarily in a table. Check out the example below.

Team Name	Stadium Name	Player Name	Annual Salary
Chicago Cubs	Wrigley Field	Kris Bryant	$652,000
Chicago Cubs	Wrigley Field	Jon Lester	$20 million
Chicago White Sox	U.S. Cellular Field	Chris Sale	$9.15 million
Chicago White Sox	U.S. Cellular Field	Todd Frazier	$7.5 million
St. Louis Cardinals	Busch Stadium	Yadier Molina	$15 million

What happens if the White Sox changes the name of their stadium? In this case every tuple involving the White Sox would have to be changed. If we forget to change a tuple this could lead to an update anomaly where information in the database does not match. A deletion anomaly can occur when data is lost after a deletion. If we delete Yadier Molina from the database we would lose the stadium that the St. Louis Cardinals play in.

What if a new team, say the Las Vegas Extra Terrestrials, were added to the database? When the team is created they will not have any players until the expansion draft. In this case, a part of the primary key, the Player Name field, will be null, which isn't allowed. This is known as an insertion anomaly.

You can fix an anomaly by decomposing the relation into two tables like so:

          Team (Team Name, Stadium Name, Player Name, Annual Salary)
          
          Team (Team Name, Stadium Name)
            Chicago Cubs	        Wrigley Field
            Chicago White Sox		U.S. Cellular Field
            St. Louis Cardinals		Busch Stadium
            
          Player (Team Name, Player Name, Annual Salary)
            Chicago Cubs	        Kris Bryant     $652,000
            Chicago Cubs	        Jon Lester      $20 million
            Chicago White Sox		Chris Sale      $9.15 million
            Chicago White Sox		Todd Frazier	$7.5 million
            St. Louis Cardinals		Yadier Molina	$15 million

There is no longer any redundancy in the Team table after decomposing it into two tables. We can insert new teams without leaving a primary key null since the Player Name primary key is now in the Player table. In addition, we can delete players from the Player table without losing the stadium information. Decomposing relations does make queries take longer, but the elimination of anomalies is worth the loss of speed.

Functional Dependency

Functional dependency is when one attribute of a relationship uniquely determines another attribute. Let's use the original Team table from earlier: Team (Team Name, Stadium Name, Player Name, Annual Salary). The following functional dependencies (FD) exist in the table:

Team Name --> Stadium Name
Team Name and Player Name --> Salary
Stadium Name -/-> Player Name

Both Team Name and Player Name determine Salary since a player could move to another team and make a different salary. Stadium Name does not determine Player Name since, again, a player could move to a different team or the team could move to a different stadium.

Functional dependencies are established by the database designer and must hold true for all possible data values. Functional dependencies can be enforced during database insertion if programmed into the database design. Examine the following table: Class (Teacher Name, Class Number, Section, Class, Department, Department Chair)

Teacher Name	Class Number	Section	Class	Department	Department Chair
Derek Miller	605	1	AP Computer Science	Computer Science	Jon Calder
Tim Peters	302	4	Physics	Science	Harry Wolf
Derek Miller	605	2	AP Computer Science	Computer Science	Jon Calder
Seth Hettel	515	1	PLTW Engineering	Engineering Technology	Jon Calder

The functional dependencies of the Teacher table are the following:

Class Number --> Class, Department, Department Chair
Class Number, Section --> Teacher Name
Department --> Department Chair
Section --> Class

We cannot add classes to the table unless they are assigned a class number, which is an insertion anomaly. If a new department chair is hired for the Computer Science department we will have to change a number of tuples, which is an update anomaly. If we remove the Seth Hettel tuple we will lose all of the information connected to him, including the department and department chair.

Normalization

An unnormalized table has repeating groups. Use the following relation as an example where the parenthesis indicate a repeating group.

R (A, B, (c1, c2, c3)*, D, E, F)

FD = { A --> B, D, E, F
A, c1 --> c2, c3 }

For a table to be in first normal form (1NF) all values must be atomic, meaning there should be no repeating groups. The above table can be put into first normal form in the following way:

R1 (A, B, D, E, F)
R2 (A, c1, c2, c3)

The earlier Teacher table is already in 1NF since there are no repeating groups.

A set of attributes (X) is fully dependent on another set of attributes (Y) if X --> Y and Y can't be determined by any subset of X. In the Class table example above, Teacher Name is fully dependent on Class Number and Section while Class, Department and Department Chair are not fully dependent on Class Number and Section since the Class Number alone can determine the Class, Department, and Department Chair.

A prime attribute is an attribute that is part of a key of the table. In the Class table, Class Number and Section are prime attributes while Teacher Name, Class, Department, and Department Chair are non-prime attributes.

A table is in second normal form (2NF) if it is in 1NF and each of its non-prime attributes are fully dependent on the entire primary key. Since the Class Number alone determines the Class, Department, and Department Chair the Class table is not in 2NF. We can normalize the table into the following two tables to put it in 2NF.

          Teacher (Class Number, Section, Teacher Name)
          Class (Class Number, Class, Department, Department Chair)

There are still problems with 2NF. For example, if a department chair is moved to a different department we would have to make multiple changes in the Class table, which is an update anomaly. Also, if a class is removed from the Class table we will lose information about the department, which is a deletion anomaly. Finally, an insertion anomaly occurs if we try to insert information about a new department. We can't add the new department until there are classes to assign to it.

A non-prime attribute is transitively dependent on the primary key of an attribute if there is also a non-prime attribute that functionally determine the original non-prime attribute. For example, Department Chair is transitively dependent on the Class Number since the Class Number determines the Department and the Department determines the Department Chair. A table is in third normal form (3NF) if none of its non-prime attributes are transitively dependent on its key. The following is our example in 3NF:

          Teacher (Class Number, Section, Teacher Name)
          Class (Class Number, Class, Department)
          Department (Department, Department Chair