Sunday, August 18, 2013

Normalization Process

8:10 AM



  • Relations can fall into one or more categories (or classes) called Normal Forms
  • Normal Form: A class of relations free from a certain set of modification anomalies.
  • Normal forms are given names such as:
    • First normal form (1NF)
    • Second normal form (2NF)
    • Third normal form (3NF)
    • Boyce-Codd normal form (BCNF)
    • Fourth normal form (4NF)
    • Fifth normal form (5NF)
    • Domain-Key normal form (DK/NF) 
    •  
  • These forms are cumulative. A relation in Third normal form is also in 2NF and 1NF.
  • The Normalization Process for a given relation consists of:
    1. Specify the Key of the relation
    2. Specify the functional dependencies of the relation.
      Sample data (tuples) for the relation can assist with this step.
    3. Apply the definition of each normal form (starting with 1NF).
    4. If a relation fails to meet the definition of a normal form, change the relation (most often by splitting the relation into two new relations) until it meets the definition.
    5. Re-test the modified/new relations to ensure they meet the definitions of each normal form.
In the next set of notes, each of the normal forms will be defined along with an example of the normalization steps.

First Normal Form (1NF)


  • A relation is in first normal form if it meets the definition of a relation:
    1. Each attribute (column) value must be a single value only.
    2. All values for a given attribute (column ) must be of the same type.
    3. Each attribute (column) name must be unique.
    4. The order of attributes (columns) is insignificant
    5. No two tuples (rows) in a relation can be identical.
    6. The order of the tuples (rows) is insignificant.
  • If you have a key defined for the relation, then you can meet the unique row requirement.
  • Example relation in 1NF: STOCKS (Company, Symbol, Headquarters, Date, Close_Price)
    Company Symbol Headquarters Date Close Price
    Microsoft MSFT Redmond, WA 09/07/2010 23.96
    Microsoft MSFT Redmond, WA 09/08/2010 23.93
    Microsoft MSFT Redmond, WA 09/09/2010 24.01
    Oracle ORCL Redwood Shores, CA 09/07/2010 24.27
    Oracle ORCL Redwood Shores, CA 09/08/2010 24.14
    Oracle ORCL Redwood Shores, CA 09/09/2010 24.33
Note that the key (which consists of the Symbol and the Date) can uniquely determine the Company, headquarters and Close Price of the stock. Here was assume that Symbol must be unique but
Company, Headquarters, Date and Price are not unique


Second Normal Form (2NF)


  • A relation is in second normal form (2NF) if all of its non-key attributes are dependent on all of the key.
  • Relations that have a single attribute for a key are automatically in 2NF.
  • This is one reason why we often use artificial identifiers as keys.
  • In the example below, Close Price is dependent on Company, Date
  • The following example relation is not in 2NF: STOCKS (Company, Symbol, Headquarters, Date, Close_Price)
    Company Symbol Headquarters Date Close Price
    Microsoft MSFT Redmond, WA 09/07/2010 23.96
    Microsoft MSFT Redmond, WA 09/08/2010 23.93
    Microsoft MSFT Redmond, WA 09/09/2010 24.01
    Oracle ORCL Redwood Shores, CA 09/07/2010 24.27
    Oracle ORCL Redwood Shores, CA 09/08/2010 24.14
    Oracle ORCL Redwood Shores, CA 09/09/2010 24.33
  • List the functional dependencies (FD):
           
    FD1: Symbol, Date -> Company, Headquarters, Close Price
    FD2: Symbol -> Company, Headquarters    
  • Consider that Symbol, Date -> Close Price.
    So we might use Symbol, Date as our key.
  • However: Symbol -> Headquarters
    This violates the rule for 2NF. Also, consider the insertion and deletion anomalies.
  • Another name for this is a Partial key dependency. Symbol is only a “part” of the key and it determines a non-key attribute.
  • One Solution: Split this up into two new relations: COMPANY (Company, Symbol, Headquarters)
    STOCK_PRICES (Symbol, Date, Close_Price)
  • At this point we have two new relations in our relational model. The original “STOCKS” relation we started with is removed form the model.
  • Sample data and functional dependencies for the two new relations:
  • COMPANY Relation:
    Company Symbol Headquarters
    Microsoft MSFT Redmond, WA
    Oracle ORCL Redwood Shores, CA
    FD1: Symbol -> Company, Headquarters  
    
  • STOCK_PRICES relation:
    Symbol Date Close Price
    MSFT 09/07/2010 23.96
    MSFT 09/08/2010 23.93
    MSFT 09/09/2010 24.01
    ORCL 09/07/2010 24.27
    ORCL 09/08/2010 24.14
    ORCL 09/09/2010 24.33
    FD1: Symbol, Date -> Close Price  
  • In checking these new relations we can confirm that they meet the definition of 1NF (each one has well defined unique keys) and 2NF (no partial key dependencies).

Third Normal Form (3NF)


  • A relation is in third normal form (3NF) if it is in second normal form and it contains no transitive dependencies.
  • Consider relation R containing attributes A, B and C. R(A, B, C)
  • If A -> B and B -> C then A -> C
  • Transitive Dependency: Three attributes with the above dependencies.
  • Example: At CUNY:
      
    Course_Code -> Course_Number, Section
    Course_Number, Section -> Classroom, Professor  
  • Consider one of the new relations we created in the STOCKS example for 2nd normal form:
    Company Symbol Headquarters
    Microsoft MSFT Redmond, WA
    Oracle ORCL Redwood Shores, CA
  • The functional dependencies we can see are:
    Symbol  ->   Company
    Company -> Headquarters  
    so therefore:         
    Symbol -> Headquarters   
  • This is a transitive dependency.
  • What happens if we remove Oracle?
    We loose information about 2 different facts.
  • The solution again is to split this relation up into two new relations: STOCK_SYMBOLS(Company, Symbol)
    COMPANY_HEADQUARTERS(Company, Headquarters)
  • This gives us the following sample data and FD for the new relations
    Company Symbol
    Microsoft MSFT
    Oracle ORCL
     FD1: Symbol -> Company  

    Company Headquarters

    Microsoft Redmond, WA

    Oracle Redwood Shores, CA
     FD1:  Company ->  Headquarters  
  • Again, each of these new relations should be checked to ensure they meet the definition of 1NF, 2NF and now 3NF. 
  •  
  •  
  • Boyce-Codd Normal Form (BCNF)

  • A relation is in BCNF if every determinant is a candidate key.
  • Recall that not all determinants are keys.
  • Those determinants that are keys we initially call candidate keys.
  • Eventually, we select a single candidate key to be the key for the relation.
  • Consider the following example:
    • Funds consist of one or more Investment Types.
    • Funds are managed by one or more Managers
    • Investment Types can have one more Managers
    • Managers only manage one type of investment.
  • Relation: FUNDS (FundID, InvestmentType, Manager)
    FundID InvestmentType Manager
    99 Common Stock Smith
    99 Municipal Bonds Jones
    33 Common Stock Green
    22 Growth Stocks Brown
    11 Common Stock Smith
         
    FD1:  FundID, InvestmentType -> Manager       
    FD2:  FundID, Manager        -> InvestmentType       
    FD3:  Manager                -> InvestmentType  
  • In this case, the combination FundID and InvestmentType form a candidate key because we can use FundID,InvestmentType to uniquely identify a tuple in the relation.
  • Similarly, the combination FundID and Manager also form a candidate key because we can use FundID, Manager to uniquely identify a tuple.
  • Manager by itself is not a candidate key because we cannot use Manager alone to uniquely identify a tuple in the relation.
  • Is this relation FUNDS(FundID, InvestmentType, Manager) in 1NF, 2NF or 3NF ?
    Given we pick FundID, InvestmentType as the Primary Key: 1NF for sure.
    2NF because all of the non-key attributes (Manager) is dependant on all of the key.
    3NF because there are no transitive dependencies.
  • However consider what happens if we delete the tuple with FundID 22. We loose the fact that Brown manages the InvestmentType “Growth Stocks.”
  • Therefore, while FUNDS relation is in 1NF, 2NF and 3NF, it is in BCNF because not all determinants (Manager in FD3) are candidate keys.
  • The following are steps to normalize a relation into BCNF:
    1. List all of the determinants.
    2. See if each determinant can act as a key (candidate keys).
    3. For any determinant that is not a candidate key, create a new relation from the functional dependency. Retain the determinant in the original relation.
  • For our example:
    FUNDS (FundID, InvestmentType, Manager)
    1. The determinants are:
      FundID, InvestmentType
      FundID, Manager
      Manager
              
    2. Which determinants can act as keys ?
      FundID, InvestmentType YES
      FundID, Manager YES
      Manager NO
              
    3. Create a new relation from the functional dependency: MANAGERS(Manager, InvestmentType)
      FUND_MANAGERS(FundID, Manager)
      In this last step, we have retained the determinant “Manager” in the original relation MANAGERS.
  • Each of the new relations sould be checked to ensure they meet the definitions of 1NF, 2NF, 3NF and BCNF

Fourth Normal Form (4NF)

  • A relation is in fourth normal form if it is in BCNF and it contains no multivalued dependencies.
  • Multivalued Dependency: A type of functional dependency where the determinant can determine more than one value.
  • More formally, there are 3 criteria:
    1. There must be at least 3 attributes in the relation. call them A, B, and C, for example.
    2. Given A, one can determine multiple values of B.
      Given A, one can determine multiple values of C.
    3. B and C are independent of one another.
  • Book example:
    Student has one or more majors.
    Student participates in one or more activities.
    StudentID Major Activities
    100 CIS Baseball
    100 CIS Volleyball
    100 Accounting Baseball
    100 Accounting Volleyball
    200 Marketing Swimming
    FD1: StudentID ->-> Major       
    FD2: StudentID ->-> Activities  
    Portfolio ID Stock Fund Bond Fund
    999 Janus Fund Municipal Bonds
    999 Janus Fund Dreyfus Short-Intermediate Municipal Bond Fund
    999 Scudder Global Fund Municipal Bonds
    999 Scudder Global Fund Dreyfus Short-Intermediate Municipal Bond Fund
    888 Kaufmann Fund T. Rowe Price Emerging Markets Bond Fund
  • A few characteristics:
    1. No regular functional dependencies
    2. All three attributes taken together form the key.
    3. Latter two attributes are independent of one another.
    4. Insertion anomaly: Cannot add a stock fund without adding a bond fund (NULL Value). Must always maintain the combinations to preserve the meaning.
  • Stock Fund and Bond Fund form a multivalued dependency on Portfolio ID.
    PortfolioID   ->->   Stock Fund    
    PortfolioID   ->->   Bond Fund  
  • Resolution: Split into two tables with the common key:
    Portfolio ID Stock Fund
    999 Janus Fund
    999 Scudder Global Fund
    888 Kaufmann Fund
    Portfolio ID Bond Fund
    999 Municipal Bonds
    999 Dreyfus Short-Intermediate Municipal Bond Fund
    888 T. Rowe Price Emerging Markets Bond Fund

Fifth Normal Form (5NF)

  • Also called “Projection Join” Normal form.
  • There are certain conditions under which after decomposing a relation, it cannot be reassembled back into its original form.
  • We don’t consider these issues here.
  •  

Written by

We are Creative Blogger Theme Wavers which provides user friendly, effective and easy to use themes. Each support has free and providing HD support screen casting.

0 comments:

Post a Comment

 

© 2013 CSC. All rights resevered. Designed by Templateism

Back To Top