Spring Semester 2007


Lecture Notes Seven: Principles of Database Design
This week we will explore HTTP. There's not much to explore, things are pretty simple.

Next week we start working with MySQL. In that respect we need to clarify how information is to be structured.

MySQL is an RDBMS. RDBMS?

It's a Relational Database Management System. So our databases will be relational?

Yes, they will be composed of tables (tables are relations). We need to know how one usually structures information for storage in such a database.

There are some mathematical principles behind it. Let's review them through an example.

OK. Here's our situation. This is a hypothetical situation.
Premiere Products is a distributor of appliances, housewares, and sporting goods. Since its inception, the company has used spreadsheet software to maintain customer, order, inventory and sales representative (sales rep) data. Management has determined that the company's recent growth means it is no longer feasible to use spreadsheets to maintain its data. What has led the managers at Premiere Products to this decision? One of the company's spreadsheets, shown below, displays sample order data and illustrates the company's problems with the spreadsheet approach: Redundancy is one problem with the orders spreadsheet.

Redundancy wastes space. It also makes your changes more cumbersome, and that may lead to inconsistencies.


So the management determines spreadsheet software must be replaced. The new system must have the following specifications.
The following information about sales reps, customers and parts inventory: Premiere Products ,ust also store information about orders:

The sample order has three components:


If this is what the order looks like, what do we store? Basically everything on it, but here's a better rundown on those items.
Premiere Products must also store the following items for each customer's order:

Now, to keep a long story short, here's how the data might be organized. This is just a very crude, intuitive approach.

Note that we want to shortcut through 30 years of theory. So we work this out, give you only the highlights.

So we could start as follows. Basically, organize our operation around what we sell.
Part ( PartNum, 
       Description, 
       OnHand, 
       Class, 
       Warehouse, 
       Price, 
       (OrderNum, OrderDate, CustomerNum, CustomerName, RepNum, RepName, NumOrdered, QuotedPrice)
)

A few things are missing. Commission rate, commission thus far, etc.

They can be easily added a bit later. What's the significance of the parentheses?

They group together the fields of a table. So that means we have a table for each line in Part?

Yes. That's called an unnormalized relation. How do we normalize?

First, eliminate repeating groups. But then we'd be so very extremely redundant!

Sure, but that's just the start. OK. That puts us in the First Normal Forms (no repeating groups).

We then need to determine functional dependencies. Better give me an example instead of (or along with) a definition.

A relation (table) that contains a repeating group (or multiple entries for a single record) is called an unnormalized relation. We knew that, already.

Removal of repeating groups is the starting point in the quest to create tables that are as free of problems as possible. Tables without repeating groups are said to be in the first normal form. Reaching first normal form is an ugly stage in our case.

I agree, because of the horrendous redundancy that results. So the next step is to determine functional dependencies.

Tell me once again what those are. Let me just list them here in this case.
PartNum determines:

There are some conventions in here, some implicit. Like the fact that a part is stored in a warehouse and only one.

Yes, but those are not entirely hard to see with the naked eye. Correct. Let's list the other functional dependencies.
OrderNum determines:
So functional dependencies are almost like cause-effect links?

Something like that: a column (attribute) B is functionally dependent on another column A (or possibly a collection of columns) if each value for A in the database is associated with exactly one value of B. If you know the value of A that determines the value for B.

So that's what we have above: if you know the part number you know what the part looks like, how much you still have on hand, in what warehouse you will find it, what the catalogue price is and so on. Notice that in the definition B can be functionally dependent on a collection of columns.

We have this situation here. Interesting.
(PartNum, OrderNum) determines
Yes, that's a logic consequence of what an order is. I see. What do we do once we determine the functional dependencies?

We need a few more definitions. Let's list them along with the action we need to take.
A column is a nonkey column (also called a non-key attribute) if it is not a part of the primary key.

What's the primary key in our case? It's composite:
(PartNum, OrderNum)

There's something weird about it, I agree. Well, that's what we want to point out now.
A table is in second normal form (2NF) if it is in first normal form and no nonkey column is dependent on only a portion of the primary key.

So, eliminating these partial dependencies is what we need to do now. Precisely. What are some problems partial dependencies can create?

They're called update anomalies and they fall into four categories: Eliinate them and we have a 2NF database.

Here's what I think we need to do. Create two extra tables.
Part: (PartNum, OnHand, Class, Warehouse)

Order: (OrderNum, OrderDate, CustomerNum, CustomerName, RepNum, RepName)

PartOrder: (PartNum, OrderNum, NumOrdered, QuotedPrice)

Exactly. That removes partial dependencies.

But now we have another issue. What?

We need to make sure that all determinants are candidate keys. That would bring us in the third normal form.

Can we characterize this situation a bit better? Certainly. We need one preliminary definition, though.
Any column (or collection of columns) that determines another column is called a determinant.

A table is in third normal form (3NF) if it is in second normal form and the only determinants are candidate keys.


I see what we need to do: create a separate Customer table. Indeed, CustomerNum is a determinant, non candidate key.

What do we get? We're still not done!
Part: (PartNum, OnHand, Class, Warehouse)

Order: (OrderNum, OrderDate, CustomerNum)

Customer: (CustomerNum, CustomerName, RepNum, RepName)

PartOrder: (PartNum, OrderNum, NumOrdered, QuotedPrice)


A-ha! The RepNum is now a non-candidate key determinant. Make a Rep table!

Here we go. That's more like it.
Part: (PartNum, OnHand, Class, Warehouse)

Order: (OrderNum, OrderDate, CustomerNum)

Customer: (CustomerNum, CustomerName, RepNum)

Rep: (RepNum, RepName)

PartOrder: (PartNum, OrderNum, NumOrdered, QuotedPrice)


We're now done. Yes, here's what the whole thing eventually boils down to.

First what's left of Part:

Next, what was generated through successive splitting of the tables:

Notice that the last two tables came out of

Ah, so this contains some data too. Yes. Now try to answer the following questions.
  1. List the number and name of all sales reps.

    SELECT RepNum, LastName, FirstName
    FROM Rep;

  2. List the complete Customer table.

    SELECT * 
    FROM Customer

  3. List the number and name of every customer represented by sales rep 20.

    SELECT CustomerNum, CustomerName
    FROM Customer
    WHERE RepNum = '20';

  4. List the number and name of all customers that are represented by sales rep 20 and have credit limits of $7,500.

    SELECT CustomerNum, CustomerName
    FROM Customer
    WHERE RepNum = '20'
      AND CreditLimit = 7500;

  5. List the number and name of all customers that are represented by sales rep 20 or have credit limits of $7,500.

    SELECT CustomerNum, CustomerName
    FROM Customer
    WHERE RepNum = '20' 
       OR CreditLimit = 7500;

  6. For each order, list the order number, order date, the number of the customer that placed the order, and the name of the customer that placed the order.

    SELECT OrderNum, OrderDate, Orders.CustomerNum, CustomerName
    FROM Orders, Customer
    WHERE Orders.CustomerNum = Customer.CustomerNum;

  7. List the number and name of all customers represented by Valerie Kaiser.

    SELECT CustomerNum, CustomerName
    FROM Customer, Rep 
    WHERE Customer.RepNum = Rep.RepNum 
      AND LastName = 'Kaiser'
      AND FirstName = 'Valerie';

  8. How many customers have a credit limit of $7,500?

    SELECT COUNT(*) 
    FROM Customer 
    WHERE CreditLimit = 7500;

  9. Find the total of the balances for all customers represented by sales rep 65.

    SELECT SUM(Balance)
    FROM Customer
    WHERE RepNum = '65';

  10. Give the part number, description, and on-hand value
    OnHand * Price
    for each part in item class AP.

    SELECT PartNum, Description, OnHand * Price
    FROM Part 
    WHERE Class = 'AP'; 

  11. List all columns and all rows in the Part table. Sort the results by part description.

    SELECT * 
    FROM Part
    ORDER BY Description;

  12. List all columns and all rows in the Part table. Sort the results by part number within item class.

    SELECT *
    FROM Part
    ORDER BY Class, PartNum;

  13. List the item class and the sum of the value of parts on hand. Group the results by item class.

    SELECT Class, SUM(OnHand)
    FROM Part
    GROUP BY Class;

  14. Create a new table named Appliance to contain the columns PartNum, Description, OnHand, Warehouse, and Price for all rows in which the item class is AP.

    SELECT PartNum, Description, OnHand, Warehouse, Price
    INTO Appliance
    FROM Part
    WHERE Class='AP';

  15. In the Appliance table, change the description of part KL62 to "Electric Dryer".

    UPDATE Appliance
    SET Description='Electric Dryer'
    WHERE PartNum='KL62';

  16. In the Applicance table, delete every row in which the warehouse number is 3.

    DELETE 
    FROM Appliance
    WHERE Warehouse='3'
The basic idea is this:
  • try to find the answer by hand
  • then check your answer using SQL
We can't use SQL yet.

That's true: we first need to install MySQL. Once we do that, we need to define a database and populate it with data.

Well, all of that: next week. I can hardly wait.

Updated by Adrian German for A348/A548