Spring Semester 2007


Lecture Notes Eight: Principles of Database Design. More Examples.
Usually somebody has (or anticipates having) a significant amount of data. They then want to be able to access it on a daily (or perhaps more often) basis, efficiently: to update it, or to extract information from the existing data. The first step (always) is to imagine yourself keeping track of this information using just paper and pencil and assuming you have infinite precision and endurance. You then design a database.

Designing a database is a fairly well established process by now. We are going to illustrate an information design methodology through a couple of examples, very much like we did last time. The information-level design methodology we promote involves representing individual user views, refining them to eliminate any problems, and then merging them into a cumulative design.

After you have represented and merged all user views you can complete the cumulative design for the entire database. Let's see some examples.
Ray Henry, the owner of a bookstore chain named Henry Books, gathers and organizes information about branches, publishers, authors and books. Each branch has a number that uniquely identifies the branch. In addition, Ray tracks the branch's name, location and number of employees. Each publisher has a code that uniquely identifies the publisher. In addition, Ray tracks the publisher's name and city. The only user for the Book database is Ray, but you don't want to treat the entire project as a single user view. Ray has provided you with all the reports the system must produce, and you will treat each report as a user view. Ray has given you the following requirements:

To transform each user view into DBDL (database design language), examine the requirements and create the necessary entities, keys, and relationships. Let's take them one by one. User View 1: For each publisher, list the publisher code, publisher name, and the city in which the publisher is located. The only entity in this user view Publisher, as follows:
Publisher (PublisherCode, PublisherName, City)

This table is in third normal form; the primary key is PublisherCode. There are no alternate or foreign keys. Assume that Ray wants to be able to access a publisher rapidly on the basis of its name. You'll need to specify the PublisherName column as a secondary key.

Because this is the first user view, there is no previous cumulative design. Thus, at this point, the new cumulative design will consist only of the design for this user view, as shown below. There is no need for an E-R diagram at this point.
Publisher (PublisherCode, PublisherName, City)
     SK    PublisherName

User View 2: For each branch, list the number, name, location, and number of employees. The only entity in this user view is Branch, and it is written as follows: The table is in third normal form. The primary key is BranchNum, and there are no alternate or foreign keys.
Branch (BranchNum, BranchName, BranchLocation, NumEmployees)

Ray wants to be able to access a branch rapidly on the basis of its name, so you;ll make the BranchName column a secondary key. Because there is no table in the cumulative design with the BranchNum column as its primary key, you can add the Branch table to the cumulative design during the merge step, as shown below. Again, there is no need for an E-R diagram with this simple design.
Publisher (PublisherCode, PublisherName, City)
      SK   PublisherName

Branch (BranchNum, BranchName, BranchLocation, NumEmployees)
      SK   BranchName

User 3 View: For each book list its code, title, publisher code and name, price and whether it is paperback. To satisfy this user requirement, you'll need to create entities for publishers and books and establish a one-to-many relationship between them. This leads to the following:
Publisher (PublisherCode, PublisherName)
Book (BookCode, Title, PublisherCode, Price, Paperback) 

The PublisherCode column in the Book table is a foreign key identifying the publisher. Merging these tables with the ones you already created does not add any new columns to the Publisher table, but it does add columns to the Book table. The result of merging the Book table with the cumulative design is shown below. Assuming that Ray will need to access books based on their titles, you'll designate the Title column as a secondary key.

User View 4: For each book, list its code, title, price, and type. In addition, list the book's authors and their names. If a book has more than one author, all names must appear in the order in which they are listed on the book's cover. There are two entities in the user view for books and authors. The relationship between them is many-to-many (one author can write many books and one book can have many authors). Creating tables for each entity and the relationship between them gives:
Author (AuthorNum, AuthorLast, AuthorFirst)
Book (BookCode, Title, Price) 
Wrote (BookCode, AuthorNum, Sequence)

The Author and Wrote tables are new; merging the Book table adds nothing new. Because it may be important to find an author based on the author's last name, the AuthorLast column is a secondary key. The result of the merge step is shown below.

User View 5: For each branch, list its number and name. In addition, list the code and title of each book currently in the branch as well as the number of copies the branch has available. Suppose you decide that the only entity mentioned in this requirement was for information about branches. You would create the following table:
Branch (

You would then add the BranchNum column as the primary key, producing the following The other columns include the branch name as well as the book code, book title, and number of units on hand.
Branch (BranchNum, 

Because a branch will have several books, the last three columns will form a repeating group. Thus, you have the following:
Branch (BranchNum, BranchName, (BookCode, Title, OnHand))

You convert this table to first normal form by removing the repating group and expanding the primary key. This gives:
Branch (BranchNum, BranchName, BookCode, Title, OnHand) 
In this table, you have the following functional dependencies:
BranchNum ---> BranchName
BookCode ---> Title
BranchNum, BookCode ---> OnHand 

The table is not in second normal form because some columns depend on just a portion of the primary key. Converting to second normal form gives: You can name the new table Inventory because it represents each branch's inventory. In the Inventory table, the BranchNum column is a foreign key that identifies the Branch table, and the BookCode column is a foreign key that identifies the Book table.
Branch (BranchNum, BranchName) 
Book (BookCode, Title)
Inventory (BranchNum, BookCode, OnHand) 

In other words, in order for a row to exist in the Inventory table, both the branch number and the book code must already be in the database. You can merge this Branch table with the existing Branch table without adding any new columns or relationships to the database. After adding the Inventory table to the existing cumulative design, you have the design shown below.

How would the design for this user view have turned out if you began with two entities, Branch and Book, instead of just the single entity Branch? In the first step, you would create the following tables:
Branch (
Book (

Adding the primary keys would give:
Branch (BranchNum,
Book (BookCode, 
Adding the other columns would give:
Branch (BranchNum, BranchName) 
Book (BookCode, Title) 

Finally, you have to implement the relationship between the Branch and Book tables. Because a branch can have many books and a book can be in stock at many branches, the relationship is many to many. To implement a many to many relationship, you add a new table whose primary key is the combination of the primary keys of the other tables. Doing this, you produce the following:
Branch (BranchNum, BranchName) 
Book (BookCode, Title)
Inventory (BranchNum, BookCode) 

Finally, you add any column that depends on both the BranchNum and PublisherCode columns to the Inventory table, giving: Thus, you end up with exactly the same collection of tables, which illustrates a point made earlier: There's more than one way of arriving at the correct result.
Branch (BranchNum, BranchName)
Book (BookCode, Title)
Inventory (BranchNum, BookCode, OnHand)

User View 6: For each book, list its code and title. In addition, for each branch currently having the book in stock, list the number and name of each branch along with the number of copies available. This user view leads to precisely the same set of tables that were created for User View 5.

You have satisfied all the requirements, and the design shown in the preceding picture represents the complete information-level design. Here's the database, with data, and an exercise for you: can you phrase the following queries in English?

SELECT AuthorNum, AuthorLast
FROM Author;

SELECT * FROM Branch;
SELECT PublisherName FROM Publisher WHERE City='Boston';
SELECT PublisherName FROM Publisher WHERE NOT City='Boston';
SELECT BranchName FROM Branch WHERE NumEmployees>=9;
SELECT BookCode, Title FROM Book WHERE Type='SFI';
SELECT BookCode, Title FROM Book WHERE Type='SFI' AND Paperback; or SELECT BookCode, Title FROM Book WHERE Type='SFI' AND Paperback=Yes; or SELECT BookCode, Title FROM Book WHERE Type='SFI' AND Paperback=True;
SELECT BookCode, Title FROM Book WHERE Type='SFI' OR PublisherCode='PE';
SELECT BookCode, Title, Price FROM Book WHERE Price BETWEEN 5 and 10;
SELECT BookCode, Title FROM Book WHERE Type='FIC' AND Price<10;
SELECT BookCode, Title, .85 * Price AS DiscountedPrice FROM Book;
SELECT BookCode, Title FROM Book WHERE Type IN ('SFI', 'HOR', 'ART');
SELECT BookCode, Title, PublisherCode FROM Book ORDER BY PublisherCode, Title;
SELECT COUNT(*) FROM BOOK WHERE Type='SFI';
SELECT AVG(Price) FROM Book;
SELECT BookCode, Title, Book.PublisherCode, PublisherName FROM Book, Publisher WHERE Book.PublisherCode=Publisher.PublisherCode;
SELECT Title, Price FROM Book, Publisher WHERE Book.PublisherCode=Publisher.PublisherCode AND PublisherName='Taunton Press';
SELECT Title, BookCode FROM Book, Publisher WHERE Book.PublisherCode=Publisher.PublisherCode AND PublisherName='Putnam Publishing Group' AND Price>15;
SELECT BookCode, Title, PublisherCode, Price INTO Fiction FROM Book WHERE Type='FIC';
UPDATE Fiction SET Price=14.50 WHERE Price=14.00;
DELETE FROM Fiction WHERE PublisherCode='VB';

Updated by Adrian German for A348/A548