Lecture notes for Wednesday, 7/2

Terminology for Relational Databases

We started by talking in general terms about relational databases, the way we think about them, and the terminology we use to describe them. I put a simple example table for an employee database on the board to point at.

Note that I've modified this table slightly from what we used in class so that it makes a better example.

First Name Last Name DOB Title
Erik Wennstrom 8/19/1978 lecturer
Lee Smith 8/19/1978 professor
Lee Smith 6/15/1946 lecturer

Tables like this one are the way in which we're used to thinking about data. We have a two-dimensional table with a fixed number of columns, each with its own name, and we can add as many rows to the table as we want. We think of the rows (except the first one) as the actual data, each typically representing some "thing" (in this case an employee) that has various different pieces of information about it.

We refer to the columns in the table as "columns", "fields", or "attributes". (Although we should be careful because "fields" is also sometimes used to refer to individual entries). And we don't usually think of the entire column as a single thing; typically when we say "column" we're just referring to name at the top of the column and what type of information is stored in the column (text, integer, date, etc.) In a relational database, information about the columns is relatively fixed. It's possible to add columns or change them, but that's something that doesn't happen very often: a job for the database maintainer.

The "domain" is the set of all possible values for the data in a particular column. Sometimes this is a huge set, including, for example, "all strings of length at most 30 characters". But sometimes it's quite limited, such as in a case where a column stores the state in which someone lives, which only has 50 possibilities.

The rows are called "rows" or "records", and they represent the actual data. In a relational database, we expect that new rows will be added all the time without much trouble at all; a job for anyone in data entry. (Heck, it's often updated by the users themselves through some sort of web app.)

Multiple Values for a Single Column

We talked about what we would need to do in order to add a "phone number" column to our table. This seems easy: why not just add a column called "phone number"? But a problem arises when you realize that a person could have more than one phone number. Several solutions were suggested.

First, it was suggested that we add several new columns to the database for things like "home phone number", "work phone number", etc. This is sometimes done, but has the disadvantage that it limits the number of phone numbers someone can have. What if someone has more than one job?

Next, it was suggested that we divide the phone number entry up into different "subrows". If we were creating a system for dealing with databases from scratch, this might even work, but in a relational databases like SQL, it's not even a bad idea; it's impossible.

This is related to what relational database people call "first normal form" (abbreviated 1NF). A database is in first normal form if for every record, each attribute has only one value from the domain. In other words, each entry in the table only has one thing in it. This might seem like a pointless definition because in most database systems (including MySQL), it's the only way to do things. It's just not possible to violate the first normal form if you're using MySQL.

Of course, we could've achieved something similar by storing a list of phone numbers in the "phone number" column, and this is possible, but it's a bit cumbersome, and relational databases has a better way.

The most common "bad" solution to this problem wasn't suggested, but it's worth mentioning. We could have created a separate row for each phone number, making sure to keep all the other information the same, resulting in a table that might look something like this:

First Name Last Name DOB Title Phone Number
Erik Wennstrom 8/19/1978 lecturer 555-1234
Lee Smith 8/19/1978 professor 555-8989
Lee Smith 8/19/1978 professor 555-1111
Lee Smith 6/15/1946 lecturer

And this will work. But it has an awful lot of redundancy in it. If you have to change Lee Smith's birth date, then you have to change it in multiple records, and that can introduce extra errors. So the relational database way of doing things is to make two different tables. One table will store things like the employee's name and date of birth, and the other will store the phone numbers. Using the above data, it might look something like this:

Employees
First Name Last Name DOB Title
Erik Wennstrom 8/19/1978 lecturer
Lee Smith 8/19/1978 professor
Lee Smith 6/15/1946 lecturer
Phone Numbers
??? Phone Number
? 555-1111
? 555-1234
? 555-8989

Only we ran into a sticking point here. What should go in that first column? It needed to be something that uniquely identifies a person. We couldn't use just someone's name, even if we used both first and last name because we had people with the same name.

There are two solutions to this problem, and neither one is inherently better than the other. They have their advantages and disadvantages. One technique is to add another column to our employees table including some kind of unique identifier, like an ID number or a computer username. What you're doing here is defining a "primary key" that allows you to quickly reference individual rows in a table. Another technique is to find (or add) a group of columns in the employee table that together can be used to uniquely identify rows in the table. You're still defining a primary key here, but in this case, the primary key consists of several columns. A full name together with a date of birth is sometimes used a primary key as it's very unusual (at least in the USA) to find two people with the same full name and the same date of birth.

Today, we took the first option, and added a column called "Login" to the employee table. It served as our primary key, and we used it to identify which person had which phone number. So our database might look something like this:

Employees
First Name Last Name DOB Title Login
Erik Wennstrom 8/19/1978 lecturer ewennstr
Lee Smith 8/19/1978 professor leesmith
Lee Smith 6/15/1946 lecturer lesmith2
Phone Numbers
Login Phone Number
ewennstr 555-1111
leesmith 555-1234
leesmith 555-8989

It's important to think about not just "primary" keys, which are the keys we actually decide to use to refer to things, but also about "candidate keys" which are things that we could have used as primary keys. So in this case, there are at least two candidate keys. One is the Login, which we used. The other candidate key is the combination of first name, last name, and date-of-birth (unless by some miracle there happens to be two people in the corporation who have the same first and last name and date of birth). In our case, you could technically get away with just last name and date of birth, but that's only because we've only got three employees listed here. In principle that would be a bad idea.

Next, we decided upon a primary key for the phone numbers table. In this case, the only choice we really had is to set a primary key that is both the Login and the phone number. (Because two different people could have the same phone number, we can't use just the phone number by itself as a key.)

Implementing the Example in MySQL

So then we actually went ahead and implemented this* in MySQL. (For those playing along at home: if your MySQL server isn't already running, now would be a good time to use that start script we created yesterday.) First, we connected to MySQL as root using the connect_as_root script that we created yesterday. (so that we could create a new database and set permissions for it).

*Well, technically we implemented a slightly simpler version, but I added a few things and changed a few things for the above notes to make for a clearer example. The notes below are exactly what we did in class. In particular, I left out the part about job titles.
[ewennstr@silo ~]$ cd mysql2
[ewennstr@silo mysql2]$ ./connect_as_root
Enter password:
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 7
Server version: 5.6.19 MySQL Community Server (GPL)

Copyright (c) 2000, 2014, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql>

Remember that the name of my MySQL directory is mysql2 (because reasons). Yours is probably mysql.

Now that we were connected to MySQL as root, we created a database called corporation and granted all permissions to our user. (If this were a homework assignment, you would also grant permissions to me and to your AI's, so that we can see your database.)

mysql> create database corporation;
Query OK, 1 row affected (0.01 sec)

mysql> grant all on corporation.* to 'ewennstr'@'silo.soic.indiana.edu';
Query OK, 0 rows affected (0.02 sec)

Remember to use your username, not ewennstr here, although you can feel free to grant me privileges too, if you want. Note that using all when granting privileges to a database doesn't actually grant all possible privileges, but it does grant privileges to add, change, or delete tables or data. The one important thing it doesn't grant is the ability to grant privileges to a third person. So any time you want to grant privileges (or create entire new databases), you need to be connected as root.

So then we exited MySQL and connected as ourselves. (Yes, we could have done all this as root, but you won't always have root access, so you should get used to connecting as a user.

mysql> exit
Bye
[ewennstr@silo mysql2]$ ./connect_as_ewennstr
Enter password:
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 8
Server version: 5.6.19 MySQL Community Server (GPL)

Copyright (c) 2000, 2014, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql>

We loaded up the database we just created and created our first table.

mysql> use corporation;
Database changed
mysql> create table employees (login char(8) primary key, firstname char(30), 
    -> lastname char(50), dob date);
Query OK, 0 rows affected (0.08 sec)

mysql> show tables;
+-----------------------+
| Tables_in_corporation |
+-----------------------+
| employees             |
+-----------------------+
1 row in set (0.00 sec)

We specified four different columns for this table. login, firstname, and lastname are all character strings. Since we knew that our usernames would always be 8 characters long, we only gave enough room for 8 characters. We were more generous with the firstname andlastname. The date-of-birth column dob was assigned type date. This is a built-in MySQL datatype, and it has to be in the particular form YYYY-MM-DD, so when you enter in dates, remember to use dashes (not slashes), and remember to use two-digit months and days and four-digit years.

The primary key instruction tells MySQL that we intend to use the column login as a primary key. MySQL will now give us an error message if we try to create two rows with the same login.

Speaking of creating rows, the next thing we did was to add some data to the table.

mysql> insert into employees (firstname, lastname, login, dob) values
    -> ('Erik', 'Wennstrom', 'ewennstr', '1978-08-19'),
    -> ('Lee', 'Smith', 'leesmith', '1985-7-22'),
    -> ('Lee', 'Smith', 'lesmith1', '1946-6-15')
    -> ;
Query OK, 3 rows affected (0.01 sec)
Records: 3  Duplicates: 0  Warnings: 0

Note that we don't have to insert rows with the columns in the exact same order as we created them. We don't even have to enter in data for every single column. As long as we specify which columns we'll be entering the data in for and in which order, we're good to go. (If we leave out a column, then the entry for each row for that column will just be empty.)

Then we took a look at the data we'd just entered. To do that, we ran the simplest possible query, which just selects everything (*) from the table.

mysql> select * from employees;
+----------+-----------+-----------+------------+
| login    | firstname | lastname  | dob        |
+----------+-----------+-----------+------------+
| ewennstr | Erik      | Wennstrom | 1978-08-19 |
| leesmith | Lee       | Smith     | 1985-07-22 |
| lesmith1 | Lee       | Smith     | 1946-06-15 |
+----------+-----------+-----------+------------+
3 rows in set (0.00 sec)

Next, we created our phone number table.

mysql> create table phonenumbers (login char(8), phonenumber char(20));
Query OK, 0 rows affected (0.03 sec)

Only there was a problem. I forgot to set a primary key. I couldn't remember how to set a primary key after the fact, so I just got rid of the table entirely and started over again.

mysql> drop phonenumbers;
ERROR 1064 (42000): You have an error in your SQL syntax; check the manual that  
corresponds to your MySQL server version for the right syntax to use near 'pho
nenumbers' at line 1
mysql> drop table phonenumbers;
Query OK, 0 rows affected (0.01 sec)

mysql> create table phonenumbers (login char(8), phonenumber char(20), 
    ->primary  key (login, phonenumber));
Query OK, 0 rows affected (0.02 sec)

I knew there was a command for changing the primary key of an existing table, but I couldn't remember what it was. So I looked it up after class. Here's what I could have typed to add the primary key:

alter table phonenumbers add primary key (login, phonenumber);

Then we added some data to the phone numbers table.

mysql> insert into phonenumbers (login, phonenumber) values
    -> ('leesmith', '555-1234'),
    -> ('ewennstr', '555-1111'),
    -> ('ewennstr', '555-8989');
Query OK, 3 rows affected (0.00 sec)
Records: 3  Duplicates: 0  Warnings: 0

mysql> show tables;
+-----------------------+
| Tables_in_corporation |
+-----------------------+
| employees             |
| phonenumbers          |
+-----------------------+
2 rows in set (0.00 sec)

mysql> select * from phonenumbers;
+----------+-------------+
| login    | phonenumber |
+----------+-------------+
| ewennstr | 555-1111    |
| ewennstr | 555-8989    |
| leesmith | 555-1234    |
+----------+-------------+
3 rows in set (0.00 sec)

By the way, if you want to see information about a table (what columns there are, what type of data they contain, which ones are primary keys, etc.) instead of the data in the table, use the command describe tablename.