CSCI A114 / INFO I111
Final Review

First semester 2001-2002


In which we review for the A114/I111 Final Exam.

In this review we are going to focus on SQL. SQL (Structured Query Language) is a widely used tool for accessing data stored in relational database systems. We go through a large number of examples to expose the specifics of this (often ornery, by many accounts) language, step by step.

The SQL revolves around the relational model for databases (proposed by E.F. Codd and his colleagues at IBM in the 1970's). In this model the choice of a table structure with simple rows and columns provides superb clarity, and simplicity for what was once called "the arcana of storage management, pointers and hierarchies".

Introducing SQL

One of the most common uses of computers is to store and retrieve information. A collection of information is called a database, and so software called a database management system (DBMS) can be used to perform this task. A typical DBMS allows its users to store, modify and access data in an organized, efficient way. In the past accessing a database required the services of a programmer, so you had to first find one, then convince her/him to write programs for you to access the database.

Allowing users direct access on an ad hoc basis required giving them a language in which to express their requests. An instance of access to a database is often called a query, and so that language is called a query language. Many query languages exist, each developed for a particular DBMS. The one presented here is officially called Structured Query Language and its acronym is SQL. It is a standard query language supported by a great many database management systems. It does two things:

  1. eliminates the need for a programmer, because
  2. it makes a programmer out of the casual user (allowing direct access).

Kinds of Database Systems

Database management systems can be characterized by the way they model data as seen by their users. Relational systems represent data to their users as simply the contents of one or more tables, and nothing more. One reason for the popularity of relational systems is their ability to easily support access either by computer programs or directly by human beings.

SQL was developed as a query language for relational database systems. Its primary intent is to allow easy access to data stored in relational systems. Virtually all relational systems allow their users access to stored data through SQL. The wide availability of SQL in relational systems allows fast answers to an assortment of very specific questions about data stored in a database.

Relational Concepts

The relational model was first put forth by E. F. Codd, a researcher in IBM's San Jose laboratories, in a paper published in 1970. Along with the easy access it allows to people, a major reason for this dominance is the simple but powerful view of the data which relational systems offer to their users.

A relational database consists of a collection of tables. All data in the database is stored in one of these tables. Each table is a simple two-dimensional structure, made up of some number of rows (also called records) and columns. Each column in a table is assigned a unique name and contains a particular type of data such as characters and numbers.

Each record (we will use this term throughout instead of row) contains a value for each of the table's columns. The intersection of the columns with each row defines the fields in those records. Stored in the table, as values for the various fields in the records, is the actual data. Note that the order of the records in a table is not significant. It is not meaningful to ask if a particular record occurs before or after another in a table.

A very simple database might contain only one table, but most databases will contain several tables. It is a design problem to determine the tables that best model the problem you're building a database for. Our approach (in A114 and I111) relies on determining the entities (with their features) first, then the relationships between them, and drawing an E-R diagram, to produce the tables out of that.

Most queries against a database may require accessing more than one table. This is not a problem, since a single SQL query can retrieve information from one or more tables within a single database. Other databases may also exist on the same system, each comprising a set of tables appropriate to their specific purpose (modelling the problem for which the database has been built), but that's not our business (at least not directly, not now).

Keys

If the records in a table are not ordered, how are we to locate specific ones? More importantly, how can we distinguish one record from another? The answer lies in specifying one or more columns in each table which define the key for that table's records. The information in the key field or fields must be unique for each record. (That's also the key feature of that entity.)

For some tables, none of the information to be stored may provide an appropriate key. In these cases, it is necessary to define a column containing some kind of unique identification (typically a number) to serve as the key. For other tables, the contents of two or more columns taken together may provide the values necessary to uniquely identify each record. In tables like this, a record's key consists of its values in each of those columns.

Keys are not always required for accessing information. Depending on the question asked, it may or may not be necessary to refer to each record's key value. In spite of this, the ability to uniquely identify each record in a table is an integral part of the relational model for handling data, and every table must always define a key for its records, period.

Correctly selecting keys for the tables in a database is not an easy task. In fact, choosing keys is only one aspect of the much larger problem of designing a relational database. The designer, who may also be the database administrator (DBA) responsible for the system, must first determine how the data should be organized into tables. The DBA (or whoever the designer is) must determine the number of columns in each table, what the names and legal contents of those columns should be, and many other specifics about the database. Here's our recipe, good enough for all we might be dealing with.

Luckily, accessing the data through SQL doesn't require any specialized knowledge about how a relational database should be defined. All we need to understand is how it looks to us.

(So we skirt the design issues from now on, and assume the database is given. Let's review its design briefly. We start with one of its tables, first. OK, let's get going. )

An Example Table

Looking at Homework Assignments 5 (five) and 6 (six) we see a table of TEACHERS.

The structure of this table is as follows: four columns, with the first as the key.

Every record in the table has four fields:

  1. teacher#
  2. teacher_name
  3. phone
  4. salary
In the sample database for this assignment we have 6 rows in this table, each record describing a particular teacher). The objective of SQL is to allow you to work with the information stored in a relational database's tables. Using SQL you can create new tables, destroy existing ones, and modify a table's records. Most importantly, though, you can ask questions about the information stored in the tables, and it is this sort of questions which make up the bulk of SQL queries.

Example SQL Queries

Suppose the question is:

"What are the names, phone numbers of teachers earning more than $30,000 yearly?"
In SQL that looks as follows:

SELECT teacher_name, phone
FROM TEACHERS
WHERE salary >= 30000; 
We first determine what we want to look at, where it's coming from, and build the query using the Query Construction Method (QCM): identify the data, data source, any restrictions, then the format.

For many queries, information must be retrieved from two or more tables at once. SQL can be used to express both very simple queries and very complex ones. After describing the language's basic elements we step through a sequence of examples. That's your Homework Assignment 5 and 6 (for A114 and I111). Starting with simple queries we move step by step to increasingly powerful operations. Ultimately (and hopefully) the structure of SQL will be made clear to all.

Elements of the Language

SQL allows its users to work with data stored on a computer in a relational database system. Most languages for human-computer communication are rigid and formal, and SQL is not exception. We begin our discussion by looking at the fundamental components which comprise this useful language.

Ways to Use SQL

SQL was designed with interactive access in mind. Its various commands are intended to be directly typed into computer terminals or personal computers. The results of those commands are then displayed on the display screen for the user to examine.

If we look at the history and evolution of access to database systems, it went like this: starting with access by computer programs only, database systems evolved to allow direct acces by people. Query languages like SQL are an essential part of this direct access.

The need for access to data by computer programs did not go away with the arrival of SQL, however. Embedded SQL exists to make this possible. As originally defined, SQL wasn't really usable from inside a computer program. Extensions to the language have been made, however, which allow programmers to embed SQL statements in their programs.

We use such extensions in A348 and A548 by accessing a MySQL database with

  1. Perl (using the Perl DBI.pm module,)
  2. PHP (which has native SQL support) and with
  3. JDBC (from Java).
Embedded or interactive SQL is the same as a language, though. So the descriptions that follow should be of use to both A348 and A114 in understanding how SQL is structured.

Versions of SQL

Natural languages have dialects, and so does SQL. Vendors of SQL have made small (or not so small) modifications and additions to the language originally invented by IBM. This slightly chaotic situation is not necessarily a bad thing. The only aspect you need to keep in mind is that if the examples that follow don't quite work on your system they may have to be slightly adjusted.

But the adjustments will be absolutely minor.

So if some of the examples don't quite work for you exactly as specified here don't despair; check your local documentation. (In Access, for example, you can build the query with the Visual QBE Interface, and check to see what SQL gets produced.)

"While SQL is the lingua franca for relational database access, it is not yet the Esperanto."

Defining the Language

As with any language, natural or computer, SQL is made up of several parts.

Data Types

Relational databases store data. Every element of that stored data must be of some type. In every table, each column (and thus every field in every record) stores only data of a certain type. SQL precisely defines the types of data which can be stored in its tables.

Here are examples of types:

Note though that these keyword types vary from system to system. But you get the idea.

One other kind of information can be stored in a record's fields: the value NULL. NULL doesn't really belong to any type, but instead represents the lack of a value. Any field of any type can have the value NULL assigned to it (unless NULLs are explicitly barred from that field when it is created).

Please note that NULL is not the same thing as

NULL is simply NULL, a unique "value" indicating the absence of a normal value in a particular field.

The Structure of SQL Statements

SQL's full name, Structured Query Language, is a slight misnomer. It implies that only queries are supported, and that all an SQL user can do is ask questions about information stored in a database's tables. In fact, SQL allows its user to create tables, destroy tables, grant and revoke access to those tables, and to do several other tasks required for full use of a relational database. Although queries are its most common use, SQL is really much more than just a query language.

To allow access to the information stored in a relational database, SQL defines a number of operations, sometimes called the verbs of the language. Just as speakers of English rely on English verbs to indicate actions, SQL users rely on its verbs to indicate the actions to be performed on a database. Each SQL verb has a specific meaning and serves a specific purpose.

Each SQL statement, whether entered by a human being at a keyboard or embedded within some computer program, begins with an SQL verb. Following this comes further information specifying exactly what the verb should do. If the verb requests information about a table, for instance, the name of the table must appear. At the end of each statement appears a termination character. Different SQL systems define different termination characters, but the most common is the semicolon (;).

For now the important thing is to note this basic form of SQL statements:

  1. verb
  2. information specifying what the verb should do
  3. and the termination character
As an example, the statement to list the names of all teachers in the TEACHERS table is
SELECT teacher_name
FROM TEACHERS;
The verb is SELECT, the termination character is semicolon, and everything in between is a specification of what the verb should do. All SQL statements follow this basic structure.

SQL is free-format. In other words, blanks and the line on which various parts of a statement appear are not significant. The example above could have been written as

SELECT teacher_name FROM TEACHERS; 
and the results would have been the same. Also, although conformance to the SQL standard requires support only of upper case letters, most implementations of SQL treat upper and lower case letters as equivalent. In most products which support SQL, therefore, typing the above examples in all lower case letters would have produced equivalent results.

Keywords

Like many computer languages, SQL reserves certain words for its own use. These words, called reserved words or keywords, may not be used as the name of the table, as a name for one of a table's columns, or for anything else. They always have only the meaning assigned to them by SQL.

In the examples that follow we will be writing the keywords in upper case. Names of tables are also in upper case. Column names, however, are always in lower case. Since SQL is not usually case sensitive, these distinctions serve only to make the language more readable, and have no effect on the operation of the queries.

An Example Database

As described earlier, a relational database is just a collection of tables. Each table has some number of columns, and contains rows (or records) with the actual information. Each SQL query accesses the data stored in some particular database.

Throughout the set of examples that follow, we will use an example database with five tables.

This database contains the registration information for a fictitious (and very small) university. Each table contains a related set of information, described by that table's column names. Each column contains information of a certain type, drawn from the list of possible types described earlier.

We will give a complete list of the records stored in each table below. Note that in the diagram above some columns' titles are in color (blue or red). They are keys, that's why we marked them.

The table in the database with the most columns is called STUDENTS. It contains information on all students currently enrolled at this university. Its columns are:

Because it contains a unique value for each record the student# column is designated as the key for this table (STUDENTS). The table contains 12 records, see below (download here).
student#,student_name,address,zip,city,state,sex
148,Susan Powell,534 East River Dr.,19041,Haverford,PA,F
210,Bob Dawson,120 South Jefferson,02891,Newport,RI,M
298,Howard Mansfield,290 Wynkoop Drive,22180,Vienna,VA,M
348,Susan Pugh,534 East Hampton Dr.,06107,Hartford,CT,F
349,Joe Adams,473 Emmerson Street,19702,Newark,DE,M
354,Janet Ladd,441 10th Street,18073,Pennsburg,PA,F
410,Bill Jones,120 South Harrison,92660,Newport,CA,M
473,Larry Bird,983 Park Avenue,02169,Boston,MA,M
548,Allen Thomas,238 West Ox Road,60624,Chicago,IL,M
558,Val Shipp,238 Westport Road,60556,Chicago,IL,F
649,John Anderson,473 Emmory Street,10008,New York,NY,M
654,Janet Thomas,441 6th Street,16510,Erie,PA,F
Another table in the database is called TEACHERS. This table describes the teachers currently active at this university. Its columns are: Again, because of the column's unique values, teacher# is the key for this table. The TEACHERS table contains 6 records (as can be seen below, and downloaded from here.)

teacher#,teacher_name,phone,salary
180,Cooke,257-8088,$29560.00
290,Lowe,257-2390,$31450.00
303,Horn,256-3049,$27540.00
430,Engle,257-4621,$38200.00
560,Olsen,257-8086,$31778.00
784,Scango,257-3046,$32098.00
A third member of the database is the COURSES table. COURSES lists information on the courses offerred by the university this term and has the following columns: You can find this kind of information in your Schedule of Classes.

Here's the sample data we use:

course#,course_name,department,num_credits
A114,Databases,Computer Science,3
A201,English Composition,English,3
H480,Western Civilization,History,3
M119,Calculus IV,Math,4
You can download this table from here.

Because course numbers are unique, the course# column serves as this table's key.

Another of the database's tables is called SECTIONS. Many university courses are divided into two or more sections, and this table describes each section of each course. (You can find these sections in INSITE). It connects TEACHERS and COURSES. This table's columns are as listed below:

Here's the data:
section#,teacher#,course#,num_students
1200,290,M119,20
1251,430,A201,15
1333,303,A114,10
1335,560,A114,10
1601,180,H480,15
1602,784,H480,10
And here's where you can download it from.

(Colored columns are keys. Two of them are foreign keys and the third one could be a key by itself here at IU but in general may not, for example when we simply count the sections in a course

1, 2, 3, ...,
for each course in the same way, starting at 1 for each course).

The final table in this example database is the ENROLLS table. It contains a record with each student's grade for each section of every course in which that student is enrolled. It relates the information from STUDENTS to the one in COURSES. Here is the contents of the table:

course#,section#,student#,grade
A201,1251,298,3.00
A201,1251,349,4.00
A201,1251,548,2.00
A114,1333,473,2.00
A114,1333,654,4.00
A114,1335,210,3.00
A114,1335,548,
H480,1601,358,4.00
H480,1601,410,2.00
H480,1601,649,4.00
H480,1602,298,3.00
H480,1602,473,0.00
M119,1200,148,3.00
M119,1200,210,1.00
M119,1200,348,2.00
M119,1200,473,3.00
M119,1200,558,3.00
M119,1200,649,4.00
Here's where you can download this from. The structure of this table is: Once again no single column contains enough information to act as the key for this table. So the combination of three colored columns will be (collectively) used as the key for this table. So that's the database we will be working with, and now let's get going.

SQL Examples

By far the most used of SQL's verbs is SELECT. Since the most common operation performed on a database is to examine its data, it should not be surprising that the SQL statement which does this is the workhorse of the language. In any real database, of course, other SQL statements would have to be used first to create the database's tables and fill them with records. We assume your tables have been created, and populated. So we begin with SELECT (and that's also the only verb we'll cover).

Selecting Specific Columns in a Table

To see the values of certain columns for all of a table's records, you must give both the names of those columns and the name of the table. The general form is

SELECT <column names>
FROM   <table>;
where <column names> is replaced by the names of the desired columns, separated by commas, and <table> is replaced by the name of the table which contains those columns. Throughout these examples words flanked by < and > represent generalities, and are to be replaced by specifics in actual SQL queries. Generalities are, simply said, syntactic categories. I am sure you apreciate that.

Examples of such queries:

1. List the names of all courses, their department, and the number of credits for each.

2. List the names, home towns, and home states of all students.

Selecting All Columns in a Table

It is often useful to see the value of every field for every record in a table. One way of doing this is by listing the names of every column in that table, similar to the examples above. Because this is such a frequent operation, however, SQL provides a shorthand way to list all values in a table.

Instead of listing the column names you can type an asterisk. The general form is:

SELECT *
FROM   <table>;
Here are some examples:
3. List all values in the COURSES table.

4. List all values in the TEACHERS table.

Selecting Only Some of a Table's Records: The WHERE Clause

So far, every SELECT statement has returned at least one value (field) for every record in the table. What if we wish to see values only for records that meet some specific criteria? To do this, we must use the SELECT statement's WHERE clause. The WHERE clause lets us specify a predicate, something that is either true or false about each record in the table. Only those records for which the predicate is true will be listed in the results.

The general form of a SELECT using WHERE is

SELECT <column names>
FROM   <table>
WHERE  <predicate>;
As before, <column names> and <table> are replaced by appropriate column and table names. <predicate> can be replaced by a number of different things, depending on exactly what restrictions you wish to place on the results.

Comparisons in a WHERE Clause

Probably the most common predicates are those which compare values. You may wish, for instance, to see only the records in the COURSES table for three credit courses, or only those in the TEACHERS table for teachers with salaries greater than $30,000 a year. For these types of queries, the general form is

SELECT <column names>
FROM <table>
WHERE <column name> <operator> <value>
As before, <column names> and <table> represent the names of the desired columns and the name of the table from which they should be drawn. The WHERE clause is more complex, however. The first item, <column name>, must name a particular column in the table. This column may or may not be among those listed in <column names>.

After the <column name> but before the <value> comes an <operator>. The possible operators are:

= (equals)
<> (not equal, also written as != sometimes)
< (less than)
<= (less than or equal)
> (greater than)
>= (greater than or equal)
The predicate ends with a <value>. The exact form of this value varies depending on the type of the named column. For columns of numeric type, a numeric value is simply placed after the operator. If the column is CHAR however, the character(s) comprising the value must be enclosed in single quotes.

Although SQL is usually not sensitive to the difference between upper and lower case letters, case does make a difference for comparisons using quoted character values. For example, the character strings 'Hello' and 'HELLO' are not considered equal to one another. Also, some strings are longer than others. According to the SQL standard, comparing two strings of unequal length conceptually adds blanks to the end of the shorter string, then performs the comparison. And like numeric values you can ask whether one character string is less than or greater than another. The comparison is performed based on the collating sequence (ASCII code) used by your system. In general, this collating sequence will result in a normal alphabetic comparison, with 'A' less than 'B', 'B' less than 'C', and so on. (Whether digits are less than letters varies, though. On large IBM systems, digits are greater than letters, while on nearly all others digits are less than letters).

In some cases, the <value> may also be a <column name>. If the <column name> identifies a column in <table> the values in the two columns are compared for each record; only those records in which the two values satisfy the condition (e.g., =) are returned. Alternatively, the <column name> may identify a column in a different table. In this case, records from both tables may be examined and retrieved. Retrieving data from more than one table at a time is called a join and is discussed later.

Here are some examples for you to practice with:

5. List the course name, department, and number of credits for all 3-credit courses.

6. List the names and salaries of teachers earning more than $30,000.

7. List all the male students.

Combining Predicates in a WHERE Clause

It is often useful to combine two or more predicates in a single WHERE clause. You may want, for example, to identify all students who both live in Chicago and are women. Doing this requires using SQL's AND and OR operators. AND and OR, together with a third operator called NOT are known as boolean operators.

Using AND

SQL uses the word AND in much the same way as does English. Two predicates may be combined with AND, and the entire predicate is true only if both of its parts are also true. If desired, either or both of the predicates may be enclosed in parentheses. We will soon see examples where parentheses are required, but for now they are optional.

Examples of such queries:

8. List the names and addresses of all female students from Chicago.

9. Which Math courses have 3 or more credits?

Using OR

OR is used in a similar way to AND. Once again, two predicates may be combined, and the value of both is used in determining which records are returned from a table. With AND, both predicates had to be true for a particular record for it to be selected. With OR, on the other hand, a record is selected and appears in the results of the query if either of the predicates is true, or if both are true.

AND and OR can both be used in WHERE clause. In fact, arbitrarily complex expressions can be created using various combinations of AND and OR. If both AND and OR appear in the same WHERE clause, all of ANDs are evaluated first, followed by all of the ORs. There is one exception to this rule: anything enclosed in parentheses will be executed first. In some cases, then, you may be required to use parentheses to correctly express what you mean.

For example, to see the names of all male students who live in either Connecticut or New York, you could type

SELECT student_name
FROM   students 
WHERE (state = 'CT' OR state = 'NY') AND sex = 'M';
If the parentheses were omitted, the statement would become
SELECT student_name
FROM   students 
WHERE state = 'CT' OR state = 'NY' AND sex = 'M';
It now has a quite different meaning. Since SQL first evaluates all ANDs, it is as if the query were written
SELECT student_name
FROM   students 
WHERE state = 'CT' OR (state = 'NY' AND sex = 'M');
This will select the records of all male students from New York, as well as those of all students, both male and female, from Connecticut. The moral is this: use parentheses whenever you combine AND and OR in a WHERE clause. They never hurt, and they sometimes can prevent you from inadvertently making the wrong request.

Examples of using OR in a WHERE clause for you to practice with:

10. List the name, sex, city, and state for all students who are either from Connecticut or from a city called Erie.

11. List the names, cities, states, and zip codes for all students whose zip codes are between 20000 and 29999 or who live in a city called Erie.

Using NOT

SQL's final boolean operator is NOT. Unlike AND and OR, NOT isn't used to combine conditions in a WHERE clause. Instead, it negates a specified condition. For example, to see the names and home states of all students who are not from Illinois, you could type

SELECT student_name, state
FROM   STUDENTS
WHERE  NOT (state = 'IL'); 
Inserting the word NOT simply reverses the condition, returning all records in which the state field does not contain the value 'IL'. Once again, the parentheses around state = 'IL' are not strictly required, but nevertheless a good idea.

Here's now an example:

12. List the names and states of all students who are not from Illinois.
Well, that was easy!

Other Uses of WHERE

Predicates using a WHERE clause can do more than just make simple comparisons. In this section, we'll look at other ways to select only certain records from a table.

BETWEEN and NOT BETWEEN

Using >=, <=, and AND, it is possible to select records where values of a specific field in the record fall within a specified range. In an earlier example, for instance, we listed the name and other information for all students whose zip codes ranged from 20000 to 29999. This kind of query arises quite often, and so SQL provides a shorthand way of expressing it. By using BETWEEN and NOT BETWEEN in a WHERE clause, we can directly specify ranges of values. The general form is

SELECT <column names>
FROM   <table>
WHERE  <column name> BETWEEN <value1> AND <value2>
The negative form is also available
SELECT <column names>
FROM   <table>
WHERE  <column name> NOT BETWEEN <value1> AND <value2>
In the first case, only those records in which the values in <column name> are greater than or equal to <value1> and less than or equal to <value2> will be selected. In the second, only those records whose values are not within the specified range will be selected.

In both forms, the two values are assumed to be specified in ascending order, i.e., <value1> is less than <value2>.For example, to see the names and salaries of all teachers who earn between $30,000 and $35,000 a year, you could type

SELECT teacher_name, salary
FROM   TEACHERS
WHERE  salary BETWEEN 30000 and 35000; 
Only those records from TEACHERS with salaries in the specified range wil be returned.

The values specified in the WHERE clause can be of any type, but <value1> and <value2> must both be of a type comparable to that defined for <column name>. For example the query

SELECT student_name
FROM   STUDENTS
WHERE  student_name BETWEEN 0 and 100;
is not legal, because student_name is a CHAR column, while the values 0 and 100 are numeric. Now, some examples for you to practice with:
13. List names and salaries of teachers who make between $30,000 and $35,000/year.

14. List the names, cities, states and zip codes of those students whose zip codes are not between 20000 and 29999.

LIKE and NOT LIKE

When searching for information, we often already know part of what we're looking for. We may, for instance, know someone's last name but not their first name, or we may know only that their last name begins with "Jo". For situations like these, SQL provides LIKE and NOT LIKE. The general form of a SELECT using LIKE is

SELECT <column names>
FROM   <table>
WHERE  <column name> LIKE <value>;
The form with NOT LIKE is similar.

SELECT <column names>
FROM   <table>
WHERE  <column name> NOT LIKE <value>;
LIKE and NOT LIKE may only be used with character columns. <column name>, therefore, must identify a CHAR column, and <value> must be a character string enclosed in single quotes.

In the simplest case, LIKE and NOT LIKE function like = and <>.

So, for example, the statement

SELECT teacher#
FROM   TEACHERS
WHERE  teacher_name LIKE 'Engle';
is equivalent to saying
SELECT teacher#
FROM   TEACHERS
WHERE  teacher_name = 'Engle';
The power of these two operators is based on defining two special wild card characters: a percent sign (%) and an underscore (_). When contained within the quoted character string value, a percent sign matches any string of characters, of any length. An underscore, on the other hand, matches any single character. Using these two wild cards, a wide variety of searches can be performed.

The WHERE clause in the example above would also have returned information about Dr. Engle's record if it had read

WHERE teacher_name LIKE '%'
In this case, however, it would have returned information about other records in the TEACHERS table, too. To further limit the records returned, we could have said
WHERE teacher name LIKE '%e'
This would have returned records for all teachers with a name that ends in 'e' (recall that the difference between upper and lower case letters is significant inside quoted strings).

We could have also said

WHERE teacher_name LIKE 'E___e'
This would return information about teachers whose last names begin with 'E', end with 'e', and have three letters (any three letters) in between.

Since LIKE returns records with fields that match a specified string (possibly with embedded wild card characters), NOT LIKE does just the opposite: it returns all records where the specified field does not match the string. For example, the SQL statement

SELECT teacher#
FROM   TEACHERS
WHERE  teacher_name NOT LIKE 'Engle';
would return the teacher number for every teacher except Dr. Engle.

Examples for you to work on:

15. List the name and sex of every student whose name begins with "Jo".

16. List the names and home states of all students from states that begin with a 'C'.

17. List names, phone numbers of teachers whose phone #'s do not begin with 257.

IN and NOT IN

Another possible way to restrict which records are selected is to use IN and NOT IN. Like the other mechanisms discussed so far, they appear in the WHERE clause, and only records that match their requirements are returned. The general form of an SQL statement using IN is

SELECT <column names>
FROM   <table>
WHERE  <column name> IN (<values>);
The form with NOT IN is
SELECT <column names>
FROM   <table>
WHERE  <column name> NOT IN (<values>);
In both cases <values> consists of one or more values, separated by commas and enclosed in parentheses, all of a type comparable to that of the column identified by <column name>.

In other words, if <column name> is a CHAR column, the values must all be quoted character strings. If <column name> identifies a numeric column, such as INTEGER or SMALLINT, then the values must also be numbers.

For IN, the records returned are those where the field named <column name> contains one of the values specified in the parenthesized list. For NOT IN, the reverse is true: the records returned are those whose <column name> field does not contain any of the values in the list.

Using IN is equivalent to grouping several tests for equality together with ORs. This is yet another example of a shorthand form provided by SQL for common requests. For example the statement

SELECT course_name, department
FROM COURSES
WHERE department in ('Math', 'English'); 
will return all records where the department field contains This is equivalent to saying
SELECT course_name, department
FROM COURSES
WHERE department = 'Math' OR
      department = 'English'; 
Similarly, NOT IN is equivalent to grouping tests for inequality together with ANDs.

The statement

SELECT course_name, department
FROM COURSES
WHERE department NOT IN ('Math', 'English'); 
is equivalent to
SELECT course_name, department
FROM COURSES
WHERE department <> 'Math'
  AND department <> 'English'; 

Some examples for you to work with:

18. List names and departments of all Math and English courses.

19. List names, cities, states of students not from CA or IL.

IS NULL and IS NOT NULL

In the last chapter we described the special value NULL.

Unlike CHAR or numeric values, which can only be assigned to CHAR or numeric fields, respectively, the value NULL can be assigned to any field, regardless of its type.

NULL is different from other values as well in that it can't be used in simple comparisons.

A WHERE clause using the = operator to ask whether some field is NULL is undefined; it is neither true nor false. Instead, the special construction IS NULL and IS NOT NULL must be used.

The general form of a SELECT statement with IS NULL is

SELECT <column names>
FROM <table>
WHERE <column name> IS NULL
For one using IS NOT NULL, the form is
SELECT <column names>
FROM <table>
WHERE <column name> IS NOT NULL
Like all WHERE clauses, these two select only records that meet their requirements. A query with IS NULL will return only thise records in which the field identified by <column name> contains the value NULL, while those with IS NOT NULL will return only those in which <column name> contains a value other than NULL.

The IS NULL construction is the only way to search for records containing the value NULL.

BETWEEN, LIKE, and the other WHERE clause options can never find NULL values.

Examples for you to work with:

20. List entire contents of records in ENROLLS with no grade recorded.

21. List names, cities, states, zip codes for students whose zip codes we know.

More on SELECT

So far, we have seen several ways to use the SELECT statement to retrieve all or part of a table's data. In this chapter, we will see how that data can be ordered, grouped, or otherwise manipulated while it is being retrieved.

Ordering Selected Data

Recall that the order of records within a table has no significance to SQL.

When records are retrieved from a table, therefore, the order in which they appear is arbitrary. Sometimes, we wish to see the data resulting from a query ordered in a specific way. To allow this, SQL provides the ORDER BY option for the SELECT statement.

The general for of a SELECT statement using ORDER BY is

SELECT <column names>
FROM <table>
<WHERE clause>
ORDER BY <column names>
As before, <column names> after SELECT and <table> indicate the columns which should be selected from the named table. A <WHERE clause> may optionally appear, specifying conditions which records must meet to be selected. Most importantly for our purposes, the <column names> following ORDER BY indicate the column or columns whose values should be used to order the selected records. For example the statement
SELECT teacher_name, salary
FROM TEACHERS 
ORDER BY teacher_name; 
will produce an alphabetical list of all teachers. Note that the columns specified after ORDER BY must be drawn from those following SELECT in the query.

Results may also be ordered by more than one column. Saying

SELECT student_name, city, state
FROM STUDENTS
ORDER BY state, student_name; 
will yield a list of students ordered first by state, then alphabetically by name within that state.

One more possibility: instead of giving the name of a column by which to order selected records, it is possible to give a column's number. Column numbers refer to columns in the results of the query, not in the original table, and go from left to right, beginning with 1 (one).

This means, for instance, that the first example shown above could have also been expressed as

SELECT teacher_name, salary
FROM TEACHERS
ORDER BY 1;
The results would have been exactly the same, because teacher_name is the first column from the left in the results of this query.

SQL also allows records to be listed in reverse order.

Suppose, for example, that we wished for some reason to see a list of teachers and their salaries in reverse alphabetical order. We could use the DESC option with ORDER BY:

SELECT teacher_name, salary
FROM TEACHERS
ORDER BY teacher_name DESC;
It's hard to see why it would be useful, but it is also legal to replace DESC with ASC. This causes the results to be listed in ascending order, just as if neither ASC or DESC were specified.

Some examples for you to work on, to practice:

22. Give an alphabetical list of teachers and their phone numbers.

23. Give an alphabetical list of teachers and their phone numbers (again?!...).

24. Give a list of teachers, phone numbers in reverse alphabetical order.

25. List names, cities, and states of all students ordered first by state, then by city within that state.

26. List all three credit courses alphabetically by department.

Doing Arithmetic With Selected Information

Retrieving the information stored in a table is undoubtedly useful. Sometimes, though, the stored information is not exactly what we're interested in.

Suppose, for example, that we wish to know how much each teacher's salary would be if we gave everyone a flat $1,000 raise. Or, perhaps we'd like to compare this with what the new salaries would be after an across-the-board 10 percent increase.

For these kinds of queries, SQL provides the four basic arithmetic operations of addition, subtraction, multiplication, and division. (While addition and subtraction are represented in queries by the familiar + and - symbols, multiplication and division use the symbols * and / respectively.)

Within a query, for instance, we can add a fixed amount to every value in a column, or multiply each of a column's values by some amount. It is even possible to add two columns together or subtract the values of one column in a record from those of another.

To see what each teacher's salary would be after a flat $1,000 raise, you could type

SELECT teacher_name, new_salary = salary + 1000
FROM TEACHERS; 
Because the second column of the results will be derived from one of the table's original columns, we give it a new name by adding
new_salary =
in front of the arithmetic expression. In the results table, that column now will be labelled with our new name. If you had not specified this new name, the name assigned to the newly created column would depend on your particular implementation of SQL.

Similarly, to see what the teachers' salaries would be after a 10 percent raise, the query is

SELECT teacher_name, new_salary = salary * 1.10 
FROM TEACHERS; 
As usual, a WHERE clause can be added to the query to restrict which records are selected. To see, for example, the results of giving a 10 percent raise only to those teachers earning less than $30,000, one could type
SELECT teacher_name, new_salary = salary * 1.10
FROM TEACHERS
WHERE salary > 30000;
As might be expected, these arithmetic operations can only be used on columns containing numeric values (in case you were wondering).

Also, the four operations can be combined to form arbitrarily complex expressions. When two or more operations are combined in a single query, SQL defines rules for the precedence of each. In general, all occurrences of * and / (multiplication and division) are performed first, followed by all occurrences of + and - (addition and subtraction). Operations of equal precedence are carried out left to right.

What about NULL values? Recall from the chapter on NULL values that a NULL in a numeric field isn't the same as a 0 (zero), but is instead a value unique to itself. How then are NULLs handled during arithmetic operations?

The rule turns out to be somewhat extreme, but very simple: any arithmetic operation applied to a column containing a NULL value in one of its fields will return the value NULL. Check to see if the rule is respected or not by Access: What does Access do when you're adding values some of which are NULL? (Also what does Access do when you're taking the AVG of several values, some of which are NULL? What does this mean? )

Example, to practice with these notions:

27. What would each teacher's salary be if he or she received a 5.5 percent cost of living increase and a $1,500 merit increase?
Using Aggregates

SQL provides a number of built-in functions for performing common operations. These functions, called "set functions" by the SQL standard but more often referred to as aggregates, make it easy for us to find the maximum or minimum value in a column, find the sum or average of a column's values, and count the number of records in a table. While all of these things would be possible without them, SQL's aggregates make the task much simpler.

Finding Maximum and Minimum Values

It is often interesting to ask about the limits of data values stored in a table.

For answering questions like these, SQL provides the MAX and MIN aggregates.

Both MAX and MIN can be used like the name of a column in a SELECT.

You could type, for example,

SELECT MAX(salary)
FROM TEACHERS; 
or
SELECT MIN(grade)
FROM ENROLLS; 
The first query would select and display the maximum value from the salary column of the TEACHERS table, while the second would display the minimum value from the grade column of the ENROLLS table. The name of the query's result defaults to, for example, MAX(salary). A different name can be assigned, if desired, through the mechanism seen earlier, e.g.,
SELECT maximum = MAX(salary) 
FROM TEACHERS; 
(It's worth noting that both the default name assigned to the query's result and the exact syntax used to change that default name may vary from one implementation of SQL to another).

It is also possible to use a WHERE clause to select the maximum or minimum value from a group of records meeting some specified criteria. We could, for example, say

SELECT MAX(salary) 
FROM TEACHERS
WHERE teacher# > 300;
This query selects the largest salary only from those records in the TEACHERS table that have a teacher# greater than 300 (whatever that may signify).

What if, instead of the query using MAX above, you wanted to know not only what the maximum teacher salary was, but also who earned it? You might have naively typed

SELECT teacher_name, MAX(salary)
FROM TEACHERS; 
Unfortunately, you would have received an error message from SQL, as the above query is illegal. It's illegal because, in simple cases such as this, SQL requires that a query return the same number of values for every column name specified. Thus far, this has not been a problem, since each table has the same number of values in each of its columns.

When using MAX and MIN, however, the situation changes. As mentioned above, MAX and MIN function somewhat like column names, and yet by their nature, both always return only a single value. Because of SQL's rule that queries must always return the same number of values for each specified column name, no other column names (except for those appearing within other aggregates) can be specified along with MAX and MIN. As we shall see, all SQL aggregates share this same limitation. (There is one significant exception: using GROUP BY, described at the end of this chapter, allows other column names to appear in a SELECT along with MAX, MIN, and other aggregates).

Although they probably make the most sense when used with numeric columns, MAX and MIN can also be used with character columns. With character columns it's as if all the values in the column are first placed in alphabetical order, then the last (for MAX) or the first (for MIN) value is returned.

As was the case with arithmetic operations, we must once more ask: what about NULLs? Unlike the arithmetic operators, all of which return NULL if any of a column's values are NULL, MAX and MIN ignore NULL values. This makes some sense, since it's hard to see how NULL could be compared to other values in a particular column.

All aggregates, including MAX and MIN, share some other restrictions on their use. For one thing, aggregates cannot be nested, i.e., the value passed to an aggregate can't include some other aggregate. Also, aggregates cannot appear in a WHERE clause. To find the name of the teacher with the highest salary, for example, you can't just type

SELECT teacher_name
FROM TEACHERS
WHERE salary = MAX(salary); 
This query is illegal. Instead, specifying conditions including aggregates requires using either GROUP BY and the HAVING clause, described at the end of the chapter, or, more likely, a subquery, described at the end of this document.

Examples for you to work with:

28. What is the largest salary paid to a teacher?

29. What is the lowest grade earned by a student?

Totalling Columns

Another useful piece of information to extract from a table is the total of all values in a particular column. The president of this university, for example, might wish to know the total of all teacher salaries. To allow this, SQL provides the SUM aggregate. As its name suggests, SUM returns the sum of a particular column. To answer this question, the president could type

SELECT SUM(salary)
FROM TEACHERS; 
The result would be a single value, the total of all teacher salaries.

It is also possible to determine the sum of a column's values only for records meeting some specific criteria. For example, to see the total salaries for all teachers earning over $30,000, the query is

SELECT SUM)(salary)
FROM TEACHERS
WHERE salary > 30000; 
Like MAX and MIN, SUM cannot be used exactly like an ordinary column name in a query. Because it returns only a single value, it may only be used by itself or with other aggregates that also return a single value, such as MAX and MIN. Also like MAX and MIN, SUM ignores any NULL values it encounters in the specified column.

A couple of examples for you to work with:

30. What is the total of all teacher salaries?

31. What is the total salary for all teachers earning over $30,000?

Finding Averages

One more useful piece of information is the average of a column's values.

To answer questions like this, SQL provides the AVG aggregate. To learn the average teacher's salary, for instance, one could type

SELECT AVG(salary)
FROM TEACHERS; 
The result of this query is the average of all values in the salary column of the TEACHERS table. Like MAX, MIN, and SUM, AVG returns only a single value, and so cannot be mixed with regular column names in a query (again, unless GROUP BY is used, as described later).

And unsurprisingly, AVG also ignores any NULL values it encounters in the specified column.

Again, a WHERE clause can be used to limit which records have their values included in the average. To find the average salary for all teachers earning over $30,000, one could type

SEKECT AVG(salary)
FROM TEACHERS
WHERE salary > 30000; 
A few problems for you to think about these things and practice:
32. What is the average of all teacher salaries?

Counting Records

Yet another facility SQL gives its users is the ability to count the number of records in a table which meet some specified criteria. This is accomplished with the COUNT aggregate. COUNT appears where the column names would normally go in a SELECT statement and is followed either by one or more column names or by just an asterisk. Both the column names and the asterisk must be enclosed in parentheses. To learn how many students are enrolled, one could type either

SELECT COUNT(student_name)
FROM STUDENTS; 
or just
SELECT COUNT(*)
FROM STUDENTS; 
Since student_name appears in every record in the table, both queries probably would produce the same results. If any of the values for student_name were NULL, however, the two queries almost certainly would not produce equivalent results: (One other note: according to the SQL standard, the first query shown above is illegal. The standard insists that whenever a column name is used instead of an asterisk, that column name must be preceded by the word DISTINCT, whose use is described in the next section. Many implementations of SQL do not enforce this requirement, however.)

As with other aggregates, the records to be counted can be restricted using a WHERE clause.

Using our same example of teachers, we could learn how many of them earn over $30,000 by typing

SELECT COUNT(*)
FROM TEACHERS
WHERE salary > 30000; 
Like MAX, MIN, and AVG, COUNT returns a single value, and therefore can't be mixed with regular column names in a query (except with GROUP BY, as described shortly). And unlike AVG and SUM, but like MAX and MIN, COUNT can be used on both character and numeric columns.

Here, now, are two questions for you to practice:

33. How many students are there?

34. How many teachers earn over $30,000?

Using DISTINCT

It's often the case that several records in a table will contain the same value in one or more of their fields. Sometimes, it can be useful to form queries which select each of those duplicate values only once. We may, for instance, wish to know from how many different states the students in this university come. For answering questions like this, SQL provides the DISTINCT option.

DISTINCT can be used in two somewhat different ways. When used with COUNT and the other aggregates, the word DISTINCT can appear before the column name, like

SELECT COUNT(DISTINCT state)
FROM STUDENTS; 
Adding DISTINCT before state tells SQL that each time the same value appears more than once in that column, only one of those values should be counted. This query, therefore, will return the number of different student home states. If desired, DISTINCT can also be used with SUM, AVG, and other parts of the language.

Using DISTINCT on a column where no two records have the same value is legal, but not very useful. One could, for instance type

SELECT COUNT(DISTINCT student#)
FROM   STUDENTS;
but because no two values for student# are the same, results would be just the same as those from
SELECT COUNT(student#)
FROM   STUDENTS;
(As mentioned earlier, the SQL standard requires DISTINCT before a column name used with the COUNT aggregate, but many implementations don't enforce this).

DISTINCT can also be used on queries which do not use aggregates.

In this case, the word DISTINCT appears immediately after the word SELECT but before the list of column names. Because we will often select the values from only a subset of the columns in a table, our results may contain duplicate records even though all complete records in the original table were unique. DISTINCT can be used to list only one occurrence of any duplicates which would otherwise appear. We show this in the next example.

Suppose, for instance, that we wished to list alphabetically the different states from which students come. We could do this with

SELECT   state
FROM     STUDENTS
ORDER BY state; 
Run this query to see the results.

If the DISTINCT option is used, however, like

SELECT DISTINCT  state
FROM             STUDENTS
ORDER BY         state; 
duplicates are eliminated from the results, yielding a list of unique values.

Examples, to practice with:

35. How many different states do students come from?

Grouping Selected Data

With all the flexibility described so far for retrieving data from a table, one major function has not yet been discussed: using GROUP BY to organize the selected data into groups.

When used with aggregates, GROUP BY controls the level at which the aggregates are computed. For example, with GROUP BY we could find the maximum grade in each section of each course, or the average number of students taught by each teacher.

The general form of a SELECT statement using GROUP BY is

SELECT   <column names>
FROM     <table>
<WHERE   clause>
GROUP BY <column names>
ORDER BY <column names>;
As always, the first <column names> and <table> indicate the columns which should be selected from the named table. A <WHERE clause> may optionally appear, specifying restrictions which individual records must meet to be selected. Next appears GROUP BY and its <column names>, indicating which column or columns contain the values to be used in forming the groups. Optionally, the SELECT statement may end with an ORDER BY followed by one or more <column names> by whose values the results should be ordered. The column names following both GROUP BY and ORDER BY must be drawn from those specified immediately after SELECT.

Consider, for example, the information contained in the ENROLLS table. For each class in which a student is currently enrolled, this table contains a record indicating that student's student#, the class's course# and section#, and the grade that student has received. Several students have more than one record in this table, since they are enrolled in more than one class.

Suppose we wish to know the number of students in each course. One way to learn this would be to simply list the entire table, ordered by course number, then actually count the number of records for each course. Using GROUP BY and the COUNT aggregate, however, we can let SQL do the counting for us. For example, the query

SELECT   course#, num_enrolled = COUNT(*)
FROM     ENROLLS 
GROUP BY course#; 
would return two columns of information: the first a list of course numbers, the second, entitled num_enrolled, a count of the number of students enrolled in that course. This query is (conceptually) processed as follows:
  1. first the chosen fields from each record in ENROLLS are placed in ascending order according to course#. (Try to imagine how this looks, to be able to comprehend the whole process).

  2. Next the COUNT(*) operation is applied to each group of records which contain the same value for course#. All the records with a course# value of A114 make up the first group, while all those with a course# of A201 would make up the second group, and so on.

  3. Finally, the results of the query are displayed, with each course number listed only once, followed by a count of the number of students in each course.

Those results are

course# num_enrolled
--------------------
 A114       4
 A201       3
 H480       5
 M119       6
Recall that earlier we said that aggregates like COUNT always returned only a single value, and so could only be used in limited ways. When using GROUP BY, such operations return a single value for each group, and so can be pressed into service in a more versatile way. GROUP BY places restrictions on exactly how we can use column names in a query, however. In particular, the column names following SELECT must either be part of aggregates, such as COUNT or AVG, or they must appear as one of the column names in the GROUP BY clause.

For instance, suppose we wished to learn both the number of courses in which a student was enrolled and his or her average grade. A query which accomplishes this is

SELECT   student#, AVG(grade), courses = COUNT(*)
FROM     ENROLLS
GROUP BY student#; 
As before, the ENROLLS table is first ordered (again, conceptually) by student#. Next, the two operations specified, AVG and COUNT, are applied to each group of records with the same value for student#. Finally, the results are displayed in three columns, labelled

Examples, to think about, and work on:

36. List the number of students enrolled in each course.

36.1 List the maximum number of students that can enroll in each course.

37. List the number of courses taken by each student.

38. List the average grade and number of courses taken by each student.

39. Assuming a fee of $450.00 per course, determine each student's total tuition bill.

40. List the average grade and number of students for each section of each course.

More on Grouping: the HAVING Clause

Using GROUP BY is in some ways like doing several queries simultaneously on a single table. As with most queries, though, we would like some way to restrict the values of records which are included in the query. Up to now, this has always been done with a WHERE clause. Using WHERE clauses with GROUP BY can be tricky, however. Remember that WHERE clauses specify criteria which individual records must meet to be selected by the query. For example you could say:

SELECT student#, tuition_costs = COUNT(*) * 450.00 
FROM ENROLLS 
WHERE grade = 2.0 
GROUP BY student#;
to learn what each student's tuition bill was only for those classes in which they received a C (2.0) or better. If we want to place restrictions on the groups themselves, however, we must use the HAVING clause. A HAVING clause works like a WHERE clause, but it is applied to groups rather than to individual records. Now, only those groups which meet the criteria specified in the HAVING clause will be included in the query's results.

The general form of a SELECT statement using GROUP BY and HAVING is

SELECT   <column names>
FROM     <table>
<WHERE   clause>
GROUP BY <column names>
HAVING   <predicate>
ORDER BY <column names>
As before, the first occurrence of <column names> and <table> indicate the columns which should be selected from the named table. The <WHERE clause> may optionally appear (placing restrictions only on values of individual records selected for the groups), followed by GROUP BY and one or more <column names>, indicating the column or columns to be used in forming the groups. Next comes the word HAVING and a <predicate>, specifying the conditions which the groups themselves must meet. And once again, the SELECT may optionally end with an ORDER BY followed by one or more <column names>.

For example, to see the average grade and number of courses taken only by those students with an average grade greater than 2.5, one could type

SELECT   student#, AVG(grade), courses = COUNT(*)
FROM     ENROLLS
GROUP BY student#
HAVING   AVG(grade) > 2.50;
A WHERE clause could not be used in this case, since the restriction on average grades of students applies only to each group, not to each record. Similarly, to see the average grade and number of students for only those sections with more than four students, the query would be
SELECT   section#, AVG(grade), students = COUNT(*), 
FROM     ENROLLS
GROUP BY section#
HAVING   COUNT(*) > 4; 
Again, a WHERE clause could not be used, since the restriction of having more than four students applies to each group, not to any of the individual records within a group. In both this and the previous example, the groups appearing in the results of each query are restricted to those which meet the requirements specified in the HAVING clause.

A HAVING clause may also be used without GROUP BY. In this case, it places a restriction on the entire result of the query. With no GROUP BY clause specified, those results are treated as one big group. Using HAVING without GROUP BY might seem useless, and in most cases it is. When used with certain kinds of subqueries, however, as described in the chapter on subqueries below, HAVING clauses can be quite useful on their own.

Some examples to think about:

41. List average grade and number of courses taken by students with an average grade of more than 2.5

42. List the average grade and number of students for each section with more than four students.

Retrieving Data From Several Tables: Joins

Every query we have seen so far retrieves data from only a single table. If all of a database's information were contained in one large table, these type of queries would be all that was required. Practically, though, storing all the information in just one table would require maintaining several duplicate copies of the same values. In real systems, therefore, all but the very simplest databases divide their information among several different tables.

Both in real databases and in our simple example database, then, many interesting questions cannot be answered by retrieving data from only a single table. Instead, we must form queries which simultaneously access two or more tables. Any query that extracts data from more than one table must perform a join. As its name suggests, a join means that some or all of the specified tables' contents are joined together in the results of the query.

The simplest join is the cartesian product.

Qualified Names

In every SELECT statement, we must indicate the names of the columns we are interested in. So far, all of those columns have been from the same table. With joins, however, we will be selecting columns from two or more tables simultaneously. Although all of the columns in a single table must have unique names, columns in two different tables may have the same name.To identify a column in the database uniquely, then, we must use its qualified name.

A qualified name consists of the name of a table, followed by a period and the name of a column in that table. For example, the qualified name for the teacher# column in the SECTIONS table is

SECTION.teacher#
while the qualified name for that same column in the TEACHERS table is
TEACHERS.teacher#

Because every column in a table must have a different name, and because SQL requires every table in a database to have a different name, a qualified name is guaranteed to identify exactly one column in one table, so we're safe.

To see how qualified names are used in a query, suppose that we wish to see each value for teacher_name from TEACHERS along with each possible value for teacher# and course# from SECTIONS. Requesting teacher_name from TEACHERS along with course# from SECTIONS is easy: both column names are unique in the database. But selecting teacher# from SECTIONS requires using the qualified name for teacher#, since a column with that name appears in both tables. A formulation of this query might be:

SELECT teacher_name, SECTIONS.teacher#, course#
FROM   TEACHERS, SECTIONS
(What are the results of this query? Probably not what you'd expect -- see the next section).

As we have already seen, qualified names are not required when there is no possibility of ambiguity. In the above query, for instance, just the column name was required for course#, since TEACHERS and SECTIONS have only one column with this name between them. Despite the fact that a column called course# also appears in the COURSES table, the list of tables in the above query's FROM clause limits its scope to TEACHERS and SECTIONS.

We have been taking advantage of this automatic limitation all along. For instance, in the query

SELECT student#
FROM   STUDENTS;
student# must refer to the column of that name in the STUDENTS table. Similarly, in the query
SELECT teacher#
FROM   TEACHERS;
there is still no ambiguity, because the FROM clause specifies from which table the values for teacher# are to be drawn. Even simple queries can use qualified names if desired, like
SELECT STUDENTS.student_name
FROM   STUDENTS;
although it's hard to see why this would be useful. In the first example given above, however, ambiguity was possible, and so a qualified name was required.

Because it is always legal to use qualified names, even when they're not required, another way to express that first query would've been

SELECT TEACHERS.teacher_name, 
       SECTIONS.teacher#, 
       SECTIONS.course#
FROM   TEACHERS, SECTIONS;
What is a Join?

What are the results of the first query above?

If you type

SELECT teacher_name, SECTIONS.teacher#, course#
FROM   TEACHERS, SECTIONS; 
what will come back in return? Obviously, the result will have three columns: teacher_name, SECTIONS.teacher#, and course#. But what values will be contained in those three columns?

To carry out this query, SQL will list values from each record of both tables. In fact, it will list all possible combinations of the selected columns from all records in the two tables.

In other words, the results will begin with a line containing the values of teacher# and course# from the first record in SECTIONS matched with the value of the teacher_name from the first record in TEACHERS. Next will come a line matching the next teacher_name with the selected values from the first record in the SECTIONS table, and so on, until each of the records in the TEACHERS table has been matched with the first SECTIONS record. Next, the results will contain a series of lines that match the selected values from the second record in SECTIONS with the teacher_name value from every record in the TEACHERS table, and so on.

By match we simply mean put together. In this query there's no testing of the values that participate in the resulting records. Nowhere does anything restrict the output to only those records that refer to the same teacher.

So the result of this seemingly simple query is a long list of combinations, each consisting of a value from a record in TEACHERS paired with two values from a record in SECTIONS.

We called this: cartesian product. Check the section below on restricting the results of a join.

Examples, a few, for you to work with:

43. List all teacher names along with all the values for teacher number and course number contained in the SECTIONS table.

43.1 What if you try this:

SELECT teacher_name, SECTIONS.teacher#, course#
FROM   TEACHERS, SECTIONS;
Question: How many records in the output?

Another question: How's 43 supposed to be different so its result be meaningful?

Restricting the Results of a Join

What happens if, in a query accessing two tables with columns of the same name, we specify the same column name twice, once for each table?

Doing this requires that we give the qualified name for each, like

SELECT teacher_name, TEACHERS.teacher#, SECTIONS.teacher#, course#
FROM   TEACHERS, SECTIONS;
The results are like those specified above - a list of all possible combinations of the selected columns from the two tables - except that the column called teacher# appears twice. Try the query yourself to see the results.

As always, each column is labelled with the appropriate name, so for the two teacher# columns, both column names are qualified names. These two columns contains all possible combinations of values for teacher# from the two tables.

Usually, such general results as these are not very useful. Too much information is given, and it is difficult to pick out what's interesting. To reduce the size of the results and thereby zero in on the answer to some particular question, we can add a WHERE clause to the above query.

We might wish to know, for example, which teachers are teaching which courses. Further, we may wish to identify those teachers by name, not just by their teacher numbers. This information is available in the results from the query above, but it is not in a very concise or usable form. By adding a WHERE clause, we can retrieve only those results of the join in which we are interested.

To see which teachers teach which courses, look through the results just given. Note that whenever the value of teacher# from the TEACHERS table is equal to the value of teacher# from the SECTIONS table, the course whose course# appears in that record is one which is taught by that teacher. Because of the way this database is defined, the two teacher# columns must have this relationship.

Columns like these, sometimes called join columns, exist in each of the tables in our example database. Similar columns will also exist in most or all of the tables in real relational databases. The information stored in these join columns gives us a very powerful tool for accessing the stored data.

Back to the original question: which teachers teach which classes?

To select just those records from the mass of information given above, one could type

SELECT teacher_name, SECTIONS.teacher#, course# 
FROM   TEACHERS, SECTIONS
WHERE  TEACHERS.teacher# = SECTIONS.teacher#; 
The results are only those records which meet the condition specified in the WHERE clause: those where the two teacher# values are equal. We can now see easily which courses are taught by each teacher, with teachers identified by name (run the query to see for yourself).

Even though they are used in a WHERE clause, the values for teacher# need not be selected by the query. If we were interested in seeing only teachers' names and the numbers of the courses they teach, we would have typed

SELECT teacher_name, course#
FROM   TEACHERS, SECTIONS
WHERE  TEACHERS.teacher# = SECTIONS.teacher#;
This time, the results include only the teacher_name and the course# values for those records which meet the WHERE clause's condition, i.e., a list of teachers and the courses taught by each.

Because the SECTIONS table contains records only for currently offered sections, we could see only the names of those teachers who are currently teaching some section with

SELECT teacher_name
FROM   TEACHERS, SECTIONS
WHERE  TEACHERS.teacher# = SECTIONS.teacher#; 
Since all teachers in our sample database are assigned to a section, the teacher_names listed by this query are identical to those of the previous one.

The important thing about this example, though, is its FROM clause: despite the fact that no values from the SECTIONS table are selected, that table must still appear in the FROM clause, since a value from SECTIONS is referenced in the WHERE clause. The names of all tables which have any of their columns referenced anywhere in the query must appear in the query's FROM clause.

Examples, more, to think about this and practice:

44. Which teachers teach which courses? (List teachers by name.)

45. What is the maximum enrollment in each section of each course?

46. List the grades received by each student.

Manipulating the Results of a Join

Just as with queries which access a single table, queries accessing two or more tables can use the full power of SQL. For instance, all options of the WHERE clause are available, including LIKE and NOT LIKE, IN and NOT IN, NULL and NOT NULL, and AND, OR, and NOT. Also, arithmetic operations can be performed on the results before they are displayed, just as with simpler queries. And by using ORDER BY, the results of a join can be listed in either ascending or descending order, based on the values of any selected columns.

The aggregates described in the previous chapter can also be applied. To learn the number of rows in a table, for example, the COUNT aggregate can be used, while average values can be computed with AVG. In general, SQL's features are additive: its basic tools can also be used in more complex contexts, such as within joins.

Some examples, or problems, to think about, that illustrate these points:

47. Which students from California or Illinois got A's or B's?

48. Assuming that tuition is $450.00 per course, what is Larry Bird's total tuition bill?

49. List the teacher for and number of students in each section. Order this list of courses by number of students, from largest to smallest.

Aliases

Every table in a database must have a unique name. Sometimes it is useful to assign an alias to a table, a name which can be used in place of the table's real name.

Aliases, called "correlation names" in the SQL standard, exist only for the life of a particular query, e.g., for the duration of one SELECT statement.

A common reason to use aliases is to reduce the amount of typing required to enter a query. To create an alias name for a table, simply type the alias name after the table name in the FROM clause. Leave at least one space between the table's name and the alias. For example the query

SELECT student_name, grade
FROM   STUDENTS S, ENROLLS E
WHERE  course# = 'H480'         AND
       section# = 1601          AND 
       S.student# = E.student# 
defines two aliases: S for STUDENTS and E for ENROLLS. This query could also be entered as
SELECT student_name, grade
FROM   STUDENTS, ENROLLS
WHERE  course# = 'H480'        AND
       section# = 1601         AND 
       STUDENTS.student# = ENROLLS.student#
and the results would be identical. The only difference is in the number of characters typed.

If qualified names are used in specifying the selected columns, it is even possible to use an alias before it is defined. We could for example say

SELECT C.department, 
       C.course#, 
       S.section#, 
       S.num_students 
FROM   COURSES C, SECTIONS S
WHERE  S.course# = C.course#; 
Here, the aliases are defined as always in the FROM clause, but are used in both the qualified names and the WHERE clause.

Because their most common use is to reduce the amount of typing, alias names are often kept to a single character. It is legal, however, to use longer alias names. It's even possible to create an alias name which is longer than the original table name.

Some examples of using aliases in a join for you to produce:

50. List the names and grades of all students enrolled in section 1333 of A114.

51. For each department, list its courses and sections together with the number of students in each.

Joins With Three Tables

Earlier in this chapter, we saw which teachers taught which courses. In that example, the teachers were listed by both name and number, while the courses were listed by number only. We may also wish to list the courses by name. To do this, we must extract information from three different tables:

A join with three tables is a fairly straightforward extension of the two-table join. We must list all necessary tables in the FROM clause, then include appropriate restrictions in the WHERE clause. To answer the question posed above, we could type

SELECT teacher_name, 
       SECTIONS.teacher#, 
       SECTIONS.course#, 
       course_name
FROM   TEACHERS, SECTIONS, COURSES
WHERE  TEACHERS.teacher# = SECTIONS.teacher# AND
       SECTIONS.course# = COURSES.course#;
The results contain the four named columns, each with values only for those records that meet the restrictions in the WHERE clause. In other words, this query produces a list of teachers and teacher numbers, together with both the names and numbers of the courses they teach.

(If you're still having trouble believing that the simple restrictions given in the WHERE clause produce the desired results, try the query without the WHERE clause. If you manually search through the results, you'll find that the resulting rows for which the conditions in the WHERE clause are true are exactly those desired.)

Examples, to work on and practice with:

52. List the names of all teachers, along with the names and numbers of the courses taught by each.

53. In which courses did any female student get an A? (List both the courses and the students by name).

54. List the names of all teachers and their departments, ordered alphabetically by department.

55. List each student's grade for each course. Order the list alphabetically by course name, then, for each course, by student name.

56. Which courses are the students from Pennsylvania, California, and Connecticut taking?

57. List alphabetically the names and grades of all male students enrolled in M119.

Join With More Than Three Tables

A single query can access information stored in any number of tables within a database. Four, five, or more tables can be joined together, usually in combination with an appropriate WHERE clause, to retrieve the required information. In SQL, these larger joins are constructed just like those using only two or three tables. The only differences are the number of tables in the FROM clause and the complexity of the WHERE clause.

More examples, of joins with four tables, to work out:

58. How many students are in Dr. Engle's English class?

59. Produce a list of the names and addresses of Dr. Horn's students. Order the list alphabetically by student name.

60. Produce a roster for each course by teacher, listing the courses and sections they teach along with the names of all students enrolled in each.

Using GROUP BY in a Join

The GROUP BY clause is a powerful, but sometimes complex, part of SQL. When used in a join, both its power and its complexity are magnified.

Recall how GROUP BY works with single-table queries:

  1. The table is (conceptually) ordered according to the values of the columns specified in the GROUP BY clause. All records that have the same value in that field comprise a single group.

  2. Next, the values of any specified aggregates are calculated for all of the records in each group.

  3. Finally, the results are printed, with one line for each group.

  4. And any column names specified after SELECT must either appear within an aggregate or in the GROUP BY clause itself.

These same basic rules apply to joins using GROUP BY. Suppose, for instance, that we wish to know the total number of students possibly taught by each teacher. Answering this question requires information from both the TEACHERS table and the SECTIONS table, so a join is required. Since we want to look at students taught by each teacher, we must join the two tables using the teacher# column in each. Also, because we wish to learn the total maximum number of students taught by each teacher, we can group the data by teacher_name and use SUM to add the groups.

The resulting SQL query is

SELECT   teacher_name, SUM(num_students)
FROM     TEACHERS, SECTIONS
WHERE    TEACHERS.teacher# = SECTIONS.teacher#
GROUP BY teacher_name; 
This query will (again, conceptually) first perform the join, yielding only the selected columns from those records in the two tables where
TEACHERS.teacher# = SECTIONS.teacher#
Next, these chosen records are ordered according to the values in their teacher_name field, and all records with the same value for teacher_name are assigned to a distinct group. Finally, the values in the num_students fields of the records in each group are added together, and the results are printed. (In the example database, each teacher teaches only one class. For this query, then, each group contains only a single record. If any teachers taught more than one class, their group would also contain more than one record.)

In general GROUP BY in joins behaves just like GROUP BY in single-table queries. As in those simpler queries, a HAVING clause may also be specified to restrict the groups which are included in the query's results.

Examples, to think about using GROUP BY in a join:

61. What is the total number of students taught by each teacher?

61.1 What is the maximum number of students a teacher could teach?

62. List in descending order the total number of credits awarded for each course.

63. How many students of each sex are enrolled in A201.

Queries Within Queries: Subqueries

A relational database consists entirely of tables. Every piece of information is stored in some column within one of the databases' tables. SQL, since it is a query language for relational database systems, operates on these tables.

But you may already have noticed an interesting fact: the results of an SQL query are also a table. True, these tables are somewhat ephemeral, often existing only on a terminal screen. But even though a results table does not become part of the stored database, it, too, can be operated on via SQL queries. Performing this sort of access requires the use of subqueries.

What is a Subquery?

The result of every SQL query is a table. Sometimes, that table is very simple, containing only a single row and column (or even being entirely empty). In other cases, a results table may be much larger than any of these tables actually stored in the database. In either case, the table which results from executing one query (the subquery) can be the input for another query. It is even possible for the results of that query to be the input for yet another, and so on. In practice, though, queries are seldom nested more than two deep.

Many of the queries that we have already seen can be recast using subqueries. For some questions, including some apparently straightforward ones, using a subquery is the easiest route to the answer. For others, using a subquery is virtually a requirement.

Suppose we wish to know the names of all students who received a B (expressed numerically, a 3) in any of their classes. We could learn this with:

SELECT DISTINCT student_name
FROM   STUDENTS, ENROLLS
WHERE  grade = 3 AND STUDENTS.student# = ENROLLS.student#; 
(Since we want each selected student's name listed only once, and since some students may have received B's in two or more classes, this query uses the DISTINCT option).

We could also answer this question using a subquery, like:

SELECT DISTINCT student_name
FROM   STUDENTS
WHERE  student# IN 
                     (SELECT student# 
                      FROM   ENROLLS
                      WHERE  grade = 3); 
The query within parentheses is the subquery. When the entire query is executed, this subquery must be completed first. The result of the subquery is the following table:
student#
________

     148
     210
     298
     298 
     473
     558
This table is then used as input to the WHERE clause in the main query.

The condition expressed in that WHERE clause (student# IN ...) will be true only for those records in the STUDENTS table whose student# is in the list above, the table returned by the subquery. Since the above table contains only the student#'s of students who received a B, only the student_name's from those records in STUDENTS will appear in the results of the entire query.

Note also that DISTINCT need not be used in this case, because of the way IN works: the WHERE clause is true for any student# that appears at least once in the result of the subquery. Even if the same student# appears more than once, as does the number 298 in the example above, that student's name will not appear more than once in the result of the entire query.

Subqueries can appear in HAVING clauses (and thus in SELECTs using GROUP BY) as well as in WHERE clauses. The rules are the same: the subquery is executed first, and its results are used as input to the condition specified in the HAVING clause. Like a WHERE clause, the HAVING clause then evaluates to either true or false.

According to the SQL standard, all subqueries must be surrounded by parentheses. As with so many parts of the standard, some implementations don't enforce this restriction. Using subqueries can become rather complicated, however, and so using clarifying parentheses is always a good idea.

Example, to think about and practice with:

1. Which students received B's in any course?
Using Comparison Operators With Subqueries

The various comparison operators that have been described before (=, <, <=, >, >=) can be used with subqueries. In addition, the IN and NOT IN keywords may also be used.

We have already seen an example using IN. NOT IN is used with subqueries in an analogous way to IN. With NOT IN, however, the WHERE clause is true if and only if the desired values are not found in the table resulting from execution of the subquery.

Using the comparison operators is more problematic. Consider how = is used in

SELECT student_name
FROM   STUDENTS
WHERE  student# = (SELECT student#
                   FROM   ENROLLS
                   WHERE  grade = 1); 
As it happens, this query will work as expected, and return the student_name of the one student in our example database who received a D (or numerically, a 1). But suppose that more than one student had received a D? Or suppose we had entered this same query, but asked about B's instead of D's?

In this case, SQL is faced with the impossible task of comparing a single value to a table full of values. The exact results of such a query vary from system to system, but in general, the = operator should not be used with subqueries unless you know in advance that the result of the subquery is a table with only one record. When you don't know this (which is most of the time), use IN. The effect is the same, but the subquery is allowed to return a table with multiple values.

Using <> with subqueries causes similar problems. Instead of this potentially troublesome comparison operator, then, it is usually simpler to achieve the same effect by using NOT IN.

The other comparison operators, including <, >, <=, and >=, also can't be used alone unless the subquery will return a single value. Unlike = and <>, however, there is no simple substitute. (They can, however, be used in conjuction with ANY and ALL, as described in the next section).

Nevertheless, subqueries that are guaranteed to return only a single value are not that unusual, so the comparison operators are still useful on their own in some cases. Since a subquery can use aggregates like any other query, we could see, for instance, which teachers earn a lower than average salary with the following SQL code:

SELECT teacher_name 
FROM   TEACHERS
WHERE  salary < (SELECT AVG(salary)
                 FROM   TEACHERS); 
This query will return the teacher_names of only those teachers whose salaries are below the average. Note that since aggregates like AVG can't appear in a WHERE clause, the simplest way to answer this question is with a subquery.

Similarly, to retrieve information about sections with enrollments greater than average, we could type

SELECT course#, section#, num_students
FROM   SECTIONS
WHERE  num_students > (SELECT AVG(num_students) 
                       FROM SECTIONS); 
In both cases, the table resulting from execution of the parenthesized subquery contains only a single value, and therefore is fair game to be used immediately following a comparison operator.

It is also possible to use AND, OR, and NOT, in a WHERE clause involving subqueries. In fact, a single query can tie together two or more subqueries using these constructs. More commonly, though, a WHERE clause will combine a subquery with a some more common condition, such as a test for equality, in a single query.

Example, to think about:

2. Which teachers earn less than average?

3. Which sections have more than the average number of students?

Using ANY and ALL

As was just described, a subquery which immediately follows one of the comparison operators is expected to return only a single value. It is possible, however, to use comparison operators with subqueries returning multiple values by following the operator with either of the words ANY or ALL.

When ANY is used, the WHERE or HAVING clause containing the subquery will be true if the specified operator is true for any of the values returned by the subquery. In other words, if the expression containing the operator is true for at least one of the values returned by the subquery, the entire WHERE or HAVING clause will be true. (The SQL standard also allows the word SOME to be used as a synonym for ANY, with no change in meaning. Once again, however, not all implementations of SQL support this substitution). For example we could say

SELECT student_name
FROM   STUDENTS
WHERE  student# = ANY (SELECT student# 
                       FROM   ENROLLS
                       WHERE  grade = 3);
The result of this query will be the names of all students who received at least one B. Because the subquery will return more than one value, we could not use the = operator by itself. Following it with ANY, however, allows us to compare the student# from STUDENTS with each of the results of the subquery.

When used to test for equality, using ANY is exactly equivalent to using IN. Therefore the query

SELECT student#
FROM   STUDENTS
WHERE  student# IN (SELECT student#
                    FROM   ENROLLS
                    WHERE  grade = 3); 
will return results identical to those of the previous example. With the other comparison operators, however, ANY can be a useful addition, since it allows comparisons against subqueries which return more than one value.

For example, to see the names of students whose average grade is higher than that of the student with the lowest average grade (and to provide an example of a HAVING clause containing a subquery), we could type:

SELECT   student_name, AVG(grade)
FROM     STUDENTS, ENROLLS
WHERE    STUDENTS.student# = ENROLLS.student#
GROUP BY student_name
HAVING   AVG(grade) > ANY (SELECT AVG(grade)
                           FROM ENROLLS
                           GROUP BY student#); 
The result of the subquery is a table containing the average grade of every student (note that this is not the same as the student's grade point average, or GPA, since the number of credits given for each course is not included in this calculation). The outer query also calculates every student's average grade, but relies on the subquery within its HAVING clause to determine which of those students' names and average grades appear in its results. Only those students whose average grade is greater than at least one other student's average grade will be selected. In other words, this query will list the names of all students except those with the lowest average grade.

ANY works in an analogous fashion. Preceding a subquery with some comparison operator and the word ALL means that the WHERE or HAVING clause containing the subquery will be true only if the comparison is true for all values returned by the subquery. For example, to see the names of all students who received a grade in some course that equaled or exceeded the highest all-around average grade, one could type

SELECT DISTINCT student_name, grade
FROM   STUDENTS, ENROLLS
WHERE  STUDENTS.student# = ENROLLS.student# 
  AND  grade >= ALL (SELECT AVG(grade)
                     FROM ENROLLS 
                     GROUP BY student#);
The results of this query would be the names and grades of students meeting the requirement specified above. Because a single student may have received a qualifying grade in more than one class, DISTINCT is used to cause that student's name to appear only once in the query's results.

Examples, for you to look at, work with, and think about:

3.1 Which students received B's in any course?

4. List all students whose average grade is greater than the lowest average grade.

5. List the names and grades of students who received some grade greater than or equal the highest all-around average grade.

Existence Tests

All subqueries return a table, which is then used as input to the WHERE clause in the enclosing query. So far, we have been concerned with comparing the actual values in this returned table to other values, using operators like =, >, and <. As always, the records for which the WHERE clause was true were selected. SQL also allows us to test not just the values contained in records returned by a subquery, but the existence or non-existence of any such records as well. For this, the language provides EXISTS and NOT EXISTS.

Recall that each part of a WHERE clause is either true or false for each record in the table. A subquery preceded by the word EXISTS will be true if the subquery returns any records at all, regardless of the value of those records (even a record containing all NULLs qualifies). Similarly, a subquery preceded by NOT EXISTS will be true if the subquery returns no records, i.e., an empty table with no rows and no columns.

Existence tests, although conceptually very simple, can be quite useful. Their most common use, however, is in correlated subqueries, described next.

Correlated Subqueries

In the examples shown so far, the subquery is first evaluated, and then the table resulting from that evaluation is used as input to the main query. SQL also allows subqueries to be evaluated multiple times, once for each record accessed by the main query.

Such subqueries are called correlated subqueries.

Suppose, for example, that we wished to find the names and student numbers of any students who are not currently enrolled in any classes. One way to express this request in SQL is

SELECT student#, student_name
FROM   STUDENTS S
WHERE  NOT EXISTS  (SELECT * 
                    FROM ENROLLS E
                    WHERE E.student# = S.student#); 
Unlike the earlier examples, the parenthesized subquery here is not executed just once. Instead, this subquery is executed once for each record in the STUDENTS table. For each execution, the value of the student# field from a record in STUDENTS is compared with the value of student# from every record in ENROLLS (via the condition E.student# = S.student# in the subquery's WHERE clause). Whenever the two student#'s match, the subquery adds that record from ENROLLS to its result table.

Recall from last section how NOT EXISTS works: if the table returned by the subquery is not empty, i.e., if it contains at least one record, the NOT EXISTS condition is false. If, however, the table returned by the subquery is empty, i.e., if it contains no records, the NOT EXISTS condition is true. In the example above, the subquery compares the student# field of each record in STUDENTS with every student# field in the ENROLLS table. If at least one match is found, then this student is enrolled in some course, the table resulting from the subquery will not be empty, and the NOT EXISTS condition will be false. If no matches are found, however, then the subquery will return an empty table, the NOT EXISTS condition will be true, and the student_name and student# fields of the current record in the main query will be selected.

Note that the alias S is defined for the STUDENTS table in the main query, and that the alias is used within the subquery. Note also that the value of S.student#, which changes each time the subquery is executed, is taken from (or correlated with) the current record in the main query.

It is from this relationship that the term "correlated query" is derived. To recognize correlated queries, look for the use within the subquery of an alias (or, occasionally, a complete table name) defined only in the main query. The above example could also have been written without the alias name for ENROLLS and with only one qualified name, like

SELECT student#, student_name, 
FROM   STUDENTS S
WHERE  NOT EXISTS (SELECT * 
                   FROM ENROLLS 
                   WHERE student# = S.student#); 
Again, S.student# takes its value from the current record in STUDENTS. The reference to student# in the subquery's WHERE clause still refers to that column in the ENROLLS table; in the absence of a qualified name, the column is assumed to come from the "nearest" table. In this case, that table is ENROLLS. The results of this query are identical to those of the one shown previously.

Correlated subqueries can also be used with comparison operators, ANY and ALL, and EXISTS. For instance, to see which students received a grade higher than the average in their section, and to order the results by course number, one could type

SELECT student_name, grade, course#, section#
FROM   STUDENTS, ENROLLS E
WHERE  STUDENTS.student# = E.student# AND 
       grade > (SELECT AVG(grade)
                FROM ENROLLS 
                WHERE course# = E.course# AND
                      section# = E.section#)
ORDER BY course#; 
Unlike the previous example, the main query accesses two different tables. Conceptually, this query is processed as follows: first, the cartesian product of STUDENTS and ENROLLS is formed, a table containing all possible combination of records from the two tables.

Then, only those records in which

STUDENTS.student# = ENROLLS.student#
are selected. For each of these records, the subquery is executed, returning the average grade for the course and section associated with the current record from the ENROLLS table. If the grade in that record is greater than this average, the main query's WHERE clause is true, and the student_name, grade, course#, and section# of that record is selected. Finally, all selected records are ordered by the value in their course# field.

Using HAVING With Correlated Subqueries

The HAVING clause was discussed in the context of GROUP BY earlier. HAVING is also useful with correlated subqueries. While a WHERE clause specifies restrictions that individual records must meet to be selected, a HAVING clause places a restriction on the results of an entire query (or, when used with GROUP BY, of a group). When used without GROUP BY, a HAVING clause can prevent a query from returning anything at all.

To see why this is useful, suppose we wanted to know which students are taking more than two courses. One way to express this in SQL is

SELECT student_name
FROM   STUDENTS S
WHERE  EXISTS (SELECT COUNT(*) 
               FROM ENROLLS
               WHERE student# = S.student#
               HAVING COUNT(*) > 2); 
As before, the subquery will be executed once for each record in the STUDENTS table (we know this because of the reference to the alias S within the subquery). And as always, the student_name field from the current STUDENTS record will be selected whenever the main query's WHERE clause is true. The question to be answered, then, is when will this WHERE clause be true?

Once again, the EXISTS operation (and thus the main query's WHERE clause) returns true if and only if the subquery returns a non-empty table, i.e., a table with at least one row and column. Without the HAVING clause, the subquery would return the number of courses in which each student was enrolled. The effect of the HAVING clause, however, is to eliminate entirely the results of any subquery executions which return two or fewer records. Once again, since a HAVING clause (without GROUP BY) places a restriction on the query as a whole, any query which would otherwise return less than three records will now return an empty table. The EXISTS in the main query, and thus the main query's WHERE clause, will therefore be true only for those students who are enrolled in more than two courses.

Examples, to think about:

6. Which students are not enrolled in any courses?

7. Which students received a grade higher than the average in their section?

8. Which courses have only male students?

9. Which students are taking more than 2 courses?

10. List the departments that have courses with more than 4 students.

We can do other things with SQL as well, such as:

  1. Create and Destroy Tables (Data Definition Language)
  2. Add, Modify, and Delete Records
  3. Create Views
  4. Grant and Revoke Privileges
  5. Define Indexes
All these topics are important, but they fall outside the intended scope of these notes.


Last updated: October 14, 2001 by Adrian German for A114/I111