CSCI A114 / INFO I111
In this review we are going to focus on SQL. SQL (Structured Query Language) is a widely used tool for accessing data stored in relational database systems. We go through a large number of examples to expose the specifics of this (often ornery, by many accounts) language, step by step.
The SQL revolves around the relational model for databases (proposed by E.F. Codd and his colleagues at IBM in the 1970's). In this model the choice of a table structure with simple rows and columns provides superb clarity, and simplicity for what was once called "the arcana of storage management, pointers and hierarchies".
One of the most common uses of computers is to store and retrieve information. A collection of information is called a database, and so software called a database management system (DBMS) can be used to perform this task. A typical DBMS allows its users to store, modify and access data in an organized, efficient way. In the past accessing a database required the services of a programmer, so you had to first find one, then convince her/him to write programs for you to access the database.
Allowing users direct access on an ad hoc basis required giving them a language in which to express their requests. An instance of access to a database is often called a query, and so that language is called a query language. Many query languages exist, each developed for a particular DBMS. The one presented here is officially called Structured Query Language and its acronym is SQL. It is a standard query language supported by a great many database management systems. It does two things:
Kinds of Database Systems
Database management systems can be characterized by the way they model data as seen by their users. Relational systems represent data to their users as simply the contents of one or more tables, and nothing more. One reason for the popularity of relational systems is their ability to easily support access either by computer programs or directly by human beings.
SQL was developed as a query language for relational database systems. Its primary intent is to allow easy access to data stored in relational systems. Virtually all relational systems allow their users access to stored data through SQL. The wide availability of SQL in relational systems allows fast answers to an assortment of very specific questions about data stored in a database.
The relational model was first put forth by E. F. Codd, a researcher in IBM's San Jose laboratories, in a paper published in 1970. Along with the easy access it allows to people, a major reason for this dominance is the simple but powerful view of the data which relational systems offer to their users.
A relational database consists of a collection of tables. All data in the database is stored in one of these tables. Each table is a simple two-dimensional structure, made up of some number of rows (also called records) and columns. Each column in a table is assigned a unique name and contains a particular type of data such as characters and numbers.
Each record (we will use this term throughout instead of row) contains a value for each of the table's columns. The intersection of the columns with each row defines the fields in those records. Stored in the table, as values for the various fields in the records, is the actual data. Note that the order of the records in a table is not significant. It is not meaningful to ask if a particular record occurs before or after another in a table.
A very simple database might contain only one table, but most databases will contain several tables. It is a design problem to determine the tables that best model the problem you're building a database for. Our approach (in A114 and I111) relies on determining the entities (with their features) first, then the relationships between them, and drawing an E-R diagram, to produce the tables out of that.
Most queries against a database may require accessing more than one table. This is not a problem, since a single SQL query can retrieve information from one or more tables within a single database. Other databases may also exist on the same system, each comprising a set of tables appropriate to their specific purpose (modelling the problem for which the database has been built), but that's not our business (at least not directly, not now).
If the records in a table are not ordered, how are we to locate specific ones? More importantly, how can we distinguish one record from another? The answer lies in specifying one or more columns in each table which define the key for that table's records. The information in the key field or fields must be unique for each record. (That's also the key feature of that entity.)
For some tables, none of the information to be stored may provide an appropriate key. In these cases, it is necessary to define a column containing some kind of unique identification (typically a number) to serve as the key. For other tables, the contents of two or more columns taken together may provide the values necessary to uniquely identify each record. In tables like this, a record's key consists of its values in each of those columns.
Keys are not always required for accessing information. Depending on the question asked, it may or may not be necessary to refer to each record's key value. In spite of this, the ability to uniquely identify each record in a table is an integral part of the relational model for handling data, and every table must always define a key for its records, period.
Correctly selecting keys for the tables in a database is not an easy task. In fact, choosing keys is only one aspect of the much larger problem of designing a relational database. The designer, who may also be the database administrator (DBA) responsible for the system, must first determine how the data should be organized into tables. The DBA (or whoever the designer is) must determine the number of columns in each table, what the names and legal contents of those columns should be, and many other specifics about the database. Here's our recipe, good enough for all we might be dealing with.
Luckily, accessing the data through SQL doesn't require any specialized knowledge about how a relational database should be defined. All we need to understand is how it looks to us.
(So we skirt the design issues from now on, and assume the database is given. Let's review its design briefly. We start with one of its tables, first. OK, let's get going. )
An Example Table
Looking at Homework Assignments 5 (five) and 6 (six) we see a table of TEACHERS.
The structure of this table is as follows: four columns, with the first as the key.
Every record in the table has four fields:
Example SQL Queries
Suppose the question is:
"What are the names, phone numbers of teachers earning more than $30,000 yearly?"In SQL that looks as follows:
We first determine what we want to look at, where it's coming from, and build the query using the Query Construction Method (QCM): identify the data, data source, any restrictions, then the format.SELECT teacher_name, phone FROM TEACHERS WHERE salary >= 30000;
For many queries, information must be retrieved from two or more tables at once. SQL can be used to express both very simple queries and very complex ones. After describing the language's basic elements we step through a sequence of examples. That's your Homework Assignment 5 and 6 (for A114 and I111). Starting with simple queries we move step by step to increasingly powerful operations. Ultimately (and hopefully) the structure of SQL will be made clear to all.
Elements of the Language
SQL allows its users to work with data stored on a computer in a relational database system. Most languages for human-computer communication are rigid and formal, and SQL is not exception. We begin our discussion by looking at the fundamental components which comprise this useful language.
Ways to Use SQL
SQL was designed with interactive access in mind. Its various commands are intended to be directly typed into computer terminals or personal computers. The results of those commands are then displayed on the display screen for the user to examine.
If we look at the history and evolution of access to database systems, it went like this: starting with access by computer programs only, database systems evolved to allow direct acces by people. Query languages like SQL are an essential part of this direct access.
The need for access to data by computer programs did not go away with the arrival of SQL, however. Embedded SQL exists to make this possible. As originally defined, SQL wasn't really usable from inside a computer program. Extensions to the language have been made, however, which allow programmers to embed SQL statements in their programs.
We use such extensions in A348 and A548 by accessing a MySQL database with
Versions of SQL
Natural languages have dialects, and so does SQL. Vendors of SQL have made small (or not so small) modifications and additions to the language originally invented by IBM. This slightly chaotic situation is not necessarily a bad thing. The only aspect you need to keep in mind is that if the examples that follow don't quite work on your system they may have to be slightly adjusted.
But the adjustments will be absolutely minor.
So if some of the examples don't quite work for you exactly as specified here don't despair; check your local documentation. (In Access, for example, you can build the query with the Visual QBE Interface, and check to see what SQL gets produced.)
"While SQL is the lingua franca for relational database access, it is not yet the Esperanto."
Defining the Language
As with any language, natural or computer, SQL is made up of several parts.
Relational databases store data. Every element of that stored data must be of some type. In every table, each column (and thus every field in every record) stores only data of a certain type. SQL precisely defines the types of data which can be stored in its tables.
Here are examples of types:
One other kind of information can be stored in a record's fields: the value NULL. NULL doesn't really belong to any type, but instead represents the lack of a value. Any field of any type can have the value NULL assigned to it (unless NULLs are explicitly barred from that field when it is created).
Please note that NULL is not the same thing as
NULL is simply NULL, a unique "value" indicating the absence of a normal value in a particular field.
The Structure of SQL Statements
SQL's full name, Structured Query Language, is a slight misnomer. It implies that only queries are supported, and that all an SQL user can do is ask questions about information stored in a database's tables. In fact, SQL allows its user to create tables, destroy tables, grant and revoke access to those tables, and to do several other tasks required for full use of a relational database. Although queries are its most common use, SQL is really much more than just a query language.
To allow access to the information stored in a relational database, SQL defines a number of operations, sometimes called the verbs of the language. Just as speakers of English rely on English verbs to indicate actions, SQL users rely on its verbs to indicate the actions to be performed on a database. Each SQL verb has a specific meaning and serves a specific purpose.
Each SQL statement, whether entered by a human being at a keyboard or embedded within some computer program, begins with
an SQL verb. Following this comes further information specifying exactly what the verb should do. If the verb requests
information about a table, for instance, the name of the table must appear. At the end of each statement appears a
termination character. Different SQL systems define different termination characters, but the most common is the
For now the important thing is to note this basic form of SQL statements:
The verb is SELECT, the termination character is semicolon, and everything in between is a specification of what the verb should do. All SQL statements follow this basic structure.SELECT teacher_name FROM TEACHERS;
SQL is free-format. In other words, blanks and the line on which various parts of a statement appear are not significant. The example above could have been written as
and the results would have been the same. Also, although conformance to the SQL standard requires support only of upper case letters, most implementations of SQL treat upper and lower case letters as equivalent. In most products which support SQL, therefore, typing the above examples in all lower case letters would have produced equivalent results.SELECT teacher_name FROM TEACHERS;
Like many computer languages, SQL reserves certain words for its own use. These words, called reserved words or keywords, may not be used as the name of the table, as a name for one of a table's columns, or for anything else. They always have only the meaning assigned to them by SQL.
In the examples that follow we will be writing the keywords in upper case. Names of tables are also in upper case. Column names, however, are always in lower case. Since SQL is not usually case sensitive, these distinctions serve only to make the language more readable, and have no effect on the operation of the queries.
An Example Database
As described earlier, a relational database is just a collection of tables. Each table has some number of columns, and contains rows (or records) with the actual information. Each SQL query accesses the data stored in some particular database.
Throughout the set of examples that follow, we will use an example database with five tables.
This database contains the registration information for a fictitious (and very small) university. Each table contains a related set of information, described by that table's column names. Each column contains information of a certain type, drawn from the list of possible types described earlier.
We will give a complete list of the records stored in each table below. Note that in the diagram above some columns' titles are in color (blue or red). They are keys, that's why we marked them.
The table in the database with the most columns is called STUDENTS. It contains information on all students currently enrolled at this university. Its columns are:
student#: an INTEGER containing the student ID number
student_name: a CHAR(18) containing the student's name
address: a CHAR(30) containing the student's street address
city: a CHAR(20) containing the name of the student's home city
state: a CHAR(2) with the two-letter postal abbreviation of the home state
zip: a CHAR(5) with the student's home zip code
sex: a CHAR(1) containing 'M' for males and 'F' for females
student#column is designated as the key for this table (STUDENTS). The table contains 12 records, see below (download here).
Another table in the database is called TEACHERS. This table describes the teachers currently active at this university. Its columns are:student#,student_name,address,zip,city,state,sex 148,Susan Powell,534 East River Dr.,19041,Haverford,PA,F 210,Bob Dawson,120 South Jefferson,02891,Newport,RI,M 298,Howard Mansfield,290 Wynkoop Drive,22180,Vienna,VA,M 348,Susan Pugh,534 East Hampton Dr.,06107,Hartford,CT,F 349,Joe Adams,473 Emmerson Street,19702,Newark,DE,M 354,Janet Ladd,441 10th Street,18073,Pennsburg,PA,F 410,Bill Jones,120 South Harrison,92660,Newport,CA,M 473,Larry Bird,983 Park Avenue,02169,Boston,MA,M 548,Allen Thomas,238 West Ox Road,60624,Chicago,IL,M 558,Val Shipp,238 Westport Road,60556,Chicago,IL,F 649,John Anderson,473 Emmory Street,10008,New York,NY,M 654,Janet Thomas,441 6th Street,16510,Erie,PA,F
teacher#: an INTEGER containing the teacher ID number
teacher_name: a CHAR(18) containing the teacher's name
phone: a CHAR(10) containing the teacher's phone number
salary: a FLOAT containing this teacher's annual salary
teacher#is the key for this table. The TEACHERS table contains 6 records (as can be seen below, and downloaded from here.)
A third member of the database is the COURSES table. COURSES lists information on the courses offerred by the university this term and has the following columns:teacher#,teacher_name,phone,salary 180,Cooke,257-8088,$29560.00 290,Lowe,257-2390,$31450.00 303,Horn,256-3049,$27540.00 430,Engle,257-4621,$38200.00 560,Olsen,257-8086,$31778.00 784,Scango,257-3046,$32098.00
course#: CHAR(4) starts with an upper case letter followed by three digits
course_name: CHAR(20) containing the name of the course
department: CHAR(16) indicating which department gives this course
num_credits: SMALLINT indicating the number of credits this course is worth
Here's the sample data we use:
You can download this table from here.course#,course_name,department,num_credits A114,Databases,Computer Science,3 A201,English Composition,English,3 H480,Western Civilization,History,3 M119,Calculus IV,Math,4
Because course numbers are unique, the
column serves as this table's key.
Another of the database's tables is called SECTIONS. Many university courses are divided into two or more sections, and this table describes each section of each course. (You can find these sections in INSITE). It connects TEACHERS and COURSES. This table's columns are as listed below:
course#: the course ID (as in COURSES)
section#: a SMALLINT that identifies the section
teacher#: the teacher ID (as in TEACHERS)
num_students: a SMALLINT describing the capacity of the section
And here's where you can download it from.section#,teacher#,course#,num_students 1200,290,M119,20 1251,430,A201,15 1333,303,A114,10 1335,560,A114,10 1601,180,H480,15 1602,784,H480,10
(Colored columns are keys. Two of them are foreign keys and the third one could be a key by itself here at IU but in general may not, for example when we simply count the sections in a course
1, 2, 3, ...,for each course in the same way, starting at 1 for each course).
The final table in this example database is the ENROLLS table. It contains a record with each student's grade for each section of every course in which that student is enrolled. It relates the information from STUDENTS to the one in COURSES. Here is the contents of the table:
Here's where you can download this from. The structure of this table is:course#,section#,student#,grade A201,1251,298,3.00 A201,1251,349,4.00 A201,1251,548,2.00 A114,1333,473,2.00 A114,1333,654,4.00 A114,1335,210,3.00 A114,1335,548, H480,1601,358,4.00 H480,1601,410,2.00 H480,1601,649,4.00 H480,1602,298,3.00 H480,1602,473,0.00 M119,1200,148,3.00 M119,1200,210,1.00 M119,1200,348,2.00 M119,1200,473,3.00 M119,1200,558,3.00 M119,1200,649,4.00
course#: course number as in the COURSES table
section#: a section number. Same as used in SECTIONS.
student#: a student ID as in the STUDENTS table
grade: a DECIMAL indicating the student's grade in this section
By far the most used of SQL's verbs is SELECT. Since the most common operation performed on a database is to examine its data, it should not be surprising that the SQL statement which does this is the workhorse of the language. In any real database, of course, other SQL statements would have to be used first to create the database's tables and fill them with records. We assume your tables have been created, and populated. So we begin with SELECT (and that's also the only verb we'll cover).
Selecting Specific Columns in a Table
To see the values of certain columns for all of a table's records, you must give both the names of those columns and the name of the table. The general form is
whereSELECT <column names> FROM <table>;
<column names>is replaced by the names of the desired columns, separated by commas, and
<table>is replaced by the name of the table which contains those columns. Throughout these examples words flanked by
>represent generalities, and are to be replaced by specifics in actual SQL queries. Generalities are, simply said, syntactic categories. I am sure you apreciate that.
Examples of such queries:
1. List the names of all courses, their department, and the number of credits for each.Selecting All Columns in a Table
2. List the names, home towns, and home states of all students.
It is often useful to see the value of every field for every record in a table. One way of doing this is by listing the names of every column in that table, similar to the examples above. Because this is such a frequent operation, however, SQL provides a shorthand way to list all values in a table.
Instead of listing the column names you can type an asterisk. The general form is:
Here are some examples:SELECT * FROM <table>;
3. List all values in the COURSES table.Selecting Only Some of a Table's Records: The WHERE Clause
4. List all values in the TEACHERS table.
So far, every SELECT statement has returned at least one value (field) for every record in the table. What if we wish to see values only for records that meet some specific criteria? To do this, we must use the SELECT statement's WHERE clause. The WHERE clause lets us specify a predicate, something that is either true or false about each record in the table. Only those records for which the predicate is true will be listed in the results.
The general form of a SELECT using WHERE is
As before,SELECT <column names> FROM <table> WHERE <predicate>;
<table>are replaced by appropriate column and table names.
<predicate>can be replaced by a number of different things, depending on exactly what restrictions you wish to place on the results.
Comparisons in a WHERE Clause
Probably the most common predicates are those which compare values. You may wish, for instance, to see only the records in the COURSES table for three credit courses, or only those in the TEACHERS table for teachers with salaries greater than $30,000 a year. For these types of queries, the general form is
As before,SELECT <column names> FROM <table> WHERE <column name> <operator> <value>
<table>represent the names of the desired columns and the name of the table from which they should be drawn. The WHERE clause is more complex, however. The first item,
<column name>, must name a particular column in the table. This column may or may not be among those listed in
<column name> but before the
<value> comes an
<operator>. The possible operators are:
The predicate ends with a
(not equal, also written as
(less than or equal)
(greater than or equal)
<value>. The exact form of this value varies depending on the type of the named column. For columns of numeric type, a numeric value is simply placed after the operator. If the column is CHAR however, the character(s) comprising the value must be enclosed in single quotes.
Although SQL is usually not sensitive to the difference between upper and lower case letters, case does make a difference for comparisons using quoted character values. For example, the character strings 'Hello' and 'HELLO' are not considered equal to one another. Also, some strings are longer than others. According to the SQL standard, comparing two strings of unequal length conceptually adds blanks to the end of the shorter string, then performs the comparison. And like numeric values you can ask whether one character string is less than or greater than another. The comparison is performed based on the collating sequence (ASCII code) used by your system. In general, this collating sequence will result in a normal alphabetic comparison, with 'A' less than 'B', 'B' less than 'C', and so on. (Whether digits are less than letters varies, though. On large IBM systems, digits are greater than letters, while on nearly all others digits are less than letters).
In some cases, the
<value> may also be a
<column name>. If the
<column name> identifies a column in
<table> the values in the two
columns are compared for each record; only those records in which the two values satisfy the condition
=) are returned. Alternatively, the
<column name> may identify a
column in a different table. In this case, records from both tables may be examined and retrieved.
Retrieving data from more than one table at a time is called a join and is discussed later.
Here are some examples for you to practice with:
5. List the course name, department, and number of credits for all 3-credit courses.Combining Predicates in a WHERE Clause
6. List the names and salaries of teachers earning more than $30,000.
7. List all the male students.
It is often useful to combine two or more predicates in a single WHERE clause. You may want, for example, to identify
all students who both live in Chicago and are women. Doing this requires using SQL's AND and OR
operators. AND and OR, together with a third operator called
NOT are known as
SQL uses the word AND in much the same way as does English. Two predicates may be combined with AND, and the entire predicate is true only if both of its parts are also true. If desired, either or both of the predicates may be enclosed in parentheses. We will soon see examples where parentheses are required, but for now they are optional.
Examples of such queries:
8. List the names and addresses of all female students from Chicago.Using OR
9. Which Math courses have 3 or more credits?
OR is used in a similar way to AND. Once again, two predicates may be combined, and the value of both is used in determining which records are returned from a table. With AND, both predicates had to be true for a particular record for it to be selected. With OR, on the other hand, a record is selected and appears in the results of the query if either of the predicates is true, or if both are true.
AND and OR can both be used in WHERE clause. In fact, arbitrarily complex expressions can be created using various combinations of AND and OR. If both AND and OR appear in the same WHERE clause, all of ANDs are evaluated first, followed by all of the ORs. There is one exception to this rule: anything enclosed in parentheses will be executed first. In some cases, then, you may be required to use parentheses to correctly express what you mean.
For example, to see the names of all male students who live in either Connecticut or New York, you could type
If the parentheses were omitted, the statement would becomeSELECT student_name FROM students WHERE (state = 'CT' OR state = 'NY') AND sex = 'M';
It now has a quite different meaning. Since SQL first evaluates all ANDs, it is as if the query were writtenSELECT student_name FROM students WHERE state = 'CT' OR state = 'NY' AND sex = 'M';
This will select the records of all male students from New York, as well as those of all students, both male and female, from Connecticut. The moral is this: use parentheses whenever you combine AND and OR in a WHERE clause. They never hurt, and they sometimes can prevent you from inadvertently making the wrong request.SELECT student_name FROM students WHERE state = 'CT' OR (state = 'NY' AND sex = 'M');
Examples of using OR in a WHERE clause for you to practice with:
10. List the name, sex, city, and state for all students who are either from Connecticut or from a city called Erie.Using NOT
11. List the names, cities, states, and zip codes for all students whose zip codes are between 20000 and 29999 or who live in a city called Erie.
SQL's final boolean operator is NOT. Unlike AND and OR, NOT isn't used to combine conditions in a WHERE clause. Instead, it negates a specified condition. For example, to see the names and home states of all students who are not from Illinois, you could type
Inserting the word NOT simply reverses the condition, returning all records in which theSELECT student_name, state FROM STUDENTS WHERE NOT (state = 'IL');
statefield does not contain the value 'IL'. Once again, the parentheses around
state = 'IL'are not strictly required, but nevertheless a good idea.
Here's now an example:
12. List the names and states of all students who are not from Illinois.Well, that was easy!
Other Uses of WHERE
Predicates using a WHERE clause can do more than just make simple comparisons. In this section, we'll look at other ways to select only certain records from a table.
BETWEEN and NOT BETWEEN
AND, it is possible to select records where values of a
specific field in the record fall within a specified range. In an earlier example, for instance, we listed the name and
other information for all students whose zip codes ranged from 20000 to 29999. This kind of query arises quite often, and so
SQL provides a shorthand way of expressing it. By using BETWEEN and NOT BETWEEN in a WHERE clause, we can directly specify
ranges of values. The general form is
The negative form is also availableSELECT <column names> FROM <table> WHERE <column name> BETWEEN <value1> AND <value2>
In the first case, only those records in which the values inSELECT <column names> FROM <table> WHERE <column name> NOT BETWEEN <value1> AND <value2>
<column name>are greater than or equal to
<value1>and less than or equal to
<value2>will be selected. In the second, only those records whose values are not within the specified range will be selected.
In both forms, the two values are assumed to be specified in ascending order, i.e.,
<value1> is less
<value2>.For example, to see the names and salaries of all teachers who earn between $30,000 and
$35,000 a year, you could type
Only those records from TEACHERS with salaries in the specified range wil be returned.SELECT teacher_name, salary FROM TEACHERS WHERE salary BETWEEN 30000 and 35000;
The values specified in the WHERE clause can be of any type, but
<value2> must both be of a type comparable to that defined for
For example the query
is not legal, becauseSELECT student_name FROM STUDENTS WHERE student_name BETWEEN 0 and 100;
student_nameis a CHAR column, while the values 0 and 100 are numeric. Now, some examples for you to practice with:
13. List names and salaries of teachers who make between $30,000 and $35,000/year.LIKE and NOT LIKE
14. List the names, cities, states and zip codes of those students whose zip codes are not between 20000 and 29999.
When searching for information, we often already know part of what we're looking for. We may, for instance, know someone's last name but not their first name, or we may know only that their last name begins with "Jo". For situations like these, SQL provides LIKE and NOT LIKE. The general form of a SELECT using LIKE is
The form with NOT LIKE is similar.SELECT <column names> FROM <table> WHERE <column name> LIKE <value>;
LIKE and NOT LIKE may only be used with character columns.SELECT <column names> FROM <table> WHERE <column name> NOT LIKE <value>;
<column name>, therefore, must identify a CHAR column, and
<value>must be a character string enclosed in single quotes.
In the simplest case, LIKE and NOT LIKE function like
So, for example, the statement
is equivalent to sayingSELECT teacher# FROM TEACHERS WHERE teacher_name LIKE 'Engle';
The power of these two operators is based on defining two special wild card characters: a percent sign (SELECT teacher# FROM TEACHERS WHERE teacher_name = 'Engle';
%) and an underscore (
_). When contained within the quoted character string value, a percent sign matches any string of characters, of any length. An underscore, on the other hand, matches any single character. Using these two wild cards, a wide variety of searches can be performed.
The WHERE clause in the example above would also have returned information about Dr. Engle's record if it had read
In this case, however, it would have returned information about other records in the TEACHERS table, too. To further limit the records returned, we could have saidWHERE teacher_name LIKE '%'
This would have returned records for all teachers with a name that ends in 'e' (recall that the difference between upper and lower case letters is significant inside quoted strings).WHERE teacher name LIKE '%e'
We could have also said
This would return information about teachers whose last names begin with 'E', end with 'e', and have three letters (any three letters) in between.WHERE teacher_name LIKE 'E___e'
Since LIKE returns records with fields that match a specified string (possibly with embedded wild card characters), NOT LIKE does just the opposite: it returns all records where the specified field does not match the string. For example, the SQL statement
would return the teacher number for every teacher except Dr. Engle.SELECT teacher# FROM TEACHERS WHERE teacher_name NOT LIKE 'Engle';
Examples for you to work on:
15. List the name and sex of every student whose name begins with "Jo".IN and NOT IN
16. List the names and home states of all students from states that begin with a 'C'.
17. List names, phone numbers of teachers whose phone #'s do not begin with 257.
Another possible way to restrict which records are selected is to use IN and NOT IN. Like the other mechanisms discussed so far, they appear in the WHERE clause, and only records that match their requirements are returned. The general form of an SQL statement using IN is
The form with NOT IN isSELECT <column names> FROM <table> WHERE <column name> IN (<values>);
In both casesSELECT <column names> FROM <table> WHERE <column name> NOT IN (<values>);
<values>consists of one or more values, separated by commas and enclosed in parentheses, all of a type comparable to that of the column identified by
<column name> is a CHAR column, the values must
all be quoted character strings. If
<column name> identifies
a numeric column, such as INTEGER or SMALLINT, then the values must also be numbers.
For IN, the records returned are those where the field named
contains one of the values specified in the parenthesized list. For NOT IN, the reverse is
true: the records returned are those whose
<column name> field
does not contain any of the values in the list.
Using IN is equivalent to grouping several tests for equality together with ORs. This is yet another example of a shorthand form provided by SQL for common requests. For example the statement
will return all records where theSELECT course_name, department FROM COURSES WHERE department in ('Math', 'English');
Similarly, NOT IN is equivalent to grouping tests for inequality together with ANDs.SELECT course_name, department FROM COURSES WHERE department = 'Math' OR department = 'English';
is equivalent toSELECT course_name, department FROM COURSES WHERE department NOT IN ('Math', 'English');
SELECT course_name, department FROM COURSES WHERE department <> 'Math' AND department <> 'English';
Some examples for you to work with:
18. List names and departments of all Math and English courses.IS NULL and IS NOT NULL
19. List names, cities, states of students not from CA or IL.
In the last chapter we described the special value NULL.
Unlike CHAR or numeric values, which can only be assigned to CHAR or numeric fields, respectively, the value NULL can be assigned to any field, regardless of its type.
NULL is different from other values as well in that it can't be used in simple comparisons.
A WHERE clause using the
operator to ask whether some field is NULL is undefined; it is neither true nor false.
Instead, the special construction IS NULL and IS NOT NULL must be used.
The general form of a SELECT statement with IS NULL is
For one using IS NOT NULL, the form isSELECT <column names> FROM <table> WHERE <column name> IS NULL
Like all WHERE clauses, these two select only records that meet their requirements. A query with IS NULL will return only thise records in which the field identified bySELECT <column names> FROM <table> WHERE <column name> IS NOT NULL
<column name>contains the value NULL, while those with IS NOT NULL will return only those in which
<column name>contains a value other than NULL.
The IS NULL construction is the only way to search for records containing the value NULL.
BETWEEN, LIKE, and the other WHERE clause options can never find NULL values.
Examples for you to work with:
20. List entire contents of records in ENROLLS with no grade recorded.More on SELECT
21. List names, cities, states, zip codes for students whose zip codes we know.
So far, we have seen several ways to use the SELECT statement to retrieve all or part of a table's data. In this chapter, we will see how that data can be ordered, grouped, or otherwise manipulated while it is being retrieved.
Ordering Selected Data
Recall that the order of records within a table has no significance to SQL.
When records are retrieved from a table, therefore, the order in which they appear is arbitrary. Sometimes, we wish to see the data resulting from a query ordered in a specific way. To allow this, SQL provides the ORDER BY option for the SELECT statement.
The general for of a SELECT statement using ORDER BY is
As before,SELECT <column names> FROM <table> <WHERE clause> ORDER BY <column names>
<column names>after SELECT and
<table>indicate the columns which should be selected from the named table. A
<WHERE clause>may optionally appear, specifying conditions which records must meet to be selected. Most importantly for our purposes, the
<column names>following ORDER BY indicate the column or columns whose values should be used to order the selected records. For example the statement
will produce an alphabetical list of all teachers. Note that the columns specified after ORDER BY must be drawn from those following SELECT in the query.SELECT teacher_name, salary FROM TEACHERS ORDER BY teacher_name;
Results may also be ordered by more than one column. Saying
will yield a list of students ordered first by state, then alphabetically by name within that state.SELECT student_name, city, state FROM STUDENTS ORDER BY state, student_name;
One more possibility: instead of giving the name of a column by which to order selected records, it is possible to give a column's number. Column numbers refer to columns in the results of the query, not in the original table, and go from left to right, beginning with 1 (one).
This means, for instance, that the first example shown above could have also been expressed as
The results would have been exactly the same, becauseSELECT teacher_name, salary FROM TEACHERS ORDER BY 1;
teacher_nameis the first column from the left in the results of this query.
SQL also allows records to be listed in reverse order.
Suppose, for example, that we wished for some reason to see a list of teachers and their salaries in reverse alphabetical order. We could use the DESC option with ORDER BY:
It's hard to see why it would be useful, but it is also legal to replace DESC with ASC. This causes the results to be listed in ascending order, just as if neither ASC or DESC were specified.SELECT teacher_name, salary FROM TEACHERS ORDER BY teacher_name DESC;
Some examples for you to work on, to practice:
22. Give an alphabetical list of teachers and their phone numbers.Doing Arithmetic With Selected Information
23. Give an alphabetical list of teachers and their phone numbers (again?!...).
24. Give a list of teachers, phone numbers in reverse alphabetical order.
25. List names, cities, and states of all students ordered first by state, then by city within that state.
26. List all three credit courses alphabetically by department.
Retrieving the information stored in a table is undoubtedly useful. Sometimes, though, the stored information is not exactly what we're interested in.
Suppose, for example, that we wish to know how much each teacher's salary would be if we gave everyone a flat $1,000 raise. Or, perhaps we'd like to compare this with what the new salaries would be after an across-the-board 10 percent increase.
these kinds of queries, SQL provides the four basic arithmetic operations of addition, subtraction,
multiplication, and division. (While addition and subtraction are represented in queries by the familiar
- symbols, multiplication and division use the symbols
Within a query, for instance, we can add a fixed amount to every value in a column, or multiply each of a column's values by some amount. It is even possible to add two columns together or subtract the values of one column in a record from those of another.
To see what each teacher's salary would be after a flat $1,000 raise, you could type
Because the second column of the results will be derived from one of the table's original columns, we give it a new name by addingSELECT teacher_name, new_salary = salary + 1000 FROM TEACHERS;
in front of the arithmetic expression. In the results table, that column now will be labelled with our new name. If you had not specified this new name, the name assigned to the newly created column would depend on your particular implementation of SQL.
Similarly, to see what the teachers' salaries would be after a 10 percent raise, the query is
As usual, a WHERE clause can be added to the query to restrict which records are selected. To see, for example, the results of giving a 10 percent raise only to those teachers earning less than $30,000, one could typeSELECT teacher_name, new_salary = salary * 1.10 FROM TEACHERS;
As might be expected, these arithmetic operations can only be used on columns containing numeric values (in case you were wondering).SELECT teacher_name, new_salary = salary * 1.10 FROM TEACHERS WHERE salary > 30000;
Also, the four operations can be combined to form arbitrarily complex expressions. When two or more operations
are combined in a single query, SQL defines rules for the precedence of each. In general, all occurrences of
/ (multiplication and division) are performed first, followed by all occurrences of
- (addition and subtraction). Operations of equal precedence are carried out left
What about NULL values? Recall from the chapter on NULL values that a NULL in a numeric field isn't the same as a 0 (zero), but is instead a value unique to itself. How then are NULLs handled during arithmetic operations?
The rule turns out to be somewhat extreme, but very simple: any arithmetic operation applied to a column containing a NULL value in one of its fields will return the value NULL. Check to see if the rule is respected or not by Access: What does Access do when you're adding values some of which are NULL? (Also what does Access do when you're taking the AVG of several values, some of which are NULL? What does this mean? )
Example, to practice with these notions:
27. What would each teacher's salary be if he or she received a 5.5 percent cost of living increase and a $1,500 merit increase?Using Aggregates
SQL provides a number of built-in functions for performing common operations. These functions, called "set functions" by the SQL standard but more often referred to as aggregates, make it easy for us to find the maximum or minimum value in a column, find the sum or average of a column's values, and count the number of records in a table. While all of these things would be possible without them, SQL's aggregates make the task much simpler.
Finding Maximum and Minimum Values
It is often interesting to ask about the limits of data values stored in a table.
Both MAX and MIN can be used like the name of a column in a SELECT.
You could type, for example,
orSELECT MAX(salary) FROM TEACHERS;
The first query would select and display the maximum value from theSELECT MIN(grade) FROM ENROLLS;
salarycolumn of the TEACHERS table, while the second would display the minimum value from the
gradecolumn of the ENROLLS table. The name of the query's result defaults to, for example, MAX(salary). A different name can be assigned, if desired, through the mechanism seen earlier, e.g.,
(It's worth noting that both the default name assigned to the query's result and the exact syntax used to change that default name may vary from one implementation of SQL to another).SELECT maximum = MAX(salary) FROM TEACHERS;
It is also possible to use a WHERE clause to select the maximum or minimum value from a group of records meeting some specified criteria. We could, for example, say
This query selects the largest salary only from those records in the TEACHERS table that have aSELECT MAX(salary) FROM TEACHERS WHERE teacher# > 300;
teacher#greater than 300 (whatever that may signify).
What if, instead of the query using MAX above, you wanted to know not only what the maximum teacher salary was, but also who earned it? You might have naively typed
Unfortunately, you would have received an error message from SQL, as the above query is illegal. It's illegal because, in simple cases such as this, SQL requires that a query return the same number of values for every column name specified. Thus far, this has not been a problem, since each table has the same number of values in each of its columns.SELECT teacher_name, MAX(salary) FROM TEACHERS;
When using MAX and MIN, however, the situation changes. As mentioned above, MAX and MIN function somewhat like column names, and yet by their nature, both always return only a single value. Because of SQL's rule that queries must always return the same number of values for each specified column name, no other column names (except for those appearing within other aggregates) can be specified along with MAX and MIN. As we shall see, all SQL aggregates share this same limitation. (There is one significant exception: using GROUP BY, described at the end of this chapter, allows other column names to appear in a SELECT along with MAX, MIN, and other aggregates).
Although they probably make the most sense when used with numeric columns, MAX and MIN can also be used with character columns. With character columns it's as if all the values in the column are first placed in alphabetical order, then the last (for MAX) or the first (for MIN) value is returned.
As was the case with arithmetic operations, we must once more ask: what about NULLs? Unlike the arithmetic operators, all of which return NULL if any of a column's values are NULL, MAX and MIN ignore NULL values. This makes some sense, since it's hard to see how NULL could be compared to other values in a particular column.
All aggregates, including MAX and MIN, share some other restrictions on their use. For one thing, aggregates cannot be nested, i.e., the value passed to an aggregate can't include some other aggregate. Also, aggregates cannot appear in a WHERE clause. To find the name of the teacher with the highest salary, for example, you can't just type
This query is illegal. Instead, specifying conditions including aggregates requires using either GROUP BY and the HAVING clause, described at the end of the chapter, or, more likely, a subquery, described at the end of this document.SELECT teacher_name FROM TEACHERS WHERE salary = MAX(salary);
Examples for you to work with:
28. What is the largest salary paid to a teacher?Totalling Columns
29. What is the lowest grade earned by a student?
Another useful piece of information to extract from a table is the total of all values in a particular column. The president of this university, for example, might wish to know the total of all teacher salaries. To allow this, SQL provides the SUM aggregate. As its name suggests, SUM returns the sum of a particular column. To answer this question, the president could type
The result would be a single value, the total of all teacher salaries.SELECT SUM(salary) FROM TEACHERS;
It is also possible to determine the sum of a column's values only for records meeting some specific criteria. For example, to see the total salaries for all teachers earning over $30,000, the query is
Like MAX and MIN, SUM cannot be used exactly like an ordinary column name in a query. Because it returns only a single value, it may only be used by itself or with other aggregates that also return a single value, such as MAX and MIN. Also like MAX and MIN, SUM ignores any NULL values it encounters in the specified column.SELECT SUM)(salary) FROM TEACHERS WHERE salary > 30000;
A couple of examples for you to work with:
30. What is the total of all teacher salaries?Finding Averages
31. What is the total salary for all teachers earning over $30,000?
One more useful piece of information is the average of a column's values.
To answer questions like this, SQL provides the AVG aggregate. To learn the average teacher's salary, for instance, one could type
The result of this query is the average of all values in theSELECT AVG(salary) FROM TEACHERS;
salarycolumn of the TEACHERS table. Like MAX, MIN, and SUM, AVG returns only a single value, and so cannot be mixed with regular column names in a query (again, unless GROUP BY is used, as described later).
And unsurprisingly, AVG also ignores any NULL values it encounters in the specified column.
Again, a WHERE clause can be used to limit which records have their values included in the average. To find the average salary for all teachers earning over $30,000, one could type
A few problems for you to think about these things and practice:SEKECT AVG(salary) FROM TEACHERS WHERE salary > 30000;
32. What is the average of all teacher salaries?Counting Records
Yet another facility SQL gives its users is the ability to count the number of records in a table which meet some specified criteria. This is accomplished with the COUNT aggregate. COUNT appears where the column names would normally go in a SELECT statement and is followed either by one or more column names or by just an asterisk. Both the column names and the asterisk must be enclosed in parentheses. To learn how many students are enrolled, one could type either
or justSELECT COUNT(student_name) FROM STUDENTS;
SinceSELECT COUNT(*) FROM STUDENTS;
student_nameappears in every record in the table, both queries probably would produce the same results. If any of the values for
student_namewere NULL, however, the two queries almost certainly would not produce equivalent results:
As with other aggregates, the records to be counted can be restricted using a WHERE clause.
Using our same example of teachers, we could learn how many of them earn over $30,000 by typing
Like MAX, MIN, and AVG, COUNT returns a single value, and therefore can't be mixed with regular column names in a query (except with GROUP BY, as described shortly). And unlike AVG and SUM, but like MAX and MIN, COUNT can be used on both character and numeric columns.SELECT COUNT(*) FROM TEACHERS WHERE salary > 30000;
Here, now, are two questions for you to practice:
33. How many students are there?Using DISTINCT
34. How many teachers earn over $30,000?
It's often the case that several records in a table will contain the same value in one or more of their fields. Sometimes, it can be useful to form queries which select each of those duplicate values only once. We may, for instance, wish to know from how many different states the students in this university come. For answering questions like this, SQL provides the DISTINCT option.
DISTINCT can be used in two somewhat different ways. When used with COUNT and the other aggregates, the word DISTINCT can appear before the column name, like
Adding DISTINCT beforeSELECT COUNT(DISTINCT state) FROM STUDENTS;
statetells SQL that each time the same value appears more than once in that column, only one of those values should be counted. This query, therefore, will return the number of different student home states. If desired, DISTINCT can also be used with SUM, AVG, and other parts of the language.
Using DISTINCT on a column where no two records have the same value is legal, but not very useful. One could, for instance type
but because no two values forSELECT COUNT(DISTINCT student#) FROM STUDENTS;
student#are the same, results would be just the same as those from
(As mentioned earlier, the SQL standard requires DISTINCT before a column name used with the COUNT aggregate, but many implementations don't enforce this).SELECT COUNT(student#) FROM STUDENTS;
DISTINCT can also be used on queries which do not use aggregates.
In this case, the word DISTINCT appears immediately after the word SELECT but before the list of column names. Because we will often select the values from only a subset of the columns in a table, our results may contain duplicate records even though all complete records in the original table were unique. DISTINCT can be used to list only one occurrence of any duplicates which would otherwise appear. We show this in the next example.
Suppose, for instance, that we wished to list alphabetically the different states from which students come. We could do this with
Run this query to see the results.SELECT state FROM STUDENTS ORDER BY state;
If the DISTINCT option is used, however, like
duplicates are eliminated from the results, yielding a list of unique values.SELECT DISTINCT state FROM STUDENTS ORDER BY state;
Examples, to practice with:
35. How many different states do students come from?Grouping Selected Data
With all the flexibility described so far for retrieving data from a table, one major function has not yet been discussed: using GROUP BY to organize the selected data into groups.
When used with aggregates, GROUP BY controls the level at which the aggregates are computed. For example, with GROUP BY we could find the maximum grade in each section of each course, or the average number of students taught by each teacher.
The general form of a SELECT statement using GROUP BY is
As always, the firstSELECT <column names> FROM <table> <WHERE clause> GROUP BY <column names> ORDER BY <column names>;
<table>indicate the columns which should be selected from the named table. A
<WHERE clause>may optionally appear, specifying restrictions which individual records must meet to be selected. Next appears GROUP BY and its
<column names>, indicating which column or columns contain the values to be used in forming the groups. Optionally, the SELECT statement may end with an ORDER BY followed by one or more
<column names>by whose values the results should be ordered. The column names following both GROUP BY and ORDER BY must be drawn from those specified immediately after SELECT.
Consider, for example, the information contained in the ENROLLS table. For each class in which a student is currently
enrolled, this table contains a record indicating that student's
student#, the class's
section#, and the
grade that student has received. Several students have more than one record
in this table, since they are enrolled in more than one class.
Suppose we wish to know the number of students in each course. One way to learn this would be to simply list the entire table, ordered by course number, then actually count the number of records for each course. Using GROUP BY and the COUNT aggregate, however, we can let SQL do the counting for us. For example, the query
would return two columns of information: the first a list of course numbers, the second, entitledSELECT course#, num_enrolled = COUNT(*) FROM ENROLLS GROUP BY course#;
num_enrolled, a count of the number of students enrolled in that course. This query is (conceptually) processed as follows:
course#. (Try to imagine how this looks, to be able to comprehend the whole process).
course#. All the records with a
course#value of A114 make up the first group, while all those with a
course#of A201 would make up the second group, and so on.
Those results are
Recall that earlier we said that aggregates like COUNT always returned only a single value, and so could only be used in limited ways. When using GROUP BY, such operations return a single value for each group, and so can be pressed into service in a more versatile way. GROUP BY places restrictions on exactly how we can use column names in a query, however. In particular, the column names following SELECT must either be part of aggregates, such as COUNT or AVG, or they must appear as one of the column names in the GROUP BY clause.course# num_enrolled -------------------- A114 4 A201 3 H480 5 M119 6
For instance, suppose we wished to learn both the number of courses in which a student was enrolled and his or her average grade. A query which accomplishes this is
As before, the ENROLLS table is first ordered (again, conceptually) bySELECT student#, AVG(grade), courses = COUNT(*) FROM ENROLLS GROUP BY student#;
student#. Next, the two operations specified, AVG and COUNT, are applied to each group of records with the same value for
student#. Finally, the results are displayed in three columns, labelled
Examples, to think about, and work on:
36. List the number of students enrolled in each course.More on Grouping: the HAVING Clause
36.1 List the maximum number of students that can enroll in each course.
37. List the number of courses taken by each student.
38. List the average grade and number of courses taken by each student.
39. Assuming a fee of $450.00 per course, determine each student's total tuition bill.
40. List the average grade and number of students for each section of each course.
Using GROUP BY is in some ways like doing several queries simultaneously on a single table. As with most queries, though, we would like some way to restrict the values of records which are included in the query. Up to now, this has always been done with a WHERE clause. Using WHERE clauses with GROUP BY can be tricky, however. Remember that WHERE clauses specify criteria which individual records must meet to be selected by the query. For example you could say:
to learn what each student's tuition bill was only for those classes in which they received a C (2.0) or better. If we want to place restrictions on the groups themselves, however, we must use the HAVING clause. A HAVING clause works like a WHERE clause, but it is applied to groups rather than to individual records. Now, only those groups which meet the criteria specified in the HAVING clause will be included in the query's results.SELECT student#, tuition_costs = COUNT(*) * 450.00 FROM ENROLLS WHERE grade = 2.0 GROUP BY student#;
The general form of a SELECT statement using GROUP BY and HAVING is
As before, the first occurrence ofSELECT <column names> FROM <table> <WHERE clause> GROUP BY <column names> HAVING <predicate> ORDER BY <column names>
<table>indicate the columns which should be selected from the named table. The
<WHERE clause>may optionally appear (placing restrictions only on values of individual records selected for the groups), followed by GROUP BY and one or more
<column names>, indicating the column or columns to be used in forming the groups. Next comes the word HAVING and a
<predicate>, specifying the conditions which the groups themselves must meet. And once again, the SELECT may optionally end with an ORDER BY followed by one or more
For example, to see the average grade and number of courses taken only by those students with an average grade greater than 2.5, one could type
A WHERE clause could not be used in this case, since the restriction on average grades of students applies only to each group, not to each record. Similarly, to see the average grade and number of students for only those sections with more than four students, the query would beSELECT student#, AVG(grade), courses = COUNT(*) FROM ENROLLS GROUP BY student# HAVING AVG(grade) > 2.50;
Again, a WHERE clause could not be used, since the restriction of having more than four students applies to each group, not to any of the individual records within a group. In both this and the previous example, the groups appearing in the results of each query are restricted to those which meet the requirements specified in the HAVING clause.SELECT section#, AVG(grade), students = COUNT(*), FROM ENROLLS GROUP BY section# HAVING COUNT(*) > 4;
A HAVING clause may also be used without GROUP BY. In this case, it places a restriction on the entire result of the query. With no GROUP BY clause specified, those results are treated as one big group. Using HAVING without GROUP BY might seem useless, and in most cases it is. When used with certain kinds of subqueries, however, as described in the chapter on subqueries below, HAVING clauses can be quite useful on their own.
Some examples to think about:
41. List average grade and number of courses taken by students with an average grade of more than 2.5Retrieving Data From Several Tables: Joins
42. List the average grade and number of students for each section with more than four students.
Every query we have seen so far retrieves data from only a single table. If all of a database's information were contained in one large table, these type of queries would be all that was required. Practically, though, storing all the information in just one table would require maintaining several duplicate copies of the same values. In real systems, therefore, all but the very simplest databases divide their information among several different tables.
Both in real databases and in our simple example database, then, many interesting questions cannot be answered by retrieving data from only a single table. Instead, we must form queries which simultaneously access two or more tables. Any query that extracts data from more than one table must perform a join. As its name suggests, a join means that some or all of the specified tables' contents are joined together in the results of the query.
The simplest join is the cartesian product.
In every SELECT statement, we must indicate the names of the columns we are interested in. So far, all of those columns have been from the same table. With joins, however, we will be selecting columns from two or more tables simultaneously. Although all of the columns in a single table must have unique names, columns in two different tables may have the same name.To identify a column in the database uniquely, then, we must use its qualified name.
A qualified name consists of the name of a table, followed by a period and the name of a column in
that table. For example, the qualified name for the
teacher# column in the SECTIONS table
while the qualified name for that same column in the TEACHERS table is
Because every column in a table must have a different name, and because SQL requires every table in a database to have a different name, a qualified name is guaranteed to identify exactly one column in one table, so we're safe.
To see how qualified names are used in a query, suppose that we wish to see each value for
from TEACHERS along with each possible value for
course# from SECTIONS. Requesting
teacher_name from TEACHERS along with
course# from SECTIONS is easy: both column names are unique
in the database. But selecting
teacher# from SECTIONS requires using the qualified name for
since a column with that name appears in both tables. A formulation of this query might be:
(What are the results of this query? Probably not what you'd expect -- see the next section).SELECT teacher_name, SECTIONS.teacher#, course# FROM TEACHERS, SECTIONS
As we have already seen, qualified names are not required when there is no possibility of ambiguity.
In the above query, for instance, just the column name was required for
TEACHERS and SECTIONS have only one column with this name between them. Despite the fact that a column
course# also appears in the COURSES table, the list of tables in the above query's
FROM clause limits its scope to TEACHERS and SECTIONS.
We have been taking advantage of this automatic limitation all along. For instance, in the query
SELECT student# FROM STUDENTS;
student#must refer to the column of that name in the STUDENTS table. Similarly, in the query
there is still no ambiguity, because the FROM clause specifies from which table the values forSELECT teacher# FROM TEACHERS;
teacher#are to be drawn. Even simple queries can use qualified names if desired, like
although it's hard to see why this would be useful. In the first example given above, however, ambiguity was possible, and so a qualified name was required.SELECT STUDENTS.student_name FROM STUDENTS;
Because it is always legal to use qualified names, even when they're not required, another way to express that first query would've been
What is a Join?SELECT TEACHERS.teacher_name, SECTIONS.teacher#, SECTIONS.course# FROM TEACHERS, SECTIONS;
What are the results of the first query above?
If you type
what will come back in return? Obviously, the result will have three columns:SELECT teacher_name, SECTIONS.teacher#, course# FROM TEACHERS, SECTIONS;
course#. But what values will be contained in those three columns?
To carry out this query, SQL will list values from each record of both tables. In fact, it will list all possible combinations of the selected columns from all records in the two tables.
In other words, the results will begin with
a line containing the values of
course# from the first record in SECTIONS matched
with the value of the
teacher_name from the first record in TEACHERS. Next will come a line
matching the next
teacher_name with the selected values from the first record in the SECTIONS table,
and so on, until each of the records in the TEACHERS table has been matched with the first SECTIONS record.
the results will contain a series of lines that match the selected values from the second record in SECTIONS with
teacher_name value from every record in the TEACHERS table, and so on.
By match we simply mean put together. In this query there's no testing of the values that participate in the resulting records. Nowhere does anything restrict the output to only those records that refer to the same teacher.
So the result of this seemingly simple query is a long list of combinations, each consisting of a value from a record in TEACHERS paired with two values from a record in SECTIONS.
We called this: cartesian product. Check the section below on restricting the results of a join.
Examples, a few, for you to work with:
43. List all teacher names along with all the values for teacher number and course number contained in the SECTIONS table.Restricting the Results of a Join
43.1 What if you try this:Question: How many records in the output?SELECT teacher_name, SECTIONS.teacher#, course# FROM TEACHERS, SECTIONS;
Another question: How's 43 supposed to be different so its result be meaningful?
What happens if, in a query accessing two tables with columns of the same name, we specify the same column name twice, once for each table?
Doing this requires that we give the qualified name for each, like
The results are like those specified above - a list of all possible combinations of the selected columns from the two tables - except that the column calledSELECT teacher_name, TEACHERS.teacher#, SECTIONS.teacher#, course# FROM TEACHERS, SECTIONS;
teacher#appears twice. Try the query yourself to see the results.
As always, each column is labelled with the appropriate name, so for the two
teacher# columns, both column names are
qualified names. These two columns contains all possible combinations of values for
teacher# from the two tables.
Usually, such general results as these are not very useful. Too much information is given, and it is difficult to pick out what's interesting. To reduce the size of the results and thereby zero in on the answer to some particular question, we can add a WHERE clause to the above query.
We might wish to know, for example, which teachers are teaching which courses. Further, we may wish to identify those teachers by name, not just by their teacher numbers. This information is available in the results from the query above, but it is not in a very concise or usable form. By adding a WHERE clause, we can retrieve only those results of the join in which we are interested.
To see which teachers teach which courses, look through the results just given. Note that whenever the
teacher# from the TEACHERS table is equal to the value of
the SECTIONS table, the course whose
course# appears in that record is one which is taught by
that teacher. Because of the way this database is defined, the two
teacher# columns must have
Columns like these, sometimes called join columns, exist in each of the tables in our example database. Similar columns will also exist in most or all of the tables in real relational databases. The information stored in these join columns gives us a very powerful tool for accessing the stored data.
Back to the original question: which teachers teach which classes?
To select just those records from the mass of information given above, one could type
The results are only those records which meet the condition specified in the WHERE clause: those where the twoSELECT teacher_name, SECTIONS.teacher#, course# FROM TEACHERS, SECTIONS WHERE TEACHERS.teacher# = SECTIONS.teacher#;
teacher#values are equal. We can now see easily which courses are taught by each teacher, with teachers identified by name (run the query to see for yourself).
Even though they are used in a WHERE clause, the values for
teacher# need not be selected by the query.
If we were interested in seeing only teachers' names and the numbers of the courses they teach, we would have typed
This time, the results include only theSELECT teacher_name, course# FROM TEACHERS, SECTIONS WHERE TEACHERS.teacher# = SECTIONS.teacher#;
course#values for those records which meet the WHERE clause's condition, i.e., a list of teachers and the courses taught by each.
Because the SECTIONS table contains records only for currently offered sections, we could see only the names of those teachers who are currently teaching some section with
Since all teachers in our sample database are assigned to a section, theSELECT teacher_name FROM TEACHERS, SECTIONS WHERE TEACHERS.teacher# = SECTIONS.teacher#;
teacher_names listed by this query are identical to those of the previous one.
The important thing about this example, though, is its FROM clause: despite the fact that no values from the SECTIONS table are selected, that table must still appear in the FROM clause, since a value from SECTIONS is referenced in the WHERE clause. The names of all tables which have any of their columns referenced anywhere in the query must appear in the query's FROM clause.
Examples, more, to think about this and practice:
44. Which teachers teach which courses? (List teachers by name.)Manipulating the Results of a Join
45. What is the maximum enrollment in each section of each course?
46. List the grades received by each student.
Just as with queries which access a single table, queries accessing two or more tables can use the full power of SQL. For instance, all options of the WHERE clause are available, including LIKE and NOT LIKE, IN and NOT IN, NULL and NOT NULL, and AND, OR, and NOT. Also, arithmetic operations can be performed on the results before they are displayed, just as with simpler queries. And by using ORDER BY, the results of a join can be listed in either ascending or descending order, based on the values of any selected columns.
The aggregates described in the previous chapter can also be applied. To learn the number of rows in a table, for example, the COUNT aggregate can be used, while average values can be computed with AVG. In general, SQL's features are additive: its basic tools can also be used in more complex contexts, such as within joins.
Some examples, or problems, to think about, that illustrate these points:
47. Which students from California or Illinois got A's or B's?Aliases
48. Assuming that tuition is $450.00 per course, what is Larry Bird's total tuition bill?
49. List the teacher for and number of students in each section. Order this list of courses by number of students, from largest to smallest.
Every table in a database must have a unique name. Sometimes it is useful to assign an alias to a table, a name which can be used in place of the table's real name.
Aliases, called "correlation names" in the SQL standard, exist only for the life of a particular query, e.g., for the duration of one SELECT statement.
A common reason to use aliases is to reduce the amount of typing required to enter a query. To create an alias name for a table, simply type the alias name after the table name in the FROM clause. Leave at least one space between the table's name and the alias. For example the query
defines two aliases: S for STUDENTS and E for ENROLLS. This query could also be entered asSELECT student_name, grade FROM STUDENTS S, ENROLLS E WHERE course# = 'H480' AND section# = 1601 AND S.student# = E.student#
and the results would be identical. The only difference is in the number of characters typed.SELECT student_name, grade FROM STUDENTS, ENROLLS WHERE course# = 'H480' AND section# = 1601 AND STUDENTS.student# = ENROLLS.student#
If qualified names are used in specifying the selected columns, it is even possible to use an alias before it is defined. We could for example say
Here, the aliases are defined as always in the FROM clause, but are used in both the qualified names and the WHERE clause.SELECT C.department, C.course#, S.section#, S.num_students FROM COURSES C, SECTIONS S WHERE S.course# = C.course#;
Because their most common use is to reduce the amount of typing, alias names are often kept to a single character. It is legal, however, to use longer alias names. It's even possible to create an alias name which is longer than the original table name.
Some examples of using aliases in a join for you to produce:
50. List the names and grades of all students enrolled in section 1333 of A114.Joins With Three Tables
51. For each department, list its courses and sections together with the number of students in each.
Earlier in this chapter, we saw which teachers taught which courses. In that example, the teachers were listed by both name and number, while the courses were listed by number only. We may also wish to list the courses by name. To do this, we must extract information from three different tables:
A join with three tables is a fairly straightforward extension of the two-table join. We must list all necessary tables in the FROM clause, then include appropriate restrictions in the WHERE clause. To answer the question posed above, we could type
The results contain the four named columns, each with values only for those records that meet the restrictions in the WHERE clause. In other words, this query produces a list of teachers and teacher numbers, together with both the names and numbers of the courses they teach.SELECT teacher_name, SECTIONS.teacher#, SECTIONS.course#, course_name FROM TEACHERS, SECTIONS, COURSES WHERE TEACHERS.teacher# = SECTIONS.teacher# AND SECTIONS.course# = COURSES.course#;
(If you're still having trouble believing that the simple restrictions given in the WHERE clause produce the desired results, try the query without the WHERE clause. If you manually search through the results, you'll find that the resulting rows for which the conditions in the WHERE clause are true are exactly those desired.)
Examples, to work on and practice with:
52. List the names of all teachers, along with the names and numbers of the courses taught by each.Join With More Than Three Tables
53. In which courses did any female student get an A? (List both the courses and the students by name).
54. List the names of all teachers and their departments, ordered alphabetically by department.
55. List each student's grade for each course. Order the list alphabetically by course name, then, for each course, by student name.
56. Which courses are the students from Pennsylvania, California, and Connecticut taking?
57. List alphabetically the names and grades of all male students enrolled in M119.
A single query can access information stored in any number of tables within a database. Four, five, or more tables can be joined together, usually in combination with an appropriate WHERE clause, to retrieve the required information. In SQL, these larger joins are constructed just like those using only two or three tables. The only differences are the number of tables in the FROM clause and the complexity of the WHERE clause.
More examples, of joins with four tables, to work out:
58. How many students are in Dr. Engle's English class?Using GROUP BY in a Join
59. Produce a list of the names and addresses of Dr. Horn's students. Order the list alphabetically by student name.
60. Produce a roster for each course by teacher, listing the courses and sections they teach along with the names of all students enrolled in each.
The GROUP BY clause is a powerful, but sometimes complex, part of SQL. When used in a join, both its power and its complexity are magnified.
Recall how GROUP BY works with single-table queries:
These same basic rules apply to joins using GROUP BY. Suppose, for instance, that we wish to know the total number
of students possibly taught by each teacher. Answering this question requires information from both the TEACHERS table and the
SECTIONS table, so a join is required. Since we want to look at students taught by each teacher, we must join the two
tables using the
teacher# column in each. Also, because we wish to learn the total maximum number of students
taught by each teacher, we can group the data by
teacher_name and use SUM to add the groups.
The resulting SQL query is
This query will (again, conceptually) first perform the join, yielding only the selected columns from those records in the two tables whereSELECT teacher_name, SUM(num_students) FROM TEACHERS, SECTIONS WHERE TEACHERS.teacher# = SECTIONS.teacher# GROUP BY teacher_name;
Next, these chosen records are ordered according to the values in theirTEACHERS.teacher# = SECTIONS.teacher#
teacher_namefield, and all records with the same value for
teacher_nameare assigned to a distinct group. Finally, the values in the
num_studentsfields of the records in each group are added together, and the results are printed. (In the example database, each teacher teaches only one class. For this query, then, each group contains only a single record. If any teachers taught more than one class, their group would also contain more than one record.)
In general GROUP BY in joins behaves just like GROUP BY in single-table queries. As in those simpler queries, a HAVING clause may also be specified to restrict the groups which are included in the query's results.
Examples, to think about using GROUP BY in a join:
61. What is the total number of students taught by each teacher?Queries Within Queries: Subqueries
61.1 What is the maximum number of students a teacher could teach?
62. List in descending order the total number of credits awarded for each course.
63. How many students of each sex are enrolled in A201.
A relational database consists entirely of tables. Every piece of information is stored in some column within one of the databases' tables. SQL, since it is a query language for relational database systems, operates on these tables.
But you may already have noticed an interesting fact: the results of an SQL query are also a table. True, these tables are somewhat ephemeral, often existing only on a terminal screen. But even though a results table does not become part of the stored database, it, too, can be operated on via SQL queries. Performing this sort of access requires the use of subqueries.
What is a Subquery?
The result of every SQL query is a table. Sometimes, that table is very simple, containing only a single row and column (or even being entirely empty). In other cases, a results table may be much larger than any of these tables actually stored in the database. In either case, the table which results from executing one query (the subquery) can be the input for another query. It is even possible for the results of that query to be the input for yet another, and so on. In practice, though, queries are seldom nested more than two deep.
Many of the queries that we have already seen can be recast using subqueries. For some questions, including some apparently straightforward ones, using a subquery is the easiest route to the answer. For others, using a subquery is virtually a requirement.
Suppose we wish to know the names of all students who received a B (expressed numerically, a 3) in any of their classes. We could learn this with:
(Since we want each selected student's name listed only once, and since some students may have received B's in two or more classes, this query uses the DISTINCT option).SELECT DISTINCT student_name FROM STUDENTS, ENROLLS WHERE grade = 3 AND STUDENTS.student# = ENROLLS.student#;
We could also answer this question using a subquery, like:
The query within parentheses is the subquery. When the entire query is executed, this subquery must be completed first. The result of the subquery is the following table:SELECT DISTINCT student_name FROM STUDENTS WHERE student# IN (SELECT student# FROM ENROLLS WHERE grade = 3);
This table is then used as input to the WHERE clause in the main query.student# ________ 148 210 298 298 473 558
The condition expressed in that WHERE clause (
student# IN ...) will be true
only for those records in the STUDENTS table whose
student# is in the list above,
the table returned by the subquery. Since the above table contains only the
of students who received a B, only the
student_name's from those records in STUDENTS
will appear in the results of the entire query.
Note also that DISTINCT need not be used in this case, because of the way IN works: the WHERE clause is
true for any
student# that appears at least once in the result of the subquery. Even if
student# appears more than once, as does the number 298 in the example above, that student's
name will not appear more than once in the result of the entire query.
Subqueries can appear in HAVING clauses (and thus in SELECTs using GROUP BY) as well as in WHERE clauses. The rules are the same: the subquery is executed first, and its results are used as input to the condition specified in the HAVING clause. Like a WHERE clause, the HAVING clause then evaluates to either true or false.
According to the SQL standard, all subqueries must be surrounded by parentheses. As with so many parts of the standard, some implementations don't enforce this restriction. Using subqueries can become rather complicated, however, and so using clarifying parentheses is always a good idea.
Example, to think about and practice with:
1. Which students received B's in any course?Using Comparison Operators With Subqueries
The various comparison operators that have been described before (
>=) can be used
with subqueries. In addition, the IN and NOT IN keywords may also be used.
We have already seen an example using IN. NOT IN is used with subqueries in an analogous way to IN. With NOT IN, however, the WHERE clause is true if and only if the desired values are not found in the table resulting from execution of the subquery.
Using the comparison operators is more problematic. Consider how
= is used in
As it happens, this query will work as expected, and return theSELECT student_name FROM STUDENTS WHERE student# = (SELECT student# FROM ENROLLS WHERE grade = 1);
student_nameof the one student in our example database who received a D (or numerically, a 1). But suppose that more than one student had received a D? Or suppose we had entered this same query, but asked about B's instead of D's?
In this case, SQL is faced with the impossible task of
comparing a single value to a table full of values. The exact
results of such a query vary from system to system, but in general,
= operator should not be used with subqueries unless
you know in advance that the result of the subquery is a table with
only one record. When you don't know this (which is most of the
time), use IN. The effect is the same, but the subquery is allowed
to return a table with multiple values.
<> with subqueries causes similar problems.
Instead of this potentially troublesome comparison operator, then, it
is usually simpler to achieve the same effect by using NOT IN.
The other comparison operators, including
can't be used alone unless the subquery will return a single value. Unlike
<>, however, there is no simple substitute. (They can, however, be
used in conjuction with ANY and ALL, as described in the next section).
Nevertheless, subqueries that are guaranteed to return only a single value are not that unusual, so the comparison operators are still useful on their own in some cases. Since a subquery can use aggregates like any other query, we could see, for instance, which teachers earn a lower than average salary with the following SQL code:
This query will return theSELECT teacher_name FROM TEACHERS WHERE salary < (SELECT AVG(salary) FROM TEACHERS);
teacher_names of only those teachers whose salaries are below the average. Note that since aggregates like AVG can't appear in a WHERE clause, the simplest way to answer this question is with a subquery.
Similarly, to retrieve information about sections with enrollments greater than average, we could type
In both cases, the table resulting from execution of the parenthesized subquery contains only a single value, and therefore is fair game to be used immediately following a comparison operator.SELECT course#, section#, num_students FROM SECTIONS WHERE num_students > (SELECT AVG(num_students) FROM SECTIONS);
It is also possible to use AND, OR, and NOT, in a WHERE clause involving subqueries. In fact, a single query can tie together two or more subqueries using these constructs. More commonly, though, a WHERE clause will combine a subquery with a some more common condition, such as a test for equality, in a single query.
Example, to think about:
2. Which teachers earn less than average?Using ANY and ALL
3. Which sections have more than the average number of students?
As was just described, a subquery which immediately follows one of the comparison operators is expected to return only a single value. It is possible, however, to use comparison operators with subqueries returning multiple values by following the operator with either of the words ANY or ALL.
When ANY is used, the WHERE or HAVING clause containing the subquery will be true if the specified operator is true for any of the values returned by the subquery. In other words, if the expression containing the operator is true for at least one of the values returned by the subquery, the entire WHERE or HAVING clause will be true. (The SQL standard also allows the word SOME to be used as a synonym for ANY, with no change in meaning. Once again, however, not all implementations of SQL support this substitution). For example we could say
The result of this query will be the names of all students who received at least one B. Because the subquery will return more than one value, we could not use theSELECT student_name FROM STUDENTS WHERE student# = ANY (SELECT student# FROM ENROLLS WHERE grade = 3);
=operator by itself. Following it with ANY, however, allows us to compare the
student#from STUDENTS with each of the results of the subquery.
When used to test for equality, using ANY is exactly equivalent to using IN. Therefore the query
will return results identical to those of the previous example. With the other comparison operators, however, ANY can be a useful addition, since it allows comparisons against subqueries which return more than one value.SELECT student# FROM STUDENTS WHERE student# IN (SELECT student# FROM ENROLLS WHERE grade = 3);
For example, to see the names of students whose average grade is higher than that of the student with the lowest average grade (and to provide an example of a HAVING clause containing a subquery), we could type:
The result of the subquery is a table containing the average grade of every student (note that this is not the same as the student's grade point average, or GPA, since the number of credits given for each course is not included in this calculation). The outer query also calculates every student's average grade, but relies on the subquery within its HAVING clause to determine which of those students' names and average grades appear in its results. Only those students whose average grade is greater than at least one other student's average grade will be selected. In other words, this query will list the names of all students except those with the lowest average grade.SELECT student_name, AVG(grade) FROM STUDENTS, ENROLLS WHERE STUDENTS.student# = ENROLLS.student# GROUP BY student_name HAVING AVG(grade) > ANY (SELECT AVG(grade) FROM ENROLLS GROUP BY student#);
ANY works in an analogous fashion. Preceding a subquery with some comparison operator and the word ALL means that the WHERE or HAVING clause containing the subquery will be true only if the comparison is true for all values returned by the subquery. For example, to see the names of all students who received a grade in some course that equaled or exceeded the highest all-around average grade, one could type
The results of this query would be the names and grades of students meeting the requirement specified above. Because a single student may have received a qualifying grade in more than one class, DISTINCT is used to cause that student's name to appear only once in the query's results.SELECT DISTINCT student_name, grade FROM STUDENTS, ENROLLS WHERE STUDENTS.student# = ENROLLS.student# AND grade >= ALL (SELECT AVG(grade) FROM ENROLLS GROUP BY student#);
Examples, for you to look at, work with, and think about:
3.1 Which students received B's in any course?Existence Tests
4. List all students whose average grade is greater than the lowest average grade.
5. List the names and grades of students who received some grade greater than or equal the highest all-around average grade.
All subqueries return a table, which is then used as input to the WHERE clause in the enclosing
query. So far, we have been concerned with comparing the actual values in this returned table to
other values, using operators like
always, the records for which the WHERE clause was true were selected. SQL also allows us to test
not just the values contained in records returned by a subquery, but the existence or non-existence
of any such records as well. For this, the language provides EXISTS and NOT EXISTS.
Recall that each part of a WHERE clause is either true or false for each record in the table. A subquery preceded by the word EXISTS will be true if the subquery returns any records at all, regardless of the value of those records (even a record containing all NULLs qualifies). Similarly, a subquery preceded by NOT EXISTS will be true if the subquery returns no records, i.e., an empty table with no rows and no columns.
Existence tests, although conceptually very simple, can be quite useful. Their most common use, however, is in correlated subqueries, described next.
In the examples shown so far, the subquery is first evaluated, and then the table resulting from that evaluation is used as input to the main query. SQL also allows subqueries to be evaluated multiple times, once for each record accessed by the main query.
Such subqueries are called correlated subqueries.
Suppose, for example, that we wished to find the names and student numbers of any students who are not currently enrolled in any classes. One way to express this request in SQL is
Unlike the earlier examples, the parenthesized subquery here is not executed just once. Instead, this subquery is executed once for each record in the STUDENTS table. For each execution, the value of theSELECT student#, student_name FROM STUDENTS S WHERE NOT EXISTS (SELECT * FROM ENROLLS E WHERE E.student# = S.student#);
student#field from a record in STUDENTS is compared with the value of
student#from every record in ENROLLS (via the condition
E.student# = S.student#in the subquery's WHERE clause). Whenever the two
student#'s match, the subquery adds that record from ENROLLS to its result table.
Recall from last section how NOT EXISTS works: if the table returned by the subquery is not empty, i.e., if it
contains at least one record, the NOT EXISTS condition is false. If, however, the table returned by the subquery
is empty, i.e., if it contains no records, the NOT EXISTS condition is true. In the example above, the subquery
student# field of each record in STUDENTS with every
student# field in the
ENROLLS table. If at least one match is found, then this student is enrolled in some course, the table resulting
from the subquery will not be empty, and the NOT EXISTS condition will be false. If no matches are found, however,
then the subquery will return an empty table, the NOT EXISTS condition will be true, and the
student# fields of the current record in the main query will be selected.
Note that the alias S is defined for the STUDENTS table in the main query, and that the alias is used within the subquery.
Note also that the value of
S.student#, which changes each time the subquery is executed, is taken from (or
correlated with) the current record in the main query.
It is from this relationship that the term "correlated query" is derived. To recognize correlated queries, look for the use within the subquery of an alias (or, occasionally, a complete table name) defined only in the main query. The above example could also have been written without the alias name for ENROLLS and with only one qualified name, like
Again,SELECT student#, student_name, FROM STUDENTS S WHERE NOT EXISTS (SELECT * FROM ENROLLS WHERE student# = S.student#);
S.student#takes its value from the current record in STUDENTS. The reference to
student#in the subquery's WHERE clause still refers to that column in the ENROLLS table; in the absence of a qualified name, the column is assumed to come from the "nearest" table. In this case, that table is ENROLLS. The results of this query are identical to those of the one shown previously.
Correlated subqueries can also be used with comparison operators, ANY and ALL, and EXISTS. For instance, to see which students received a grade higher than the average in their section, and to order the results by course number, one could type
Unlike the previous example, the main query accesses two different tables. Conceptually, this query is processed as follows: first, the cartesian product of STUDENTS and ENROLLS is formed, a table containing all possible combination of records from the two tables.SELECT student_name, grade, course#, section# FROM STUDENTS, ENROLLS E WHERE STUDENTS.student# = E.student# AND grade > (SELECT AVG(grade) FROM ENROLLS WHERE course# = E.course# AND section# = E.section#) ORDER BY course#;
Then, only those records in which
are selected. For each of these records, the subquery is executed, returning the average grade for the course and section associated with the current record from the ENROLLS table. If the grade in that record is greater than this average, the main query's WHERE clause is true, and theSTUDENTS.student# = ENROLLS.student#
section#of that record is selected. Finally, all selected records are ordered by the value in their
Using HAVING With Correlated Subqueries
The HAVING clause was discussed in the context of GROUP BY earlier. HAVING is also useful with correlated subqueries. While a WHERE clause specifies restrictions that individual records must meet to be selected, a HAVING clause places a restriction on the results of an entire query (or, when used with GROUP BY, of a group). When used without GROUP BY, a HAVING clause can prevent a query from returning anything at all.
To see why this is useful, suppose we wanted to know which students are taking more than two courses. One way to express this in SQL is
As before, the subquery will be executed once for each record in the STUDENTS table (we know this because of the reference to the alias S within the subquery). And as always, theSELECT student_name FROM STUDENTS S WHERE EXISTS (SELECT COUNT(*) FROM ENROLLS WHERE student# = S.student# HAVING COUNT(*) > 2);
student_namefield from the current STUDENTS record will be selected whenever the main query's WHERE clause is true. The question to be answered, then, is when will this WHERE clause be true?
Once again, the EXISTS operation (and thus the main query's WHERE clause) returns true if and only if the subquery returns a non-empty table, i.e., a table with at least one row and column. Without the HAVING clause, the subquery would return the number of courses in which each student was enrolled. The effect of the HAVING clause, however, is to eliminate entirely the results of any subquery executions which return two or fewer records. Once again, since a HAVING clause (without GROUP BY) places a restriction on the query as a whole, any query which would otherwise return less than three records will now return an empty table. The EXISTS in the main query, and thus the main query's WHERE clause, will therefore be true only for those students who are enrolled in more than two courses.
Examples, to think about:
6. Which students are not enrolled in any courses?
7. Which students received a grade higher than the average in their section?
8. Which courses have only male students?
9. Which students are taking more than 2 courses?
10. List the departments that have courses with more than 4 students.
We can do other things with SQL as well, such as: