![]() |
![]() Fall Semester 2005 |
The main issue in preserving state information is where to store it. Six frequently used places are shown in the following list. They can be broadly broken down into client-side techniques (items 1 through 3) and server-side techniques (items 4 through 6).
The first approach is covered in the previous set of notes and the homework assignment. Let's now move to addressing the approach at the other end of the spectrum; but before doing that let's briefly say something about all the other approaches.
2. Maintaining State with Cookies
HTTP cookies are named bits of information that are transmitted between
the server and the browser within the HTTP header. Ordinarily the server
creates a cookie by including a Set-Cookie
field in the HTTP
header. The browser then stashes away the cookie information in a small
in-memory or on-disk database. The next time the browser makes a request
from that particular server, it returns that cookie in a Cookie
field.
Cookies are relatively flexible. You can create cookies that will be returned to only one specific server or to any server in your domain. You can set them up so that they're returned only when users access a particular part of a document tree or any URI in the document hierarchy. They can be set to expire immediately when the user exits the browser, or they can be made to persist on the user's disk database for an extended period of time. You can also create secure cookies that are only returned to the server when a secure protocol, such as SSL, is in effect. This prevents cookies from being intercepted in transit by network eavesdroppers.
The exact format of HTTP cookies is somewhat involved and is described in
the HTTP specification. Fortunately
it's easy to make cookies in the right format using the
CGI::Cookie
module. As a matter of fact
CGI.pm
allows you to set and retrieve cookies that
consist of entire hashes, and the conversion is handled automatically. We
don't plan to look at an implementation involving cookies, but
this overview is meant to give us an idea of what that would look like,
just so we know if we were to.
(Stein and MacEahern do provide such an implementation, though).
3. Protecting Client Side Information
The cookie-based implementation of the hangman game is a lot classier than the first implementation. Not only does it have the advantage of maintaining state across browser sessions, but the game is also somewhat harder to cheat. While the user is actively playing the game, the cookie is kept in memory where it is difficult to read without the benefit of a debugger. However, after the user quits the browsing session, the cookie is written out to disk; determined cheaters could still find and edit the cookie database file if they wanted to make their statistics look better.
When you store information on the client side of the connection, peeking and tampering is a general problem. Fortunately, the cure is relatively simple. To prevent tampering, you can use a message authentication check (MAC) -- a form of checksum that will detect if the user has altered the information in any way. To prevent peeking, you can encrypt the information using an encryption key that is known to you but not to the user.
3.1 Message Authentication Checks
How can we add a MAC to the cookie used in the last section's example? There are many ways to compute a checksum, but the most reliable use a class of algorithms known as message digests. A message digest algorithm takes a large amount of data (usually called a "message") and crunches it through a complex series of bit shifts, rotates, and other bitwise operations until it has been reduced to a smallish number known as a hash. (Hashes like these are essentially guiding searches in Perl's associative arrays, also known as hash tables). The widely used MD5 message digest algorithm produces a 128-bit hash.
Because information is lost during the message digest operation, it is a one-way afffair: given a hash, you can't reconstruct the original message. Because of the complexity of the digest operation, it is extremely difficult to deliberately create a message that will digest to a particular hash. Changing just one bit anywhere in a message will result in a hash that is utterly unlike the previous one. However, you can confirm that a particular message was likely to have a produced a particular hash simply by running the message through the digest algorithm again and comparing the result to the hash.
To create a MAC, follow this general recipe:
3.2 Encrypting Client-Side State Information
Message authentication checks implement a "look but don't touch" policy. Users can't modify the state information, but they can still see what's there. In many web applications, there's no harm in this, but with the hangman game it has the unwanted consequence that the user can peek at the unknown word, either by viewing the page source in the fill-out form version or by quitting the browser and viewing the cookie database file.
To prevent this from happenning without abandoning client-side storage
entirely, you can encrypt the state information. Your application will have
the secret key necessary to decrypt the information, but without launching an
expensive cryptanalysis project (and maybe not even then) the user won't be
able to get at the data. Encryption can be combined with a MAC in order to
obtain truly bullet-proof client-side authentication.
(One can use the Crypt::CBC
Perl module, and the code is in the
book).
4. Storing State at the Server Side
Client-side storage of state information works well when each of the user sessions is independent of the others. But what if we wanted to combine the information from users, for example, to display a list of the top-scoring players in an Internet-wide tournament?
This is where server-side storage comes in. When you store the user information at the server side rather than the client side, you have full access to the list of all users and to the record of what they've done and what they're doing. You can crunch, tally, tabulate, and cross-reference this information to your heart's content. Server-side storage also has the advantage of being more secure, since the information never leaves the server, and it is more resilient to failure. If the user's browser crashes in the midst of accepting or updating a cookie, that information isn't lost because it's stored safely on the server. The downside is scalability and performance. Each user session that you store on the server side consumes some amount of memory, disk, and CPU cycles. When you store state information on the server side, you have to be careful to conserve these resources, for example by deleting user sessions that are no longer in use.
We will consider two types of server-side techniques in this section: storing the information transiently in main memory and storing it in an SQL database.
4.1 Storing State Information in Main Memory
Because Apache server processes are persistent across multiple accesses,
you can store small amounts of state information in main memory. When the
user first runs your application, it generates a random unique session
identifier (session ID) and stores the state information in a data
structure, for instance, a hash table keyed by the session ID.
The application then sends the session ID back to the user
in the form of a cookie, a hidden field, or a component of a URI.
When the same user connects again, your application recovers the session
ID and retrieves the state information from its data structure. (That,
if you are using mod_perl
to write Apache modules, and we
won't.)
The main problem with this technique is that the amount of state information that you can store in shared memory is very limited, making it unsuitable for high-volume or high-reliability applications. A better server-side solution involves using relational database management systems (RDBMS's), which we turn to in this next section, the main topic of these notes.
4.2 Storing State Information in SQL Databases
Persistent memory is only suitable for storing small amounts of state information for relatively short periods of time. If you need to reliably store lots of information for a long time, you need a server-side database.
The DBI library, designed by Tim Bunce and others, is a generic Perl interface to relational database management systems (RDBMS's) that speak SQL (Structured Query Language). The DBI library speaks to specific databases by way of DBD (DataBase Driver) modules. You can make queries on any database that has a DBD module available for it. These modules are sometimes provided by the database vendor, and sometimes by third parties. DBD modules for Oracle, Sybase, Illustra, mSQL, MySQL, and others can be found at CPAN.
Full information on using DBI can be found in the Perl DBI book. We'll summarize just enough here so that you can follow the examples if you are not already familiar with DBI.
Before you can work with the DBI interface, you must select and install a relational database. We have MySQL installed, and that's what we will use.
In relational databases, all information is organized in tables. Each row of the table is a data record, and each column is a field of the record. For example here's one way to represent the hangman data:
Most of the columns in the table above directly correspond to the fields in the now-familiar hangman state object. In addition to these fields we add two more columns.table: dgerman_hangman +----------+--------+-------+------+---+------------+-----+--------------+ |session_id| WORD |GUESSED|GAMENO|WON|GUESSES_LEFT|TOTAL| modified| +----------+--------+-------+------+---+------------+-----+--------------+ |fd2c95dd1 |entice |e | 10| 6| 6| 34|19980623195601| |97aff0de2 |bifocals|aeilort| 4| 2| 3| 20|19980623221335| +----------+--------+-------+------+---+------------+-----+--------------+
session_id
is a string that uniquely
identifies each user session and is used as a key in to the table for fast
record lookup. For reasons that will become apparent soon, we use a short
hexadecimal string as the session ID. We also add a timestamp field named
modified
which holds the date and time at which the record
was last changed. If you look carefully, you'll see that the column consists
of the four-digit year and two digits each for the for the month, day, hour,
minute, and second. This timestamp will come in handy for detecting old
unused sessions and clearing them out periodically.
In SQL databases, each table column has a defined data type and a maximum field length. Available data types include integers, floating point numbers, character strings, date/time types, and sometimes more esoteric types. Unfortunately the data types supported by database management systems vary considerably, limiting the portability of applications among different vendors' products. In this chapter, our examples use MySQL data types and functions. One may have to make some modifications in order to support another database system.
The most basic way to communicate with an SQL database is via a text
monitor -- a small terminal-like application in which you type SQL queries
to the database and view the results. To create the definition for the
table shown above, you could issue the SQL CREATE
command:
This declares a table namedmysql> create table dgerman_hangman ( -> session_id char(8) primary key, -> WORD char(30), -> GUESSED char(26), -> GAMENO int, -> WON int, -> GUESSES_LEFT int, -> TOTAL int, -> modified timestamp -> ); Query OK, 0 rows affected (0.06 sec)
dgerman_hangman
using
the MySQL syntax. The session_id
column is declared
to be a string of at most eight characters, and it is also declared
to be the primary key for the table. This ensures that a given session
ID is unique, and speeds up table lookups considerably. The WORD
and GUESSED
columns are declared to be strings of at most 30
and 26 characters, respectively, and GAMENO
, WON
,
GUESSES_LEFT
, and TOTAL
are delcared to be
integers (using the default length). We declare the column named
modified
to be a timestamp, taking advantage of a
MySQL-specific feature that updates the field automatically
whenever the record that contains it is changed.
You can then load some sample data into the database using an
SQL INSERT
statement:
This inserts the indicated values for the columnsmysql> insert into dgerman_hangman -> (session_id, WORD, GUESSED, GAMENO, WON, GUESSES_LEFT, TOTAL) -> values -> ('a0000001', 'spruce', '', 1, 0, 6, 0); Query OK, 1 row affected (0.02 sec) mysql>
session_id
through TOTAL
. We don't explicitely set the value of the
modified
column because MySQL takes care of that for us.
We can now perform some queries over the database using the SQL
SELECT
statement.
To see everything in the dgerman_hangman
table:
The part of the query following themysql> select * from dgerman_hangman; +------------+--------+---------+--------+------+--------------+-------+----------------+ | session_id | WORD | GUESSED | GAMENO | WON | GUESSES_LEFT | TOTAL | modified | +------------+--------+---------+--------+------+--------------+-------+----------------+ | a0000001 | spruce | | 1 | 0 | 6 | 0 | 20020704022752 | +------------+--------+---------+--------+------+--------------+-------+----------------+ 1 row in set (0.01 sec) mysql>
SELECT
command chooses
which columns to display. In this case we use *
to indicate all columns. The FROM
keyword names the table
to select the data from. If we wished to look at just the
session_id
,
WORD
, and
GAMENO
An optionalmysql> select session_id, WORD, GAMENO from dgerman_hangman; +------------+--------+--------+ | session_id | WORD | GAMENO | +------------+--------+--------+ | a0000001 | spruce | 1 | +------------+--------+--------+ 1 row in set (0.00 sec) mysql>
WHERE
clause allows us to filter the records so
that only records matching a set of criteria are displayed. For example,
this query shows only session records from players who have played five
games or more:
This query retrieves the session with the IDmysql> insert into dgerman_hangman -> (session_id, WORD, GUESSED, GAMENO, WON, GUESSES_LEFT, TOTAL) -> values -> ('fd2c95dd1', 'entice' , 'e', 10, 6, 6, 34), -> ('97aff0de2', 'bifocals', 'aeilort', 4, 2, 3, 20); Query OK, 2 rows affected (0.00 sec) mysql> select * from dgerman_hangman; +------------+----------+---------+--------+------+--------------+-------+----------------+ | session_id | WORD | GUESSED | GAMENO | WON | GUESSES_LEFT | TOTAL | modified | +------------+----------+---------+--------+------+--------------+-------+----------------+ | a0000001 | spruce | | 1 | 0 | 6 | 0 | 20020704022752 | | fd2c95dd | entice | e | 10 | 6 | 6 | 34 | 20020704024506 | | 97aff0de | bifocals | aeilort | 4 | 2 | 3 | 20 | 20020704024506 | +------------+----------+---------+--------+------+--------------+-------+----------------+ 3 rows in set (0.01 sec) mysql> select session_id, WORD, GAMENO from dgerman_hangman where GAMENO >= 5; +------------+--------+--------+ | session_id | WORD | GAMENO | +------------+--------+--------+ | fd2c95dd | entice | 10 | +------------+--------+--------+ 1 row in set (0.02 sec) mysql>
a0000001
:
Finally, this query retrieves all sessions that were modified within the past 24 hours:mysql> select session_id, WORD, GAMENO from dgerman_hangman where session_id = 'a0000001'; +------------+--------+--------+ | session_id | WORD | GAMENO | +------------+--------+--------+ | a0000001 | spruce | 1 | +------------+--------+--------+ 1 row in set (0.00 sec) mysql>
Here's how one would select the sessions modified within the past 10 minutes (in slow motion):mysql> select session_id, WORD, GAMENO from dgerman_hangman -> where unix_timestamp() - unix_timestamp(modified) < 60 * 60 * 24; +------------+----------+--------+ | session_id | WORD | GAMENO | +------------+----------+--------+ | a0000001 | spruce | 1 | | fd2c95dd | entice | 10 | | 97aff0de | bifocals | 4 | +------------+----------+--------+ 3 rows in set (0.00 sec) mysql>
These last two examples show the use of the MySQL-specificmysql> select session_id, WORD, GAMENO, modified from dgerman_hangman; +------------+----------+--------+----------------+ | session_id | WORD | GAMENO | modified | +------------+----------+--------+----------------+ | a0000001 | spruce | 1 | 20020704022752 | | fd2c95dd | entice | 10 | 20020704024506 | | 97aff0de | bifocals | 4 | 20020704024506 | +------------+----------+--------+----------------+ 3 rows in set (0.00 sec) mysql> select session_id, WORD, GAMENO, modified from dgerman_hangman -> where unix_timestamp() - unix_timestamp(modified) < 60 * 10; +------------+----------+--------+----------------+ | session_id | WORD | GAMENO | modified | +------------+----------+--------+----------------+ | fd2c95dd | entice | 10 | 20020704024506 | | 97aff0de | bifocals | 4 | 20020704024506 | +------------+----------+--------+----------------+ 2 rows in set (0.00 sec) mysql>
unix_timestamp()
function. Called without arguments,
unix_timestamp()
returns the current time and date as
the number of seconds since the start of the Unix epoch. The function
can also be called with a timestamp field as the argument, in which
case it operates on the timestamp rather than the current time. The
effect of the query above is to subtract the modified
field from the current time and compare the difference to one day.
The SQL language allows you to form queries that are substantially
more complex than these, including ones that that combine the results
of multiple tables. We may say more about this later.
The INSERT
statement can only be used to create a new
record (or row) of the table. If we were to try to execute the insertion
statement shown earlier a second time, the attempt would fail because any
given session ID can only occur once in the table. This feature guarantees
the uniqueness of session IDs. To change the values in an existing record,
we would use an UPDATE
statement instead.
A typical UPDATE
statement looks like this:
Lastly, themysql> update dgerman_hangman set gameno=gameno+1 -> where session_id='a0000001'; Query OK, 1 row affected (0.01 sec) Rows matched: 1 Changed: 1 Warnings: 0 mysql> select session_id, WORD, GAMENO, modified from dgerman_hangman; +------------+----------+--------+----------------+ | session_id | WORD | GAMENO | modified | +------------+----------+--------+----------------+ | a0000001 | spruce | 2 | 20020704030627 | | fd2c95dd | entice | 10 | 20020704024506 | | 97aff0de | bifocals | 4 | 20020704024506 | +------------+----------+--------+----------------+ 3 rows in set (0.00 sec) mysql>
DELETE
statement can be used to delete all
records that satisfy the criteria set out in the WHERE
clause. This query deletes all sessions older than a day:
Let's try again with a 15 minute expiration limit:mysql> delete from dgerman_hangman -> where unix_timestamp() - unix_timestamp(modified) > 60 * 60 * 24; Query OK, 0 rows affected (0.00 sec) mysql>
If you forget to include amysql> delete from dgerman_hangman -> where unix_timestamp() - unix_timestamp(modified) > 60 * 15; Query OK, 2 rows affected (0.00 sec) mysql>
WHERE
clause in the
UPDATE
and DELETE
statements, every
record in the database will be affected by the operation. This
is generally to be avoided. 4.3 Using DBI
The DBI interface provides methods for opening SQL databases, sending queries to the opened database, and reading the answers returned by those queries.
To open a database, you call DBI->connect()
with
the "data source name," a string that tells the database driver where
the database is located. If the database requires a username and password
for access, you can pass that information in the connect()
call as well. The format of the data source name is DBMS-specific. For
a MySQL database, it looks like this:
All MySQL data sources begin with "dbi:mysql". They are followed by the name of the database, and, optionally, by the name and port of the remote host on which the DBMS is running. If the hostname and port are omitted, the driver defaults to using a standard port on the local host. To connect to a database named"dbi:mysql:$database:$hostname:$port"
www
on
the local host using the username games
and the password grok
,
you'd make this call:
If successful,$dbh = DBI->connect('dbi:mysql:www', 'games', 'grok');
connect()
returns a database handle, $dbh
, which
is used for subsequent communication with the database. The connect()
method also
accepts an optional fourth argument which consists of a hash reference of parameter name=value
pairs. These control a variety of database options, such as whether to automatically commit all
changes made to the database. The only option that we'll use in the examples that follow is
PrintError
, which when set to false, supresses the printing of unwanted warnings
to the server error log.
The database handle has several methods, the most important of which are do()
,
prepare()
, and errstr()
. do()
is used to execute SQL
statements which do not return a list of records, such as INSERT
,
DELETE
,
UPDATE
, or
CREATE
. If the operation is successful, do()
returns a count of the number
of rows modified. For example, the following query sets the GAMENO
field of all sessions
to 1, and returns the number of rows affected:
If the database encountered an error while processing the statement (for example the SQL contained a syntax error), it will return$count = $dbh->do('UPDATE dgerman_hangman SET GAMENO=1'); die $dbh->errstr unless defined $count;
undef
. The errstr()
method can be used to retrieve
an informative error message from the driver.
SELECT
queries can return a potentially large number of records, often
more than will fit into memory at once. For this reason, the results from SELECT
queries are returned in the form of statement handle objects. You then call the statement handle's
fetch()
method repeatedly to retrieve each row of the result. Here's an example of retrieving
the session_id
and WORD
fields from each session in the hangman database:
The example starts with a call to the database handle's$sth = $dbh->prepare('SELECT session_id, WORD FROM dgerman_hangman') || die $dbh->errstr; $sth->execute() || die $sth->errstr; while (my $row = $sth->fetch) { my($session, $word) = @$row; print "$session => $session, $word => $word\n"; } $sth->finish;
prepare()
method with
the text of the SQL SELECT
method. prepare()
parses the SQL and checks
it for syntactic correctness but does not actually execute it. The query is returned as a statement
handler which we store into the variable $sth
. If some error occurred while preparing the
statement, prepare()
returns undef
, in which case we return the errstr()
error text.
Next we call the statement handler's execute()
method. This performs the query and returns either
the number of rows retrieved or undef
if an error occurred. In the case of a syntactically correct
query that happens to return no rows (because the table is empty or because no records satisfied the criteria in
the WHERE
clause), execute()
returns the value 0E0
which Perl regards as
true in a logical context, but as zero in a numeric one.
Now we enter a loop in which we call the statement handler's fetch()
method. Each time it's called,
fetch()
returns the requested columns in the form of an array reference. To retrieve the values
themselves, we just dereference the value into a list. Because we requested the column session_id
and
WORD
, we get a reference to a two-item array back from fetch()
. When there are no more rows
left, fetch()
returns undef
.
DBI actually offers a family of fetch functions. fetchrow_array()
is like fetch()
, but
it dereferences the row first and returns an array corresponding to the list of requested columns. Another function,
fetchrow_hashref()
, turns the current row into a hash of the column names and their values and returns
the hash's reference to the caller. This allows us to make the example above more readable at the cost of making it
somewhat less efficient:
DBI also provides a$sth = $dbh->prepare('SELECT session_id, WORD FROM dgerman_hangman') || die $dbh->errstr; $sth->execute || die $sth->errstr; while (my $row = $sth->fetchrow_hashref) { print "session => $row->{session_id}, word => $row->{WORD}\n"; } $sth->finish;
fetchrow_arrayref()
method for fetching the row as an array reference. It is
identical in every respect to fetch()
. Also, when
you are finished with a statement handler, you should call its finish()
method in order to free
up the resources it uses.
The last thing you need to know about statement handlers is that many DBI drivers allow you to put placeholders,
indicated by the ?
character, inside SQL statements. prepare()
compiles the statement
and returns a statement handler as before, but when you later call execute()
you pass in the values
to be substituted into the placeholders. This allows you to treat statement handlers much as you would a
subroutine by calling it repeatedly with different runtime arguments. For example, we can create a statement
handler for returning the entire row of a given session with this bit of code:
Now we can fetch information on session$sth = $dbh->prepare('SELECT * FROM dgerman_hangman WHERE session_id=?');
fd2c95dd
, by calling the statement handler's
execute()
method this way:
The same statement handler can later be used to fetch information from other named sessions. You should still call$sth->execute('fd2c95dd');
finish()
at the end of each series of
fetches, even though you are going to reuse the statement handler. Failure to do so
can lead to memory leaks.
When you are completely finished with a database handle, you should call its
disconnect()
method in order to sever the connection and clean up.
4.4 A DBI Backend for Hangman
Like the persistent memory version of the hangman game, the DBI implementation has to have code to open the database, to set and fetch session records from the database, to generate unique session IDs for each incoming connection, and to expire old sessions that we're no longer interested in. The code presented at the end of these notes shows what's new and different on the server side. There are no visible changes in the user interface.
This script assumes a database has already been set up that contains a table named hangman
(don't forget our convention of prefixing the name with your username so we can distinguish the tables)
with this structure (I used to have such a table, now I drop it, so I can create a new one):
Themysql> show tables like '%hangman%'; +----------------------------+ | Tables_in_a348 (%hangman%) | +----------------------------+ | dgerman_hangman | +----------------------------+ 1 row in set (0.02 sec) mysql> drop table dgerman_hangman; Query OK, 0 rows affected (0.00 sec) mysql> show tables like '%hangman%'; Empty set (0.02 sec) mysql> create table dgerman_hangman ( -> session_id char(8) primary key, -> WORD char(30), -> GUESSED char(26), -> GAMENO int, -> WON int, -> GUESSES_LEFT int, -> TOTAL int, -> modified timestamp, -> key(modified) -> ); Query OK, 0 rows affected (0.00 sec) mysql> describe dgerman_hangman; +--------------+---------------+------+-----+---------+-------+---------------------------------+ | Field | Type | Null | Key | Default | Extra | Privileges | +--------------+---------------+------+-----+---------+-------+---------------------------------+ | session_id | char(8) | | PRI | | | select,insert,update,references | | WORD | char(30) | YES | | NULL | | select,insert,update,references | | GUESSED | char(26) | YES | | NULL | | select,insert,update,references | | GAMENO | int(11) | YES | | NULL | | select,insert,update,references | | WON | int(11) | YES | | NULL | | select,insert,update,references | | GUESSES_LEFT | int(11) | YES | | NULL | | select,insert,update,references | | TOTAL | int(11) | YES | | NULL | | select,insert,update,references | | modified | timestamp(14) | YES | MUL | NULL | | select,insert,update,references | +--------------+---------------+------+-----+---------+-------+---------------------------------+ 8 rows in set (0.00 sec) mysql>
modified
field is a MySQL-specific data type, and later we will
take advantage of other MySQL features involving the handling of dates. SQL databases
vary widely in their handling of dates and times, and we prefer to show you an efficient
implementation of the application on a specific database than an inefficient implementation
that might work more generically. To port this code to the database of your choice, you will
need to change the data type of the modified column to a date/time type that your database
understands and modify the expires()
subroutine to work with this changed type.
Before stepping through the script, let's first look at get_state()
and
save_state()
:
# get the state from the database ------------------------------get_state--- sub get_state { my $id = shift; my $query = "SELECT * FROM $DB_TABLE WHERE session_id = '$id' AND WORD LIKE '%'"; my $sth = $DBH->prepare($query) || die "Prepare: ", $DBH->errstr; $sth->execute || die "Execute: ", $sth->errstr; my $state = $sth->fetchrow_hashref; $sth->finish; return $state; }
get_state()
is responsible for recovering the state information as a hash reference,
given the ID of an existing session. At its core is this SQL statement:
This selects all columns from the record named by the session ID. We then call DBI'sselect from dgerman_hangman where session_id='$id'
fetchrow_hashref()
to retrieve the record in the form as a hash
reference in which the keys (WORD
, GUESSED
, GAMENO
,
and so on) correspond to the columns of the selected record. As it happens, this hashref
is identical to the state variable that the higher levels of the script operate on, so
all we have to do is to return it.
The save_state()
subroutine is almost as simple:
This subroutine constructs a DBI statement handler containing placeholders for the six keys in# save the state in the database ------------------------------save_state--- sub save_state { my ($state, $id) = @_; my $sth = $DBH->prepare(<<END) || die "prepare: ", $DBH->errstr; UPDATE $DB_TABLE SET WORD=?,GUESSED=?,GAMENO=?,WON=?,TOTAL=?,GUESSES_LEFT=? WHERE session_id='$id' END $sth->execute(@{$state}{qw(WORD GUESSED GAMENO WON TOTAL GUESSES_LEFT)}) || die "execute: ", $DBH->errstr; $sth->finish; }
$state
.
It then calls the statement handler's execute()
statement to write the values from $state
into the database. The remainder of the code is concerned with the generation and maintenance of session IDs. Although most of the state information is stored on the server's side of the connection, there's more to the story. There will always have to be some information stored by the client, because otherwise, there would be no way for the server to distinguish one client from another and, hence, no way to retrieve the correct session record. Some of the obvious ways of distinguishing one client from another, such as recording their IP addresses, do not work well in practice (a dial-in user may have several IP addresses, and conversely, all America Online users share the IP address of a few large proxy servers). The general technique for identifying clients is to generate a session ID for them when they first connect to your application and then arrange for them to return the session ID to you on subsequent requests. A session ID can be anything you like. In the hangman game we use an eight-digit hexadecimal number, which is sufficient for about four billion active sessions.
We've already seen two techniques that can be adapted to this purpose: HTTP
cookies and fill-out forms. Because the session ID is a relatively small amount
of information, there's also a third option available to us. We can store the
session ID in the URI itself as additional path information. When a connection
comes in from a new client we assign it a randomly generated ID, append it to
our URI as additional path information, and send the client an HTTP
redirect()
directive to make it fetch this new URI. On
subsequent requests, we recover the session ID from the additional
path information. This technique has an advantage over cookies in
that it is compatible with all browsers, including those for which the
user has disabled cookies. It has the disadvantage that the session ID is
visible to the user. The URI displayed by the browser will look
something like this:
A side benefit of this technique is that the user can bookmark this URI, session ID and all, and come back to a game later.http://burrowww.cs.indiana.edu:14569/cgi-bin/stein/dbi/d00f9f35
Beginning our walkthrough of the script, we bring in the DBI library and define a few new constants:
Here are brief notes about each:#!/usr/bin/perl use DBI; use MD5; use CGI; $WORDS = '/usr/share/lib/dict/words'; $TRIES = 6; $EXPIRE = 60 * 60 * 24 * 30; # allow 30 days before expiration $DB = "DBI:mysql:a348"; # data source name (database) $username = "a348"; # username $password = "a348AG"; # password $DBAUTH = "$username:$password"; $SECRET = " ***( something secret, whatever )*** "; $MAX_TRIES = 10; $DB_TABLE = "dgerman_hangman"; $ID_LENGTH = 8; # length of the session ID $q = new CGI;
$EXPIRE
is the length of time to keep sessions around before expiring them from
the database. Unlike the shared-memory version of the script, the session data is stored on disk.
This means that we can be less draconian in our expiration policy. An unused session is allowed 30
days before being recycled.
$DB
is the DBI data source name for the database, and
$DBAUTH
is the database authentication information, in the given format.
$SECRET
and $MAX_TRIES
are used in the generation of new session keys.
$DB_TABLE
is the database table name to use and
$ID_LENGTH
is the length
of the session key in characters.
The script begins by opening the database and saving its database handle in a global named# Open the database -------------------------------------------------------- $DBH = DBI->connect($DB, $username, $password, {PrintError => 0}) || die "Couldn't open database: ", $DBI::errstr; # get the current session ID, or make one ---------------------------------- my ($session_id, $note) = &get_session_id();
$DBH
. Next, we retrieve the session ID (or generate a new one) by
calling a subroutine named get_session_id()
. get_session_id()
returns a two-element list: the session ID and a note that can be used to alert the user
to exceptional conditions. In this script, the only exceptional condition that occurs is
when the user tries to use a session ID that has expired.
With the session ID in hand, we retrieve the state by calling the# retrieve the state ------------------------------------------------------- my $state = &get_state($session_id) unless $q->param('clear'); # reinitialize if we need to ----------------------------------------------- if (!$state || $q->param('restart')) { $state = &initialize($state) } # process the current guess, if any ---------------------------------------- my ($message, $status) = &process_guess($q->param('guess') || '', $state); # save the modified state -------------------------------------------------- &save_state($state, $session_id);
get_state()
subroutine that we looked at earlier. We then (re)initialize the state variable as before if
need be, process the user's guess if any, and call save_state()
to write the
modified session back to the database. The remainder of the script is unchanged from previous
versions, except that we display the note returned by get_session_id()
at the top
of the page if it's nonempty.
We'll look at the get_session_id()
subroutine now, which is responsible for
retrieving an existing session ID or generating a new one.
This subroutine first expires all out-of-date sessions by calling# retrieve the session ID from the path info. if it's -----get_session_id--- # not already there, add it to the path info (more or less) with a redirect sub get_session_id { my (@result); &expire_old_sessions(); my ($id) = $q->path_info() =~ m:^/([a-h0-9]{$ID_LENGTH}):o; return @result if $id and @result = &check_id($id); # if we get here, there's not already an ID in the path info my $session_id = &generate_id(); die "Couldn't make a new session id" unless $session_id; print $q->redirect($q->script_name() . "/$session_id"); exit 0; }
expire_old_sessions()
.
(If there are many session records, expire_old_sessions()
will rapidly become a
performance drain on the script.) In high-volume applications, you will want to move session
expiration into a separate standalone process that runs at regular intervals under the Unix
cron
or NT at
utilities. For the hangman application, a nightly
expiration is more than sufficient.) Next it calls CGI.pm
's path_info()
function to return the additional path information and attempt to match it against the expected
session ID pattern. If a likely looking session ID is found, we call check_id()
to ensure that the session ID actually corresponds to a database record. Otherwise, we call
generate_id()
to create a new session ID. We append the ID to our URI (using
CGI.pm
's script_name()
function), incorporate it into a call to
redirect()
, and exit. In this case, the subroutine never returns to the
caller, but the redirected browser immediately generates a second call to the script,
this time with the session ID appended to the URI.
The expire_old_sessions()
is simple:
The subroutine consists of a single DBI call that sends an SQLsub expire_old_sessions { # --------------------------expire_old_sessions--- $DBH->do(<<END); DELETE FROM $DB_TABLE WHERE (unix_timestamp() - unix_timestamp(modified)) > $EXPIRE END }
DELETE
statement
to the database. The effect of the call is to delete all session records that are older than the
time limit set by the $EXPIRE
constant.
generate_id()
, which chooses new session IDs, is slightly more complex:
The reason for this complexity is that it is important to generate a unique session ID in such a way that valid session IDs cannot be trivially guessed. Otherwise it would be possible for a malicious person to hijack another user's session by misappropriating that user's session ID. This is not important in the hangman game, but becomes an issue in transactions in which things of value (money, merchandise, confidential information) are changing hands. A simple sequence of session IDs, such as choosing one higher than the previous highest, is too obvious. IDs generated from the# find a new unique ID and insert it into the database -------generate_id--- sub generate_id { # create a new session id my $tries = 0; my $id = &hash($SECRET . rand()); while ($tries++ < $MAX_TRIES) { last if $DBH->do("INSERT INTO $DB_TABLE (session_id) VALUES ('$id')"); $id = &hash($id); } return undef if $tries >= $MAX_TRIES; # we failed return $id; }
rand()
call
are unreliable as well, because once you know where you are in the series, you can
generate all the subsequent values.
Instead, we use a combination of rand()
and the MD5 message digest algorithm.
We begin by computing the MD5 hash of the value of rand()
concatenated with a
secret phrase. This extra concatenation step makes it impossible to derive the value of the
next session ID from the previous one. Instead of calling MD5 directly, we call a small
internal subroutine, hash()
, to compute the MD5 hash and then truncate it to
eight characters. This reduces the size of the session ID at the cost of making the ID somewhat
easier to guess (The size of the session ID determines the number of guesses a would-be hijacker
has to make before getting a correct one). There are about 4.3 billion eight-digit session IDs.
If you have 10,000 active sessions, this means that the hijacker has to guess (and try) 430,000
IDs before getting lucky. You'll probably notice this number of hits on your server long before
anything untoward happens. If you have 100,000 active sessions, however, only 43,000 guesses
are required, and you might want to use a longer session ID. In practice, it's almost always
easier for a hijacker to recover a session ID by some other method (such as packet sniffing)
than by guessing.) We then enter a loop in which we repeatedly attempt to insert the current
session ID into the database. If a record with that session ID does not already exist in the
database, the insertion statement returns a true result code and we immediately return the
ID. Otherwise, we generate a new trial ID by hashing the current ID concatenated with the
secret, and try again. We do this up to $MAX_TRIES
times, at which point we
give up. This allows us to fill up the space of possible session IDs to approximately
90 percent, or around 3 billion.
The check_id()
subroutine is called by get_session_id()
when the
browser provides a previous session ID. Its job is to check that the session ID still corresponds
to a database record. If not, it attempts to insert a record with that session ID into the database
and delivers a warning to the user that his game session may have expired.
# check to see that an old ID is valid --------------------------check_id--- sub check_id { my $id = shift; return ($id, '') if $DBH->do("SELECT 1 FROM $DB_TABLE WHERE session_id = '$id'") > 0; return ($id, 'The record of your game may have expired. Restarting.') if $DBH->do("INSERT INTO $DB_TABLE (session_id) VALUES ('$id')"); return (); }
The reason we try to reuse old session IDs is that the user may have bookmarked
the URI of the game, session ID and all. We honor the bookmark so that the user doesn't
have to discard it and enter a new one after his session has expired. check_id()
consists of two DBI calls. In the first, it makes an SQL SELECT
query looking
for a record matching the provided session ID. Since we're only interested in whether the
query succeeds or fails, we select a constant 1 instead of a named set of columns. If the
query fails, then the database does not already contain the session ID. We call DBI again
to insert the session ID into the database. If this fails (which it might in the unusual
case of another instance of this script picking the same session ID from within
generate_id()
) we return an empty list. Otherwise, we return the
ID and the warning message. Although the user has lost the record of his
old set of games, his bookmarked URI will still be valid and can now be used to
return to the new set.
The last new routine defined in this version of the game is hash()
,
which simply computes the MD5 digest of the value passed to it, then truncates it
to $ID_LENGTH
characters:
# generate a hash value ---------------------------------------------hash--- sub hash { my $value = shift; return substr(MD5->hexhash($value), 0, $ID_LENGTH); }
Finally, here's how the whole program looks like:
#!/usr/bin/perl use DBI; use MD5; use CGI; $WORDS = '/usr/share/lib/dict/words'; $TRIES = 6; $EXPIRE = 60 * 60 * 24 * 30; # allow 30 days before expiration $DB = "DBI:mysql:a348"; # data source name (database) $username = "a348"; # username $password = "a348AG"; # password $DBAUTH = "$username:$password"; $SECRET = " ***( something secret, whatever )*** "; $MAX_TRIES = 10; $DB_TABLE = "dgerman_hangman"; $ID_LENGTH = 8; # length of the session ID $q = new CGI; # Open the database -------------------------------------------------------- $DBH = DBI->connect($DB, $username, $password, {PrintError => 0}) || die "Couldn't open database: ", $DBI::errstr; # get the current session ID, or make one ---------------------------------- my ($session_id, $note) = &get_session_id(); # retrieve the state ------------------------------------------------------- my $state = &get_state($session_id) unless $q->param('clear'); # reinitialize if we need to ----------------------------------------------- if (!$state || $q->param('restart')) { $state = &initialize($state) } # process the current guess, if any ---------------------------------------- my ($message, $status) = &process_guess($q->param('guess') || '', $state); # save the modified state -------------------------------------------------- &save_state($state, $session_id); # start the page ----------------------------------------------------------- print $q->header, $q->start_html(-title => 'Database Sessions with URL Rewriting', -bgcolor => 'white'); print $q->p($q->font({-color=>'red'}, $note)) if $note; # draw the picture --------------------------------------------------------- &picture($state); # draw the statistics ------------------------------------------------------ &status($message, $state); # prompt the user to restart or for his next guess ------------------------- if ($status =~ /^(won|lost)$/) { # to restart &show_restart_form($state); } else { # for his/her next game &show_guess_form($state); } print $q->end_html; $DBH->disconnect; #------------(subroutines)-------------------------------------------------- # get the state from the database ------------------------------get_state--- sub get_state { my $id = shift; my $query = "SELECT * FROM $DB_TABLE WHERE session_id = '$id' AND WORD LIKE '%'"; my $sth = $DBH->prepare($query) || die "Prepare: ", $DBH->errstr; $sth->execute || die "Execute: ", $sth->errstr; my $state = $sth->fetchrow_hashref; $sth->finish; return $state; } # called to initialize a whole new state object ---------------initialize--- # or to create a new game sub initialize { my $state = shift; $state = {} unless $state; $state->{WORD} = &pick_random_word(); $state->{GUESSES_LEFT} = $TRIES; $state->{GUESSED} = ''; $state->{GAMENO} += 1; $state->{WON} += 0; $state->{TOTAL} += 0; return $state; } # called to process the user's guest -----------------------process_guess--- sub process_guess { my ($guess, $state) = @_; # lose immediately if user has no more guesses left return ('', 'lost') unless $state->{GUESSES_LEFT} > 0; # create hash containing the letters guessed thus far my %guessed = map { $_ => 1 } $state->{GUESSED} =~ /(.)/g; # create hash containing the letters in the original word my %letters = map { $_ => 1 } $state->{WORD} =~ /(.)/g; # return immediately if user has already guessed the word return ('', 'won') unless grep (!$guessed{$_}, keys %letters); # do nothing more (stop here) if no guess is provided return ('', 'continue') unless $guess; # this section processes individual letter guesses $guess = lc $guess; return ("Not a valid letter or word!", 'error') unless $guess =~ /^[a-z]+$/; return ("You already guessed that letter!", 'error') if ($guessed{$guess}); # this section is called when the user guesses the whole world if (length($guess) > 1 && $guess ne $state->{WORD}) { $state->{TOTAL} += $state->{GUESSES_LEFT}; return (qq{You lose. The word was "$state->{WORD}."}, 'lost'); } # update the list of guesses foreach ($guess =~ /(.)/g) { $guessed{$_}++; } $state->{GUESSED} = join('', sort keys %guessed); # correct guess -- word completely filled in unless (grep(!$guessed{$_}, keys %letters)) { $state->{WON}++; return (qq{You got it! The word was "$state->{WORD}."}, 'won'); } # incorrect guess if (! $letters{$guess}) { $state->{TOTAL}++; $state->{GUESSES_LEFT}--; # user runs out of turns return (qq{The jig is up. The word was "$state->{WORD}".}, 'lost') if $state->{GUESSES_LEFT} <= 0; return ('Wrong guess!', 'continue'); } # correct guess but word still incomplete return ('Good guess!', 'continue'); } # create the cute hangman picture --------------------------------picture--- sub picture { my $state = shift; my $tries_left = $state->{GUESSES_LEFT}; my $picture = sprintf("/h%d.gif", $TRIES - $tries_left); print $q->img( {-src=>$picture, -align=>'LEFT', -alt=>"[$tries_left tries_left]" } ); } # print the status ------------------------------------------------status--- sub status { my ($message, $state) = @_; print qq { <table width=100%> <tr> <td> <b> Word #: </b> $state->{GAMENO} ($state->{WORD}) </td> <td> <b> Guessed: </b> $state->{GUESSED} </td> </tr> <tr> <td> <b> Won: </b> $state->{WON} </td> <td> <b> Current average: </b> }, sprintf("%2.3f", $state->{TOTAL} / $state->{GAMENO}), qq{ </td> <td> <b> Overall average: </b> }, $state->{GAMENO} > 1 ? sprintf("%2.3f", ($state->{TOTAL} - ($TRIES - $state->{GUESSES_LEFT} ) ) / ($state->{GAMENO} - 1) ) : '0.000', qq{ </td> </tr> </table> }; my %guessed = (); my @guessed = $state->{GUESSED} =~ /(.)/g; foreach $letter (@guessed) { $guessed{$letter} = 1; } # instead of my %guessed = map { $_ => 1 } $state->{GUESSED} =~ s/(.)/g; print $q->h2("Word:", map { $guessed{$_} ? $_ : '_' } $state->{"WORD"} =~ /(.)/g ); print $q->h2($q->font({-color=>'red'}, $message)) if $message; } # ask the user if (s)he wants to start over ------------show_restart_form--- sub show_restart_form { my $state = shift; print $q->start_form(), "Do you want to play again?", $q->submit(-name=>'restart', -value=>'Another game'), qq{ }, $q->checkbox(-name=>'clear', -label=>'Clear scores'); delete $state->{"WORD"}; # &save_state($state); print $q->end_form; } # print the fill-out form for requesting input -----------show_guess_form--- sub show_guess_form { my $state = shift; print $q->start_form(), "Your guess: ", $q->textfield(-name=>'guess', -value=>'', -override=>1), $q->submit(value=>'Guess'); # &save_state($state); print $q->end_form; } # pick a word, any word ---------------------------------pick_random_word--- sub pick_random_word { open (AB, $WORDS); my @words = <AB>; close(AB); my $chosenWord = $words[int(rand($#words + 1))]; chop($chosenWord); return lc $chosenWord; } # save the state in the database ------------------------------save_state--- sub save_state { my ($state, $id) = @_; my $sth = $DBH->prepare(<<END) || die "prepare: ", $DBH->errstr; UPDATE $DB_TABLE SET WORD=?,GUESSED=?,GAMENO=?,WON=?,TOTAL=?,GUESSES_LEFT=? WHERE session_id='$id' END $sth->execute(@{$state}{qw(WORD GUESSED GAMENO WON TOTAL GUESSES_LEFT)}) || die "execute: ", $DBH->errstr; $sth->finish; } # retrieve the session ID from the path info. if it's -----get_session_id--- # not already there, add it to the path info (more or less) with a redirect sub get_session_id { my (@result); &expire_old_sessions(); my ($id) = $q->path_info() =~ m:^/([a-h0-9]{$ID_LENGTH}):o; return @result if $id and @result = &check_id($id); # if we get here, there's not already an ID in the path info my $session_id = &generate_id(); die "Couldn't make a new session id" unless $session_id; print $q->redirect($q->script_name() . "/$session_id"); exit 0; } # find a new unique ID and insert it into the database -------generate_id--- sub generate_id { # create a new session id my $tries = 0; my $id = &hash($SECRET . rand()); while ($tries++ < $MAX_TRIES) { last if $DBH->do("INSERT INTO $DB_TABLE (session_id) VALUES ('$id')"); $id = &hash($id); } return undef if $tries >= $MAX_TRIES; # we failed return $id; } # check to see that an old ID is valid --------------------------check_id--- sub check_id { my $id = shift; return ($id, '') if $DBH->do("SELECT 1 FROM $DB_TABLE WHERE session_id = '$id'") > 0; return ($id, 'The record of your game may have expired. Restarting.') if $DBH->do("INSERT INTO $DB_TABLE (session_id) VALUES ('$id')"); return (); } # generate a hash value ---------------------------------------------hash--- sub hash { my $value = shift; return substr(MD5->hexhash($value), 0, $ID_LENGTH); } sub expire_old_sessions { # --------------------------expire_old_sessions--- $DBH->do(<<END); DELETE FROM $DB_TABLE WHERE (unix_timestamp() - unix_timestamp(modified)) > $EXPIRE END }
You can play this version of the game here.
Please check the contents of
dgerman_hangman
as you play!
4.5 Using DBI to Its Full Advantage
We've reached the end of our excursion into server side state maintenance techniques and we should feel that we have covered (and learned) a lot. This is, as you would expect, only the beginning.
Once you keep session information stored in a database, there are all sorts of interesting things you can do with it. For example, you can easily compute statistics, such as the average number of games that users have played or how many guesses they have to make on average to arrive at the correct answer.
We could take advantage of this ability to create a "top winners" list, but we won't, at least not immediately. We feel that we've already covered enough. The book, though, has all the details and I strongly encourage you to read it, if you have the inclination, and I very much recommend and endorse it as the outstanding work that it is.