Fall Semester 2004


Lecture Notes Ten: Maintaining State

If you've ever written a complicated CGI script (let's say, at least no simpler than the second program in your second assignment), you know that the main inconvenience of the HTTP architecture is its stateless nature. Once an HTTP transaction is finished, the server forgets all about it. Even if the same remote user connects a few seconds later, from the server's point of view it's a completely new interaction and the script has to reconstruct the previous interaction's state. This makes even simple applications like shopping carts and multipage questionnaires a challenge to write.

CGI script developers have come up with a standard bag of tricks for overcoming this restriction. You can save state information inside the fields of fill-out forms, stuff it into the URI as additional path information, save it in a cookie, ferret it away in a server-side database, or rewrite the URI to include a session ID. In addition to these techniques, the Apache API allows you to maintain state by taking advantage of the persistence of the Apache process itself.

This chapter takes you on a tour of various techniques for maintaining state with the Apache API. In the process it also shows you how to hook your pages up to relational databases using the Perl DBI library. (We won't touch more than what we call the very basics, yet the presentation of those basics will be complete. For variations on these programs, including everything about Apache modules in Perl or C, you'll have to pick up Stein and MacEachern, which is a supremely outstanding book.)

1. Choosing the Right Technique

The main issue in preserving state information is where to store it. Six frequently used places are shown in the following list. They can be broadly broken down into client-side techniques (items 1 through 3) and server-side techniques (items 4 through 6).

  1. Store state in hidden fields
  2. Store state in cookies
  3. Store state in the URI
  4. Store state in web server process memory
  5. Store state in a file
  6. Store state in a database

In client-side techniques the bulk of the state information is saved on the browser's side of the connection. Client-side techniques include those that store information in HTTP cookies and those that put state information in the hidden fields of a fill-out form. In contrast, server-side techniques keep all the state information on the web server host. Server-side techniques include any method for tracking a user session with a session ID.

Each technique for maintaining state has unique advantages and disadvantages. You need to choose the one that best fits your application. The main advantage of the client-side techniques is that they require very little overhead for the web server: no data structures to maintain in memory, no database lookups, and no complex computations. The disadvantage is that client-side techniques require the cooperation of remote users and their browser software. If you store state information in the hidden fields of an HTML form, users are free to peek at the information (using the browser's "View Source" command) or even to try to trick your application by sending a modified version of the form back to you. If you use HTTP cookies to store state information, you have to worry about older browsers that don't support the HTTP cookie protocol and the large number of users (estimated at up to 20 percent) who disable cookies out of privacy concerns. If the amount of state information you want to store is large, you may also run into bandwidth problems when transmitting the information back and forth.

Server-side techniques solve some of the problems of client-side methods but introduce their own issues. Typically you'll create a "session object" somewhere on the web server system. This object contains all the state information associated with the user session. For example, if the user has completed several pages of a multipage questionnaire, the session will hold the current page number and the responses to previous pages' questions. If the amount of state information is small, and you don't need to hold onto it for an extended period of time, you can keep it in the web server's process memory. Otherwise, you'll have to stash it in some long-term storage, such as a file or a database. Because the information is maintained on the server's side of the connection, you don't have to worry about users peeking at it or modifying it inappropriately.

However, server-side techniques are more complex than client-side ones. First, because these techniques must manage the information from multiple sessions simultaneously, you must worry about such things as database and file locking. Otherwise, you face the possibility of leaving the session storage in an inconsistent state when two HTTP processes try to update it simultaneously. Second, you have to decide when to expire old sessions that are no longer needed. Finally, you need a way to associate a particular session object with a particular browser. Nothing about a browser is guaranteed to be unique: not its software version number, nor its IP address, nor its DNS name. The browser has to be coerced into identifying itself with a unique session ID, either with one of the client-side techniques or by requiring users to authenticate themselves with usernames and passwords.
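
The locking concern can be made concrete with a small sketch. It assumes, purely for illustration, that the session data is a single counter kept in a file; the bump_counter() name and the temporary file are hypothetical and are not part of the hangman script. Perl's built-in flock() keeps two server processes from interleaving their read-modify-write cycles.

```perl
use strict;
use warnings;
use Fcntl qw(:DEFAULT :flock :seek);
use File::Temp qw(tempfile);

# Hypothetical helper: increment a counter stored in $path under a lock.
sub bump_counter {
    my ($path) = @_;

    # Open for reading and writing, creating the file if necessary.
    sysopen(my $fh, $path, O_RDWR | O_CREAT) or die "Cannot open $path: $!";

    # Block until we hold an exclusive lock; other processes wait here.
    flock($fh, LOCK_EX) or die "Cannot lock $path: $!";

    my $count = <$fh>;
    $count = 0 unless defined $count;   # empty file: start from zero
    chomp $count;
    $count++;

    # Rewrite the file from the beginning while still holding the lock.
    seek($fh, 0, SEEK_SET) or die "Cannot rewind $path: $!";
    truncate($fh, 0)       or die "Cannot truncate $path: $!";
    print {$fh} "$count\n";

    close($fh) or die "Cannot close $path: $!";   # closing releases the lock
    return $count;
}

# Demonstration on a throwaway file.
my ($tmp_fh, $tmp_path) = tempfile(UNLINK => 1);
close($tmp_fh);
print bump_counter($tmp_path), "\n";   # 1
print bump_counter($tmp_path), "\n";   # 2
```

Without the flock() call, two processes could both read the same old value and one update would be lost, which is the inconsistent state described above.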

A last important consideration is the length of time you need to remember state. If you only need to save state across a single user session and don't mind losing the state information when the user quits the browser or leaves your site, then hidden fields and URI-based storage will work well. If you need state storage that will survive the remote user quitting the browser but don't mind if state is lost when you reboot the web server, then storing state in web server process memory is appropriate. However, for long-term storage, such as saving a user's preferences over a period of months, you'll need to use persistent cookies on the client side or store the state information in a file or database on the server side.

2. Maintaining State in Hidden Fields

We now introduce the main example used in this chapter, an online hangman game. When the user first accesses the program, it chooses a random word from a dictionary of words and displays a series of underscores for each of the word's letters. The game prompts the user to type in a single letter guess or, if (s)he thinks (s)he knows it, the whole word. Each time the user presses return (or the "Guess" button), the game adds the guess to the list of letters already guessed and updates the display. Each time the user makes the wrong guess, the program updates the image to show a little bit more of the stick figure, up to six wrong guesses total (graphics courtesy Andy Wardley).

When the game is over, the user is prompted to start a new game. A status area at the top of the screen keeps track of the number of words the user has tried, the number of games he's won, and the current and overall averages (number of letters guessed per session).

This hangman game is a classic case of a web application that needs to maintain state across an extended period of time. It has to keep track of several pieces of information, including the unknown word, the letters that the user has already guessed, the number of wins, and a running average of guesses. In this section, we implement the game using hidden fields to record the persistent information. In later sections, we'll reimplement it using other techniques to maintain state.

You can play the game here. The complete code is discussed below. Much of the code is devoted to the program logic of choosing a new word from a random list of words, processing the user's guesses, generating the HTML to display the status information, and creating the fill-out form that prompts the user for input. This is a long script, so we'll have to step through it in stages.

The script starts in the standard way:

#!/usr/bin/perl

use CGI; 
$q = new CGI; 
$WORDS = '/usr/share/lib/dict/words'; 
$TRIES = 6; 

# start the page, just to make sure 
print $q->header, 
      $q->start_html(-title => 'Hangman w/ Hidden Fields',
                     -bgcolor => 'white'); 
In order to compartmentalize the persistent information, we keep all the state information in a hash reference, called $state. This hash contains six keys:

WORD for the unknown word
GUESSED for the list of letters the user has already guessed,
GUESSES_LEFT for the number of tries that the user has left in this game
GAMENO for the number of games the user has played (the current one included)
WON for the number of games the user has won, and
TOTAL for the total number of incorrect guesses the user has made since the user has started playing.
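
As a quick illustration of this layout, here is a hash reference holding the six keys. The values are made up for the example, and the script itself builds the hash in its own subroutines rather than with a literal like this one.

```perl
use strict;
use warnings;

# Illustrative values only: a second game in progress.
my $state = {
    WORD         => 'tourists',   # the unknown word
    GUESSED      => 'eiotu',      # letters guessed so far, kept as one string
    GUESSES_LEFT => 5,            # tries remaining in this game
    GAMENO       => 2,            # games played, current one included
    WON          => 0,            # games won
    TOTAL        => 7,            # incorrect guesses since play began
};

# Fields are read and updated through the reference with the arrow operator.
$state->{GUESSES_LEFT}--;
print "Guesses left: $state->{GUESSES_LEFT}\n";   # Guesses left: 4
```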

We're now ready to start playing the game:

# retrieve the state 
$state = &getState(); 

# reinitialize if we need to 
if (!$state || $q->param('restart')) { $state = &initialize($state) }

# process the current guess, if any 
($message, $status) = &process_guess($q->param('guess') || '', $state); 

# draw the picture 
&picture($state); 

# draw the statistics 
&status($message, $state); 
We first attempt to retrieve the state information by calling the subroutine getState(). If this subroutine returns an undefined value or if the user presses the "restart" button, which appears when the game is over, we call the initialize() subroutine to pick a new unknown word and set the state variables to their defaults. Next we handle the user's guess, if any, by calling the subroutine process_guess(). This implements the game logic, updates the state information, and returns a two-item list consisting of a message to display to the user (something along the lines of "Good guess!") and a status code consisting of one of the words "won", "lost", "continue", or "error".

The main task is now to create the rest of the HTML page.

# draw the picture 
&picture($state); 

# draw the statistics 
&status($message, $state); 

# prompt the user to restart or for his next guess 
if ($status =~ /^(won|lost)$/) { # to restart 
  &show_restart_form($state); 
} else {                         # for his/her next guess 
  &show_guess_form($state);   
}

print $q->end_html; 
Using CGI.pm functions, we generate the HTTP header (at the top of the script, and that's already done by now) and the beginning of the HTML code. We then generate an <IMG> tag using the state information to select which "hanged man" picture to show and display the status bar. If the status code returned by process_guess() indicates that the user has completed the game, we display the fill-out form that prompts the user to start a new game. Otherwise, we generate the form that prompts the user for a new guess. Finally we end the HTML page and exit.

Let's look at the relevant subroutines now, starting with initialize().

# called to initialize a whole new state object or to create a new game 
sub initialize { 
  my $state = shift; 
  $state = {} unless $state; 
  $state->{WORD} = &pick_random_word(); 
  $state->{GUESSES_LEFT} = $TRIES; 
  $state->{GUESSED} = ''; 
  $state->{GAMENO} += 1; 
  $state->{WON}    += 0; 
  $state->{TOTAL}  += 0; 
  return $state; 
}
All the state maintenance is performed in the subroutines initialize(), save_state(), and getState().

initialize() creates a new empty state variable if one doesn't already exist, or resets just the per-game fields if one does. The per-game fields that always get reset are WORD, GUESSES_LEFT, and GUESSED. The first field is set to a new randomly chosen word, the second to the total number of tries that the user is allowed, and the third to an empty string. GAMENO, WON, and TOTAL need to persist across user games. GAMENO is bumped up by one each time initialize() is called. WON and TOTAL are set to zero only if they are not already defined. The (re)initialized state variable is then returned to the caller.
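
The following self-contained sketch exercises initialize() twice to show which fields survive a new game. The pick_word_stub() routine is a hypothetical stand-in for pick_random_word(), used only so the example runs without a dictionary file.

```perl
use strict;
use warnings;

my $TRIES = 6;

# Stand-in for pick_random_word(), so the sketch needs no dictionary file.
sub pick_word_stub { return 'example' }

sub initialize {
    my $state = shift;
    $state = {} unless $state;
    $state->{WORD}         = pick_word_stub();
    $state->{GUESSES_LEFT} = $TRIES;
    $state->{GUESSED}      = '';
    $state->{GAMENO} += 1;   # undef counts as 0, so the first game is 1
    $state->{WON}    += 0;   # defines the field without changing a real value
    $state->{TOTAL}  += 0;
    return $state;
}

my $state = initialize();       # brand-new player
$state->{WON}     = 1;          # pretend the first game was won
$state->{TOTAL}   = 3;          # ... after three wrong guesses
$state->{GUESSED} = 'aelmpx';
$state = initialize($state);    # start the second game

print "game $state->{GAMENO}, won $state->{WON}, total $state->{TOTAL}\n";
# prints "game 2, won 1, total 3"
```

Adding zero with += is what lets the same routine serve both cases: it turns an undefined field into 0 and leaves an existing count untouched.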

# save the current state 
sub save_state {
  my $state = shift; 
  foreach $key ("WORD", "GAMENO", "GUESSES_LEFT", "WON", "TOTAL", "GUESSED") {
    print $q->hidden(-name=>$key, 
                     -value=>$state->{$key}, 
                     -override=>1); 
  }
}
The save_state() routine is where we store the state information.

Because it stashes the information in hidden fields, this subroutine must be called within a <FORM> section. Using CGI.pm's hidden() HTML shortcut, we produce a series of hidden tags whose names correspond to each of the fields in the state hash. For the variables WORD, GAMENO, GUESSES_LEFT, and so on, we just call hidden with the name and current value of the variable.

The output of this subroutine looks something like the following HTML:

<INPUT TYPE="hidden" NAME="WORD"         VALUE="tourists">
<INPUT TYPE="hidden" NAME="GAMENO"       VALUE="2">
<INPUT TYPE="hidden" NAME="GUESSES_LEFT" VALUE="5">
<INPUT TYPE="hidden" NAME="WON"          VALUE="0">
<INPUT TYPE="hidden" NAME="TOTAL"        VALUE="7">
<INPUT TYPE="hidden" NAME="GUESSED"      VALUE="eiotu">
getState() reverses this process, reconstructing the hash of state information from the hidden form fields. This subroutine loops through each of the state fields, calls param() to retrieve its value from the submitted form data, and assigns the value to the appropriate field of the state variable.

# called to retrieve an existing state 
sub getState { 
  return undef unless $q->param(); 
  my $state = {}; 
  foreach $key ("WORD", "GAMENO", "GUESSES_LEFT", "WON", "TOTAL", "GUESSED") {
    $state->{$key} = $q->param($key); 
  }  
  return $state; 
}
The rest of the script is equally straightforward.

The process_guess() subroutine (too long to be reproduced here; see the full program code below) first returns a loss immediately if the user has no guesses left. It then maps the unknown word and the previously guessed letters into hashes for easier comparison later, and checks whether the user has already won the game but has not moved on to a new game (which can happen if the user reloads the page).

The subroutine now begins to process the guess. It does some error checking on the user's guess to make sure that it is a valid series of lowercase letters and that the user hasn't already guessed it. The routine then checks to see whether the user has guessed a whole word or a single letter. In the former case, the program fails the user immediately if the guess isn't an identical match to the unknown word. Otherwise, the program adds the guess to the list of guessed letters and checks to see whether the word has been entirely filled in. If so, the user wins. If the user has guessed incorrectly, we decrement the number of turns left. If the user is out of turns, (s)he loses. Otherwise, we continue.
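
The "map into hashes" step and the check for a completed word can be tried in isolation. This is a minimal sketch; letter_set() and word_complete() are hypothetical helper names introduced here, while the script itself performs the same steps inline in process_guess().

```perl
use strict;
use warnings;

# Turn a string into a set of its letters: keys are letters, values are 1.
sub letter_set {
    my ($string) = @_;
    return map { $_ => 1 } $string =~ /(.)/g;
}

# True when every letter of $word appears among the guessed letters.
sub word_complete {
    my ($word, $guessed_letters) = @_;
    my %letters = letter_set($word);
    my %guessed = letter_set($guessed_letters);
    my $missing = grep { !$guessed{$_} } keys %letters;
    return $missing == 0;
}

print word_complete('tourists', 'eiotu')  ? "won\n" : "keep going\n";  # keep going
print word_complete('tourists', 'iorstu') ? "won\n" : "keep going\n";  # won
```

Because the letters are hash keys, repeated letters such as the two t's in "tourists" collapse into one entry, and testing a guess is a single hash lookup.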

The picture() routine generates an <IMG> tag pointing to an appropriate picture. The pictures are static files named h0.gif, h1.gif, and so on, and this routine generates the right filename by subtracting the number of turns (s)he has left from the total number of tries the user is allowed. With $TRIES set to 6, that index runs from 0 (no wrong guesses yet) to 6 (out of turns).
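
The filename arithmetic is small enough to check by hand. In this sketch picture_name() is a hypothetical helper that isolates the sprintf() call used by picture().

```perl
use strict;
use warnings;

my $TRIES = 6;

# Map the number of guesses left to the image that should be shown.
sub picture_name {
    my ($guesses_left) = @_;
    return sprintf("/h%d.gif", $TRIES - $guesses_left);
}

print picture_name(6), "\n";   # /h0.gif  (no wrong guesses yet)
print picture_name(4), "\n";   # /h2.gif  (two wrong guesses)
```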

The status() subroutine is responsible for printing out the game statistics and the word itself. The most interesting part of the routine is toward the end, where it uses map() to replace the not-yet-guessed letters of the unknown word with underscores.
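
Here is that map() idiom on its own, as a minimal sketch. The masked_word() name is hypothetical, and it joins the letters with spaces, whereas the script hands the list to CGI.pm's h2() instead.

```perl
use strict;
use warnings;

# Show guessed letters and hide the rest behind underscores.
sub masked_word {
    my ($word, $guessed_letters) = @_;
    my %guessed = map { $_ => 1 } $guessed_letters =~ /(.)/g;
    return join ' ', map { $guessed{$_} ? $_ : '_' } $word =~ /(.)/g;
}

print masked_word('tourists', 'eiotu'), "\n";   # t o u _ i _ t _
```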

pick_random_word() is the routine that chooses a random word from a file of words. Many Unix systems happen to have a convenient list of about 38,000 words located in a file somewhere (our system has it in /usr/share/lib/dict/words). Each word appears on a separate line. We choose the new word in a simple-minded way, by reading the whole file in as a list and then randomly selecting a word, as in helloFive. (We could and should use a better algorithm, but it takes more explaining, so we will stick with the simple-minded one for now.)
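
For the curious, one well-known alternative is single-pass selection (often called reservoir sampling), which never holds more than one candidate line in memory. Whether this is the algorithm alluded to above is an assumption on our part, and pick_random_line() is a hypothetical name. The idea is to replace the current candidate with the n-th line with probability 1/n, which leaves every line equally likely to be the final choice.

```perl
use strict;
use warnings;

# Pick one line uniformly at random from an open filehandle in one pass.
sub pick_random_line {
    my ($fh) = @_;
    my ($chosen, $seen) = (undef, 0);
    while (my $line = <$fh>) {
        $seen++;
        # Replace the current candidate with probability 1/$seen.
        $chosen = $line if rand($seen) < 1;
    }
    return unless defined $chosen;   # empty input: nothing to pick
    chomp $chosen;
    return lc $chosen;
}

# Demonstration on an in-memory "file" instead of the real dictionary.
my $sample = "Apple\nbanana\nCherry\n";
open(my $demo_fh, '<', \$sample) or die "Cannot open in-memory file: $!";
print pick_random_line($demo_fh), "\n";
close($demo_fh);
```

The first line is always taken (rand(1) is below 1), the second replaces it half the time, the third a third of the time, and so on.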

Because the state information is saved in the document body, the save_state() function has to be called from the part of the code that generates the fill-out forms. The two places where this happens are the routines show_guess_form() and show_restart_form().

# print the fill-out form for requesting input 
sub show_guess_form { 
  my $state = shift; 
  print $q->start_form(), 
        "Your guess: ", 
        $q->textfield(-name=>'guess', 
                      -value=>'', 
                      -override=>1), 
        $q->submit(-value=>'Guess');
  &save_state($state); 
  print $q->end_form; 
}
show_guess_form() produces the fill-out form that prompts the user for his guess. It calls save_state() after opening a <FORM> section and before closing it.

# ask the user if (s)he wants to start over 
sub show_restart_form { 
  my $state = shift;
  print $q->start_form(), 
        "Do you want to play again?",
        $q->submit(-name=>'restart', 
                   -value=>'Another game'); 
  delete $state->{"WORD"}; 
  &save_state($state); 
  print $q->end_form; 
} 

show_restart_form() is called after the user has either won or lost a game. It creates a single button that prompts the user to restart. Because the game statistics have to be saved across games, we call save_state() here too. The only difference from show_guess_form() is that we explicitly delete the WORD field from the state variable. This signals the script to generate a new unknown word on its next invocation.

Here, now, is the complete source code of this version of the program.

#!/usr/bin/perl

# http://burrowww.cs.indiana.edu:9760/cgi-bin/stein/hidden

use CGI; 
$q = new CGI; 
$WORDS = '/usr/share/lib/dict/words'; 
$TRIES = 6; 

# start the page, just to make sure 
print $q->header, 
      $q->start_html(-title => 'Hangman Hidden Fields',
                     -bgcolor => 'white'); 

# retrieve the state 
$state = &getState(); 

# reinitialize if we need to 
if (!$state || $q->param('restart')) { $state = &initialize($state) }

# process the current guess, if any 
($message, $status) = &process_guess($q->param('guess') || '', $state); 

# draw the picture 
&picture($state); 

# draw the statistics 
&status($message, $state); 

# prompt the user to restart or for his next guess 
if ($status =~ /^(won|lost)$/) { # to restart 
  &show_restart_form($state); 
} else {                         # for his/her next guess 
  &show_guess_form($state);   
}

print $q->end_html; 

#------------(subroutines)--------------

# called to retrieve an existing state 
sub getState { 
  return undef unless $q->param(); 
  my $state = {}; 
  foreach $key ("WORD", "GAMENO", "GUESSES_LEFT", "WON", "TOTAL", "GUESSED") {
    $state->{$key} = $q->param($key); 
  }  
  return $state; 
}

# called to initialize a whole new state object or to create a new game 
sub initialize { 
  my $state = shift; 
  $state = {} unless $state; 
  $state->{WORD} = &pick_random_word(); 
  $state->{GUESSES_LEFT} = $TRIES; 
  $state->{GUESSED} = ''; 
  $state->{GAMENO} += 1; 
  $state->{WON}    += 0; 
  $state->{TOTAL}  += 0; 
  return $state; 
}

# called to process the user's guess 
sub process_guess { 
  my ($guess, $state) = @_; 

  # lose immediately if user has no more guesses left 
  return ('', 'lost') unless $state->{"GUESSES_LEFT"} > 0; 

  # create hash containing the letters guessed thus far
  my %guessed = map { $_ => 1 } $state->{"GUESSED"} =~ /(.)/g; 
  # create hash containing the letters in the original word 
  my %letters = map { $_ => 1 } $state->{"WORD"} =~ /(.)/g; 

  # return immediately if user has already guessed the word 
  return ('', 'won') unless grep (!$guessed{$_}, keys %letters); 

  # do nothing more (stop here) if no guess is provided 
  return ('', 'continue') unless $guess; 

  # this section processes individual letter guesses 
  $guess = lc $guess; 
  return ("Not a valid letter or word!", 'error') unless $guess =~ /^[a-z]+$/; 
  return ("You already guessed that letter!", 'error') if ($guessed{$guess}); 

  # this section is called when the user guesses the whole word 
  if (length($guess) > 1 && $guess ne $state->{WORD}) {
    $state->{TOTAL} += $state->{GUESSES_LEFT}; 
    return (qq{You lose. The word was "$state->{WORD}."}, 'lost');  
  }

  # update the list of guesses 
  foreach ($guess =~ /(.)/g) { $guessed{$_}++; }
  $state->{GUESSED} = join('', sort keys %guessed); 

  # correct guess -- word completely filled in
  unless (grep(!$guessed{$_}, keys %letters)) {
    $state->{WON}++; 
    return (qq{You got it! The word was "$state->{WORD}."}, 'won'); 
  }

  # incorrect guess
  if (! $letters{$guess}) { 
    $state->{TOTAL}++; 
    $state->{GUESSES_LEFT}--; 

    # user runs out of turns 
    return (qq{The jig is up. The word was "$state->{WORD}".}, 'lost') 
      if $state->{GUESSES_LEFT} <= 0; 

    return ('Wrong guess!', 'continue'); 
  } 
  # correct guess but word still incomplete 
  return ('Good guess!', 'continue'); 
}

# create the cute hangman picture 
sub picture { 
  my $state = shift; 
  my $tries_left = $state->{GUESSES_LEFT}; 
  my $picture = sprintf("/h%d.gif", $TRIES - $tries_left); 

  print $q->img( {-src=>$picture, 
                  -align=>'LEFT',
                  -alt=>"[$tries_left tries left]"
                 } 
               ); 
}

# print the status 
sub status { 
  my ($message, $state) = @_; 
  print qq {
    <table width=100%> <tr> 
           <td> <b> Word #: </b> $state->{GAMENO} ($state->{WORD}) </td>  
           <td> <b> Guessed: </b> $state->{GUESSED} </td> 
      </tr> <tr> 
           <td> <b> Won: </b>  $state->{WON} </td> 
           <td> <b> Current average: </b> },
 
                    sprintf("%2.3f", $state->{TOTAL} / $state->{GAMENO}), 

       qq{ </td> <td> <b> Overall average: </b> }, 

            $state->{GAMENO} > 1 ? sprintf("%2.3f", 
                                             ($state->{TOTAL} - 
                                                ($TRIES - 
                                                 $state->{GUESSES_LEFT}
                                                )
                                             ) / ($state->{GAMENO} - 1)
                                          ) 
                                   : '0.000', 

       qq{ </td> 
      </tr> 
    </table> 
  }; 

  my %guessed = (); 
  my @guessed = $state->{GUESSED} =~ /(.)/g; 
  foreach $letter (@guessed) { 
    $guessed{$letter} = 1; 
  } # instead of my %guessed = map { $_ => 1 } $state->{GUESSED} =~ /(.)/g; 

  print $q->h2("Word:", 
               map { $guessed{$_} ? $_ : '_' } 
                   $state->{"WORD"} =~ /(.)/g
  );
  
  print $q->h2($q->font({-color=>'red'}, 
               $message)) 
    if $message; 

}

# ask the user if (s)he wants to start over 
sub show_restart_form { 
  my $state = shift;
  print $q->start_form(), 
        "Do you want to play again?",
        $q->submit(-name=>'restart', 
                   -value=>'Another game'); 
  delete $state->{"WORD"}; 
  &save_state($state); 
  print $q->end_form; 
} 

# print the fill-out form for requesting input 
sub show_guess_form { 
  my $state = shift; 
  print $q->start_form(), 
        "Your guess: ", 
        $q->textfield(-name=>'guess', 
                      -value=>'', 
                      -override=>1), 
        $q->submit(-value=>'Guess');
  &save_state($state); 
  print $q->end_form; 
}

# pick a word, any word 
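sub pick_random_word { 
  # the HTTP header is already out, so a failure here ends up in the server log 
  open (my $fh, '<', $WORDS) or die "Cannot open $WORDS: $!"; 
  my @words = <$fh>; 
  close($fh); 
  my $chosenWord = $words[int(rand(@words))];   # @words in numeric context is the word count 
  chomp($chosenWord);                           # strip the newline only (chop removes any last character) 
  return lc $chosenWord; 
} 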

# save the current state 
sub save_state {
  my $state = shift; 
  foreach $key ("WORD", "GAMENO", "GUESSES_LEFT", "WON", "TOTAL", "GUESSED") {
    print $q->hidden(-name=>$key, 
                     -value=>$state->{$key}, 
                     -override=>1); 
  }
}
Although this method of maintaining the hangman game's state works great, it has certain obvious limitations. The most severe of these is that it's easy for the user to cheat. All (s)he has to do is to choose the "View Source" command from his browser's menu bar and there's the secret word in full view, along with all other state information. The user can then use this knowledge of the word to immediately win the game, or (s)he can save the form to disk, change the values of the fields that keep track of the wins and losses, and resubmit the doctored form in order to artificially inflate the statistics.

These considerations are not too important for the hangman game, but they become real issues in applications where money is at stake. Even with the hangman game we might worry about the user tampering with the state information if we were contemplating turning the game into an Internet tournament. Techniques for preventing user tampering are discussed later in this chapter.
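
As a preview of what tamper detection can look like, here is a minimal sketch of one common approach, which is not necessarily the one developed later in the chapter. The server computes a message authentication code (MAC) over the state fields with a secret key and sends it along in one more hidden field; when the form comes back, it recomputes the MAC and compares. The secret value and the state_mac() and state_is_authentic() names are hypothetical; hmac_sha256_hex() comes from the Digest::SHA module bundled with Perl.

```perl
use strict;
use warnings;
use Digest::SHA qw(hmac_sha256_hex);

# Hypothetical server-side secret; it must never be sent to the browser.
my $SECRET = 'change this to a long random string';
my @FIELDS = qw(WORD GAMENO GUESSES_LEFT WON TOTAL GUESSED);

# Compute a MAC over the state fields, always in the same order.
sub state_mac {
    my ($state) = @_;
    my $data = join "\0",
               map { defined $state->{$_} ? $state->{$_} : '' } @FIELDS;
    return hmac_sha256_hex($data, $SECRET);
}

# True if the MAC that came back with the form matches the fields.
sub state_is_authentic {
    my ($state, $mac) = @_;
    return defined $mac && $mac eq state_mac($state);
}

my $state = { WORD => 'tourists', GAMENO => 2, GUESSES_LEFT => 5,
              WON  => 0,          TOTAL  => 7, GUESSED      => 'eiotu' };
my $mac = state_mac($state);    # would travel in one more hidden field

$state->{WON} = 99;             # simulate a doctored form
print state_is_authentic($state, $mac) ? "state ok\n" : "state was modified\n";
# prints "state was modified"
```

A MAC of this kind only detects modification. The secret word would still be visible through "View Source", so hiding it calls for a different measure, such as keeping the word on the server.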


Last updated: Oct 13, 2004 by Adrian German for A348/A548