Lecture notes for Monday, 6/30

CGI "States"

Today's homework assignment seems at first to be easier than Thursday's. Last time, you wrote a Python script that displayed the picture and name of a random student. This time, you have to write a script that displays the students sequentially, along with buttons to move forward and backward through the list. The previous assignment is actually the easier one, because every time we reload the page, we don't need to carry any information with us. This time, we need to remember which student we were on before so that we can load the next (or previous) student. Effectively, we need to find a way to use CGI to "store" some sort of state.

Unfortunately, the only thing CGI really gives us to work with is the series of environment variables and their values, as encoded into the URL (that's everything after the question mark). Every time a CGI script is loaded, it's a brand new instance, and it doesn't have any idea whether it's been run before or what was calculated or stored the last time it was run. All it has access to is the variables and their values from the environment. This means that if we want our script to know anything about the state of the page last time it was run, it has to be communicated by putting something into that environment.
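
To make this a bit more concrete: the web server hands the script everything after the question mark in a single environment variable called QUERY_STRING, and the cgi module splits that string into the name/value pairs that these notes call environment variables. Here is a minimal sketch of that step. It is written for Python 3 and uses urllib.parse in place of cgi.FieldStorage so that it can run without a web server; the sample query string is made up.

```python
import os
from urllib.parse import parse_qs

# Pretend the web server set this for us; under real CGI the server does it.
os.environ["QUERY_STRING"] = "action=Click%21&clicks=3"

# parse_qs maps each name to a list of its (already decoded) values.
params = parse_qs(os.environ["QUERY_STRING"])

print(params["action"])  # ['Click!']
print(params["clicks"])  # ['3']
```

Notice that nothing in params tells the script anything about earlier runs. Whatever it needs to remember has to arrive in that string.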

So far, the only way we've put data into the environment has been by using HTML forms and getting that information from the user via an <input> tag. In this case, we'll still be using an <input> tag, but we don't really want any input from the user. In fact, we don't even want the user to see it. We just want to set a name/value pair to be quietly included in the environment. To do this, we use the "hidden" <input> type.

Button Clicker: the Game

Okay, enough theory. Let's look at an actual example. We created a web page with a single button that displays how many times the user has clicked the button. The "state" that needs to be remembered here is how many times the button had previously been pressed.

So we went to our apache/cgi-bin/ folder and created the file buttonclicker.py. Much of the code is similar to what we'd done before, so we started with the following sort of skeleton script.


#!/usr/bin/python

import cgi
import cgitb
cgitb.enable()

form = cgi.FieldStorage()

print "Content-Type: text/html\n"

#Here, we'll set the value of a variable named "clicks", storing the
#number of times the button has been clicked.

print '''
<html>
<head>
 <title>Button Clicker: the Game</title>
</head>

<body>
<h1>Button Clicker: the Game</h1>
'''

#Here, we'll display how many times the button has already been pressed.

print '''
<form>

<input type="submit" name="action" value="Click!">
'''

#Here, we'll put in the "hidden" variable that sends the number of 
#clicks to the script the next time it's run.

print '''
</form>
</body>
</html>
'''

Some common mistakes

Most of this we've seen before, but let me point out a couple places where I've seen mistakes.

In that first line, don't forget the slash at the beginning of the path. It needs to be #!/usr/bin/python, not #!usr/bin/python. Very subtle, but a bug like that can take forever to find, especially if you've got a lot of code written already. Bugs like these are why it's often a good idea to start by just writing a skeleton of the code that doesn't really do anything yet, just to make sure it works and you haven't made any little typos.

Be careful with your strings. On a single line, anything between a pair of quotes ("double" or 'single') is a string. If the line ends before you've closed off the string, you'll get an error. The only exception is for strings using three quotes ('''single''' or """double"""). There, you can have line breaks in the middle of the string without using \n. Remember that when you're printing a string in a CGI script, you're not printing directly to the screen; you're creating HTML to be sent to the browser, and the browser will handle printing to the screen. So any HTML tags (like <p> or <br> or <form> or <input>) can only appear inside of a string.

Okay, so we saved the file, made sure it was executable (chmod u+x buttonclicker.py), and loaded it up in the browser. We see the title and the button. Clicking the button reloads the page, adding the environment string action=Click%21 to the URL. Our script doesn't use the environment variables yet, so nothing different happens when we click the button.

Note that the exclamation point in "Click!" has been converted to the escape code "%21". Just about everything except alphanumeric characters is converted to a % escape code before being put into the URL. (Spaces are converted to plus signs (+).) The "cgi" Python module that we load converts everything back into ordinary strings, so we don't have to deal with that when writing our script, but you should be aware that this is going on when you see it in the URL.
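
If you want to see the escaping for yourself, here is a small sketch using Python 3's urllib.parse. It only approximates what a browser does when it submits a form (browsers leave a slightly different set of punctuation unescaped), but it agrees on the cases mentioned above.

```python
from urllib.parse import quote_plus, unquote_plus, urlencode

# Encoding: roughly what the browser does before putting values in the URL.
print(quote_plus("Click!"))             # Click%21
print(quote_plus("two words"))          # two+words
print(urlencode({"action": "Click!"}))  # action=Click%21

# Decoding: roughly what the cgi module does for us on the way back in.
print(unquote_plus("Click%21"))         # Click!
print(unquote_plus("two+words"))        # two words
```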

It's also worth noting that the "submit" type is a little unusual from an HTML standpoint. Usually with HTML, especially HTML5, what's displayed is completely separated from what's going on behind the scenes. For example, when you create a radio button or checkbox, you assign it a "value", but that value doesn't appear anywhere on the page. It is just the value that will be assigned to the environment variable if the button/box is checked. You have to explicitly write the text you want the user to see next to the button. But with the "submit" type, the "value" is both the value that's assigned to the environment variable and the text that's printed on the button.

"hidden" <input>s

So now it was time to implement the hidden input which will send along the value of the clicks variable to the script after the button has been clicked. So we updated the script as follows.


#!/usr/bin/python

import cgi
import cgitb
cgitb.enable()

form = cgi.FieldStorage()

print "Content-Type: text/html\n"

#Here, we set the value of a variable named "clicks", storing the
#number of times the button has been clicked.
if "clicks" in form:
        clicks = int(form.getvalue("clicks")) + 1
else:
        clicks = 0

print '''
<html>
<head>
 <title>Button Clicker: the Game</title>
</head>

<body>
<h1>Button Clicker: the Game</h1>
'''

#Here, we'll display how many times the button has already been pressed.

print '''
<form>

<input type="submit" name="action" value="Click!">
'''

#Here is the "hidden" variable that sends the number of 
#clicks to the script the next time it's run.
print '<input type="hidden" name="clicks" value="' + str(clicks) + '">'

print '''
</form>
</body>
</html>
'''

Let's look at the second new addition:

print '<input type="hidden" name="clicks" value="' + str(clicks) + '">'

This line generates a bit of HTML that will ensure that there's an environment variable named clicks that will store (as a string) whatever value the Python variable named clicks currently has.
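
Gluing the value into the tag with + works fine for a number. If you ever send a string this way, quotes or angle brackets inside the value could break the tag. Here is one possible way to guard against that, sketched in Python 3: hidden_input is a made-up helper name, not something from the cgi module, and html.escape is the standard library call doing the real work.

```python
from html import escape

def hidden_input(name, value):
    """Return the HTML for a hidden <input> carrying one name/value pair."""
    # escape(..., quote=True) keeps quotes and angle brackets in the value
    # from ending the attribute early.
    return '<input type="hidden" name="%s" value="%s">' % (
        escape(name, quote=True), escape(str(value), quote=True))

print(hidden_input("clicks", 3))
# <input type="hidden" name="clicks" value="3">
```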

Now let's look at the first new addition:


if "clicks" in form:
        clicks = int(form.getvalue("clicks")) + 1
else:
        clicks = 0

Whenever you're writing a CGI script, keep in mind that the same script is loaded every single time, and the environment variables are the only way the script knows the difference between (for example) whether you just loaded the page for the first time or whether you reloaded the page by clicking the button. So we have two cases: one where the button has just been pressed (and so the environment contains a variable named clicks), and one where the page was loaded by itself (and there is no such variable). If there is an environment variable clicks, we convert that to a number, add one to it, and store it in the Python variable (also named clicks). Otherwise, we set the Python variable clicks to zero.

Environment Variables vs. (Python) Script Variables

When we saved this script and opened it in the browser, it looked fine at first. There was a button, and when we clicked it, we could see that there was a new environment variable in the URL named clicks, with value 0. But wait, shouldn't that have been 1, since we had clicked the button once already? When we clicked again, the value went up to 1, and it continued to go up by one for every click. Did that mean we did something wrong?

Not necessarily... See, that variable clicks up in the environment represents the value that was sent to the script. It's an environment variable. It's not the same as the Python variable clicks, which is the one that we added one to at the very beginning of the script. The difference would be easier to see if we had a line that actually displayed the value of the Python variable clicks. So we did that. We added the following, right before we print the form:


#Here, we display how many times the button has already been pressed.
if clicks<=0:
        print '<p>Click to begin!</p>'
else:
        print '<p>You have clicked ' + str(clicks) + ' times!</p>'

Actually, this does a little bit more than that. If we haven't clicked at all yet, then it displays a "Click to begin!" message. Otherwise, it displays the Python variable clicks. And it works just as we wanted it to.

The moral of this story is that you shouldn't confuse environment variables (which are passed to the script and never changed by it) with script variables, which can be changed normally. If this is the sort of thing that gives you a headache, it might be a good idea to give your script variables different names from your environment variables. That makes them harder to confuse.
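
To watch the two values drift apart by one, here is a small simulation sketched in Python 3. It is not part of buttonclicker.py: run_script is a hypothetical stand-in for one page load, taking the value that appears in the URL and returning the value of the script variable.

```python
def run_script(sent_clicks):
    """Mimic one page load. sent_clicks is the value in the URL, or None."""
    if sent_clicks is not None:
        clicks = int(sent_clicks) + 1   # script variable: one more than what was sent
    else:
        clicks = 0                      # first load: nothing was sent
    return clicks

sent = None                  # first load: no clicks in the URL
history = []
for _ in range(4):
    shown = run_script(sent)
    history.append((sent, shown))
    sent = str(shown)        # the hidden input carries the displayed value forward

print(history)
# [(None, 0), ('0', 1), ('1', 2), ('2', 3)]
```

In each pair, the first item is what you would see in the URL and the second is what the page displays. The URL is always one behind because it shows what was sent, not what was computed.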

Cleaning Things Up a Little Bit

We'll go into this a little bit more in lab on Tuesday, but we made one little change to the code to take care of a problem that can arise when a user directly edits the environment string in the URL (which is actually really easy to do, sometimes even accidentally). We altered the first conditional to this:


if "clicks" in form and int(form.getvalue("clicks"))>=0:
        clicks = int(form.getvalue("clicks")) + 1
else:
        clicks = 0

This requires that the value of clicks be non-negative before incrementing it. If it's not, then the script sets the value to zero. This takes into account the situation where someone alters the environment string to make clicks a negative number. Of course, this isn't the only problem that could arise. We didn't take into account many other potential issues. In particular, we didn't take into account what happens when someone sets clicks to a value that isn't a number and we didn't take into account what happens if someone sets the value of clicks more than once in the environment. True defensive programming should take into account all of these possibilities and at least refrain from crashing.
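
Here is one possible way to cover those remaining cases, sketched in Python 3. The name next_clicks is made up, and the policy of resetting to zero on any bad input is just one reasonable choice, not the only one. It takes the list of all values sent for clicks, so it can notice when there are none or more than one.

```python
def next_clicks(values):
    """values: list of strings sent for "clicks" (possibly empty or repeated)."""
    if len(values) != 1:
        return 0                # missing, or sent more than once
    try:
        sent = int(values[0])
    except ValueError:
        return 0                # not a whole number, e.g. "abc" or "1.5"
    if sent < 0:
        return 0                # negative counts make no sense
    return sent + 1

print(next_clicks([]))          # 0
print(next_clicks(["4"]))       # 5
print(next_clicks(["-2"]))      # 0
print(next_clicks(["abc"]))     # 0
print(next_clicks(["1", "2"]))  # 0
```

In the real script, the list would come from form.getlist("clicks"), which is discussed in the next section.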

Multiple Values for the Same Environment Variable

I talked about this in the lecture notes from Thursday, but we didn't talk about it in class, so we added a few checkboxes to our code to get some practice with using them. In particular, I polled the class for a few of their favorite words, which can be selected by the user before clicking the button. Here is the final version of Button Clicker:


#!/usr/bin/python

import cgi
import cgitb
cgitb.enable()

form = cgi.FieldStorage()

print "Content-Type: text/html\n"

#Here, we set the value of a variable named "clicks", storing the
#number of times the button has been clicked.
if "clicks" in form and int(form.getvalue("clicks"))>=0:
        clicks = int(form.getvalue("clicks")) + 1
else:
        clicks = 0

words = form.getlist("words")
wordstring = ""
for word in words:
        wordstring = wordstring + ', ' + word

print '''
<html>
<head>
 <title>Button Clicker: the Game</title>
</head>

<body>
<h1>Button Clicker: the Game</h1>
'''

#Here, we display how many times the button has already been pressed.
if clicks<=0:
        print '<p>Click to begin!</p>'
else:
        print '<p>You have clicked ' + str(clicks) + ' times!</p>'
        print '<p>Your favorite word(s) is/are:' + wordstring + '</p>'

print '''
<form>

<p>Select your favorite word(s):</p>
<input type="checkbox" name="words" value="beautiful">beautiful<br>
<input type="checkbox" name="words" value="ugly">ugly<br>
<input type="checkbox" name="words" value="peculiarly">peculiarly<br>
<input type="checkbox" name="words" value="funny">funny<br>

<input type="submit" name="action" value="Click!">
'''

#Here is the "hidden" variable that sends the number of 
#clicks to the script the next time it's run.
print '<input type="hidden" name="clicks" value="' + str(clicks) + '">'

print '''
</form>
</body>
</html>
'''

Now if the user selects more than one word (or none at all!), then we can't rely on form.getvalue("words") or form["words"].value, because those presume that there's exactly one value: form.getvalue("words") returns a list instead of a string when several boxes are checked and None when none are, and form["words"] raises a KeyError when no box is checked. But we can use form.getlist("words"), which always returns a list of values. This is a very versatile method because it works when there's more than one value, when there's exactly one value (it just returns a singleton list), and when there are no values at all (it returns an empty list).

I also briefly mentioned that you can use form.getfirst("words") to retrieve the first item in the list. This is sometimes a good substitute for something like form.getvalue("words") because it returns the only value if there's only one and it returns the first value if there is more than one. In either case, it returns a single value. The only case it doesn't really cover is when there are no values at all: then it returns None (or a default that you pass as a second argument), which you have to check for yourself.
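
The behavior of the two methods can be tried out without a web server. The sketch below is written for Python 3 and does not use the cgi module at all: getlist and getfirst here are stand-in functions built on urllib.parse that imitate what the FieldStorage methods of the same names return, and they take the query string as an extra argument, which the real methods do not.

```python
from urllib.parse import parse_qs

def getlist(query, name):
    """Stand-in for form.getlist(name): always returns a list."""
    return parse_qs(query).get(name, [])

def getfirst(query, name, default=None):
    """Stand-in for form.getfirst(name): first value, or default if none."""
    values = getlist(query, name)
    return values[0] if values else default

print(getlist("words=ugly&words=funny", "words"))   # ['ugly', 'funny']
print(getlist("words=ugly", "words"))               # ['ugly']
print(getlist("", "words"))                         # []

print(getfirst("words=ugly&words=funny", "words"))  # ugly
print(getfirst("", "words"))                        # None
```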

The for loop we wrote pieces together the different values for words, building a single string that has all of them in it. It does so in a very sloppy way, however. In particular, there's a comma floating around at the beginning of the list. This is an issue that could easily have been fixed if we'd had a little more time. It might be good practice for you to try and fix that problem yourself.
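
If you get stuck, here is one possible approach, sketched in Python 3. It swaps the for loop for the string method join, which puts the separator only between items. The function name favorite_words_line is made up for this example.

```python
def favorite_words_line(words):
    """Build the favorite-words paragraph without a stray leading comma."""
    # join inserts ', ' between items only, never before the first one.
    wordstring = ", ".join(words)
    return '<p>Your favorite word(s) is/are: ' + wordstring + '</p>'

print(favorite_words_line(["beautiful", "funny"]))
# <p>Your favorite word(s) is/are: beautiful, funny</p>
```

This still prints an empty list of words when nothing is checked, so you may want to handle that case separately.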

Summary

We introduced the HTML <input> type "hidden", and discussed how it can be used to store information about the state of the page/script.

The only core Python feature that we used that we hadn't used before was the for loop. We also talked about the .getlist("variablename") method and we briefly mentioned .getfirst("variablename") as well.

Remember that homework #5 is due tomorrow before class.