Lecture notes for Thursday, 6/26

Permissions and the Mystery of Homework #1 Solved

We began the lesson by talking about directory permissions again. The mystery of why I couldn't access your homework #1 directories had been solved. In order for me to access anything inside your home directory (including the homework directory), it's necessary for me to have execute permissions for your home directory. You can think of having "execute" permissions for a directory as having a key to a building and "read" permissions as being able to turn on the lights. If I have the key but can't turn on the lights, then I can access those subdirectories I have permission for, but only if I know exactly where they are (meaning that I know the exact name). You had all given me execute and read permissions for your "homework" directories, meaning that I had the key to the "homework" room and I could turn on the lights once I got in there. But I didn't have "execute" permissions for your home directories, so I couldn't even get into the building. The solution is to grant "execute" permissions for your home directory. You're giving out a key to the building. But you're not giving "read" permissions, so I can't turn on the lights, except in the "homework" directory. And that's as it should be. You don't want everyone to be able to see everything inside your home directory. You just want us to be able to access those subdirectories and files that you've granted permission for us to see.

The short version is that you need to grant everyone "execute" permissions for your home directory, but not "read" or "write" permissions. Here's how you would do that. Log into Silo and make sure you're in your home directory. Use the command chmod a+x ~ to grant everyone execute permissions for your home directory.

If you've placed your "homework" directory inside another directory, then you'll have to grant execute permissions for that directory as well.
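If it helps to see the permission bits themselves, here's a small sketch (written for Python 3, and not something we did in class) of what chmod a+x does to a directory's mode. The starting mode 0o700 and the helper name add_execute_for_all are assumptions made up for illustration.

```python
import stat

def add_execute_for_all(mode):
    """Return mode with execute turned on for user, group, and other (like chmod a+x)."""
    return mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH

# Assumed starting point: a home directory that only its owner can use.
before = stat.S_IFDIR | 0o700
after = add_execute_for_all(before)

print(stat.filemode(before))  # drwx------
print(stat.filemode(after))   # drwx--x--x
```

Notice that the "read" bits for group and other are still off afterwards: everyone gets a key to the building, but nobody else can turn on the lights.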

CGI

We talked a bit about CGI in general. CGI is not a programming language like Python or Java or PHP. It's not even a markup language like HTML or LaTeX. CGI (Common Gateway Interface) is just a standard for how a web server (which normally just serves up static HTML files) should interact with programs or scripts that actually do something. This is why you'll hear people talking about CGI/Python, CGI/Perl, or CGI/PHP. You can write the script in pretty much any programming or scripting language you want.

You've actually already seen a little bit of what the CGI standard looks like. Last time, when we created a web page with a form that took user input, hitting the submit button caused it to reload the page, with extra data appended to the URL. That method of presenting the data (a question mark followed by a list of variable=data pairs, each separated by an ampersand) is how the information is sent to a script.
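To make the format concrete, here's a minimal Python 3 sketch that builds such a URL and then takes it apart again using the standard urllib.parse module. The page address and the field names and values are hypothetical, chosen only for illustration.

```python
from urllib.parse import urlencode, urlsplit, parse_qsl

# Hypothetical form data: a list of (variable, data) pairs.
fields = [("who", "Ada"), ("age", "36"), ("action", "Proceed")]

# Hypothetical page address, for illustration only.
page = "http://example.com/form.html"

# A question mark, then variable=data pairs separated by ampersands.
url = page + "?" + urlencode(fields)
print(url)

# Going the other way: split the URL and recover the pairs.
pairs = parse_qsl(urlsplit(url).query)
print(pairs)
```

Everything comes back as text, even the age. That detail matters once a script tries to do arithmetic with it.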

We created another interface, like we did in the previous lecture, but it was simpler and it actually did something this time because we implemented a Python script to interface with it. It was a simple page that asks for the user's name and age, and upon pressing a button, it greets the user by name and tells them how old they'll be in the following year. We started by creating a simple user interface page as before. We called it nextyear.html and placed it in the apache/htdocs/ directory.


<html>
<head>
 <title>Age Next Year</title>
</head>
<body>

<form>

Name: <input type="text" name="who">
Age: <input type="text" name="age">
<input type="submit" name="action" value="Proceed">

</form>

</body>
</html>

We loaded the page in our browser, filled out the form and pressed "Proceed". As predicted, it reloaded the web page, appending the data to the URL. By the way, the part of the URL after the question mark is called the "query string". The web server hands it to a CGI script through the script's environment, which is why these notes refer to this list of data as the "environment". Since we plan on creating a separate script to handle the data, we'd like the page to call up the script file and not just reload the same page. We can do this by adding an action attribute to the <form> tag:

<form action="http://silo.soic.indiana.edu:61300/cgi-bin/nextyear.py">

If you're playing along at home, you should use your Apache port number instead of 61300. We saved and loaded this into our browser and then guessed about what might happen. The page did what it was supposed to and tried to open the link to the Python script, sending the data as part of the URL, but the script didn't exist yet, so we got a "Not Found" message.

CGI/Python

So the next step was to actually go and create the Python script. We moved to the apache/cgi-bin/ directory, which is the traditional place to store the scripts that a web page calls. We created a text file called nextyear.py.


[ewennstr@silo htdocs]$ cd ..
[ewennstr@silo apache]$ cd cgi-bin
[ewennstr@silo cgi-bin]$ pico -w nextyear.py

Do not forget the -w option! Unlike many programming languages, Python is very picky about line breaks and spacing. In a programming language like PHP or C++, every statement ends in a semicolon, so if extra line breaks sneak in, it doesn't matter. But in Python, a statement ends at the line break, so if you forget to use -w to disable word wrap, long lines will get broken up and you could end up with all sorts of weird behavior.

Here's what we put in the script file at first. It doesn't do everything yet, but we wanted to get used to how things work before we got more complicated.


#!/usr/bin/python

import cgi
import cgitb    #special exception handler
cgitb.enable()

form=cgi.FieldStorage()

print "Content-Type: text/html\n"

print form["who"].value

Now even though we gave the file the extension .py, that doesn't mean that the server knows this is a Python script. So the very first line (#!/usr/bin/python) is an instruction to the server that tells it what program to use to execute the script. So when the server encounters that first line, it will use the program called python in the directory /usr/bin/ to run the script.

The next line (import cgi) imports the package that enables Python to get at the data that was collected in the HTML form. You can read more about this package here. The next two lines (import cgitb and cgitb.enable()) enable a tool that will allow Python to send more useful error messages when it's called via a web page. For a real-world application, you would only use this during development and disable it when the product was ready for users (you wouldn't want to expose anything about the code itself to your users).

Because this is a CGI script, the output is sent to the web browser, which doesn't automatically know what kind of content it's going to get. The first print line (print "Content-Type: text/html\n") tells the web browser that what's coming is text data, but more specifically it's an HTML file. The line break (\n) is necessary, and it won't work properly without it: together with the line break that print adds on its own, it produces the blank line that marks the end of the header. If you leave the Content-Type line out, the web browser may make incorrect assumptions about what kind of data it's getting. For example, Firefox will assume that it's getting an ordinary text file and it will print out the source code of the HTML that you send, which is almost certainly not what you wanted.
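Here's a small sketch of the shape of that output, written for Python 3 rather than the Python 2 we used on Silo. The helper name build_response and the sample body are made up for illustration. It builds the text explicitly instead of relying on print to add the second line break.

```python
def build_response(body, content_type="text/html"):
    """Return the text a CGI script writes: header line, blank line, then the body."""
    return "Content-Type: " + content_type + "\n\n" + body

response = build_response("<html><body>Hello</body></html>")

# Everything before the first blank line is header; everything after is content.
header, blank, body = response.partition("\n\n")
print(header)
print(body)
```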

cgi.FieldStorage() creates an object of class FieldStorage that has all of the data from the environment (that's all the stuff in the URL after the question mark). It works a little like a Python dictionary. You can access the contents in the environment in a couple different ways. Here, we used form["who"].value to refer to the value of the environment variable called who. This method works as long as the variable who exists and has exactly one value (this isn't always the case, as with checkboxes). You can check to see if it exists by asking "who" in form. You can also access the same data by using form.getvalue("who").

A side note that we didn't talk about in class, but came up in the lab. Remember that the environment can contain multiple assignments to the same variable. This happens often if you use checkboxes instead of radio buttons. When this happens, the form["variablename"].value and form.getvalue("variablename") commands won't work properly. Instead, you can use form.getlist("variablename") to get a list of all the values that have been assigned to variablename. If you're not sure whether you're going to get a list or a single value, you can use form.getfirst("variablename"), which will be either the value of variablename (if there's only one value) or the first value of variablename (if there is more than one).
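To see the difference between one value and several, here's a rough Python 3 sketch. It uses urllib.parse.parse_qs as a stand-in for cgi.FieldStorage (the cgi module was removed from the standard library in Python 3.13), so the helpers get_list and get_first are my own imitations, not the real methods. The query string and the "topping" checkbox name are hypothetical.

```python
from urllib.parse import parse_qs

# Hypothetical query string: "topping" was checked twice, "who" was filled in once.
query = "who=Ada&topping=olives&topping=mushrooms"
form = parse_qs(query)   # every value comes back as a list of strings

def get_list(form, name):
    """All values for name, or an empty list (similar in spirit to form.getlist)."""
    return form.get(name, [])

def get_first(form, name, default=None):
    """First value for name, or default (similar in spirit to form.getfirst)."""
    values = form.get(name, [])
    return values[0] if values else default

print("who" in form)               # True
print(get_list(form, "topping"))   # ['olives', 'mushrooms']
print(get_first(form, "topping"))  # olives
print(get_first(form, "age"))      # None
```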

So if the script we created worked properly, upon pressing the "Proceed" button on the interface, the script should load and print out whatever was in the "Name" text box. To see if this worked, we saved the file, loaded up our interface page again, filled out the form, and pressed the button.

It didn't work. Instead, we got an "Internal Server Error", which means that something went wrong when the server (Silo) tried to run the script. This could mean any of a number of problems with the script, but in this case, the content of the script was just fine. It's just that Silo didn't know that the script file was something that could be executed at all. So we went and changed the permissions of the script file so that we could execute it. (Note that we don't need to set it to executable for anyone other than the user who owns the script.) This is a really common mistake, and it's easy to overlook!


[ewennstr@silo cgi-bin]$ chmod u+x nextyear.py
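If you'd like to check the execute bit from Python instead of the shell, here's a minimal Python 3 sketch. It assumes a Unix-like system such as Silo, and it works on a throwaway temporary file rather than the real nextyear.py.

```python
import os
import stat
import tempfile

# A throwaway file standing in for the script (not the real nextyear.py).
fd, path = tempfile.mkstemp(suffix=".py")
os.close(fd)

mode_before = stat.S_IMODE(os.stat(path).st_mode)
owner_can_execute_before = bool(mode_before & stat.S_IXUSR)

# Equivalent of chmod u+x: turn on the owner's execute bit, leave the rest alone.
os.chmod(path, mode_before | stat.S_IXUSR)

mode_after = stat.S_IMODE(os.stat(path).st_mode)
owner_can_execute_after = bool(mode_after & stat.S_IXUSR)

print(owner_can_execute_before, owner_can_execute_after)
os.remove(path)
```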

We tried again, and it worked as expected. So we opened the file back up to make it a little cleaner. First, we made sure that we'd formatted everything as valid HTML.* The code looked something like this:

*The notes may differ slightly from what we did in class, but all the same information should be there. In fact, I'll probably also add in a few things that I should have mentioned during class.


#!/usr/bin/python

import cgi
import cgitb    #special exception handler
cgitb.enable()

form=cgi.FieldStorage()

print "Content-Type: text/html\n"

print '''
<html>
<head>
 <title>Your Age Next Year</title>
</head>
<body>
'''

print "Hello, " + form["who"].value + "."

print '''
</body>
</html>
'''

You might remember that in Python, you can use either 'single quotes' or "double quotes" to define strings. But if you want to put a line break into a string that uses double or single quotes, you have to write \n. If you put actual line breaks into the string, Python will assume that the command has ended and that you just forgot to close off the string. This will probably result in strange errors. This can get really annoying when you're trying to print out lots of text with lots of line breaks, like we are here. The solution is to surround those long strings with three quotation marks in a row. They can either be '''three single quotes''' as in our example here, or they can be """three double quotes""". Inside of a string defined using three quotes, any line break that appears will be treated as part of the string and not as defining the end of a line of code.
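A quick way to convince yourself that the two spellings produce the same string is the sketch below. It runs the same way in Python 3, and the HTML fragment is just sample text.

```python
# One line of code, with the line breaks written as \n.
escaped = "<html>\n<body>\n</body>\n</html>"

# The same text in triple quotes, with real line breaks inside the string.
triple_quoted = '''<html>
<body>
</body>
</html>'''

print(escaped == triple_quoted)     # True
print(triple_quoted.count("\n"))    # 3
```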

At this point, we'd added enough HTML to define a page with a head (containing a title) and a body (containing one line of text). That line is generated by the command print "Hello, " + form["who"].value + "." which takes the three strings ("Hello, ", whatever value form["who"].value has, and "."), concatenates them into one long string, and prints that.

In class, we used a comma instead of a plus sign, which gives slightly different results. The comma is more versatile in that not every argument has to be a string; it just has to be something that Python knows how to print. But the plus sign gives us more control over whether to put spaces between the various things being printed. For this particular page, it just makes things look nicer without awkward spaces in strange places, but it's essential in other situations, such as if you were printing a URL as part of an <img src="http://..."> tag. (This might be a hint for homework #4.)
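Here's a small sketch of the difference. It's written for Python 3, where print is a function, but passing several arguments puts a space between them just as the commas do in the Python 2 print statement. The name is a made-up input, and the output is captured in a string only so we can look at it exactly.

```python
import io

name = "Ada"  # hypothetical input

# Several arguments: print puts a space between each one.
with_commas = io.StringIO()
print("Hello,", name, ".", file=with_commas)

# One concatenated string: we decide exactly where the spaces go.
with_plus = io.StringIO()
print("Hello, " + name + ".", file=with_plus)

print(repr(with_commas.getvalue()))  # 'Hello, Ada .\n'
print(repr(with_plus.getvalue()))    # 'Hello, Ada.\n'
```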

So the next step was to add the part where the script tells the user how old they'll be in the following year. So we added a bit more to that print line, changing it to:


print "Hello, " + form["who"].value + ". Next year, you will be " + (form["age"].value + 1) + " years old."

We saved the script, loaded the page, entered some information, and clicked "Proceed". Instead of loading the page, it sent us to an odd purple and pink error page informing us of a <type 'exceptions.TypeError'>. It also informed us of which script generated the error and which line was in the error as well as printing some more useful information. This is the work of the "cgitb" exception handler. If we hadn't enabled cgitb, then we probably would've just gotten another "500 Internal Server Error". You wouldn't want an end user to see an error like this. It's there to make it easier for us to debug the script. Of course in this case, we had a few sharp minds who knew what the problem was immediately and we didn't need to read the error message very closely.

The problem was that we'd treated form["age"].value as if it were a number and not a string that we merely expect to have a number in it. In this version for the notes, where I used plus signs instead of commas, the mistake is even worse. Not only did we try to add 1 to a string, but even if it had been a number, we would be trying to concatenate a number with a string, which doesn't work.
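Both halves of the mistake can be reproduced away from the web server. This sketch (which behaves the same in Python 3) uses a made-up age of "36" and catches the errors so it can report them instead of crashing.

```python
age_text = "36"  # form values always arrive as strings; "36" is a made-up input

try:
    age_text + 1              # adding a number to a string
except TypeError as err:
    first_error = type(err).__name__

try:
    "You will be " + 37       # concatenating a string with a number
except TypeError as err:
    second_error = type(err).__name__

print(first_error, second_error)  # TypeError TypeError
```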

Fortunately, Python makes converting from strings to numbers (int()) and vice versa (str()) very easy. So we added a line creating a Python variable age to store the age as an integer.


#!/usr/bin/python

import cgi
import cgitb    #special exception handler
cgitb.enable()

form=cgi.FieldStorage()

age = int(form["age"].value)

print "Content-Type: text/html\n"

print '''
<html>
<head>
 <title>Your Age Next Year</title>
</head>
<body>
'''

print "Hello, " + form["who"].value + ". Next year, you will be " + str(age+1) + " years old."

print '''
</body>
</html>
'''

Actually, in our class, since we were using commas (to give the print command multiple arguments) instead of plus signs to concatenate all the arguments together, we didn't even need the str() part. But since we're trying to concatenate strings here, we need to convert next year's age from an integer back to a string first.

When we saved and ran this script, everything worked as it was supposed to.

HTML5 and the "number" input type

When Karteek was going through your web pages for Homework #3, he pointed out to me that the most recent version of HTML (HTML5) has expanded the number of input types you can use for HTML forms. We were using type="text" for things like "age", which is what we would have had to do back in þe olde days before 2012. But in the modern world of HTML5, we have many more options for input types. (You can see all of them here.)

So since I'd never used the "number" type, we adjusted our page slightly to see what would happen if we used that instead of "text" for our age input box. We changed that one line to Age: <input type="number" name="age"> and reloaded the page. At first, we didn't notice any difference at all. Which wasn't all that shocking. After all, the only thing we did was tell the web browser (Firefox, in this case) to expect a number. Firefox doesn't have to do anything special with that information. But then someone pointed out that Chrome does do something different. It adds little increment and decrement arrows to the side of the input box, and it gives a pop-up error message if you try to enter something that's not a number.

We didn't test this in class, but it's worth noting that the CGI standard still treats all environment variables as strings. So even though using the "number" type means that the browser knows to expect a number, the script is still being sent a string and not an integer. So we still have to use the int() function to convert the string to a number before doing arithmetic with it.
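Since a browser that ignores the "number" type (or someone typing the URL by hand) can still send text that isn't a number, int() can fail with a ValueError. Here's one possible way to guard against that, as a sketch only. The helper name next_year_age is hypothetical, and returning None for bad input is just one design choice among several.

```python
def next_year_age(raw):
    """Return the age next year as an int, or None if raw is not a whole number."""
    try:
        return int(raw) + 1
    except (TypeError, ValueError):
        # TypeError covers a missing value (None); ValueError covers text like "thirty".
        return None

print(next_year_age("36"))      # 37
print(next_year_age("thirty"))  # None
print(next_year_age(None))      # None
```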

CGI Scripts Don't Need Separate Interface Pages

We didn't talk about this in class, but it's worth mentioning. Our example was written as two files: an HTML interface and a Python script that generates an HTML web page. You don't have to separate the two things, and often (such as with homework #4), you probably shouldn't. We could have created a single script that generates an HTML page and loaded that directly into the browser. For this particular assignment, we should be careful about what is displayed when the page is loaded as opposed to when it's reloaded with the "Proceed" button. The message about how old you'll be next year shouldn't display until we've hit the button. So we would need to include a line that checks to see if the who and age variables have values. If they have, then we should print the message about next year's age. If not, then we should skip that and just print the form. The code might look something like this:


#!/usr/bin/python

import cgi
import cgitb    #special exception handler
cgitb.enable()

form=cgi.FieldStorage()

print "Content-Type: text/html\n"

print '''
<html>
<head>
 <title>Your Age Next Year</title>
</head>
<body>

<form>

Name: <input type="text" name="who">
Age: <input type="number" name="age">
<input type="submit" name="action" value="Proceed">

</form>
'''

if "who" in form and "age" in form:
        age = int(form["age"].value)
        print "Hello, " + form["who"].value + ". Next year, you will be " + str(age+1) + " years old."

print '''
</body>
</html>
'''
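The decision at the heart of that script can also be tried out without a web server. The sketch below is Python 3 and uses a plain dictionary in place of the FieldStorage object. The function name render_body and the sample inputs are made up for illustration, and the html.escape call is an addition that isn't in the script above.

```python
import html

FORM = '''<form>
Name: <input type="text" name="who">
Age: <input type="number" name="age">
<input type="submit" name="action" value="Proceed">
</form>
'''

def render_body(fields):
    """Return the page body: always the form, plus a greeting once both fields exist."""
    parts = [FORM]
    if "who" in fields and "age" in fields:
        age = int(fields["age"])
        # html.escape (not in the notes) keeps typed-in markup from being treated as HTML.
        who = html.escape(fields["who"])
        parts.append("Hello, " + who + ". Next year, you will be " + str(age + 1) + " years old.")
    return "\n".join(parts)

first_visit = render_body({})                              # page loaded directly
after_submit = render_body({"who": "Ada", "age": "36"})    # "Proceed" was pressed
print(after_submit)
```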

Summary

We wrote our first web page that uses a CGI script. We wrote it in Python, using the cgi and cgitb packages. The cgi package defines a class called FieldStorage and we saw that we could use it like a dictionary (form["who"].value and "who" in form), and we saw some of its special methods, including form.getvalue(), form.getlist(), and form.getfirst().

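Most of you have coded with Python before, but in case you haven't, or if you need a refresher, we used the following Python features and functions: the print command (with commas and with plus signs), strings in single, double, and triple quotes, string concatenation with +, the int() and str() conversion functions, and if statements with the in operator.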

Lastly, we looked briefly at the new HTML5 <input> types, especially the "number" type.