CSCI A348/A548

Lecture Notes 7

Fall 1999


Review of Perl. More Perl. HTML with forms. Parsing CGI data. This lecture covers the first part of each of these topics.

Handin Information

You need to create a directory called assignments in your htdocs directory, where you will place the source code of all your assignments. Make sure that the files are not readable by anybody other than the owner, and, following the instructions given tonight in class, make this directory accessible only to the user dgerman. Use htpasswd to create a password for this username and send me the password. I will check your source code and assignments during the semester as you post them in this directory.

Review of Perl

We now review Perl with our future tasks in mind.

Normally, people consider Perl to be an interpreted language because program execution basically starts at the top and continues line-by-line. But when a Perl program is run, it is actually first parsed and compiled, and only then is it executed. This approach provides some of the efficiency of compilation while permitting the convenience and flexibility of interpreted languages.

Perl's power derives from a combination of the best properties of many different languages. For example, as with most interpreted scripting languages, commands in a Perl program need not be included within a function. Each line in a script is run from top to bottom; the first line of the script ordinarily will be the first to be executed. Contrast this with a C program, in which all commands (or statements) must reside within functions, and program execution always begins with the main function. Like C, however, Perl is a free-form language. You can generally put as many statements as you like on a single line and put line breaks wherever you want. To tell where each statement ends and another begins, each statement must be terminated with a semicolon.
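A small illustration of the free-form layout just described (the variable names are arbitrary): several statements share one line, and a single statement is spread over several lines.

```perl
#!/usr/bin/perl
use warnings;

# Several statements on one line; each ends with a semicolon.
$x = 1; $y = 2; $sum = $x + $y;

# One statement spread over several lines; Perl only looks for the
# semicolon, so the line breaks are irrelevant.
$total =
    $x +
    $y +
    $sum;

print "sum is $sum, total is $total\n";    # prints "sum is 3, total is 6"
```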

This flexibility carries over to Perl functions and variables as well. Perl variables come in several different flavors, but all of them are case-sensitive, don't need to be declared before use, and are global by default. That is, unless you explicitly indicate otherwise, each variable is shared across all the functions in a script. Nor do you need to declare functions in advance. Furthermore, many Perl functions do not require you to enclose their parameters in parentheses, as many other programming languages do. In what follows we review variables and functions in more detail.
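The sketch below illustrates three of these points at once: a subroutine (set_greeting is a made-up name for the example) is called before it is defined, a variable assigned inside it without any declaration is visible outside it, and print and length are used without parentheses around their arguments. It deliberately omits use strict, which would require variables to be declared.

```perl
#!/usr/bin/perl
use warnings;

# The subroutine is called before its definition appears in the file;
# no forward declaration is needed.
set_greeting();

# $greeting was never declared and was assigned inside the subroutine,
# yet it is visible here: it is a global variable.
print "Greeting: $greeting\n";               # no parentheses around print's argument
print("Length: ", length $greeting, "\n");   # length called without parentheses

sub set_greeting {
    $greeting = "Hello, world";              # creates the global $greeting
}
```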

As in most scripting languages, Perl's comments are line-based, beginning with a hash (#) sign and continuing to the end of line. There is no way of making a true multi-line comment other than by putting a # on each line.

Perl Variables Part I: Scalars

Most programming languages have various data types, and Perl is no exception, but like almost everything else in Perl, there's a twist. Perl's simplest and most common data type, the scalar, replaces many of the common data types found in other languages. A scalar is simply a single item: an integer, floating-point number, string, or boolean value; the precise type need not be specified in advance. A nifty feature of scalars is that they automatically convert between the different types as needed:

$number = 4;         # $number is 4, as you would expect
$string = "Hello";   # a nice, friendly string 
$bond = "007";       # a more exciting string 
print $bond - 2;     # prints "5" -- automagic string/number conversion 
$scalar = "2" . "1"; # .(dot) is string concatenate; $scalar is "21" 
$scalar -= 15;       # $scalar is now 6
These last couple of examples may seem odd. They hearken back to the childhood riddle "What do you get when you put 2 and 2 together?", whose answer was "22". Perhaps the riddle was just preparation for our eventually becoming Perl programmers.

In case you haven't noticed, all scalar variables begin with a dollar sign ($). Though this may seem annoying (and perhaps ugly) at first, it turns out to be phenomenally useful because it prevents variable names from being confused with Perl keywords. More interestingly, it also allows variables to be directly substituted, or interpolated, into strings:

print "The value of my scalar is $scalar.";
yields
The value of my scalar is 6.
Even though words and numbers are represented using a single type of variable, there are some differences in how they can be used. For example, the symbols ==, !=, <, > (and others) are used to test numerical relationships (e.g., 1 + 1 == 2) while the corresponding operators eq, ne, lt, and gt play the analogous role for strings ("1 + 1" ne "2").
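A quick way to see the difference between the two families of operators is to compare the same values both ways:

```perl
#!/usr/bin/perl
use warnings;

# Numeric operators compare values as numbers...
$num_equal = ("10" == 10.0) ? "yes" : "no";    # yes: both are the number 10
$num_less  = (10 < 9)       ? "yes" : "no";    # no

# ...while string operators compare character by character.
$str_equal = ("10" eq "10.0") ? "yes" : "no";  # no: the strings differ
$str_less  = ("10" lt "9")    ? "yes" : "no";  # yes: "1" sorts before "9"

print "== $num_equal, < $num_less, eq $str_equal, lt $str_less\n";
# prints "== yes, < no, eq no, lt yes"
```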

Perl Variables Part II: Arrays

Perl can group a number of scalars together in an array; the entire array can then be referenced as a single variable. In Perl, arrays are denoted by the "at" character (@) and perhaps bear a stronger resemblance to lists in LISP than to arrays in C. Each array can contain any number of elements, which are simply scalars. For convenience, arrays can be assigned both to and from lists (denoted by parentheses):

@array = ("1", "two", "3"); 
($first, $second, $third) = @array;
Like scalars, arrays can be printed and interpolated into strings. Note that as in our first example above, an array need not contain scalars of the same type. This is an especially useful property when interpolating one array into another, an operation which simply inserts each of the elements of an array into another array:
@newarray = (0, @array, 4);
# @newarray now contains (0, "1", "two", "3", 4)
Individual elements of an array can be accessed by their indices, which as in C, normally start at zero (although unlike C, the starting index can be altered). Also like C and many other programming languages, square brackets are used to specify the index:
$first = $array[0];
# $first is the first element of @array: "1"
A potentially confusing aspect of array elements is that, since they are themselves scalars, the character that precedes the variable name and signifies its type is $, not @. This sets up the rather confusing situation in which one can have a scalar variable $array that has no relationship to $array[0], the scalar that holds the first element of the array @array.
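The following fragment shows the two unrelated variables side by side, reusing the array from the examples above:

```perl
#!/usr/bin/perl
use warnings;

@array = ("1", "two", "3");   # the array @array
$array = "unrelated";          # a completely separate scalar named $array

# $array[0] is an element of @array (its sigil is $ because an element
# is a scalar); it has nothing to do with the scalar $array.
print "scalar: $array; first element: $array[0]\n";
# prints "scalar: unrelated; first element: 1"
```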

The highest index (the one that specifies the last element) of an array named @array is given by $#array, while the size of the array (normally one larger) is the array's value in scalar context, for example when it is assigned to a scalar. This also works in reverse: assigning a number to $#array changes the array's size:

$last = $array[$#array]; # $#array is 2; $last is 3
$scalar = @array;        # $scalar is 3 (number of elements in @array) 
$#array = 1;             # @array is now ("1", "two"); 
Perl provides extensive built-in support for arrays, making them very handy data types. We've only begun to scratch the surface of the ways in which Perl arrays can be used; for example, the language provides special functions such as shift, unshift, push, pop, and splice to manipulate array contents conveniently and efficiently. More information about these can be found in the Perl reference manual (which comes with the language; type man perlfunc at the command prompt) or in the recommended books about Perl.
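As a brief taste of these functions (the array name @queue is just for illustration), here is each one applied in turn, with the array's contents tracked in the comments:

```perl
#!/usr/bin/perl
use warnings;

@queue = ("b", "c");

push @queue, "d";          # add to the end:        (b, c, d)
unshift @queue, "a";       # add to the front:      (a, b, c, d)
$last  = pop @queue;       # remove from the end:   $last is "d"; (a, b, c)
$first = shift @queue;     # remove from the front: $first is "a"; (b, c)

@queue = ("v", "w", "x", "y", "z");
# splice(ARRAY, OFFSET, LENGTH, LIST): remove 2 elements starting at
# index 1 and put "NEW" in their place.
@removed = splice(@queue, 1, 2, "NEW");   # @removed is (w, x)

print "last=$last first=$first removed=@removed queue=@queue\n";
# prints "last=d first=a removed=w x queue=v NEW y z"
```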

Perl Variables Part III: Associative Arrays

Associative arrays are like normal arrays; however, rather than being indexed by numbers, they're indexed by strings. Thus, while a standard array will limit you to looking up entry 3, with an associative array you can use "3.141519", "pi", or even "the whole kit and caboodle", or any other string. Programmers experienced in C may wonder what use such an array is; programmers experienced in Perl may wonder how C programmers can program at all without them. As we have seen already, associative arrays are used extensively in processing CGI information.

Like scalars and ordinary arrays, associative arrays are indicated by a special character, in this case a percent sign (%). And, as with arrays, each element of an associative array is a scalar. However, to differentiate elements of associative arrays from those of normal ones, curly braces, rather than square brackets, are used to enclose the index:

$myassociativearray{'index'} = "myvalue";
Associative arrays are useful for establishing a relationship between two strings. In this usage, the string that is used as an index is called the "key", and the one used as the value, the "value". In an address book, for example, the names are the keys and the addresses are the values:
$addresses{'John Smith'} = "1234 Main Street"; 
$addresses{'Mary Doe'} = "115 Central Avenue"; 
A single entry in an associative array is often called a key/value or name/value pair. Like any other strings, the keys are case-sensitive; they can also contain spaces, punctuation, and even nonprinting characters. Because a key can be any string, associative arrays can even be used to mimic multidimensional arrays:

$two_d{'2,3'} = "I'm element 2,3!"; 
Note, however, that since these array indices are really strings, they are sensitive to whitespace (e.g., an index or key of '2, 3' is not the same as '2,3').
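To see the whitespace sensitivity directly, the sketch below checks both spellings with the built-in exists, which reports whether a key is present. Building the key from two variables, as done at the end, is one simple way (an illustration, not the only approach) to keep the spelling consistent:

```perl
#!/usr/bin/perl
use warnings;

$two_d{'2,3'} = "I'm element 2,3!";

# A key is just a string, so '2, 3' (with a space) is a different key.
$no_space   = exists $two_d{'2,3'}  ? "found" : "missing";   # found
$with_space = exists $two_d{'2, 3'} ? "found" : "missing";   # missing
print "key '2,3': $no_space; key '2, 3': $with_space\n";

# Building the key from two numbers keeps its spelling consistent.
($row, $col) = (2, 3);
$element = $two_d{"$row,$col"};    # the key is "2,3"
print "$element\n";                # prints "I'm element 2,3!"
```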

It is possible to assign associative arrays to ordinary arrays and vice versa. The ordinary array represents the name/value pairs as sequential entries (each name followed by its value), although the order of the pairs themselves is not predictable:

@ary = %myassociativearray; # @ary is ("index", "myvalue")
When a list is assigned to an associative array, the reverse occurs:
@humps_list = ("dromedary", 1, "bactrian", 2, "camel", "1 or 2");
%humps = @humps_list;
$dromedary = $humps{"dromedary"}; # $dromedary is 1
Like other scalars, the individual elements of an associative array can be interpolated within double-quoted strings:
print "A camel has $humps{'camel'} hump(s)\n";
Note the use of single quotes to enclose 'camel'. Double quotes are employed to delineate the whole argument to the print statement, so single quotes must be used to demarcate strings within it.

One final note: if an array like @humps_list contained thousands of entries, searching through it for a particular name could take a long time. Associative array lookups in Perl, however, are very fast because they make use of hash tables. Accordingly (and also because it's much easier to say and type), associative arrays are often referred to as hashes.
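The difference can be made concrete by looking up the same entry both ways: a loop that scans @humps_list two items at a time, and a single hash lookup.

```perl
#!/usr/bin/perl
use warnings;

@humps_list = ("dromedary", 1, "bactrian", 2, "camel", "1 or 2");

# Without a hash: walk the list two items at a time until the name
# matches. The work grows with the length of the list.
$answer = "unknown";
for ($i = 0; $i < @humps_list; $i += 2) {
    if ($humps_list[$i] eq "bactrian") {
        $answer = $humps_list[$i + 1];
        last;
    }
}

# With a hash: a single lookup, whose cost stays roughly constant
# however many entries there are.
%humps = @humps_list;
$quick = $humps{"bactrian"};

print "scan: $answer, hash: $quick\n";   # prints "scan: 2, hash: 2"
```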

Simple Subroutines

A subroutine (function) is just a block of code preceded by the keyword sub and a name. Subroutines can be placed almost anywhere in a Perl program; the sub keyword indicates that the code should not be executed when the interpreter reaches it. Instead, it is simply tucked away for use when needed.

Unlike some languages that have both functions (which perform some action and return a value to their caller) and procedures (which perform some action but do not return anything), Perl has only the former. By default, the value returned is simply the result of the last expression evaluated in the subroutine. Consider the following code fragment:

#!/usr/bin/perl
print &compute;

sub compute {
  $four = 2 + 2;
}
The result returned is 4, the value of $four, so the program prints 4. As you can see, the return statement is not strictly necessary. It is used to make the return value explicit or to cause a function to exit before reaching its last line.
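Here is a sketch of return used for an early exit; sign_of is a made-up example subroutine, and it picks up its argument with local in the way described in the Parameter Passing section.

```perl
#!/usr/bin/perl
use warnings;

# Hypothetical helper: classify a number, using return to leave early.
sub sign_of {
    local ($n) = @_;
    return "negative" if $n < 0;   # exits here for negative numbers
    return "zero"     if $n == 0;
    "positive";                    # last expression: the implicit return value
}

@results = (sign_of(-5), sign_of(0), sign_of(7));
print "@results\n";                # prints "negative zero positive"
```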

Parameter Passing

In Perl, the parameters that are passed into a function do not have any special names. They neither need to be declared in advance, nor must they be specified as part of the function definition. Instead, the special array @_ (the underscore character) holds all the parameters that are passed into a function. Each parameter can be accessed by its zero-based index; that is, the order in which it was passed to the function. Thus the first parameter would be $_[0] and the tenth would be $_[9].

Unfortunately, variable names like $_[14] don't roll off the tongue and can become pretty confusing. A good solution is to assign @_ to a list of named scalar variables. A function which takes three parameters could use a statement like:

local ($param1, $param2, $param3) = @_;
or, even better:
local ($name, $rank, $serial_number) = @_;
Note the improvement in mnemonics. Note also the use of local. Perl variables are global unless declared otherwise, so without local the assignment would overwrite any global variables with the same names. With local, the variables receive temporary values that last only until the subroutine exits, after which their previous values are restored.

For this reason you can, and should, declare them local.
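The fragment below shows local at work (enlist is a hypothetical subroutine name): the global $name has one value before the call, receives a temporary value inside the subroutine, and gets its original value back afterwards. Perl 5 also provides my, which creates variables that are truly private to the subroutine; the sketch sticks to local as used in these notes.

```perl
#!/usr/bin/perl
use warnings;

$name = "outside";    # a global variable

sub enlist {
    # Temporary values for these globals, restored when enlist returns.
    local ($name, $rank, $serial_number) = @_;
    return "$rank $name, serial number $serial_number";
}

$record = enlist("Smith", "Private", "12345");
print "$record\n";    # prints "Private Smith, serial number 12345"
print "$name\n";      # prints "outside": local restored the old value
```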

Notable Quotes

The most common quotes in Perl are the double quotes: "". They interpolate variables and interpret backslash escapes such as \n. They can span multiple lines, in which case the line breaks become part of the string. Double quotes inside them must be escaped with the backslash character. Look at some examples below.

$billions = "35 dollars"; 
print "We lost $billions\n"; 
# We lost 35 dollars followed by a newline 
The single quote (') is similar, except that it won't interpolate. It also does very limited interpretation of backslash combinations: only \' and \\ are recognized.
$billions = "35 dollars"; 
print 'We lost $billions\n'; 
# We lost $billions\n 
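The following sketch collects these quoting rules in one place, including escaped double quotes inside a double-quoted string and an escaped single quote inside a single-quoted one:

```perl
#!/usr/bin/perl
use warnings;

$billions = "35 dollars";

$double = "We lost $billions\n";   # variable and \n are interpreted
$single = 'We lost $billions\n';   # kept literally: $billions\n
$quoted = "He said \"ouch\"";      # escaped double quotes inside double quotes
$apos   = 'It\'s gone';            # in single quotes only \' and \\ are special

print $double;                     # We lost 35 dollars (and a newline)
print $single, "\n";               # We lost $billions\n
print "$quoted; $apos\n";          # He said "ouch"; It's gone
```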
A third variety is the backquote (`), which works like a double quote with an added feature: after the string is formed, its text is executed as though it were a command given to the shell, and the command's output becomes the value of the string. This usage can create very dangerous security holes and is therefore rarely found in CGI scripts. You've seen the backquote when you restarted the web server from the command line with
kill -HUP `cat httpd.pid`
Of course, that was a shell command rather than Perl, but the idea is the same.

Substitutions

To substitute patterns of characters in a variable that holds a string, we use the s/// operator, attached to the variable with the =~ operator. Here are a few simple examples that illustrate how this works.

We call the part between the first two slashes the left-hand side and the part between the last two slashes the right-hand side. If the left-hand side matches somewhere in the string, the matched text is replaced by the right-hand side.

$weather = "A rainy day."; 
$weather =~ s/rainy/sunny/; # $weather is now "A sunny day". 
Substitutions occur only if the string on the left can be matched:
$weather = "A snowy day."; 
$weather =~ s/rainy/sunny/; # $weather is still snowy. 
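The s/// operator also returns a value: the number of substitutions it made, or false if nothing matched. A script can use this to tell whether the weather actually changed:

```perl
#!/usr/bin/perl
use warnings;

$weather = "A rainy day.";
$first_count = ($weather =~ s/rainy/sunny/);    # 1: one replacement made
print "$weather (replacements: $first_count)\n"; # prints "A sunny day. (replacements: 1)"

$weather = "A snowy day.";
$second_count = ($weather =~ s/rainy/sunny/);   # false: no match, nothing changed
$status = $second_count ? "changed" : "unchanged";
print "$weather ($status)\n";                   # prints "A snowy day. (unchanged)"
```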

Pattern Matching and Regular Expressions

Changing the weather in a predictable way is nice, but you can do that just by flying from England to California. What you usually want to do is to match and substitute more general strings, without knowing in advance what they are. This is done using regular expressions and here's a very simple example.

In Perl regular expressions, a dot (.), or period, is the least restrictive character: it matches any single character except a newline. The percent sign is not a special character, so it matches itself. Thus we have:

$weather = "Probability of precipitation is 80%."; 
$weather =~ s/..%/5%/; # Only a 5% chance now.
We will have more on pattern matching later.
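Because the dot matches any character at all, the pattern above quietly depends on the percentage having exactly two digits. The sketch below runs it on the original string and on a one-digit percentage, where the first dot consumes the space before the digit:

```perl
#!/usr/bin/perl
use warnings;

$weather = "Probability of precipitation is 80%.";
$weather =~ s/..%/5%/;    # the two dots match "8" and "0"
print "$weather\n";       # prints "Probability of precipitation is 5%."

# The dot matches ANY character, including a space.
$other = "Probability of precipitation is 9%.";
$other =~ s/..%/5%/;      # the two dots match " " and "9"
print "$other\n";         # prints "Probability of precipitation is5%."
```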

HTML Forms

To display                  Use                     Attributes
A form                      <form> ... </form>      method, action, enctype
Single-line text field      <input type=text>       name, value, maxlength, size
Single-line password field  <input type=password>   name, value, maxlength, size
Multiple-line text area     <textarea></textarea>   name, cols, rows, wrap
Checkbox                    <input type=checkbox>   name, value, checked
Radio buttons               <input type=radio>      name, value, checked
List of choices             <select> ... </select>  name, multiple, size
Items in a <select> list    <option>                value, selected
Clickable image             <input type=image>      name, align, src
File upload                 <input type=file>       name, accept
Hidden field                <input type=hidden>     name, value
Reset button                <input type=reset>      value
Submit button               <input type=submit>     name, value

Your textbook contains a similar quick-reference table at the end of the book, and so does the JavaScript chapter. Check the posted lab notes as well, as they contain important lab exercises.
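Each named control in the table above reaches a CGI program as a name=value pair; the pairs are joined with &, spaces are encoded as +, and other special characters as %XX hexadecimal codes. The sketch below decodes such a string into an associative array. It is a minimal hand-written illustration, not the code of cgi-lib.pl or CGI.pm, and the input string is hypothetical: a real script would read it from $ENV{'QUERY_STRING'} (GET) or from STDIN (POST), and would normally rely on one of those libraries.

```perl
#!/usr/bin/perl
use warnings;

# Hypothetical input, as a browser might encode a form with fields
# named "name" and "comment".
$input = "name=Mary+Doe&comment=50%25+off%21";

%form = ();
foreach $pair (split(/&/, $input)) {
    ($key, $value) = split(/=/, $pair, 2);
    foreach ($key, $value) {                    # $_ is aliased to $key, then $value
        tr/+/ /;                                # '+' encodes a space
        s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;    # %XX encodes one character
    }
    $form{$key} = $value;
}

print "name: $form{'name'}\n";         # prints "name: Mary Doe"
print "comment: $form{'comment'}\n";   # prints "comment: 50% off!"
```

Note that + is turned into a space before the %XX codes are decoded, so an encoded plus sign (%2B) correctly survives as +.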

These lecture notes build on a fine book by Steven Brenner and Edwin Aoki, published in 1996 and now out of print (but you can find it in the library). Steve Brenner is the author of the cgi-lib.pl library, the standard CGI library used with Perl 4 (just as Lincoln Stein's CGI.pm, described in your text, is regarded as the de facto standard library for CGI processing with Perl 5).


Last updated: September 20, 1999 by Adrian German