CSCI A348/548
Lab Notes Three

Spring 2001 (Second semester 2000-2001)


Review of Perl. More Perl ( I/O ). HTML with forms. Parsing CGI data.

Review of Perl

We now review Perl with our future tasks in mind.

Normally, people consider Perl to be an interpreted language because program execution basically starts at the top and continues line-by-line. But when a Perl program is run, it is actually first parsed and compiled, and only then is it executed. This approach provides some of the efficiency of compilation while permitting the convenience and flexibility of interpreted languages.

Perl's power derives from a combination of the best properties of many different languages. For example, as with most interpreted scripting languages, commands in a Perl program need not be included within a function. Each line in a script is run from top to bottom; the first line of the script ordinarily will be the first to be executed. Contrast this with a C program, in which all commands (or statements) must reside within functions, and program execution always begins with the main function. Like C, however, Perl is a free-form language. You can generally put as many statements as you like on a single line and put line breaks wherever you want. To tell where each statement ends and another begins, each statement must be terminated with a semicolon.

This flexibility carries over to Perl functions and variables as well. Perl variables come in many different flavors, but all of them are case-sensitive, don't need to be declared in advance of use, and are global by default. That is, unless you explicitly indicate otherwise, each variable will be shared across all the functions in a script. And speaking of functions, you don't need to declare them in advance either. Furthermore, many functions in Perl do not require you to enclose parameters in parentheses, which is a necessity in many programming languages. In what follows we review variables and functions in detail.

As in most scripting languages, Perl's comments are line-based, beginning with a hash (#) sign and continuing to the end of line. There is no way of making a true multi-line comment other than by putting a # on each line.

Perl Variables Part I: Scalars

Most programming languages have various data types, and Perl is no exception, but like almost everything else in Perl, there's a twist. Perl's simplest and most common data type, the scalar, replaces many of the common data types found in other languages. A scalar is simply a single item: an integer, floating point number, string, or boolean value; the precise type need not be specified in advance. A nifty feature of scalars is that they automatically convert between the different types as needed.

$number = 4;         # $number is 4, as you would expect
$string = "Hello";   # a nice, friendly string 
$bond = "007";       # a more exciting string 
print $bond - 2;     # prints "5" -- automagic string/number conversion 
$scalar = "2" . "1"; # .(dot) is string concatenate; $scalar is "21" 
$scalar -= 15;       # $scalar is now 6
These last couple of examples may seem odd; hearkening back to the childhood riddle, "What do you get when you put 2 and 2 together?" to which the answer was "22". Perhaps the riddle was just preparation for our eventually becoming Perl programmers.

In case you haven't noticed, all scalar variables begin with a dollar sign ($). Though this may seem annoying (and perhaps ugly) at first, it turns out to be phenomenally useful because it prevents variable names from being confused with Perl keywords. More interestingly, it also allows the variables to be directly substituted, or interpolated, into strings:

print "The value of my scalar is $scalar.";
yields
The value of my scalar is 6.
Even though words and numbers are represented using a single type of variable, there are some differences in how they can be used. For example, the symbols ==, !=, <, > (and others) are used to test numerical relationships (e.g., 1 + 1 == 2) while the corresponding operators eq, ne, lt, and gt play the analogous role for strings ("1 + 1" ne "2").
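The difference between the two operator families shows up when the same scalars are compared both ways. The following is a minimal sketch; the variable names are chosen only for illustration.

```perl
#!/usr/bin/perl
# The same two scalars compared as numbers and as strings.
$x = "10";
$y = "10.0";

$same_number = ($x == $y) ? "yes" : "no";   # numeric test: 10 == 10
$same_string = ($x eq $y) ? "yes" : "no";   # string test: "10" vs "10.0"

print "Equal as numbers? $same_number\n";   # prints "Equal as numbers? yes"
print "Equal as strings? $same_string\n";   # prints "Equal as strings? no"

# Ordering differs too: 9 is less than 10 as a number,
# but the string "9" sorts after the string "10".
$num_order = (9 < 10)      ? "yes" : "no";  # yes
$str_order = ("9" lt "10") ? "yes" : "no";  # no
print "9 < 10 numerically? $num_order\n";
print "9 lt 10 as strings? $str_order\n";
```

Picking the wrong family is a common source of quiet bugs, since Perl converts the operands rather than complaining.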

Perl Variables Part II: Arrays

Perl can group a number of scalars together in an array; the entire array can then be referenced as a single variable. In Perl, arrays are denoted by the "at" character (@) and perhaps bear a stronger resemblance to lists in LISP than to arrays in C. Each array can contain any number of elements, which are simply scalars. For convenience, arrays can be assigned both to and from lists (denoted by parentheses):

@array = ("1", "two", "3"); 
($first, $second, $third) = @array;
Like scalars, arrays can be printed and interpolated into strings. Note that as in our first example above, an array need not contain scalars of the same type. This is an especially useful property when interpolating one array into another, an operation which simply inserts each of the elements of an array into another array:
@newarray = (0, @array, 4); 

# @newarray now contains (0, "1", "two", "3", 4)
Individual elements of an array can be accessed by their indices, which as in C, normally start at zero (although unlike C, the starting index can be altered). Also like C and many other programming languages, square brackets are used to specify the index:
$first = $array[0]; 

# $first is the first item in the @array array: "1"
A potentially confusing aspect of array elements is that, since they are themselves scalars, the character that precedes the variable name and signifies its type is $, not @. This anomaly sets up the rather confusing situation in which one can have a scalar variable $array which has no relationship to the value of $array[0], a scalar that represents the first element of the array @array.

The highest index (the one which specifies the last element) of an array named @array is given by $#array, while the size of the array (generally one larger) is the scalar value of the array. These also work backwards; assigning a number to the highest index changes the array's size:

$last = $array[$#array]; # $#array is 2; $last is 3
$scalar = @array;        # $scalar is 3 (number of elements in @array) 
$#array = 1;             # @array is now ("1", "two"); 
Perl provides enormous built-in support for arrays, making them very handy data types. We've only begun to scratch the surface of all of the ways in which Perl arrays can be used; for example the language provides a number of special functions such as shift, unshift, push, pop, and splice to manipulate array contents conveniently and efficiently. More information about these can be found in the Perl reference manual (which comes with the language, type man perlfunc at the command prompt) or in the recommended books about Perl.
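As a brief illustration of the functions just named, here is a minimal sketch that applies each of them to a small array. The array name @queue and its contents are made up for the example.

```perl
#!/usr/bin/perl
@queue = ("b", "c");

push(@queue, "d");          # add to the end:        b c d
unshift(@queue, "a");       # add to the front:      a b c d
$last  = pop(@queue);       # remove from the end:   $last is "d"
$first = shift(@queue);     # remove from the front: $first is "a"

# splice(ARRAY, OFFSET, LENGTH, LIST) removes LENGTH elements starting
# at OFFSET, inserts LIST in their place, and returns what was removed.
@removed = splice(@queue, 1, 1, "x", "y");

print "queue:   @queue\n";          # prints "queue:   b x y"
print "removed: @removed\n";        # prints "removed: c"
print "first and last: $first $last\n";
```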

Perl Variables Part III: Associative Arrays

Associative arrays are like normal arrays; however, rather than being indexed by numbers, they're indexed by strings. Thus, while a standard array will limit you to looking up entry 3, with an associative array you can use "3.14159", "pi", or even

"the whole kit and caboodle"
or any other string. Programmers experienced in C may wonder what use such an array is; programmers experienced in Perl may wonder how C programmers can program at all without them. As we have seen already, associative arrays are used extensively in processing CGI information.

Like scalars and ordinary arrays, associative arrays are indicated by a special character, in this case a percent sign (%). And, as with arrays, each of the elements of an associative array is a scalar. However, in order to differentiate elements of associative arrays from those of normal ones, curly braces, rather than square brackets, are used to enclose the index:

$myassociativearray{'index'} = "myvalue";
Associative arrays are useful for establishing a relationship between two strings. In this usage, the string that is used as an index is called the "key", and the one used as the value, the "value". In an address book, for example, the names are the keys and the addresses are the values:
$addresses{'John Smith'} = "1234 Main Street"; 
$addresses{'Mary Doe'} = "115 Central Avenue"; 
A single entry in an associative array is often called a key/value or name/value pair. Like any other string, the array indices are case sensitive; they can also contain spaces and even nonprinting characters. Thus, associative arrays can be used to mimic multidimensional arrays:

$two_d{'2,3'} = "I'm element 2,3!"; 
Note, however, that since these array indices are really strings, they are sensitive to whitespace (e.g., an index or key of '2, 3' is not the same as '2,3').

It is possible to assign associative arrays to ordinary arrays and vice versa. The ordinary array represents the name/value pairs as sequential entries:

@ary = %myassociativearray; # @ary is ("index", "myvalue");
When a list is assigned to an associative array, the reverse occurs:
@humps_list = ("dromedary", 1, "bactrian", 2, "camel", "1 or 2");
%humps = @humps_list;
$dromedary = $humps{"dromedary"}; # $dromedary is 1
Like other scalars, the individual elements of an associative array can be interpolated within double-quoted strings:
print "A camel has $humps{'camel'} hump(s)\n";
Note the use of single quotes to enclose 'camel'. Double quotes are employed to delineate the whole argument to the print statement, so single quotes must be used to demarcate strings within it.

One final note: if an array like @humps_list contained thousands of entries, searching through it could take a long time. However associative array lookups in Perl are very fast because they make use of hash tables. Accordingly (and also because it's much easier to say and type), associative arrays are often referred to as hashes.
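A short sketch of direct lookup in a hash follows. It uses two built-in functions not covered in the text above: exists, which tests whether a key is present, and keys, which returns all the keys. The key "llama" is made up to show a failed lookup.

```perl
#!/usr/bin/perl
%humps = ("dromedary", 1, "bactrian", 2);

# Each lookup goes straight to the entry; no scan of a list is needed.
foreach $animal ("bactrian", "llama") {
  if (exists $humps{$animal}) {
    $report{$animal} = "$humps{$animal} hump(s)";
  } else {
    $report{$animal} = "not in the table";
  }
  print "$animal: $report{$animal}\n";
}

# keys returns the indices in no guaranteed order, so sort them for display.
@names = sort keys %humps;
foreach $animal (@names) {
  print "$animal => $humps{$animal}\n";
}
```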

Simple Subroutines

A subroutine (or function) is just a block of code preceded by the keyword sub and a name. Functions can be placed almost anywhere in a Perl program; the keyword sub indicates that the code should not be executed when the interpreter reaches it. Instead, it is simply tucked away for use when needed.

Unlike some languages that have both functions (which perform some action and return a value to their caller) and procedures (which perform some action but do not return anything), Perl has only the former. By default, the value returned is simply the result of the last expression evaluated in the subroutine. Consider the following code fragment:

#!/usr/bin/perl
print &compute;

sub compute {
  $four = 2 + 2;
}
The result returned is 4, the value of $four. As you can see the return statement is not strictly necessary. It is used to make the return value explicit or to cause a function to exit before reaching its last line.
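Both uses of return can be seen in one small subroutine. This is only a sketch: the subroutine name describe and the global $temperature are invented for the example.

```perl
#!/usr/bin/perl
$temperature = 20;
print &describe(), "\n";     # prints "freezing"
$temperature = 50;
print &describe(), "\n";     # prints "above freezing"

sub describe {
  if ($temperature < 32) {
    return "freezing";       # explicit return: leave the subroutine early
  }
  "above freezing";          # no return needed: last expression evaluated
}
```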

Parameter Passing

In Perl, the parameters that are passed into a function do not have any special names. They neither need to be declared in advance, nor must they be specified as part of the function definition. Instead, the special array @_ (the underscore character) holds all the parameters that are passed into a function. Each parameter can be accessed by its zero-based index; that is, the order in which it was passed to the function. Thus the first parameter would be $_[0] and the tenth would be $_[9].

Unfortunately, variable names like $_[14] don't roll off the tongue and can become pretty confusing. A good solution is to assign @_ to a list of named scalar variables. A function which takes three parameters could use a statement like:

local ($param1, $param2, $param3) = @_;
or, even better:
local ($name, $rank, $serial_number) = @_;
Note the improvement in mnemonics. Note also the use of local, without which the values of the actual arguments would have been assigned to global variables, since variables in Perl are global unless declared otherwise.

For this reason you can, and should, declare such variables local.
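The effect of local can be checked with a small experiment. In this sketch the subroutine name describe_soldier and the sample values are hypothetical; the point is that the global $name is untouched after the call.

```perl
#!/usr/bin/perl
$name = "global name";       # a global that happens to share a name

$line = &describe_soldier("Smith", "Sgt.", 12345);
print "$line\n";             # prints "Sgt. Smith, serial 12345"
print "$name\n";             # prints "global name"

sub describe_soldier {
  # Copy the parameters out of @_ into named, local variables.
  local ($name, $rank, $serial_number) = @_;
  return "$rank $name, serial $serial_number";
}
```

Removing the word local from the subroutine would leave $name set to "Smith" after the call.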

Notable Quotes

The most common quotes in Perl are the double quotes: "". They interpolate variables and interpret backslash combinations such as \n. They may span multiple lines, in which case the line breaks become part of the string. Any double quote inside them must be escaped with the backslash character. Look at the example below.

$billions = "35 dollars"; 
print "We lost $billions\n"; 
# We lost 35 dollars followed by a newline 
The single quote (') is similar except it won't interpolate. It also does very limited interpretation of backslash combinations.
$billions = "35 dollars"; 
print 'We lost $billions\n'; 
# We lost $billions\n 
A third variety is the backquote (`), which works like a double quote with an added feature: after the string is formed, its text is executed as though it were a command given to the shell. This usage can create very dangerous security holes and is therefore rarely found in CGI scripts. You've seen the backquote when you restarted the web server from the command line with
kill -HUP `cat httpd.pid`
Of course, that was not Perl, but it's related.
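The three kinds of quotes can be compared side by side. This sketch assumes a Unix shell in which the echo command is available; the variable names are illustrative.

```perl
#!/usr/bin/perl
$who = "world";

# Double quotes: $who is interpolated, inner double quotes are escaped.
$double = "She said \"hello, $who\"";
print "$double\n";           # prints: She said "hello, world"

# Single quotes: no interpolation; \' is one of the few escapes recognized.
$single = 'It\'s $who';
print $single, "\n";         # prints: It's $who

# Backquotes: the string is run as a shell command, its output captured.
$listing = `echo hello from the shell`;
print $listing;              # prints: hello from the shell
```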

Substitutions

To substitute patterns of characters in a variable that holds a string we use the s/// operator. Here are a few simple examples that illustrate how this works.

We call the part that comes in between the first two slashes left-hand side and the part that comes in between the other two right-hand side. If a match is found the left-hand side is replaced by the right-hand side.

$weather = "A rainy day."; 
$weather =~ s/rainy/sunny/; # $weather is now "A sunny day."
Substitutions occur only if the string on the left can be matched:
$weather = "A snowy day."; 
$weather =~ s/rainy/sunny/; # $weather is still snowy. 

Pattern Matching and Regular Expressions

Changing the weather in a predictable way is nice, but you can do that just by flying from England to California. What you usually want to do is to match and substitute more general strings, without knowing in advance what they are. This is done using regular expressions and here's a very simple example.

In Perl regular expressions a dot (.) or period is the most unrestricted character; it will match any single character. The percent sign is not a special character so it will match itself, so we have:

$weather = "Probability of precipitation is 80%."; 
$weather =~ s/..%/5%/; # Only a 5% chance now.
We will have more on pattern matching later.
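Because the dot matches any character at all, the same substitution behaves differently on other inputs. The sketch below runs it on three made-up percentages to show what the two dots actually consume.

```perl
#!/usr/bin/perl
foreach $chance ("80%", "5%", "100%") {
  $weather = "Probability of precipitation is $chance.";
  $weather =~ s/..%/5%/;     # replace any two characters followed by %
  push(@results, $weather);
  print "$weather\n";
}
# Output:
#   Probability of precipitation is 5%.
#   Probability of precipitation is5%.     (the dots matched " 5")
#   Probability of precipitation is 15%.   (the dots matched "00")
```

The second and third lines show why a pattern made only of dots is usually too permissive; tighter patterns need the regular expression features still to come.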

HTML Forms

To display:                  Use:                                 Attributes:

A form                       <form> ... HTML form info </form>    method, action, enctype
Single-line text field       <input type=text>                    name, value, maxlength, size
Single-line password field   <input type=password>                name, value, maxlength, size
Multiple-line text area      <textarea></textarea>                name, cols, rows, wrap
Checkbox                     <input type=checkbox>                name, value, checked
Radio buttons                <input type=radio>                   name, value, checked
List of choices              <select> items in list... </select>  name, multiple, size
Items in a <select> list     <option>                             value, selected
Clickable image              <input type=image>                   name, align, src
File upload                  <input type=file>                    name, accept
Hidden field                 <input type=hidden>                  name, value
Reset button                 <input type=reset>                   value
Submit button                <input type=submit>                  name, value
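
To see several of these elements working together, here is a minimal sketch of a Perl CGI script that prints a small form. The action URL /cgi-bin/echo.pl and every field name are hypothetical, and the here-document syntax (<<'END_FORM') is simply a convenient way to quote a multi-line string without interpolation.

```perl
#!/usr/bin/perl
# Emit a small HTML form. The action URL and field names are made up.
$form = <<'END_FORM';
<form method="post" action="/cgi-bin/echo.pl">
  Name: <input type="text" name="username" size="20" maxlength="40">
  <br>
  Comments: <textarea name="comments" rows="4" cols="40"></textarea>
  <br>
  <input type="checkbox" name="subscribe" value="yes" checked> Subscribe
  <br>
  <input type="radio" name="level" value="A348" checked> A348
  <input type="radio" name="level" value="A548"> A548
  <br>
  <select name="topic" size="1">
    <option value="perl" selected>Perl
    <option value="html">HTML
  </select>
  <input type="hidden" name="lab" value="3">
  <br>
  <input type="submit" name="send" value="Send">
  <input type="reset" value="Clear">
</form>
END_FORM

print "Content-type: text/html\n\n";   # CGI header, then a blank line
print $form;
```

When the form is submitted, each name attribute becomes the key of a name/value pair delivered to the script named in action.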

Part II: More Perl

One thing that we will discuss this week will be pattern matching in Perl. In what follows the context is: summarizing information from files. We will start with filehandles, and describe pattern matching and regular expressions in the context of locating and extracting information that is read from the files.

1. Filehandles

A filehandle is just a name you give to a file, device, socket or pipe to help you remember which one you're talking about (also to hide the complexities of buffering and such). Internally, filehandles are similar to streams in C++ or Java.

You create a filehandle and attach it to a file by using the open function.

It takes two parameters: the filehandle and the filename.

Perl gives you some predefined (and preopened) filehandles: STDIN, STDOUT, and STDERR.

These filehandles are typically attached to your terminal but they may also be attached to other files or pipes. You can use the open function to create filehandles for various purposes (input, output, pipe-ing) so you need to specify what behaviour you want:
open (AB, "filename");                 # read from file
open (AB, "<filename");                # same, explicitly
open (AB, ">filename");                # create and write file
open (AB, ">>filename");               # append to file, creating it if needed
open (AB, "| output_pipe_command");    # set up an output filter
open (AB, "input_pipe_command |");     # set up an input filter
The name you pick for the filehandle is arbitrary. Once opened, the filehandle can be used to access the file or pipe until explicitly closed, which you can do with close.

Once a filehandle is open for reading you can read lines from it just as you can read from standard input with STDIN.

So, for example, to read lines from a file specified in the command line:

open (AB, $ARGV[0]) || die "Cannot open $ARGV[0]: $!";
while ($x = <AB>) {
  print $x; 
} 
close(AB); 
The fragment above just lists the lines in the specified file and is therefore, for all practical purposes, equivalent to the Unix cat command. Note that the newly opened filehandle is used inside the angle brackets just as we have used STDIN previously.

Also, note that to make the program completely equivalent to cat we'd have to process all arguments passed to it on the command line, like this:

foreach $argv (@ARGV) {
  open (AB, $argv) || die "Cannot open $argv: $!";
  while ($x = <AB>) {
    print $x; 
  } 
  close(AB); 
}
If you have a filehandle open for writing or appending, and if you want to print to it, you must place the filehandle immediately after the print keyword and before the other arguments. No comma should occur between the filehandle and the rest of the arguments. (I personally never remember this so this is the first compile error I have to fix).
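
The rule about print is easiest to remember from an example. The following sketch writes to a scratch file, appends to it, and reads it back; the file name lab3_demo.txt and the filehandle names OUT and IN are arbitrary choices for the illustration.

```perl
#!/usr/bin/perl
$file = "lab3_demo.txt";     # hypothetical scratch file

open (OUT, ">$file") || die "Cannot create $file: $!";
print OUT "first line\n";    # filehandle first, and no comma after it
print OUT "second line\n";
close (OUT);

open (OUT, ">>$file") || die "Cannot append to $file: $!";
print OUT "third line\n";
close (OUT);

open (IN, $file) || die "Cannot read $file: $!";
$contents = "";
while ($x = <IN>) {
  $contents .= $x;           # collect each line as it is read
}
close (IN);
unlink ($file);              # remove the scratch file

print $contents;
```

Checking the result of open with || die reports the reason for a failure (held in $!) instead of silently reading nothing.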

When you read from a filehandle you can specify either a scalar context (read one line which is then stored into the scalar variable that appears on the left)

$x = <AB>;
or a list context:
@x = <AB>; 
which reads all the lines from AB and places them in
$x[0], $x[1],... $x[$#x]. 
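
The two contexts can be mixed on the same filehandle, as this sketch shows. The file name lab3_lines.txt is made up, and chomp is a built-in function, not discussed above, that removes the trailing newline from a string.

```perl
#!/usr/bin/perl
$file = "lab3_lines.txt";    # hypothetical scratch file

open (OUT, ">$file") || die "Cannot create $file: $!";
print OUT "alpha\nbeta\ngamma\n";
close (OUT);

open (AB, $file) || die "Cannot read $file: $!";
$first = <AB>;               # scalar context: reads one line, "alpha\n"
@rest  = <AB>;               # list context: reads all the remaining lines
close (AB);
unlink ($file);              # remove the scratch file

chomp ($first);              # strip the trailing newline
$count = @rest;              # an array in scalar context gives its size

print "first line: $first\n";          # prints "first line: alpha"
print "remaining lines: $count\n";     # prints "remaining lines: 2"
print "last line: $rest[$#rest]";      # prints "last line: gamma"
```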

2. Exact pattern matching

The =~ operator is used for pattern matching.

The pattern itself is specified between leaning toothpicks, or slashes.

$x =~ /foo/; 
is a statement that checks whether the string $x contains the pattern foo. The match returns a true or false value (1 or the empty string), so it can be used as a condition in an if statement.

$x =~ /foo/i 
does the same thing but ignores case.

So this is how we locate patterns.

If we locate them we could also replace them, and we do that with the s operator.

For example,

$x =~ s/foo/bar/; 
replaces the first occurrence of foo with bar in $x.

$x =~ s/foo/bar/g; 
performs a global replacement of all occurrences of foo with bar in $x (if any exist).
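
The operators from this section can be tried together on one string. This is a minimal sketch with made-up sample text; it also relies on the fact that s/// returns the number of substitutions it made.

```perl
#!/usr/bin/perl
$x = "FOO, foo and food";
$y = "FOO only";

$exact   = ($y =~ /foo/)  ? "match" : "no match";   # case matters
$anycase = ($y =~ /foo/i) ? "match" : "no match";   # case ignored
print "exact: $exact, ignoring case: $anycase\n";
# prints "exact: no match, ignoring case: match"

$once = $x;
$once =~ s/foo/bar/;         # first occurrence only
print "$once\n";             # prints "FOO, bar and food"

$all = $x;
$all =~ s/foo/bar/g;         # every occurrence, even inside "food"
print "$all\n";              # prints "FOO, bar and bard"

$both = $x;
$count = ($both =~ s/foo/bar/gi);   # the modifiers can be combined
print "$both ($count changes)\n";   # prints "bar, bar and bard (3 changes)"
```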

3. Regular expressions

This will have to wait until next lab.


Last updated on Jan 25, 2001, by Adrian German for A348/A548