1. The Homework

The prototype for the homework is here.

It is useful to develop your homework in stages, as with any project regardless of its size. Let's split it into stages.

1.1 Write a script that, when called with GET, produces the form displayed by the prototype. You can save the form in a file and look at it with an editor, or you can view the HTML source online. Make the script reply to POST with a message that says whether it was asked to summarize the log by dates or by hosts (depending on which radio button the user selected).

1.2 Write a program that opens the access_log file and extracts (identifies and prints out, each on a new line) only the host information or the time information based on a switch that you specify on the command line (one of -host or -day). Use two functions for this, which are called from the main program depending on the value of the switch. This way you develop and test two functions to be used later and you need to wrap them in a main program with command line switches for testing purposes only.

1.3 Augment the two functions so that the host or date occurrences are counted. Use a hash (associative array) for that: the strings that your program listed in 1.2 will now be used as keys that index the bins in the hash. At the end, write out the contents of the corresponding associative array (either the one for hosts or the one for dates).

1.4 Make the program list the frequencies stored in the hashes in sorted order (ascending or descending). Now you're ready to use these functions in the script developed in 1.1 (simply call one or the other based on the radio button that the user selected).

At this point you're done. Let me know the URL or index it visibly on one of your pages on your server and send me a message so that I can look at it.

Hints for each of the stages are located at the end of the file.

2. Pattern Matching Exercises

2.1 Start from the program that reads from a file and prints its lines one by one.
#!/usr/bin/perl
open (AB, $ARGV[0]) or die "Cannot open $ARGV[0]: $!\n"; 
while ($line = <AB>) {
  print $line; 
} 
close(AB); 
Change the program to print only non-empty lines.
/^\s*$/
# ignore lines with this pattern 
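
As a minimal sketch of how this pattern can be used, the helper below (the name non_empty_lines is hypothetical, not part of the exercise) keeps only the lines that contain at least one non-whitespace character:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return only the lines that are not empty.
# A line made up solely of whitespace counts as empty.
sub non_empty_lines {
    my @lines = @_;
    return grep { !/^\s*$/ } @lines;
}

my @kept = non_empty_lines("first\n", "\n", "   \n", "second\n");
print @kept;    # prints the two non-blank lines
```

In the file-reading loop the same test can be written as: next if $line =~ /^\s*$/;
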
Identify the words that start with a and put *( )* around them.
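s/\b(a\w*)/*($1)*/g
# \b anchors the match at a word boundary, so an a in the
#    middle of a word (as in banana) is not picked up 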
# parens group things on the left side 
#    and place them in $1, $2, ... 
# parens are regular chars on the right 
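
The sketch below applies such a substitution to a sample string. The function name mark_a_words is hypothetical, and the \b word-boundary anchor is used so that only words that begin with a are wrapped:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: wrap every word that starts with a lowercase "a"
# in *( )*. \b makes sure the "a" is the first character of a word.
sub mark_a_words {
    my ($line) = @_;
    $line =~ s/\b(a\w*)/*($1)*/g;
    return $line;
}

print mark_a_words("an apple and a banana\n");
# prints: *(an)* *(apple)* *(and)* *(a)* banana
```
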
Replace all strings of numeric characters with <NUMBER>
s/\d+/<NUMBER>/g; 
Have the program change all characters to uppercase.
while ($line =~ /([a-z])/) {
  $lcletter = $1; 
  $ucletter = uc $lcletter; 
  $line =~ s/$lcletter/$ucletter/g; 
} 
Change only sequences longer than 3 characters to uppercase.
while ($line =~ /([a-z]{4,})/) { # {4,} means four or more, i.e. longer than 3
  $lcgroup = $1; 
  # now uppercase lcgroup and 
  # make the substitution as before 
} 
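
One way to finish this exercise without a loop is a single substitution that uses the \U escape to uppercase the captured group. This is only a sketch: the function name upcase_long_runs is hypothetical, and "longer than 3" is taken to mean runs of four or more lowercase letters.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: uppercase every run of four or more lowercase
# letters and leave shorter runs untouched.
sub upcase_long_runs {
    my ($line) = @_;
    $line =~ s/([a-z]{4,})/\U$1/g;   # \U uppercases the text that follows
    return $line;
}

print upcase_long_runs("the quick brown fox\n");
# prints: the QUICK BROWN fox
```
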
3. Associative Arrays and DBM files Exercises

3.1 Start from this program:
#!/usr/bin/perl
%myHash = (
  'llama' => 'South America',
  'camel' => 'Middle East',
  'bobac' => 'Central Asia'
          ); 
foreach $key (keys %myHash) {
  print $key, " can be found in ", $myHash{$key}, "\n"; 
} 
Every time this program runs myHash is initialized and then listed.

If we want to add a new animal/habitat pair to our knowledge base we need to change the program source code.

DBM files let us keep associative arrays on disk (more permanent storage). To associate a DBM database with a hash, we use the dbmopen function:

dbmopen(%ARRAYNAME, "dbmfilename", $mode)
For example:
#!/usr/bin/perl
dbmopen(%myHash, "habitats", 0644)       # note: 0644 means rw-r--r--
  or die "Cannot open habitats: $!\n";   # note the leading 0 in 0644
if ($ARGV[0] =~ /^add$/i) { # no ambiguity
  $myHash{$ARGV[1]} = $ARGV[2]; 
} else {
  print "Unsupported operation $ARGV[0]\n"; 
} 
dbmclose(%myHash); 
This program associates %myHash with the habitats.* files on your disk (which are created if they don't exist already) with a permission mode of 0644 (which gives read/write access to the owner and read access to group and world users).

Look for habitats.dir and habitats.pag after you run this program (the exact file names depend on the DBM library your Perl uses).

Extend this program to accept such commands as:

Assignment statements, delete, hash indexing and foreach are, respectively, the ways in which you can implement them.

You can later extend your program to have the database passed as a parameter as well.
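
A minimal sketch of such a dispatcher follows. The command names (add, delete, query, list) and the function name run_command are hypothetical, chosen only to illustrate the four operations mentioned above. The function works on a hash reference, so it can be tried on an ordinary hash before it is pointed at the hash opened with dbmopen.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical dispatcher: $db is a reference to the hash, which in the
# real program would be the one associated with the DBM file by dbmopen.
sub run_command {
    my ($db, $command, $key, $value) = @_;
    $command = '' unless defined $command;
    if ($command =~ /^add$/i) {
        $db->{$key} = $value;                    # assignment statement
        return "added $key\n";
    } elsif ($command =~ /^delete$/i) {
        delete $db->{$key};                      # delete
        return "deleted $key\n";
    } elsif ($command =~ /^query$/i) {
        return exists($db->{$key})               # hash indexing
            ? "$key can be found in $db->{$key}\n"
            : "$key is not in the database\n";
    } elsif ($command =~ /^list$/i) {
        my $out = '';
        foreach my $k (sort keys %$db) {         # foreach over the keys
            $out .= "$k can be found in $db->{$k}\n";
        }
        return $out;
    }
    return "Unsupported operation $command\n";
}

my %habitats = ('llama' => 'South America');
print run_command(\%habitats, 'add', 'camel', 'Middle East');
print run_command(\%habitats, 'list');
```

In the DBM version the call would sit between dbmopen and dbmclose, for example print run_command(\%myHash, @ARGV);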

4. Homework Hints

Hints for 1.1

#!/usr/bin/perl
use CGI;
$query = new CGI;
print $query->header, $query->start_html; 
if ($query->request_method() eq 'GET') {
  &show_form;       # shows the form 
} elsif ($query->request_method() eq 'POST') {
  &process_request; # makes distinction between 
                    # summaries by host or by date 
} 
print $query->end_html; 

sub show_form {
  # definition of show_form
}

sub process_request {
  # definition of process_request 
} 
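
The body of process_request only has to turn the selected radio button into a message. A minimal sketch of that decision, kept free of CGI calls so it can be tested on its own, could look like this. The function name request_message, the field name summary, and the values hosts and dates are all assumptions; use whatever names your form defines.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: map the value of the radio button to a message.
# The values "hosts" and "dates" are assumed, not taken from the prototype.
sub request_message {
    my ($choice) = @_;
    $choice = '' unless defined $choice;
    return "Summary by hosts requested.\n" if $choice eq 'hosts';
    return "Summary by dates requested.\n" if $choice eq 'dates';
    return "No summary type selected.\n";
}

print request_message('hosts');

# Inside process_request this could be used as:
#   my $choice = $query->param('summary');
#   print request_message($choice);
```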

Hints for 1.2
Look for patterns such as these:
/^[\S]+/ # for hosts 
# that's: beginning of line (^) followed by non-space (\S) chars 

/\[\d+\/\w+\/1998/ # for dates 
# day (digits) slash (protected with \) month (word chars) slash 1998 
Note: as pointed out, the brackets in /^[\S]+/ are redundant, so /^\S+/ works just as well.

Note that we disambiguate the date pattern considerably by assuming that our script only summarizes accesses recorded in 1998.

Use parens to extract the strings of interest for later use in your associative arrays:

/^([\S]+)/ # for hosts 

/\[(\d+\/\w+)\/1998/ # for dates 
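
To see the two patterns at work, here is a small sketch that applies them to one made-up log line. The line, the host in it, and the function names extract_host and extract_day are invented for illustration and are not taken from the real access_log.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helpers: return the captured string, or undef when the
# line does not match.
sub extract_host {
    my ($line) = @_;
    return $line =~ /^(\S+)/ ? $1 : undef;
}

sub extract_day {
    my ($line) = @_;
    return $line =~ /\[(\d+\/\w+)\/1998/ ? $1 : undef;
}

# Made-up sample line in the usual access log layout.
my $sample = '127.0.0.1 - - [10/Oct/1998:13:55:36 -0700] "GET /index.html HTTP/1.0" 200 2326';

print extract_host($sample), "\n";   # prints: 127.0.0.1
print extract_day($sample), "\n";    # prints: 10/Oct
```

A test program for stage 1.2 can choose between the two functions by comparing $ARGV[0] with -host or -day.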

Hints for 1.3

It basically works like this: you're looking for strings that match a specified pattern. Once a string matches, the matched text is in $1, because you used parens in the pattern of the match operator.

You store this in a variable

$x = $1; 
and use it to increment by one the value associated with that key in an associative array
$hash{$x} += 1;

You can use one or two associative arrays to count the number of hits by date and by host; it's up to you.
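
Put together, the counting step can be sketched as follows. The function name count_matches and the three log lines are made up for illustration. The pattern is passed in as a parameter, so the same function serves both hosts and dates.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: count how often each captured string occurs.
# $pattern must contain one pair of capturing parens.
sub count_matches {
    my ($pattern, @lines) = @_;
    my %count;
    foreach my $line (@lines) {
        if ($line =~ $pattern) {
            my $x = $1;          # the string that matched
            $count{$x} += 1;     # one more hit for this key
        }
    }
    return %count;
}

# Made-up log lines, not taken from the real access_log.
my @log = (
    'a.example.com - - [10/Oct/1998:13:55:36 -0700] "GET / HTTP/1.0" 200 100',
    'b.example.com - - [10/Oct/1998:14:01:02 -0700] "GET / HTTP/1.0" 200 100',
    'a.example.com - - [11/Oct/1998:09:12:45 -0700] "GET / HTTP/1.0" 200 100',
);

my %by_host = count_matches(qr/^(\S+)/, @log);
my %by_day  = count_matches(qr/\[(\d+\/\w+)\/1998/, @log);

foreach my $key (sort keys %by_host) {
    print "$key $by_host{$key}\n";
}
```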

Hints for 1.4
Use this

foreach $key (sort { $hash{$a} <=> $hash{$b} } (keys %hash)) {
  # print the entry 
} 
or
foreach $key (sort my_routine (keys %hash)) {
  # print the entry 
} 

# ... and define my_routine as a sub

sub my_routine {
  return $hash{$a} <=> $hash{$b}; 
} 
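
Both loops above list the entries in ascending order of frequency. For descending order, swap $a and $b in the comparison. The sketch below does that on a made-up hash. The function name by_frequency_desc is hypothetical, and the tie-break on the key is an addition that keeps the output order stable.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return the keys of the hash sorted by descending
# count, with ties broken alphabetically by key.
sub by_frequency_desc {
    my ($h) = @_;
    return sort { $h->{$b} <=> $h->{$a} || $a cmp $b } keys %$h;
}

# Made-up counts for illustration.
my %hits = ('a.example.com' => 3, 'b.example.com' => 7, 'c.example.com' => 3);

foreach my $key (by_frequency_desc(\%hits)) {
    print "$key $hits{$key}\n";
}
# prints:
# b.example.com 7
# a.example.com 3
# c.example.com 3
```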

Remember that you can develop this in a group or team but each member of the team needs to have the assignment installed on her/his server.