On this page:
1 Processing prefix trees
2 Building prefix trees
3 Challenge
8.5

Problem set 10: Prefix trees

This assignment is due on Wednesday, April 6 at 11:59pm. Submit it using Handin as assignment ps10. Corrections can be presented until Friday, April 29.

This problem set builds on the skills that we introduced in Lectures 2−9, 11−19 and 21. To encourage you to review those lectures, your grade on this assignment will be limited by your grade on those lectures at the time you submit (or correct) this assignment. Remember that most lecture exercises can be resubmitted at any time.

Live by the design recipe

Remember to follow the design recipe for each function, even if defined locally. In particular, every type mentioned in a signature must be introduced by a data definition, except for these well-known types: Number, Image, String, Color, Boolean, KeyEvent, MouseEvent, Anything, Posn, NaturalNumber, ListOfNumbers, 1String, [ListOf X], [NEListOf X].

Use the “Intermediate Student with lambda” language on this and further assignments in this course.

Generative recursion is not in this problem set.

1 Processing prefix trees

When data is organized in an unstructured list and we want to search for an element, our last resort is to inspect every item until we find what we’re looking for. This search can be slow if there are many items. For example, count-word in Problem set 7: Lists is slow, so counting the words in Hamlet took a long time.

Prefix trees are one way to organize data to help search for an element. An example of a prefix tree is all of the entries in a printed dictionary whose headwords start with the same letter. When you are looking for the definition of a word in such a dictionary, it suffices to go letter by letter in alphabetical order.

; A PrefixForest is a [NEListOf PrefixTree]
 
; A PrefixTree is one of:
; - (make-end)
; - (make-initial 1String PrefixForest)
 
(define-struct end [])
(define-struct initial [letter forest])
 
; A 1String a String of length 1

An example of a PrefixForest is:
(define forest1
  (list
   (make-initial "n"
                 (list
                  (make-end)
                  (make-initial "e" (list (make-end)))))
   (make-initial "f"
                 (list
                  (make-end)
                  (make-initial "f" (list (make-end)))
                  (make-initial "t" (list (make-end)))))
   (make-initial "r"
                 (list
                  (make-initial "c" (list (make-end)))
                  (make-initial "e" (list (make-end)))))))
This forest stores the strings "n", "ne", "f", "ff", "ft", "rc", and "re". It does not store any other string, such as "r" or the empty string "".

An example of a PrefixTree is:
(define tree1 (make-initial "o" forest1))
This tree is displayed visually as:

This tree stores the strings "on", "one", "of", "off", "oft", "orc", and "ore". It does not store any other string, such as the string "o", the string "f", the string "ofr", the string "or", or the empty string "".

Take some time to familiarize yourself with this technical notion of what strings are stored by a given forest or tree. Use your finger to trace paths in the picture above, from the root “o” at the top to each leaf “Ø” at the bottom. There is no itinerary that gives the string "ofr" or the string "or". Moreover, a string stored by a sub-forest or sub-tree is not necessarily stored by the overall forest or tree! For example, the strings "n" and "ne" are stored by forest1 and (first forest1) but not by tree1. And even though every forest or tree contains (make-end), the empty string "" is stored by (make-end) but not by every forest or tree.

A PrefixForest should never contain multiple initials with the same letter. You can make this assumption when your code receives a PrefixForest in its input, and you must maintain this guarantee when your code produces a PrefixForest in its output. For example, in tree1 above, the letters "n" "f" "r" near the top have to be all different, but it is ok that there is one "f" above another "f", and it is ok that two "e"s are cousins of each other.

Exercise 1. Define two different examples of PrefixTrees (called tree2 and tree3) and two different examples of PrefixForests (called forest2 and forest3).

Hint: A tree is not a forest, so don’t use a tree as the initial-forest of another tree. A forest is a non-empty list of trees, not of forests, so don’t put one list directly inside another list. Every time you write a tree or forest in this assignment, it helps to draw a picture of it, carefully following the pattern in our picture above. Then ask: what strings are stored?

Exercise 2. Write the templates process-tree and process-forest for functions that process a PrefixTree and a PrefixForest. Make sure you get these templates right; feel free to check with any instructor.

Hint: If you need a refresher about writing processing templates, review the video above Lecture 12: Unlimited data Exercise 3, and/or the video above Lecture 11: More points Exercise 1. Also, the template for processing a [NEListOf PrefixTree] is similar to (but not quite the same as) the template for processing a [NEListOf String] that you wrote and followed in Problem set 8: Abstraction.

Advice for all exercises below: Follow the templates. → When writing or debugging a case in a function definition, refer to examples for that case. → Whenever you make a function call (especially a recursive function call provided by a template), trust that it returns what it should, but write down this trust as a smaller check-expect. → Repeatedly doing this guarantees that you’ll find the cause of every bug! In short, use the table method.

Exercise 3. Design functions tree-size and forest-size which report how many strings are stored in a given PrefixTree or PrefixForest. This count is the number of ends (not initials) in the input. For example, tree1 above stores 7 strings.

Hint: Write plenty of examples with all kinds of inputs, including forest1 and tree1 above. Then, follow the template for processing the input.

Exercise 4. Design functions tree->list and forest->list which take a PrefixTree or PrefixForest and returns a [NEListOf String] containing all of the strings stored in the input tree or forest. An end stores just the empty string "", whereas an initial stores strings that begin with its letter.

Hint: Write plenty of examples with all kinds of inputs, including forest1 and tree1 above. Then, follow the template for processing the input!

2 Building prefix trees

We write Word to mean a list of 1Strings.
; a Word is a [ListOf 1String]
Feel free to use explode in your tests to easily convert a string to a Word.

Exercise 5. Design a function word->tree which takes a Word and returns a PrefixTree which stores just that word. Be sure the result matches the data definition of PrefixTree. For instance, below is an incorrect test:
(check-expect (word->tree (explode "hi"))
              ; The following is wrong. It's not a PrefixTree!
              (make-initial "h" (make-initial "i" (make-end))))
Hint: If your tree->list function is correct, then the purpose of word->tree is to make sure (tree->list (word->tree (explode "hi"))) yields (list "hi").

Exercise 6. Design functions word-in-tree? and word-in-forest? which determine whether or not a given Word is stored in a given PrefixTree or PrefixForest.

Note: In this exercise and in the rest of this problem set, do not use tree->list or forest->list from above (except possibly for testing). Instead, follow the template for processing the input.

Hint: If your tree->list and forest->list functions are correct, then the purpose of word-in-tree? and word-in-forest? is to determine whether or not a given Word appears (as a string) in the tree->list of the given tree or in the forest->list of the given forest.

Exercise 7. Design functions add-to-tree and add-to-forest.

add-to-forest takes a Word and a PrefixForest and adds the word to the forest, returning a new forest. Again, a PrefixForest should never contain multiple initials with the same letter.

add-to-tree takes a Word and a PrefixTree and adds the word to the tree, returning a new tree. The given word must be non-empty, and the given tree must be an initial whose letter matches the first letter of the given word. Thanks to these restrictions on what calls to add-to-tree are allowed, the definition of add-to-tree can be short and contain no cond.

Hint: If your tree->list and forest->list functions are correct, then the purpose of add-to-tree and add-to-forest is to produce a tree or forest whose tree->list or forest->list has one more string than the tree->list or forest->list of the input tree or forest.

Exercise 8. Design a function list->forest which takes a [NEListOf String] and returns a PrefixForest that stores the given words (given as strings this time).

Hint: follow the template for processing [NEListOf String] and use explode, word->tree, and add-to-forest.

3 Challenge

Exercise 9. Design a function alphabetize which sorts all of the initials in a PrefixForest in alphabetical order according to their letter and which puts all of the ends first. (In other words, for the comparison function, a tree which is an end is less than any initial.) alphabetize should also sort the forest associated to any initial in the input forest. Feel free to use sort and string comparison functions such as string<?.

Exercise 10. What happens to a [NEListOf String] if you feed it to list->forest, then alphabetize, then forest->list?