CS
1501 Programming Assignment 1
Anagrams: So Dark The Con of Man
Overview: Anagrams played a central role in the book The Da Vinci Code. The
most famous was probably So Dark The Con
of Man as an anagram of Madonna of the Rocks.
Note that for our purposes, an anagram means
that the two phrases have exactly the same number of occurrences for
each letter. We do not care about the number of spaces. We will also ignore
punctuation. You are going to write two programs, of increasing sophistication.
Each program will read a line from standard input, and write the valid anagrams
to standard output. An anagram is valid if each word in the anagram can be
found in the official dictionary for this assignment 5desk.txt.
Each valid anagram should only be output once. The valid anagrams should be
output in alphabetical order. To help you get started, Dictionary.java
is code to read the dictionary into an array of strings, and SortAndPrint.java is code to sort and print out an
array of strings in alphabetical order.
Program 1: The program should read the dictionary into an array of strings, and then read in the input string from standard input. Your array should be unordered, that is, the order in the array should be the same as the order in the file. The program should then generate possible anagrams using a variation of the permutation generation code in chapter 44 of the Sedgewick text. The program must cut off the search when the current candidate word is not a prefix of any valid word in your dictionary. So you will need to implement two search methods for your dictionary: one to determine if a string is a word in the dictionary, and one to determine if a string is a prefix of a word in the dictionary. You should implement these search methods using linear search.
Program 2: This program should implement the two dictionary search methods using a Patricia tree. Patricia trees are discussed in section 17 of the Sedgewick text. You use the code in the text as a starting point, or Kirk’s pseudo-code, or start from scratch on your own. You may not copy Patricia code from any other source.
Test Inputs: Run your input on the following Da Vinci Paintings:
· The Virgin and Child with Saint Anne and Saint John the Baptist
We may also test your code on an input we generate.
Experimentation: The following table gives the frequency of letter occurrences English words.
|
# of Occurrences |
|
e |
42689 |
11.74% |
i |
31450 |
8.65% |
s |
29639 |
8.15% |
a |
28965 |
7.97% |
r |
27045 |
7.44% |
n |
26975 |
7.42% |
t |
24599 |
6.76% |
o |
21588 |
5.94% |
l |
19471 |
5.35% |
c |
15002 |
4.13% |
d |
13849 |
3.81% |
u |
11715 |
3.22% |
g |
10339 |
2.84% |
p |
10063 |
2.77% |
m |
9803 |
2.70% |
h |
7808 |
2.15% |
b |
7368 |
2.03% |
y |
6005 |
1.65% |
f |
4926 |
1.35% |
v |
3971 |
1.09% |
k |
3209 |
0.88% |
w |
3073 |
0.85% |
z |
1631 |
0.45% |
x |
1053 |
0.29% |
j |
727 |
0.20% |
q |
682 |
0.19% |
MakeString.java is a program to generate strings of letters, of a specified size k, such that each letter is generated independently with the above distribution Hypothesize what you would expected the running time (ignoring multiplicative constants) to be as a function of k if input string is generated by the supplied program. Test your hypothesis by running your programs on many strings. For each your three programs, fill in the following table. If a program will take too long for a particular input size, then provide an estimate.
Time in Seconds/Minutes/Hours/Days/Years as appropriate |
||
|
Program 1 |
Program 2 |
K=4 |
|
|
K=8 |
|
|
K=16 |
|
|
K=32 |
|
|
K=64 |
|
|
K=128 |
|
|
Write-up: Write a report describing
This write-up can be rather brief and informal for non-W students, say on the order of 1 or 2 pages. W-students should provide more extensive and polished reports, say more like 4-6 pages.
Examples write-ups from the Boggle Assignment from last Spring can be found here, here, and here.
What your should turn in: Your program source, your executable, your report, the output of your program on the sample input, and your assignment information sheet.