CS 1501 Programming Assignment 1

Anagrams: So Dark The Con of Man

 


Overview: Anagrams played a central role in the book The Da Vinci Code. The most famous was probably So Dark The Con of Man as an anagram of Madonna of the Rocks. Note that for our purposes, an anagram means  that the two phrases have exactly the same number of occurrences for each letter. We do not care about the number of spaces. We will also ignore punctuation. You are going to write two programs, of increasing sophistication. Each program will read a line from standard input, and write the valid anagrams to standard output. An anagram is valid if each word in the anagram can be found in the official dictionary for this assignment 5desk.txt. Each valid anagram should only be output once. The valid anagrams should be output in alphabetical order. To help you get started, Dictionary.java is code to read the dictionary into an array of strings, and SortAndPrint.java is code to sort and print out an array of strings in alphabetical order.

 

Program 1: The program should read the dictionary into an array of strings, and then read in the input string from standard input. Your array should be unordered, that is, the order in the array should be the same as the order in the file. The program should then generate possible anagrams using a variation of the permutation generation code in chapter 44 of the Sedgewick text. The program must cut off the search when the current candidate word is not a prefix of any valid word in your dictionary. So you will need to implement two search methods for your dictionary: one to determine if a string is a word in the dictionary, and one to determine if a string is a prefix of a word in the dictionary. You should implement these search methods using linear search.

Program 2: This program should implement the two dictionary search methods using a Patricia tree. Patricia trees are discussed in section 17 of the Sedgewick text. You use the code in the text as a starting point, or Kirk’s pseudo-code, or start from scratch on your own. You may not copy Patricia code from any other source. 

 

Test Inputs: Run your input on the following Da Vinci Paintings:

·         Mona Lisa

·         Leda and the Swan

·         The Virgin and Child with Saint Anne and Saint John the Baptist

We may also test your code on an input we generate.

 

Experimentation: The following table gives the frequency of letter occurrences English words.

 

    

# of Occurrences

e

42689

11.74%

i

31450

8.65%

s

29639

8.15%

a

28965

7.97%

r

27045

7.44%

n

26975

7.42%

t

24599

6.76%

o

21588

5.94%

l

19471

5.35%

c

15002

4.13%

d

13849

3.81%

u

11715

3.22%

g

10339

2.84%

p

10063

2.77%

m

9803

2.70%

h

7808

2.15%

b

7368

2.03%

y

6005

1.65%

f

4926

1.35%

v

3971

1.09%

k

3209

0.88%

w

3073

0.85%

z

1631

0.45%

x

1053

0.29%

j

727

0.20%

q

682

0.19%

MakeString.java  is a program to generate strings of letters, of a specified size k, such that each letter is generated independently with the above distribution  Hypothesize what you would expected the running time (ignoring multiplicative constants) to be as a function of k if input string is generated by the supplied program. Test your hypothesis by running your programs on many strings. For each your three programs, fill in the following table. If a program will take too long for a particular input size, then provide an estimate.

 

Time in Seconds/Minutes/Hours/Days/Years as appropriate

 

Program 1

 Program 2

K=4

 

 

K=8

 

 

K=16

 

 

K=32

 

 

K=64

 

 

K=128

 

 

 

 

Write-up: Write a report describing

 

This write-up can be rather brief and informal for non-W students, say on the order of 1 or 2 pages. W-students should provide more extensive and polished reports, say more like 4-6 pages.

 

Examples write-ups from the Boggle Assignment from last Spring can be found here, here, and here.

 

What your should turn in: Your program source, your executable, your report, the output of your program on the sample input, and your assignment information sheet.