CS 1501

Algorithm Implementations

Programming Project 1 Part B

(Prior to reading this, be sure to thoroughly read and understand Part A)

 

Online: Friday, May 26, 2006

Due (BOTH Parts A and B): Submit all assignment materials to the appropriate directory of the submission site by 11:59 PM on Wednesday, June 7, 2006.  The materials are: 1) All source files of your program, 2) All .class files (or a .jar file containing them), 3) A well-written, well-formatted paper explaining your search algorithm and results (see Part B for details on the paper), and 4) The Assignment Information Sheet.  Note: Do NOT submit the dictionary file or the input files.

Late Due Date: 11:59 PM on Friday, June 9, 2006.

 

Background:

In Part A of this assignment, you completed a recursive, backtracking solution to filling in the squares of a crossword puzzle (see Part A for details).  To do this you utilized the DictInterface interface and the MyDictionary class, both of which were provided for you.  Unfortunately, the MyDictionary class implements the DictInterface in a somewhat primitive and inefficient way, utilizing a linear search of the dictionary array.  This can lead to long run-times when many searches are required (as in the crossword problem).

 

Consider de la Briandais (DLB) trees, which we discussed in lecture.  Since these trees allow a string to be tested as a word or as a prefix in the dictionary in (typically) time proportional to the length of the string, they appear to be a superior way to implement the DictInterface interface.  In Part B of this assignment you will do the following:

 

1)      Implement a DLB as a class, as discussed in lecture and shown in the online notes.  This requires the "nodelets" within the DLB to contain a character field, a child reference and a sibling reference. Your class(es) should be written in reasonably good object-oriented style.  You may have a number of methods in your DLB class, but it must minimally implement the DictInterface as specified.  Verify that this implementation works by substituting it for the MyDictionary class in your crossword filling program.[1]

2)      Compare the execution of your original crossword solution to that using the DLB.  Do this by informally timing the two for various test files.  To informally time the programs, run them while watching your clock/watch/etc.  We are not worried about exact times here – just orders of magnitude.  Based on your comparisons of the various test files, determine if there is a significant difference in the run-times.  Below are some additional notes about the comparisons:

a)      Your two main programs should differ ONLY in the class storing the dictionary (MyDictionary vs. DLB).  In all other aspects the two programs should be identical.  In fact, if you prefer, you can use one program with the dictionary choice as an input.

b)      Some test files are online now, and a few more will be online soon – check back for updates.

c)      For some test files, one or perhaps both of the programs will take several minutes, or perhaps hours or even days.  Based on some worst-case analysis and some real timing of the smaller files, you should be able to get an idea of how long the larger files will take to run.  If a program takes more than a few hours to run, you can abort the execution.

3)      Once you have completed your comparison runs, write a short paper (2-3 pages, double-spaced) that summarizes your project in the following ways:

a)      Discuss how you solved the crossword-filling problem in some detail.  Include both how you set up the data structures necessary for the problem and how your algorithm proceeded.  Also indicate any coding / debugging issues you faced and how you resolved them.  If you were not able to get the program to work correctly, still include your approach and speculate as to what still needs to be corrected.

b)      Discuss the differences (if any) between the run-times of the program using MyDictionary and the program using the DLB.  Include a table of the (approximate) run-times of both programs for the various files.  Include in this discussion an asymptotic analysis of the worst-case run-time for each version of the program.  Some values to consider in this analysis may include:

i)        Number of words in the dictionary

ii)       Number of characters in a word

iii)     Number of possible letters in a crossword location

iv)     Number of crossword locations in the puzzle

If you were unable to complete the crossword solving program, estimate the actual run-times (using some intelligent guessing), but still include the comparison in your paper.

4)      W SECTION ONLY: In addition to the paper requirements in 3) above, you must add a section that compares in detail the MyDictionary implementation of DictInterface to the DLB implementation.  Discuss how the data is stored, how a search is done, and how long this will take.  In particular, elaborate on how (if at all) the DLB is superior to the MyDictionary in testing for prefixes.

a)      Note: The paper will count more toward the overall project grade for the W section than for the regular section.  It will also be graded more strictly for spelling / grammar / etc. for the W section than for the regular section.  However, the W section paper will be evaluated, returned, revised and resubmitted before the overall paper grade is determined.
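Watching a clock is enough for the comparison in step 2.  If you would rather have the program report the elapsed time itself, the following is a minimal sketch of one way to do it.  The RunTimer class and its timeMillis method are hypothetical names used only for illustration (they are not part of the provided files), and the workload in main is a placeholder for the call into your own crossword-filling code.

```java
// Hypothetical helper for illustration only; not part of the provided files.
final class RunTimer {
    /** Runs the task once and returns the elapsed wall-clock time in milliseconds. */
    static long timeMillis(Runnable task) {
        long start = System.nanoTime();
        task.run();
        return (System.nanoTime() - start) / 1_000_000L;
    }

    public static void main(String[] args) {
        // Placeholder workload: replace the lambda body with the call that
        // fills your crossword board using the dictionary under test.
        long elapsed = timeMillis(() -> {
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < 100_000; i++) {
                sb.append((char) ('a' + i % 26));
            }
        });
        System.out.println("Elapsed: " + elapsed + " ms");
    }
}
```

A single run like this is only a rough measurement, which matches the order-of-magnitude goal of the comparison.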

 

Important Notes:

➢      Be sure to also complete Part A of this Assignment!

➢      Be sure to thoroughly document your code.

➢      This assignment was devised so that Part A and Part B could be implemented independently.  If you are unable to complete one part, don't let that prevent you from completing the other.  Also, be sure to try each and submit something so that you can get some partial credit.

➢      If you are interested in some extra credit for this assignment, here are some possibilities:

        ➢      Create a third implementation of the DictInterface and compare its performance to that of the other two implementations.  One possibility is to use binary search of a sorted array.  However, searching for a prefix is tricky if binary search is used, so be careful with this approach (i.e. make sure it is correct).

        ➢      Do more detailed analysis of the run-times of the two different DictInterface implementations.  Add timing to your program and do multiple runs to obtain (reasonably) accurate values of the actual run-times, and create a graph to show how the run-times increase as the crossword board size increases.
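To illustrate why the prefix test is the tricky part of the binary search option, here is a minimal sketch under some stated assumptions.  The class name SortedArrayDict and the methods isWord and isPrefix are hypothetical (they are not the DictInterface methods from Part A, which you would still need to implement), the array is assumed to hold no duplicates after construction, and "prefix" here means a proper prefix of at least one word.  The idea is that in sorted order every word beginning with a given string sits immediately at or after that string's position, so only one neighbor needs to be examined.

```java
import java.util.Arrays;

// Hypothetical sketch; adapt the method names and return values to DictInterface.
final class SortedArrayDict {
    private final String[] words; // sorted ascending, duplicates removed

    SortedArrayDict(String[] input) {
        words = Arrays.stream(input).distinct().sorted().toArray(String[]::new);
    }

    /** True if key is stored as a complete word. */
    boolean isWord(String key) {
        return Arrays.binarySearch(words, key) >= 0;
    }

    /** True if key is a proper prefix of at least one stored word. */
    boolean isPrefix(String key) {
        int pos = Arrays.binarySearch(words, key);
        // If key was found, the only candidate is the next word.
        // If not, binarySearch returns -(insertionPoint) - 1, and the
        // candidate is the word at the insertion point.
        int candidate = (pos >= 0) ? pos + 1 : -(pos + 1);
        return candidate < words.length && words[candidate].startsWith(key);
    }
}
```

Checking only one neighbor works because any longer word that starts with key sorts directly after key, before every word that does not start with it.  If your array could contain duplicates, the "next word" step would need to skip past them.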

 



[1] If you were unable to get the crossword algorithm to work correctly, you need to test your DLB tree in some other way so that you can still get credit for it.  I recommend writing a simple test program to fill the dictionary, and then look up both words and prefixes to verify that your DLB works correctly.
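As a rough illustration of such a test program, here is a minimal sketch of a DLB built from nodelets with a character field, a child reference and a sibling reference, followed by a few lookups.  Several things here are assumptions made for the example and not requirements of the assignment: the class name DlbSketch, the methods isWord and isPrefix (your class must implement the DictInterface methods from Part A instead), the use of '$' as an end-of-word terminator (which assumes '$' never appears in a dictionary word), and the tiny hard-coded word list standing in for the dictionary file.

```java
// Hypothetical sketch for illustration; names and terminator are assumptions.
final class DlbSketch {
    private static final char END = '$'; // assumed end-of-word marker

    private static final class Nodelet {
        final char value;
        Nodelet child;   // first nodelet on the next level down
        Nodelet sibling; // next alternative character on this level
        Nodelet(char value) { this.value = value; }
    }

    // Dummy root; root.child heads the sibling list of first letters.
    private final Nodelet root = new Nodelet('\0');

    /** Inserts a word, appending the terminator to mark where it ends. */
    void add(String word) {
        Nodelet cur = root;
        String s = word + END;
        for (int i = 0; i < s.length(); i++) {
            Nodelet next = findChild(cur, s.charAt(i));
            if (next == null) {
                next = new Nodelet(s.charAt(i));
                next.sibling = cur.child; // push onto front of sibling list
                cur.child = next;
            }
            cur = next;
        }
    }

    /** Scans the sibling list below parent for character c. */
    private static Nodelet findChild(Nodelet parent, char c) {
        for (Nodelet n = parent.child; n != null; n = n.sibling) {
            if (n.value == c) return n;
        }
        return null;
    }

    /** Follows s one character per level; returns null if the path breaks. */
    private Nodelet walk(String s) {
        Nodelet cur = root;
        for (int i = 0; i < s.length() && cur != null; i++) {
            cur = findChild(cur, s.charAt(i));
        }
        return cur;
    }

    /** True if s was added as a complete word. */
    boolean isWord(String s) {
        Nodelet end = walk(s);
        return end != null && findChild(end, END) != null;
    }

    /** True if s is a proper prefix of at least one stored word. */
    boolean isPrefix(String s) {
        Nodelet end = walk(s);
        if (end == null) return false;
        for (Nodelet n = end.child; n != null; n = n.sibling) {
            if (n.value != END) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        DlbSketch dict = new DlbSketch();
        for (String w : new String[] {"car", "cart", "cat", "dog"}) {
            dict.add(w);
        }
        for (String probe : new String[] {"ca", "car", "cart", "do", "cow"}) {
            System.out.println(probe + ": word=" + dict.isWord(probe)
                    + ", prefix=" + dict.isPrefix(probe));
        }
    }
}
```

A real test should load the actual dictionary file and include strings that are words only, prefixes only, both, and neither, so that every possible result of a search is exercised.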