CS 1501

Algorithm Implementations

Programming Project 1 Part A

(see Part B online next week)

 

Online: Saturday, May 20, 2006

Due (BOTH Parts A and B): 11:59 PM on Wednesday, June 7, 2006.  Submit all assignment materials to the appropriate directory of the submission site:

1) All source files of the program

2) All .class files (or a .jar file containing them)

3) A well written and formatted paper explaining your search algorithm and results (see Part B for details on the paper)

4) The Assignment Information Sheet

Note: Do NOT submit the dictionary file or the input files.

Late Due Date: 11:59 PM on Friday, June 9, 2006.

 

Background:

Crossword puzzles are challenging games that test both our vocabularies and our reasoning skills.  However, creating a legal crossword puzzle is not a trivial task.  This is because the words both across and down must be legal, and the choice of a word in one direction restricts the possibilities of words in the other direction.  This restriction progresses recursively, so that some word choices "early" in the board could make it impossible to complete the board successfully.  For example, look at the simple crossword puzzle below (note: no letter Xs appear in the actual puzzle shown below – in this example X is always used as a variable):

 

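L  E  N  S      L  E  N  S      L  E  N  S      L  E  N  S
-  X1 X2 +      -  Q  U  A      -  Q  U  A      -  T  O  N
+  +  -  +      D  U  -  B      D  U  -  B      A  C  -  O
+  +  +  +      C  X3 +  +      E  X3 +  +      T  H  A  W

Figure 1        Figure 2        Figure 3        Figure 4

(In these figures, - marks a solid square and + marks a square that has not been filled yet.)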

 

Assume that LENS has been selected for the first row of the puzzle, as shown in Figure 1 above.  Now the word in column 2 must begin with an E, the word in column 3 must begin with an N, and the word in column 4 must begin with an S (single-character words are not considered here, so the L in column 1 is irrelevant to the rest of the puzzle).  There are many ways to proceed from this point, and finding a good way is part of the assignment.  However, if we are proceeding character by character in a row-wise fashion, we now need a letter X1 such that EX1 is a valid prefix of a word.  Several letters meet this criterion (EA, EB and EC are all valid prefixes, just to pick the first three letters of the alphabet).  Once a possibility is selected, there are two restrictions on the next character X2: NX2 must be a valid word (the square below it is solid, so the column 3 word ends there) and X1X2 must be a valid prefix of a word (see Figure 1).  Assume that we choose Q for X1, since EQ is a valid prefix.  We can then choose U for X2, since NU is a valid word in our dictionary (see Figure 2).  Continuing in the same fashion, we can choose the other letters shown in Figure 2 (in our dictionary QUA, DU and DC are all legal words).

 

Unfortunately, in row 4, column 2 we run into a problem.  There is no word EQUX3 in our dictionary for any letter X3 (note that since we are at a terminating block, we are no longer just looking for a prefix), so we are stuck.  At this point we need to undo some of our previous choices (i.e. backtrack) in order to move forward again toward a solution.  If our algorithm were very intelligent, it would know that the problem we need to fix is the prefix EQU in the second column.  However, based on the way we progressed in this example, we would simply go back to the previous square (row 4, column 1), try the next legal letter there, and move forward again.  This would again fail at row 4, column 2, as shown in Figure 3.  Note that backtracking could occur many times for a given board, possibly going all the way back to the first word on more than one occasion.  In fact, the general run-time complexity for this problem is exponential.  However, if the board sizes are not too large, we can likely solve the problem (or determine that no solution exists) in a reasonable amount of time.  One solution to the puzzle above is shown in Figure 4.
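The row-wise, square-by-square search described above can be sketched in Java roughly as follows.  This is only an illustration under stated assumptions, not the required solution: the class CrosswordSketch, its method names, and the two HashSets that stand in for the provided MyDictionary are all hypothetical.  The sketch also rebuilds each partial word from scratch at every square, which a more careful implementation would avoid.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: names are illustrative, and the two sets below
// stand in for the course-provided MyDictionary.
class CrosswordSketch {
    private final Set<String> words = new HashSet<>();
    private final Set<String> prefixes = new HashSet<>();
    private final char[][] template; // '+', '-', or a fixed letter
    private final char[][] grid;     // letters chosen so far
    private final int n;

    CrosswordSketch(List<String> dictionary, String[] rows) {
        for (String w : dictionary) {
            words.add(w);
            for (int i = 1; i <= w.length(); i++) {
                prefixes.add(w.substring(0, i));
            }
        }
        n = rows.length;
        template = new char[n][];
        grid = new char[n][];
        for (int r = 0; r < n; r++) {
            template[r] = rows[r].toCharArray();
            grid[r] = rows[r].toCharArray();
        }
    }

    // Try to fill every square from position pos (row-major order) onward.
    boolean solve(int pos) {
        if (pos == n * n) return true;          // every square is settled
        int r = pos / n, c = pos % n;
        char t = template[r][c];
        if (t == '-') return solve(pos + 1);    // solid square: nothing to place
        char first = (t == '+') ? 'A' : t;      // a fixed letter is the only candidate
        char last = (t == '+') ? 'Z' : t;
        for (char ch = first; ch <= last; ch++) {
            grid[r][c] = ch;
            if (runOk(r, c, 0, 1) && runOk(r, c, 1, 0) && solve(pos + 1)) {
                return true;
            }
        }
        grid[r][c] = t;                         // undo the choice (backtrack)
        return false;
    }

    // Check the run of letters ending at (r, c) along direction (dr, dc).
    private boolean runOk(int r, int c, int dr, int dc) {
        StringBuilder sb = new StringBuilder();
        int i = r, j = c;
        while (i >= 0 && j >= 0 && grid[i][j] != '-') {
            sb.append(grid[i][j]);
            i -= dr;
            j -= dc;
        }
        String run = sb.reverse().toString();
        int nr = r + dr, nc = c + dc;
        boolean ends = nr >= n || nc >= n || template[nr][nc] == '-';
        if (!ends) return prefixes.contains(run);        // still growing: prefix test
        return run.length() == 1 || words.contains(run); // finished: whole-word test
    }

    // The current grid, one string per row.
    String[] rows() {
        String[] out = new String[n];
        for (int r = 0; r < n; r++) out[r] = new String(grid[r]);
        return out;
    }
}
```

With the board from Figure 1 and a small made-up word list (LENS, TON, AC, THAW, AT, ETCH, NO, SNOW), calling solve(0) returns true and rows() gives the puzzle in Figure 4.  Single-letter runs are accepted without a dictionary check, matching the rule that single-character words are not considered.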

 

Part A of your assignment is to create a legal crossword puzzle (if it exists) in the following way:

 

1)      Read a dictionary of words from a file and form a MyDictionary from these words.  The interface DictInterface and the class MyDictionary are provided for you on the CS 1501 Web page, and you must use them in this assignment.  Read over the code and comments carefully so that you understand what they do and how.  The interface DictInterface will also be important for Part B of this assignment.  The file used to initialize the MyDictionary will contain ASCII strings, one word per line.  Use the file dict8.txt on the CS 1501 Web page.

2)      Read a crossword board in from a file.  The name of the file should be specified by the user.  The crossword board will be formatted in the following way:

a)      The first line contains a single integer, N.  This represents the number of rows and columns that will be in the board.  Since the dictionary will contain up to 8-letter words, your program should handle crosswords up to 8x8 in size.

b)      The next N lines will each have N characters, representing the NxN total locations on the board.  Each character will be either

i)        + (plus) which means that any letter can go in this square

ii)       - (minus) which means that the square is solid and no letter can go here

iii)     A..Z (a letter from A to Z) which means that the specified letter must be in this square (i.e. the square can be used in the puzzle, but only for the letter indicated)

For the board shown above, the file would be as follows:

 

4
++++
-+++
++-+
++++
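The board format above can be read with a few lines of Java.  The sketch below is only an illustration: the class name BoardParseSketch and the method parseBoard are hypothetical, and it assumes the file's lines have already been loaded into a list (for example with java.nio.file.Files.readAllLines) rather than opening the file itself.

```java
import java.util.List;

// Hypothetical sketch: names are illustrative, not part of the provided code.
class BoardParseSketch {
    // lines.get(0) holds N; the next N lines hold N characters each.
    static char[][] parseBoard(List<String> lines) {
        if (lines.isEmpty()) {
            throw new IllegalArgumentException("empty board file");
        }
        int n = Integer.parseInt(lines.get(0).trim());
        if (lines.size() < n + 1) {
            throw new IllegalArgumentException("expected " + n + " board rows");
        }
        char[][] board = new char[n][];
        for (int r = 0; r < n; r++) {
            String row = lines.get(r + 1).trim();
            if (row.length() != n) {
                throw new IllegalArgumentException(
                    "row " + (r + 1) + " must have " + n + " characters");
            }
            for (char ch : row.toCharArray()) {
                boolean ok = ch == '+' || ch == '-' || (ch >= 'A' && ch <= 'Z');
                if (!ok) {
                    throw new IllegalArgumentException("unexpected character: " + ch);
                }
            }
            board[r] = row.toCharArray();
        }
        return board;
    }
}
```

Rejecting malformed rows up front keeps the board-filling code free of bounds and format checks.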

 

Some test boards will be put onto the CS 1501 Web site soon – check back for them.

3)      Create a legal crossword puzzle for the given board and print it to the display (standard output).  To make your assignments easier to grade, you must also print your legal crossword to a file, in the same format as the input file.  For example, the output for the crossword shown above in Figure 4 would be:

 

4
LENS
-TON
AC-O
THAW

 

Use the same name for your output file as for each input file, but with the extension .out.  Note that a given input file could have multiple solutions, or none.  If a solution exists, you only need to print out one.
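Writing the result back out might look like the sketch below.  The class and method names are hypothetical, and it assumes that "the same name with the extension .out" means replacing the input file's final extension (so a hypothetical test4a.txt becomes test4a.out); check that reading against the grading instructions before relying on it.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical sketch: names and the extension-replacement rule are assumptions.
class BoardOutputSketch {
    // Replace the final extension of the file name (if any) with ".out".
    static String outName(String inputName) {
        int sep = Math.max(inputName.lastIndexOf('/'), inputName.lastIndexOf('\\'));
        int dot = inputName.lastIndexOf('.');
        String base = (dot > sep + 1) ? inputName.substring(0, dot) : inputName;
        return base + ".out";
    }

    // Render the board in the input format: N on the first line, then N rows.
    static String format(char[][] board) {
        StringBuilder sb = new StringBuilder();
        sb.append(board.length).append('\n');
        for (char[] row : board) {
            sb.append(row).append('\n');
        }
        return sb.toString();
    }

    // Print the board to standard output and to the .out file.
    static void write(String inputName, char[][] board) throws IOException {
        String text = format(board);
        System.out.print(text);
        Files.write(Paths.get(outName(inputName)),
                    text.getBytes(StandardCharsets.US_ASCII));
    }
}
```

Building the text once and sending the same string to both destinations guarantees that the display and the file agree.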

 

Important Notes:

-      Be sure to complete Part B of this Assignment!  It will be online soon.

-      Search algorithm details: Carefully consider the algorithm to fill the words into the board.  Make sure it potentially considers all possibilities yet does not waste time rechecking prefixes that have already been checked.   Although you are not required to use the exact algorithm described above, your algorithm must be a recursive backtracking algorithm.  The algorithm you use can vary greatly in its efficiency.  If your algorithm is very inefficient or otherwise poorly implemented, you will lose some style points.  This algorithm is a significant part of the overall project, so put a good amount of effort into doing it correctly.  For guidance on your board-filling algorithm, it is strongly recommended that you attend recitation.

-      The MyDictionary implementation of the DictInterface that is provided to you should work correctly, but it is not very efficient.  Note that it is doing a linear search of an ArrayList to determine if the argument is a prefix or word in the dictionary.  In Part B of this assignment you will write a more efficient implementation of the DictInterface.

-      Be sure to thoroughly document your code, especially the code that fills the board.
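To see why the linear search mentioned in the MyDictionary note is costly, consider the rough sketch below.  It is not the provided MyDictionary and does not use the DictInterface methods; the class LinearLookupSketch and its method names are hypothetical and only mimic the idea of scanning an ArrayList on every lookup.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a linear-scan dictionary; NOT the provided MyDictionary.
class LinearLookupSketch {
    private final List<String> words = new ArrayList<>();

    void add(String w) {
        words.add(w);
    }

    // Scans the list until an exact match is found.
    boolean isWord(String s) {
        for (String w : words) {
            if (w.equals(s)) return true;
        }
        return false;
    }

    // Scans the list until some longer word starting with s is found.
    boolean isProperPrefix(String s) {
        for (String w : words) {
            if (w.length() > s.length() && w.startsWith(s)) return true;
        }
        return false;
    }
}
```

Each call may visit every stored word, and a backtracking search makes such calls for every candidate letter in every square, so the lookup cost is multiplied many times over.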