CS 1501 Summer 2009
Practice Questions for Midterm Exam
Here are some example questions that may help you study for the midterm exam. Try to answer the questions fully before looking at the answers. Though these questions indicate some of the material that may be on the exam, they are by no means comprehensive. Remember to also study EVERYTHING up to the point marked on the Syllabus and Class Schedule, as well as everything pertaining to the first 2 assignments and your first quiz.
Fill in the Blanks
Complete the statements below with the MOST APPROPRIATE words/phrases.
a) Given N keys, each with b bits, in a digital search tree, the WORST CASE search time for a key in the tree requires ____________________ key comparisons, while the AVERAGE CASE search time requires ________________ key comparisons.
b) Delete is a problem with open-addressing hashing because _______________________________________________________________________.
c) If I have an open addressing hash table of size M, and a cluster of size C, the probability that a random key will be Inserted into the location immediately after the cluster is _________________.
d) The mismatched character heuristic of the Boyer-Moore algorithm has a best case run-time of ____________.
e) If an encoding scheme satisfies the prefix property, it is certain that __________________________________________________________________.
True/False
Indicate whether each of the following statements is True or False. For False answers, explain why it is false.
a) The brute-force algorithm to find a Hamiltonian Cycle in a graph has an upper-bound run-time of Theta(2^n).
b) A Patricia tree reduces the number of nodes from a Multiway Trie by eli
c) A good hash function should utilize the entire key.
Short Answers and Calculations
1) You have two programs: Program A, which runs in time $k_1 N$, and Program B, which runs in time $k_2 \log_2 N$, for some constants $k_1$ and $k_2$. Assume that for a problem of size $N_0$, both programs take X seconds to execute. Approximately how much time would each program take to run if we double the problem size? Show your work.
2) Define what it means to have a collision in a hash table, and explain why we cannot usually prevent collisions from occurring.
3) Consider a file containing the following text data:
AAABBBAAB
Trace the LZW encoding process for the file (in the same way done in handout lzw.txt, so each "step" produces a single codeword). Assume that the extended ASCII set will use codewords 0-255. For each step in the encoding, be sure to show all of the information indicated below. Note: The ASCII value for 'A' is 65.
STEP #   LONGEST PREFIX MATCHED   CODEWORD OUTPUT   (STRING, CODE) ADDED TO DICTIONARY
------   ----------------------   ---------------   ----------------------------------
4) Consider Huffman compression. How would it perform on each of the following files and why? Be specific by giving approximate compression ratios in each case.
a) A file containing 1000 of each character in the alphabet
b) A file containing 1000000 As
5) Consider the mismatched character heuristic of the Boyer-Moore string matching algorithm. For the pattern and text strings shown below, state and justify how many total character comparisons must be done in order to match the pattern shown within the text string. Justify your answer using the skip array for the pattern.
Text: ABCDXABCDYABCDZABCDE
Pattern: ABCDE
6) Justify in detail how many character comparisons are required to find a string in a DLB in the worst case. Assume that your DLB has N strings, each with a maximum of K characters, and that your alphabet has S possible characters in it.
7) Consider the SIMPLE divide and conquer algorithm for multiplying integers (not Karatsuba’s algorithm). Show the mathematical formula for how the multiplication is performed and show and justify the recurrence relation for the algorithm.
Coding
1) Assume that you are using linear probing in a hash table of Strings. Function h(x) is defined as we discussed it in class (and you do NOT have to write it). Complete the code for the Find function below, which will return true if the item is present and false otherwise. Be sure to handle ALL possibilities.
String[] table; // instance variable
// other methods not shown

public boolean Find(String item)
{
    // Note that all table locations are initialized to null prior
    // to any Inserts. Assume that no Deletes are allowed.
    int index = h(item);

    // fill in code
}
SOLUTIONS
Fill in the Blanks
Complete the statements below with the MOST APPROPRIATE words/phrases.
a) Given N keys, each with b bits, in a digital search tree, the WORST CASE search time for a key in the tree requires ______b______________ key comparisons, while the AVERAGE CASE search time requires ___Theta(lgN)___ key comparisons.
b) Delete is a problem with open-addressing hashing because ___if a value within a cluster is deleted, values after it in the cluster may not be found___.
c) If I have an open addressing hash table of size M, and a cluster of size C, the probability that a random key will be Inserted into the location immediately after the cluster is ___(C+1)/M_______.
d) The mismatched character heuristic of the Boyer-Moore algorithm has a best case run-time of ___Theta(N/M)___, where N is the length of the text and M is the length of the pattern.
e) If an encoding scheme satisfies the prefix property, it is certain that ____no codeword is a prefix of any other codeword_________________.
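To see where the (C+1)/M in part (c) comes from, the Java sketch below (an illustration, not code from the course) counts how many of the M equally likely hash values send a new key to the slot just past a cluster under linear probing. The table size M = 10, the cluster position (slots 2 to 4, so C = 3), and the names `probe` and `hashValuesLandingAt` are hypothetical; the sketch assumes the slots on either side of the cluster are empty.

```java
class ClusterGrowth {
    // Returns the slot that linear probing picks for a new key whose hash value
    // is start, or -1 if every slot is occupied.
    static int probe(boolean[] occupied, int start) {
        int m = occupied.length;
        for (int i = 0; i < m; i++) {
            int slot = (start + i) % m;
            if (!occupied[slot]) return slot;
        }
        return -1;
    }

    // Counts how many of the M equally likely hash values lead a new key to target.
    static int hashValuesLandingAt(boolean[] occupied, int target) {
        int count = 0;
        for (int hv = 0; hv < occupied.length; hv++) {
            if (probe(occupied, hv) == target) count++;
        }
        return count;
    }

    public static void main(String[] args) {
        int m = 10, c = 3;                                     // hypothetical M and C
        boolean[] occupied = new boolean[m];
        for (int i = 2; i < 2 + c; i++) occupied[i] = true;   // cluster fills slots 2..4
        int landing = hashValuesLandingAt(occupied, 2 + c);    // slot 5, just past the cluster
        System.out.println(landing + "/" + m);                 // prints 4/10, i.e. (C+1)/M
    }
}
```

Hash values 2, 3, 4 (inside the cluster) and 5 (the slot itself) all end up in slot 5, which is where the C + 1 in the numerator comes from.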
True/False
Indicate whether each of the following statements is True or False. For False answers, explain why it is false.
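a) The brute-force algorithm to find a Hamiltonian Cycle in a graph has an upper-bound run-time of Theta(2^n). FALSE: the brute-force algorithm tries every possible ordering of the n vertices, so its upper bound is on the order of n!, which grows much faster than 2^n.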
b) A Patricia tree reduces the number of nodes from a Multiway Trie by eli
c) A good hash function should utilize the entire key. TRUE
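The following Java sketch illustrates why the brute-force bound in part (a) is factorial rather than exponential: it keeps vertex 0 first and tries every ordering of the remaining n - 1 vertices, so a graph with no Hamiltonian Cycle forces all (n-1)! orderings, each checked in O(n) time. The adjacency-matrix representation, the class name, and the `orderingsTried` counter are illustrative assumptions.

```java
class HamiltonianBruteForce {
    static long orderingsTried;   // complete vertex orderings examined by the last call

    // Builds an undirected adjacency matrix from an edge list.
    static boolean[][] fromEdges(int n, int[][] edges) {
        boolean[][] adj = new boolean[n][n];
        for (int[] e : edges) {
            adj[e[0]][e[1]] = true;
            adj[e[1]][e[0]] = true;
        }
        return adj;
    }

    // Brute force: keep vertex 0 first and try every ordering of the other n - 1
    // vertices, so up to (n-1)! orderings are checked, each in O(n) time.
    static boolean hasHamiltonianCycle(boolean[][] adj) {
        int n = adj.length;
        int[] order = new int[n];
        for (int i = 0; i < n; i++) order[i] = i;
        orderingsTried = 0;
        return permute(adj, order, 1);
    }

    // Generates all arrangements of order[k..n-1] by swapping, testing each complete one.
    private static boolean permute(boolean[][] adj, int[] order, int k) {
        int n = order.length;
        if (k == n) {
            orderingsTried++;
            for (int i = 0; i < n; i++) {
                // every consecutive pair, plus the edge back to the start, must exist
                if (!adj[order[i]][order[(i + 1) % n]]) return false;
            }
            return true;
        }
        for (int i = k; i < n; i++) {
            swap(order, k, i);
            boolean found = permute(adj, order, k + 1);
            swap(order, k, i);   // undo the swap before trying the next choice
            if (found) return true;
        }
        return false;
    }

    private static void swap(int[] a, int i, int j) {
        int t = a[i]; a[i] = a[j]; a[j] = t;
    }

    public static void main(String[] args) {
        // A path 0-1-2-3 has no Hamiltonian Cycle, so every ordering is tried.
        boolean[][] path = fromEdges(4, new int[][] { {0, 1}, {1, 2}, {2, 3} });
        System.out.println(hasHamiltonianCycle(path) + ", orderings tried: " + orderingsTried);
        // false, orderings tried: 6
    }
}
```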
Short Answers and Calculations
1) You have two programs: Program A, which runs in time $k_1 N$, and Program B, which runs in time $k_2 \log_2 N$, for some constants $k_1$ and $k_2$. Assume that for a problem of size $N_0$, both programs take X seconds to execute. Approximately how much time would each program take to run if we double the problem size? Show your work.
For Program A, since the time is linear, we know that doubling the problem size should also double the run-time, so Program A takes about 2X seconds. Program B is more complicated, since it runs in logarithmic time, but we can still solve this with some math:
We know: $k_2 \log_2(N_0) = X$
And we want to solve: $k_2 \log_2(2N_0) = \;?$
Remembering that the logarithm of a product is the sum of the logarithms, we can rewrite the problem as follows:
$k_2 \log_2(2N_0) = k_2[\log_2(2) + \log_2(N_0)] = k_2 + k_2 \log_2(N_0) = k_2 + X$
So Program B takes about $X + k_2$ seconds: doubling the problem size adds only the constant $k_2$ to its run-time.
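As a quick numeric check of this reasoning, the Java sketch below evaluates both cost models with made-up values (k1 = 1 and N0 = 1024 are assumptions for illustration; k2 is then chosen so that both programs really do take the same X seconds at N0, as the question requires).

```java
class DoublingCheck {
    static double log2(double x) { return Math.log(x) / Math.log(2); }

    static double timeA(double k1, double n) { return k1 * n; }        // Program A: k1 * N
    static double timeB(double k2, double n) { return k2 * log2(n); }  // Program B: k2 * log2(N)

    public static void main(String[] args) {
        double k1 = 1.0, n0 = 1024;         // hypothetical values
        double k2 = k1 * n0 / log2(n0);     // picked so both programs take X seconds at N0
        double x = timeA(k1, n0);
        System.out.printf("X = %.1f%n", x);
        System.out.printf("A at 2*N0: %.1f (2X = %.1f)%n", timeA(k1, 2 * n0), 2 * x);
        System.out.printf("B at 2*N0: %.1f (X + k2 = %.1f)%n", timeB(k2, 2 * n0), x + k2);
    }
}
```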
2) Define what it means to have a collision in a hash table, and explain why we cannot usually prevent collisions from occurring.
A collision occurs in a hash table if, for two keys x1 and x2 with x1 != x2, h(x1) = h(x2). Collisions cannot usually be prevented since, in most instances, the key space being used (all possible keys) is larger than the table size, and so, by the Pigeonhole Principle, at least two distinct keys must map to the same table location.
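The Pigeonhole argument can be made concrete with a small Java sketch: hashing more distinct keys than there are slots must produce at least one collision. The hash function (`Math.floorMod` of `String.hashCode()`), the table size of 5, and the sample keys are hypothetical stand-ins, not the h(x) from class.

```java
import java.util.HashMap;
import java.util.Map;

class PigeonholeDemo {
    // Hypothetical hash function for illustration: maps a key into [0, m).
    static int h(String key, int m) {
        return Math.floorMod(key.hashCode(), m);
    }

    // Hashes the (assumed distinct) keys into m slots and returns the first pair
    // found in the same slot, or null if none collide. With more keys than slots,
    // the Pigeonhole Principle guarantees a non-null result.
    static String[] findCollision(String[] keys, int m) {
        Map<Integer, String> firstInSlot = new HashMap<>();
        for (String key : keys) {
            String earlier = firstInSlot.putIfAbsent(h(key, m), key);
            if (earlier != null) return new String[] { earlier, key };
        }
        return null;
    }

    public static void main(String[] args) {
        String[] keys = { "apple", "banana", "cherry", "date", "elder", "fig" };  // 6 keys
        String[] pair = findCollision(keys, 5);                                   // 5 slots
        System.out.println(pair[0] + " and " + pair[1] + " share slot " + h(pair[0], 5));
    }
}
```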
3) Consider a file containing the following text data:
AAABBBAAB
Trace the LZW encoding process for the file (in the same way done in handout lzw.txt, so each "step" produces a single codeword). Assume that the extended ASCII set will use codewords 0-255. For each step in the encoding, be sure to show all of the information indicated below. Note: The ASCII value for 'A' is 65.
STEP #   LONGEST PREFIX MATCHED   CODEWORD OUTPUT   (STRING, CODE) ADDED TO DICTIONARY
------   ----------------------   ---------------   ----------------------------------
1        A                        65                (AA, 256)
2        AA                       256               (AAB, 257)
3        B                        66                (BB, 258)
4        BB                       258               (BBA, 259)
5        AAB                      257               --
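The trace above can be reproduced with a minimal Java LZW encoder, sketched under the same assumptions as the question (single characters use codes 0-255 and new strings are numbered from 256). It grows the current match one character at a time while the extended string is still in the dictionary, outputs the code for that longest match, and adds the match plus the next character. Dictionary size limits and codeword widths are ignored here, which is an assumption of this sketch; the class name is illustrative.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class LzwTrace {
    // Encodes text (assumed to contain only 8-bit characters) and prints one
    // row per step: step number, longest prefix matched, codeword, new entry.
    static List<Integer> encode(String text) {
        Map<String, Integer> dict = new HashMap<>();
        for (int c = 0; c < 256; c++) dict.put(String.valueOf((char) c), c);
        int nextCode = 256;
        List<Integer> output = new ArrayList<>();
        int pos = 0, step = 1;
        while (pos < text.length()) {
            // Extend the match while the longer string is still in the dictionary.
            int end = pos + 1;
            while (end < text.length() && dict.containsKey(text.substring(pos, end + 1))) end++;
            String prefix = text.substring(pos, end);
            int code = dict.get(prefix);
            output.add(code);
            String added = "--";
            if (end < text.length()) {   // a next character exists, so add prefix + next char
                String entry = prefix + text.charAt(end);
                dict.put(entry, nextCode);
                added = "(" + entry + ", " + nextCode + ")";
                nextCode++;
            }
            System.out.printf("%-6d   %-22s   %-15d   %s%n", step++, prefix, code, added);
            pos = end;
        }
        return output;
    }

    public static void main(String[] args) {
        System.out.println(encode("AAABBBAAB"));   // [65, 256, 66, 258, 257]
    }
}
```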
4) Consider Huffman compression. How would it perform on each of the following files and why? Be specific by giving approximate compression ratios in each case.
a) A file containing 1000 of each character in the alphabet
b) A file containing 1000000 As
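a) Since all characters have the same frequency, Huffman has no frequency disparities to exploit, so it does essentially no better than a fixed-length code. Assuming a 26-letter alphabet, the codewords are 4 or 5 bits long (6 letters get 4 bits and 20 get 5 bits), about 4.77 bits per character on average. Compared with 8-bit ASCII that is a ratio of roughly 8/4.77, or about 1.7:1, but the gain comes only from the small alphabet: a plain 5-bit fixed-length code would achieve nearly the same ratio (8/5 = 1.6:1).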
b) Because all characters are the same, Huffman will approach its optimal ratio of 8:1: the Huffman tree will have only a single edge, so only 1 bit is needed to encode A, as opposed to the 8 bits required in ASCII. The tree information will take up some space, but the compression should still be close to this optimal amount.
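To get concrete numbers for both files, the Java sketch below computes the total encoded size produced by Huffman's algorithm, using the standard fact that this total equals the sum of the weights of all internal nodes created by the merges. It ignores the space needed to store the tree, assumes a 26-letter alphabet for part a, and assumes (as in part b) that a lone symbol gets a 1-bit code; the class and method names are illustrative.

```java
import java.util.Arrays;
import java.util.PriorityQueue;

class HuffmanSize {
    // Total bits Huffman coding needs for the given symbol frequencies, NOT
    // counting the space to store the tree. Each merge creates an internal
    // node, and every symbol below that node gets one more bit, so the total
    // equals the sum of the weights of all merged nodes.
    static long encodedBits(long[] freqs) {
        PriorityQueue<Long> pq = new PriorityQueue<>();
        for (long f : freqs) pq.add(f);
        if (pq.size() == 1) return pq.peek();   // lone symbol: assume a 1-bit code, as in part b
        long total = 0;
        while (pq.size() > 1) {
            long merged = pq.poll() + pq.poll();   // combine the two least frequent trees
            total += merged;
            pq.add(merged);
        }
        return total;
    }

    public static void main(String[] args) {
        long[] letters = new long[26];
        Arrays.fill(letters, 1000);   // part a: 1000 of each of 26 letters
        long bitsA = encodedBits(letters);
        System.out.printf("a) %d bits vs %d in ASCII, ratio %.2f:1%n", bitsA, 8L * 26000, 8.0 * 26000 / bitsA);
        long bitsB = encodedBits(new long[] { 1_000_000 });   // part b: 1,000,000 A's
        System.out.printf("b) %d bits vs %d in ASCII, ratio %.2f:1%n", bitsB, 8L * 1_000_000, 8.0 * 1_000_000 / bitsB);
    }
}
```

For part a this gives 124,000 bits versus 208,000 bits in 8-bit ASCII (about 1.68:1); for part b it gives 1,000,000 bits versus 8,000,000 (8:1), before counting the stored tree.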
5) Consider the mismatched character heuristic of the Boyer-Moore string matching algorithm. For the pattern and text strings shown below, state and justify how many total character comparisons must be done in order to match the pattern shown within the text string. Justify your answer using the skip array for the pattern.
Text: ABCDXABCDYABCDZABCDE
Pattern: ABCDE
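Skip(A) = 4, Skip(B) = 3, Skip(C) = 2, Skip(D) = 1, and Skip(E) = 0; all other Skip array entries are 5 (the pattern length). The total number of character comparisons is 8. The first 3 comparisons check the pattern's last character, E, against X, Y, and Z in the text; each is a mismatch with a character that does not appear in the pattern, so the maximum skip value of 5 is used each time. The final 5 comparisons match the pattern right to left.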
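The skip-array reasoning above can be checked with a short Java sketch of the mismatched character heuristic alone (no good-suffix rule). It assumes 8-bit characters and a non-empty pattern; on a mismatch it shifts by the larger of the skip value and the amount needed to move the pattern forward, as the heuristic is commonly formulated. The class and method names are illustrative, and the comparison counter exists only to reproduce the count of 8.

```java
import java.util.Arrays;

class BadCharSearch {
    // skip[c] = how far the rightmost occurrence of c is from the end of the
    // pattern; characters that do not occur in the pattern get M.
    static int[] buildSkip(String pat) {
        int m = pat.length();
        int[] skip = new int[256];   // assumes 8-bit (extended ASCII) characters
        Arrays.fill(skip, m);
        for (int j = 0; j < m; j++) skip[pat.charAt(j)] = m - 1 - j;
        return skip;
    }

    // Mismatched character heuristic only. Returns {index of first match or -1,
    // number of character comparisons made}. Assumes a non-empty pattern.
    static int[] search(String txt, String pat) {
        int[] skip = buildSkip(pat);
        int n = txt.length(), m = pat.length(), comparisons = 0;
        int i = m - 1, j = m - 1;   // i scans the text, j scans the pattern
        while (i < n) {
            comparisons++;
            if (txt.charAt(i) == pat.charAt(j)) {
                if (j == 0) return new int[] { i, comparisons };   // matched right to left
                i--;
                j--;
            } else {
                // Align the rightmost occurrence of txt[i] with position i, but
                // always move the pattern at least one position to the right.
                i += Math.max(m - j, skip[txt.charAt(i)]);
                j = m - 1;
            }
        }
        return new int[] { -1, comparisons };
    }

    public static void main(String[] args) {
        int[] result = search("ABCDXABCDYABCDZABCDE", "ABCDE");
        System.out.println("match at " + result[0] + " after " + result[1] + " comparisons");
        // match at 15 after 8 comparisons
    }
}
```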
6) Justify in detail how many character comparisons are required to find a string in a DLB in the worst case. Assume that your DLB has N strings, each with a maximum of K characters, and that your alphabet has S possible characters in it.
Recall that a DLB "node" consists of a number of "nodelets": one "nodelet" per possible character for a given prefix in the dictionary. In the worst case, all S characters in a given "node" are used, thereby requiring S "nodelets". If the character being searched for in a given "node" happens to be in the last "nodelet", S character comparisons will be required to examine a single position within the string. If this worst case occurs at each of the K positions within the string, a total of S*K character comparisons will be required (note that this bound does not depend on N). This worst case is extremely unlikely, since after the first few levels the "nodes" typically have very few "nodelets" in them (which is why we use the DLB in the first place).
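To make the S*K bound concrete, the Java sketch below builds a small DLB as linked "nodelets" (each with a character, a sibling link, and a child link) and counts character comparisons during a search. It assumes S = 4 (the alphabet "abcd") and K = 3, and inserts all 4^3 = 64 strings in alphabetical order so that every node is full and 'd' is always the last nodelet; searching for "ddd" then hits the worst case of S*K = 12 comparisons. The `endsWord` flag, the names, and the append-at-end insertion policy are assumptions of this sketch, not necessarily how the course's DLB is written.

```java
import java.util.ArrayList;
import java.util.List;

class DlbDemo {
    // One "nodelet": a character, a link to the next sibling in the same
    // node, and a link to the first nodelet of the child node.
    static final class Nodelet {
        final char ch;
        Nodelet sibling, child;
        boolean endsWord;   // hypothetical flag marking the end of a stored string
        Nodelet(char ch) { this.ch = ch; }
    }

    private Nodelet root;
    int comparisons;   // character comparisons made by the most recent search

    // Inserts a non-empty string, appending missing nodelets at the end of each sibling list.
    void insert(String s) {
        if (root == null) root = new Nodelet(s.charAt(0));
        Nodelet cur = root;   // first nodelet of the current node
        for (int pos = 0; pos < s.length(); pos++) {
            char c = s.charAt(pos);
            while (cur.ch != c && cur.sibling != null) cur = cur.sibling;
            if (cur.ch != c) { cur.sibling = new Nodelet(c); cur = cur.sibling; }
            if (pos == s.length() - 1) {
                cur.endsWord = true;
            } else {
                if (cur.child == null) cur.child = new Nodelet(s.charAt(pos + 1));
                cur = cur.child;
            }
        }
    }

    boolean contains(String s) {
        comparisons = 0;
        Nodelet cur = root;
        for (int pos = 0; pos < s.length(); pos++) {
            char c = s.charAt(pos);
            while (cur != null) {   // scan this node's nodelets left to right
                comparisons++;
                if (cur.ch == c) break;
                cur = cur.sibling;
            }
            if (cur == null) return false;   // c is not in this node
            if (pos == s.length() - 1) return cur.endsWord;
            cur = cur.child;
        }
        return false;   // only reached for the empty string
    }

    // All strings of length k over the alphabet, in alphabetical order.
    static List<String> allStrings(String alphabet, int k) {
        List<String> out = new ArrayList<>();
        out.add("");
        for (int level = 0; level < k; level++) {
            List<String> next = new ArrayList<>();
            for (String p : out) for (char c : alphabet.toCharArray()) next.add(p + c);
            out = next;
        }
        return out;
    }

    public static void main(String[] args) {
        DlbDemo dlb = new DlbDemo();
        for (String s : allStrings("abcd", 3)) dlb.insert(s);   // S = 4, K = 3, N = 64
        System.out.println(dlb.contains("ddd") + " after " + dlb.comparisons + " comparisons");
        // true after 12 comparisons
    }
}
```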
7) Consider the SIMPLE divide and conquer algorithm for multiplying integers (not Karatsuba’s algorithm). Show and justify the mathematical formula for how the multiplication is performed and show and justify the recurrence relation for the algorithm.
We rewrite $X = 2^{n/2} X_H + X_L$, where $X_H$ holds the n/2 high-order bits of X and $X_L$ holds the n/2 low-order bits of X. We do the same for Y. Multiplying out with the FOIL method yields:
$XY = 2^{n} X_H Y_H + 2^{n/2}(X_H Y_L + X_L Y_H) + X_L Y_L$
We can now define the recurrence $T(n) = 4T(n/2) + \Theta(n)$, which means that the time required to multiply two n-bit integers equals 4 times the time to multiply two n/2-bit integers (one for each of the 4 products $X_H Y_H$, $X_H Y_L$, $X_L Y_H$, and $X_L Y_L$ in the formula), plus $\Theta(n)$ time to add the subproducts and shift as necessary. By the Master Theorem ($n^{\log_2 4} = n^2$ dominates the $\Theta(n)$ term), this solves to $T(n) = \Theta(n^2)$.
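The formula and recurrence above can be sketched in Java with `java.math.BigInteger`, whose `shiftLeft`, `shiftRight`, `and`, and `bitLength` methods make the high/low split easy. This is an illustrative sketch that assumes non-negative operands and an arbitrary 32-bit base case; splitting at half = n/2 bits means the high term is shifted by 2*half bits, which equals n when n is even.

```java
import java.math.BigInteger;
import java.util.Random;

class SimpleDcMultiply {
    // Simple divide and conquer (NOT Karatsuba): split each non-negative operand
    // at bit position half, so X = 2^half * XH + XL, and make four recursive calls.
    static BigInteger multiply(BigInteger x, BigInteger y) {
        int n = Math.max(x.bitLength(), y.bitLength());
        if (n <= 32) return x.multiply(y);   // base case; the 32-bit cutoff is arbitrary
        int half = n / 2;
        BigInteger mask = BigInteger.ONE.shiftLeft(half).subtract(BigInteger.ONE);
        BigInteger xh = x.shiftRight(half), xl = x.and(mask);
        BigInteger yh = y.shiftRight(half), yl = y.and(mask);
        // The four subproducts give the 4T(n/2) term of the recurrence.
        BigInteger hh = multiply(xh, yh);
        BigInteger hl = multiply(xh, yl);
        BigInteger lh = multiply(xl, yh);
        BigInteger ll = multiply(xl, yl);
        // XY = 2^(2*half) XH*YH + 2^half (XH*YL + XL*YH) + XL*YL; the shifts and
        // additions are the Theta(n) combine step.
        return hh.shiftLeft(2 * half).add(hl.add(lh).shiftLeft(half)).add(ll);
    }

    public static void main(String[] args) {
        Random rnd = new Random(1501);
        BigInteger x = new BigInteger(256, rnd), y = new BigInteger(256, rnd);   // random values below 2^256
        System.out.println(multiply(x, y).equals(x.multiply(y)));               // true
    }
}
```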
Coding
1)
public boolean Find(String item)
{
    // Note that all table locations are initialized to null prior
    // to any Inserts. Assume that no Deletes are allowed.
    int index = h(item);
    for (int i = 0; i < table.length; i++)
    {
        int curr = (index + i) % table.length;
        if (table[curr] == null)             // null location: not found
            return false;
        else if (table[curr].equals(item))   // found
            return true;
    }
    return false;   // cycled through all locations: not found
}
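For experimenting with the Find solution, here is a self-contained Java sketch that wraps the same probing loop in a small class, with `find` mirroring Find plus an insert method to fill the table. Since the course's h(x) is not shown, the sketch substitutes a hypothetical h based on `String.hashCode()` reduced with `Math.floorMod`; the class name and the insert method are also assumptions.

```java
class LinearProbingTable {
    private final String[] table;

    LinearProbingTable(int size) { table = new String[size]; }   // all slots start as null

    // Hypothetical stand-in for the h(x) from class: maps a String into [0, table.length).
    private int h(String x) { return Math.floorMod(x.hashCode(), table.length); }

    // Linear-probing insert without resizing: returns false if the item is
    // already present or the table is full.
    public boolean insert(String item) {
        int index = h(item);
        for (int i = 0; i < table.length; i++) {
            int curr = (index + i) % table.length;
            if (table[curr] == null) { table[curr] = item; return true; }
            if (table[curr].equals(item)) return false;
        }
        return false;
    }

    // Same logic as the Find solution: stop at a null slot (end of the cluster),
    // and give up after probing every slot once.
    public boolean find(String item) {
        int index = h(item);
        for (int i = 0; i < table.length; i++) {
            int curr = (index + i) % table.length;
            if (table[curr] == null) return false;
            if (table[curr].equals(item)) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        LinearProbingTable t = new LinearProbingTable(7);
        for (String s : new String[] { "cat", "dog", "emu", "gnu" }) t.insert(s);
        System.out.println(t.find("dog") + " " + t.find("yak"));   // true false
    }
}
```

Stopping at the first null slot is only safe because Deletes are not allowed; the loop bound of table.length guarantees termination even when the table is completely full.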