CS 1501 Algorithm Implementation

Spring 2004

Exam 2

 

 

 

 

Problem   Possible Pts   Pts Received
-------   ------------   ------------
1         20
2         10
3         12
4         8
5         16
6         8
7         8
8         8
9         10
Total     100

 

 


For all questions, be sure to show your work.  Answers without work will not receive full credit.

1)      (20 points – 2 points each) Fill in the Blanks. Complete the statements with the MOST APPROPRIATE word(s) and/or phrase(s).

a)      For a variable-length codeword compression scheme such as Huffman compression to be valid, it must satisfy the prefix property, which means ___no codeword can be a prefix of another codeword___.

b)      An important difference between BFS and PFS is how the fringe is implemented.  In BFS, the fringe is a(n) _____queue_________________ and in PFS the fringe is a(n) ___priority queue____________

c)      DFS on a graph with V vertices and E edges runs in time ___Theta(V + E)___________ with an adjacency list and in time ________Theta(V^2)_________ with an adjacency matrix.

d)      A graph G is said to be biconnected if ___there are at least 2 vertex-disjoint paths between every pair of vertices____.

e)      The algorithm to find articulation points that we discussed in lecture was a modification of the _______________DFS_______________________________ algorithm.

f)       A sequence of N Inserts followed by N DeleteMins on a min-heap will have a total run-time of Theta(____NlgN_____________________).

g)      One rule for a valid flow in a network graph is: For All u in [V – {s,t}],  sum(flow on edges incident upon u) == 0.  This means ____the sum of the flow into a vertex is the same as the sum of the flow out of the vertex___________________________________.

h)      Given edge (u, v) with capacity 75 and flow 50, and using PFS to find an augmenting path, the priority value for edge (u, v) would be ______________25________________ and the priority value for edge (v, u) would be _________50______________________.

i)        The triangle inequality as applied to TSP states that ____C_ij ≤ C_ik + C_kj for all i, j, k___

j)        When considering a neighbor of a TSP tour for the 2-opt heuristic, given two edges in the original tour, (U, V) and (W, X), I remove them and replace them with _____(U, W) (V, X)_____________.

 

2)      (10 points – 2 points each) Indicate if each of the following statements is TRUE or FALSE.  For FALSE statements, INDICATE WHY THEY ARE FALSE.

 

a)      Huffman compression is an example of a lossy compression scheme.  FALSE -- lossless

b)      In LZW compression, once all available codewords have been "used" (i.e. placed into the dictionary), the input file from that point on will not be compressed.  FALSE – compression will just not improve

c)      An advantage of an adjacency matrix representation of a graph is that all neighbors of a vertex can be found in time Theta(V) (where V is the number of vertices in the graph).  FALSE – this is true but it is a disadvantage – consider sparse graphs.

d)      NP-Complete problems are problems that are known to require exponential run-times.  FALSE – not proven, just hypothesized.

e)      If I discover a true polynomial algorithm that solves the Traveling Salesman Problem, I will have proved that P = NP.  TRUE

3)      (12 points – 6 + 6) Consider the Huffman and LZW compression algorithms that we discussed in lecture. Consider a file with 1G (2^30) letter As in it (no other characters, except for the end of file character, which we will ignore, so its original size is 1GB).  Also consider each of the 5 compression ratios: 1/2, 1/8, 1/32, 1/128, 1/k where k > 128.

a)      Which of the compression ratios best approximates the compression ratio achieved by Huffman compression of the file?  Thoroughly justify your answer.  The correct answer without justification will get minimal credit.

Answer: If the only character present is A, it can be encoded in a single bit.  Since an ASCII A requires 8 bits, this yields a maximum compression ratio of 1/8.  A minimal overhead is required to store the tree information for the A, so the overall compression ratio will be very close to the optimal 1/8.

b)      Which of the compression ratios best approximates the compression ratio achieved by LZW compression of the file?  Assume a fixed codeword size of 16 bits is used.  Thoroughly justify your answer.  The correct answer without justification will get minimal credit.

Answer: LZW will proceed on this file in the following fashion:  The first codeword output will match 'A', the second will match 'AA', the third will match 'AAA', etc.   This is because with each codeword output a string with one additional 'A' is added to the dictionary.  Thus, each successive codeword will represent one more 'A' than the codeword before it.  Let K = the total number of codewords needed to compress the entire file.  Based on the discussion above, it is clear that 1 + 2 + 3 + … + K ≥ 2^30, or, based on the solution to an arithmetic series, K(K+1)/2 ≥ 2^30.  Solving for K exactly is tricky, since we would need the quadratic formula.  However, we only need to approximate here, so we can say that approximately K^2 = 2^30 (ignoring the constant factor of 2).  This solves to K = 2^15.  Since each codeword is 2 bytes, the size of the compressed file will be approx. 2(2^15) = 2^16 bytes.  Thus, the compression ratio is 2^16/2^30 = 1/(2^14), and the answer is 1/k where k > 128 (by far).
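The arithmetic above can be checked with a short sketch. It assumes, as the answer does, that the k-th codeword covers exactly k letters and that the 16-bit dictionary never fills up; the class and method names are made up for illustration. It computes the exact smallest K with K(K+1)/2 ≥ 2^30, which comes out somewhat larger than the rounded 2^15 because the answer drops the factor of 2, but the conclusion (1/k with k far above 128) is unchanged.

```java
class LzwRunEstimate {
    // Smallest K such that 1 + 2 + ... + K = K(K+1)/2 >= n.
    static long codewordsForRun(long n) {
        // Start from the quadratic-formula estimate, then correct for rounding.
        long k = (long) Math.floor((Math.sqrt(8.0 * n + 1) - 1) / 2);
        while (k * (k + 1) / 2 < n) k++;
        while (k > 0 && (k - 1) * k / 2 >= n) k--;
        return k;
    }

    public static void main(String[] args) {
        long n = 1L << 30;                 // 2^30 letters, one byte each
        long k = codewordsForRun(n);
        long compressedBytes = 2 * k;      // 16-bit (2-byte) codewords
        System.out.println("codewords K      = " + k);
        System.out.println("compressed bytes = " + compressedBytes);
        System.out.println("ratio is about 1/" + (n / compressedBytes));
    }
}
```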

 

 

4)      (8 points) Consider a file containing the following ASCII string:

ABACABABA

Trace the LZW encoding process for the file (in the same way done in handout lzw.txt).  Assume that the extended ASCII set will use codewords 0-255.  For each step in the encoding, be sure to show all of the information indicated below, in the form shown in handout lzw.txt.  Note: The ASCII value for 'A' is 65.

 

STEP     PREFIX MATCH      CODEWORD OUTPUT   (STRING, CODE) ADDED TO DICTIONARY
----     ------------      ---------------   ----------------------------------
1        A                 65                AB, 256
2        B                 66                BA, 257
3        A                 65                AC, 258
4        C                 67                CA, 259
5        AB                256               ABA, 260
6        ABA               260               ----
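The trace above can be reproduced with a minimal LZW encoder sketch. It is not the code from handout lzw.txt: the class name, the HashMap dictionary, and the unbounded dictionary size are assumptions made for illustration, and the input is assumed to contain only characters with codes 0-255.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class LzwTrace {
    // Encodes input with LZW, printing one trace row per codeword output.
    static List<Integer> encode(String input) {
        Map<String, Integer> dict = new HashMap<>();
        for (int c = 0; c < 256; c++) dict.put(String.valueOf((char) c), c);
        int nextCode = 256;                // first free codeword
        List<Integer> output = new ArrayList<>();
        int pos = 0, step = 1;
        while (pos < input.length()) {
            // Extend the match one character at a time; this finds the longest
            // match because every prefix of a dictionary string is also in it.
            int end = pos + 1;
            while (end < input.length()
                    && dict.containsKey(input.substring(pos, end + 1))) {
                end++;
            }
            String match = input.substring(pos, end);
            int code = dict.get(match);
            output.add(code);
            String added = "----";
            if (end < input.length()) {    // match plus next character is new
                String entry = input.substring(pos, end + 1);
                dict.put(entry, nextCode);
                added = entry + ", " + nextCode;
                nextCode++;
            }
            System.out.printf("%-8d %-17s %-17d %s%n", step++, match, code, added);
            pos = end;
        }
        return output;
    }

    public static void main(String[] args) {
        System.out.println(encode("ABACABABA"));
    }
}
```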

 

 

5)      (16 points – 8 + 8)  An array representation of a min-heap data structure of integers is shown below. 

a)      Draw the resulting array after the operation  DeleteMin(). Show your work.

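Index:   1   2   3   4   5   6   7   8   9  10  11
Value:  15  20  30  25  50  40  35  60  45  90  80

Answer: The last element (80) moves to the root and is pushed down, swapping with its smaller child at each level: 20, then 25, then 45.

Index:   1   2   3   4   5   6   7   8   9  10  11
Value:  20  25  30  45  50  40  35  60  80  90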

 

 

b)      Write Java code (or C++) to do the Insert() method for this heap.  Assume the following instance variables:

data: the heap itself, which is an array of integers

size: an int indicating how many integers are currently in the heap (assume size is much less than the actual array size, so an insert will not cause an out of bounds exception)

Use the header below:

public void Insert(int newKey)

 

Answer: See code on p. 150 of Sedgewick text
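For readers without the text at hand, here is a minimal sketch of the usual "upheap" insert for a 1-indexed array min-heap. It is not a copy of the code in the text; the class name and the fixed capacity of 64 are assumptions, and DeleteMin() is included only so that the result of part a) can be checked. DeleteMin() assumes the heap is not empty.

```java
class MinHeapSketch {
    int[] data = new int[64];   // data[1..size] holds the heap; data[0] is unused
    int size = 0;

    // Place the new key in the next free slot, then swap it upward
    // while it is smaller than its parent (the parent of k is k/2).
    public void Insert(int newKey) {
        data[++size] = newKey;
        int k = size;
        while (k > 1 && data[k / 2] > data[k]) {
            int tmp = data[k];
            data[k] = data[k / 2];
            data[k / 2] = tmp;
            k /= 2;
        }
    }

    // Move the last key to the root, then swap it downward with its
    // smaller child until neither child is smaller.
    public int DeleteMin() {
        int min = data[1];
        data[1] = data[size--];
        int k = 1;
        while (2 * k <= size) {
            int j = 2 * k;                                 // left child
            if (j < size && data[j + 1] < data[j]) j++;    // right child is smaller
            if (data[k] <= data[j]) break;
            int tmp = data[k];
            data[k] = data[j];
            data[j] = tmp;
            k = j;
        }
        return min;
    }

    public static void main(String[] args) {
        MinHeapSketch h = new MinHeapSketch();
        int[] keys = {15, 20, 30, 25, 50, 40, 35, 60, 45, 90, 80};
        for (int key : keys) h.Insert(key);
        System.out.println("min = " + h.DeleteMin());
        for (int i = 1; i <= h.size; i++) System.out.print(h.data[i] + " ");
        System.out.println();
    }
}
```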

 

 


6)      (8 points)  Consider the graph below with the edge weights shown.  Assume the vertices are stored in alphabetical order, using an adjacency matrix.

[Figure: weighted graph on vertices A through I]


Complete the table below, as it would look after a Minimum Spanning Tree (starting from vertex A) were created for the graph, assuming Prim's Algorithm is used.  val[] is the edge weight associated with the vertex, and dad[] is the parent vertex in the MST.  Show your work above or in the space below the table for partial credit.

 

 

        A      B      C      D      E      F      G      H      I
val     0      2      5      9      3      6      7      4      8
dad     0      A (1)  B (2)  C (3)  H (8)  C (3)  E (5)  F (6)  D (4)

 

 

 

 

 

 

7)      (8 points)  Consider the PFS algorithm discussed in lecture, implemented using a) an adjacency list and b) using an adjacency matrix.  When, if ever, would each implementation be preferred?   Thoroughly justify your answers by giving the Theta run-times of each version in different relevant situations.

 

Answer:  Generally speaking, adjacency lists are preferable for sparse graphs and adjacency matrices are preferable for dense graphs.  Specifically in this case, we know the following:

Adjacency List PFS Runtime = Theta(e lg v)

Adjacency Matrix PFS Runtime = Theta(v^2)

Consider now a sparse graph (e ≤ v lg v)

      Adjacency List PFS Runtime ≤ (v lg v) lg v = v lg^2 v

      Adjacency Matrix PFS Runtime = v^2

      In this case the adjacency list is preferred, since v lg^2 v < v^2

Consider now a dense graph (e ≈ v^2)

      Adjacency List PFS Runtime ≈ v^2 lg v

      Adjacency Matrix PFS Runtime = v^2

      In this case the adjacency matrix is preferred, since v^2 lg v > v^2


8)      (8 points) Consider the weighted graph below. The numbers are the edge capacities. S is the source vertex and T is the sink vertex.

[Figure: flow network on vertices S, A, B, C, D, E, T with edge capacities]


Using the BREADTH FIRST SEARCH implementation of the Ford-Fulkerson algorithm, show EACH AUGMENTING PATH generated (in the correct order that the paths are generated), the amount of flow for each path, and the Maximum Flow for the graph.  Assume that an adjacency matrix implementation is used, and that the vertices are listed in the adjacency matrix in the following order: S, A, B, C, D, E, T.   To ensure partial credit, be sure to SHOW YOUR WORK.

           

            SADT:              60

            SBDT:              40

            SCDAET:         60

            -----------------------

            TOTAL:           160

           

            Note that the last path has a backward edge in it.  See notes and text for more information.
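The sketch below shows how a BFS-based Ford-Fulkerson over an adjacency matrix produces paths in this order, including the backward edge. The capacities in hypotheticalCapacities() are NOT the exam's figure. They are an assumed graph, chosen only because it yields the same three augmenting paths. The class and method names are also made up for illustration.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Queue;

class BfsMaxFlowSketch {
    static final String NAMES = "SABCDET";   // vertex order in the matrix

    // Assumed capacities (not taken from the exam figure).
    static int[][] hypotheticalCapacities() {
        int[][] c = new int[7][7];
        c[0][1] = 60; c[0][2] = 40; c[0][3] = 60;   // S->A, S->B, S->C
        c[1][4] = 60; c[1][5] = 60;                 // A->D, A->E
        c[2][4] = 40; c[3][4] = 60;                 // B->D, C->D
        c[4][6] = 100; c[5][6] = 60;                // D->T, E->T
        return c;
    }

    // Returns the max flow from s to t; appends "PATH: flow" to log per path.
    static int maxFlow(int[][] cap, int s, int t, List<String> log) {
        int n = cap.length;
        int[][] res = new int[n][];                 // residual capacities
        for (int i = 0; i < n; i++) res[i] = cap[i].clone();
        int total = 0;
        while (true) {
            int[] dad = new int[n];                 // BFS tree parents
            Arrays.fill(dad, -1);
            dad[s] = s;
            Queue<Integer> q = new ArrayDeque<>();
            q.add(s);
            while (!q.isEmpty() && dad[t] == -1) {
                int u = q.remove();
                for (int v = 0; v < n; v++) {       // neighbors in matrix order
                    if (dad[v] == -1 && res[u][v] > 0) { dad[v] = u; q.add(v); }
                }
            }
            if (dad[t] == -1) break;                // no augmenting path left
            int bottleneck = Integer.MAX_VALUE;
            for (int v = t; v != s; v = dad[v])
                bottleneck = Math.min(bottleneck, res[dad[v]][v]);
            StringBuilder path = new StringBuilder();
            for (int v = t; v != s; v = dad[v]) {
                res[dad[v]][v] -= bottleneck;       // use forward capacity
                res[v][dad[v]] += bottleneck;       // allow flow to be undone
                path.insert(0, NAMES.charAt(v));
            }
            path.insert(0, NAMES.charAt(s));
            log.add(path + ": " + bottleneck);
            total += bottleneck;
        }
        return total;
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        int total = maxFlow(hypotheticalCapacities(), 0, 6, log);
        for (String line : log) System.out.println(line);
        System.out.println("TOTAL: " + total);
    }
}
```

In the third search, D reaches A only through the residual (backward) edge created when the first path pushed 60 units along A->D, which is why SCDAET can reroute that flow through E.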

 

 

 

 

 

9)      (10 points) Consider an unweighted graph that may or may not be connected.  Your goal is to determine the number of connected components in the graph using the DFS algorithm for adjacency matrices.  For example, for the graph below your output would be:

The graph contains 3 connected components

Note that you should not have any other output.

[Figure: example graph with 3 connected components]


Below is some (modified) code from the text with comments that you should use as a starting point.

 

Data that is already initialized for you to use:

a[][] – adjacency matrix for the graph

val[] – array to determine if a vertex has been visited or not

V – number of vertices

E – number of edges

 

void search()  // Main search function
{
    id = 0;    // Assume id is a global variable
    for (int k = 1; k <= V; k++)
        val[k] = unseen;       // Initialize all vertices to unseen

    int comps = 0;             // Number of connected components found so far
    for (int k = 1; k <= V; k++)
    {
        if (val[k] == unseen)  // k was not reached from any earlier vertex,
        {                      // so it starts a new component
            comps++;
            visit(k);
        }
    }
    System.out.println("The graph contains " + comps + " connected components");
    // or: cout << "The graph contains " << comps << " connected components" << endl;
}

void visit(int k)        // Recursive DFS
{
    val[k] = ++id;       // Mark k as visited
    for (int t = 1; t <= V; t++)
    {
        if ((a[k][t]) && (val[t] == unseen))
            visit(t);
    }
}