CS 1501 Programming Project 3

Lossless Compression Schemes

Purpose: The purpose of this assignment is for you to understand and compare some lossless compression schemes.  In particular you will compare the compression using self-organizing search heuristics (move to front and transpose) algorithms that we discussed in lecture to two other compression algorithms.

General Details: In lecture we discussed (or will discuss) a number of lossless compression algorithms, including Huffman, LZW and the self-organizing search compression algorithm.  In this assignment you will implement the self-organizing search algorithm and compare its compression quality to a 2 other compression algorithms.  You will test each algorithm on various different files to see how each performs under different circumstances.  You will then tabulate your results and write a brief report comparing the algorithms.

Specific Procedure:

1)      Thoroughly read the following web sites also give brief introductions to Move-To-Front (one particular self organizing strategy) compression: http://www.arturocampos.com/ac_mtf.html and http://www.rebolforces.com/articles/compression/2.  The Transpose heuristic only moves the accessed item 1 position closer to the front.  Try a few practice examples by hand until you are completely familiar with the algorithm and how and why it works.

1)      In your implementation the original codewords should be 16-bit sequences (i.e. you will process the input file 16 bits at a time and your arrays should have 216 locations). Since your input file could be arbitrary bytes and since your output data can be fractions of bytes, both your input and output files must be BINARY files.  If you are unfamiliar with using binary files in Java, see the handout binaryio.java and ByteCountDemo.java for some help.  If/when using the bit shift operators, note the distinction between the operators >> and >>>.  Here is an applet that might help too.  Also look at the lzw.c handout for help with I/O of byte fractions. 

2)      You should use the following nonstandard coding of the integers in the coded file. : In the standard coding, each integer is encoded with16  bits in the standard manner. For example, the integer 11 would be encoded as the 16 bits    00000000 00001011. To understand the nonstandard encoding, consider an arbitrary non-negative x. Let g(x) be 0 if x=0, and otherwise let g(x) be the standard binary representation of x with the leading zeroes stripped off. So for example, if x=11, the g(x)=1011. For an integer x, define f(x) to be the 4 bit standard encoding of the number of bits in g(x) minus 1. So if x=11, the number of bits in g(x)=4, so then f(x)=0011. The nonstandard encoding of an integer x is f(x) concatenated with g(x). So if x=11, the non standard encoding of x is 00111011. Note that this is just a bit different than I explained it in class. The TAs will explain again in recitation. Here is a table to give you some hints:

Number

Non-Standard Encoding

0

00000

1

00001

27

010011011

131

011110000011

65535

11111111111111111111

3)      We will create 4 versions of your program, which we will call ListCode, depending on whether:

·         Coding or Decoding

·         Using Move-To-Front or Transpose to reorder the list

4)      Write a program to implement the algorithms that will work for an arbitrary file.  You should be able to specify the algorithms and whether you want to compress or decompress from the command line. That is:

·         ListCode –c –m filename should do Move-to-Front compression with nonstandard coding.

·         ListCode –d –m filename should do Move-to-Front decompression with nonstandard coding

·         ListCode –c –t filename should do Transpose compression with nonstandard coding

·         ListCode –d –t  filename should do Transpose decompression with nonstandard coding

5)      It will probably help if you use a hex editor; i.e. a program that will allow you to open a file and directly view/edit the hexadecimal values stored in the file. Under Windows one freeware options is xvi32, which can be found here: http://www.chmaas.handshake.de/delphi/freeware/xvi32/xvi32.htm (It will also display the binary value of each byte and allow you to
set individual bits).

6)      Once your program is working correctly, you will compare its performance with that of the standard the implementation of LZW lzw.c, and one commercial compression program..  We will provide you with a number of files to use for testing  A directory of the files can be found here, and a gzipped tar archive here. This archive can be opened with WinRar.

1.      Your program using the Move-To-Front heuristic with non-standard coding

2.      Your program using the Transpose heuristic with nonstandard coding

3.      The lzw.c program. An Windows executable lzw.exe is available.

4.      First apply your program using the Move-To-Front heuristic with nonstandard coding and then apply lzw.c

5.       One additional compression algorithm.  You may use any compression algorithm (ex: compress, gzip, pkzip or others), but you must briefly explain it in the write-up as specified below.

Run all programs on all of the files and for each file record the original size, compressed size, and compression ratio (compressed size / original size).  For each file record the compression ratio for each of the above algorithms.

7)      Write a short (about 2 pages for non-W student and 4 pages for W students) paper that minimally discusses / elaborates on each of the following:.

a)      Any implementation issues / problems you encountered and how you resolved them.

b)      How the compression ratios compare for the various test files.  Where there were differences between them, be sure to explain why.  If algorithms performed differently on different test files, explain (or speculate) why.  Speculate as to which algorithm (if any) was the best overall.

c)      For all algorithms, indicate which of the test files gave the best and worst compression ratios, and speculate as to why this was the case.  If any files did not compress at all or compressed very poorly, speculate as to why.

d)     Briefly explain the additional compression algorithm that you used in your test runs.  State the algorithm and a paragraph or two about the theory behind it.  You should find information on most compression algorithms somewhere on the Web.

e)      Include in your paper all of the compression ratio results that you recorded in 3) above.

f)       If you are in the W section, your paper should minimally be 4 pages, with at least one full page of explanation of your  algorithm choice.  As with the other projects, the paper will count more toward your overall score in the W section than in the regular sections.

8)      Submit your source code, runnable .jar file (or all .class files), write-up paper and your Assignment Information Sheet to the appropriate submission directory.  DO NOT SUBMIT THE TEST FILES – they will use up too much memory on the submission site.  As always, make sure that any executables will be able to be executed by the TAs without any modification. If you are doing the project with a partner, you need to both submit your program and your report. Make sure the names of both persons are on everything.