Project 5: Parallel Downloads

Due: Sunday, April 22nd, 2012 at 11:59pm

Description

Peer-to-peer (P2P) networks and technologies have completely changed how data is distributed across the Internet. Traditionally, when you visit a website, your web browser (the client) contacts a web server and downloads a file that the server makes available. This can overload the server if many clients try to download the same file at the same time (e.g. flash crowds or the Slashdot effect). P2P networks avoid this problem because they rely on the clients themselves to help distribute a file to other clients in the network. This means that when you download a file over a P2P network, you are downloading different parts of it from different systems in parallel.

Downloading a file in parallel requires that parts of the file can be downloaded in any order and reassembled by the client. This requires that a file be divided into multiple blocks of data called chunks, which can be independently downloaded in any order. Effectively, a file is treated like a large array, with each chunk being one entry in that array. This allows a P2P client to download any chunk in the file and know exactly where in the array (file) the data should be placed, because it simply copies it to the location according to its index.
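The array analogy above can be sketched in C as follows. This is only an illustration: CHUNK_SIZE, chunk_offset and place_chunk are hypothetical names, and the value 1024 is an assumption, since the real chunk size is whatever the provided server uses.

```c
#include <assert.h>
#include <stdio.h>

/* Assumed chunk size for illustration only; check the provided server
 * for the real value. */
#define CHUNK_SIZE 1024

/* Byte offset at which chunk `index` begins in the output file. */
long chunk_offset(long index)
{
    assert(index >= 0);
    return index * CHUNK_SIZE;
}

/* Copy `len` bytes of chunk `index` into its slot in `out`.
 * Returns 0 on success, -1 on failure. */
int place_chunk(FILE *out, long index, const char *data, size_t len)
{
    if (fseek(out, chunk_offset(index), SEEK_SET) != 0)
        return -1;
    return fwrite(data, 1, len, out) == len ? 0 : -1;
}
```

Because the offset depends only on the index, chunk 1 can be written before chunk 0 and the file still ends up in the right order.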

Requirements

In this project you will write one half of a P2P application: a client, in a file named client.c, that downloads a single file from multiple remote hosts in parallel. We will provide you with the other half, which provides the data to your client. In order to download the file in parallel, you will need to spawn multiple threads that will each connect to a different file provider to request a specific chunk. Once a thread downloads a chunk, it will have to write it out to the file output.txt. Note that because multiple threads are downloading chunks in parallel, you will need to provide some synchronization to ensure the file is not corrupted.
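One possible way to provide that synchronization is a single mutex held around the seek-and-write pair, so that no other thread can move the file position in between. The sketch below is not the required design; file_lock, write_chunk_locked and the CHUNK_SIZE value are assumptions made for illustration.

```c
#include <assert.h>
#include <pthread.h>
#include <stdio.h>

#define CHUNK_SIZE 1024  /* assumed value, for illustration only */

/* One lock shared by every downloader thread (hypothetical name). */
static pthread_mutex_t file_lock = PTHREAD_MUTEX_INITIALIZER;

/* Seek and write as one critical section so that two threads cannot
 * interleave their fseek/fwrite calls on the shared FILE.
 * Returns 0 on success, -1 on failure. */
int write_chunk_locked(FILE *out, long index, const char *data, size_t len)
{
    int rc = -1;

    assert(index >= 0);
    pthread_mutex_lock(&file_lock);
    if (fseek(out, index * CHUNK_SIZE, SEEK_SET) == 0 &&
        fwrite(data, 1, len, out) == len)
        rc = 0;
    pthread_mutex_unlock(&file_lock);
    return rc;
}
```

The network transfer itself stays outside the lock, so threads only wait on each other for the short time it takes to write a chunk.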

Your program will take as command line arguments a list of IP addresses and port numbers, like so:

./client <ipaddr1> <port1> <ipaddr2> <port2> ...

Your program will read the set of IP addresses and port numbers and create one thread for each IP address/port. This thread will then connect to the given host, and request a chunk from the file. After the chunk is received, it will request a new chunk until the complete file has been downloaded.
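As a rough sketch of the step above, the argument pairs could be collected into an array and one thread started per entry. The names struct provider, parse_providers, run_workers and MAX_PROVIDERS are hypothetical, and the worker function (the code that actually downloads chunks) is left to the caller.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

#define MAX_PROVIDERS 64  /* arbitrary limit chosen for this sketch */

/* One file provider taken from the command line (hypothetical type). */
struct provider {
    const char *ip;  /* e.g. "127.0.0.1" */
    int port;
};

/* Fill `out` from the <ipaddr> <port> pairs in argv.
 * Returns the number of providers, or -1 if the arguments are malformed.
 * Note: atoi yields 0 for a non-numeric port; real code may want strtol. */
int parse_providers(int argc, char **argv, struct provider *out, int max)
{
    int n = (argc - 1) / 2;

    if (argc < 3 || (argc - 1) % 2 != 0 || n > max)
        return -1;
    for (int i = 0; i < n; i++) {
        out[i].ip = argv[1 + 2 * i];
        out[i].port = atoi(argv[2 + 2 * i]);
    }
    return n;
}

/* Start one thread per provider, passing it a pointer to its entry,
 * then wait for every started thread. Returns 0 if all threads started. */
int run_workers(struct provider *p, int n, void *(*worker)(void *))
{
    pthread_t tids[MAX_PROVIDERS];
    int started = 0;

    assert(n >= 0 && n <= MAX_PROVIDERS);
    for (int i = 0; i < n; i++) {
        if (pthread_create(&tids[i], NULL, worker, &p[i]) != 0)
            break;
        started++;
    }
    for (int i = 0; i < started; i++)
        pthread_join(tids[i], NULL);
    return started == n ? 0 : -1;
}
```

Each worker would also need some shared, synchronized way to decide which chunk index to request next, so that two threads do not fetch the same chunk.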

We will provide you with serv.c. This implements the server that will provide file chunks to your client.

Its usage is:

 ./serv -p <port> <filename>

port is the port number it will listen for connections on and filename is the name of the file it will provide to any client connections. When you connect to the server, you first have to send it the index number of the chunk you would like to request (as a string). The server will then send the chunk corresponding to the requested index.
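The request step might look like the sketch below, which uses standard sockets calls. The exact wire format is an assumption: here the index is sent as plain decimal digits with no terminator, so check serv.c to see what it actually expects. connect_to, format_request and request_chunk are hypothetical helper names.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Open a TCP connection to ip:port. Returns a socket fd, or -1. */
int connect_to(const char *ip, int port)
{
    struct sockaddr_in addr;
    int fd;

    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_port = htons((unsigned short)port);
    if (inet_pton(AF_INET, ip, &addr.sin_addr) != 1)
        return -1;  /* not a dotted-quad IPv4 address */

    fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;
    if (connect(fd, (struct sockaddr *)&addr, sizeof addr) != 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Render the chunk index as a decimal string (assumed request format).
 * Returns the string length, or -1 if it does not fit in `buf`. */
int format_request(char *buf, size_t cap, long index)
{
    int n;

    assert(index >= 0);
    n = snprintf(buf, cap, "%ld", index);
    return (n < 0 || (size_t)n >= cap) ? -1 : n;
}

/* Send the request for chunk `index` on `fd`. Returns 0 or -1. */
int request_chunk(int fd, long index)
{
    char req[32];
    int len = format_request(req, sizeof req, index);

    if (len < 0)
        return -1;
    return send(fd, req, (size_t)len, 0) == len ? 0 : -1;
}
```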

You will be able to determine you have reached the last chunk in the file when the server closes the connection before a chunk is transferred. Note that when this happens the server might have sent a partial chunk or no data at all.
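One way to detect that condition is to keep reading until either a full chunk has arrived or the server closes the connection, and then compare the byte count with the chunk size. The sketch below assumes a hypothetical helper read_chunk; the caller passes the chunk size as `cap`.

```c
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* Read up to `cap` bytes from `fd` into `buf`, looping because a single
 * read may return less than was asked for.
 * Returns the number of bytes received, or -1 on error. A result smaller
 * than `cap` (possibly 0) means the peer closed the connection first. */
ssize_t read_chunk(int fd, char *buf, size_t cap)
{
    size_t got = 0;

    assert(buf != NULL);
    while (got < cap) {
        ssize_t n = read(fd, buf + got, cap - got);
        if (n < 0) {
            if (errno == EINTR)
                continue;  /* interrupted by a signal, try again */
            return -1;
        }
        if (n == 0)
            break;  /* connection closed by the server */
        got += (size_t)n;
    }
    return (ssize_t)got;
}
```

A short result still carries data that belongs in the file: write the bytes that did arrive, then stop requesting further chunks.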

Server

You can get the server by copying it into your directory with the command:

cp /u/SysLab/shared/serv.c .

Please remember the dot at the end as it represents the current directory.

Example

You can launch as many servers to test against as you have ports. In this example, we open two servers on ports 1000 and 1001 and run them in the background by using the ampersand. We then launch our client with the two IP/port combinations, and the client requests chunks from the two servers until the entire file has been transferred. Finally, we kill the two servers by sending SIGTERM to their process IDs.

./serv -p 1000 file.txt &

[1] 19039

./serv -p 1001 file.txt &

[2] 19040

./client 127.0.0.1 1000 127.0.0.1 1001

kill 19039

kill 19040

Ports and Addresses

For this project we will be working on thot.cs.pitt.edu. On the class website there is a list of usernames and your designated, personal port numbers. Please use these ports and only these ports. For the address of the machine, we will simply refer to it as localhost, or localhost’s reserved IP address: 127.0.0.1

Testing

thot.cs.pitt.edu is firewalled from the outside world, meaning that you will only be able to connect to your server from thot itself. In order to do this, your best bet for testing is to use a few programs:

Hints/Notes

Submission

You need to submit:

 

Make a tar.gz file as in the first assignment, named USERNAME-project5.tar.gz

Copy it to ~jrmst106/submit/449/ by the deadline for credit.