HOMEWORK 1 (CS 2731 / ISSP 2230)

Assigned: September 8, 2003

Due: September 24, 2003

Exercises

  1. Knowledge of Language (20 points)

    Inference is an essential part of natural language understanding. Informally, we define an inference as an assumption that is not explicitly stated but that most people would make during the understanding process. This might involve disambiguation or factual assumptions. Note that inferences can be wrong!

    For this question, you should list the inferences that most people would make while reading the following story: "A funny thing happened yesterday. John went to a fancy restaurant with a famous chef. John ordered the duck. The bill was big. John got a shock when the waiter came to collect the bill. He realized he didn't have a cent on him. The waiter said he could pay later. He was very nice about it."

    1. Identify as many specific inferences as you can.

    2. For each inference, state the category of knowledge that a computer would need to make the inference: phonetics and phonology, morphology, syntax, semantics, pragmatics, and/or discourse. Be sure your inferences illustrate all of the categories.

    Note that there is no "right" answer to this question. You and your classmates will likely generate a different set and different number of inferences. The assignment will be graded based upon how well your examples illustrate that inference is ubiquitous (so if you only find a few inferences then you should look harder), and on your characterizations of the types of knowledge required to make the inferences.

  2. Regular Expressions and Automata (80 points)

    1. Jurafsky & Martin 2.4 (10 points)

    2. Jurafsky & Martin 2.5 (10 points)

    3. Jurafsky & Martin 2.6. Use American time expressions (i.e., "1 PM" or "1:00 in the afternoon", not "13:00") (10 points)

      For parts 1, 2 and 3 above, you are expected to give a graphical representation of the FSA. You may label the edges of the graph with words or with the names of other FSAs (building the final FSA from smaller FSAs will help you with part 4). A hand-drawn graphical representation is acceptable and recommended.
       

    4. A time/date tagger (50 points)

      Using the FSAs you've just designed, write a program in a language of your choice that puts XML-like tags around time and date expressions. Your program should read text from standard input and write the tagged version of the text to standard output. You can assume that a date/time expression does not extend beyond the end of a line.

      • INPUT: a text in English.

      • OUTPUT: the same text with all date and time expressions marked by <TIME> and </TIME> (for both dates and times).

      • SAMPLE INPUT: Christmas is celebrated on the 25th of December. Christmas Eve is celebrated the night before.

      • SAMPLE OUTPUT: <TIME>Christmas</TIME> is celebrated on <TIME>the 25th of December</TIME>. <TIME>Christmas Eve</TIME> is celebrated <TIME>the night before</TIME>.

      • SUBMIT: (documented) source code; your training files; the output of your program on your training files; and a README file listing all types of time and date expressions that your program can handle, along with instructions on how to run your program. Make sure you document in your code how it relates to the FSAs you designed for parts 1, 2 and 3 (for example, if you have designed an automaton that handles the "the 5th of September" date type, comment the lines that implement that FSA with the appropriate description).

      • GRADING: Your program will be run on an unseen test file to evaluate its generality and correctness.
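
The input/output behaviour described above can be sketched in Python as follows. This is only a minimal illustration, not a reference solution: it uses regular expressions in place of explicitly coded FSAs, the names MONTH, ORDINAL, HOLIDAY, CLOCK, TIME_EXPR, tag_line and main are hypothetical, and the handful of patterns covers little more than the sample input. It does show how small named patterns can be composed into a larger one, in the same way that smaller FSAs can label the edges of a larger FSA.

```python
import re
import sys

# Hypothetical building blocks; each plays the role of a small sub-automaton.
MONTH = (r"(?:January|February|March|April|May|June|July|August"
         r"|September|October|November|December)")
ORDINAL = r"(?:\d{1,2}(?:st|nd|rd|th))"        # e.g. "25th"
HOLIDAY = r"(?:Christmas(?: Eve)?)"            # e.g. "Christmas Eve"
CLOCK = r"(?:\d{1,2}(?::\d{2})?\s?(?:AM|PM))"  # e.g. "1 PM", "1:00 PM"

# The larger pattern is assembled from the smaller ones.
# This coverage is an assumption for illustration and is far from complete.
TIME_EXPR = re.compile(
    r"\b(?:"
    rf"the {ORDINAL} of {MONTH}"   # "the 25th of December"
    rf"|{MONTH} \d{{1,2}}"         # "March 15"
    rf"|{CLOCK}"
    rf"|{HOLIDAY}"
    r"|the night before"
    r")\b"
)

def tag_line(line: str) -> str:
    """Wrap every recognized date/time expression in <TIME>...</TIME>."""
    return TIME_EXPR.sub(lambda m: f"<TIME>{m.group(0)}</TIME>", line)

def main() -> None:
    """Tag standard input line by line; expressions never span lines."""
    for line in sys.stdin:
        sys.stdout.write(tag_line(line))

# To use this as a filter, call main() at the bottom of the script.
```

Processing one line at a time relies on the stated assumption that a date/time expression does not extend beyond the end of a line. Applied to the sample input, tag_line produces the sample output shown above.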

Submission instructions

The assignment is due before class (11:00 am) on the due date. For the paper submission, please bring your assignment to class. If you are late, please put your submission in the TA mailbox (5th floor) and send an e-mail message indicating that you have submitted the assignment (the submission date will be the date of the e-mail). For the electronic submission (programs, documentation), please refer to the following instructions:

You can repeat the submission steps as many times as you want, but please be aware that you are not allowed to delete any of the files. Your submission date will be the date of the file timestamp.