Advanced Topics in NLP: Discourse Processing and Pragmatics
CS 3730/ISSP 3120 Natural Language Processing
CRN: 30717

Course Description:

This course will explore diverse topics in discourse processing and pragmatics, including theories of discourse structure, coherence, discourse markers, reference resolution, temporal interpretation, and implicature. The course will center on Natural Language Processing, but will include some literature from related areas such as linguistics to help students understand the discourse structure and pragmatics of language.

Instructor:

Instructor: Dr. Jan Wiebe
Office Hours: Tuesday & Thursday 1:30-2:30pm and by appointment
Office: 5409 Sennott Square
Phone: (412) 624-9590
Email: wiebe@cs.pitt.edu
Web Site: www.cs.pitt.edu/~wiebe/courses/CS3730/Spring05


Lectures:

Day: Monday & Wednesday
Time: 1:00-2:15pm
Place: 5313 Sennott Square

Prerequisites:

Graduate-level courses in Artificial Intelligence and Natural Language Processing, or permission of the instructor.

Optional Textbooks:

The Handbook of Discourse Analysis (2001). Deborah Schiffrin, Deborah Tannen, and Heidi Hamilton (eds.). Malden, MA: Blackwell. This is in the bookstore.

The Handbook of Pragmatics (2003). Laurence R. Horn and Gregory Ward (eds.). Oxford: Blackwell.

Course Readings:

A course schedule, available through the course Yahoo! group, lists the papers or chapters to be discussed each day. Hard copies of readings that are not available electronically through the course schedule will be handed out in an earlier class.

The papers marked "Other" on the course schedule are papers we would have covered if there were more time. They are not required, but would be valuable to read.

The papers represent a sampling of research in discourse and pragmatics. You are encouraged to branch out and read beyond the reading list for the course.

Course Requirements:

Course Project 40% (proposal: 8%, presentation: 8%, report: 24%)
Class Presentations 25%
Reaction Essays 25%
Class Participation 10%

* Course Projects

By February 9, you should have met with me and handed in a project proposal. By March 21, you should have met with me again and handed in a status report on your project. A draft of your report or paper is due April 18. The final report is due April 27. All of these deadlines must be met to receive credit on the course project. Students will give project presentations during the final days of class.

Feel free to discuss your ideas for projects before writing your proposal. Your project should be non-trivial and interesting, yet feasible given the time frame.

There are five options for the project:

* A corpus annotation project. This type of project must be done in pairs. It will involve developing annotation instructions, gathering a corpus, performing a training round of annotation, discussing the results with each other, revising the annotation instructions, and then annotating a fresh test set. Inter-coder reliability should be reported (percentage agreement and Kappa). The amount of data annotated need not be large.
* Implement and evaluate an algorithm that performs some type of discourse or pragmatic processing (such as anaphora resolution, recognizing discourse relations, discourse segmentation, resolving bridging inferences, or tracking a temporal reference frame). This type of project may be done in pairs or individually (or, if the project can support it, in groups larger than two).
* Use discourse or pragmatic knowledge to improve an application system such as a question answering system. Processing may be fully automatic, or your system may take manual annotations as input. This type of project may be done in pairs or individually (or, if the project can support it, in groups larger than two).
* Read 15 or more papers on a topic and write a paper about them. The goal is not to reiterate everything in the papers, but rather to address a specific set of issues and points of comparison between the papers. These should be specified in your project proposal. Your paper should be at least 15 pages long, single spaced. It should be well written, clear, and interesting, and should accomplish the goals laid out in the proposal for the project. This type of project must be done individually.
* Write an NSF-style research proposal, though the bibliography need not be as extensive as in an actual submitted proposal. A sample proposal will be made available to students choosing this option. Note that you must hand in a proposal for your proposal, just as for the other course project options. This type of project may be done in pairs or individually.

The following data will be made available to the class: the Rhetorical Structure Theory (RST) Discourse Treebank, published by the Linguistic Data Consortium (LDC). The RST Discourse Treebank contains a selection of 385 Wall Street Journal articles from the Penn Treebank that have been annotated with discourse structure in the framework of RST. In addition, the corpus includes a number of human-generated extracts and abstracts associated with the original documents.

If you choose one of the computational options for the project, you may also use other annotated data to evaluate your system. You may use any existing annotations that are available to you. Alternatively, you may annotate a small test set yourselves to evaluate your system.
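
For the corpus annotation option, the two reliability figures requested above (percentage agreement and Kappa) can be computed with a few lines of code. The following is a minimal sketch, assuming that "Kappa" means Cohen's kappa for two annotators who each assign exactly one label per item. The function names and the "boundary"/"none" labels are hypothetical and chosen only for illustration.

```python
from collections import Counter


def percent_agreement(labels_a, labels_b):
    """Fraction of items on which the two annotators chose the same label."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    matches = sum(a == b for a, b in zip(labels_a, labels_b))
    return matches / len(labels_a)


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa (assumed here): agreement corrected for chance."""
    n = len(labels_a)
    p_observed = percent_agreement(labels_a, labels_b)
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    # Chance agreement: probability that both annotators pick the same
    # label if each labels at random according to their own distribution.
    p_expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if p_expected == 1.0:
        return 1.0  # both annotators used one and the same label throughout
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical annotations of ten sentences for discourse segment boundaries.
coder_a = ["boundary"] * 5 + ["none"] * 5
coder_b = ["boundary"] * 4 + ["none"] * 5 + ["boundary"]

print(percent_agreement(coder_a, coder_b))        # 0.8
print(round(cohen_kappa(coder_a, coder_b), 2))    # 0.6
```

In this example the annotators agree on 8 of 10 items, and chance agreement is 0.5, so kappa is (0.8 - 0.5) / (1 - 0.5) = 0.6. Projects with more than two annotators or with multiple labels per item would need a different statistic.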

* Reaction Essays and Annotations

To help presenters direct class discussions better, everyone is expected to write a short reaction essay for each assigned paper. Your reactions should be concise (200-300 words), and they should be well written.

Given the space limitation, your reaction essay should not include a complete summary of the reading material. You should limit your reactions/ideas to one or two per paper. Your ideas may take a number of forms: (a) You may compare the work to related material; (b) You may hypothesize about ways in which the work could have been improved; (c) You may think about ways to expand on the work (conceptually or computationally); (d) You may critique the work, including its conceptual framework, methodology, and/or results; or (e) You may describe something you don't understand that you would like the class to discuss (explain exactly what you don't understand).

For some papers, you will also be asked to annotate a text snippet in a way that is relevant to the paper.

Reaction essays (and annotations, if applicable) are due by noon two days (counting weekend days) before the class during which the paper will be presented.

We will use the Yahoo group pittcs3730. Enter it through yahoo.com, or go there directly: http://groups.yahoo.com/group/pittcs3730.

Please post your reaction essays in plain text. Do not use attachments. Post them as messages, not as uploaded files. It is preferable to post annotations in the same way. However, if you want to prepare some annotations in a format other than plain text (in PowerPoint or Word, for example), then you may upload them as files. Also, please upload your PowerPoint presentations after you have given them. Please be sure to follow the naming conventions given below for both plain-text messages and uploaded files.

The group is by invitation only, and the messages will be accessible to group members only. Please sign up for a Yahoo account, if you don't have one already, and send your Yahoo user name to wiebe@cs.pitt.edu. I will invite you to join the group. You will receive the invitation at your Yahoo email account.

Note that you need to click on SECURE each time you enter your password into Yahoo, if you want it to be secure.

Please post to the group by sending email to pittcs3730@yahoogroups.com or by posting a message on the Web site.

You will have options for message delivery: Send individual email messages, Daily digest (send many emails in one message), Special notices (Only send me important update emails from the group moderator), and No email (Don't send me email, I'll read the messages at the Web site). It's up to you whether you want to receive the messages in email or simply access them at the Web site.

You will also have the option to add another email address, such as your cs.pitt.edu address (which Yahoo will make you verify). If you do this, then that is the email account you can use to send messages to and/or receive messages from pittcs3730@yahoogroups.com (if you opt to use email).

Please use the following naming convention for your message subject lines and for any annotation files uploaded to the site:
last-name-of-first-author last-2-digits-publication-year your-last-name, followed by "essay" (for reaction essays) or "annots" (for annotations). For example, Theresa Wilson's reaction essay for Grosz and Sidner 1986 should have the subject line "Grosz 86 Wilson essay".
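
As an illustration only, the subject-line convention above can be expressed as a small helper. This is a sketch, not a required tool; the function name `subject_line` and its parameters are hypothetical.

```python
def subject_line(first_author, year, your_last_name, kind):
    """Build a subject line such as 'Grosz 86 Wilson essay'.

    kind is 'essay' for reaction essays or 'annots' for annotations.
    """
    if kind not in ("essay", "annots"):
        raise ValueError("kind must be 'essay' or 'annots'")
    # year % 100 keeps the last two digits of the publication year.
    return f"{first_author} {year % 100:02d} {your_last_name} {kind}"


print(subject_line("Grosz", 1986, "Wilson", "essay"))  # Grosz 86 Wilson essay
```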

* Class Presentations

Each student will give two class presentations. You should prepare slides for your presentation. Since everyone in the class will have read the assigned papers, there is no need to reiterate everything in the paper. You should present the claims of the paper, and address some interesting issues. What did we learn from the paper? What future work does the research suggest? If the reaction essays include annotations, you should also lead a class discussion of the annotations. The reaction essays for your paper should be good sources of issues and questions for discussion.

Feel free to meet with me before your presentation. Send email to arrange a time, if you cannot come to office hours. Bring a sketch of your presentation. I can answer background questions you may have, and help you figure out what to focus on in your presentation.

Please upload your presentations to the Yahoo group after you have given them in class (click on "Files" to upload them). The file name should identify each paper presented, followed by your last name and "Pres". For example, suppose Theresa Wilson presents Mann and Thompson 1988 and Hobbs 1979 (Day 6) using PowerPoint. The name of her uploaded file should be Mann88Hobbs79WilsonPres.ppt.

* Class Participation and Late Assignments

Because in-class discussions are an important part of this course, absences are strongly discouraged. Even if an absence is unavoidable, you are still responsible for making arrangements to turn in the assignments on time. You are also responsible for obtaining the materials passed out and the information presented during the missed class.

No extensions will be given for reaction essays. For other assignments, in case of extraordinary circumstances (hospitalization, family emergency), you should contact me as soon as possible, prior to the due date, so that we may arrange an extension. Otherwise, late assignments will not be accepted.

* Academic Integrity

If you include material from any source in your presentations and/or projects, you must acknowledge it. Your presentations and projects should represent your (and your partner's, if applicable) original work.