Natural Language Processing (CS 3730 / ISSP 3120), Fall 2001

Time: MW 12:00-1:20
Place: 234 Eberly Hall
Professor: Diane Litman
Office Hours: M 1:30-3 (214 MIB); Th 10-11:30 (741 LRDC); F 12-1 (741 LRDC)
Email: litman@cs.pitt.edu
Phone: 412-624-8838 (MIB); 412-624-1261 (LRDC)

Description:

This course provides an introduction to the field of natural language processing (NLP) - the creation of computer programs that can understand, generate, and learn natural language. We will use natural language understanding as a vehicle to introduce the three major subfields of NLP: syntax (which concerns itself with determining the structure of a sentence), semantics (which concerns itself with determining the explicit meaning of a single sentence), and pragmatics (which concerns itself with deriving the implicit meaning of a sentence when it is used in a specific discourse context). The course will introduce both knowledge-based and statistical methods for NLP, and will illustrate the use of such methods in a variety of application areas.

Text:

Speech and Language Processing by Jurafsky and Martin. It should be available from the campus bookstore, as well as from Amazon and other online providers.

Errata for the text

Books on Reserve at the Eberly Hall Library (all books are now there):

  • Natural Language Understanding by James Allen, 1995. ISBN# 0-8053-0334-0
  • Foundations of Statistical Natural Language Processing, by Christopher D. Manning and Hinrich Schütze, 1999. ISBN# 0-262-13360-1
  • A Comprehensive Grammar of the English Language, by Randolph Quirk, Sidney Greenbaum, Geoffrey Leech, Jan Svartvik, 1985.

    Be sure to get the linguistic background handouts from the first two books.

    Requirements:

    Concepts taught in class will be reinforced with assignments (both problem sets and programming) and exams.

    Announcements:

    Grades

    Grades are now available. You were a great class and I enjoyed the semester! I hope you did too.

    Advanced Topics in Artificial Intelligence (CS 3710 / ISSP 3565):

    I will be teaching a seminar next semester that is a natural follow-up to this course. The class will involve the presentation and discussion of papers in discourse and dialogue. It will meet in LRDC 814 (a conference room), MW 2:00-3:20 PM. Details can be found here.

    Survey on Conversational Agents:

    If you want to have some input on CMU's Universal Speech Interface project, fill out their 10-minute survey.

    Prosody and ASR:

    If you want to see why class was cancelled, check out the ISCA Tutorial and Research Workshop on Prosody in Speech Recognition and Understanding.

    ACL'02 Announcement:

    40th Annual Meeting of the Association for Computational Linguistics, 7 - 12 July, 2002, Philadelphia, PA, USA. Call for Papers.

    Interesting Links (besides the resources available from the text homepage):

    Chapters 1 and 2:

    Try out one of the many versions of Eliza on the web. Code and an article about a similar program called Racter (thanks to Chad Lane for these links). Eliza, Racter, and other classic programs.

    Chapter 3:

    AT&T Labs - Research Finite State Machine Library

    Later Chapters - pointers from Steffi (thanks!):

    Appelt and Israel's information extraction tutorial (IJCAI-99).

    Framenet.

    Michael Collins' Parser (requires a tagger to work).

    Chapter 19:

    Allen's Dialogue Modeling for Spoken Language Systems tutorial (ACL Workshop 1997).

    Hirschberg's Intonational Variation in Spoken Dialogue Systems tutorial.

    Syllabus:

    Week | Class | Topic | Reading | Assignments
    1 | Aug 27 | Course Overview and Administration | |
      | Aug 29 | Knowledge of Language | Ch 1 |
    2 | Sep 3 | Labor Day Holiday | |
      | Sep 5 | Regular Expressions and Automata | Ch 2 |
    3 | Sep 10 | ...continued | |
      | Sep 12 | Morphology and Finite State Transducers | Ch 3 | HW1
    4 | Sep 17 | ...continued | |
      | Sep 19 | N-Grams | Ch 6 (through 6.4) |
    5 | Sep 24 | ...continued | |
      | Sep 26 | Part of Speech Tagging | Ch 8 | HW1 due
    6 | Oct 1 | ...continued | |
      | Oct 3 | Context-Free Grammars | Ch 9 | HW2 (please make sure you have the corrected version of question 2)

    I understand that some people are having trouble getting the ngram software to work. If you have a cs account, use the machines bert or ernie. Otherwise, Stefanie Bruninghaus has provided her fixes (thanks Steffi!), which seem to work on some other machines (README, Stats, count-unigrams.pl).

    7 | Oct 8 | ...continued | |
      | Oct 10 | Parsing with CFGs | Ch 10 |
    8 | Oct 15 | ...continued | |
      | Oct 17 | Features and Unification | Ch 11 |
    9 | Oct 22 | No class - instructor away | |
      | Oct 24 | No class - instructor away | | HW2 due (turn in to Angela Balcita, MIB 212)
    10 | Oct 29 | Representing Meaning | Ch 14 |
       | Oct 31 | Midterm Exam (covers through Ch 11) | |
    11 | Nov 5 | Semantic Analysis | Ch 15 (skip 15.2 though) |
       | Nov 7 | ...continued | |
    12 | Nov 12 | Lexical Semantics | Ch 16 |
       | Nov 14 | ...continued; Word Sense Disambiguation | Relevant parts of Ch 17 | HW3. Note: the data you get from Cobuild (Q 4) might be noisy.
    13 | Nov 19 | Discourse | Ch 18 |
       | Nov 21 | Thanksgiving Holiday | |
    14 | Nov 26 | ...continued | |
       | Nov 28 | Dialogue and Conversational Agents | Ch 19 |
    15 | Dec 3 | ...continued | | HW3 due
       | Dec 5 | ...continued; Generation | Ch 20 (sections 1-3) |
    16 | Dec 10 | ...continued; Summing Up | |
       | Dec 12 | Final Exam (non-cumulative, covers Ch 1, and from Ch 14 on) | |

    Lecture Notes:

    Available on request.