Learning to Model Text Structure

Dr. Regina Barzilay, MIT

Friday, April 2, 2004
10:30am - SENSQ 5317
Refreshments at 10am in SENSQ 5319

Abstract

The natural language processing community has struggled for years to develop computational models of text structure. Such models are essential both for interpretation of human-written text and for evaluation of machine-generated text. Applications such as text summarization and machine translation would greatly benefit from such models.

In this talk, I will present our first steps towards learning to model text structure. I will describe two models that are induced from a large collection of unannotated texts. The first model captures the notion of text cohesion by considering connectivity patterns characteristic of well-formed texts. These patterns are inferred from a matrix that combines distributional and syntactic information about text entities. The second model captures the content structure of texts within a specific domain, in terms of the topics the texts address and the order in which these topics appear. I will present an effective method for learning content models, utilizing a novel adaptation of algorithms for Hidden Markov Models. To conclude my talk, I will show how these text models can be effectively integrated into natural language generation and summarization systems.

This is joint work with Mirella Lapata and Lillian Lee.