Learning to Model Text Structure
Dr. Regina Barzilay, MIT
Friday, April 2, 2004
10:30am - SENSQ 5317
Refreshments at 10am in SENSQ 5319
Abstract
The natural language processing community has struggled for years to develop computational models of text structure. Such models are essential both for interpreting human-written text and for evaluating machine-generated text. Applications such as text summarization and machine translation would benefit greatly from them.
In this talk, I will present our first steps towards learning to model text structure. I will describe two models that are induced from a large collection of unannotated texts. The first model captures the notion of text cohesion by considering connectivity patterns characteristic of well-formed texts. These patterns are inferred from a matrix that combines distributional and syntactic information about text entities. The second model captures the content structure of texts within a specific domain, in terms of the topics the texts address and the order in which these topics appear. I will present an effective method for learning content models, utilizing a novel adaptation of algorithms for Hidden Markov Models. To conclude my talk, I will show how these text models can be effectively integrated into natural language generation and summarization systems.
This is joint work with Mirella Lapata and Lillian Lee.