Dissertation Outline

My thesis is entitled "Applications of Discourse Structure for Spoken Dialogue Systems". A brief description of its current state is available below. For more details, please read my Ph.D. proposal document or browse the slides from the oral examination.

· Thesis Statement & Contributions

· Background

· Intuitions

· Applications

· Performance analysis

· Characterization of Speech Recognition Problems

· Characterization of User Affect

· Graphical representation - The Navigation Map

· Satellite Projects


Thesis Statement & Contributions

This thesis investigates and validates the utility of discourse structure for spoken dialogue systems in complex domains. Results show that discourse structure enables new insights into system performance, user affect and speech recognition problems. On the user side, we find that users prefer a system that displays a graphical representation of the discourse structure (the Navigation Map) over one that does not. Current experiments are investigating the objective utility of two things: a system modification suggested by the performance analysis, and the graphical representation of the discourse structure.


Background

Dialogues (human-human or human-computer) have an inherent structure called the discourse structure. However, due to the relatively simple structure of dialogues in previous spoken dialogue systems, discourse structure has seen limited applications in these systems. In contrast, dialogues in complex domains like tutoring exhibit a richer discourse structure which enables new applications of this concept.


Intuitions

There are four intuitions behind this work, grouped below by the application they relate to.

Performance analysis
The task of performance analysis is to discover factors that relate to or impact system performance. For tutoring dialogue systems, the main performance metric is student learning (defined as the difference between the scores on the tests given after and before interacting with the system). We had two intuitions regarding discourse structure for this task:

Intuition 1 - Conditioning: Phenomena related to performance are not uniformly important across the dialogue but have more weight at specific places in the dialogue. For tutoring, for example, our intuition says that it matters more whether the user was correct at specific places in the dialogue than throughout the whole dialogue. "Specific places in the dialogue" are defined based on discourse structure transitions.

Intuition 2 - Discrimination: "Good" dialogues have a discourse structure different from "bad" dialogues. That is, we should be able to figure out the system performance just by looking at the structure of dialogues.

Characterization of discourse phenomena
Intuition 3 - Interaction: Dialogue phenomena are not uniformly distributed across the dialogue but are more frequent at specific places in the dialogue. For example, speech recognition problems and certain user affect states are more frequent after certain transitions in the dialogue.

User side applications
Intuition 4 - Visual: It is easier for users to follow the conversation with the system if a graphical representation of the discourse structure is present.


Application: Performance Analysis

To validate the Conditioning intuition, we compared two parameter sets in terms of their ability to describe system performance: correctness parameters (e.g. % correct) and transition-correctness parameters (e.g. % correct after a certain transition). We find that correctness parameters are not informative, while the transition-correctness parameters offer interesting insights regarding performance. In particular, we find that the user's correctness (correct or incorrect) after a PopUp transition is indicative of how much that user learned (more or less).
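The difference between the two parameter sets can be illustrated with a minimal sketch. The data representation is an assumption made for this example only: a dialogue is a list of (transition, is_correct) pairs, one per user turn, and the label "Other" is a placeholder for any transition that is not Push or PopUp. It is not the system's actual code or annotation scheme.

```python
from collections import defaultdict


def correctness_parameters(turns):
    """Return (overall % correct, {transition: % correct after that transition}).

    `turns` is a hypothetical representation: a list of
    (transition_label, is_correct) pairs, one per user turn.
    """
    if not turns:
        return 0.0, {}

    # Correctness parameter: ignores where in the dialogue the turn occurs.
    overall = 100.0 * sum(ok for _, ok in turns) / len(turns)

    # Transition-correctness parameters: condition on the preceding transition.
    totals = defaultdict(int)
    correct = defaultdict(int)
    for transition, ok in turns:
        totals[transition] += 1
        correct[transition] += int(ok)
    by_transition = {t: 100.0 * correct[t] / totals[t] for t in totals}
    return overall, by_transition


# Made-up dialogue, for illustration only.
dialogue = [
    ("Push", True),
    ("PopUp", False),
    ("Other", True),
    ("PopUp", True),
]
overall, by_transition = correctness_parameters(dialogue)
```

Here the single overall figure hides the fact that this user was correct only half of the time after PopUp transitions, which is the kind of information the transition-correctness parameters expose.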

Based on the hypothesis that the PopUp-Correct and PopUp-Incorrect parameters capture successful and failed learning events, respectively, we implemented a modification of the system that changes its behavior after PopUp transitions based on the user's correctness. We are currently investigating the effectiveness of this modification in a new user study.

For the Discrimination intuition, we compare dialogues by looking at transitions of length two in the dialogue structure: the transition-transition parameters. We find that several of these parameters (e.g. Push-Push) can discriminate users who learn more from users who learn less.
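A transition-transition parameter can be sketched as the relative frequency of a pair of consecutive transitions in a dialogue. The function name and the choice to normalize by the number of pairs are assumptions made for this illustration; the text above does not specify whether raw counts or percentages were used.

```python
from collections import Counter


def transition_bigram_parameters(transitions):
    """Relative frequency of each pair of consecutive transitions.

    `transitions` is a hypothetical input: the sequence of transition
    labels observed in one dialogue, in order.
    """
    pairs = list(zip(transitions, transitions[1:]))
    if not pairs:
        return {}
    counts = Counter(pairs)
    return {f"{first}-{second}": n / len(pairs)
            for (first, second), n in counts.items()}


# Made-up sequence, for illustration only.
params = transition_bigram_parameters(["Push", "Push", "PopUp", "Push", "Push"])
```

Each user then gets one value per bigram (such as Push-Push), and these values can be compared between users who learned more and users who learned less.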



Application: Characterization of Speech Recognition Problems

We investigate the Interaction intuition by looking at statistical dependencies between the discourse structure transition and the presence of various types of speech recognition problems (SRP). An empirical study finds that certain discourse structure transitions have specific interaction patterns with SRP (e.g. Push and PopUp transitions have problematic interactions with AsrMis). From the dialogue designer's perspective, our results suggest that particular attention should be paid to specific locations in the discourse structure (e.g. semantic interpretation problems due to speech recognition after PopUp transitions). In addition, the observed interactions suggest that discourse structure can be an informative feature for automatic prediction of speech recognition problems.
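One common way to test for a statistical dependency between two binary variables is the Pearson chi-square test on a 2x2 contingency table. The sketch below assumes that test for illustration; the text above says only "statistical dependencies", so the choice of test, the function name and the counts are all assumptions, not the study's method or data.

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column needs at least one observation")
    return n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)


# Made-up counts, for illustration only:
#   rows    = the turn follows a PopUp transition (yes / no)
#   columns = the turn has an AsrMis problem (yes / no)
stat = chi_square_2x2(30, 70, 50, 350)

# 3.841 is the chi-square critical value for 1 degree of freedom at p = 0.05.
significant = stat > 3.841
```

A statistic above the critical value indicates that the problem is not distributed independently of the transition, which is the kind of interaction described above.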



Application: Characterization of User Affect

We also investigate the Interaction intuition for one class of user affect: uncertainty. We find that discourse structure can be used to characterize user uncertainty over and above correctness. Specific transitions in the discourse structure are associated with an increase or decrease of uncertainty. If we discount for correctness, which interacts significantly with uncertainty, we find additional interactions. The observed interactions suggest that discourse structure can be an informative feature for automatic prediction of user affect.



Application: Graphical Representation - The Navigation Map

If the Visual intuition is true, then users would prefer, and learn better with, a system that displays the graphical representation of the discourse structure (the Navigation Map - NM). We ran a within-subjects user study focused on users' perception of the system with and without the NM. An analysis of users' ratings indicates that users prefer the NM-enabled version on various dimensions. The presence of the NM allows users to better identify and follow the tutoring plan and to better integrate the instruction. Users also rated it easier to concentrate and to learn from the system when the NM was present.

We are currently running a between-subjects user study that investigates the objective utility of the Navigation Map.



Satellite Projects

Discourse structure transitions have proved to be useful for the system-side applications presented above. Other projects from our group have taken advantage of this information source:

· A project that investigates the utility of affect parameters for PARADISE performance models. The study makes heavy use of the Conditioning intuition and finds, among other things, that transition-affect bigrams are very informative.
Katherine Forbes-Riley, Mihai Rotaru and Diane J. Litman (2008). “The Relative Impact of Student Affect on Performance Models in a Spoken Dialogue Tutoring System”. User Modeling and User-Adapted Interaction (UMUAI), Special Issue on Affective Modeling and Adaptation, 18 (1-2).

· A project that investigates which PARADISE interaction parameters generalize across corpora. The study finds, among other things, that parameters derived from transition-correctness bigrams generalize well and show up in most PARADISE models.
Katherine Forbes-Riley, Diane J. Litman, Amruta Purandare, Mihai Rotaru and Joel Tetreault (2007). “Comparing Linguistic Features for Modeling Learning in Computer Tutoring Dialogues”. In Proceedings of International Conference on Artificial Intelligence in Education (AIED 2007), Marina Del Rey, USA.

· A project that investigates emotion prediction in our system using a variety of sources. The depth in the discourse structure is used as one of the features, though the study does not investigate its relative utility.
Hua Ai, Diane Litman, Kate Forbes-Riley, Mihai Rotaru, Joel Tetreault and Amruta Purandare (2006). “Using System and User Performance Features to Improve Emotion Detection in Spoken Tutoring Dialogs”. In Proceedings of Interspeech 2006, Pittsburgh, USA.