This thesis investigates and validates the utility of discourse structure for spoken dialogue systems in complex domains. Results show that discourse structure yields new insights into system performance, user affect and speech recognition problems. On the user side, we find that users prefer a version of the system that displays a graphical representation of the discourse structure (the Navigation Map) over one that does not. Current experiments are investigating the objective utility of a system modification suggested by the performance analysis and of the graphical representation of the discourse structure.
Dialogues (human-human or human-computer) have an inherent structure called the discourse structure. However, because dialogues in previous spoken dialogue systems had a relatively simple structure, discourse structure has seen limited application in these systems. In contrast, dialogues in complex domains like tutoring exhibit a richer discourse structure, which enables new applications of this concept.
Four intuitions motivate this work. They are grouped below by the application they relate to.
Performance analysis
The task of performance analysis is to discover factors that relate to or impact system performance. For tutoring dialogue systems, the main performance metric is student learning (defined as the difference between the scores on the tests given after and before interacting with the system). We had two intuitions regarding discourse structure for this task:
Intuition 1 - Conditioning: Phenomena related to performance are not uniformly important across the dialogue but carry more weight at specific places in the dialogue. For tutoring, for example, our intuition is that whether the user is correct at specific places in the dialogue matters more than whether the user is correct across the whole dialogue. "Specific places in the dialogue" are defined in terms of discourse structure transitions.
Intuition 2 - Discrimination: "Good" dialogues have a discourse structure different from that of "bad" dialogues. That is, we should be able to infer system performance just by looking at the structure of the dialogues.
Characterization of discourse phenomena
Intuition 3 - Interaction: Dialogue phenomena are not uniformly distributed across the dialogue but are more frequent at specific places in the dialogue. For example, speech recognition problems and certain user affect states (e.g. uncertainty) are more frequent after certain transitions in the dialogue.
User side applications
Intuition 4 - Visual: It is easier for users to follow the conversation with the system if a graphical representation of the discourse structure is present.
To validate the Conditioning intuition, we compared two parameter sets in terms of their ability to describe system performance: correctness parameters (e.g. % correct) and transition-correctness parameters (e.g. % correct after a certain transition). We find that the correctness parameters are not informative, while the transition-correctness parameters offer interesting insights regarding performance. In particular, we find that user correctness (correct or incorrect) after a PopUp transition is indicative of whether that user learned more or less.
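As a rough illustration of the two parameter sets, the sketch below computes an overall % correct and a % correct after PopUp transitions for each user, then correlates each parameter with learning. The log format, the toy user data, and the helper names (`pct_correct`, `pearson`) are assumptions made for this example, not the system's actual representation.

```python
from math import sqrt

def pct_correct(turns, transition=None):
    """Percentage of correct turns, optionally restricted to turns
    that follow the given discourse structure transition."""
    selected = [ok for t, ok in turns if transition is None or t == transition]
    if not selected:
        return 0.0
    return 100.0 * sum(selected) / len(selected)

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sx * sy)

# Hypothetical logs: one (transition, is_correct) pair per student turn,
# plus a made-up learning score (posttest minus pretest) per user.
users = {
    "u1": {"turns": [("Advance", True), ("Push", False), ("PopUp", True), ("Advance", True)], "learning": 0.40},
    "u2": {"turns": [("Advance", True), ("Push", True), ("PopUp", False), ("Advance", True)], "learning": 0.10},
    "u3": {"turns": [("Advance", False), ("Push", True), ("PopUp", True), ("PopUp", False)], "learning": 0.25},
}

learning = [u["learning"] for u in users.values()]
overall = [pct_correct(u["turns"]) for u in users.values()]
after_popup = [pct_correct(u["turns"], "PopUp") for u in users.values()]

print("correctness parameter vs. learning:", pearson(overall, learning))
print("PopUp-Correct parameter vs. learning:", pearson(after_popup, learning))
```

The same `pct_correct` helper covers both parameter sets: leaving `transition` unset gives a plain correctness parameter, while passing a transition label conditions the parameter on that place in the dialogue.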
Based on the hypothesis that the PopUp-Correct and PopUp-Incorrect parameters capture successful and failed learning events, respectively, we implemented a modification of the system that changes its behavior after PopUp transitions based on the user's correctness. We are currently investigating the effectiveness of this modification in a new user study.
For the Discrimination intuition, we compare dialogues by looking at transitions of length two in the dialogue structure: the transition-transition parameters. We find that several of these parameters (e.g. Push-Push) can discriminate users that learn more from users that learn less.
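A transition-transition parameter can be sketched as a bigram count over a user's sequence of transitions. In the example below, the transition sequence, the function names, and the normalization by the total number of bigrams are all assumptions for illustration; the text above does not specify how the parameters are normalized.

```python
from collections import Counter

def transition_bigrams(transitions):
    """Count consecutive pairs of transitions, e.g. ('Push', 'Push')."""
    return Counter(zip(transitions, transitions[1:]))

def bigram_pct(transitions, first, second):
    """Share (in %) of all transition bigrams that equal (first, second)."""
    counts = transition_bigrams(transitions)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return 100.0 * counts[(first, second)] / total

# Hypothetical sequence of discourse structure transitions for one user.
user_transitions = ["Advance", "Push", "Push", "PopUp", "Advance", "Push", "Push"]

print(transition_bigrams(user_transitions))
print("Push-Push %:", bigram_pct(user_transitions, "Push", "Push"))
```

Computing one such value per user gives a parameter that can then be compared between users who learn more and users who learn less.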
Mihai Rotaru and Diane J. Litman (2006) “Exploiting Discourse Structure for Spoken Dialogue Performance Analysis”. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), Sydney, Australia.
Abstract:
In this paper we study the utility of discourse structure for spoken dialogue performance modeling. We experiment with various ways of exploiting the discourse structure: in isolation, as context information for other factors (correctness and certainty) and through trajectories in the discourse structure hierarchy. Our correlation and PARADISE results show that, while the discourse structure is not useful in isolation, using the discourse structure as context information for other factors or via trajectories produces highly predictive parameters for performance analysis.
We investigate the Interaction intuition by looking at statistical dependencies between discourse structure transitions and the presence of various types of speech recognition problems (SRP). An empirical study finds that certain discourse structure transitions have specific interaction patterns with SRP (e.g. Push and PopUp transitions have problematic interactions with AsrMis). From the dialogue designer's perspective, our results suggest that particular attention should be paid to specific locations in the discourse structure (e.g. semantic interpretation problems due to speech recognition after PopUp transitions). In addition, the observed interactions suggest that discourse structure can be an informative feature for automatic prediction of speech recognition problems.
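The kind of dependency analysis described above can be sketched with a Pearson chi-square test of independence on a contingency table of transition type against the presence of an SRP. The counts below are invented for illustration and the function name `chi_square` is hypothetical; only the test itself is standard. For a 2x2 table (one degree of freedom), a statistic above 3.841 indicates a dependency significant at the 0.05 level.

```python
def chi_square(table):
    """Pearson chi-square statistic for an r x c contingency table.
    Assumes every row and column total is non-zero."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat

# Hypothetical counts of system turns (not results from the study).
# Rows: turn follows a Push transition / any other transition.
# Columns: user answer has an SRP / no SRP.
table = [
    [30, 70],
    [40, 160],
]

CRITICAL_VALUE_DF1 = 3.841  # chi-square critical value, df = 1, alpha = 0.05
stat = chi_square(table)
print("chi-square statistic:", round(stat, 3))
print("significant at 0.05:", stat > CRITICAL_VALUE_DF1)
```

Comparing observed and expected counts in each cell also shows the direction of a dependency, that is, whether an SRP occurs more or less often than expected after a given transition.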
Mihai Rotaru and Diane J. Litman (2006) “Discourse Structure and Speech Recognition Problems”. In Proceedings of Interspeech 2006, Pittsburgh, USA (People's Choice Best Paper Award).
Abstract:
We study dependencies between discourse structure and speech recognition problems (SRP) in a corpus of speech-based computer tutoring dialogues. This analysis can inform us whether there are places in the discourse structure prone to more SRP. We automatically extract the discourse structure by taking advantage of how the tutoring information is encoded in our system. To quantify the discourse structure, we extract two features for each system turn: depth of the turn in the discourse structure and the type of transition from the previous turn to the current turn. The Chi Square test is used to find significant dependencies. We find several interesting interactions which suggest that the discourse structure can play an important role in several dialogue-related tasks: automatic detection of SRP and analyzing spoken dialogue systems with a large state space from limited amounts of available data.
We also investigate the Interaction intuition for one class of user affect: uncertainty. We find that discourse structure can be used to characterize user uncertainty over and above correctness: specific transitions in the discourse structure are associated with an increase or decrease in uncertainty. If we discount for correctness, which interacts significantly with uncertainty, we find additional interactions. The observed interactions suggest that discourse structure can be an informative feature for automatic prediction of user affect.
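One possible way to discount for correctness is to stratify the turns by correctness and build a separate transition-by-uncertainty table for each stratum, so that any dependency found within a stratum cannot be attributed to correctness alone. This stratified layout is an assumption made for illustration; the text above does not describe the exact procedure. The turn records and the name `stratified_tables` are likewise hypothetical.

```python
from collections import defaultdict

def stratified_tables(turns, transition):
    """Build one 2x2 table per correctness value.
    Rows: turn follows `transition` / any other transition.
    Columns: answer is uncertain / not uncertain."""
    tables = defaultdict(lambda: [[0, 0], [0, 0]])
    for t, correct, uncertain in turns:
        row = 0 if t == transition else 1
        col = 0 if uncertain else 1
        tables[correct][row][col] += 1
    return dict(tables)

# Hypothetical turn records: (transition, is_correct, is_uncertain).
turns = [
    ("Push", True, True),
    ("Push", True, False),
    ("Advance", True, False),
    ("Push", False, True),
    ("Advance", False, True),
    ("PopUp", False, False),
]

for correct, table in stratified_tables(turns, "Push").items():
    label = "correct" if correct else "incorrect"
    print(label, "answers:", table)
```

Each per-stratum table could then be tested for dependency separately, for example with a chi-square test.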
Katherine Forbes-Riley, Mihai Rotaru, Diane J. Litman, Joel Tetreault (2007) “Exploring Affect-Context Dependencies for Adaptive System Development”. In Proceedings of HLT/NAACL 2007 (late-breaking news award).
Abstract:
We use the Chi Square test to investigate the context dependency of student affect in our computer tutoring dialogues, targeting uncertainty in student answers in 3 automatically monitorable contexts. Our results show significant dependencies between uncertain answers and specific contexts. Identification and analysis of these dependencies is our first step in developing an adaptive version of our dialogue system.
If the Visual intuition is true, then users should prefer, and learn better with, a system that displays the graphical representation of the discourse structure (the Navigation Map, NM). We ran a within-subjects user study focused on users' perception of the system with and without the NM. An analysis of users' ratings indicates that users prefer the NM-enabled version on various dimensions. The presence of the NM allows users to better identify and follow the tutoring plan and to better integrate the instruction. It was also easier for users to concentrate and to learn from the system when the NM was present.
We are currently running a between-subjects user study that investigates the objective utility of the Navigation Map.
Mihai Rotaru and Diane J. Litman (2007) “The Utility of a Graphical Representation of Discourse Structure in Spoken Dialogue Systems”. In Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics (ACL), Prague, Czech Republic.
Abstract:
In this paper we explore the utility of the Navigation Map (NM), a graphical representation of the discourse structure. We run a user study to investigate if users perceive the NM as helpful in a tutoring spoken dialogue system. From the users' perspective, our results show that the NM presence allows them to better identify and follow the tutoring plan and to better integrate the instruction. It was also easier for users to concentrate and to learn from the system if the NM was present. Our preliminary analysis on objective metrics further strengthens these findings.
Discourse structure transitions have proved to be useful for the system-side applications presented above. Other projects from our group have taken advantage of this information source:
· A project that investigates the utility of affect parameters for PARADISE performance models. The study makes heavy use of the Conditioning intuition and finds, among other things, that transition-affect bigrams are very informative.
Katherine Forbes-Riley, Mihai Rotaru, Diane J. Litman (2008) “The Relative Impact of Student Affect on Performance Models in a Spoken Dialogue Tutoring System”. User Modeling and User Adapted Interaction (UMUAI) Special Issue on Affective Modeling and Adaptation, 18 (1-2).
Abstract:
We hypothesize that student affect is a useful predictor of spoken
dialogue system performance, relative to other parameters. We test this hypothesis
in the context of our spoken dialogue tutoring system, where student learning is
the primary performance metric. We first present our tutoring system and corpora,
which have been annotated for student affect, correctness, and discourse structure.
We then discuss unigram and bigram parameters which we derive from these corpora.
The unigram parameters represent each annotation type individually, as well as
system-generic features. The bigram parameters represent annotation combinations,
including student state sequences and student states in the discourse structure
context. We then use these parameters to build learning models. First, we build
simple models based on correlations between each of our parameters and learning.
Our results suggest that student affect parameters are among our most useful predictors
of learning, particularly in specific discourse structure contexts. Next, we build
complex learning models, using the PARADISE framework, in which our parameters
are input to a multivariate linear regression, yielding a model containing only the
most useful subset of parameters. Our approach is a value-added one; we perform
a number of model-building experiments, both with and without including student
affect parameters, and then compare the performance of the models on the training
and the test sets. Our results show that when included as inputs, affect parameters
are selected as predictors in most models, and many of these models show high
generalizability in testing. Our results also show that overall, the affect-included
models significantly outperform the affect-excluded models.
· A project that investigates which PARADISE interaction parameters generalize across corpora. The study finds, among other things, that parameters derived from transition-correctness bigrams generalize well and show up in most PARADISE models.
Katherine Forbes-Riley, Diane J. Litman, Amruta Purandare, Mihai Rotaru and Joel Tetreault (2007). “Comparing Linguistic Features for Modeling Learning in Computer Tutoring Dialogues”. In Proceedings of International Conference on Artificial Intelligence in Education (AIED 2007), Marina Del Rey, USA.
Abstract:
We compare the relative utility of different automatically computable
linguistic feature sets for modeling student learning in computer dialogue tutoring.
We use the PARADISE framework (multiple linear regression) to build a learning
model from each of 6 linguistic feature sets: 1) surface features, 2) semantic
features, 3) pragmatic features, 4) discourse structure features, 5) local dialogue
context features, and 6) all feature sets combined. We hypothesize that although
more sophisticated linguistic features are harder to obtain, they will yield stronger
learning models. We train and test our models on 3 different train/test dataset pairs
derived from our 3 spoken dialogue tutoring system corpora. Our results show that
more sophisticated linguistic features usually perform better than either a baseline
model containing only pretest score or a model containing only surface features,
and that semantic features generalize better than other linguistic feature sets.
· A project that investigates emotion prediction in our system using a variety of sources. The depth in the discourse structure is used as one of the features, though the study does not investigate its relative utility.
Hua Ai, Diane Litman, Kate Forbes-Riley, Mihai Rotaru, Joel Tetreault, and Amruta Purandare (2006). “Using System and User Performance Features to Improve Emotion Detection in Spoken Tutoring Dialogs”. In Proceedings of Interspeech 2006, Pittsburgh, USA.
Abstract:
In this study, we incorporate automatically obtained system/user performance features into machine learning experiments to detect student emotion in computer tutoring dialogs. Our results show a relative improvement of 2.7% on classification accuracy and 8.08% on Kappa over using standard lexical, prosodic, sequential, and identification features. This level of improvement is comparable to the performance improvement shown in previous studies by applying dialog acts or lexical-/prosodic-/discourse- level contextual features.