My research investigates the challenges and opportunities that arise from building spoken dialogue systems in complex domains such as tutoring. Currently, my work focuses on applications of discourse structure for spoken dialogue systems in complex domains.
Most of my research has been performed on the ITSPOKE speech-based computer tutor. Nonetheless, the work can be replicated in dialogue systems from other domains.
Here is a list of projects I am or have been involved in:
|
|
Dialogues (human-human or human-computer) have an inherent structure called the discourse structure. However, due to the relatively simple structure of dialogues in previous spoken dialogue systems, discourse structure has seen limited applications in these systems. In contrast, dialogues in complex domains like tutoring exhibit a richer discourse structure which enables new applications of this concept.
I am pursuing two types of applications of discourse structure:
· System-side applications - my work investigates whether discourse structure information is useful for various spoken dialogue system tasks: performance analysis, characterization of user affect and characterization of speech recognition problems.
· User-side applications - my work investigates whether discourse structure information is useful for users through a graphical representation of the discourse structure (the Navigation Map).
[More Details]
[Publications]
Core papers:
· Mihai Rotaru and Diane J. Litman (2006) “Exploiting Discourse Structure for Spoken Dialogue Performance Analysis”. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), Sydney, Australia.
[abstract]
In this paper we study the utility of discourse structure for spoken dialogue performance modeling. We experiment with various ways of exploiting the discourse structure: in isolation, as context information for other factors (correctness and certainty) and through trajectories in the discourse structure hierarchy. Our correlation and PARADISE results show that, while the discourse structure is not useful in isolation, using the discourse structure as context information for other factors or via trajectories produces highly predictive parameters for performance analysis.
· Mihai Rotaru and Diane J. Litman (2006) “Discourse Structure and Speech Recognition Problems”. In Proceedings of Interspeech 2006, Pittsburgh, USA (People's Choice Best Paper Award).
[abstract]
We study dependencies between discourse structure and speech recognition problems (SRP) in a corpus of speech-based computer tutoring dialogues. This analysis can inform us whether there are places in the discourse structure prone to more SRP. We automatically extract the discourse structure by taking advantage of how the tutoring information is encoded in our system. To quantify the discourse structure, we extract two features for each system turn: depth of the turn in the discourse structure and the type of transition from the previous turn to the current turn. The Chi Square test is used to find significant dependencies. We find several interesting interactions which suggest that the discourse structure can play an important role in several dialogue related tasks: automatic detection of SRP and analyzing spoken dialogues systems with a large state space from limited amounts of available data.
· Mihai Rotaru and Diane J. Litman (2007) “The Utility of a Graphical Representation of Discourse Structure in Spoken Dialogue Systems”. In Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics (ACL), Prague, Czech Republic.
[abstract]
In this paper we explore the utility of the Navigation Map (NM), a graphical representation of the discourse structure. We run a user study to investigate if users perceive the NM as helpful in a tutoring spoken dialogue system. From the users' perspective, our results show that the NM presence allows them to better identify and follow the tutoring plan and to better integrate the instruction. It was also easier for users to concentrate and to learn from the system if the NM was present. Our preliminary analysis on objective metrics further strengthens these findings.
· Mihai Rotaru (2007) “Applications of Discourse Structure for Spoken Dialogue Systems”. Ph.D. Thesis Proposal, Computer Science Department, University of Pittsburgh.
[abstract]
Due to the relatively simple structure of dialogues in previous spoken dialogue systems, discourse structure has seen limited applications in these systems. We investigate the utility of discourse structure for spoken dialogue systems in complex domains (e.g. tutoring). Two types of applications are being pursued: on the system side and on the user side. On the system side, we investigate if the discourse structure information is useful for various spoken dialogue system tasks: performance analysis, characterization of user affect and characterization of speech recognition problems. On the user side, we investigate whether the discourse structure information is useful for users of a spoken dialogue system through a graphical representation of the discourse structure.
· Katherine Forbes-Riley, Mihai Rotaru, Diane J. Litman, Joel Tetreault (2007) “Exploring Affect-Context Dependencies for Adaptive System Development”. In Proceedings of HLT/NAACL 2007 (late-breaking news award).
[abstract]
We use the Chi Square test to investigate the context dependency of student affect in our computer tutoring dialogues, targeting uncertainty in student answers in 3 automatically monitorable contexts. Our results show significant dependencies between uncertain answers and specific contexts. Identification and analysis of these dependencies is our first step in developing an adaptive version of our dialogue system.
Satellite papers:
· Katherine Forbes-Riley, Mihai Rotaru, Diane J. Litman (2008) “The Relative Impact of Student Affect on Performance Models in a Spoken Dialogue Tutoring System”. User Modeling and User Adapted Interaction (UMUAI) Special Issue on Affective Modeling and Adaptation, 18 (1-2).
[abstract]
We hypothesize that student affect is a useful predictor of spoken dialogue system performance, relative to other parameters. We test this hypothesis in the context of our spoken dialogue tutoring system, where student learning is the primary performance metric. We first present our tutoring system and corpora, which have been annotated for student affect, correctness, and discourse structure. We then discuss unigram and bigram parameters which we derive from these corpora. The unigram parameters represent each annotation type individually, as well as system-generic features. The bigram parameters represent annotation combinations, including student state sequences and student states in the discourse structure context. We then use these parameters to build learning models. First, we build simple models based on correlations between each of our parameters and learning. Our results suggest that student affect parameters are among our most useful predictors of learning, particularly in specific discourse structure contexts. Next, we build complex learning models, using the PARADISE framework, in which our parameters are input to a multivariate linear regression, yielding a model containing only the most useful subset of parameters. Our approach is a value-added one; we perform a number of model-building experiments, both with and without including student affect parameters, and then compare the performance of the models on the training and the test sets. Our results show that when included as inputs, affect parameters are selected as predictors in most models, and many of these models show high generalizability in testing. Our results also show that overall, the affect-included models significantly outperform the affect-excluded models.
· Katherine Forbes-Riley, Diane J. Litman, Amruta Purandare, Mihai Rotaru and Joel Tetreault (2007) “Comparing Linguistic Features for Modeling Learning in Computer Tutoring Dialogues”. In Proceedings of International Conference on Artificial Intelligence in Education (AIED 2007), Marina Del Rey, USA.
[abstract]
We compare the relative utility of different automatically computable linguistic feature sets for modeling student learning in computer dialogue tutoring. We use the PARADISE framework (multiple linear regression) to build a learning model from each of 6 linguistic feature sets: 1) surface features, 2) semantic features, 3) pragmatic features, 4) discourse structure features, 5) local dialogue context features, and 6) all feature sets combined. We hypothesize that although more sophisticated linguistic features are harder to obtain, they will yield stronger learning models. We train and test our models on 3 different train/test dataset pairs derived from our 3 spoken dialogue tutoring system corpora. Our results show that more sophisticated linguistic features usually perform better than either a baseline model containing only pretest score or a model containing only surface features, and that semantic features generalize better than other linguistic feature sets.
· Hua Ai, Diane Litman, Kate Forbes-Riley, Mihai Rotaru, Joel Tetreault, and Amruta Purandare (2006) “Using System and User Performance Features to Improve Emotion Detection in Spoken Tutoring Dialogs”. In Proceedings of Interspeech 2006, Pittsburgh, USA.
[abstract]
In this study, we incorporate automatically obtained system/user performance features into machine learning experiments to detect student emotion in computer tutoring dialogs. Our results show a relative improvement of 2.7% on classification accuracy and 8.08% on Kappa over using standard lexical, prosodic, sequential, and identification features. This level of improvement is comparable to the performance improvement shown in previous studies by applying dialog acts or lexical-/prosodic-/discourse- level contextual features.
|
|
|
|
|
|
Designing a spoken dialogue system involves many non-trivial decisions. This is in part due to a variety of dialogue phenomena that occur during a dialogue. Being able to detect and handle such phenomena has a big impact on the success of a dialogue system. For example, detecting and handling speech recognition problems is crucial for a dialogue system.
Instead of looking at dialogue phenomena in isolation, my research attempts to understand the inherent interactions that exist between these phenomena. For example, a string of speech recognition problems is likely to result in a frustrated user. I investigate several phenomena: speech recognition problems, user affect (e.g. certainty, frustration), user state (e.g. correctness) and discourse transitions (e.g. crossing a discourse segment boundary). The approach is empirical: statistical dependencies between dialogue phenomena are mined from a corpus of dialogues. Analyses of these dependencies offer additional insights into the phenomena and suggest new handling strategies.
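As a rough illustration of this kind of dependency mining, the sketch below runs a Pearson Chi Square test on a 2x2 table that crosses two turn-level phenomena. It is a minimal sketch under stated assumptions, not the analysis code used in the papers: the function names, the lack of a continuity correction, and the counts are all hypothetical and chosen only to show the mechanics.

```python
# Critical value of the chi-square distribution for 1 degree of freedom at p = 0.05.
CRITICAL_VALUE_DF1_P05 = 3.841


def chi_square_2x2(table):
    """Pearson chi-square statistic for a 2x2 table of counts.

    No continuity correction is applied, and every expected count is
    assumed to be non-zero (otherwise the division below fails).
    """
    (a, b), (c, d) = table
    total = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in this cell if the two phenomena were independent.
            expected = row_totals[i] * col_totals[j] / total
            statistic += (observed - expected) ** 2 / expected
    return statistic


def is_dependent(table):
    """True if independence is rejected at the 0.05 level (1 degree of freedom)."""
    return chi_square_2x2(table) > CRITICAL_VALUE_DF1_P05


# Hypothetical counts, NOT taken from any corpus:
# rows    = previous turn rejected / previous turn not rejected
# columns = current turn emotional / current turn neutral
hypothetical_counts = [[30, 70], [60, 240]]

if __name__ == "__main__":
    print(chi_square_2x2(hypothetical_counts), is_dependent(hypothetical_counts))
```

A statistic above the critical value only says the two phenomena are not independent; comparing observed and expected counts cell by cell shows the direction of the dependency.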
[More Details]
[Publications]
· Mihai Rotaru and Diane J. Litman (2006) “Dependencies between Student State and Speech Recognition Problems in Spoken Tutoring Dialogues”. In Proceedings of the Joint Conference of the International Committee on Computational Linguistics and the Association for Computational Linguistics (Coling/ACL), Sydney, Australia.
[abstract]
Speech recognition problems are a reality in current spoken dialogue systems. In order to better understand these phenomena, we study dependencies between speech recognition problems and several higher level dialogue factors that define our notion of student state: frustration/anger, certainty and correctness. We apply Chi Square analysis to a corpus of speech-based computer tutoring dialogues to discover these dependencies both within and across turns. Significant dependencies are combined to produce interesting insights regarding speech recognition problems and to propose new strategies for handling these problems. We also find that tutoring, as a new domain for speech applications, exhibits interesting tradeoffs and new factors to consider for spoken dialogue design.
· Mihai Rotaru and Diane J. Litman (2006) “Discourse Structure and Speech Recognition Problems”. In Proceedings of Interspeech 2006, Pittsburgh, USA.
[abstract]
We study dependencies between discourse structure and speech recognition problems (SRP) in a corpus of speech-based computer tutoring dialogues. This analysis can inform us whether there are places in the discourse structure prone to more SRP. We automatically extract the discourse structure by taking advantage of how the tutoring information is encoded in our system. To quantify the discourse structure, we extract two features for each system turn: depth of the turn in the discourse structure and the type of transition from the previous turn to the current turn. The Chi Square test is used to find significant dependencies. We find several interesting interactions which suggest that the discourse structure can play an important role in several dialogue related tasks: automatic detection of SRP and analyzing spoken dialogues systems with a large state space from limited amounts of available data.
· Mihai Rotaru, Diane J. Litman, and Katherine Forbes-Riley (2005) “Interactions between Speech Recognition Problems and User Emotions”. In Proceedings of the 9th European Conference on Speech Communication and Technology (Interspeech-2005/Eurospeech), Lisbon, Portugal.
[abstract]
Understanding how speech recognition problems affect the interaction with the user is a topic of great interest for the spoken dialogue community. In this paper, we examine the dependencies between speech recognition problems in adjacent turns. We also examine the dependencies between speech recognition problems and student emotions within a turn and in adjacent turns. We apply Chi Square analysis to a corpus of speech-based computer tutoring dialogues to discover these dependencies. We find that rejections are followed by more rejections than expected if there was no dependency between rejections, and that misrecognitions are followed by more misrecognitions than expected. We also find a strong dependency between recognition problems in the previous turn and user emotion in the current turn: after a system rejection there are more emotional user turns than expected. Surprisingly, in our data, we find no relationship between user emotions and recognition problems within a turn nor between previous turn user emotions and current turn recognition problems.
· Katherine Forbes-Riley, Mihai Rotaru, Diane J. Litman, Joel Tetreault (2007) “Exploring Affect-Context Dependencies for Adaptive System Development”. In Proceedings of HLT/NAACL 2007 (late-breaking news award).
[abstract]
We use the Chi Square test to investigate the context dependency of student affect in our computer tutoring dialogues, targeting uncertainty in student answers in 3 automatically monitorable contexts. Our results show significant dependencies between uncertain answers and specific contexts. Identification and analysis of these dependencies is our first step in developing an adaptive version of our dialogue system.
· Mihai Rotaru (2007) “Applications of Discourse Structure for Spoken Dialogue Systems”. Ph.D. Thesis Proposal, Computer Science Department, University of Pittsburgh.
[abstract]
Due to the relatively simple structure of dialogues in previous spoken dialogue systems, discourse structure has seen limited applications in these systems. We investigate the utility of discourse structure for spoken dialogue systems in complex domains (e.g. tutoring). Two types of applications are being pursued: on the system side and on the user side. On the system side, we investigate if the discourse structure information is useful for various spoken dialogue system tasks: performance analysis, characterization of user affect and characterization of speech recognition problems. On the user side, we investigate whether the discourse structure information is useful for users of a spoken dialogue system through a graphical representation of the discourse structure.
|
|
|
|
|
|
Detecting and reacting to user emotions is considered an important direction for improving spoken dialogue systems in domains like call centers and intelligent tutoring systems. Previous work on emotion prediction uses features derived from a variety of sources: prosody, acoustic information, lexical information, and meta-dialogue information. An important decision when computing these features is the level of granularity within the turn at which to extract them.
While the majority of previous work computes features at the turn level, my work explores the benefit of sub-turn level features (e.g. word-level features). The intuition behind using sub-turn features is that they offer a better approximation of the acoustic-prosodic profile and that emotion might not be expressed over the entire turn. Using sub-turn level features is not straightforward as there is a mismatch between the label level (i.e. turn level) and the feature level. We explore various ways of solving this issue and show that sub-turn level features are more informative than turn-level features for emotion prediction.
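The label/feature mismatch can be pictured with a small sketch. One simple option, assumed here purely for illustration and not necessarily the technique used in the papers below, is to let every word inherit its turn's label for training and to combine word-level predictions into a turn-level prediction by majority vote. The function names and label strings are hypothetical.

```python
from collections import Counter


def expand_turn_labels(turns):
    """Training side: give each word-level feature the label of its turn.

    `turns` is a list of (word_features, turn_label) pairs, where
    `word_features` holds one feature value (or vector) per word.
    """
    return [
        (features, turn_label)
        for word_features, turn_label in turns
        for features in word_features
    ]


def predict_turn(word_predictions):
    """Prediction side: majority vote over the word-level predictions.

    Ties go to the label that appears first in the turn, because
    Counter.most_common keeps first-encountered order for equal counts.
    """
    if not word_predictions:
        raise ValueError("a turn needs at least one word-level prediction")
    return Counter(word_predictions).most_common(1)[0][0]


if __name__ == "__main__":
    # Hypothetical word-level predictions for a three-word turn.
    print(predict_turn(["neutral", "emotional", "emotional"]))
```

Majority voting treats every word equally; a weighted vote or a model over the sequence of word-level predictions would be alternative designs with different precision and recall trade-offs.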
[Publications]
· Greg Nicholas, Mihai Rotaru, Diane J. Litman (2008) “An Investigation of Using Word-level Features for Emotion Prediction”. Submitted to Speech Communication.
· Greg Nicholas, Mihai Rotaru, and Diane J. Litman (2006) “Exploiting Word-level Features for Emotion Prediction”. In Proceedings of IEEE/ACL Workshop on Spoken Language Technology (SLT). Aruba.
[abstract]
In this paper we study two techniques for combining word-level features for emotion prediction. Prior research has primarily focused on the use of turn-level features as predictors. Recently, the utility of word-level features has been highlighted but only tested on relatively small human-computer corpora. We extend over previous work by investigating the strengths and weaknesses of two different techniques for using word-level features and by using a larger corpus of human-computer dialogue. Our results confirm that the word-level pitch features fare better than the turn-level ones regardless of the combination technique. In addition, we find that each word combination technique has different strengths and weaknesses in terms of precision and recall.
· Mihai Rotaru and Diane J. Litman (2005) “Using Word-level Pitch Features to Better Predict Student Emotions during Spoken Tutoring Dialogues”. In Proceedings of the 9th European Conference on Speech Communication and Technology (Interspeech-2005/Eurospeech), Lisbon, Portugal.
[abstract]
In this paper, we advocate for the usage of word-level pitch features for detecting user emotional states during spoken tutoring dialogues. Prior research has primarily focused on the use of turn-level features as predictors. We compute pitch features at the word level and resolve the problem of combining multiple features per turn using a word-level emotion model. Even under a very simple word-level emotion model, our results show an improvement in prediction using word-level features over using turn-level features. We find that the advantage of word-level features lies in a better prediction of longer turns.
|
|
|
|
|
|
Here is a list of older research projects and several class-related projects.