My research investigates the challenges and opportunities that arise from building spoken dialogue systems in complex domains such as tutoring. Currently, my work focuses on applications of discourse structure for spoken dialogue systems in complex domains.

Most of my research has been performed on the ITSPOKE speech-based computer tutor. Nonetheless, my work can be easily replicated in other dialogue systems from different domains.

Here is a list of projects I am or have been involved in:

Applications of Discourse Structure for Spoken Dialogue Systems (dissertation)

Dialogues (human-human or human-computer) have an inherent structure called the discourse structure. However, due to the relatively simple structure of dialogues in previous spoken dialogue systems, discourse structure has seen limited applications in these systems. In contrast, dialogues in complex domains like tutoring exhibit a richer discourse structure which enables new applications of this concept.

I am pursuing two types of applications of discourse structure:

· System-side applications - my work investigates whether discourse structure information is useful for various spoken dialogue system tasks: performance analysis, characterization of user affect, and characterization of speech recognition problems.

· User-side applications - my work investigates whether discourse structure information is useful for users when presented through a graphical representation of the discourse structure (the Navigation Map).

[More Details]
[Publications]


Interactions between Dialogue Phenomena

Designing a spoken dialogue system involves many non-trivial decisions. This is in part due to a variety of dialogue phenomena that occur during a dialogue. Being able to detect and handle such phenomena has a big impact on the success of a dialogue system. For example, detecting and handling speech recognition problems is crucial for a dialogue system.

Instead of looking at dialogue phenomena in isolation, my research attempts to understand the inherent interactions that exist between these phenomena. For example, a string of speech recognition problems is likely to result in a frustrated user. Several phenomena are being investigated: speech recognition problems, user affect (e.g., certainty, frustration), user state (e.g., correctness), and discourse transitions (e.g., crossing a discourse segment boundary). An empirical approach is being used: statistical dependencies between dialogue phenomena are mined from a corpus of dialogues. Analyses of these dependencies offer additional insights into the dialogue phenomena and suggest new handling strategies.
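
To make the idea of a statistical dependency concrete, here is a minimal sketch of one common way to test whether two annotated phenomena are dependent: a Pearson chi-square test on their contingency table. The choice of test, the label names, and the toy counts are all assumptions made for illustration; they are not data or results from the actual corpus.

```python
from collections import Counter

def chi_square(pairs):
    """Pearson chi-square statistic for the contingency table of (a, b) pairs."""
    pairs = list(pairs)
    n = len(pairs)
    joint = Counter(pairs)                 # observed counts per (a, b) cell
    row = Counter(a for a, _ in pairs)     # marginal counts of the first phenomenon
    col = Counter(b for _, b in pairs)     # marginal counts of the second phenomenon
    stat = 0.0
    for a in row:
        for b in col:
            expected = row[a] * col[b] / n  # count expected under independence
            stat += (joint[(a, b)] - expected) ** 2 / expected
    return stat

# Hypothetical per-turn annotations: (speech recognition outcome, user affect).
turns = (
    [("rejected", "frustrated")] * 30
    + [("rejected", "neutral")] * 20
    + [("recognized", "frustrated")] * 10
    + [("recognized", "neutral")] * 40
)

stat = chi_square(turns)

# Critical value of the chi-square distribution with 1 degree of freedom at p = 0.05.
CRITICAL_1DF_05 = 3.841
dependent = stat > CRITICAL_1DF_05
print(f"chi-square = {stat:.2f}, dependent at p < 0.05: {dependent}")
```

With a 2x2 table there is one degree of freedom, so a statistic above 3.841 indicates a dependency at the 0.05 level. Comparing observed and expected counts cell by cell then shows which combinations occur more or less often than chance would predict.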

[More Details]
[Publications]


Emotion prediction

Detecting and reacting to user emotions is considered an important direction for improving spoken dialogue systems in domains like call centers and intelligent tutoring systems. Previous work on emotion prediction uses features derived from a variety of sources: prosody, acoustic information, lexical information, and meta-dialogue information. An important decision when computing these features is the level of granularity within the turn at which to extract them.

While the majority of previous work computes features at the turn level, my work explores the benefit of sub-turn level features (e.g. word-level features). The intuition behind using sub-turn features is that they offer a better approximation of the acoustic-prosodic profile and that emotion might not be expressed over the entire turn. Using sub-turn level features is not straightforward as there is a mismatch between the label level (i.e. turn level) and the feature level. We explore various ways of solving this issue and show that sub-turn level features are more informative than turn-level features for emotion prediction.
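
One simple way to bridge the mismatch between word-level features and turn-level labels is to summarize the per-word values into a single turn-level vector. The sketch below illustrates that idea only; it is not necessarily one of the approaches explored in this work, and the feature names (`pitch_hz`, `duration_s`) and values are hypothetical.

```python
from statistics import mean

def aggregate_word_features(words):
    """Collapse a list of per-word feature dicts into one turn-level feature dict."""
    if not words:
        raise ValueError("turn has no words")
    turn = {}
    for name in words[0]:
        values = [w[name] for w in words]
        # Keep the extremes as well as the average, so that a single
        # emotionally marked word is not averaged away.
        turn[f"{name}_min"] = min(values)
        turn[f"{name}_max"] = max(values)
        turn[f"{name}_mean"] = mean(values)
    return turn

# Hypothetical word-level measurements for one user turn.
turn_words = [
    {"pitch_hz": 180.0, "duration_s": 0.25},
    {"pitch_hz": 240.0, "duration_s": 0.5},
    {"pitch_hz": 210.0, "duration_s": 0.75},
]

turn_features = aggregate_word_features(turn_words)
print(turn_features)
```

The resulting dictionary can be paired with the turn's emotion label and passed to any standard classifier. An alternative design is to copy the turn label onto each word, classify words individually, and combine the word-level predictions into a turn-level decision.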

[Publications]


Other projects

Here is a list of older research projects and several class-related projects.