Here is a list of older research projects and several class-related projects.
|
|
We empirically evaluate whether combining the outputs of seven reading comprehension QA systems submitted as the final projects for a graduate level class can improve over the performance of any individual system. Our results, replicated using two different publicly available reading test corpora, demonstrate the utility of system combination via majority voting in our restricted domain question answering task.
[Publications]
Mihai Rotaru and Diane J. Litman (2005) “Improving Question Answering for Reading Comprehension Tests by Combining Multiple Systems”. In Proceedings of the American Association for Artificial Intelligence (AAAI) 2005 Workshop on Question Answering in Restricted Domains, Pittsburgh, PA.
[abstract]
Most work on reading comprehension question answering
systems has focused on improving performance by adding
complex natural language processing (NLP) components to
such systems rather than by combining the output of
multiple systems. Our paper empirically evaluates whether
combining the outputs of seven such systems submitted as
the final projects for a graduate level class can improve over
the performance of any individual system. We present
several analyses of our combination experiments, including
performance bounds, impact of both tie-breaking methods
and ensemble size on performance, and an error analysis.
Our results, replicated using two different publicly available
reading test corpora, demonstrate the utility of system
combination via majority voting in our restricted domain
question answering task.
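As a rough illustration of combination by majority voting with a tie-breaking rule, here is a minimal sketch. It is not the code used in the paper. It assumes each system returns a single answer per question (for example, the index of the story sentence it selects), and the function name `majority_vote` and the `tie_break_order` parameter are hypothetical.

```python
from collections import Counter


def majority_vote(answers, tie_break_order=None):
    """Pick the answer proposed by the largest number of systems.

    answers: one answer per system, in a fixed system order.
    tie_break_order: optional list of system indices, most trusted first.
        Ties go to the tied answer proposed by the most trusted system.
        If omitted, systems are trusted in the order they are listed.
    """
    if not answers:
        raise ValueError("need at least one system answer")
    counts = Counter(answers)
    top = max(counts.values())
    # All answers that share the highest vote count.
    tied = {a for a, c in counts.items() if c == top}
    order = tie_break_order if tie_break_order is not None else range(len(answers))
    for i in order:
        if answers[i] in tied:
            return answers[i]
    # Only reached if tie_break_order does not cover any tied system.
    raise ValueError("tie_break_order does not resolve the tie")


# Seven hypothetical systems answering one question with sentence indices.
votes = [3, 1, 3, 2, 1, 3, 4]
winner = majority_vote(votes)
```

Ranking the systems in `tie_break_order` by their individual accuracy on held-out data is one possible tie-breaking method; a random choice among the tied answers is another.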
|
|
|
|
|
|
Previous work has shown that when machine learning is applied to many natural language processing tasks, exceptional training examples play an important role in improving generalization accuracy. We are exploring whether such results generalize to spoken dialogue, and how different formalizations of "exceptionality" impact the performance of memory-based and rule-based learning algorithms.
[Publications]
Mihai Rotaru and Diane J. Litman (2003) “Exceptionality and Natural Language Learning”. In Proceedings of the Conference on Computational Natural Language Learning (CoNLL) 2003, Edmonton, Canada.
[abstract]
Previous work has argued that memory-based learning is better than abstraction-based learning for a set of language learning tasks. In this paper, we first attempt to generalize these results to a new set of language learning tasks from the area of spoken dialog systems and to a different abstraction-based learner. We then examine the utility of various exceptionality measures for predicting where one learner is better than the other. Our results show that generalization of previous results to our tasks is not so obvious and some of the exceptionality measures may be used to characterize the performance of our learners.
|
|
|
|
|
|
We propose a graphical representation of the query history for users exploring a complex data space (e.g. the real estate domain) in a multimodal dialogue system. The graphical representation enhances exploration by providing a structured view of the history and by predicting items of interest to the user. It can be used to guide tasks such as query relaxation, summarization and query refinement, as well as information push.
This is my 2004 internship project at the IBM T.J. Watson Research Center. For more details, contact my internship mentor, Shimei Pan.
|
|
|
We automatically extract user preferences from users' interactions with a multimodal dialogue system in the real estate domain. We build a user preference model that guides query relaxation in the system: whenever the user's query returns no items, a set of relevant items is proposed based on the preference model.
This is my 2003 internship project at the IBM T.J. Watson Research Center. For more details, contact my internship mentor, Shimei Pan.
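To make the idea of preference-guided query relaxation concrete, here is a minimal sketch. It is not the system built during the internship. It assumes the preference model can be reduced to one importance weight per attribute, and that relaxation means dropping the least important constraint until some item matches. All names (`relax_query`, `importance`, the house attributes) are hypothetical.

```python
def matches(item, constraints):
    """True if the item satisfies every active constraint."""
    return all(pred(item[attr]) for attr, pred in constraints.items())


def relax_query(items, constraints, importance):
    """Drop constraints, least important first, until something matches.

    items: list of dicts describing the data space (e.g. houses).
    constraints: dict mapping attribute name -> predicate on its value.
    importance: dict mapping attribute name -> weight from the
        (assumed) preference model; missing attributes count as 0.
    Returns (matching items, list of dropped attribute names).
    """
    active = dict(constraints)
    drop_order = sorted(active, key=lambda a: importance.get(a, 0.0))
    dropped = []
    while True:
        hits = [it for it in items if matches(it, active)]
        if hits or not active:
            return hits, dropped
        attr = drop_order.pop(0)  # least important remaining constraint
        del active[attr]
        dropped.append(attr)


# Hypothetical data and query: no house satisfies all three constraints.
houses = [
    {"price": 300, "bedrooms": 2, "city": "A"},
    {"price": 450, "bedrooms": 3, "city": "B"},
]
query = {
    "price": lambda p: p <= 350,
    "bedrooms": lambda b: b >= 3,
    "city": lambda c: c == "A",
}
weights = {"price": 0.9, "bedrooms": 0.2, "city": 0.6}
proposed, relaxed = relax_query(houses, query, weights)
```

In this example the bedroom constraint has the lowest weight, so it is dropped first and the first house is proposed.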
|
|
|
This project compares the performance of three methods for finding representative components in a dataset: Principal Component Analysis (PCA), Probabilistic PCA (PPCA) and Probabilistic HITS (PHITS). Two types of data sets are used: a link data set (copublication data set from Auton Lab) and a microarray data set (DNA microarray containing genes of Saccharomyces cerevisiae).
Class: Advanced Topics in Machine Learning
Project report
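For readers unfamiliar with PCA, the simplest of the three methods, here is a minimal sketch of how the first principal component can be found by power iteration on the sample covariance matrix. It is an illustration only, not the code from the project report. The function name and the toy data are hypothetical, and it assumes at least two rows and a starting vector that is not orthogonal to the leading direction.

```python
import math


def first_principal_component(rows, iterations=200):
    """Unit vector along the direction of largest variance in `rows`.

    rows: list of equal-length lists of numbers (one row per sample).
    """
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d).
    cov = [
        [sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]
    v = [1.0] * d  # assumed not orthogonal to the leading eigenvector
    for _ in range(iterations):
        # Multiply by the covariance matrix, then renormalize.
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:  # data has no variance along v
            break
        v = [x / norm for x in w]
    return v


# Toy data lying exactly on the line y = 2x.
pc = first_principal_component([[0, 0], [1, 2], [2, 4], [3, 6]])
```

For the toy data the returned direction is proportional to (1, 2), the line the points lie on.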
|
|
|
This project explores an alternative compression technique for sound files containing human speech only (e.g. a reading of a text by a voice talent). The technique saves only the speech transcription (obtained manually) and the prosodic information (obtained automatically by running the Sphinx2 recognizer in forced alignment mode). The sound file is reconstructed from this information using a speech synthesizer trained on the same voice. Results show a very high compression rate (the compressed file is 1% of the size of the MP3 version of the original file), relatively good quality of the reconstructed sound file, and good transfer to synthesizers built from other voices.
Class: Speech II - Phonetics, Prosody, Perception, Synthesis
Project report, Presentation