Semi-Supervised Approaches to Multilingual Processing
In today's information-driven world, the ability to process data expressed in languages other than English has become increasingly important, so advances in multilingual processing technologies are of great practical value. To handle a wide variety of input, multilingual processing systems are typically trained with machine learning techniques rather than hand-crafted. Moreover, because such systems may have access to few multilingual resources, the learning is often unsupervised rather than supervised; that is, the system must learn without outside help such as human-provided annotations. Although this approach requires minimal human intervention, its cost is that the system may not learn as well as it would under supervision. This project explores applying semi-supervised learning to improve multilingual processing, taking advantage of both the information in annotated data and the abundance of unannotated data.
One focus of this project is applying cross-language projection to rapidly build a Chinese parser. Another is improving cross-language projection through better word alignments.
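
To illustrate the general idea of cross-language projection, the following is a minimal sketch and not this project's actual method. It copies dependency heads from a parsed English sentence onto its Chinese translation through one-to-one word alignment links. The function name project_heads, the ROOT marker, the list-based tree encoding, and the example sentence pair are all assumptions made for illustration.

```python
from collections import Counter
from typing import Dict, List, Tuple

ROOT = -1  # hypothetical marker for "this token is the root of the sentence"


def project_heads(
    src_heads: List[int],
    alignment: List[Tuple[int, int]],
) -> Dict[int, int]:
    """Project source-side dependency heads onto target-side tokens.

    src_heads[i] is the index of the head of source token i, or ROOT.
    alignment is a list of (source_index, target_index) links.
    Returns {target_index: target_head_index or ROOT} for every target
    token whose head could be projected; other tokens are left out.
    """
    # Keep only links whose source and target tokens each occur in exactly
    # one link; many-to-one and one-to-many links are simply ignored here.
    src_count = Counter(s for s, _ in alignment)
    tgt_count = Counter(t for _, t in alignment)
    src_to_tgt = {
        s: t for s, t in alignment if src_count[s] == 1 and tgt_count[t] == 1
    }

    tgt_heads: Dict[int, int] = {}
    for s, t in src_to_tgt.items():
        head = src_heads[s]
        if head == ROOT:
            tgt_heads[t] = ROOT
        elif head in src_to_tgt:
            # The source head is aligned too, so the arc carries over.
            tgt_heads[t] = src_to_tgt[head]
    return tgt_heads


# English: I(0) ate(1) apples(2) yesterday(3); "ate" is the root.
english_heads = [1, ROOT, 1, 1]
# Chinese: 我(0) 昨天(1) 吃(2) 了(3) 苹果(4); 了 has no English counterpart.
links = [(0, 0), (1, 2), (2, 4), (3, 1)]
projected = project_heads(english_heads, links)
```

In this sketch the unaligned word 了 receives no head, and any token touched by an ambiguous link is skipped. Gaps of this kind are one reason the quality of the word alignments bears directly on the quality of the projected trees.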