Towards Accurate and Efficient Classification: A Discriminative and Frequent Pattern-based Approach

Hong Cheng (University of Illinois at Urbana-Champaign)

Thursday February 21, 2008
10:00 a.m. - SENSQ 5317

Refreshments at 9:30 a.m.

Hosted by

Abstract

Classification is an essential theme widely studied in machine learning, statistics, and data mining.  Many classification methods have been proposed in the literature, most of which assume that the input data is in a feature vector representation.  However, in many applications, it is desirable to construct accurate classification models on complex structural data that has no initial feature vector representation, including transactions, sequences, graphs, semi-structured data, and text.  A primary question is how to construct a discriminative and compact feature set on which classification can be performed directly.  A concrete example is classifying chemical compounds into classes (e.g., toxic vs. nontoxic, active vs. inactive).  Simple features such as atoms and links are too simple to preserve the structural information, while graph kernels make the classifiers hard to interpret.

My goal is to use discriminative frequent patterns to characterize complex structural data and thus enhance classification power.  Theoretical analysis is provided to justify the discriminative power of frequent patterns.  Two efficient search strategies have also been designed to mine the most discriminative patterns directly.  Based on these results, I developed a framework of discriminative frequent pattern-based classification that can lead to a highly accurate, efficient, and interpretable classifier on complex data.  The proposed pattern-based classification has proven useful in applications such as chemical compound classification, text categorization, and software engineering.
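
For readers unfamiliar with the idea, the following is a minimal illustrative sketch of pattern-based feature construction on transaction data.  It is not the speaker's algorithm: it enumerates frequent itemsets by brute force and ranks them by information gain, which is only one possible measure of discriminative power.  All function names, the toy data, and the parameters (min_support, max_len, k) are assumptions made for this example.

```python
from itertools import combinations
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    result = 0.0
    for c in set(labels):
        p = labels.count(c) / n
        result -= p * log2(p)
    return result


def frequent_patterns(transactions, min_support, max_len=2):
    """Brute-force enumeration of itemsets (up to max_len items)
    contained in at least min_support transactions."""
    items = sorted(set().union(*transactions))
    patterns = []
    for size in range(1, max_len + 1):
        for combo in combinations(items, size):
            pattern = frozenset(combo)
            support = sum(1 for t in transactions if pattern <= t)
            if support >= min_support:
                patterns.append(pattern)
    return patterns


def information_gain(pattern, transactions, labels):
    """Reduction in label entropy from splitting on pattern presence."""
    present = [y for t, y in zip(transactions, labels) if pattern <= t]
    absent = [y for t, y in zip(transactions, labels) if not pattern <= t]
    n = len(labels)
    conditional = (len(present) / n) * entropy(present) \
        + (len(absent) / n) * entropy(absent)
    return entropy(labels) - conditional


def top_discriminative(transactions, labels, min_support=2, k=2):
    """Keep the k frequent patterns with the highest information gain
    (ties broken alphabetically so the result is deterministic)."""
    candidates = frequent_patterns(transactions, min_support)
    ranked = sorted(
        candidates,
        key=lambda p: (-information_gain(p, transactions, labels), sorted(p)),
    )
    return ranked[:k]


def to_feature_vector(transaction, patterns):
    """Binary feature vector: 1 if the transaction contains the pattern."""
    return [1 if p <= transaction else 0 for p in patterns]


# Hypothetical toy data: each transaction is a set of single-letter items.
transactions = [frozenset("abc"), frozenset("abd"), frozenset("ab"),
                frozenset("acd"), frozenset("bcd"), frozenset("cd")]
labels = [1, 1, 1, 0, 0, 0]

selected = top_discriminative(transactions, labels)
features = [to_feature_vector(t, selected) for t in transactions]
```

On this toy data no single item separates the two classes perfectly, but the two-item patterns do, which mirrors the abstract's point that simple features can lose structural information.  The resulting binary vectors could be passed to any standard classifier; brute-force enumeration is used only for brevity and would not scale to real data.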

 

Biography of speaker

Hong Cheng is currently a Ph.D. candidate in the Department of Computer Science at the University of Illinois at Urbana-Champaign.  She received her M.Phil. degree from the Hong Kong University of Science and Technology in 2003 and her B.S. degree from Zhejiang University in 2001, both in Computer Science.  Her research interests include data mining, machine learning, and database systems.  She has published over 20 research papers in international conferences, journals, and book chapters, with venues including SIGKDD, SDM, VLDB, ICDE, ICDM, ACM Transactions on KDD, and Data Mining and Knowledge Discovery, and received research paper awards at ICDE’07, SIGKDD’06, and SIGKDD’05.
