
Mining Salient Patterns in High Dimensional Data

Jinze Liu (University of North Carolina)

Monday, February 19th, 2007
10 a.m. - SENSQ 5317

Refreshments at 9:30 a.m.

Hosted by

Abstract

The paradigm of data-driven hypothesis generation, adopted by many researchers, has led to the use of modern high-throughput technologies to collect a large number of attributes for numerous objects. In the resulting datasets, the attributes describing each object are generally treated as separate dimensions. Clustering is a common technique for grouping objects with similar attributes. However, conventional clustering techniques fail on high-dimensional data, because meaningful similarity may be defined by only some (unknown) subset of the attributes.
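As a toy illustration of this point (not taken from the talk), the Python sketch below compares Euclidean distances computed over all attributes with distances computed over a hand-picked subset of attributes. The data and the subset `[0, 1, 2]` are hypothetical; a real subspace clustering method has to discover such a subset rather than receive it.

```python
import math


def euclidean(x, y, dims=None):
    """Euclidean distance between x and y, optionally restricted to the given dimensions."""
    idx = range(len(x)) if dims is None else dims
    return math.sqrt(sum((x[i] - y[i]) ** 2 for i in idx))


# Hypothetical objects with 10 attributes each.
# a and b agree exactly on attributes 0-2 but differ strongly on the rest;
# a and c differ by a moderate amount on every attribute.
a = [1.0, 2.0, 3.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
b = [1.0, 2.0, 3.0, 5.0, 5.0, 5.0, 5.0, 5.0, 5.0, 5.0]
c = [2.0, 3.0, 4.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]

subspace = [0, 1, 2]
full_ab = euclidean(a, b)            # all 10 attributes
full_ac = euclidean(a, c)
sub_ab = euclidean(a, b, subspace)   # only attributes 0-2
sub_ac = euclidean(a, c, subspace)
print(f"{full_ab:.2f} {full_ac:.2f} {sub_ab:.2f} {sub_ac:.2f}")  # 13.23 3.16 0.00 1.73
```

In the full space, c looks closer to a than b does; restricted to attributes 0-2, b is identical to a. The shared pattern is masked by the irrelevant dimensions.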

Subspace clustering techniques have recently been proposed to analyze high-dimensional datasets in bioinformatics, medical informatics, and the social sciences. I will present two new subspace clustering methods. Order-Preserving Clustering (OPC) is a pattern-based subspace clustering algorithm that identifies objects exhibiting a consistent tendency across subsets of attributes. OPC has direct applications in identifying co-regulated gene clusters in gene expression data. The Approximate Frequent Itemset (AFI) model is a noise-tolerant subspace clustering model that can reveal significant but imperfect associations not identified by traditional approaches.
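The two cluster definitions can be sketched as simple membership tests, assuming common formulations that the abstract itself does not spell out. Here a set of rows is treated as order-preserving on a set of columns if every row ranks those columns identically, and a binary submatrix is treated as an approximate frequent itemset if each row is missing at most a fraction `eps_r` of the items and each column is missing from at most a fraction `eps_c` of the rows. The function names, thresholds, and toy matrices are hypothetical, and the sketch only checks a given candidate; it does not search for clusters as the OPC and AFI mining algorithms would.

```python
def column_order(row, cols):
    """Columns in `cols`, sorted by this row's values (ties keep their input order)."""
    return tuple(sorted(cols, key=lambda j: row[j]))


def is_order_preserving(matrix, rows, cols):
    """True if every selected row ranks the selected columns in the same order."""
    return len({column_order(matrix[i], cols) for i in rows}) == 1


def is_afi(data, rows, cols, eps_r, eps_c):
    """Check the row- and column-wise noise tolerance of a candidate binary submatrix.

    Each selected row must contain at least (1 - eps_r) * |cols| ones, and each
    selected column at least (1 - eps_c) * |rows| ones, within the submatrix.
    A minimum-support condition on |rows| is omitted in this sketch.
    """
    for i in rows:
        if sum(data[i][j] for j in cols) < (1 - eps_r) * len(cols):
            return False
    for j in cols:
        if sum(data[i][j] for i in rows) < (1 - eps_c) * len(rows):
            return False
    return True


# Hypothetical expression values: 3 genes (rows) x 4 conditions (columns).
expr = [
    [1.0, 5.0, 3.0, 9.0],
    [2.0, 8.0, 4.0, 0.5],
    [0.1, 0.9, 0.4, 7.0],
]
# All three genes order conditions 0, 2, 1 from lowest to highest;
# adding condition 3 breaks the shared ordering.
opc_three = is_order_preserving(expr, [0, 1, 2], [0, 1, 2])
opc_four = is_order_preserving(expr, [0, 1, 2], [0, 1, 2, 3])

# Hypothetical binary transactions x items matrix with three missing entries.
data = [
    [1, 1, 1, 1],
    [1, 0, 1, 1],
    [1, 1, 0, 1],
    [0, 1, 1, 1],
]
all_idx = [0, 1, 2, 3]
afi_noisy = is_afi(data, all_idx, all_idx, eps_r=0.25, eps_c=0.25)
afi_exact = is_afi(data, all_idx, all_idx, eps_r=0.0, eps_c=0.0)
print(opc_three, opc_four, afi_noisy, afi_exact)  # True False True False
```

With both thresholds set to zero, `is_afi` reduces to an exact frequent-itemset check and rejects the noisy block; tolerating that kind of imperfect association is the motivation for the AFI model.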

I will discuss the efficiency and effectiveness of both approaches, as measured on synthetic and real biological datasets. The biological relevance of the clusters is demonstrated by significant associations between the genes in a subspace cluster and specific functional categories of existing biological classifications. I will conclude my talk with my future research plans in the areas of data mining, statistical genetics, and computational systems biology.
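The abstract does not say which statistical test measures the association between a cluster and a functional category. One common choice is a one-sided hypergeometric test, sketched below under that assumption. The function name and gene counts are hypothetical, and in practice the p-values would also need correction for testing many categories. `math.comb` requires Python 3.8 or later.

```python
from math import comb


def enrichment_p_value(population, category, cluster, overlap):
    """One-sided hypergeometric p-value P(X >= overlap).

    X is the number of category members in a cluster of `cluster` genes drawn
    at random, without replacement, from `population` genes, of which
    `category` belong to the functional category.
    """
    upper = min(cluster, category)
    tail = sum(
        comb(category, i) * comb(population - category, cluster - i)
        for i in range(overlap, upper + 1)
    )
    return tail / comb(population, cluster)


# Hypothetical counts: 20 genes in total, 5 in the category, and a 4-gene
# cluster whose members all fall in the category.
p = enrichment_p_value(population=20, category=5, cluster=4, overlap=4)
print(p)  # 5 / 4845, roughly 0.00103
```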
