Software Toolkit for Explorative Pattern Mining of Genomic Data:

I am involved in this project in Dr. Madhavi Ganapathiraju's lab. It has been observed that the availability of genomic data grows faster than Moore’s law. The human genome is about 3 billion nucleotides long. Currently, no readily available tools allow flexible and scalable analysis of genome sequences to discover previously unknown patterns. Tools exist that can locate some known patterns, but they do not scale to even a fraction of genomic data sizes, are restricted in their parameters, and are not versatile enough for explorative pattern analysis. We have developed a toolkit that uses suffix arrays to pre-process genomic data efficiently. On top of it, various pattern mining applications have been developed that run in linear time. These include finding splice sites, DNA palindromes, and gene families in chromosomes, as well as methylation prediction.
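To illustrate why a suffix array makes a sequence searchable after one pre-processing pass, here is a minimal Python sketch. It is not the toolkit's code: the function names build_suffix_array and find_occurrences and the sample sequence are assumptions made for this example. The construction shown is the naive sort-based one, which is far too slow for a whole genome; a genome-scale tool would need a much more efficient construction algorithm.

```python
def build_suffix_array(text):
    """Return the start positions of all suffixes of text, in sorted order.

    Naive construction for illustration only: it sorts the suffixes
    directly, which does not scale to genome-sized inputs.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text, suffix_array, pattern):
    """Return every start position of pattern in text, in ascending order.

    All suffixes that begin with pattern form one contiguous block of the
    suffix array, so two binary searches locate the block's boundaries.
    """
    m = len(pattern)

    # First suffix whose length-m prefix is >= pattern.
    lo, hi = 0, len(suffix_array)
    while lo < hi:
        mid = (lo + hi) // 2
        start = suffix_array[mid]
        if text[start:start + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    block_start = lo

    # First suffix whose length-m prefix is > pattern.
    hi = len(suffix_array)
    while lo < hi:
        mid = (lo + hi) // 2
        start = suffix_array[mid]
        if text[start:start + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid

    return sorted(suffix_array[block_start:lo])


if __name__ == "__main__":
    sequence = "GATTACAGATTACA"  # made-up sample sequence
    sa = build_suffix_array(sequence)
    print(find_occurrences(sequence, sa, "ATTA"))
```

Once the array is built, each query needs only a logarithmic number of comparisons against the pattern, so the same index can be reused for many exploratory searches over the same sequence.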

For more details, contact Dr. Madhavi Ganapathiraju: http://www.dbmi.pitt.edu/madhavi/