Software Toolkit for Explorative Pattern Mining of Genomic Data:
I am involved in this project in Dr. Madhavi Ganapathiraju's lab.
It has been observed that the availability of genomic data is growing faster than Moore's law. The human genome is about 3 billion nucleotides long. Currently, no readily available tools allow flexible and scalable analysis of genome sequences to discover previously unknown patterns. Tools exist that can locate some known patterns, but they do not scale to even a fraction of genomic data sizes, are restricted in their parameters, and are not versatile enough for explorative pattern analysis. We have developed a toolkit that uses suffix arrays to pre-process genomic data efficiently. On top of this, various pattern mining applications have been developed that run in linear time. These include finding splice sites, DNA palindromes, and gene families in chromosomes, methylation prediction, etc.
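
To illustrate the general idea of suffix-array pre-processing, here is a minimal Python sketch. It is not the toolkit's code: the function names build_suffix_array and find_occurrences are hypothetical, and the naive sort-based construction is used only for readability (it is far slower than the linear-time methods a genome-scale tool would need). Once the array is built, every occurrence of a pattern can be located with two binary searches.

```python
def build_suffix_array(text):
    """Return the start positions of all suffixes of `text`, in sorted order.

    Assumption: naive construction by sorting the suffixes directly.
    This is for illustration only and does not scale to whole genomes.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text, suffix_array, pattern):
    """Return the sorted start positions of every occurrence of `pattern`."""
    m = len(pattern)

    # First binary search: leftmost suffix whose first m characters
    # are not smaller than the pattern.
    lo, hi = 0, len(suffix_array)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[suffix_array[mid]:suffix_array[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo

    # Second binary search: leftmost suffix whose first m characters
    # are greater than the pattern.
    hi = len(suffix_array)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[suffix_array[mid]:suffix_array[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid

    # All suffixes in [start, lo) begin with the pattern.
    return sorted(suffix_array[start:lo])


if __name__ == "__main__":
    genome = "GATTACAGATTACA"  # toy sequence, not real data
    sa = build_suffix_array(genome)
    print(find_occurrences(genome, sa, "GATTACA"))  # [0, 7]
```

The array is built once per sequence and can then be reused for many different pattern queries, which is what makes this kind of pre-processing attractive for explorative analysis.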
For more details, contact Dr. Madhavi Ganapathiraju: http://www.dbmi.pitt.edu/madhavi/