Graphs to Diversity: Extracting Genomic Variation from Sequence Graphs

Advances in genome sequencing technologies have made it possible to sequence bacteria directly from the environment, providing a broader view of bacterial diversity than was previously possible. Studies of environmental samples have revealed complex communities containing many previously unknown species and have uncovered extensive genetic variation even among closely related strains. Characterizing this genomic variation is critical to studies of microbial ecology and evolution, yet currently available computational tools, originally developed for the study of single organisms, are ill-suited to the task.

This proposal aims to develop the theoretical and computational infrastructure for studying genomic variation within mixtures of organisms. The proposed research relies on both theoretical and empirical analyses of the structure of genome assembly graphs to characterize graph signatures that are correlated with intra- and inter-species polymorphisms. A particular focus is placed on understanding and using the information provided by next-generation sequencing technologies as well as other high-throughput experimental techniques. The proposed work provides critical analysis tools to help biologists explore genetic variation within the environment.

Additional information about this project is available at

Principal Investigators