Joint CMU-Pitt Ph.D. Program in Computational Biology
Robert F. Murphy and Ivet Bahar, Directors
Seminar Series Abstracts
Fridays at 11am.
August 31, 2007 @ BST3 | Nir Friedman | Hebrew University
Natural history and evolutionary principles of gene duplication in fungi
Gene duplication and loss is a powerful source of functional innovation. However, the general principles that govern this process are still largely unknown. With the growing number of sequenced genomes, it is now possible to examine these events in a comprehensive and unbiased manner. Here, we develop a novel procedure that resolves the evolutionary history of all genes in a large group of species. We apply our procedure to seventeen fungal genomes to create a genome-wide catalog of gene trees that determine precise orthology and paralogy relations across these species. We show that gene duplication and loss is highly constrained by the functional properties and interacting partners of genes. In particular, stress-related genes exhibit many duplications and losses, while growth-related genes show selection against such changes. This dichotomy is relaxed following whole-genome duplication. Duplicated genes rarely diverge with respect to biochemical function, but typically diverge with respect to regulatory control. Surprisingly, paralogous modules of genes rarely arise, even following whole-genome duplication. Rather, gene duplication drives the modularization of functional networks through specialization, thereby disentangling cellular systems.
This is joint work with Ilan Wapinski, Avi Pfeffer, and Aviv Regev.
September 7, 2007 @ MI | Shlomo Ta'asan | Carnegie Mellon University
Mathematical modeling approaches for systems biology
This talk will focus on modeling and simulation challenges in systems biology. Features that need to be addressed are high variability, incomplete knowledge, structural hierarchy, and multiple scales. I will focus mainly on two approaches that address some of these difficulties. The first is a flexible modeling platform that allows easy construction of complex hierarchical models, from genes to organs. It can be used with differential equations, stochastic differential equations, and logical networks. The second is a black-box approach that relies on data alone to build the dynamical models that best fit the data. The resulting models are linear and describe the response of the system to perturbations. This approach can be applied directly to real data, including microarray, multiplex (protein), flow cytometry, and physiological data, all within the same model.
September 21, 2007 @ BST3 | James Faeder | University of Pittsburgh
Rule-based modeling of signal-transduction systems
Binding interactions among proteins and other biomolecules occur at the level of domains and motifs and often follow phosphorylation or other posttranslational modifications. The catalog of these functional elements and their interactions is continually growing and poses a major barrier to the development of predictive mathematical models because of combinatorial complexity: the explosion in the number of possible chemical species and reactions that can occur in such networks. The BioNetGen (BNG) language uses graphs to represent proteins and other biomolecules, with nodes representing functional subunits of these molecules and edges representing binding interactions. Graph rewriting rules describe biochemical transformations, such as the formation or dissociation of bonds or changes of state. This language enables the construction of precise and comprehensive models and greatly expands the scope of information that can be incorporated into models of cellular networks. BNG also incorporates a wide range of analytical and simulation tools, and networks generated by BNG can be exported in the Systems Biology Markup Language (SBML) and other formats, allowing interoperability with other modeling platforms. Standard methods for simulating reaction networks, such as ODEs and the Gillespie algorithm, are often inadequate for networks arising from rule-based descriptions of realistic signaling cascades, which calls for simulation algorithms that avoid explicit generation of the reaction network. I will describe recent progress in the development of such algorithms and also present several applications to the modeling of signal transduction networks, including immune and growth factor receptors.
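For readers unfamiliar with the Gillespie algorithm named above, a minimal sketch of its direct method follows. It uses a hypothetical toy reaction, A + B <-> C, with assumed rate constants `k_on` and `k_off`; it is not a BioNetGen model and not one of the network-free algorithms described in the talk. Its point is that every reaction channel and its propensity must be listed explicitly, which is exactly what becomes infeasible when combinatorial complexity makes the list of species and reactions explode.

```python
import math
import random


def gillespie_binding(a0, b0, c0, k_on, k_off, t_end, seed=0):
    """Gillespie direct method for the toy reaction A + B <-> C.

    Returns a list of (time, A, B, C) states, one per fired reaction.
    The reaction system and parameter names are illustrative assumptions.
    """
    rng = random.Random(seed)  # seeded so runs are reproducible
    a, b, c = a0, b0, c0
    t = 0.0
    trajectory = [(t, a, b, c)]
    while True:
        # Propensity of each explicitly enumerated reaction channel
        p_bind = k_on * a * b
        p_unbind = k_off * c
        total = p_bind + p_unbind
        if total == 0:
            break  # no reaction can fire
        # Waiting time to the next event is exponential with rate `total`
        t += -math.log(1.0 - rng.random()) / total
        if t > t_end:
            break
        # Pick a channel with probability proportional to its propensity
        if rng.random() * total < p_bind:
            a, b, c = a - 1, b - 1, c + 1
        else:
            a, b, c = a + 1, b + 1, c - 1
        trajectory.append((t, a, b, c))
    return trajectory


traj = gillespie_binding(a0=50, b0=40, c0=0, k_on=0.01, k_off=0.1, t_end=10.0)
print(len(traj), traj[-1])
```

Each additional species or binding site would add terms to this explicit propensity list; avoiding that enumeration is what motivates network-free simulation.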
September 28, 2007 @ BST3 | Adrian Elcock | University of Iowa
Molecular simulations of bacterial cytoplasm
This talk will outline current efforts in our laboratory to develop realistic molecular-level simulations of events occurring in the E. coli cytoplasm. A brief (accessible) introduction to our computational methodology will be given, and the advantages and disadvantages of the approach will be highlighted. Simulations will then be presented showing: (a) diffusion and interaction in 1000-molecule models of the cytoplasmic environment, (b) synthesis of nascent protein chains in a molecular model of a polyribosome, and (c) preliminary work addressing the capture of unfolded proteins by the GroEL chaperonin. Finally, the prospects for combining these models into a more complete model of life inside a prokaryotic cell will be briefly discussed.
October 5, 2007 @ MI | Dmitri Chklovskii | HHMI - Janelia Farm
High-throughput reconstruction of brain circuits: how machine vision will revolutionize neuroscience
How does electrical activity in neuronal circuits give rise to intelligent behavior? We believe that this question is impossible to answer without a comprehensive description of neurons and the synaptic connections between them. The absence of such a description, often called a wiring diagram, has been holding back the development of neuroscience. We believe that recent technological advances in high-resolution imaging and machine vision will make possible the reconstruction of whole wiring diagrams of simpler organisms or of significant parts of more complex systems, such as the mammalian neocortex. Such reconstructions promise to revolutionize neuroscience just as human genome sequencing revolutionized molecular biology.
October 12, 2007 @ MI | Ming Li | University of Waterloo
Modern homology search
Homology search, finding similar regions between two sequences, is the most fundamental and popular task in bioinformatics. Traditional homology search technology is a heuristic science: the search is either too slow or too insensitive, and when it does return something, the results are simply non-specific fragments of alignments. We introduce new ideas, including a new mathematical theory of optimized spaced seeds, that allow modern homology search to achieve high sensitivity, high specificity, and high speed simultaneously. This methodology is now implemented in most modern homology search software, serving thousands of queries daily. Joint work with Bin Ma, John Tromp, X.F. Cui, B. Brejova, T. Vinar, and D. Shasha.
October 19, 2007 @ MI | Bud Mishra | New York University
SMASH: Single Molecule Approach to Sequencing by Hybridization
SMASH is a technology for sequencing a human-sized genome of 6 gigabases (including both haplotypes) without using any prior sequence information. We aim for the technology eventually (e.g., within a decade) to achieve a competitively low cost for each genome sequence produced (e.g., US$1000 or less) while assuring high quality (e.g., the standard of a "high quality draft sequence" similar to the mouse genome sequence published in December 2002). We hope this technology will play a significant disruptive role in future predictive, personalized biomedicine as well as in other areas of the biotech industry.
These goals require successful integration of three component technologies: (1) Optical Mapping to create Ordered Restriction Maps with respect to an enzyme, (2) hybridization of a pool of oligonucleotide probes (LNA probes) with single genomic dsDNA molecules, and (3) algorithms to solve "localized versions" of PSBH (Positional Sequencing by Hybridization) problems over the whole genome. Unlike many of its competitors, the technology works with small amounts of genomic material, operates top-down, employs a Bayesian algorithm to create haplotypic sequence assembly without an auxiliary shotgun assembler, tolerates noise in the data well, and is cost-effective at multiple scales. By construction, it avoids errors due to homopolymeric runs, haplotypic ambiguities, and large-scale rearrangements. Its scientific feasibility has been demonstrated through many important algorithmic, chemical, and mathematical innovations over the last two years, further reassuring us of the soundness of the principles, science, and strategy for technology development.
October 26, 2007 @ MI | Uwe Ohler | Duke University
Motifs and patterns: Using sequence and image data to understand gene regulation in eukaryotes
We develop and use computational approaches to understand the biology of gene regulation in eukaryotic organisms using different large-scale datasets. Much of our current work is focused on comparative DNA and RNA sequence analysis, for which we develop and apply probabilistic models of coding and non-coding eukaryotic genes and their regulatory regions and elements. In this talk, I will concentrate on our efforts to understand the core transcriptional machinery in Drosophila, in particular the analysis of alternative transcription start sites and a new motif finder that identifies regulatory elements with spatial preferences. Comparative genomics has received much attention as a means of elucidating transcriptional elements, and I will present a framework for simulating non-coding sequence evolution under different constraints, which allows us to study the turnover of functional sites and to assess how well multiple sequence aligners delineate functional elements.
As a second topic, I will address our efforts to extract and compare spatiotemporal expression data from high-resolution microscopy. In contrast to microarrays, which usually provide data for many genes at one time point, image data typically provide expression information for only one gene, but with the advantage of high spatial and/or temporal resolution, and can often be obtained in vivo. We have developed a prototype for the automatic analysis of microscopy gene expression data, so far focusing on 2D image data sets from two model organisms (Arabidopsis and Drosophila). Image analysis approaches will help lay the groundwork for reconstructing regulatory networks at the level of tissues or even individual cells.
November 2, 2007 @ MI | Olga Troyanskaya | Princeton University
Combining genomic data, computation, and experiments to discover novel biology
Understanding gene function and regulation on a whole-genome scale is the key challenge for systems biology. The discovery and wide adoption of functional genomics technologies in the past decades promised a rapid means of addressing this challenge and have fueled the development of numerous computational methods for dealing with the resulting data. However, functional understanding of the proteome still lags behind experimental data generation. My group addresses this data-knowledge disconnect through integrated analysis of diverse functional genomic data, including gene expression microarrays and physical interaction studies, and through close integration of computation and experiments in an iterative framework. I will present our recent developments both in integrated analysis of diverse data (for yeast and laboratory mouse) and in our iterative computational-experimental framework, along with its successful application to discovering novel biology of the mitochondria.
November 9, 2007 @ MI | Jeffrey Skolnick | Georgia Institute of Technology
Prediction of protein structure and function on a proteomic scale
A novel method for the prediction of protein structure and function based on the sequence-to-structure-to-function paradigm has been developed. We first show recent results suggesting that, for compact single-domain proteins, the PDB is most likely complete, and that this completeness can be explained by the packing of compact, hydrogen-bonded secondary structural elements. We next summarize results from TASSER-lite, the fast version of the structure prediction algorithm TASSER, as applied to comparative modeling. Then we summarize our recent performance in CASP7. Next, we present results from the application of our structure prediction algorithm, TASSER, to all GPCRs in the human genome. Based on confidence criteria, 90% should have approximately correct structures, and clustering shows that structurally similar GPCRs have similar function even when their sequences are diverse. Finally, we describe our multimeric structure prediction algorithm, m-TASSER, and its application to the prediction of protein-protein interactions.
November 16, 2007 @ BST3 | Gary An | Northwestern University
Dynamic Knowledge Representation using Agent Based Modeling: A Multi-scale Modeling Architecture for Acute Inflammation
The hierarchical structure of biological systems is well recognized. Information is generated by research endeavors at multiple scales and hierarchies of organization: gene => protein/enzyme => cell => tissue => organ => organism. The existence of these hierarchies presents significant challenges for translating mechanistic research results from one organizational level to another. Furthermore, the research community itself remains relatively compartmentalized, which creates barriers to communication and adds a further challenge to synthesizing basic science data into a unified whole. There is therefore a general need within the biomedical research community to be able to represent the state of its knowledge dynamically. Agent Based Modeling (ABM) is a computational modeling technique that is well suited for synthetic, dynamic knowledge representation via aggregated, modular, multi-scale models. Heterogeneous individual agent behavior is aggregated into population behavior that mirrors the behavior of the higher-level system as a whole, thus performing the trans-hierarchical function desired in an integrative framework. This talk will present a series of ABMs of acute inflammatory processes developed at multiple levels of resolution, extending from intracellular signaling up to simulated organ function and organ-organ interactions. Each of these model scales matches a particular level of biomedical research, and these models can be viewed as aids to knowledge representation that can facilitate the translation of biomedical knowledge both "vertically" across these scales and "horizontally" across the research community.
November 30, 2007 | Tom Mitchell | Carnegie Mellon University
Brains, Meaning and Corpus Statistics
How does the human brain represent the meanings of words and pictures in terms of the neural activity observable through fMRI brain imaging? Recent brain imaging studies have shown that different spatial patterns of fMRI neural activation are associated with thinking about particular semantic categories of words and pictures (e.g., tools, buildings, animals). As a next step, we seek a general theory capable of predicting the neural activity associated with arbitrary English words, including words for which we do not yet have brain image data. This talk will present a first such predictive theory, in the form of a computational model trained using a combination of co-occurrence statistics from a trillion-word text corpus and observed fMRI data associated with viewing several dozen concrete nouns. Once trained, the model predicts fMRI activation for any other concrete noun appearing in the trillion-word text corpus, with highly significant accuracies over the 60 nouns for which we currently have fMRI data.
December 7, 2007 | Gregory Voth | University of Utah
The Multiscale Challenge for Biomolecular Systems: A Systematic Approach
A multiscale theoretical and computational methodology will be presented for characterizing biomolecular systems and assemblies across multiple length- and time-scales. The approach provides a connection between atomistic molecular dynamics, reduced mesoscopic models, and near continuum-scale mechanics. At the heart of the methodology is a new and systematic multiscale coarse-graining theory for linking the atomistic-scale interactions to the mesoscale and beyond. Applications of the overall approach will be given for membranes, peptides, and proteins.
January 18, 2008 | Nancy Zhang | Stanford University
A multisample change-point model for DNA copy number analysis
The DNA copy number of an individual can be viewed as a change-point process along the chromosome, with a "normal" level at 2 and "aberrations" being locations where the copy number deviates from normal. Chromosomal aberrations occur naturally in the human population and are a common source of genetic variation. High-throughput genomic profiling technologies have been developed to measure DNA copy number at a fine scale along the chromosome. Given such data for a sample of individuals from the population, how do we statistically detect locations of shared aberration across individuals?
We discuss the properties of this type of data and propose a mixture model for its analysis, in which, at each change-point, the sample is composed of a mixture of individuals who have the change and those who do not. We have experimented with several statistics for the detection of shared change-points. For some of these statistics, large-sample tail approximations for significance evaluation can be derived. We compare the performance of these statistics in the context of DNA copy number detection using replicate samples from the same individual and from parent-child trios.
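As a rough illustration of scanning for a change-point shared across individuals, the sketch below computes a standard single-sample mean-shift statistic at every split and then, as a deliberately naive stand-in, sums its square across samples. This is not the mixture-model statistic described in the talk: it assumes a single change-point, i.i.d. Gaussian noise with a known `sigma`, and already normalized data, and the function names and toy profiles are hypothetical.

```python
import math


def mean_shift_z(x, sigma=1.0):
    """Standardized mean-shift statistic Z_t for every split t = 1..n-1.

    Z_t compares the mean of x[:t] with the mean of x[t:]; with no change
    and i.i.d. Gaussian noise of known sigma, each Z_t is N(0, 1).
    """
    n = len(x)
    total = sum(x)
    prefix = 0.0
    z = []
    for t in range(1, n):
        prefix += x[t - 1]
        left_mean = prefix / t
        right_mean = (total - prefix) / (n - t)
        z.append((right_mean - left_mean) * math.sqrt(t * (n - t) / n) / sigma)
    return z


def naive_shared_change_point(samples, sigma=1.0):
    """Split t maximizing the sum over samples of Z_t**2 (naive pooled scan).

    Returns (t, score): the change lies between probes t-1 and t.
    """
    n = len(samples[0])
    scores = [0.0] * (n - 1)
    for x in samples:
        for k, zk in enumerate(mean_shift_z(x, sigma)):
            scores[k] += zk * zk
    best = max(range(n - 1), key=lambda k: scores[k])
    return best + 1, scores[best]


samples = [
    [2, 2, 2, 2, 3, 3, 3, 3],  # hypothetical gain after probe 4
    [2, 2, 2, 2, 1, 1, 1, 1],  # hypothetical loss at the same place
    [2, 2, 2, 2, 2, 2, 2, 2],  # no aberration
]
t, score = naive_shared_change_point(samples)
print(t)  # 4
```

Squaring lets gains and losses reinforce each other, but unlike a mixture model this pooled score does not estimate which individuals carry the change, and it lets a single strongly aberrant sample dominate.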
January 25, 2008 | Olga Troyanskaya | Princeton University
Combining genomic data, computation, and experiments to discover novel biology
Understanding gene function and regulation on a whole-genome scale is the key challenge for systems biology. The discovery and wide adoption of functional genomics technologies in the past decades promised a rapid means of addressing this challenge and have fueled the development of numerous computational methods for dealing with the resulting data. However, functional understanding of the proteome still lags behind experimental data generation. My group addresses this data-knowledge disconnect through integrated analysis of diverse functional genomic data, including gene expression microarrays and physical interaction studies, and through close integration of computation and experiments in an iterative framework. I will present our recent developments both in integrated analysis of diverse data (for yeast and laboratory mouse) and in our iterative computational-experimental framework, along with its successful application to discovering novel biology of the mitochondria.
February 1, 2008 | Ivet Bahar | University of Pittsburgh
Supramolecular Machinery: Insights from Elastic Network Models
Many proteins function as molecular machines. Understanding the principles that control the machinery of biomolecular systems can be a challenge because of the involvement of multiple subunits and cooperative interactions, manifested by allosteric changes in conformation beyond the range of atomic simulations. We have developed and used low-resolution models to explore the collective dynamics of such complex systems and to bridge structure and function, through the paradigm structure-encodes-dynamics-encodes-function. The elastic network models and methods we introduced for this purpose have found utility in many applications and have helped us gain insight into the intrinsic, structure-encoded ability of native structures to energetically favor reconfigurations between functional substates. An overview of this recent progress will be presented, along with applications to a few systems.
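One widely used elastic network model is the Gaussian network model (GNM), in which residues (typically represented by their C-alpha atoms) closer than a cutoff distance are joined by identical springs. A minimal pure-Python sketch follows: it builds the Kirchhoff (connectivity) matrix and takes the diagonal of its pseudo-inverse, which is proportional to each residue's mean-square fluctuation. The 7 Å default cutoff, the helper names, and the shortcut pinv(Gamma) = inv(Gamma + J/n) - J/n (J the all-ones matrix, valid only when the contact network is connected) are choices made here for brevity; a real analysis would use a numerical linear-algebra library and examine individual modes.

```python
import math


def kirchhoff(coords, cutoff=7.0):
    """GNM Kirchhoff matrix: -1 for each contact, diagonal = contact count."""
    n = len(coords)
    gamma = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(coords[i], coords[j]) < cutoff:
                gamma[i][j] = gamma[j][i] = -1.0
                gamma[i][i] += 1.0
                gamma[j][j] += 1.0
    return gamma


def invert(m):
    """Gauss-Jordan inverse with partial pivoting (small dense matrices only)."""
    n = len(m)
    a = [row[:] + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        p = a[col][col]
        a[col] = [v / p for v in a[col]]
        for r in range(n):
            if r != col:
                f = a[r][col]
                a[r] = [vr - f * vc for vr, vc in zip(a[r], a[col])]
    return [row[n:] for row in a]


def gnm_fluctuations(coords, cutoff=7.0):
    """Diagonal of pinv(Gamma), via inv(Gamma + J/n) - J/n (connected graphs only)."""
    gamma = kirchhoff(coords, cutoff)
    n = len(gamma)
    shifted = [[g + 1.0 / n for g in row] for row in gamma]
    inv = invert(shifted)
    return [inv[i][i] - 1.0 / n for i in range(n)]


chain = [(3.8 * i, 0.0, 0.0) for i in range(5)]  # toy 5-residue straight chain
print([round(f, 3) for f in gnm_fluctuations(chain, cutoff=4.5)])  # [1.2, 0.6, 0.4, 0.6, 1.2]
```

Even this toy chain shows the structure-encoded pattern the model captures: the loosely connected ends fluctuate most and the central residue least.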
February 8, 2008 @ BST3 | Harmen Bussemaker | Columbia University
Data-driven biophysical modeling of (post-)transcriptional networks
Over the past decade, whole-genome sequencing and the related development of DNA microarrays have revolutionized molecular biology. These technologies have made it possible to study genome expression from a global perspective. Rather than focusing on the properties of individual genes, researchers can now study their function and regulation as part of a network of interactions with other genes and their products. Our laboratory uses computation and quantitative modeling to understand how the structure and function of the genome-wide regulatory network emerge from molecular interactions between DNA, RNA, and proteins. Of central importance are transcription factors, which connect the upstream signaling pathways that relay the internal and external signals of the cell with the downstream target genes whose transcription rates they control. Two key functional characteristics of transcription factors are their DNA sequence specificity and their tissue/condition-specific regulatory activity. However, it is often difficult to measure these properties directly. Our laboratory has pioneered different approaches for inferring them as "hidden variables" from functional genomics data of different types through integrative, model-based analysis. In this talk, we will give an overview of our approach and discuss a number of specific topics, including our MatrixREDUCE algorithm, which uses a purely biophysical model to quantify sequence specificity, and the condition-specific control of transcript stability by the RNA-binding factors Puf3p and Puf4p.
February 15, 2008 @ BST3 | Bruce Tidor | Massachusetts Institute of Technology
Electrostatics in ligand binding and design
Electrostatic interactions are prevalent at protein interfaces and are important for protein-protein and protein-ligand association. However, because of the large desolvation penalty incurred by polar and charged groups upon binding, their behavior and role are often non-intuitive. Continuum electrostatic calculations have been used to probe interactions at binding sites and as tools in the design of new or altered binding partners. Novel design procedures have been implemented and applied to a variety of biological systems, demonstrating the importance of optimizing electrostatic interactions and the utility of this approach to ligand design. Results will focus on the development of small-molecule inhibitors for enzymes and on the problem of antibody affinity maturation.
February 22, 2008 @ BST3 | Takis Benos | University of Pittsburgh
What can evolution say about gene regulation? From DNA signals to gene networks
With many eukaryotic genomes already sequenced, it has become apparent that differences in the non-coding genomic regions may account for most of the diversity we observe both within and between species. Changes in non-coding sequences can alter the expression of genes if, for example, they occur in a transcription factor binding site. In this seminar, we will present our recent results on the turnover of transcription factor binding sites in vertebrates and on the (co)evolution of binding motifs and their associated transcription factors. We will also discuss how these results can be used to improve our motif prediction algorithms and how to infer the identity of a transcription factor from its DNA motif alone. Finally, we will show how computational predictions of transcription factor-microRNA interplay can lead to the discovery of "critical" regulatory loops that are implicated in terminal disease states such as idiopathic pulmonary fibrosis (IPF).
February 29, 2008 | Thomas E. Cheatham, III | University of Utah
Promise and peril in the simulation of nucleic acids and protein structure, dynamics and interactions
Biomolecular simulation methods are widely applied to give insight into the structure, dynamics, and interactions of biomolecules. These methods are routinely used in applications ranging from structure prediction to drug design. Most often, practitioners in the field go to great lengths to convince you how great their methods are and how they provide novel biological insight. Less often do you hear about the serious artifacts in the underlying potentials or force fields, or the limits of the methods, or the steep learning curves that must be surmounted to use the programs effectively. Moreover, as atomistic simulations of biomolecules consume ever-larger portions of the allocated HPC cycles at supercomputer centers, the standard model of "run, then analyze" is beginning to break down as we become overloaded by data. The emergence of petascale computing and the continued optimization of simulation codes pose further challenges as we move from single one-off simulations toward sampling ensembles. I'll outline our experiences in the large-scale simulation of nucleic acids and proteins, pointing out, I hope, both promise and peril along the way as we try to answer the following questions: How can we facilitate exploration of the methods and data? How can we better assess and validate simulation results? How can we convince the larger community of the power of biomolecular simulation?
March 7, 2008 | Sridhar Hannenhalli | University of Pennsylvania
Computational analysis of eukaryotic transcriptional regulation and its evolution
Transcriptional control is recognized as an important component of the overall regulation of cellular processes. We have been focusing on computational approaches to a variety of problems pertaining to eukaryotic transcriptional regulation, namely (1) representation and identification of transcription factor binding sites, (2) Pol II promoter prediction, (3) prediction of interactions among transcription factors, and (4) identification of groups of TFs (transcriptional modules) that co-regulate a set of transcripts. I will present a brief overview of the computational approaches and challenges as well as a number of applications. I will also present some recent unpublished work on the evolution of paralogous transcription factors.
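A common representation of transcription factor binding sites, relevant to item (1) above, is the position weight matrix (PWM). The sketch below is a generic textbook construction, not the speaker's method: it builds a log2-odds PWM from aligned example sites using an assumed pseudocount of 0.5 and a uniform background of 0.25, then scores every window of a sequence. The example sites are invented, and the code assumes uppercase A/C/G/T input.

```python
import math

BASES = "ACGT"


def log_odds_pwm(sites, pseudocount=0.5, background=0.25):
    """Log2-odds position weight matrix from equal-length, uppercase DNA sites."""
    denom = len(sites) + 4 * pseudocount
    pwm = []
    for pos in range(len(sites[0])):
        column = [s[pos] for s in sites]
        pwm.append({
            base: math.log2((column.count(base) + pseudocount) / denom / background)
            for base in BASES
        })
    return pwm


def scan(sequence, pwm):
    """Score every window of `sequence`; returns a list of (start, score)."""
    width = len(pwm)
    return [
        (i, sum(pwm[k][sequence[i + k]] for k in range(width)))
        for i in range(len(sequence) - width + 1)
    ]


pwm = log_odds_pwm(["TATA", "TATA", "TATT"])  # invented example sites
hits = scan("GGTATAGG", pwm)
best_start, best_score = max(hits, key=lambda h: h[1])
print(best_start)  # 2
```

The pseudocount keeps unseen bases from scoring negative infinity; in practice, the choice of background model and score threshold matters as much as the matrix itself.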
March 21, 2008 | Mark Gerstein | Yale University
Human Genome Annotation
A central problem for 21st-century science will be the analysis and understanding of the human genome. My talk will be concerned with topics within this area, in particular annotating pseudogenes (protein fossils) in the genome. I will discuss a comprehensive pseudogene identification pipeline and storage database we have built. This has enabled us to identify >10K pseudogenes in the human and mouse genomes and to analyze their distribution with respect to age, protein family, and chromosomal location. One interesting finding is the large number of ribosomal pseudogenes in the human genome, with 80 functional ribosomal proteins giving rise to ~2,000 ribosomal protein pseudogenes.
I will try to interrelate our studies on pseudogenes with those on tiling arrays, which enable one to comprehensively probe the activity of intergenic regions. At the end I will bring these together, trying to assess the transcriptional activity of pseudogenes. Throughout, I will try to introduce some of the computational algorithms and approaches that are required for genome annotation and tiling arrays, i.e., the construction of annotation pipelines, the development of algorithms for optimal tiling, and the refinement of approaches for scoring microarrays.
March 28, 2008 | Marcelo Magnasco | Rockefeller University
Sparse time-frequency representations and the neural coding of sound
Auditory neurons preserve exquisite temporal information about sound features, but we do not know how the brain uses this information to parse the rapidly changing sounds of the natural world. A simple argument for making effective use of temporal information in the auditory nerve leads us to consider the reassignment class of time-frequency representations as a potential model of auditory processing. We show that these representations are sparse even for spectrally dense signals. Many details of complex sounds that are virtually undetectable in standard sonograms are readily perceptible and visible in reassignment; as the only known class of time-frequency representations that is always "in focus", this methodology may help explain the remarkable acuity of auditory perception. We also consider how to determine, experimentally, when a neural code embeds information in the detailed timing of spikes. We show that standard "spike-triggered" receptive field constructions are inadequate to extract this level of information, and we present a new method, "differential reverse correlations", based on correlating small changes in spike timing due to small changes to the stimulus.
April 4, 2008 | Marti Head | GlaxoSmithKline
Structure-Based Design at GSK
Marti Head is the Director of Computational Chemistry US at GlaxoSmithKline Pharmaceuticals. She will describe how computational chemistry fits into the GSK research organization and will present examples of how computational methodologies have had an impact on drug discovery projects.
May 13, 2008 | Trey Ideker | University of California, San Diego
Mapping gene regulatory pathways by assembly of physical and genetic interactions
Physical and genetic mapping data have become as important to Network Biology as they were to the Human Genome Project. Physical interaction maps are being constructed through systematic measurements of protein-protein, protein-DNA, and protein-small molecule interactions. Genetic interaction maps are being generated by large-scale screening of synthetic-lethals and epistasis, by multipoint gene association studies, and by mapping the effects of natural and prescribed genetic variations on gene expression. We are working on ways of integrating physical and genetic interaction maps to assemble models of gene regulatory pathways. These efforts face several challenges, including: increasing the coverage of each type of network; establishing methods to assemble individual interaction measurements into contiguous pathway models; and annotating these pathways with detailed functional information. Efforts in each of these areas will be described. Using integrative tools, we are constructing network models to explain the physiological response of yeast to DNA damaging agents.