Decoding Gene Regulation
Interpretation of nucleic acid sequence is fundamental to molecular biology, and decoding gene regulation in particular remains a vexing problem. Only a small fraction of most genomes is protein-coding. In contrast, the number and variety of apparent regulatory sequences continue to grow; in human, they outnumber protein-coding genes by orders of magnitude. Determining how regulatory sequences are recognized and interpreted by cells is fundamental to dissecting gene expression mechanisms, interpreting the significance of sequence variants, and understanding the function and evolution of genomes. Yet it remains surprisingly difficult to predict gene expression patterns precisely from sequence. This problem represents one of the grand challenges of our era. Our approach centres on the study of the molecules that recognize specific sequences: transcription factors, RNA-binding proteins, nucleosomes, and enzymes that act on specific sequences. We employ a battery of laboratory assays and associated computational methods to measure and describe their sequence and structure preferences, and to understand how they work together to orchestrate chromatin organization and gene regulation.