How to interpret the cis-regulatory sequence rules learned by a deep learning model

 
 

Speaker: Julia Zeitlinger (Stowers Institute for Medical Research, US)
Date: 06/07/2023
Time: 10:00 CEST
Host: Ben Lehner, CRG

If you would like to attend the seminar, please register here.

The cis-regulatory code that instructs gene regulation during development, also known as the genome’s second code, remains fundamentally unresolved. Recent progress has provided proof-of-principle evidence that this complex code can be learned with neural networks. The new approach differs fundamentally from traditional methods in that the sequence rules are learned inside a black box, directly from genomic sequences, through their ability to better predict a specific genomics data set. This dramatically improves predictive performance, but it requires rigorous approaches for extracting, understanding and validating the learned sequence rules to make sure that they represent biology. I will describe how we apply this approach with mouse or Drosophila development as model systems and uncover sequence rules that we can validate experimentally. The goal is to understand the underlying biophysical processes and constraints and to create a general model of how the cis-regulatory code is read out by transcription factors. We strive to use this knowledge to create more powerful deep learning models that learn cis-regulatory sequence rules more broadly across cell types.