Sahu et al., Nature Genetics, 2022

The regulatory code that determines gene activity in human cells has remained poorly understood despite numerous genome-scale studies of transcription factor (TF) binding in vitro and in vivo. To address this gap, we measured the gene regulatory activity of a collection of DNA sequences that together are 100 times larger than the entire human genome. To do this, we used massively parallel reporter assays, in which the regulatory activity of millions of DNA sequences can be measured simultaneously in a single large-scale experiment. Our novel reporter constructs and ultra-complex libraries, derived from a collection of TF motifs, fragmented whole human genomic DNA, and random synthetic DNA sequences, enabled systematic analysis of the sequence determinants of human gene regulatory elements. Our results show that individual transcription factors typically contribute to gene regulation additively, without specific interactions with other factors. Because the motif grammar of the gene regulatory code (the rules for how motifs are arranged and combined) is relatively weak, we conclude that the DNA-binding motifs themselves constitute the key atomic units of gene expression.
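To make "additive without specific interactions" concrete, the following is a minimal sketch of an additive motif model, not the study's method. The motif sequences, their weights, and the choice to add contributions on a log2 activity scale are all hypothetical assumptions for illustration. Each motif occurrence adds a fixed amount to the predicted activity, and no term depends on which other motifs are present.

```python
from typing import Dict

# Hypothetical motifs and per-occurrence weights in log2 activity units.
# These are illustrative assumptions, NOT values estimated in the study.
MOTIF_WEIGHTS: Dict[str, float] = {
    "TGACTCA": 1.0,  # hypothetical strong activator site
    "GGAA": 0.25,    # hypothetical weak activator site
}


def count_occurrences(seq: str, motif: str) -> int:
    """Count (possibly overlapping) exact matches of motif on the forward strand only."""
    k = len(motif)
    return sum(1 for i in range(len(seq) - k + 1) if seq[i:i + k] == motif)


def additive_score(seq: str,
                   weights: Dict[str, float] = MOTIF_WEIGHTS,
                   baseline: float = 0.0) -> float:
    """Predicted log2 activity = baseline + sum of independent motif contributions.

    There are no pairwise (motif-by-motif) terms, so each motif's effect is
    the same regardless of which other motifs occur in the sequence.
    """
    seq = seq.upper()
    return baseline + sum(w * count_occurrences(seq, m) for m, w in weights.items())


if __name__ == "__main__":
    print(additive_score("TGACTCA"))      # one strong site
    print(additive_score("GGAA"))         # one weak site
    print(additive_score("TGACTCAGGAA"))  # both sites: contributions simply add
```

An interaction model would add terms such as `w_ab * n_a * n_b` for specific motif pairs; the summary's finding is that such TF-specific interaction terms are typically not needed. Real analyses would also scan the reverse strand and use position weight matrices rather than exact strings.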

Traditionally, active regulatory elements have been thought to reside within open, accessible chromatin. However, combining enhancer activity measurements from our ultra-complex genomic libraries with epigenetic analyses revealed that gene regulatory elements in human cells can be classified by the chromatin context in which they reside. We defined three classes of active enhancers: classical, chromatin context-dependent, and closed chromatin enhancers.

In conclusion, our large-scale systematic experiments revealed that human gene regulatory logic is unexpectedly simple. These results pave the way toward the ultimate aim of regulatory genomics: predicting gene expression from sequence.
