All publications (Google Scholar)

Major publications (PubMed)


de Boer, C., Taipale, J. Hold out the genome: A roadmap to solving the cis-regulatory code. Nature, 2023.
Morgunova, E., Taipale, J. Structural insights into the interaction between transcription factors and the nucleosome. Curr Opin Struct Biol 71:171-179, 2021.
Lambert, S.A., Jolma, A., Campitelli, L.F., Das, P.K., Yin, Y., Albu, M., Chen, X., Taipale, J.*, Hughes, T.R.*, Weirauch, M.T.* The Human Transcription Factors. Cell 172:650-665, 2018.
Sur, I., Taipale, J. The role of enhancers in cancer. Nature Reviews Cancer 16:483–493, 2016.

Preprints (not peer reviewed):
Hong et al., Scaling laws of human transcriptional activity, bioRxiv, 2023.
Morgunova et al., Interfacial water confers transcription factors with dinucleotide specificity, bioRxiv, 2023.
Sur et al., Shared requirement for MYC upstream super-enhancer region in tissue regeneration and cancer, bioRxiv, 2023.
Kauko et al., Lineage-specific oncogenes drive growth of major forms of human cancer using common downstream mechanisms, bioRxiv, 2023.


A competitive precision CRISPR method to identify the fitness effects of transcription factor binding sites
Pihlajamaa et al., Nature Biotechnology 41:197–203, 2023
We describe a competitive genome editing method that measures the effect of mutations on molecular functions, based on precision CRISPR editing using template libraries with either the original or altered sequence, and a sequence tag, enabling direct comparison between original and mutated cells. Using the example of the MYC oncogene, we identify important transcriptional targets and show that E-box mutations at MYC target gene promoters reduce cellular fitness.
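The comparison between original and mutated cells can be sketched as follows, assuming each edited cell carries a sequence tag that reports which template (original or altered) it received, and that tags are counted by sequencing at two time points. The function name, the pseudocount and the log2-ratio summary are illustrative assumptions, not the published analysis pipeline.

```python
import math


def fitness_effect(wt_before: int, mut_before: int,
                   wt_after: int, mut_after: int,
                   pseudocount: float = 0.5) -> float:
    """Hypothetical helper: change in log2(mutant/original) tag ratio over time.

    Counts are sequencing reads of the tag linked to the original (wt) or the
    altered (mut) template, taken before and after a period of cell growth.
    A negative result means cells carrying the mutation were depleted relative
    to cells carrying the original sequence, i.e. reduced fitness.
    """
    # The pseudocount avoids division by zero when a tag is not observed.
    ratio_before = (mut_before + pseudocount) / (wt_before + pseudocount)
    ratio_after = (mut_after + pseudocount) / (wt_after + pseudocount)
    return math.log2(ratio_after / ratio_before)
```

For example, `fitness_effect(1000, 1000, 1000, 250)` is close to -2: the mutant-to-original ratio fell about four-fold. Because both alleles are measured in the same cell population, a ratio like this cancels effects shared by all edited cells, such as editing toxicity.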

Sequence determinants of human gene regulatory elements
Sahu et al., Nature Genetics 54:283-294, 2022
DNA determines where and when genes are expressed, but the full set of sequence determinants that control gene expression is not known. Here, we measured the transcriptional activity of DNA sequences that represent an ~100 times larger sequence space than the human genome using massively parallel reporter assays (MPRAs). Machine learning models revealed that transcription factors (TFs) generally act in an additive manner with weak grammar, and that most enhancers increase expression from a promoter by a mechanism that does not appear to involve specific TF–TF interactions. We also show that few TFs are strongly active in a cell, with most activities being similar between cell types. Individual TFs can have multiple gene regulatory activities, including chromatin opening and enhancing, promoting and determining transcription start site (TSS) activity, consistent with the view that the TF binding motif is the key atomic unit of gene expression.
In this work, we also report the discovery of a novel type of enhancer-like element, the chromatin-context dependent enhancer.
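The "additive, weak grammar" picture can be illustrated with a toy model in which each motif adds a fixed amount to log-scale activity regardless of its position, spacing or orientation. The motif strings and weights below are hypothetical and chosen only for illustration. The models in the paper were trained machine-learning models, not this hand-written sum.

```python
# Hypothetical per-motif contributions to log2 activity (illustrative values only).
MOTIF_WEIGHTS = {"GATA": 0.8, "CACGTG": 1.2, "TGACTCA": 1.5}


def count_motif(seq: str, motif: str) -> int:
    """Count (possibly overlapping) exact matches of motif on the forward strand."""
    k = len(motif)
    return sum(1 for i in range(len(seq) - k + 1) if seq[i:i + k] == motif)


def additive_activity(seq: str, weights=None, baseline: float = 0.0) -> float:
    """Predicted log2 activity = baseline + sum(weight * motif count).

    There are no interaction, spacing or orientation terms, so the prediction
    for a sequence is just the sum of its motifs' independent contributions.
    """
    weights = MOTIF_WEIGHTS if weights is None else weights
    return baseline + sum(w * count_motif(seq, m) for m, w in weights.items())
```

In this toy model, joining two fragments gives the sum of their individual scores, unless a new motif happens to form at the junction. That is what "additive with weak grammar" means in practice.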

Upregulation of ribosome biogenesis via canonical E-boxes is required for Myc-driven proliferation
Zielke et al., Developmental Cell 57:1024-1036, 2022
The transcription factor Myc drives cell growth across animal phyla and is activated in most forms of human cancer. However, it is unclear which Myc target genes need to be regulated to induce growth and whether multiple targets act additively or if induction of each target is individually necessary. Here, we identified Myc target genes whose regulation is conserved between humans and flies and deleted Myc-binding sites (E-boxes) in the promoters of fourteen of these genes in Drosophila. E-box mutants of essential genes were homozygous viable, indicating that the E-boxes are not required for basal expression. Eight E-box mutations led to Myc-like phenotypes; the strongest mutant, ppanEbox−/−, also made the flies resistant to Myc-induced cell growth without affecting Myc-induced apoptosis. The ppanEbox−/− flies are healthy and display only a minor developmental delay, suggesting that it may be possible to treat or prevent tumorigenesis by targeting individual downstream targets of Myc.

Systematic analysis of binding of transcription factors to noncoding variants
Yan et al., Nature 591:147-151, 2021
Many sequence variants have been linked to complex human traits and diseases, but deciphering their biological functions remains challenging, as most of them reside in noncoding DNA. Here we have systematically assessed the binding of 270 human transcription factors to 95,886 noncoding variants in the human genome using an ultra-high-throughput multiplex protein-DNA binding assay, SNP-SELEX. We report highly predictive models for 94 human transcription factors and demonstrate their utility in determining the mechanism of action of variants identified using genome-wide association studies, and in understanding the molecular pathways involved in diverse human traits and diseases.

Human cell transformation by combined lineage conversion and oncogene expression
Sahu et al., Oncogene 40:5533-5547, 2021
Cancer is the most complex genetic disease known, with mutations implicated in more than 250 genes. However, it is still elusive which specific mutations found in human patients lead to tumorigenesis. Here we show that a combination of oncogenes that is characteristic of liver cancer (CTNNB1, TERT, MYC) induces senescence in human fibroblasts and primary hepatocytes. However, reprogramming fibroblasts to a liver progenitor fate, induced hepatocytes (iHeps), makes them sensitive to transformation by the same oncogenes. The transformed iHeps are highly proliferative, tumorigenic in nude mice, and bear gene expression signatures of liver cancer. These results show that tumorigenesis is triggered by a combination of three elements: the set of driver mutations, the cellular lineage, and the state of differentiation of the cells along the lineage. Our results provide direct support for the role of cell identity as a key determinant in transformation and establish a paradigm for studying the dynamic role of oncogenic drivers in human tumorigenesis.

The interaction landscape between transcription factors and the nucleosome
Zhu et al., Nature 562:76-81, 2018 
The packaging of DNA on nucleosomes makes it more difficult for transcription factors (TFs) to access DNA. We found here that TFs have evolved several different mechanisms to get around the problem, allowing the reading of the important messages in our genome that tell cells how to construct and maintain our tissues and organs. The reported findings uncover a rich, interactive landscape between transcription factors and the nucleosome, thus paving a way to a thorough understanding of the complicated DNA decoding mechanisms in higher organisms. The findings also provide a basis for future studies aimed at understanding transcriptional regulation based on biochemical principles. As aberrant transcription factor activity is linked to many human diseases, including cancer, the findings are also relevant to understanding mechanisms of human disease. 
See also Dodonova et al., Nature 580:669–672, 2020 

CRISPR-Cas9 genome editing induces a p53-mediated DNA damage response
Haapaniemi et al., Nature Medicine 24:927-930, 2018 
In the last few years, CRISPR-Cas9 has become a highly popular genome editing tool that is now transforming the field of biology. This is because CRISPR-Cas9 allows scientists to edit genomes with unprecedented precision, efficiency and flexibility compared with other commonly used methods such as RNAi. We found that DNA double-strand breaks created by CRISPR-Cas9 activate p53, causing cell growth arrest. Inhibition of p53 prevented this damage response and increased the efficiency of precision genome editing. Controlling the DNA damage response to allow efficient gene editing will be important in developing the next generation of safe and efficient genome editing technologies.

A protein activity assay to measure global transcription factor activity reveals determinants of chromatin accessibility
Wei et al., Nature Biotechnology 36:521-529, 2018 
In this work, we developed a massively parallel protein activity assay (Active Transcription Factor Identification, ATI) that can measure the DNA binding activity of all TFs in a particular cell type. We found that only a small number of TFs demonstrated strong DNA binding in each of the tissues studied. These results suggest that, despite the presence of hundreds of TFs in most tissues, just a handful of them determine the overall gene expression landscape of a cell. Speaking about the research, Professor Jussi Taipale explained: “The finding that some transcription factors are much more active than others indicates that the regulatory system is far simpler than what we had imagined. We previously thought that all transcription factors can work together in millions of different ways to regulate genes. Instead, it now looks like weaker transcription factors need to work with the strong ones to get anything done. This makes the regulatory system very hierarchical, and simplifies the task of evolution. In a hierarchical system it is easier to evolve sets of co-expressed genes that work together to accomplish a particular task.”

Two distinct DNA sequences recognized by transcription factors represent enthalpy and entropy optima
Morgunova et al., eLife 7:e32963, 2018
Most TFs prefer to bind to a single maximal-affinity sequence and to sequences that are closely related to it. However, several TFs can bind with high affinity to multiple different sequences, and to populations of sequences that are closely related to these local optima. In this work, we used X-ray crystallography, computational modelling and thermodynamic measurements to study four human TFs that can each bind to two different gene regulatory sequences. We found that the TFs bound each sequence with a similar strength, but through different mechanisms. Binding to one DNA sequence utilised an enthalpy-based mechanism where rigid water bridges linked the TF to the DNA. In contrast, binding to the second DNA sequence employed an entropy-based mechanism where the movement of water molecules increased disorder, giving strength to the interaction.
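For readers less familiar with binding thermodynamics, the standard textbook relations below (general background, not values from this paper) show how two sites can have similar affinity through different mechanisms. The binding free energy combines an enthalpic and an entropic term. A site where a large favourable ΔH dominates and a site where a large favourable TΔS dominates can therefore end up with a similar ΔG, and hence a similar dissociation constant.

```latex
\begin{aligned}
\Delta G^\circ_{\text{bind}} &= \Delta H^\circ - T\,\Delta S^\circ = RT \ln K_d \\
\text{enthalpy-driven site:}\quad & \Delta H^\circ < 0 \text{ provides most of } \Delta G^\circ_{\text{bind}} \\
\text{entropy-driven site:}\quad & T\,\Delta S^\circ > 0 \text{ provides most of } \Delta G^\circ_{\text{bind}}
\end{aligned}
```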

Impact of cytosine methylation on DNA binding specificities of human transcription factors
Yin et al., Science 356:eaaj2239, 2017 
The DNA letter C exists in two forms, cytosine and methylcytosine, which can be thought of as the same letter with and without an accent (C and Ç). Methylation of DNA bases is a type of epigenetic modification, a biochemical change in the genome that doesn’t alter the DNA sequence. The two variants of C have no effect on the kind of proteins that can be made, but they can have a major influence on when and where the proteins are produced. Previous research has shown that genomic regions where C is methylated are commonly inactive and that many TFs are unable to bind to sequences that contain the methylated Ç. By analysing hundreds of different human TFs, the Taipale lab found that certain transcription factors actually prefer the methylated Ç. These include TFs that are important in embryonic development, and for the development of prostate and colorectal cancers. “The results suggest that such ‘master’ regulatory factors could activate regions of the genome that are normally inactive, leading to the formation of organs during development, or the initiation of pathological changes in cells that lead to diseases such as cancer”, says Professor Jussi Taipale.

DNA-dependent formation of transcription factor pairs alters their binding specificity
Jolma et al., Nature 527:384–388, 2015
In this work, we show that the ‘grammar’ of the human genetic code is more complex than that of even the most intricately constructed spoken languages in the world. The findings, published in the journal Nature, explain why the human genome is so difficult to decipher, and contribute to the further understanding of how genetic differences affect the risk of developing diseases on an individual level.

CTCF/cohesin-binding sites are frequently mutated in cancer
Katainen et al., Nature Genetics 47:818-821, 2015
Mutations that lead to cancer occur not only in the 2% of the DNA that encodes proteins, but also in non-coding regions, which determine when and where genes are expressed. In the largest cancer genome study performed in the Nordic countries, researchers led by Professors Jussi Taipale and Lauri Aaltonen studied more than two hundred whole genomes from colorectal cancer samples and detected a distinct accumulation of mutations at sites where the proteins CTCF and cohesin bind to the DNA.

Conservation of transcription factor binding specificities across 600 million years of bilateria evolution
Nitta et al., eLife 4:e04837, 2015
In this work, we found that the binding specificities of TFs, the language used in the switches that turn genes on and off, have remained the same across 600 million years of evolution. The findings, which are published in the scientific journal eLife, indicate that the differences between animals reside in the content and length of the instructions that are written into DNA using this conserved language.

DNA-binding specificities of human transcription factors
Jolma et al., Cell 152:327-339, 2013
In this work, we describe binding specificity models for the majority of human TFs, approximately doubling the coverage compared to existing systematic studies. Our results also reveal additional specificity determinants for a large number of factors for which a partial specificity was known before, including a commonly observed A- or T-rich stretch flanking core-binding motifs. Global analysis of the data reveals that homodimer orientation and spacing preferences, and base stacking interactions, have a larger role in TF-DNA binding than previously appreciated. We further describe a binding model that incorporates these features, which are required to understand how TFs bind to DNA.
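The simplest kind of binding specificity model, a position weight matrix, can be used to scan DNA for likely binding sites. The sketch below uses a made-up four-position matrix (consensus CATG) and a uniform background, both purely hypothetical. It scores each base independently, so it deliberately leaves out the dimer spacing, orientation and base-stacking effects that the text describes as important.

```python
import math

# Hypothetical base frequencies per motif position; columns are A, C, G, T.
PFM = [
    [0.10, 0.70, 0.10, 0.10],
    [0.80, 0.05, 0.10, 0.05],
    [0.05, 0.10, 0.05, 0.80],
    [0.10, 0.10, 0.70, 0.10],
]
BASES = "ACGT"


def log_odds_score(site: str, pfm=PFM, background: float = 0.25) -> float:
    """Sum over positions of log2(frequency of observed base / background frequency)."""
    if len(site) != len(pfm):
        raise ValueError("site length must match motif length")
    return sum(math.log2(row[BASES.index(base)] / background)
               for row, base in zip(pfm, site))


def best_site(seq: str, pfm=PFM):
    """Scan the forward strand; return (start position, score) of the best window."""
    width = len(pfm)
    scores = [(i, log_odds_score(seq[i:i + width], pfm))
              for i in range(len(seq) - width + 1)]
    return max(scores, key=lambda item: item[1])
```

For instance, `best_site("GGCATGTT")` returns position 2, where the consensus CATG sits. A model that captures the features described above would add dinucleotide or dimer terms on top of this per-base sum.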

Transcription factor binding in human cells occurs in dense clusters formed around cohesin anchor sites
Yan et al., Cell 154:801-813, 2013
Here, we have developed a high-throughput ChIP-seq method and mapped the binding patterns of hundreds of TFs in a human cell line. Global analysis of the binding patterns indicates that TF binding is clustered to a much larger degree than previously anticipated, with TF clusters occupying less than 1% of the genome. The TF clusters were strongly enriched in binding motifs, evolutionarily conserved, and predictive of gene expression. Interestingly, virtually all TF clusters contained cohesin, a ring-shaped protein complex known to be important in transcription and in sister chromatid cohesion in mitosis. Follow-up experiments indicate that cohesin has a causative role in maintaining the pattern of TF binding across cell division, by encircling DNA throughout replication and at chromatin condensation, when TFs are displaced from chromatin. Thus, we propose that cohesin acts as a cellular memory that helps replicate the accessibility information imprinted by TFs displacing nucleosomes on DNA.

Counting absolute numbers of molecules using unique molecular identifiers
Kivioja et al., Nature Methods 9:72-74, 2012
In this work, we describe a universal method that can be applied to counting the absolute number of molecules in a sample. The method is based on labeling the molecules to be counted in such a way that every molecule in the sample becomes unique. The method completely eliminates PCR bias, a common problem in accurately determining the number of RNA or DNA molecules in a cell. The described method can be applied to improve the accuracy of almost any next generation sequencing method, including ChIP-sequencing, genome assembly, diagnostic applications and manufacturing process control and monitoring.
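The counting principle can be sketched as follows. The sketch assumes that reads have already been assigned to a gene and that each read's unique molecular identifier (UMI) sequence has been extracted. Reads that share both gene and UMI are treated as PCR copies of one original molecule. This toy version is a simplification: it ignores sequencing errors within UMIs and the chance that two molecules receive the same UMI, and practical pipelines need to handle both.

```python
from collections import defaultdict


def count_molecules(reads):
    """Count molecules per gene as the number of distinct UMIs observed.

    reads: iterable of (gene, umi) pairs, one per sequenced read. PCR copies
    of the same original molecule carry the same UMI, so however many times
    a molecule was amplified, it contributes exactly one to the count.
    """
    umis_per_gene = defaultdict(set)
    for gene, umi in reads:
        umis_per_gene[gene].add(umi)
    return {gene: len(umis) for gene, umis in umis_per_gene.items()}
```

For example, with the hypothetical reads `[("MYC", "AAT"), ("MYC", "AAT"), ("MYC", "AAT"), ("MYC", "GCA"), ("GAPDH", "TTG")]`, the raw read counts are 4 and 1, but the molecule counts are 2 and 1. The three identical MYC/AAT reads collapse to a single molecule.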

Mice lacking a Myc enhancer that includes human SNP rs6983267 are resistant to intestinal tumors
Sur et al., Science 338:1360-1363, 2012
In this work, we generated mice deficient in Myc-335, a putative MYC regulatory element that contains rs6983267, a SNP accounting for more human cancer-related morbidity than any other genetic variant or mutation. In Myc-335 null mice, Myc transcripts were expressed in the intestinal crypts in a pattern similar to that in wild-type mice but at modestly reduced levels. The mutant mice displayed no overt phenotype but were markedly resistant to intestinal tumorigenesis induced by the ApcMin mutation. These results highlight the fact that although a disease-associated polymorphism typically has a relatively modest effect size, the element that it affects can be critically important for the underlying pathological process. The finding also indicates that normal growth control and pathological growth induced by cancer can utilize different mechanisms.



Taipale Lab Cambridge

Taipale Lab KI

Taipale Lab Helsinki

CoE in Tumor Genetics Research