All publications (Google Scholar)

Major publications (PubMed)

Preprints (not peer reviewed):

Sahu et al., Cellular transformation by combined lineage conversion and oncogene expression. bioRxiv, 2020.

Zielke et al., Myc-dependent cell competition and proliferative response requires induction of the ribosome biogenesis regulator Peter Pan. bioRxiv, 2020.

Taipale, Romer and Linnarsson, Population-scale testing can suppress the spread of COVID-19. medRxiv, 2020.


The interaction landscape between transcription factors and the nucleosome
Zhu et al., Nature 562:76-81, 2018 
The packaging of DNA on nucleosomes makes it more difficult for transcription factors (TFs) to access DNA. We found that TFs have evolved several different mechanisms to get around this problem, allowing them to read the important messages in our genome that tell cells how to construct and maintain tissues and organs. The findings uncover a rich, interactive landscape between transcription factors and the nucleosome, paving the way toward a thorough understanding of the complex DNA decoding mechanisms in higher organisms. They also provide a basis for future studies aimed at understanding transcriptional regulation from biochemical principles. As aberrant transcription factor activity is linked to many human diseases, including cancer, the findings are also relevant to understanding mechanisms of human disease.
See also Dodonova et al., Nature 580:669–672, 2020 

CRISPR-Cas9 genome editing induces a p53-mediated DNA damage response
Haapaniemi et al., Nature Medicine 24:927-930, 2018 
In the last few years, CRISPR-Cas9 has become a highly popular genome editing tool that is transforming the field of biology, because it allows scientists to edit genomes with unprecedented precision, efficiency and flexibility compared with other commonly used methods such as RNAi. We found that DNA double-strand breaks created by CRISPR-Cas9 activate p53, causing cell growth arrest. Inhibiting p53 prevented this response and increased the efficiency of precision genome editing. Controlling the DNA damage response to allow efficient gene editing will be important in developing the next generation of safe and efficient genome editing technologies.

A protein activity assay to measure global transcription factor activity reveals determinants of chromatin accessibility
Wei et al., Nature Biotechnology 36:521-529, 2018 
In this work, we developed a massively parallel protein activity assay (Active Transcription Factor Identification, ATI) that can measure the DNA-binding activity of all TFs in a particular cell type. We found that only a small number of TFs showed strong DNA binding in each of the tissues studied. These results suggest that, despite the presence of hundreds of TFs in most tissues, just a handful of them determine the overall gene expression landscape of a cell. Speaking about the research, Professor Jussi Taipale explained: “The finding that some transcription factors are much more active than others indicates that the regulatory system is far simpler than what we had imagined. We previously thought that all transcription factors can work together in millions of different ways to regulate genes. Instead, it now looks like weaker transcription factors need to work with the strong ones to get anything done. This makes the regulatory system very hierarchical, and simplifies the task of evolution. In a hierarchical system it is easier to evolve sets of co-expressed genes that work together to accomplish a particular task.”

Two distinct DNA sequences recognized by transcription factors represent enthalpy and entropy optima
Morgunova et al., eLife 7:e32963, 2018
Most TFs prefer to bind a single maximal-affinity sequence and sequences closely related to it. However, several TFs can bind with high affinity to multiple different sequences, and to sequences closely related to each of these local optima. In this work, we used X-ray crystallography, computational modelling and thermodynamic measurements to study four human TFs that can each bind to two different gene regulatory sequences. We found that the TFs bound each sequence with similar strength, but through different mechanisms. Binding to one DNA sequence used an enthalpy-based mechanism in which rigid water bridges linked the TF to the DNA. In contrast, binding to the second DNA sequence employed an entropy-based mechanism in which the movement of water molecules increased disorder, giving strength to the interaction.
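Two sequences can be bound with similar strength through different mechanisms because affinity depends only on the binding free energy, which combines an enthalpy term and an entropy term. The relations below are the standard thermodynamic identities, written with generic subscripts 1 and 2 for the two DNA sequences; they illustrate the principle and are not values reported in the study.

```latex
\Delta G = \Delta H - T\Delta S = RT \ln K_d

\Delta G_1 = \Delta H_1 - T\Delta S_1 \;\approx\; \Delta H_2 - T\Delta S_2 = \Delta G_2
```

For the enthalpy-optimal sequence, a large negative $\Delta H_1$ supplies most of the free energy; for the entropy-optimal sequence, a positive $\Delta S_2$ (a negative $-T\Delta S_2$ term) does. Because $K_d$ depends only on $\Delta G$, the two sites can have similar affinities.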

Impact of cytosine methylation on DNA binding specificities of human transcription factors
Yin et al., Science 356:eaaj2239, 2017 
The DNA letter C exists in two forms, cytosine and methylcytosine, which can be thought of as the same letter with and without an accent (C and Ç). Methylation of DNA bases is a type of epigenetic modification, a biochemical change in the genome that doesn’t alter the DNA sequence. The two variants of C have no effect on the kind of proteins that can be made, but they can have a major influence on when and where the proteins are produced. Previous research has shown that genomic regions where C is methylated are commonly inactive and that many TFs are unable to bind to sequences that contain the methylated Ç. By analysing hundreds of different human TFs, the Taipale lab found that certain transcription factors actually prefer the methylated Ç. These include TFs that are important in embryonic development and in the development of prostate and colorectal cancers. “The results suggest that such ‘master’ regulatory factors could activate regions of the genome that are normally inactive, leading to the formation of organs during development, or the initiation of pathological changes in cells that lead to diseases such as cancer”, says Professor Jussi Taipale.

DNA-dependent formation of transcription factor pairs alters their binding specificity
Jolma et al., Nature 527:384–388, 2015
In this work, we show that the ‘grammar’ of the human genetic code is more complex than that of even the most intricately constructed spoken languages in the world. The findings, published in the journal Nature, explain why the human genome is so difficult to decipher, and contribute to the further understanding of how genetic differences affect the risk of developing diseases on an individual level.

CTCF/cohesin-binding sites are frequently mutated in cancer
Katainen et al., Nature Genetics 47:818-821, 2015
Mutations that lead to cancer occur not only in the 2% of the DNA that encodes proteins, but also in the non-coding regions. These non-coding regions determine when and where the genes are expressed. In the largest cancer genome study performed in the Nordic countries, researchers led by Professors Jussi Taipale and Lauri Aaltonen studied more than two hundred whole genomes from colorectal cancer samples and detected a distinct accumulation of mutations at sites where the proteins CTCF and cohesin bind to the DNA.

Conservation of transcription factor binding specificities across 600 million years of bilateria evolution
Nitta et al., eLife 4:e04837, 2015
In this work, we found that the binding specificities of TFs, the language used in the switches that turn genes on and off, have remained the same across hundreds of millions of years of evolution. The findings, published in the scientific journal eLife, indicate that the differences between animals reside in the content and length of the instructions that are written into DNA using this conserved language.

DNA-binding specificities of human transcription factors
Jolma et al., Cell 152:327-339, 2013
In this work, we describe binding specificity models for the majority of human TFs, approximately doubling the coverage compared with existing systematic studies. Our results also reveal additional specificity determinants for a large number of factors whose specificity was previously only partially known, including a commonly observed A- or T-rich stretch flanking core binding motifs. Global analysis of the data reveals that homodimer orientation and spacing preferences, as well as base-stacking interactions, play a larger role in TF-DNA binding than previously appreciated. We further describe a binding model that incorporates these features; such a model is required to understand how TFs bind to DNA.
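Binding specificity models of this kind are commonly summarised as position weight matrices (PWMs). As a hedged illustration only, the sketch below converts a hypothetical count matrix into log-odds scores and finds the best-matching window in a sequence; the counts, the toy "GATA" motif and the function names `log_odds_pwm` and `best_site` are invented for the example. A simple PWM treats positions as independent, so it does not capture the dimer orientation, spacing and base-stacking effects described above.

```python
import math

BASES = "ACGT"  # column order used in the count matrix below


def log_odds_pwm(counts, pseudocount=1.0, background=0.25):
    """Turn per-position base counts into a log2-odds position weight matrix.

    Assumes a uniform background of 0.25 per base; the pseudocount keeps
    unseen bases from producing log(0).
    """
    pwm = []
    for column in counts:
        total = sum(column) + 4 * pseudocount
        pwm.append([math.log2((c + pseudocount) / total / background) for c in column])
    return pwm


def best_site(sequence, pwm):
    """Return (score, offset) of the best-scoring window, given strand only."""
    width = len(pwm)
    best = (float("-inf"), -1)
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        # Sum the matrix entries for each base in the window (A/C/G/T only).
        score = sum(pwm[i][BASES.index(base)] for i, base in enumerate(window))
        best = max(best, (score, start))
    return best


# Hypothetical counts for a toy 4-bp motif "GATA"; columns are A, C, G, T.
counts = [
    [0, 0, 10, 0],
    [10, 0, 0, 0],
    [0, 0, 0, 10],
    [10, 0, 0, 0],
]
pwm = log_odds_pwm(counts)
score, offset = best_site("CCGATACC", pwm)
print(offset)  # 2
```

Scanning the reverse complement as well, and modelling dependencies between adjacent positions, would be natural extensions of this sketch.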

Transcription factor binding in human cells occurs in dense clusters formed around cohesin anchor sites
Yan et al., Cell 154:801-813, 2013
Here, we developed a high-throughput ChIP-seq method and mapped the binding patterns of hundreds of TFs in a human cell line. Global analysis of the binding patterns indicates that TF binding is clustered to a much larger degree than previously anticipated, with TF clusters occupying less than 1% of the genome. The TF clusters were strongly enriched in binding motifs, evolutionarily conserved, and predictive of gene expression. Interestingly, virtually all TF clusters contained cohesin, a ring-shaped protein complex known to be important in transcription and in sister chromatid cohesion in mitosis. Follow-up experiments indicate that cohesin has a causative role in maintaining the pattern of TF binding across cell division, by encircling DNA throughout replication and during chromatin condensation, when TFs are displaced from chromatin. Thus, we propose that cohesin acts as a cellular memory that helps replicate the accessibility information imprinted by TFs displacing nucleosomes on DNA.

Counting absolute numbers of molecules using unique molecular identifiers
Kivioja et al., Nature Methods 9:72-74, 2012
In this manuscript, we describe a universal method for counting the absolute number of molecules in a sample. The method is based on labeling the molecules to be counted in such a way that every molecule in the sample becomes unique. The method completely eliminates PCR bias, a common problem in accurately determining the number of RNA or DNA molecules in a cell. It can be applied to improve the accuracy of almost any next-generation sequencing method, including ChIP-sequencing, genome assembly, diagnostic applications, and manufacturing process control and monitoring.
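The labeling idea can be illustrated with a small counting routine. The sketch below assumes reads have already been assigned to a gene and that each read carries the random molecular label (UMI) attached before amplification; the gene names, UMI sequences and the function `count_molecules` are hypothetical. It deliberately ignores sequencing errors within UMIs and the chance that two molecules receive the same label, both of which practical pipelines have to handle.

```python
from collections import defaultdict


def count_molecules(reads):
    """Count distinct molecules per gene from (gene, umi) pairs.

    Each original molecule was tagged with a random UMI before PCR, so
    amplified copies of one molecule share the same (gene, UMI) pair and
    collapse to a single entry in the set.
    """
    umis_per_gene = defaultdict(set)
    for gene, umi in reads:
        umis_per_gene[gene].add(umi)
    return {gene: len(umis) for gene, umis in umis_per_gene.items()}


# Hypothetical reads: (gene, UMI) after mapping.
reads = [
    ("GENE_A", "ACGTAC"),
    ("GENE_A", "ACGTAC"),  # PCR duplicate of the molecule above
    ("GENE_A", "TTGCAA"),
    ("GENE_B", "GGATCC"),
    ("GENE_B", "GGATCC"),  # duplicates of a single GENE_B molecule
    ("GENE_B", "GGATCC"),
]
print(count_molecules(reads))  # {'GENE_A': 2, 'GENE_B': 1}
```

Counting raw reads would give three reads for each gene, whereas collapsing duplicates by label recovers two molecules for GENE_A and one for GENE_B.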

Mice lacking a Myc enhancer that includes human SNP rs6983267 are resistant to intestinal tumors
Sur et al., Science 338:1360-1363, 2012
In this work, we generated mice deficient in Myc-335, a putative MYC regulatory element that contains rs6983267, a SNP accounting for more human cancer-related morbidity than any other genetic variant or mutation. In Myc-335 null mice, Myc transcripts were expressed in the intestinal crypts in a pattern similar to that in wild-type mice, but at modestly reduced levels. The mutant mice displayed no overt phenotype but were markedly resistant to intestinal tumorigenesis induced by the ApcMin mutation. These results show that although a disease-associated polymorphism typically has a relatively modest effect size, the element it affects can be critically important for the underlying pathological process. The finding also indicates that normal growth control and pathological growth in cancer can rely on different mechanisms.


Lambert, S.A., Jolma, A., Campitelli, L.F., Das, P.K., Yin, Y., Albu, M., Chen, X., Taipale, J., Hughes, T.R., Weirauch, M.T. The Human Transcription Factors. Cell 172:650-665, 2018.

Taipale, J. The chromatin of cancer. Science 362:401-402, 2018.

Taipale, J. Informational limits of biological organisms. The EMBO Journal 37:e96114, 2018.

Sur, I., Taipale, J. The role of enhancers in cancer. Nature Reviews Cancer 16:483–493, 2016.


Taipale Lab Cambridge

Taipale Lab KI

Taipale Lab Helsinki

CoE in Tumor Genetics Research

