L1000 assay and the Connectivity Map dataset

From Clinfowiki

The L1000 assay is a method for measuring genome-wide mRNA expression. It was developed at the Broad Institute as part of the Big Data to Knowledge (BD2K) - Library of Integrated Network-Based Cellular Signatures (LINCS) Consortium.

The motivating goal for developing this assay was the creation of a large dataset of gene transcriptional responses to chemical, genetic and disease perturbations in cell lines. An underlying assumption of this dataset is that the gene expression signatures induced by perturbations may yield mechanistic and circuit-level biological insights.

To achieve this goal, cell lines had to be exposed to perturbations such as drugs, gene knockdown and gene over-expression, and the gene expression resulting from each perturbation then had to be measured.

However, common gene expression profiling methods are expensive and do not scale to high-throughput use. A cost-effective assay was therefore needed to expand the perturbational dataset to more combinations of cell lines, perturbations, doses and time points. According to Subramanian et al. (1), the reagent cost of the L1000 assay is approximately two dollars per profile. This reduction in the cost of measuring mRNA enabled the creation of a dataset with more than 1 million profiles of functional perturbations in cultured human cells, called the Connectivity Map (CMap).

The Assay

The L1000 assay consists of the direct measurement of 978 genes (the landmark genes) and the computational inference of the expression levels of a further 11,350 genes.

The steps for measuring the transcript levels of the landmark genes in cells can be summarized as follows:

- Cells are lysed and their mRNA is reverse transcribed

- Transcription of the 978 landmark genes is measured by ligation-mediated amplification (LMA), followed by capture of the amplification products on fluorescently addressed microspheres (1)

- The landmark transcript measurements are captured and processed

After the landmark genes have been detected, the data are processed in the following steps:

- Inference of the transcription levels of the 11,350 non-landmark genes

- Normalization

- Calculation of differential gene expression

- Calculation of a replicate-consensus signature
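The last two processing steps above can be illustrated with a small sketch. It assumes that differential expression is expressed as a robust z-score (median and median absolute deviation across the samples of a plate) and that replicates are combined by a correlation-weighted average. These choices, the function names and the toy numbers are illustrative assumptions, not a description of the exact L1000 pipeline.

```python
import numpy as np

def robust_zscores(expr):
    """Robust z-score of each gene (column) across samples (rows).

    Assumption: median/MAD scaling stands in for the differential
    expression step; the real pipeline may differ in detail.
    """
    expr = np.asarray(expr, dtype=float)
    median = np.median(expr, axis=0)
    mad = np.median(np.abs(expr - median), axis=0)
    # 1.4826 makes the MAD comparable to a standard deviation for normal data;
    # the small constant avoids division by zero.
    return (expr - median) / (1.4826 * mad + 1e-9)

def replicate_weights(replicates):
    """Weight each replicate by its mean rank correlation with the others."""
    replicates = np.asarray(replicates, dtype=float)
    n = replicates.shape[0]
    if n == 1:
        return np.ones(1)
    # Rank-transform each replicate (ties are not handled specially here).
    ranks = np.argsort(np.argsort(replicates, axis=1), axis=1).astype(float)
    corr = np.corrcoef(ranks)
    np.fill_diagonal(corr, 0.0)
    weights = np.clip(corr.sum(axis=1) / (n - 1), 0.01, None)
    return weights / weights.sum()

def consensus_signature(replicates):
    """Collapse replicate z-score vectors into one weighted-average signature."""
    replicates = np.asarray(replicates, dtype=float)
    return replicate_weights(replicates) @ replicates

# Toy data: one gene measured in five samples, the last one an outlier.
plate = np.array([[1.0], [2.0], [3.0], [4.0], [100.0]])
z = robust_zscores(plate)

# Toy data: three concordant replicates and one discordant replicate.
reps = np.array([
    [2.0, 1.0, 0.0, -1.0, -2.0],
    [1.8, 1.1, 0.1, -0.9, -2.1],
    [2.2, 0.9, 0.2, -1.2, -1.9],
    [-2.0, -1.0, 0.0, 1.0, 2.0],
])
weights = replicate_weights(reps)
consensus = consensus_signature(reps)
```

In this sketch the discordant replicate receives a much smaller weight than the concordant ones, so the consensus stays close to the pattern shared by the majority of replicates.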

The assay was validated by comparing L1000 with RNA-Seq, which showed a strong degree of similarity between the two platforms. Notably, of the 11,350 genes whose expression is computationally inferred, 9,196 (81%) were shown to be inferred accurately (1).
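The idea of inferring non-landmark genes from landmark genes, and of counting how many are inferred well, can be sketched as follows. The sketch assumes a linear model fitted by ordinary least squares on synthetic data; the toy sizes, the correlation threshold of 0.9 and all names are hypothetical and are not taken from the L1000 pipeline.

```python
import numpy as np

rng = np.random.default_rng(0)
n_train, n_landmark, n_target = 200, 20, 5  # toy sizes; L1000 measures 978 landmarks

# Synthetic training data in which the target genes really are
# (noisy) linear combinations of the landmark genes.
true_weights = rng.normal(size=(n_landmark, n_target))
landmarks_train = rng.normal(size=(n_train, n_landmark))
noise = 0.1 * rng.normal(size=(n_train, n_target))
targets_train = landmarks_train @ true_weights + noise

def add_intercept(landmarks):
    """Append a column of ones so the model can learn an intercept."""
    return np.hstack([landmarks, np.ones((landmarks.shape[0], 1))])

def fit_inference_model(landmarks, targets):
    """Least-squares coefficients mapping landmark to target expression."""
    coef, *_ = np.linalg.lstsq(add_intercept(landmarks), targets, rcond=None)
    return coef

def infer_targets(coef, landmarks):
    """Predict target gene expression from landmark measurements."""
    return add_intercept(landmarks) @ coef

coef = fit_inference_model(landmarks_train, targets_train)

# Held-out samples: compare inferred values with the known truth.
landmarks_new = rng.normal(size=(50, n_landmark))
targets_true = landmarks_new @ true_weights
targets_pred = infer_targets(coef, landmarks_new)

per_gene_r = np.array([
    np.corrcoef(targets_pred[:, g], targets_true[:, g])[0, 1]
    for g in range(n_target)
])
# Hypothetical cutoff for calling a gene "well inferred".
well_inferred_fraction = float(np.mean(per_gene_r > 0.9))

# The fraction reported in the text: 9,196 of 11,350 inferred genes.
reported_fraction = round(9196 / 11350, 2)
```

On real data the fraction of well-inferred genes depends on how strongly each gene correlates with the landmark set, which is why only part of the inferred genes pass validation.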

Data available – the Connectivity Map (CMap)

A pilot Connectivity Map (CMap) dataset was developed and described by Lamb et al. (2). It was built by treating cells from 3 different cell lines with 164 drugs and profiling mRNA expression with Affymetrix microarrays. Expanding the dataset with this measurement method proved cost-prohibitive, which led to the development of the L1000 assay.

After validating the L1000 assay, Subramanian et al. (1) generated 1,319,138 expression profiles, which yielded a total of 473,647 gene signatures after replicates were consolidated. Each gene signature is the response of cells from one cell line to one perturbagen at a given dose and time after treatment.
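The relationship between profiles and signatures described above can be pictured as grouping replicate profiles under a shared key. The following is a minimal sketch; the class, field names and identifiers are hypothetical and do not reflect the actual CMap metadata schema.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass(frozen=True)
class SignatureKey:
    """One signature = one cell line, one perturbagen, one dose, one time point."""
    cell_line: str
    perturbagen: str
    dose: str
    time_h: int

def group_replicates(profiles):
    """Map each signature key to the list of replicate profile identifiers."""
    groups = defaultdict(list)
    for key, profile_id in profiles:
        groups[key].append(profile_id)
    return dict(groups)

# Hypothetical profiles: three replicates of one condition, one of another.
profiles = [
    (SignatureKey("cell_line_A", "compound_X", "10 uM", 24), "profile_1"),
    (SignatureKey("cell_line_A", "compound_X", "10 uM", 24), "profile_2"),
    (SignatureKey("cell_line_A", "compound_X", "10 uM", 24), "profile_3"),
    (SignatureKey("cell_line_A", "compound_X", "10 uM", 6), "profile_4"),
]
signatures = group_replicates(profiles)

# Average number of profiles per signature implied by the reported totals.
profiles_per_signature = round(1_319_138 / 473_647, 1)
```

The reported totals imply an average of just under three profiles per consolidated signature.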

In total, the Connectivity Map has data for the following number of perturbations and cell lines:

Perturbations:

- 19,811 small-molecule compounds

- Gene knock-down and/or over-expression of 5,075 genes

- 314 biologics

Cell lines: 71 cell lines, from 19 primary sites

All data are publicly available from the Gene Expression Omnibus (GEO accession GSE92742). Users can also access the data and query them with a gene signature of interest in the CMap Linked User Environment (CLUE) platform.
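Querying a signature database with a gene signature of interest can be sketched as follows. The score used here (mean z-score of the query's up-regulated genes minus mean z-score of its down-regulated genes) is a deliberately simple stand-in and is not the connectivity score used by CMap or CLUE; gene names, perturbagen names and values are hypothetical.

```python
import numpy as np

def similarity_score(signature, gene_index, up_genes, down_genes):
    """Simple similarity: mean z of 'up' genes minus mean z of 'down' genes."""
    up = np.mean([signature[gene_index[g]] for g in up_genes])
    down = np.mean([signature[gene_index[g]] for g in down_genes])
    return float(up - down)

def rank_perturbagens(signatures, genes, up_genes, down_genes):
    """Rank perturbagens from most similar to most opposite to the query."""
    gene_index = {g: i for i, g in enumerate(genes)}
    scores = {
        name: similarity_score(np.asarray(sig, dtype=float), gene_index,
                               up_genes, down_genes)
        for name, sig in signatures.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Hypothetical database of three signatures over five genes (z-scores).
genes = ["G1", "G2", "G3", "G4", "G5"]
signatures = {
    "pert_mimic": [2.0, 1.5, 0.1, -1.8, -2.2],
    "pert_neutral": [0.1, -0.2, 0.0, 0.2, -0.1],
    "pert_reverse": [-2.1, -1.4, 0.0, 1.9, 2.0],
}

# Query: genes expected to go up and down in the profile of interest.
ranked = rank_perturbagens(signatures, genes,
                           up_genes=["G1", "G2"], down_genes=["G4", "G5"])
ranked_names = [name for name, _ in ranked]
```

Perturbagens at the top of the ranking mimic the query profile, while those at the bottom tend to reverse it; both ends can be of interest depending on the research question.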

Uses in Clinical Research

One study that used the Connectivity Map dataset is Treatment of Obesity with Celastrol, by Liu et al. (3). The authors queried the Connectivity Map for compounds whose gene expression signatures resembled an expression profile assumed to be important in obesity treatment.

Celastrol was one compound whose signature was highly similar to the authors' expression profile of interest, so it was tested in mice. These in vivo experiments showed that celastrol has the potential to be used as an anti-obesity therapeutic agent (3).

This example illustrates one of the main goals of the CMap dataset: helping researchers screen for drugs or other perturbation strategies that may induce a desired gene expression profile.

Other uses of the CMap data in clinical research include (1):

- Annotate function of uncharacterized small molecules

- Discover pathway membership of gene products

- Connect disease states to pathways and small molecules

- Screen for drugs that cause a given gene expression profile

- Repurpose existing drugs

It is important to note that the main strength of the current L1000 technology, a cost-effective genome-wide measure of mRNA, comes at the expense of not reproducing the full complexity of whole organisms, which may play a crucial role in most fields of clinical research. The data in the CMap dataset should therefore not be used directly for clinical applications (1). Because of several limitations in how the data are gathered, any observation based on L1000 data must be followed by in vitro and in vivo validation before it is translated to clinical use.


1. Subramanian A, Narayan R, Corsello SM, et al. (2017) A next generation Connectivity Map: L1000 platform and the first 1,000,000 profiles. Cell 171(6): 1437-1452.e17. doi:10.1016/j.cell.2017.10.049

2. Lamb J, Crawford ED, Peck D, Modell JW, Blat IC, Wrobel MJ, et al. (2006) The Connectivity Map: using gene-expression signatures to connect small molecules, genes, and disease. Science 313: 1929-1935.

3. Liu J, Lee J, Hernandez MAS, Mazitschek R, Ozcan U. (2015) Treatment of obesity with celastrol. Cell 161(5): 999-1011.

Submitted by Carolina Heimann