With the boom in single cell sequencing over the last few years, a lot of data has been generated – and excitingly, more and more of this data is publicly available. That’s why the generation and use of atlases and public data is the second trend that we’ve identified in the field of single cell sequencing.

Advantages of using public datasets

The increasing availability of atlases and public datasets now gives scientists access to a wide variety of data derived from a range of different tissues and conditions. Importantly, using public data can drastically reduce both the cost and the amount of preliminary data collection needed for hypothesis generation. In some cases, if the right atlases or public datasets are available, you might be able to skip preliminary experiments entirely, and instead directly form testable hypotheses based on data collected by others.

Public datasets can also be used to augment your own proprietary datasets. High-resolution public atlases often contain robust information about many different cell types: they capture many genes per cell, across many cells. Many databases now routinely capture hundreds of thousands of cells, and sometimes millions. This makes them a very rich resource to draw from.

By integrating proprietary datasets with data from comprehensive and well-labeled reference atlases, you can efficiently map and annotate your high-throughput - but often lower-depth - sequencing data. This approach leverages the extensive and detailed information, including valuable (cell type) labels, from the reference datasets and allows for deep sequencing insights at a reduced cost. Similarly, atlases can also be used as healthy controls to compare to proprietary datasets – again reducing the amount of data you need to collect yourself and on your own dime.
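
To make the idea of mapping your own cells onto a labelled reference a little more concrete, here is a minimal sketch in plain Python. It is not how production tools work: it assumes a simple nearest-centroid approach with cosine similarity as a stand-in for real reference-mapping methods, and the function names, marker genes, and expression values are all hypothetical toy choices made for illustration.

```python
from math import sqrt

def centroid(profiles):
    """Average a list of equal-length expression vectors."""
    n = len(profiles)
    return [sum(values) / n for values in zip(*profiles)]

def cosine(a, b):
    """Cosine similarity between two expression vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def transfer_labels(reference, query):
    """Assign each query cell the label of the most similar reference centroid.

    reference: {cell_type_label: [expression vectors]}
    query:     {cell_id: expression vector}
    """
    centroids = {label: centroid(cells) for label, cells in reference.items()}
    return {
        cell_id: max(centroids, key=lambda label: cosine(profile, centroids[label]))
        for cell_id, profile in query.items()
    }

# Toy data (assumed values); columns are the marker genes CD3E, MS4A1, LYZ.
reference = {
    "T cell": [[5.0, 0.0, 0.2], [4.0, 0.1, 0.0]],
    "B cell": [[0.0, 6.0, 0.1], [0.2, 5.0, 0.0]],
    "Monocyte": [[0.1, 0.0, 7.0], [0.0, 0.3, 6.0]],
}
# Lower-depth query cells: fewer counts, but the same overall pattern.
query = {
    "cell_1": [2.0, 0.0, 0.0],
    "cell_2": [0.0, 1.0, 0.1],
    "cell_3": [0.0, 0.0, 3.0],
}

labels = transfer_labels(reference, query)
print(labels)
```

Cosine similarity is used here because it compares the shape of an expression profile rather than its magnitude, which is one simple way to cope with query cells that were sequenced at lower depth than the reference. Real datasets would also need normalisation and batch correction before any comparison like this.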

Benefits for a variety of projects

A variety of projects can benefit from leveraging public datasets. We’ve highlighted a few of them below.

Drug Repurposing


Imagine that you have a particular compound that has been tested and approved for a given indication, and now you want to test whether it can be applied in additional circumstances. Then you may be in luck: public atlases provide you with access to a wide variety of datasets that are derived from many different tissues and conditions – meaning that you can directly screen for the targets of your compound of interest across these datasets.

Once you’ve identified several datasets or novel indications in which the target of your compound is repeatedly present, you can then conduct further analyses to extract additional important information. For instance, a meta-analysis can be performed to assess across multiple datasets whether the target is differentially expressed in the disease condition versus healthy controls. This sets you well on your way for drug repurposing, without having to enter the wet lab!
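
As a rough illustration of the kind of meta-analysis described above, the sketch below pools per-dataset effect sizes with inverse-variance weighting. It assumes a fixed-effect model, meaning that all datasets are taken to share one true effect; a random-effects model is often more appropriate when studies differ a lot. The function name and the log2 fold changes and standard errors are hypothetical values, not results from any real study.

```python
from math import erfc, sqrt

def fixed_effect_meta(effects):
    """Pool (log2_fold_change, standard_error) pairs, one pair per dataset.

    Returns the pooled effect, its standard error, and a two-sided
    p-value from a normal approximation.
    """
    # More precise datasets (smaller standard error) get more weight.
    weights = [1.0 / se ** 2 for _, se in effects]
    pooled = sum(w * lfc for w, (lfc, _) in zip(weights, effects)) / sum(weights)
    pooled_se = sqrt(1.0 / sum(weights))
    z = pooled / pooled_se
    p_value = erfc(abs(z) / sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return pooled, pooled_se, p_value

# Hypothetical disease-vs-healthy results for one target in three datasets.
effects = [(1.2, 0.4), (0.8, 0.3), (1.0, 0.5)]

pooled, pooled_se, p_value = fixed_effect_meta(effects)
print(f"pooled log2FC = {pooled:.2f} +/- {pooled_se:.2f}, p = {p_value:.2g}")
```

The pooled standard error is smaller than that of any single dataset, which is the practical payoff of combining independent studies: a consistent signal across datasets becomes easier to distinguish from noise.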



Identification of targets and biomarkers

There is already a range of public databases available that can support target, drug, and biomarker discovery and validation, as reviewed here. And as we discussed above, using publicly available data for preliminary research or to augment the proprietary data that you have collected can reduce costs and shorten timelines.

In terms of target and biomarker identification, public atlases can contribute to an early, detailed understanding of disease subtypes and cellular mechanisms, and can help align therapeutic targets with appropriate disease subtypes, enhancing model accuracy and improving your chance of success. In addition, combining atlas data with CRISPR screening technologies can lead to more insightful target prioritisation and biomarker discovery early on in your R&D process. Public data can also be used as a reference to improve your understanding of how drugs of interest may affect different cell types, or to guide stratification of patient groups. In summary, using high-quality data from the beginning of your investigation can give your research a great boost in the right direction.



Understanding mechanisms of action and cell type-specific responses

In general, a key advantage of single cell sequencing over bulk sequencing is that you gain more granularity in your data. This means that, for instance, you can assess not only whether a target of interest is expressed in a tissue, but also in which cell types it is (or is not) expressed - which can have toxicological consequences. Using public data to explore questions like this can save a lot of money down the line, by reducing the likelihood that a lead has to be discarded due to toxicological issues.
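
The cell type question above can be sketched in a few lines of Python. This is only a toy example: it assumes you already have one expression value per cell for your target gene plus a cell type label, and the function name, cell types, and values are hypothetical.

```python
from collections import defaultdict

def fraction_expressing(cells, threshold=0.0):
    """Fraction of cells per cell type with target expression above threshold.

    cells: iterable of (cell_type, expression) pairs for a single gene.
    """
    totals = defaultdict(int)
    positive = defaultdict(int)
    for cell_type, expression in cells:
        totals[cell_type] += 1
        if expression > threshold:
            positive[cell_type] += 1
    return {cell_type: positive[cell_type] / totals[cell_type] for cell_type in totals}

# Hypothetical normalised expression of one target gene in eight cells.
cells = [
    ("hepatocyte", 0.0), ("hepatocyte", 0.0), ("hepatocyte", 1.2), ("hepatocyte", 0.0),
    ("tumour cell", 2.5), ("tumour cell", 3.1),
    ("cardiomyocyte", 1.8), ("cardiomyocyte", 0.0),
]

fractions = fraction_expressing(cells)
for cell_type, fraction in sorted(fractions.items()):
    print(f"{cell_type}: {fraction:.0%} of cells express the target")
```

In this made-up example, a bulk measurement would only tell you that the target is present in the tissue. The per-cell-type view shows that half of the cardiomyocytes also express it, which is exactly the kind of signal you would want to follow up on before committing to a lead.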

In addition, there are more and more public atlases for specific model systems. This provides the opportunity for direct comparison of cellular responses in selected models versus human diseases - ultimately allowing for improved selection and relevance of preclinical models.

The challenge of data quality

There are clearly many advantages of using public datasets – but one major challenge is ensuring that the data you want to use is of high quality. Public datasets still suffer from inconsistencies, such as insufficient (meta)data annotation. We always recommend scoping your datasets for inclusion carefully, performing rigorous quality control steps, and comparing studies that were executed completely independently of each other to assess whether the results are reproducible and robust.
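
To give a flavour of what a basic quality control step can look like, here is a minimal sketch that filters cells on two widely used per-cell metrics and reports how many cells each study retains. The thresholds (at least 200 detected genes, at most 10% mitochondrial reads) are assumptions for illustration only, since sensible cut-offs depend on the tissue and protocol; the field names, function names, and cell values are likewise hypothetical.

```python
def passes_qc(cell, min_genes=200, max_mito_fraction=0.10):
    """Return True if a cell meets both (assumed) quality thresholds."""
    return (cell["n_genes"] >= min_genes
            and cell["mito_fraction"] <= max_mito_fraction)

def retained_per_study(cells, **thresholds):
    """Fraction of cells that pass QC in each study."""
    counts = {}
    for cell in cells:
        total, kept = counts.get(cell["study"], (0, 0))
        counts[cell["study"]] = (total + 1, kept + passes_qc(cell, **thresholds))
    return {study: kept / total for study, (total, kept) in counts.items()}

# Hypothetical per-cell metrics from two independent public studies.
cells = [
    {"study": "study_A", "n_genes": 1500, "mito_fraction": 0.03},
    {"study": "study_A", "n_genes": 120, "mito_fraction": 0.02},   # too few genes
    {"study": "study_A", "n_genes": 2200, "mito_fraction": 0.25},  # likely stressed or dying
    {"study": "study_A", "n_genes": 900, "mito_fraction": 0.08},
    {"study": "study_B", "n_genes": 1800, "mito_fraction": 0.04},
    {"study": "study_B", "n_genes": 1600, "mito_fraction": 0.05},
]

retained = retained_per_study(cells)
print(retained)
```

A study that loses a large share of its cells at this stage is not necessarily unusable, but it is a prompt to look more closely at how the data was generated before you include it.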

Making better decisions, earlier

Drug discovery is a long and expensive road - and public data resources can bring a wealth of biological information to your fingertips, helping you to make the right decisions early on. Ultimately, making more data-driven decisions earlier should increase your chance of success in drug, target, and biomarker discovery, by, for example…

  • identifying more accurate drug targets and biomarkers,
  • enhancing the selection and relevance of preclinical models, 
  • and identifying potential toxicological effects due to off-target actions of drugs.

Although public atlases are undoubtedly a very valuable resource, it’s important to make sure that you’re only considering high-quality and biologically relevant data to drive your decision-making. And, although public data resources are openly accessible to anyone, it is still important to have a bioinformatics expert on board to assess data quality and effectively integrate data derived from multiple datasets. This will allow you to truly extract all the insights available from these vast stores of information.

Are you looking for a bioinformatics partner to ensure that you get started on the right track?

Then reach out to BioLizard today – we’re ready to help you make the best data-driven decisions possible!


P.S. Did you enjoy this article? Then be sure to read about the first trend in single cell sequencing that we identified, here!
