How to get your microbiome research pixel perfect
Imagine that you are a scientist in the early 2000s, and you’re interested in studying microbiomes. You love going out into the field and collecting samples, but after each expedition, you know that there is a long road ahead of you. After collecting a sample, you will need to culture it in a Petri dish and allow the different microbes in your sample to grow, before you can even start to use laborious methods like Gram staining and microscopy to try to pick apart which microbial species are present in that sample. And even after all that, you’d still miss some of the less abundant and unculturable microbial species.
Back in the day, microbiome analysis was a very slow research method. But if we fast forward to today, a lot has changed. Now, we can use molecular methods such as DNA sequencing to identify the microbes present in our samples.
These new techniques not only give you much more information, for instance by allowing you to detect species that cannot be cultured in the lab, but can also deliver data far more quickly than culture-based methods. Of course, this also means that the typical workflow looks a bit different than it did a few decades ago, which naturally comes with new challenges. In the first blog in this series, we looked at some of the applications of microbiome research. Now, let’s have a look at how microbiome research is actually performed!
Step one: Collecting samples
The quality of your sample can make or break your microbiome research. Because there are microbes all around us all the time, you have to be careful not to allow ‘non-sample’ microbes to contaminate your sample. Sterile working conditions are a necessity! On top of that, it is important to remember that each microbial species has its own preferred environment. Just like you might find it difficult to be exposed to temperatures of 50 degrees Celsius for a long period of time, some microbes might find living at room temperature a challenge. Or, take a gut microbiome sample, for example. The microbes that grow in our intestines normally exist in an environment with very low oxygen. If they come into contact with oxygen, it will influence their gene expression and even their survival, which will impact your results. So, be sure to know your sample, and treat it with care: both by using a sterile workflow, and by storing it in conditions in which the microbes in your sample will flourish, or at least survive.
Another best practice is to take multiple samples, including at different points in time. Microbiomes can be variable and dynamic over time and in response to different environmental cues. This means that the time point at which you take a sample can substantially impact which microbes you’ll pick up and what state they will be in. To make sure that you are drawing sound conclusions from your data, it’s a good idea to take multiple samples (preferably at different times, if time is not already part of your experimental set-up), and to take note of other environmental factors that may influence your results, in order to reduce bias.
Step two: Sequencing sample DNA
Sequencing microbial samples allows you to monitor which microorganisms are present within your sample. State-of-the-art sequencing technologies like Illumina paired-end (PE) and single-end (SE) sequencing or PacBio single-end sequencing will provide you with a lot of data; sequencing output includes millions of DNA sequences. This massive amount of data will then need to be sifted through carefully, and each sequence matched to a specific microbe. To find out exactly which bacteria or viruses are in your sample, bioinformatics approaches can help you match DNA sequences to a reference database. In our experience, there are two major pitfalls that can occur at this point in your research.
One pitfall occurs when a DNA sequence doesn’t match any specific species in your reference database. Because there are so many bacteria, viruses, and other microbes that we haven’t discovered yet, it’s very possible (and sometimes also exciting) that this will happen. Depending on where your sample comes from, you might be left with a large proportion of unknown sequences. This so-called “dark matter” is often excluded from further analyses. However, if you’re interested in identifying new species and if you use the right tools, systematic characterisation of these unknown sequences can allow you to discover never-before-seen microbes, turning this pitfall into a plus!
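To make the idea of "matched" versus "unknown" sequences a bit more concrete, here is a deliberately tiny sketch in Python. It is not how production classifiers work; real tools rely on alignment or much more sophisticated k-mer indexing. The reference sequences, taxon names, k-mer length, and threshold below are all made up for illustration.

```python
def kmers(sequence: str, k: int) -> set[str]:
    """Return the set of all overlapping substrings of length k."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def classify(read: str, reference: dict[str, str], k: int = 4,
             min_shared: float = 0.5) -> str:
    """Assign a read to the reference taxon sharing the most k-mers with it.

    Reads whose best match shares fewer than `min_shared` of their k-mers
    are labelled "unclassified" (the "dark matter" of the data set).
    """
    read_kmers = kmers(read.upper(), k)
    if not read_kmers:
        return "unclassified"
    best_name, best_score = "unclassified", 0.0
    for name, ref_sequence in reference.items():
        shared = read_kmers & kmers(ref_sequence.upper(), k)
        score = len(shared) / len(read_kmers)
        if score > best_score:
            best_name, best_score = name, score
    return best_name if best_score >= min_shared else "unclassified"


# Hypothetical toy reference database (not real genomes).
toy_reference = {
    "taxon_a": "ACGTACGTGGCCTTAA",
    "taxon_b": "TTTTGGGGCCCCAAAA",
}
```

Calling `classify("ACGTACGTGG", toy_reference)` assigns the read to `taxon_a`, while a read such as `"GAGAGAGAGA"` shares no k-mers with either reference and ends up as `"unclassified"`. In a real project, that second bucket is where you would go looking for novel microbes.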
A second common pitfall is that it’s almost impossible to capture all of the microbes in your sample. Some microbes will inevitably be very small minorities within your sample, and as a result, there will be only a limited amount of their DNA available, which your sequencing method may not pick up. It’s important to be aware of this potential bias because even a small number of microorganisms can have significant biological impacts on an environment. At this point in time, there is no active way to prevent this bias, so all you can do for now is simply to be aware of it when forming hypotheses.
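A quick back-of-the-envelope calculation shows why rare microbes are so easy to miss. The sketch below assumes a strongly simplified model in which every read is an independent random draw from the community, so the number of reads from a taxon follows a binomial distribution. Real sequencing adds extraction, amplification, and classification biases on top of this, so treat the numbers as optimistic.

```python
import math


def detection_probability(relative_abundance: float, n_reads: int,
                          min_reads: int = 1) -> float:
    """Probability of seeing at least `min_reads` reads from a taxon.

    Simplifying assumption: each of the `n_reads` reads independently comes
    from the taxon with probability `relative_abundance` (binomial model).
    """
    p = relative_abundance
    # Probability of seeing fewer than `min_reads` reads from the taxon.
    p_too_few = sum(
        math.comb(n_reads, k) * p**k * (1 - p) ** (n_reads - k)
        for k in range(min_reads)
    )
    return 1.0 - p_too_few


# Hypothetical example: a taxon making up one in a million cells,
# sequenced at two different depths.
shallow = detection_probability(1e-6, n_reads=10_000)
deep = detection_probability(1e-6, n_reads=5_000_000)
```

Under this model, the shallow run has only about a 1% chance of picking up even a single read from the rare taxon, while the deeper run almost certainly does. Requiring more than one read before you trust a detection (via the `min_reads` parameter) lowers these odds further.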
Step three: Analysing sequence data
Once you have all your sequence data, it’s time to decide how you will perform your analysis! There are two major approaches used in the analysis of sequencing data: looking at diversity and looking at differences.
The first analysis approach involves assessing the diversity of microbes within one sample. In other words, how many different microbial species are represented in your data? This is useful information to know if you want to, for example, assess how healthy or robust a microbiome is. For instance, imagine that one human patient’s gut microbiome is dominated by only two bacterial species, while another patient’s gut hosts five dominant species. For the first patient, the loss of even one species will have a bigger impact on the structure of their microbiome than it would for the second patient, whose microbiome is spread across more species. This can be important information to have before assigning antibiotic treatments to patients, which can disrupt sensitive microbiomes. So, you can see how diversity analyses can provide you with very useful information.
Secondly, you can investigate the differences between samples taken at different time points. This type of analysis can be used to understand the effect of a treatment or of other specific factors of interest. If we stick with our antibiotic treatment example, analysing the difference between samples taken pre- and post-antibiotic treatment would allow us to identify which types of microorganisms thrived or perished after the treatment, as well as the metabolic processes that the treatment induced in those microorganisms. By investigating the differences between samples taken at different time points, you can assess whether interventions cause statistically and biologically significant impacts. The conclusions drawn from these comparisons can give you vital insights into how factors like drug treatment, pollution, light exposure, or other agents might impact your microbiome of interest.
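The two-patient example can be put into numbers with two common within-sample diversity measures: richness (how many taxa were observed) and the Shannon index (which also accounts for how evenly the reads are spread across taxa). The sketch below is a minimal illustration; the read counts are invented, and the choice of these two metrics is ours rather than a prescribed method.

```python
import math
from typing import Mapping


def richness(counts: Mapping[str, int]) -> int:
    """Number of taxa with at least one read."""
    return sum(1 for count in counts.values() if count > 0)


def shannon_index(counts: Mapping[str, int]) -> float:
    """Shannon diversity H = -sum(p_i * ln(p_i)) over observed taxa."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    h = 0.0
    for count in counts.values():
        if count > 0:
            p = count / total
            h -= p * math.log(p)
    return h


# Hypothetical read counts for the two patients described above.
patient_1 = {"species_a": 500, "species_b": 500}
patient_2 = {f"species_{i}": 200 for i in range(5)}
```

With perfectly even communities like these, the Shannon index equals the natural logarithm of the number of species, so the second patient scores higher (ln 5 versus ln 2). In real data the counts are rarely this even, and sequencing depth affects both measures, so samples are usually normalised or rarefied before being compared.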
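One common way to quantify how different two samples are is the Bray-Curtis dissimilarity, which ranges from 0 (identical composition) to 1 (no taxa in common). The sketch below computes it on relative abundances; the pre- and post-treatment counts are hypothetical, and Bray-Curtis is just one of several distance measures you could choose here.

```python
from typing import Mapping


def relative_abundance(counts: Mapping[str, int]) -> dict[str, float]:
    """Convert raw read counts into proportions that sum to 1."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample contains no reads")
    return {taxon: count / total for taxon, count in counts.items()}


def bray_curtis(sample_a: Mapping[str, int],
                sample_b: Mapping[str, int]) -> float:
    """Bray-Curtis dissimilarity between two samples.

    On relative abundances this simplifies to
    1 - sum over taxa of min(p_a, p_b).
    """
    p_a = relative_abundance(sample_a)
    p_b = relative_abundance(sample_b)
    taxa = set(p_a) | set(p_b)
    shared = sum(min(p_a.get(t, 0.0), p_b.get(t, 0.0)) for t in taxa)
    return 1.0 - shared


# Hypothetical counts before and after an antibiotic course.
pre_treatment = {"species_a": 50, "species_b": 50}
post_treatment = {"species_a": 100}

dissimilarity = bray_curtis(pre_treatment, post_treatment)
```

Here half of the pre-treatment community (species_b) has disappeared, which gives a dissimilarity of 0.5. A single number like this tells you that something changed, but not yet whether the change is larger than what you would see by chance.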
Before you embark on a comparative study, however, you should be aware of a common pitfall: coincidental correlation. This describes how changes that occur by chance can easily be confused with changes caused by your intervention of interest. Because microbes are inherently dynamic, you will always observe differences between samples taken at different times. As a result, the differences that you detect between your samples could indeed be the effect of your treatment, or they could be purely coincidental. Coming back to our antibiotic example again, a given microbial species may no longer be picked up after treatment because the antibiotic in question killed it, or simply because of chance or some other variable that you hadn’t considered. You can best deal with this potential bias by making sure you have a suitable control in your experimental set-up, as well as by taking and analysing multiple samples. When the same change shows up across many samples, and all of them differ significantly from your control, it becomes less likely that the effect you observe is coincidental. Just like in other biological research fields, it pays to have a sufficiently large sample size (n)!
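One simple way to check whether a treatment effect stands out from chance variation is a permutation test against the control group. The sketch below is a minimal illustration under our own assumptions: each number is one subject's change in some summary measure (for example, a diversity index) between two time points, and the values themselves are invented.

```python
import random
from typing import Sequence


def _mean(values: Sequence[float]) -> float:
    return sum(values) / len(values)


def permutation_test(treated: Sequence[float], control: Sequence[float],
                     n_permutations: int = 10_000, seed: int = 0) -> float:
    """Two-sided permutation p-value for a difference in group means.

    Group labels are shuffled repeatedly; the p-value is the share of
    shuffles whose mean difference is at least as large as the observed one.
    """
    rng = random.Random(seed)
    observed = abs(_mean(treated) - _mean(control))
    pooled = list(treated) + list(control)
    n_treated = len(treated)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        shuffled = abs(_mean(pooled[:n_treated]) - _mean(pooled[n_treated:]))
        if shuffled >= observed:
            at_least_as_extreme += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return (at_least_as_extreme + 1) / (n_permutations + 1)


# Hypothetical per-subject changes in diversity between two time points.
treated_change = [-1.2, -0.9, -1.1, -1.0, -1.3, -0.8]
control_change = [0.1, -0.1, 0.0, 0.2, -0.2, 0.05]

p_value = permutation_test(treated_change, control_change)
```

Because the control group also changes a little between time points, the test asks whether the treated group changed more than that background drift. With only a handful of subjects per group the smallest achievable p-value is limited, which is one more reason to aim for a larger sample size.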
An important side note: data types
There are many different technologies that can be used to sequence DNA, and each technology has its own strengths and weaknesses. Some technologies are more commonly used in industry than others, and still more tools are up-and-coming as the popularity of sequencing data increases.
A few years ago, we would have had to explain to you how to deal with every single data type in a step-by-step protocol at this point. But never fear, you’re in luck! BioLizard has recently developed a tool called proBiome that automatically recognises and processes a range of common data types. By automatically formatting and standardising your datasets, proBiome streamlines your data analysis process and plays a holistic role in microbiome analysis workflows. proBiome does the hard part for you, by implementing state-of-the-art biostatistical frameworks to examine the diversity of your samples and the impact of your intervention of interest. proBiome not only saves you a lot of time and effort, but it also makes microbiome analysis easy! The proBiome workflow includes a user-friendly interface that generates clear, interactive, web-based reports that will provide you with all of the details you need to know about your microbiome data.
Discover proBiome today and ask for a free demo of the platform.