# Microbial network analysis: How to unravel interactions between taxa

Studying the plant microbiome tells us which phyla, families or species (collectively referred to as ‘taxa’) are present in and around the plant, and how abundant they are. For example, the bacterial phyla Proteobacteria, Bacteroidetes, Firmicutes and Actinobacteria are enriched in the rhizosphere compared to bulk soil for most plant species (Trivedi *et al.*, 2021).

Highly abundant or ‘dominant’ taxa with a specific function are expected to influence that function significantly through sheer force of numbers. For example, the abundance and diversity of mycorrhizal fungi are associated with increased plant growth and health. Mycorrhizal fungi colonize plant roots, improve nutrient uptake, promote resistance to biotic and abiotic stress, and even form networks that facilitate communication between plants (van der Heijden *et al.*, 2015).

# Numbers aren’t everything!

However, abundance and importance are not always correlated. Less conspicuous taxa might still play a crucial role in the microbial community. For example, *Burkholderia* bacteria are endosymbionts of mycorrhizal fungi and influence the abundance of these important microorganisms (Banerjee *et al.*, 2018). The term ‘keystone taxa’ was coined to describe taxa that play an important role in the community, irrespective of their numbers.

# It’s all connected: The roles of keystone taxa and how to identify them

Keystone taxa influence the microbiome by affecting the abundance of other species, or by producing metabolites that alter the microbial composition. For example, specific *Pseudomonas fluorescens* strains might produce 2,4-diacetylphloroglucinol, which in turn suppresses the fungus causing take-all disease in wheat (Banerjee *et al.*, 2018).

But how can we identify these keystone taxa, if they are not necessarily present in high numbers? The answer is to not only identify which taxa are present, and in which numbers, but to also study how they interact with each other.

There are several different types of interactions that might exist between taxa. One well-known type is win-lose, e.g. when one taxon preys on or parasitizes the other. In another type of interaction, two taxa coexist to their mutual benefit, which is known as mutualism. Commensal relationships are those where one taxon benefits from the other without harming or helping it. Lastly, the counterpart of commensalism is amensalism, where one party is negatively affected by the presence of the other, while the second party is not affected by the interaction.

Network inference is a technique that can help you to overcome the challenge of predicting these complex interactions from a dataset. In a microbial network, the taxa are represented by nodes, and their interactions as lines or arrows connecting the nodes. These lines or arrows are respectively termed ‘edges’ or ‘directed edges’.
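To make the vocabulary concrete, here is a minimal sketch of how such a network could be stored in code. The taxon names, the weights and the `Edge` and `neighbours` helpers are hypothetical and chosen for illustration only; dedicated graph libraries offer richer structures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    """One connection between two taxa (hypothetical helper type)."""
    source: str
    target: str
    weight: float           # the sign encodes a positive or negative relationship
    directed: bool = False  # False: plain edge (line); True: directed edge (arrow)


# Toy network: all names and weights are made up for illustration.
nodes = {"Taxon_A", "Taxon_B", "Taxon_C"}
edges = [
    Edge("Taxon_A", "Taxon_B", weight=0.8),                  # A and B tend to co-occur
    Edge("Taxon_C", "Taxon_A", weight=-0.6, directed=True),  # C negatively affects A
]


def neighbours(node, edges):
    """Return the nodes reachable from `node` in one step, respecting direction."""
    reachable = set()
    for edge in edges:
        if edge.source == node:
            reachable.add(edge.target)
        elif edge.target == node and not edge.directed:
            reachable.add(edge.source)
    return reachable


for node in sorted(nodes):
    print(node, "->", sorted(neighbours(node, edges)))
```

Storing the direction and sign on the edge itself keeps the node set simple, which also makes it straightforward to add non-taxon nodes later.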

There are multiple strategies for finding these edges. For instance, similarity-based network inference can be used to find pairwise relationships between taxa, and more complex relationships can be identified by building regression or rule-based networks (Faust & Raes, 2012).

Image from Faust & Raes, 2012

One of the advantages of networks is their flexibility: we can infer them from genomic data, but also from transcriptomic or proteomic data, for example. Even environmental factors can be taken into account by adding them to the network as extra nodes.

# Three types of network analysis you can consider

## #1: Similarity-based networks

When you have multiple samples, you can compare these to identify species that mostly occur together, or that are mutually exclusive, thus predicting positive or negative relationships between them. To accomplish this, we would start by determining the similarity between the distributions of two taxa, and next assess the statistical significance of the resulting similarity score. The latter is usually done by permuting the dataset to obtain a null distribution. These steps are then repeated for each pair of species, and the significant interactions are used to build a network (Faust & Raes, 2012).

There are several key decisions for this type of network inference, which a skilled bioinformatician can advise you on:

- **The choice of metric to determine the similarity between taxon distributions**. For abundance data, the Pearson or Spearman correlation is often used. When only presence-absence data are available, the hypergeometric distribution is often employed. For time series data, local similarity analysis can be applied to use shifts in populations over time for building the network.
- **The permutation method to determine the null distribution**. Different data re-shuffling methods preserve different properties of the dataset, so it is important to know which properties need to be preserved based on the experimental conditions.
- ***p*-value correction for multiple testing**. Because every pair of taxa is tested, the number of tests grows quickly, and uncorrected *p*-values would produce many false-positive edges.
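The steps described in this section can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not a production pipeline: the abundance values are invented, Spearman correlation is used as the similarity metric, the null distribution comes from simply shuffling one taxon's values across samples, and the Benjamini-Hochberg procedure is our choice of multiple-testing correction. All function names are hypothetical.

```python
import random
from itertools import combinations


def ranks(values):
    """1-based ranks; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy) if sx and sy else 0.0


def spearman(x, y):
    """Spearman correlation = Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def permutation_p(x, y, n_perm, rng):
    """Two-sided p-value from shuffling y across samples."""
    observed = abs(spearman(x, y))
    shuffled = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(spearman(x, shuffled)) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def infer_network(abundance, alpha=0.05, n_perm=999, seed=0):
    """Return (taxon_a, taxon_b, score) for every significant pair."""
    rng = random.Random(seed)
    pairs = list(combinations(sorted(abundance), 2))
    scores = [spearman(abundance[a], abundance[b]) for a, b in pairs]
    p_vals = [permutation_p(abundance[a], abundance[b], n_perm, rng) for a, b in pairs]
    q_vals = benjamini_hochberg(p_vals)
    return [(a, b, s) for (a, b), s, q in zip(pairs, scores, q_vals) if q <= alpha]


# Invented abundances for four taxa across ten samples.
toy_abundance = {
    "taxon_A": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    "taxon_B": [2, 4, 5, 9, 11, 12, 15, 18, 20, 25],  # rises with taxon_A
    "taxon_C": [30, 27, 25, 20, 18, 15, 11, 9, 4, 2],  # falls as taxon_A rises
    "taxon_D": [5, 1, 8, 3, 10, 2, 7, 4, 9, 6],        # no clear pattern
}

network = infer_network(toy_abundance)
for a, b, score in network:
    print(f"{a} -- {b}: {score:+.2f}")
```

A positive score suggests co-occurrence and a negative score suggests mutual exclusion. Real microbiome data are compositional and sparse, so the metric and the shuffling scheme deserve more care than this sketch gives them.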

## #2: Regression-based networks

While the pairwise iteration above will identify one-on-one relationships between taxa, more complex interactions - where the abundance of one species depends on multiple others - will be missed. These can be predicted using regression: modeling the abundance of one species using a combination of other species. This results in a network where one edge can connect multiple nodes (Faust & Raes, 2012).
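As a rough sketch of the regression idea, the example below models one taxon as a linear combination of the others using ordinary least squares, implemented with the standard library only. The data are invented so that `taxon_C` depends on both `taxon_A` and `taxon_B`, and the helper names are hypothetical. Plain least squares is our simplifying assumption; with many taxa, a sparse or regularized regression would usually be a more sensible choice.

```python
def solve(matrix, rhs):
    """Solve a small linear system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def fit_target(abundance, target):
    """Least-squares fit: target ~ intercept + all other taxa."""
    predictors = sorted(t for t in abundance if t != target)
    n = len(abundance[target])
    X = [[1.0] + [float(abundance[p][i]) for p in predictors] for i in range(n)]
    y = [float(v) for v in abundance[target]]
    k = len(predictors) + 1
    # Normal equations: (X^T X) beta = X^T y
    xtx = [[sum(X[i][a] * X[i][b] for i in range(n)) for b in range(k)] for a in range(k)]
    xty = [sum(X[i][a] * y[i] for i in range(n)) for a in range(k)]
    beta = solve(xtx, xty)
    return dict(zip(["intercept"] + predictors, beta))


# Invented data: taxon_C was constructed as 3 + 2 * taxon_A - 1 * taxon_B.
toy_regression_data = {
    "taxon_A": [1, 2, 3, 4, 5, 6],
    "taxon_B": [3, 1, 4, 1, 5, 9],
    "taxon_C": [2, 6, 5, 10, 8, 6],
}

coefficients = fit_target(toy_regression_data, "taxon_C")
for name, value in coefficients.items():
    print(f"{name}: {value:+.3f}")
```

The fitted coefficients for `taxon_A` and `taxon_B` together describe a single relationship that links three nodes at once, which is what distinguishes this approach from the pairwise one.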

Note that both this regression method and the similarity method described above need to be interpreted carefully. They will identify correlations between certain species, but as we covered in a previous blog article, **correlation is not causation**: there may not be a biological reason behind the observed mathematical relationships.

## #3: Dynamic networks

To increase complexity even further, we might consider the effect of time on our network. This can be accomplished with dynamic networks, which consist of mathematical equations describing changes in the network over time. For example, these networks might pick up oscillations in the microbiome, where one species rapidly expands but is then kept in check and reduced by another, maintaining a dynamic balance.

Several approaches have been proposed to infer dynamic networks, such as the generalized Lotka–Volterra equations (Mounier *et al.*, 2008). For the interested reader, more details on the methods and pitfalls of different strategies for network inference are outlined by Faust & Raes (2012).
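To give a feel for what a generalized Lotka–Volterra model describes, the sketch below simulates two taxa with a simple explicit Euler scheme. Note that this runs the model forward with made-up parameters; it does not infer them from data, which is the harder problem. The growth rates, the interaction matrix and the function names are all hypothetical.

```python
def glv_step(x, r, A, dt):
    """One Euler step of dx_i/dt = x_i * (r_i + sum_j A[i][j] * x_j)."""
    n = len(x)
    change = [x[i] * (r[i] + sum(A[i][j] * x[j] for j in range(n))) for i in range(n)]
    # Abundances cannot become negative.
    return [max(x[i] + dt * change[i], 0.0) for i in range(n)]


def simulate(x0, r, A, dt=0.01, steps=20000):
    """Return the list of abundance vectors over time, starting at x0."""
    trajectory = [list(x0)]
    for _ in range(steps):
        trajectory.append(glv_step(trajectory[-1], r, A, dt))
    return trajectory


# Invented parameters: taxon 0 grows on its own and limits itself,
# taxon 1 declines on its own but benefits from taxon 0 while suppressing it.
growth_rates = [1.0, -0.5]
interactions = [
    [-0.1, -0.5],  # effect of taxon 0 and taxon 1 on taxon 0
    [0.25, 0.0],   # effect of taxon 0 and taxon 1 on taxon 1
]

trajectory = simulate([1.0, 1.0], growth_rates, interactions)
for step in range(0, len(trajectory), 4000):
    first, second = trajectory[step]
    print(f"t={step * 0.01:6.1f}  taxon_0={first:.3f}  taxon_1={second:.3f}")
```

With these particular parameters, the first taxon expands, is pushed back by the second, and the oscillations gradually fade towards a balance. Each entry of the interaction matrix corresponds to a signed, directed edge in the network.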

# Applications of network inference

Networks are a powerful tool for analyzing the complex dynamics of the microbiome. They allow for the identification of keystone taxa, which appear as highly connected ‘hubs’ in the network - reflecting their key role in the microbial community (Banerjee *et al.*, 2018). We can also use networks to predict the effects of increasing a specific species, for example to improve plant performance.
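As a simple illustration of hub detection, the sketch below counts how many edges touch each node and reports the most connected ones. The edge list is invented and the function names are hypothetical. Node degree is only one of several possible measures of connectedness, so treat this as a starting point rather than a definition of a keystone taxon.

```python
from collections import Counter


def degree(edges):
    """Count how many edges touch each node."""
    counts = Counter()
    for a, b in edges:
        counts[a] += 1
        counts[b] += 1
    return counts


def hubs(edges, top_n=1):
    """Return the `top_n` most connected nodes."""
    return [node for node, _ in degree(edges).most_common(top_n)]


# Invented undirected network: taxon_B is connected to three other taxa.
toy_edges = [
    ("taxon_A", "taxon_B"),
    ("taxon_B", "taxon_C"),
    ("taxon_B", "taxon_D"),
    ("taxon_D", "taxon_E"),
]

print(degree(toy_edges))
print(hubs(toy_edges))
```

In a real analysis, candidate hubs found this way would still need to be checked against biological knowledge before being called keystone taxa.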

Networks are powerful, but as with any form of mathematical modeling in biology, the choice of metrics and methods is crucial. Knowledge of the drawbacks and pitfalls of the analysis is imperative to avoid misinterpreting or over-interpreting the results and drawing false conclusions. It’s therefore essential to combine the expertise of bioinformaticians and biostatisticians with that of microbiome domain experts, in order to ensure that the right statistical and data science methodology is chosen and that data are interpreted in the context of deep biological understanding.

Are you looking for an expert bioinformatics and biostatistics partner that deeply understands microbiome analysis? BioLizard is ready to support you!

We have proven expertise in microbiome analysis, including causal inference, and have even built a user-friendly tool for interactive exploration and querying of microbiome data - no data science expertise needed. We understand statistics - but we speak the language of microbiology.

**Reach out to BioLizard today to start discussing how we can help you extract new insights from your microbiome data.**