
In the world of AI, the life sciences are an exciting space to work in. New models and algorithms appear almost daily, helping us understand how biology works and enabling things like diagnosing disease, designing new drugs, or building new microbial cell factories. This is at the core of what we do at BioLizard.

In such a fast-moving landscape, these tools arrive with very different levels of maturity, from early-stage research prototypes to giants like AlphaFold that become established almost overnight.
As the field evolves, the code stacks we rely on as (bio-)informaticians evolve with it. Pipelines that ran smoothly for years can suddenly fail because of small updates to dependent packages. Before you know it, you are stuck in an endless cycle of updating R packages and testing each step along the way. We have all been there.

DevOps to the rescue

Before you ask, DevOps is not a housekeeping gene!

DevOps is a term used in software development and covers the combined practices of development and operations. It focuses on building, testing, and deploying software efficiently and reliably. In this context, DevOps spans several areas, including governance, automation, continuous integration and continuous delivery (CI/CD), and infrastructure as code.

Bioinformatics, however, comes with its own quirks. It is often more exploratory and experimentally driven, as researchers work toward uncovering meaningful insights while tools and analyses evolve in parallel.

In this blog post, we want to highlight what we consider most important when applying these principles to bioinformatics research and tool development.

Version 7.1_final_v3_updated

When developing analysis pipelines and experimenting with code, everything begins with proper versioning. This applies to any code-based work. Version control makes it easier to track who made changes, when they were made, and why. It also enables you to introduce new features through an appropriate branching strategy without risking the stability of your main version. And if something does break, reverting to the last stable state can be done with a single command.

The most widely used version control system today is Git¹. Git is a Distributed Version Control System² that can support projects of any size, from small scripts to large-scale software.
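
To make the earlier point about branching and reverting concrete, here are two everyday Git commands (the branch name is just an example, and the commit hash is a placeholder):

    git switch -c feature/batch-correction   # develop a new feature without touching the main branch
    git revert <commit>                       # roll back a problematic change with a single command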

At BioLizard, GitHub is our preferred platform for version control, although we also work comfortably with Azure DevOps, GitLab, and others.

But it works in my environment

Creating a new pipeline that only works on your local environment will not only lead to reproducibility issues but will also make collaboration a nightmare. This is where CI/CD becomes essential.

CI/CD (Continuous Integration and Continuous Delivery/Deployment) is a set of practices designed to streamline the software development lifecycle.

Its main purpose is to reduce the amount of manual intervention required by automating the building, testing, and deployment of software. With CI/CD pipelines in place, new code changes can be integrated and delivered quickly and reliably. This not only accelerates release cycles but also reduces the risk of human error, resulting in more stable and higher-quality applications. Ultimately, CI/CD creates a continuous flow from code commit to production, allowing teams to focus on meaningful work rather than repetitive tasks.

Continuous integration (CI) involves four steps: Plan, Code, Build and Test. Planning and Coding still rely heavily on human input (although this is changing fast with the rise of LLMs). Standardisation tools such as linters, formatters, and security checkers can support this phase. The Build and Test steps, on the other hand, can be fully automated through build systems and testing frameworks.

Continuous Delivery/Deployment (CD) begins with Deploy, where the built and tested application is automatically placed on a server or in a cloud environment. The next step is Release, which makes the new version available to users, either through a manual approval (Continuous Delivery) or an immediate, automated rollout (Continuous Deployment). Once the software is live, the Operate phase ensures stability and performance. Throughout this entire process, Monitor is a crucial, ongoing activity: tracking key metrics and user behaviour provides a continuous feedback loop, ensuring the system works as intended and informing future development cycles.

At BioLizard, as a consultancy firm focused on data science and generating insights from complex datasets, our emphasis is usually on the CI part of CI/CD. To ensure the highest possible quality for our clients, we rely on a two-step code-checking process:

  • pre-commit
  • GitHub Actions

Pre-commit gives us fast feedback loops by checking formatting (Ruff) and security (Bandit) before each commit. This ensures that quality and security are built in from the beginning.
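
As an illustration, a minimal .pre-commit-config.yaml along these lines wires both tools into every commit (the revisions below are placeholders; pin the versions you actually use):

    repos:
      - repo: https://github.com/astral-sh/ruff-pre-commit
        rev: v0.6.9          # placeholder revision
        hooks:
          - id: ruff         # linting
          - id: ruff-format  # formatting
      - repo: https://github.com/PyCQA/bandit
        rev: 1.7.9           # placeholder revision
        hooks:
          - id: bandit       # security checks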

GitHub Actions does the same checks, but adds automated testing using Pytest. Some tests take longer to run, so we intentionally avoid including them in pre-commit in order to keep development cycles short.
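
A typical workflow for this looks roughly as follows. This is only a sketch: it assumes a Python project with code under src/, and the action versions and steps should be adapted to your own setup.

    # .github/workflows/ci.yml
    name: CI
    on: [push, pull_request]
    jobs:
      checks:
        runs-on: ubuntu-latest
        steps:
          - uses: actions/checkout@v4
          - uses: actions/setup-python@v5
            with:
              python-version: "3.12"
          - run: pip install ruff bandit pytest
          - run: ruff check .        # same linting as pre-commit
          - run: bandit -r src/      # same security scan as pre-commit
          - run: pytest              # the slower tests we keep out of pre-commit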

We also use Vulture on an ad-hoc basis for detecting dead code, helping us keep our repositories lean and clean.

All of these tools are supported by templates, allowing our scientists to focus 100% of their time on science, uncovering new insights from complex data.

Just run it in the cloud

Using state-of-the-art tooling often requires more computational power than a local laptop can provide. For this reason, we offload large compute jobs to the cloud. There are several ways to address compute limitations, including virtual machines, batch jobs, and cloud functions. The choice depends on the specific task at hand.

Setting up a cloud environment through the UI can be tedious and is often not reproducible. Fortunately, Infrastructure as Code (IaC)³ solves this problem. IaC tools allow the entire cloud environment to be defined in configuration files that can be stored in version control. This combination is extremely powerful: it enables reproducibility and versioning of the cloud setup while also making collaboration straightforward.

Using tools such as Terraform or OpenTofu, we have created templates that allow us to spin up cloud environments quickly and securely for large compute workloads. For example, on GCP, we rely on Identity-Aware Proxy together with a Virtual Private Cloud to safeguard our virtual machines. The goal remains the same: maximise the time spent on the fun part, extracting insights from data, rather than wrestling with infrastructure setup.
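
To give an idea of what such a template contains, here is a stripped-down Terraform sketch of a GCP virtual machine. The project ID, names, machine type, and image are illustrative, and the Identity-Aware Proxy and VPC configuration is left out for brevity.

    # main.tf
    provider "google" {
      project = "my-analysis-project"    # illustrative project ID
      region  = "europe-west1"
    }

    resource "google_compute_instance" "analysis_vm" {
      name         = "analysis-vm"
      machine_type = "n2-standard-8"     # sized to the workload
      zone         = "europe-west1-b"

      boot_disk {
        initialize_params {
          image = "debian-cloud/debian-12"
        }
      }

      network_interface {
        network = "default"              # in practice, a private VPC behind Identity-Aware Proxy
      }
    }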

This is also where the CD component of CI/CD plays a larger role. Tools like GitHub Actions enable automated deployment in addition to continuous integration. This ensures that our clients always have access to the latest version of our solutions, without manual intervention.

Conclusion

At BioLizard, science is our core, and science is only good science when it is reproducible. Because we process large volumes of data through complex pipelines, our outcomes are only reproducible if our code is of top quality. That requires well-structured code that is consistent and maintainable.

To achieve this, we rely on a range of DevOps tools that automate critical checks and reduce the risk of human error. This automation strengthens repeatability across our workflows and frees our bioinformaticians to focus on what truly matters: generating new scientific insights rather than dealing with technical overhead.

This is just a brief look into how we approach coding and scientific work at BioLizard. Curious to learn more? Reach out; we’d be happy to explore how we can collaborate.


¹ https://rhodecode.com/blog/156/version-control-systems-popularity-in-2025

² https://git-scm.com/

³ https://www.redhat.com/en/topics/automation/what-is-infrastructure-as-code-iac


Bioinformatics is an exciting and rapidly evolving field. However, anyone who has worked in it knows that a significant amount of setup is required before the actual science begins. You need to configure environments, install the appropriate packages, and ensure everyone on the team can reproduce the same results.

That's where DevOps can make life much easier.

Imagine you're a bioinformatician starting a new project. The first steps are typically tedious:

  • Provisioning a Virtual Machine (VM), which requires decisions about machine type and resource allocation.
  • Installing the correct versions of Python, R, Conda, and other packages.
  • Managing contributions from different team members, which can quickly result in disorganised code.
  • Preventing regression bugs without unified testing methods; over time, package deprecations can introduce pipeline errors that aren't immediately detected.

This overhead consumes valuable time that could otherwise be spent developing pipelines and analysing data.

Now let's apply software & DevOps practices to this scenario...

Infrastructure as Code with Terraform

Instead of manually repeating this setup each time, DevOps practices allow you to define infrastructure as code. With Terraform, you can create reusable blueprints for VMs or containers that come pre-configured with:

  • Specific versions of Python, R, and Conda
  • Common bioinformatics libraries and packages

Now, launching a consistent environment is as simple as running a single command.
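
With a blueprint like the one sketched earlier in place, that boils down to:

    terraform init     # once per project: download providers and initialise state
    terraform apply    # create or update the environment described in the blueprint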

Collaboration: Consistent Code with Linters

Once the project is running, collaboration becomes the next hurdle. When multiple people contribute code, it can quickly become inconsistent and difficult to read.

Using linters and formatters for Python, R, and other languages ensures:

  • Uniform coding style
  • Easier peer reviews
  • A smoother onboarding experience for new team members

This standardization improves readability and maintainability.
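
For Python, this can be as lightweight as committing a shared Ruff configuration to the repository, for example in pyproject.toml (the rule selection below is only an example):

    [tool.ruff]
    line-length = 100

    [tool.ruff.lint]
    select = ["E", "F", "I"]   # pycodestyle errors, Pyflakes, import sorting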

CI/CD: Making Pipelines Reliable and Future-Proof

CI/CD (Continuous Integration and Continuous Delivery/Deployment) is a set of practices that streamline the software development lifecycle by automating the integration, testing, and deployment of code changes.

As projects evolve, bioinformaticians often revisit their pipelines to analyze new datasets. Here, CI/CD practices really shine:

  • Unit and integration tests ensure that code changes don't break existing functionality.
  • When packages are deprecated or updated, your tests catch issues early.
  • Automated testing provides confidence that pipelines remain reliable as they evolve.

This makes your analysis robust and minimizes surprises when re-running workflows months or years later.
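
As a toy illustration (the function and values are hypothetical), even a small pytest file pins down the behaviour of a pipeline step, so a dependency update that silently changes results is caught the next time CI runs:

    # test_normalisation.py -- hypothetical example
    import pytest

    def normalise_counts(counts, scale=1_000_000):
        """Toy counts-per-million normalisation, used here purely for illustration."""
        total = sum(counts)
        return [c / total * scale for c in counts]

    def test_normalised_counts_sum_to_scale():
        result = normalise_counts([10, 30, 60])
        # Whatever the input, the normalised values should sum to the scale factor.
        assert sum(result) == pytest.approx(1_000_000)

    def test_proportions_are_preserved():
        result = normalise_counts([10, 30, 60])
        assert result == pytest.approx([100_000.0, 300_000.0, 600_000.0])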

Conclusion: Let DevOps Do the Heavy Lifting

Bioinformatics is challenging enough. DevOps doesn't just simplify workflows; it ensures your projects are consistent, collaborative, and sustainable. From setting up preconfigured environments with Terraform to enforcing code quality with linters and safeguarding pipelines with CI/CD, DevOps empowers bioinformaticians to focus more on science and less on setup.
