Variant caller runtime benchmarking

BioLizard benchmarked the runtime of elPrep against variant callers like freebayes and DeepVariant using public WGS data on EC2 instances. The study delivered normalized core-hour comparisons, revealed optimization opportunities, and strengthened BioLizard’s partnership with imec.

About the client

imec is a world-leading R&D and innovation hub in nanoelectronics and digital technologies. elPrep is developed by ExaScience Life Lab, a division of imec that focuses on scalable software solutions for data-intensive and high-performance computing problems, primarily in life sciences.

Project overview

elPrep is a high-performance tool for analyzing .sam/.bam files (up to and including variant calling) in sequencing pipelines.

A comparison between elPrep and GATK has already been published, but elPrep had not yet been compared to other variant callers such as freebayes and DeepVariant. The goal of this project was to compare the runtime of elPrep to that of these other variant callers.

As a bonus, the comparison gave BioLizard more insight into the capabilities of elPrep. BioLizard aims to use these insights to better assist future clients with questions about elPrep and about setting up variant calling workflows.

“Streamline your genomic research with an integrated tool that excels in speed and accuracy.” – imec

Our approach

  • Set up EC2 instances for elPrep and its peer software (freebayes & DeepVariant).
  • Perform variant calling on a public WGS dataset from Genome in a Bottle, both on the full dataset and on a variety of subsampled datasets.
  • Collect runtimes of the different variant callers and runs, and normalize them to core hours.
  • Generate clear tables and graphs to compare the results.
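
The normalization step above can be sketched as follows. This is a minimal illustration, assuming core hours are defined as wall-clock hours multiplied by the number of cores available to the run; the `Run` class, the `core_hours` function, the caller names, and all numbers are hypothetical placeholders, not values or code from this study.

```python
from dataclasses import dataclass


@dataclass
class Run:
    caller: str          # name of the variant caller (placeholder)
    wall_seconds: float  # measured wall-clock runtime in seconds
    cores: int           # cores on the instance, assumed fully allocated to the run


def core_hours(run: Run) -> float:
    """Wall-clock hours multiplied by the number of cores (assumed definition)."""
    return run.wall_seconds / 3600.0 * run.cores


# Placeholder values for illustration only; NOT measurements from this study.
runs = [
    Run("caller_a", wall_seconds=7200.0, cores=16),
    Run("caller_b", wall_seconds=3600.0, cores=64),
]

# List the runs from cheapest to most expensive in core hours.
for run in sorted(runs, key=core_hours):
    print(f"{run.caller:<10} {core_hours(run):8.1f} core-hours")
```

Normalizing this way makes runs on differently sized instances comparable: in the placeholder data, the second run finishes sooner in wall-clock time but consumes more core hours.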

Results

Runtime comparison

Runtime comparison between the different variant callers, normalized to core hours.

Possible optimizations

Discovered further possible optimizations to improve user experience.

Partnership

Partnership between imec and BioLizard to better assist clients.

Check out the imec website for more information on elPrep!

https://139582766.hs-sites-eu1.com/hs-web-interactive-139582766-76449037512

This work was performed in partnership with imec.
