imec elPrep variant caller runtime benchmarking
Imec is a world-leading R&D and innovation hub in nanoelectronics and digital technologies. elPrep is developed by ExaScience Life Lab, a division of imec that focuses on scalable software solutions for data-intensive and high-performance computing problems, primarily in life sciences.
elPrep is a high-performance tool for analyzing .sam/.bam files (up to and including variant calling) in sequencing pipelines.
A comparison between elPrep and GATK has already been published, but elPrep had not yet been compared to other software such as freebayes and DeepVariant. The goal of this project was to compare the runtime of elPrep with that of other variant callers. The comparison gave BioLizard more insight into the capabilities of elPrep. BioLizard aims to use these insights to better assist future clients with questions about elPrep and about setting up variant calling workflows.
“Streamline your genomic research with an integrated tool that excels in speed and accuracy.” - imec
• Set up EC2 instances for elPrep and its peer software (freebayes and DeepVariant).
• Perform variant calling on a public WGS dataset from Genome in a Bottle, using both the full dataset and a variety of subsampled datasets.
• Collect the runtime of every run for each variant caller and normalize it to core hours.
• Generate tables and graphs to compare the results.
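The normalization step can be illustrated with a minimal sketch. It assumes that core hours are computed as wall-clock hours multiplied by the number of cores available on the instance; the exact formula used in the project is not stated here. The `Run` class, the caller labels, and all numbers are hypothetical placeholders, not measured results.

```python
from dataclasses import dataclass


@dataclass
class Run:
    caller: str          # label of the variant caller (hypothetical)
    wall_seconds: float  # wall-clock runtime in seconds
    cores: int           # cores (vCPUs) on the instance; assumed all allocated to the run


def core_hours(run: Run) -> float:
    """Assumed definition: wall-clock hours multiplied by the number of cores."""
    return run.wall_seconds / 3600.0 * run.cores


# Placeholder values for illustration only, not benchmark results.
runs = [
    Run("caller_a", wall_seconds=7200, cores=16),
    Run("caller_b", wall_seconds=3600, cores=64),
]

# Build a comparison table and print it, cheapest run first.
table = {r.caller: core_hours(r) for r in runs}
for name, hours in sorted(table.items(), key=lambda kv: kv[1]):
    print(f"{name}: {hours:.1f} core hours")
```

Normalizing this way makes runs on differently sized instances comparable: in the placeholder data, the run that finishes sooner on the larger instance still consumes more core hours.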
Runtime comparison between the different variant callers.
Discovered further possible optimizations to improve user experience.
Partnership between imec and BioLizard to better assist clients.