Julia: Different datasets have different needs
21 Sep 2022
Just as we use different technologies to answer different scientific questions in the laboratory, different datasets have different needs when it comes to the programming language best suited for their analysis. It’s not a black-and-white issue by any means. In the same way that both flow cytometry and confocal microscopy might provide the answers you are looking for in a given experimental setup, two different programming languages might both be able to handle the same data analysis pipeline.
However, sometimes one specific tool will allow for new insights like no other could. Just as you need to understand the pros and cons of the different instruments in your laboratory to test different scientific hypotheses, at BioLizard we draw on a deep understanding of programming languages to assess which language is best suited to the question at hand for your dataset.
In the first blog in this series, we introduced Rust, a new programming language with great promise for the efficient, speedy, and safe analysis of data. Today, we’ll take a look at Julia.
The Julia language is a relatively new programming language developed with numerical computing in mind. Since its debut in 2012, Julia has quickly grown into a powerful, general-purpose language suited to tasks as diverse as data wrangling, web development, and climate model simulation.
A key ambition of Julia is to solve the two-language problem. This is a common headache in scientific computing, in which a researcher first writes a prototype algorithm in an easy but slow language like R, and then later (if the prototype works) rewrites that algorithm in a harder but faster language like C++. Julia aims to help researchers skip this multi-language approach by providing high-level, readable syntax that is easy for programmers to understand, yet can run as fast as, and in some cases faster than, low-level languages such as C and Fortran. The saying goes, “Walk like Python, run like C,” but we could add, “Skip like Julia”.
Another core concept of Julia is that it is comfortable with heterogeneity. Julia makes use of “multiple dispatch”: you can define multiple versions (methods) of the same function for different argument types, and Julia chooses which version to run based on the types of all the arguments, not just the first. In practical terms, this means that Julia packages can often be applied to problems that were never imagined by the creators of those packages, and that functions from seemingly unrelated packages will often “just work” when applied to a new type of data. This is a powerful capability in the biological sciences, where the types of collected data are highly heterogeneous. The relatively abstract nature of the Julia language makes it particularly flexible for handling such diverse data.
On top of this, Julia supports metaprogramming: Julia code is itself represented as a Julia data structure, so programs can generate, inspect, and transform code, for example through macros that rewrite expressions before they are compiled. This lets software adapt its own logic to the problem at hand, which is well matched to the iterative process of understanding highly complex biological systems, from viral genomics to immune cell differentiation pathways, and holds promise for artificial intelligence implementations.
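To make multiple dispatch a little more concrete, here is a minimal sketch. The types `DNASeq` and `ProteinSeq` and the functions `pairing` and `total_length` are hypothetical names invented for illustration; they do not come from any Julia package. The example shows a method being chosen from the types of both arguments, and a generic function that was written without these types in mind working on them once `Base.length` is extended.

```julia
# Hypothetical record types for two kinds of biological data
# (illustrative only, not taken from any package).
struct DNASeq
    bases::String
end

struct ProteinSeq
    residues::String
end

# One generic function, three methods. Julia picks the method using
# the runtime types of *both* arguments (multiple dispatch).
pairing(a::DNASeq, b::DNASeq) = "DNA/DNA"
pairing(a::DNASeq, b::ProteinSeq) = "DNA/protein"
pairing(a, b) = "fallback"   # used when no more specific method matches

# Extending an existing Base function for the new types...
Base.length(s::DNASeq) = length(s.bases)
Base.length(s::ProteinSeq) = length(s.residues)

# ...lets generic code that only relies on `length` work on them unchanged,
# even in a mixed collection that also contains a plain String.
total_length(items) = sum(length, items)

println(pairing(DNASeq("ACGT"), ProteinSeq("MK")))   # DNA/protein
println(pairing(ProteinSeq("MK"), DNASeq("ACGT")))   # fallback
println(total_length([DNASeq("ACGT"), ProteinSeq("MKV"), "hello"]))   # 12
```

Note that `total_length` never mentions `DNASeq` or `ProteinSeq`; it works because each element's type supplies its own `length` method, which is the kind of composability the paragraph above describes.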
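As a rough illustration of what metaprogramming looks like in practice, the sketch below quotes an expression, rewrites it as data, and evaluates the result, then defines a small macro. The macro name `@twice` and the variable `hits` are hypothetical, chosen only for this example.

```julia
# `:( ... )` quotes code without running it; the result is an Expr object.
ex = :(2 + 3 * 4)
println(typeof(ex))    # Expr
println(ex.head)       # call

# The expression is ordinary data, so a program can rewrite it before running it.
# Here the outer `+` becomes `*`, turning 2 + (3 * 4) into 2 * (3 * 4).
ex.args[1] = :*
println(eval(ex))      # 24

# A macro receives code as an Expr and returns new code that is compiled in
# its place. This hypothetical `@twice` emits the given expression two times;
# `esc` makes names like `hits` refer to the caller's variables.
macro twice(expr)
    return quote
        $(esc(expr))
        $(esc(expr))
    end
end

hits = String[]
@twice push!(hits, "hit")
println(length(hits))  # 2
```

Everything here happens through standard language features (`Expr`, `eval`, `macro`, `esc`); no package is required.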
Like Rust, Julia is supported by a vibrant, although relatively small, community. We see a bright future for Julia in biology, given that its libraries for biological modelling are best-in-class and that its bioinformatics library even outperforms highly optimised domain-specific languages. If you have generated a large amount of diverse, heterogeneous data, Julia might be the best match for your dataset.
Does the Julia language sound like a fit for your needs, but would you like some strategic advice to get the most out of it? BioLizard can support you as a true end-to-end data partner, from strategic advice in the early stages through to downstream, hands-on support. Contact us to discuss your data analysis needs, and don’t forget to return here next week for the third and final edition of this blog series on new programming languages, in which we’ll look into something a little different: a new, fast, and straightforward open-source UI software development kit.
This text was prepared in collaboration with Erik Vandeputte (Software & IT Architecture Team Lead) and Michiel Stock (Bioinformatics and Machine Learning Consultant) at BioLizard.