Rust: A promising programming language for large biological datasets
20 Sep 2022
Not everybody can cook a perfect meal…
To really excel at this, you need a deep understanding of how flavours will mix and enhance each other, and to be able to draw on that knowledge to identify exactly which spice or condiment will bring a dish to the next level. In much the same way, having a solid background in programming languages and knowing which one to use when is one of the main skills that every Lizard has in their toolbox.
Within the bioinformatics community, there is no doubt that R and Python are the most successful programming languages, and they are likely to remain popular for some time. However, just as salt is not always the answer to improving a dish, R and Python are not always the best solution for every data analysis problem.
From experience, we have seen that effective software is often the keystone that can bring research in drug discovery or disease treatment to the next level. But depending on the project and the type of data, there will be different bioinformatic requirements and priorities. At BioLizard, we strive to apply the best solution to the problem at hand, by picking a programming language that matches the needs of the dataset in question. Over the next three weeks, we will introduce you to three new programming languages that we use at BioLizard – starting with Rust.
Rust is a general-purpose programming language that unifies the expressiveness and flexibility of languages like Python with the speed and efficiency of languages like C and C++. Speed here means two things: code can be written and bugs fixed quickly, and the resulting programs can efficiently handle massive datasets like those found in modern biological databases.
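As a small illustration of that expressiveness, here is a minimal sketch that computes the GC content of a DNA sequence with an iterator chain. The function name `gc_content` and the example sequence are our own choices for illustration, not part of any particular library.

```rust
/// Fraction of G and C bases in a DNA sequence (case-insensitive).
/// Returns `None` for an empty sequence. Assumes ASCII input.
fn gc_content(seq: &str) -> Option<f64> {
    if seq.is_empty() {
        return None;
    }
    // Count the bytes that are G or C, ignoring case.
    let gc = seq
        .bytes()
        .filter(|b| matches!(b.to_ascii_uppercase(), b'G' | b'C'))
        .count();
    Some(gc as f64 / seq.len() as f64)
}

fn main() {
    let seq = "ATGCGCGTTA"; // hypothetical example sequence
    match gc_content(seq) {
        Some(frac) => println!("GC content of {seq}: {frac:.2}"),
        None => println!("empty sequence"),
    }
}
```

The high-level `filter` and `count` calls read much like Python, yet they are compiled down to a plain loop over the bytes of the sequence.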
One of the features that make Rust so fast is that it compiles to native code across multiple platforms, from Windows to macOS to Linux. Put simply, this means that code written in Rust is translated ahead of time from the form that programmers understand into instructions that the computer can execute directly, without the additional translation steps at run time that interpreted languages like Python require. This speeds up the execution of the program.
Rust is unique in that while it is fast like C and C++, it is also very safe to use. In fact, Rust originated out of Mozilla Research in 2010 as a more reliable, safer alternative to C++. That safety is thanks to its efficient memory management system, which prevents programmers from accessing memory after it has been released, or from releasing it more than once. This means that if there is a memory error in Rust code, which could cause crashes or create vulnerabilities that hackers can abuse, the compiler reports it and the programmer is forced to fix it before the program even runs. In other words, outside of code explicitly marked as unsafe, Rust simply won’t compile programs that attempt unsafe memory usage, making this language a great option to use with highly sensitive data.
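To make this more concrete, here is a minimal sketch of how Rust tracks who owns a piece of memory. The function names and the example reads are hypothetical and chosen only for illustration.

```rust
/// Takes ownership of the reads; the vector is freed when this function returns.
fn total_bases(reads: Vec<String>) -> usize {
    reads.iter().map(|r| r.len()).sum()
}

/// Only borrows the reads; the caller keeps ownership.
fn longest_read(reads: &[String]) -> Option<&String> {
    reads.iter().max_by_key(|r| r.len())
}

fn main() {
    let reads = vec![String::from("ACGT"), String::from("GGCATA")];

    // Borrowing: `reads` is still usable afterwards.
    if let Some(longest) = longest_read(&reads) {
        println!("longest read: {longest}");
    }

    // Moving: ownership passes to `total_bases`, which frees the memory on return.
    let n = total_bases(reads);
    println!("total bases: {n}");

    // Uncommenting the next line is rejected at compile time
    // (error E0382: borrow of moved value: `reads`).
    // println!("{}", reads.len());
}
```

Because the compiler knows that `reads` was handed over to `total_bases`, any later attempt to touch that memory is caught before the program is ever run, rather than surfacing as a crash in production.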
That all sounds great, right? Unfortunately, there is one downside: compared to other programming languages, Rust has a relatively steep learning curve. However, the community of Rust users, self-styled as “Rustaceans”, has tried to ameliorate this by bundling everything needed to build a Rust program into a single toolchain. This makes it easier for newcomers to get going with Rust right away, without needing to download any additional tools. Thanks to its speed, safety and vibrant community, Rust has quickly become one of the most loved programming languages according to the most recent yearly Stack Overflow survey.
Do you have a large database that needs to be analysed both efficiently and safely? Then perhaps Rust is the language for you – and BioLizard is here to help! Be sure to contact us to discuss your data analysis needs, and return here next week to learn about a different programming language that is particularly suited to tackling heterogeneous data.
This text was prepared in collaboration with Erik Vandeputte – Software & IT architecture Team Lead.