Three reasons why you should use machine learning for protein engineering and design
03 Jul 2023
“Machine learning”. It’s both a buzzword and an exciting method – but how, and perhaps more importantly why, should you start using machine learning for protein engineering and design?
Put simply, AI can turn your data into an asset. Applied effectively, machine learning (ML), which is one form of artificial intelligence (AI), can transform your data from a difficult-to-wrangle mass of numbers into a clear, detailed, and data-driven story.
Today, we are diving into the top three reasons why you should start to harness the power of ML for protein engineering and design!
Reason #1: Use machine learning to find patterns in an abundance of data
In the field of protein engineering and design, there is a huge amount of available data – which makes this field very suitable for applying AI strategies. In fact, this huge amount of data can be difficult to effectively integrate in an unbiased way without AI.
Excitingly, we are starting to have sufficient available data to learn general patterns that apply to nearly all proteins. And, if you can decipher patterns within this enormous bulk of unrelated proteins, those same patterns can shed immense light on the way that nature puts amino acids together to make stable proteins. These patterns can then guide tasks such as rapidly engineering a new protein that is stable within blood. In other words, we can harness the knowledge contained in the massive volumes of existing protein sequence data by embedding it into models for protein engineering – thereby creating new value out of existing data! The now-famous AlphaFold is one example of the potential that this approach has.
This approach is important because the proteins in synthetic protein libraries tend to be less stable than those in in vivo protein libraries. Even though their amino acid sequences seem very similar to the sequences within in vivo libraries, the synthetic proteins lack some feature, some rule, some pattern, that is present within the ‘natural’ proteins and makes them stable. Until now, it has been very difficult to tease out exactly what this feature or rule for stability might be – there are just too many variables for us humans to easily comprehend. However, ML can be applied to capture this essence of stability and embed it into synthetic libraries, so that synthetic sequences more closely resemble natural sequences and become more stable. These so-called protein “embeddings” – the encoding of functional and structural properties of natural proteins into a new synthetic product via ML – can create huge value and generate exciting new insights in otherwise challenging projects, and all at a lower cost than would be needed to create new in vivo libraries.
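To make the idea of "learning a pattern from natural sequences" a little more concrete, here is a deliberately tiny Python sketch. It is not the method described above and not a real protein embedding: it simply assumes a set of aligned, equal-length natural sequences, counts which amino acids occur at each position, and scores a synthetic candidate by how well it matches those counts. The function names (`build_profile`, `naturalness_score`) and the toy sequences are made up for illustration.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def build_profile(natural_seqs, pseudocount=1.0):
    """Per-position log-probabilities of each amino acid.

    Assumes the sequences are already aligned and of equal length.
    The pseudocount keeps unseen residues from getting probability zero.
    """
    length = len(natural_seqs[0])
    if any(len(seq) != length for seq in natural_seqs):
        raise ValueError("all sequences must have the same length")
    total = len(natural_seqs) + pseudocount * len(AMINO_ACIDS)
    profile = []
    for pos in range(length):
        counts = Counter(seq[pos] for seq in natural_seqs)
        profile.append(
            {aa: math.log((counts[aa] + pseudocount) / total) for aa in AMINO_ACIDS}
        )
    return profile

def naturalness_score(seq, profile):
    """Average log-probability of a sequence under the profile (higher = more natural-like)."""
    if len(seq) != len(profile):
        raise ValueError("sequence length does not match the profile")
    return sum(profile[i][aa] for i, aa in enumerate(seq)) / len(seq)

# Toy, invented data: four 'natural' sequences and two synthetic candidates.
natural = ["ACDK", "ACDR", "ACEK", "GCDK"]
profile = build_profile(natural)
for candidate in ["ACDK", "WWWW"]:
    print(candidate, round(naturalness_score(candidate, profile), 3))
```

In this toy setting the candidate that resembles the natural sequences scores higher than the one that does not. Real models learn far richer, context-dependent patterns than independent per-position counts, but the principle of scoring new sequences against patterns learned from existing data is the same.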
Reason #2: Use machine learning to pick the low-hanging fruit in your data
How much data do you have sitting in your dusty drawers that has never been used, or only been used once and never touched again?
Life science companies often have a huge abundance of data in their metaphorical filing cabinets, which may hold additional insights that haven’t been fully leveraged yet. Applying AI strategies like ML to sift through your data can help you dust it off and harvest the low-hanging fruit hiding inside. This low-hanging fruit may not be apparent without the use of AI, but it can nonetheless benefit research by providing brand-new, data-driven insights that can speed up old research lines, or even spark a new and promising project.
On top of that, data generation is usually the most expensive and time-consuming part of research. Although experimenting with innovative tech like machine learning can also be money- and time-intensive, it is usually much less resource-intensive than data generation. In other words, if you already have the data, it is often very efficient to use AI to tease out novel insights. These insights can in turn be applied to future experimental plans to add further value!
Reason #3: Apply machine learning to streamline existing production pipelines
Life science companies’ product pipelines often have some bottlenecks that can be widened by the smart application of machine learning. Life scientists have typically relied on months and months of wet lab work in order to make the jump from a large pool of potential products to a few select top candidates for further testing. But in our experience, applying ML early on in production pipelines can help automate and optimise processes, by predicting beneficial properties of antibodies, small molecules, or other biologicals, and pre-selecting promising candidates.
Similarly, BioLizard has developed ML models to create or optimise synthetic libraries that perform even better than time- and labour-intensive in vitro libraries. Machine learning can then be re-applied to these synthetic libraries, which usually consist of several million sequences, to select only the most promising targets for follow-up wet-lab experiments.
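As a rough illustration of what "pre-selecting promising candidates" can look like in code, the Python sketch below ranks a small library by a predicted property and keeps only the top few sequences. It is a toy under stated assumptions, not BioLizard's models: the predictor is a plain k-nearest-neighbour average over Hamming distance, sequences are assumed to have equal length, and all sequences, measurements, and function names (`knn_predict`, `preselect`) are invented for the example.

```python
import heapq

def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))

def knn_predict(seq, measured, k=3):
    """Predict a property as the mean of the k most similar measured sequences."""
    nearest = sorted(measured, key=lambda item: hamming(seq, item[0]))[:k]
    return sum(value for _, value in nearest) / len(nearest)

def preselect(library, measured, top_n=2, k=3):
    """Return the top_n library sequences with the highest predicted property."""
    return heapq.nlargest(top_n, library, key=lambda s: knn_predict(s, measured, k))

# Invented wet-lab measurements: (sequence, measured property, e.g. stability).
measured = [
    ("ACDK", 0.9), ("ACDR", 0.8), ("GCDK", 0.7),
    ("WWWW", 0.1), ("WWWK", 0.2), ("WWDK", 0.3),
]
# Invented, untested library members to prioritise for follow-up experiments.
library = ["ACEK", "WWWR", "ACDQ", "WYWW"]

print(preselect(library, measured))
```

The design choice worth noting is the split between a predictor and a selection step: the predictor can be swapped for a stronger model without touching the ranking logic, and only the short list that comes out goes on to wet-lab testing.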
Using ML to streamline the process of deciding whether or not to continue with developing potential products can not only enhance efficiency, save time, and reduce cost, but it can also boost the chances of experimental success. Of course, no two pipelines are the same – and that’s why BioLizard always starts with analysing the situation and goals of our clients, in order to pinpoint each client’s unique opportunities for leveraging ML to gain competitive advantages. In our experience, although there is always some initial investment of time and money involved in incorporating ML approaches into product development, the improvement in operational efficiency and product quality usually far outweighs that initial investment. For instance, AI is estimated to reduce the timeline of drug development up to phase I clinical trials from 5 years down to 1.5 years.
Artificial intelligence and machine learning are exciting tools that bring a lot of potential for application in the life sciences. But they can also seem intimidating, as can all new, flashy, and computationally intense technologies.
If you want to start applying machine learning to gain new data-driven insights, but you don’t know exactly how and where to start, BioLizard can support you. Our goal is to add value for life science companies by not only supporting the execution of machine learning and other AI tasks, but also by thinking along with clients on a strategic level to see how state-of-the-art technologies can address their unique challenges and opportunities.
Do you want to start applying machine learning in your protein design and engineering projects?