AI in the life sciences: (Un)limited potential?
Artificial Intelligence (AI) is swiftly becoming a cornerstone technology in the early stages of drug discovery and preclinical development. We're witnessing an increasing number of AI-driven breakthroughs in this field… right? Let’s take a critical look together.
Challenges of applying AI in a biomedical context
AI performs best on large data sets that are relatively well defined and simple in structure, such as text for large language model (LLM) applications, or the domains of protein folding and small molecules. Biological and molecular data present a different challenge: they are highly variable, complex, and often incompletely sampled, which makes applying AI less straightforward.
Take clinical cohorts, for example. These cohorts are frequently drawn from a biased population, and even small changes in experimental procedures can significantly affect the molecular readouts, which in turn affects the results derived from AI applications. This is one of the reasons why, despite a growing number of biomarkers being identified each year, fewer than 1% are actually validated - and even this small number is, worryingly, on the decline.
On top of that, what's often labeled as “AI” might actually be more about applying mathematical and statistical methods to big data. This perspective is supported by studies that compare traditional algorithmic approaches with neural networks and often find comparable results. This makes it tricky to determine exactly which successes are truly AI-driven.
In short, while AI holds immense promise in revolutionising drug development, the journey is not without its hurdles, especially when navigating the complex and often unpredictable landscape of biological data. The current applications of true AI in early discovery and preclinical contexts are limited and - importantly - require experimental validation.
Where AI can shine brightest in biomedicine
One of the most significant opportunities for AI lies in its ability to reduce failure rates in drug discovery and research & development. AI's true power emerges when it integrates the entire spectrum of available data, from public sources and proprietary databases to expert insights, to guide decision-making. By evaluating the properties of candidate projects early and weeding out those with low potential, AI can significantly streamline the R&D journey.
In this context, AI isn't just a standalone tool; it becomes a synergistic partner when combined with the right (human) domain expertise. AI is especially effective at crunching large volumes of data - allowing human experts to focus their time and efforts more effectively. Together, human experts and AI can fulfil the promise of more efficient and effective drug discovery.
Alongside this concept of human-AI collaboration, another key to harnessing AI's power in drug discovery and R&D lies in considering not only the quantity of data, but also its quality and complexity. Reliable and representative data is crucial for AI to perform optimally in biomedicine. Therefore, creating an environment conducive to processing and learning from high-quality data is a critical step in unleashing AI's transformative potential.
AI and public data
Public data, widely recognised for its value, is one such resource AI can leverage for gaining insights. However, there are still many challenges to be overcome - including the fact that the public data landscape is far from uniform. Moreover, not all publicly available data is of high quality and some datasets are frankly subpar.
The considerable differences in experimental procedures, data annotations, and overall data quality across various studies present a significant challenge. For example, with regard to data collection, the studies from which public data is derived are often conducted under unique conditions, which may not align perfectly with the goals of repurposing the data. Another major hurdle is the inconsistency in metadata annotations, which can range from experimental procedures to disease classifications. Even within the same institution, there may be no consensus on, for instance, the exact criteria for classifying disease severity levels, leading to divergent judgments among physicians. All of this means that effective data integration, while essential, is by no means straightforward.
Hence, it's vital to carefully assess the applicability of each dataset for its intended (re)purpose, conduct thorough quality control, and invest significant effort in integrating data effectively. Overcoming these hurdles is essential to fully leveraging AI in advancing drug discovery and research.
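To make the annotation problem concrete, here is a minimal sketch of one small quality-control step: harmonising free-text disease severity labels from different studies into a shared vocabulary, and flagging anything that cannot be mapped for manual review. The label values, the `SEVERITY_SYNONYMS` mapping, and the `harmonise_severity` function are all hypothetical and chosen for illustration only; a real mapping would need to be agreed with domain experts for each disease area.

```python
# Hypothetical synonym table: raw study labels -> shared vocabulary.
# The entries are illustrative assumptions, not a clinical standard.
SEVERITY_SYNONYMS = {
    "mild": "mild",
    "grade 1": "mild",
    "moderate": "moderate",
    "grade 2": "moderate",
    "severe": "severe",
    "grade 3": "severe",
}


def harmonise_severity(labels):
    """Map raw severity labels onto the shared vocabulary.

    Returns a tuple (harmonised, unmapped):
    - harmonised: one entry per input label, None where no mapping exists
    - unmapped: the raw labels that need manual curation
    """
    harmonised = []
    unmapped = []
    for raw in labels:
        key = raw.strip().lower()  # ignore case and surrounding whitespace
        if key in SEVERITY_SYNONYMS:
            harmonised.append(SEVERITY_SYNONYMS[key])
        else:
            harmonised.append(None)
            unmapped.append(raw)
    return harmonised, unmapped
```

Labels that fall outside the table are deliberately left unmapped rather than guessed, so that a human curator decides how they should be treated.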
How to proceed
At BioLizard, we have acquired a deep understanding of what it takes for AI to succeed in biological research, and developed an approach based on a blend of biological insight and computational prowess.
A first important step in applying AI to biomedical data is to carefully curate and assess data inputs, aided by computational tools that streamline the manual assessment. There are two different strategies that can work to accomplish this:
- Meticulously curating and annotating datasets specifically chosen to answer certain biological questions, followed by an in-depth analysis of this data. AI can be used to assist this process, but human curation remains important to achieving high-quality datasets.
- Gathering vast amounts of data, integrating it, and then using AI as a mediator to unveil recurring and meaningful biological patterns and remove noise. This approach is particularly effective in well-studied disease areas, and is grounded in the belief that despite biases and potential mis-annotations, a large enough dataset will inherently contain valid biological signals. But a word of caution is appropriate here: if not carefully managed, systematic biases can be amplified by large datasets…
It’s also possible to take a hybrid approach. For instance, we might compare proprietary datasets with public ones, measuring the 'distance' between them to filter out data that significantly deviates from our biological interest. Alternatively, we might select datasets based on specific molecular criteria from the outset.
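The distance-based filtering described above could look roughly like the sketch below. It is an illustration under stated assumptions, not a description of BioLizard's actual method: each dataset is assumed to be a samples x features matrix over the same set of features (for example, shared genes), the 'distance' is taken to be the correlation distance between mean feature profiles, and the function names and the `max_distance` threshold are hypothetical. In practice the choice of distance measure and cut-off depends on the data type and the biological question.

```python
import numpy as np


def profile_distance(reference: np.ndarray, candidate: np.ndarray) -> float:
    """Correlation distance (1 - Pearson r) between mean feature profiles.

    Both inputs are samples x features matrices with identical feature order.
    Returns a value between 0 (same profile shape) and 2 (opposite shape).
    Note: a completely flat mean profile has no defined correlation (NaN).
    """
    ref_profile = reference.mean(axis=0)
    cand_profile = candidate.mean(axis=0)
    r = np.corrcoef(ref_profile, cand_profile)[0, 1]
    return float(1.0 - r)


def filter_public_datasets(reference: np.ndarray,
                           public: dict,
                           max_distance: float) -> dict:
    """Keep only public datasets within max_distance of the reference.

    Returns a dict mapping dataset name -> distance for the retained sets.
    """
    kept = {}
    for name, data in public.items():
        distance = profile_distance(reference, data)
        if distance <= max_distance:
            kept[name] = distance
    return kept
```

A proprietary dataset would be passed as `reference` and the public candidates as a dictionary of named matrices. Datasets that are filtered out are worth a second look before being discarded, since a large distance can reflect a technical batch effect rather than a genuine biological difference.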
Unique biological questions pose unique challenges and require custom approaches
Based on experience, we know that each biological problem and data architecture presents a unique challenge - necessitating bespoke analytical solutions. At BioLizard, our goal is not just to process data, but to transform it into valuable insights, tailored to the specific needs and questions of each project.