Blog

Data governance & architecture

13 December 2022

8 minutes

A step-by-step guide to R&D data management

Devising a preclinical data management time is a worthwhile investment for de-risking drug development

Having a great data management plan allows you to spend your workdays on your true area of expertise, instead of worrying and wondering how to store, or where to find, your data. However, in the field of life sciences, it can be difficult for highly specialised R&D scientists and IT professionals to come to a common understanding of what data management means, and to work together to put effective data management strategies into practice.

At BioLizard, our expertise spans both computer science and biology – meaning that we can act as effective translators to align the needs of IT and R&D departments, and design and implement an R&D data management plan that works for everybody. In this blog, we’ll guide you through exactly how we accomplish that.

Step one: Define your strategic goals and scientific use cases

This is where the fun begins! Before implementing a new system, it’s important to envision the future of data in your organization. Perhaps your dream is to make use of a layered multi-omics and systems biology approach to deepen your biomolecular understanding of diseases and improve the research and development success rate at your biopharma company. Or, maybe your first priority is to get better at reusing and repurposing the data that your organization has already generated – for instance, to find new disease indications to treat with existing drugs.

No matter what your goals may be, it’s important to ask yourself, which data-based business insights are key for your organization? And what kinds of data could support your unique selling points? Knowing the answers to these questions will make it possible to create a truly effective, tailored strategy for applying data science to get the most out of all of your unique current and future data assets.

Step two: Assess current data management strengths & weaknesses

Once this vision is in place, the next step is to comprehensively understand your organization’s data as it currently stands. To accomplish this, we always start by conducting stakeholder interviews within both IT and R&D departments. Stakeholder interviews allow us to understand what both R&D and IT departments want, and to begin to envision how all stakeholders can work together towards the common goal of implementing a great data management plan.

At this stage, everybody should come to an agreement about the definition of different terms, and start “speaking the same language”. For instance, commonly used words like ‘customer’, ‘project’, or ‘experiment’ might have different definitions for different people across the organization. Agreeing on a common definition for oft-used terms helps to smooth over future discussions about data management.

Next, we can start to get an idea of what the main data flows are, within the organization. This involves understanding details like who generates data, who uses different types of data, how communication is handled when one chunk of data is handed over to another department, how data is accessed, and what the current quality of data is. Getting an overview of “where we’re at” helps with beginning to envision “where we’re going”.

In the first blog in this series, we already covered some of the common data management challenges that many companies face – and it might seem like a formidable task indeed to solve all of those potential pitfalls in one fell swoop. But luckily, there is a useful framework of guiding principles for data management that we can use to do exactly that!

Step three: FAIR-ifying your data

Take a moment to envision a future where all of your scientific data and all of the research software that you use is not only easily findable and accessible but also both interoperable and reusable across platforms and cases. Wouldn’t that make your work life so much simpler?

We referred to the idea of a ‘good data management plan’ several times in the first blog in this series. But of course, what exactly that means can seem rather nebulous. Implementing the FAIR guiding principles for scientific data management and stewardship is a key step in achieving a good data management plan and eventually getting the most out of your data.

So what are these principles, you ask? In brief, the FAIR principles proclaim that well-managed data should be Findable, Accessible, Interoperable, and Reusable. Let’s go through each of these FAIR terms one by one:

The FAIR principles cover data findability, accessibility, interoperability, and reusability

Wilkinson, M. D. et al. The FAIR Guiding Principles for scientific data management and stewardship. Sci. Data 3:160018

Findability: this principle asserts that (meta)data should be able to be easily found – by both machines and by humans – and details how to accomplish that, for instance by making sure that both data and metadata are indexed in an easily searchable manner. In practice, this might mean that data is no longer stored hidden away within excel sheets that are difficult for coworkers to find, but rather in one centralized system.

Accessibility: this principle is all about the importance of making data easy to retrieve – both by humans and machines. For example, it states that (meta)data should be retrievable by their unique identifier using an open, free, and universally implementable communications protocol. Practically, this could involve explicitly including information about which (preferably open access) software is necessary to process specific data types.

Interoperability: this principle contends that all (meta)data should use a common language, which is formal, but accessible and broadly applicable, to allow different types of data to work together with each other. In effect, this can mean ensuring that units of measurement are consistently indicated in datasets, or taking care to annotate any programming scripts so that they are understandable for future users.

Reusability: this principle describes how data should be able to be used for purposes beyond its original intended or envisioned use. In practice, this might involve linking documentation detailing the instruments used to acquire the original data and how it was originally analyzed so that it’s clear how the data were obtained and processed.

Nowadays, the principles of FAIRness are widely accepted and can be applied not only to data but also to software, with the common goal of maximizing research value and efficiency. Key stakeholders like science funders, publishers, and governmental agencies are also beginning to require data management and stewardship plans for data generated in publicly funded experiments. So, FAIRification of your data might not only benefit your process of scientific discovery, but it may also eventually become necessary to align with external data management guidelines.

Step four: Writing an R&D data management roadmap

Ok, now that we have an idea of where our data management plan stands, it’s time to build a roadmap for bringing it to the next level – always keeping in mind your organization’s unique selling points and key business objectives.

The first action point here is to build a task force that can take charge of facilitating and eventually maintaining a new and improved, FAIR data management plan. And, the first job of that task force is to co-establish a vision for what a new and improved, future-proof, scalable, and FAIR data management plan should look like based on the organization’s unique priorities.

At this stage, getting some key stakeholders on board to assist in the setup and the maintenance of the new data management plan is key. Some key roles for facilitating and safeguarding the future updated data management plan include…

An executive sponsor, who understands the data management vision and programme, and will make sure that it gets the resources that it needs;
Data champions, who perform as advisors for FAIR data management practices;
and Data stewards, who enact and ensure that standard operating procedures for data creation and use alongside FAIRness principles are applied consistently.

These and other types of stakeholders can then work together with the task force to outline priorities and pain points for data management in more detail, which allows the task force to ensure that they design a workable agreement for everybody. We always recommend thinking big but starting small, to be able to deliver short-term solutions and gain traction for implementing the broader objectives later on.

Step five: Making the R&D data management roadmap a reality

This step can seem daunting, and that is why BioLizard goes the extra mile and makes sure to help organizations actually implement the data management plan of their dreams. Again, we always recommend starting small, with easy-to-implement, short-term recommendations to create some traction and get some quick wins. This can mean first tackling things like standardizing data formatting, improving already-existing automated data processing pipelines, or creating user-friendly dashboards for data visualization.

Once the data management ball gets rolling by implementing these quick wins, and people start to see all of the scientific benefits that a great data management plan can bring, it gets easier to focus on more long-term digital transformation to support data-driven innovation – like transforming all data and metadata into FAIR digital objects, or implementing Continuing Professional Development training on data management principles.

The R&D data management plan of your dreams

Once you have a great data management plan, you will no longer have to think about data management on a daily basis. Instead of spending your work days wondering (or worrying) about how to find or store your data, you are free to instead spend them on your true area of expertise!

This means that, although it does require some upfront investment, implementing a good data management plan usually ends up saving organizations a lot of time and money, because their data usage becomes more efficient and scalable, employees gain more focused time rather than spending their hours on manual data management, and brand-new insights can be generated thanks to a less biased and more inclusive approach to data collection and analysis.

We’ll get more into the benefits of having a great data management plan in the next blog in this series – but if this already sounds like the outcome you want for your organization, BioLizard is ready to become your data partner.

BioLizard is uniquely situated to support the creation of a strategic R&D data management plan, due to our combined expertise and understanding that spans the drug discovery process, biobusiness, and data analytics. Contact us today to find out more about how we can help you implement the data management plan of your dreams!

A step-by-step guide to R&D data management

Step one: Define your strategic goals and scientific use cases

Step two: Assess current data management strengths & weaknesses

Step three: FAIR-ifying your data

Step four: Writing an R&D data management roadmap

Step five: Making the R&D data management roadmap a reality

The R&D data management plan of your dreams

Help me with my data management!

Recommended Reading

BioLizard

Stay informed, connected, and inspired

Keep in touch

A step-by-step guide to R&D data management

Step one: Define your strategic goals and scientific use cases

Step two: Assess current data management strengths & weaknesses

Step three: FAIR-ifying your data

Step four: Writing an R&D data management roadmap

Step five: Making the R&D data management roadmap a reality

The R&D data management plan of your dreams

Help me with my data management!

Recommended Reading

Leveraging R&D data management to support data-driven R&D and innovation

Using data management to generate value

BioLizard

Stay informed, connected, and inspired

Keep in touch