Activity prediction benchmarking

27 Jun 2025

Why the future of structure-based drug discovery depends on better benchmarking for pose and activity prediction.

by Dr. Neil Taylor

There’s a direct link from the 2024 Nobel Prize in Chemistry – awarded in part for breakthroughs in protein structure prediction – back to the creation of the Critical Assessment of Structure Prediction (CASP) challenge in the 1990s. CASP gave researchers an invaluable opportunity: to test computational predictions against unreleased experimental data, moving the field beyond retrospective validation. This rigorous, community-driven benchmarking has been key to advancing accuracy and reliability in structural biology.

In this article, we explore a similar call to action emerging within the small molecule drug discovery community. Building on discussions from the recent perspective “The Need for Ongoing Benchmarking in Binding Pose and Activity Prediction” (Kramer et al., J. Chem. Inf. Model., 2025, 65, 2180-2190), we further examine the case for establishing sustained, transparent benchmarking frameworks for computational drug design – particularly in predicting ligand binding poses and affinities.

Below, we look at why current approaches in structure-based drug discovery still fall short – drawing on the recent benchmarking analyses by Kramer et al. – and explain, with added context, why challenges in pose and activity prediction matter so much. In a nutshell, by pairing robust benchmarking with modern cheminformatic and bioinformatic tools such as molecular dynamics simulations and machine learning, the industry has a clear opportunity to raise the standard of computer-aided drug discovery.

Background

The Kramer et al. paper addresses significant barriers to faster progress in structure-based drug discovery (SBDD). While SBDD has contributed meaningfully to the development of clinical candidates, the underlying tasks, particularly predicting how small molecules (ligands) bind to protein targets and how tightly they bind, remain surprisingly inconsistent.

This is largely due to a shortage of the high-quality experimental data needed to develop and validate new computational methods. For example, the authors reference a study in which only 26% of noncovalently bound ligands and 46% of covalent inhibitors could be regenerated to within 2.0 Å RMSD of the experimental pose, highlighting the difficulty that both molecular simulation and ligand docking approaches face in real-world scenarios.
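To make the 2.0 Å criterion concrete, here is a minimal Python sketch of how a pose success rate of this kind can be computed. It is an illustration only, not the protocol used in the cited study: it assumes the predicted and experimental poses share the same coordinate frame and atom ordering, and it ignores ligand symmetry, which real evaluations usually correct for. The function names and the toy coordinates are our own.

```python
import math


def pose_rmsd(predicted, reference):
    """RMSD in Å between two poses given as matched lists of (x, y, z) atoms.

    Assumption: both poses are already in the same (protein) frame, so no
    superposition is done, and atom i in one pose is atom i in the other.
    """
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("poses must be non-empty and have the same atom count")
    squared = sum(
        (px - rx) ** 2 + (py - ry) ** 2 + (pz - rz) ** 2
        for (px, py, pz), (rx, ry, rz) in zip(predicted, reference)
    )
    return math.sqrt(squared / len(predicted))


def success_rate(rmsds, threshold=2.0):
    """Fraction of poses whose RMSD is at or below the threshold (in Å)."""
    if not rmsds:
        raise ValueError("need at least one RMSD value")
    return sum(r <= threshold for r in rmsds) / len(rmsds)


# Toy data: a three-atom "ligand" and two hypothetical predictions.
reference = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.5, 0.0)]
close = [(x + 0.5, y, z) for x, y, z in reference]  # shifted 0.5 Å along x
far = [(x, y, z + 3.0) for x, y, z in reference]    # shifted 3.0 Å along z

rmsds = [pose_rmsd(close, reference), pose_rmsd(far, reference)]
print(rmsds, success_rate(rmsds))
```

In this toy case one of the two predictions falls within 2.0 Å, so the success rate is 50%.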

Methodological overview

Kramer et al. provide a concise overview of existing methods for binding pose prediction and affinity scoring – generally categorised into two core stages: sampling and scoring. These approaches have evolved from combinations of physics-based, knowledge-based and empirical terms, to more recent strategies that integrate machine learning, deep learning, and even large language models.

With over 100 references cited, the authors thoroughly explore how these methodologies are applied today, and the limitations posed by outdated or non-diverse benchmark datasets. They advocate for developing new, high-quality datasets to enable better validation of emerging techniques in both ligand-based drug design and fragment-based drug discovery.

Critical insights

One of the paper’s strongest contributions is its spotlight on the lack of long-term community benchmarks in the pose- and activity prediction (P-AP) field. Unlike protein structure prediction, which has been continually improved through CASP for over 30 years, the small molecule drug discovery community lacks equivalent, sustained frameworks for progress.

This gap makes it harder for researchers to compare methods and track improvements in areas such as molecular dynamics (MD) simulation and binding mode prediction. Without standardised evaluations, innovations remain difficult to benchmark, limiting their broader adoption.

Limitations and challenges

The authors highlight several well-recognised issues that complicate benchmarking efforts:

Overlap between training and evaluation datasets
Use of unconfirmed decoys
Structurally complex or flexible binding sites
Presence of non-physical or artefactual poses
Variability in experimental data quality

These challenges are particularly relevant when using force-field-based methods, such as molecular dynamics simulations, which rely on high-fidelity structural data. Such issues help explain why progress in pose and activity prediction has lagged behind advancements in other areas of computational biology.
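The first item in the list above, overlap between training and evaluation datasets, is the easiest to check mechanically. The sketch below is a deliberately simple illustration and is not taken from the paper: it assumes each complex can be reduced to a hashable identifier (here a hypothetical target and ligand label pair) and flags only exact matches. A realistic leakage check would also consider near-duplicates, for example highly similar proteins or ligands.

```python
def leakage_report(train_ids, eval_ids):
    """Report exact-identifier overlap between a training and an evaluation set.

    Assumption: identifiers are hashable and comparable, e.g. (target, ligand)
    tuples. Only exact matches are counted; similar-but-not-identical entries
    are not detected by this sketch.
    """
    train, evaluation = set(train_ids), set(eval_ids)
    shared = train & evaluation
    fraction = len(shared) / len(evaluation) if evaluation else 0.0
    return {"shared": sorted(shared), "eval_fraction_leaked": fraction}


# Hypothetical identifiers, for illustration only.
train_set = [("target_A", "lig_1"), ("target_A", "lig_2"), ("target_B", "lig_3")]
eval_set = [("target_A", "lig_2"), ("target_C", "lig_4")]

report = leakage_report(train_set, eval_set)
print(report)
```

Here one of the two evaluation entries also appears in the training set, so half of the evaluation set would be reported as leaked.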

Future directions

Kramer et al. lay out a practical and aspirational roadmap for future benchmarking. Key recommendations include:

Introducing blinded evaluation methods for greater objectivity
Developing diverse datasets that reflect real-world therapeutic targets
Encouraging continuous updates and releases of benchmarking sets
Promoting collaboration across academia, industry, and competing organisations
Integrating cutting-edge technologies such as molecular dynamics and AI-based prediction tools

Together, these steps could radically improve how new computational methods are assessed, ultimately accelerating the pace of drug discovery.

Conclusion

This paper brings together twelve authors from leading institutions across Europe and North America – each with extensive experience in computational chemistry, medicinal chemistry, and applied drug discovery. Their call for the community to support long-term benchmarking efforts for pose- and activity prediction is timely and well justified.

One particularly impactful recommendation is the inclusion of activity cliffs – cases where similar molecules show vastly different binding affinities or modes. These are some of the most valuable and challenging examples for evaluating new technologies in ligand design, including de novo design and fragment-based optimisation.
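As a rough illustration of what flagging activity cliffs can look like in practice, the Python sketch below pairs up compounds and reports those that are structurally similar yet far apart in activity. Everything specific here is an assumption made for the example rather than something the paper prescribes: fingerprints are stand-in sets of integer feature IDs, similarity is the Tanimoto coefficient, activity is a pIC50-style value, and the thresholds (similarity of at least 0.7, activity difference of at least 2 log units) are arbitrary.

```python
from itertools import combinations


def tanimoto(a, b):
    """Tanimoto coefficient between two fingerprints given as sets of features."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def find_activity_cliffs(compounds, min_similarity=0.7, min_delta=2.0):
    """Return name pairs that are similar in structure but differ in activity.

    `compounds` maps a name to (fingerprint_set, activity). Both thresholds
    are illustrative assumptions, not values recommended by any standard.
    """
    cliffs = []
    for (name_a, (fp_a, act_a)), (name_b, (fp_b, act_b)) in combinations(
        compounds.items(), 2
    ):
        similar = tanimoto(fp_a, fp_b) >= min_similarity
        if similar and abs(act_a - act_b) >= min_delta:
            cliffs.append((name_a, name_b))
    return cliffs


# Hypothetical compounds: cpd_1 and cpd_2 differ by a single feature but by
# 2.6 log units in activity; cpd_3 is structurally unrelated to both.
compounds = {
    "cpd_1": ({1, 2, 3, 4, 5, 6, 7, 8}, 8.5),
    "cpd_2": ({1, 2, 3, 4, 5, 6, 7, 9}, 5.9),
    "cpd_3": ({20, 21, 22, 23}, 5.0),
}

cliffs = find_activity_cliffs(compounds)
print(cliffs)
```

Only the cpd_1 and cpd_2 pair is flagged: cpd_3 differs strongly in activity from cpd_1 but shares no features with it, so it does not count as a cliff.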

For those working in molecular simulation, knowledge-based cheminformatics, ligand-based drug design, or SBDD more broadly, this paper offers an important roadmap for strengthening foundational tools in computer-aided drug design.

Dr. Neil Taylor, founder of DesertSci, is a leading expert in structure-based drug design – connect with him on LinkedIn to learn more.
