Discovering hidden outliers in PDB bulk releases with automated analysis

30 Aug 2025

How automated protein structure analysis can uncover hidden conformations and transform drug discovery.

By Dr Neil Taylor

The global Protein Data Bank (PDB) releases hundreds of new protein structures each month. This is a major asset for science worldwide, but it also makes it hard for researchers to stay current with a rapidly expanding resource. The challenge is compounded by bulk releases such as PanDDA analysis group depositions, where more than 100 closely related structures are deposited simultaneously.

PanDDA datasets present unique analytical hurdles. Unlike typical protein structures, their coordinate files encode multiple conformational states, with the biologically relevant ligand-bound state often present at lower occupancy. As a result, the state most relevant for drug discovery is easily overlooked.
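To make the occupancy point concrete, here is a minimal sketch that scans a legacy PDB-format coordinate file for non-solvent HETATM groups and reports those with a mean occupancy below 1.0, grouped by alternate-location label. It relies only on the fixed column layout of the PDB format. The function names, the solvent list, and the threshold are illustrative assumptions, not part of Proasis or PanDDA, and deposition conventions for labelling the bound state may vary between datasets.

```python
from collections import defaultdict

# Residue names treated as solvent and skipped (assumed list, extend as needed).
SOLVENT = {"HOH", "WAT", "DOD"}


def ligand_occupancies(pdb_text):
    """Return {(resname, chain, resseq, altloc): mean occupancy} for HETATM groups.

    Uses the fixed columns of the legacy PDB format: altLoc in column 17,
    resName in 18-20, chainID in 22, resSeq in 23-26, occupancy in 55-60.
    """
    sums = defaultdict(lambda: [0.0, 0])
    for line in pdb_text.splitlines():
        if not line.startswith("HETATM") or len(line) < 60:
            continue
        resname = line[17:20].strip()
        if resname in SOLVENT:
            continue
        altloc = line[16].strip() or "-"  # "-" marks "no alternate location"
        chain = line[21]
        resseq = int(line[22:26])
        occupancy = float(line[54:60])
        entry = sums[(resname, chain, resseq, altloc)]
        entry[0] += occupancy
        entry[1] += 1
    return {key: total / count for key, (total, count) in sums.items()}


def partial_occupancy_ligands(pdb_text, threshold=1.0):
    """Keep only the groups whose mean occupancy is below the threshold."""
    return {
        key: occ
        for key, occ in ligand_occupancies(pdb_text).items()
        if occ < threshold
    }
```

Running `partial_occupancy_ligands` over each file in a bulk release gives a quick list of ligands modelled at partial occupancy, which is a reasonable first filter before looking at the conformational state they belong to.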

Case study: SARS-CoV-2 NSP3 macrodomain complexes

Let’s consider a recent bulk release of SARS-CoV-2 NSP3 macrodomain crystal structures. Hidden within this dataset were structural outliers that would be easily missed by traditional manual analysis methods.

Using a robust and automated protein structure database platform such as Proasis by DesertSci, researchers can systematically classify and compare large datasets to reveal subtle but important differences. Figure 1 demonstrates this capability, showing high-resolution structures (<1.0 Å) superimposed to highlight protein backbone variations and bound ligands.


Figure 1. High-resolution structures (<1.0 Å) superimposed to highlight protein backbone variations and bound ligands

The remarkable discovery

One particular outlier revealed an extraordinary insight. Despite its ligand appearing almost identical to others in the dataset, a subtle chemical difference triggered a dramatic protein loop flip (see arrow) in the SARS-CoV-2 NSP3 protein. This rearrangement represents the kind of activity cliff that medicinal chemists need to identify and understand.

Figure 2. A subtle chemical difference triggers a dramatic protein loop flip in the SARS-CoV-2 NSP3 protein
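A loop flip of this kind can, in principle, be flagged numerically once the structures are superimposed. The sketch below is not the method Proasis uses, which this article does not describe. It is one simple, assumed approach: take the alpha-carbon positions of already superimposed structures, build a per-residue median reference, and report the residues in each structure that sit further than a cutoff from that reference. The function name, the input layout, and the 2.0 Å cutoff are all hypothetical.

```python
import math
from statistics import median


def flag_backbone_outliers(ca_coords, cutoff=2.0):
    """Flag structures whose C-alpha atoms deviate from the ensemble median.

    ca_coords maps a structure name to a list of (x, y, z) C-alpha positions.
    All structures are assumed to be superimposed already and to list the
    same residues in the same order. Returns {name: [residue indices]} for
    structures with at least one residue further than `cutoff` (in Å) from
    the coordinate-wise median position of that residue.
    """
    names = list(ca_coords)
    n_res = len(ca_coords[names[0]])
    if any(len(ca_coords[name]) != n_res for name in names):
        raise ValueError("all structures must have the same number of residues")

    # Coordinate-wise median per residue: robust to a few flipped loops.
    reference = [
        tuple(median(ca_coords[name][i][axis] for name in names) for axis in range(3))
        for i in range(n_res)
    ]

    outliers = {}
    for name in names:
        moved = [
            i
            for i in range(n_res)
            if math.dist(ca_coords[name][i], reference[i]) > cutoff
        ]
        if moved:
            outliers[name] = moved
    return outliers
```

A median reference is used rather than a mean so that a single rearranged structure does not drag the reference towards itself. In a real dataset the cutoff would need tuning against the normal backbone variation of the protein.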

Why this matters for drug discovery

Large conformational changes caused by small ligand modifications are critical for structure-based drug design. They reveal:

Structure-activity relationships that explain potency cliffs
Cryptic binding pockets that open opportunities for novel ligand design
Mechanistic insights into protein flexibility that inform lead optimisation

These kinds of discoveries can lead to breakthroughs and provide a competitive edge for teams working on next-generation therapeutics.

The power of automation

The Proasis application transforms how researchers uncover such insights. By automating binding site classification, ligand recognition, structure overlays, and visualisation, it enables exploration of large protein structure datasets in just a few clicks.

In an era of exponential structural data growth, automated analysis isn’t just a convenient timesaver; it’s an essential tool for structural biologists, computational chemists, and medicinal chemists alike. Without tools like Proasis, critical structural outliers risk being overlooked. With them, opportunities for breakthrough discoveries in drug design present themselves more clearly.

Dr Neil Taylor, founder of DesertSci, is a leading expert in protein structure data systems and structure-based drug design. Connect with him on LinkedIn to explore how accessible 3D protein structure data can accelerate your research.
