By Dr. Neil Taylor
As AI capabilities expand at breakneck speed, a provocative question emerges: Will traditional software become obsolete? Can AI simply handle everything we currently rely on specialised applications for?
To answer this, let’s examine one of the most demanding use cases: drug discovery. Specifically, the role of protein structure databases in the design of both small molecule drugs and therapeutic antibodies.
Whether you’re designing a small molecule to inhibit a kinase that drives tumour growth, or engineering an antibody to neutralise a virus, you need one critical piece of information: the precise three-dimensional structure of your target protein.
This isn’t abstract knowledge. The difference between a drug that works and one that fails often comes down to angstroms (one angstrom is one ten-billionth of a metre). A hydroxyl group positioned 2 angstroms differently can mean the difference between binding and not binding, between therapeutic effect and toxicity.
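To make that scale concrete, here is a minimal sketch of how a single contact distance is measured from atomic coordinates, which PDB and mmCIF files record in angstroms. The coordinates and the 3.5 Å cut-off below are illustrative assumptions, not values from any real structure.

```python
import numpy as np

# Illustrative coordinates in angstroms, as they appear in a PDB/mmCIF file.
# These are made-up values, not taken from any real structure.
ligand_hydroxyl_O = np.array([12.40, 8.15, 22.03])    # ligand -OH oxygen
protein_acceptor_N = np.array([13.30, 10.30, 23.80])  # e.g. a side-chain nitrogen

distance = np.linalg.norm(ligand_hydroxyl_O - protein_acceptor_N)
print(f"Donor-acceptor distance: {distance:.2f} Å")

# A typical hydrogen bond sits at roughly 2.7-3.3 Å donor-acceptor distance;
# move the hydroxyl by ~2 Å and the interaction is simply lost.
if distance <= 3.5:
    print("Within plausible hydrogen-bonding range")
else:
    print("Too far to form a hydrogen bond")
```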
This is where databases like the Protein Data Bank (PDB) become invaluable resources. The PDB contains over 200,000 experimentally determined protein structures – each one painstakingly solved through X-ray crystallography, cryo-EM, or NMR spectroscopy, then validated, annotated, and made accessible in standardised formats.
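As a small illustration of what those standardised formats make possible, the sketch below downloads a public entry from the RCSB file service and parses it with Biopython. The entry ID and the files.rcsb.org URL pattern are used as a plausible example and should be checked against current RCSB documentation.

```python
import requests
from Bio.PDB.MMCIFParser import MMCIFParser

# Example only: 1HVR is a public HIV-1 protease entry; any PDB ID would do.
pdb_id = "1HVR"
url = f"https://files.rcsb.org/download/{pdb_id}.cif"

# Download the experimentally determined structure in standard mmCIF format.
cif_path = f"{pdb_id}.cif"
with open(cif_path, "wb") as fh:
    fh.write(requests.get(url, timeout=30).content)

# Parse the file: every deposited entry follows the same schema,
# which is what makes 200,000+ structures machine-readable at scale.
structure = MMCIFParser(QUIET=True).get_structure(pdb_id, cif_path)
n_chains = sum(1 for _ in structure.get_chains())
n_atoms = sum(1 for _ in structure.get_atoms())
print(f"{pdb_id}: {n_chains} chains, {n_atoms} atoms")
```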
Moreover, pharmaceutical companies hold significant numbers of in-house structures, and they need their own database to validate, annotate and make accessible this incredibly valuable data. Combining the two, public and private protein structure data, is where DesertSci’s Proasis becomes mission-critical infrastructure.
Could AI in drug discovery replace this? Let’s think this through.
Yes, AlphaFold can predict 3D protein structures with remarkable accuracy. But “remarkable” isn’t the same as “sufficient.” When you’re designing a drug candidate that might enter human clinical trials, you need ground truth. Structures validated by experimental data are crucial, and a computational prediction whose confidence ranges anywhere from 50 to 95% doesn’t measure up. The uncertainty could sit in the exact binding-site region you’re targeting, and the margins for error in computational chemistry are very, very small.
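To see where that uncertainty lives, here is a minimal sketch that reads the per-residue confidence (pLDDT) from an AlphaFold-style model, where it is conventionally stored in the B-factor column, and checks it over the residues you intend to target. The model filename and the binding-site residue numbers are hypothetical placeholders.

```python
from Bio.PDB import PDBParser

# Assumptions for illustration: 'af_model.pdb' is an AlphaFold-predicted model
# (pLDDT written into the B-factor column), and these residue numbers are a
# hypothetical binding site identified from homologous experimental structures.
MODEL_FILE = "af_model.pdb"
BINDING_SITE_RESIDUES = {25, 27, 48, 50, 82, 84}

structure = PDBParser(QUIET=True).get_structure("model", MODEL_FILE)
chain = next(structure.get_chains())

# Collect per-residue pLDDT, taken from the C-alpha B-factor field.
plddt = {res.get_id()[1]: res["CA"].get_bfactor() for res in chain if "CA" in res}

site_scores = [plddt[i] for i in BINDING_SITE_RESIDUES if i in plddt]
low_confidence = sorted(i for i in BINDING_SITE_RESIDUES if plddt.get(i, 0) < 70)

print(f"Mean binding-site pLDDT: {sum(site_scores) / len(site_scores):.1f}")
print(f"Residues below pLDDT 70 (low confidence): {low_confidence}")
```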
Whilst AI inference is incredibly powerful, it cannot truly produce anything new. Protein structure predictions are biased by their training data: a prediction will always ‘look’ like something the model has seen before, and is likely to fail when the target is genuinely unlike anything in the training set.
Moreover, AlphaFold predicts single static structures. Structure-based drug design often requires understanding the molecular dynamics of proteins: how binding pockets open and close, how proteins flex when ligands bind, and which conformations are thermodynamically accessible. Accurate assessment requires experimental structures captured in different states, something no prediction model can reliably generate from sequence alone.
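As one way of quantifying that, the sketch below superimposes two experimental structures of the same protein captured in different states and reports the C-alpha RMSD between them. The filenames are illustrative assumptions.

```python
from Bio.PDB import PDBParser, Superimposer

# Illustrative assumption: 'apo.pdb' and 'holo.pdb' are two experimental
# structures of the same protein in different conformational states
# (e.g. binding pocket open vs. closed around a ligand).
parser = PDBParser(QUIET=True)
apo = next(parser.get_structure("apo", "apo.pdb").get_chains())
holo = next(parser.get_structure("holo", "holo.pdb").get_chains())

# Pair up C-alpha atoms for residues present in both structures.
apo_ca = {r.get_id()[1]: r["CA"] for r in apo if "CA" in r}
holo_ca = {r.get_id()[1]: r["CA"] for r in holo if "CA" in r}
shared = sorted(set(apo_ca) & set(holo_ca))

sup = Superimposer()
sup.set_atoms([apo_ca[i] for i in shared], [holo_ca[i] for i in shared])
print(f"C-alpha RMSD between the two experimental states: {sup.rms:.2f} Å")
```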
Suppose you asked an AI to “show me all structures of GPCR proteins with antagonists bound, resolution better than 2.5 Å, solved in the last five years.” The AI will retrieve relevant-looking information. But can you trust that it found everything? Did it miss key examples for some reason? Fail to examine certain data or file types? Or, even more crucially, did it hallucinate a protein structure that doesn’t exist?
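For contrast, the same question can be expressed as an explicit, reproducible query against the RCSB PDB Search API, where every filter is visible and the result set can be audited. The endpoint, attribute names and operators below reflect one reading of that API and should be verified against current RCSB documentation; the full-text “GPCR antagonist” term is a crude stand-in for a proper annotation-based filter.

```python
import requests

SEARCH_URL = "https://search.rcsb.org/rcsbsearch/v2/query"

# Every criterion is explicit: resolution cut-off, release date, search term.
query = {
    "query": {
        "type": "group",
        "logical_operator": "and",
        "nodes": [
            {"type": "terminal", "service": "full_text",
             "parameters": {"value": "GPCR antagonist"}},
            {"type": "terminal", "service": "text",
             "parameters": {"attribute": "rcsb_entry_info.resolution_combined",
                            "operator": "less_or_equal", "value": 2.5}},
            {"type": "terminal", "service": "text",
             "parameters": {"attribute": "rcsb_accession_info.initial_release_date",
                            "operator": "greater", "value": "2020-01-01"}},
        ],
    },
    "return_type": "entry",
    "request_options": {"paginate": {"start": 0, "rows": 100}},
}

response = requests.post(SEARCH_URL, json=query, timeout=30)
response.raise_for_status()
hits = response.json()
print(f"Total matching entries: {hits.get('total_count')}")
print("First few PDB IDs:", [r["identifier"] for r in hits.get("result_set", [])][:5])
```

Run the same query next month and any difference in the result set is itself meaningful, which is exactly the property a natural-language prompt cannot give you.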
An experimentally validated protein structure database gives you exactly that assurance: you know what the database contains, where every structure came from, and that each entry corresponds to a real, deposited experiment.
Drug discovery is heavily regulated. When you submit a drug application to the FDA, you need to document exactly which structures you used, where they came from, and how you analysed them. “I asked an AI and it suggested this structure” doesn’t meet regulatory standards.
Tried and tested databases provide exactly that: a traceable record of which structures you used, where they came from, and how you analysed them.
Reliable, validated data becomes even more critical for antibody therapeutics, which represent the fastest-growing class of drugs. AI in the design and development of antibody therapeutics is an opportunity, but also introduces risks.
Designing successful therapeutic antibodies requires detailed structural knowledge of the antigen, the antibody itself, and the binding interface between them.
Yes, AI can help generate candidate designs. But the validation loop requires checking those designs against curated protein structure databases. There is no shortcut.
The future isn’t AI replacing protein structure databases; it’s databases becoming more powerful through AI integration and connectivity.
AI and machine learning in drug discovery can accelerate processes and unlock new opportunities in fragment-based discovery, ligand-based drug design and molecular dynamics. But in its current form, it can’t replace the trustworthy foundation of experimentally validated, carefully curated, reliably accessible protein structure data.
We still need specialised software infrastructure such as protein structure databases for drug design. The protein structure database isn’t just a convenient tool; it’s the empirical foundation upon which both small molecule and antibody therapeutics are built.
AI is revolutionising how we design drugs, predict structures, and analyse molecular interactions. It’s enhancing the discovery process, but not replacing the requirement for reliable, validated protein structure data.
What we know now is that we need our enterprise database systems to provide AI-ready data repositories that enable us to routinely build the next iteration of predictive models. These enterprise systems must handle all our available public-domain data, all our in-house legacy data, and all the new experimental results coming in daily from every source.
The question for the future isn’t whether we need these databases, but how we build, maintain, and connect the next generation of them: combining experimental rigour with AI-powered insights, and maintaining resolute scientific standards while improving accessibility and accelerating progress.
One thing is for certain. When it comes to designing molecules that will be used in drugs provided to patients, “probably correct” isn’t good enough.