By Dr. Neil Taylor
As AI capabilities expand at breakneck speed, a provocative question emerges: Will traditional software become obsolete? Can AI simply handle everything we currently rely on specialised applications for?
To answer this, let’s examine one of the most demanding use cases: drug discovery. Specifically, the role of protein structure databases in the design of both small molecule drugs and therapeutic antibodies.
Whether you’re designing a small molecule to inhibit a kinase that drives tumour growth, or engineering an antibody to neutralise a virus, you need one critical piece of information: the precise three-dimensional structure of your target protein.
This isn’t abstract knowledge. The difference between a drug that works and one that fails often comes down to angstroms (one angstrom is one ten-billionth of a metre). A hydroxyl group positioned 2 angstroms differently can mean the difference between binding and not binding, between therapeutic effect and toxicity.
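To make that scale concrete, here is a minimal sketch of how a single contact distance is measured from atomic coordinates, which PDB and mmCIF files record in angstroms. The coordinates and the 3.5 Å cut-off below are illustrative assumptions, not values from any real structure.

```python
import numpy as np

# Illustrative coordinates in angstroms, as they appear in a PDB/mmCIF file.
# These are made-up values, not taken from any real structure.
ligand_hydroxyl_O = np.array([12.40, 8.15, 22.03])    # ligand -OH oxygen
protein_acceptor_N = np.array([13.30, 10.30, 23.80])  # e.g. a side-chain nitrogen

distance = np.linalg.norm(ligand_hydroxyl_O - protein_acceptor_N)
print(f"Donor-acceptor distance: {distance:.2f} Å")

# A typical hydrogen bond sits at roughly 2.7-3.3 Å donor-acceptor distance;
# move the hydroxyl by ~2 Å and the interaction is simply lost.
if distance <= 3.5:
    print("Within plausible hydrogen-bonding range")
else:
    print("Too far to form a hydrogen bond")
```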
This is where databases like the Protein Data Bank (PDB) become invaluable resources. The PDB contains over 200,000 experimentally determined protein structures – each one painstakingly solved through X-ray crystallography, cryo-EM, or NMR spectroscopy, then validated, annotated, and made accessible in standardised formats.
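As a small illustration of what those standardised formats make possible, the sketch below downloads a public entry from the RCSB file service and parses it with Biopython. The entry ID and the files.rcsb.org URL pattern are used as a plausible example and should be checked against current RCSB documentation.

```python
import requests
from Bio.PDB.MMCIFParser import MMCIFParser

# Example only: 1HVR is a public HIV-1 protease entry; any PDB ID would do.
pdb_id = "1HVR"
url = f"https://files.rcsb.org/download/{pdb_id}.cif"

# Download the experimentally determined structure in standard mmCIF format.
cif_path = f"{pdb_id}.cif"
with open(cif_path, "wb") as fh:
    fh.write(requests.get(url, timeout=30).content)

# Parse the file: every deposited entry follows the same schema,
# which is what makes 200,000+ structures machine-readable at scale.
structure = MMCIFParser(QUIET=True).get_structure(pdb_id, cif_path)
n_chains = sum(1 for _ in structure.get_chains())
n_atoms = sum(1 for _ in structure.get_atoms())
print(f"{pdb_id}: {n_chains} chains, {n_atoms} atoms")
```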
Moreover, pharmaceutical companies hold significant numbers of in-house structures, and they need their own database to validate, annotate and make accessible this incredibly valuable data. Combining the two, public and private protein structure data, is where DesertSci’s Proasis becomes mission-critical infrastructure.
Could AI in drug discovery replace this? Let’s think this through.
Yes, AlphaFold can predict 3D protein structures with remarkable accuracy. But “remarkable” isn’t the same as “sufficient.” When you’re designing a drug candidate that might enter human clinical trials, you need ground truth. Structures validated by experimental data are crucial, and a computational prediction whose confidence ranges anywhere from 50 to 95% doesn’t measure up. The uncertainty could sit in the exact binding-site region you’re targeting, and the margins for error in computational chemistry are very, very small.
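To see where that uncertainty lives, here is a minimal sketch that reads the per-residue confidence (pLDDT) from an AlphaFold-style model, where it is conventionally stored in the B-factor column, and checks it over the residues you intend to target. The model filename and the binding-site residue numbers are hypothetical placeholders.

```python
from Bio.PDB import PDBParser

# Assumptions for illustration: 'af_model.pdb' is an AlphaFold-predicted model
# (pLDDT written into the B-factor column), and these residue numbers are a
# hypothetical binding site identified from homologous experimental structures.
MODEL_FILE = "af_model.pdb"
BINDING_SITE_RESIDUES = {25, 27, 48, 50, 82, 84}

structure = PDBParser(QUIET=True).get_structure("model", MODEL_FILE)
chain = next(structure.get_chains())

# Collect per-residue pLDDT, taken from the C-alpha B-factor field.
plddt = {res.get_id()[1]: res["CA"].get_bfactor() for res in chain if "CA" in res}

site_scores = [plddt[i] for i in BINDING_SITE_RESIDUES if i in plddt]
low_confidence = sorted(i for i in BINDING_SITE_RESIDUES if plddt.get(i, 0) < 70)

print(f"Mean binding-site pLDDT: {sum(site_scores) / len(site_scores):.1f}")
print(f"Residues below pLDDT 70 (low confidence): {low_confidence}")
```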
Whilst AI inference is incredibly powerful, it cannot truly produce anything new. Protein structure predictions are biased by their training data: a prediction will always ‘look’ like something the model has seen before, and is likely to fail when the target is genuinely unlike anything in the training set.
Moreover, AlphaFold predicts single static structures. Structure-based drug design often requires understanding the molecular dynamics of proteins: how binding pockets open and close, how proteins flex when ligands bind, and which conformations are thermodynamically accessible. Accurate assessment requires experimental structures captured in different states, something no prediction model can reliably generate from sequence alone.
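As one way of quantifying that, the sketch below superimposes two experimental structures of the same protein captured in different states and reports the C-alpha RMSD between them. The filenames are illustrative assumptions.

```python
from Bio.PDB import PDBParser, Superimposer

# Illustrative assumption: 'apo.pdb' and 'holo.pdb' are two experimental
# structures of the same protein in different conformational states
# (e.g. binding pocket open vs. closed around a ligand).
parser = PDBParser(QUIET=True)
apo = next(parser.get_structure("apo", "apo.pdb").get_chains())
holo = next(parser.get_structure("holo", "holo.pdb").get_chains())

# Pair up C-alpha atoms for residues present in both structures.
apo_ca = {r.get_id()[1]: r["CA"] for r in apo if "CA" in r}
holo_ca = {r.get_id()[1]: r["CA"] for r in holo if "CA" in r}
shared = sorted(set(apo_ca) & set(holo_ca))

sup = Superimposer()
sup.set_atoms([apo_ca[i] for i in shared], [holo_ca[i] for i in shared])
print(f"C-alpha RMSD between the two experimental states: {sup.rms:.2f} Å")
```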
Suppose you asked an AI to “show me all structures of GPCR proteins with antagonists bound, resolution better than 2.5 Å, solved in the last five years.” The AI will retrieve relevant-looking information. But can you trust that it found everything? Did it miss key examples for some reason? Fail to examine certain data or file types? Or, even more crucially, did it hallucinate a protein structure that doesn’t exist?
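For contrast, the same question can be expressed as an explicit, reproducible query against the RCSB PDB Search API, where every filter is visible and the result set can be audited. The endpoint, attribute names and operators below reflect one reading of that API and should be verified against current RCSB documentation; the full-text “GPCR antagonist” term is a crude stand-in for a proper annotation-based filter.

```python
import requests

SEARCH_URL = "https://search.rcsb.org/rcsbsearch/v2/query"

# Every criterion is explicit: resolution cut-off, release date, search term.
query = {
    "query": {
        "type": "group",
        "logical_operator": "and",
        "nodes": [
            {"type": "terminal", "service": "full_text",
             "parameters": {"value": "GPCR antagonist"}},
            {"type": "terminal", "service": "text",
             "parameters": {"attribute": "rcsb_entry_info.resolution_combined",
                            "operator": "less_or_equal", "value": 2.5}},
            {"type": "terminal", "service": "text",
             "parameters": {"attribute": "rcsb_accession_info.initial_release_date",
                            "operator": "greater", "value": "2020-01-01"}},
        ],
    },
    "return_type": "entry",
    "request_options": {"paginate": {"start": 0, "rows": 100}},
}

response = requests.post(SEARCH_URL, json=query, timeout=30)
response.raise_for_status()
hits = response.json()
print(f"Total matching entries: {hits.get('total_count')}")
print("First few PDB IDs:", [r["identifier"] for r in hits.get("result_set", [])][:5])
```

Run the same query next month and any difference in the result set is itself meaningful, which is exactly the property a natural-language prompt cannot give you.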
An experimentally validated protein structure database gives you exactly that assurance: you know what the database contains, where every structure came from, and that each entry corresponds to a real, deposited experiment.
Drug discovery is heavily regulated. When you submit a drug application to the FDA, you need to document exactly which structures you used, where they came from, and how you analysed them. “I asked an AI and it suggested this structure” doesn’t meet regulatory standards.
Tried and tested databases provide exactly that: a traceable record of which structures you used, where they came from, and how you analysed them.
Reliable, validated data becomes even more critical for antibody therapeutics, which represent the fastest-growing class of drugs. AI in the design and development of antibody therapeutics is an opportunity, but also introduces risks.
Designing successful therapeutic antibodies requires detailed structural knowledge of the antigen, the antibody itself, and the binding interface between them.
Yes, AI can help generate candidate designs. But the validation loop requires checking those designs against curated protein structure databases. There is no shortcut.
The future isn’t AI replacing protein structure databases; it’s databases becoming more powerful through AI integration and connectivity.
AI and machine learning in drug discovery can accelerate processes and unlock new opportunities in fragment-based discovery, ligand-based drug design and molecular dynamics. But in its current form, it can’t replace the trustworthy foundation of experimentally validated, carefully curated, reliably accessible protein structure data.
We still need specialised software infrastructure such as protein structure databases for drug design. The protein structure database isn’t just a convenient tool; it’s the empirical foundation upon which both small molecule and antibody therapeutics are built.
AI is revolutionising how we design drugs, predict structures, and analyse molecular interactions. It’s enhancing the discovery process, but not replacing the requirement for reliable, validated protein structure data.
What we know now is that we need our enterprise database systems to provide AI-ready data repositories that enable us to routinely build the next iteration of predictive models. These enterprise systems must handle all our available public-domain data, all our in-house legacy data, and all the new experimental results coming in daily from every source.
The question for the future isn’t whether we need these databases, but how we build, maintain, and connect the next generation of them: combining experimental rigour with AI-powered insights, and maintaining resolute scientific standards while improving accessibility and accelerating progress.
One thing is for certain. When it comes to designing molecules that will be used in drugs provided to patients, “probably correct” isn’t good enough.