About Desert Scientific Software
- Founded in 2000, based in Sydney, Australia
- Produces web-based, data-driven software for medicines research
- Focus on creating scientific software that is useful and easy to use
- A major goal is the development of new methods for investigating protein-ligand interactions and characterising binding affinities
- First major product release - Proasis2
- DesertSci Advantages:
- Python, Extreme Programming (XP), code re-use
- collaborations with industry
Current Issues
- The pharmaceutical research community wants to make full use of all available protein structure data in its projects
- In the past, protein structure database technology has lagged behind developments in small molecule databases
- most databases have rigid requirements associated with structure input
- most databases have limited scope of output options
- Proasis2 has solved these deficiencies
What is Proasis2?
- Proasis2 is a Client-Server, Protein Structure Database and Visualisation System for drug discovery applications
- Proasis2 focuses on
- storage and retrieval of 3D structures of proteins and their attributes
- bound ligands
- details of protein-ligand binding
- structure comparisons
- customisable
- performance and ease-of-use
Proasis2 Can Benefit a Range of Scientists
- Proasis2 is:
- an interoperable relational database system for research informatics
- a storage and retrieval system for crystallographers
- an aid for molecular modellers
- a tool for chemists seeking a better understanding of a drug target and its interactions with small molecules
More About Proasis2
- Stores in-house and public domain protein crystal structure data, NMR structure data, and results from homology modelling and docking studies in a flexible database system
- Provides a mapping from the space of crystallographic data to the space of medchem projects - gives a 'medicinal chemistry' view of data
- Automates routine molecular modeling tasks
- Access using web browser and popular molecular graphics packages
- Proasis2 is a system that works the same way a brain works - hiding information that is not relevant to the task
- for example, amino acid residues distal to a bound ligand may be ignored when investigating ligand binding
Challenges Working With Protein Structure Data
- Protein structure data is complex - structures are very large and usually poorly resolved relative to small molecules
- Information resources are often disparate
- PDB format has many limitations
- Reliable chemical information hard to get
- Oligomeric systems require special attention
- Ligand binding modes are challenging to comprehend
- Legacy software designed for expert users
Proasis2 minimises the burden of dealing with each of these
Proasis2: Implementation Details
- Server runs on SGI and Linux, Clients can use Windows, Unix, Macs
- Protein structures are indexed and annotated in Oracle or MySQL, data files are maintained on file system
- Server software written in Python and C
- Front-end typically HTML and Javascript, with all major browsers supported (Unix command line applications also available)
- Major graphics packages are supported, including RasMol, Chime, and Accelrys (WebLab) Viewer, plus Grasp on Silicon Graphics workstations
Proasis2: Data In
A Variety of Ligand Data is Stored
- Ligand attributes stored in database tables - chemical name, registration ID, etc.
- 1D structure data stored as a molecular hash code (Ihlenfeldt and Gasteiger, J Comp Chem, 15, 793-813, 1994), for fast structure searching
- 2D structure data stored as SMILES (www.daylight.com), for sub-structure searching, and 2D Connection Tables (www.mdli.com), for creating depictions
- 3D Connection Table, for MCS based ligand overlays, and other molecular modelling applications
- Ligand chemistry obtained from:
- in-house corporate database
- manual input
- public domain resources, e.g., CONECT records, CIF dictionary
- predicted from geometry, e.g., using babel (often unreliable)
Structure Submission
- There are four methods for loading structures into the Proasis2 Database
- Web GUI
- Command line scripts - enabling batch submission
- Automatic structure submission - for legacy structures (and, soon to be implemented, for weekly updates)
- Load pre-processed XML files - curated public domain data from DesertSci
Web GUI Structure Submission
- New structures can be individually loaded using a web browser
- Simple, two-step process:
- A form for uploading and pre-processing a pdb file (in-house or public domain) is provided
- Table data not obtained from the pdb parser is entered into a customised web page
3D Ligand Handling
- The creation of a valid Connection Table (CTab) for the ligand bound in the active site is the most demanding component of structure submission
- This information is required by the modelling community
- A 3D CTab is usually created from a 2D CTab (eg, from a regno2sdf routine) and ATOM, HETATM and CONECT records in the pdb file
- When it works properly, a 3D CTab is generated with correct bond orders, and same atom ordering and coordinates as in the pdb file
- Structure submission provides a report of the success of 3D CTab generation
3D Ligand Handling (cont.)
- 3D CTab generation can fail at several points - due to problems with the data (and sometimes with the methods)
- Worst case scenarios:
- 2D CTab not provided, unable to create 3D CTab -> structure not submitted
- 2D CTab provided, unable to create 3D CTab -> load 2D CTab into field meant to hold 3D CTab (having correct chemistry is higher priority)
- 2D CTab not provided, automatic generation of 3D CTab -> load 3D CTab, probably with incorrect bond orders, into database
- 3D CTab stored in database may not even match ligand in protein, eg, missing pdb atoms or covalently bound ligand
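The fallback rules in the worst case scenarios above can be read as a small decision table. The sketch below is one hedged reading of those bullets, not Proasis2 code: the `Outcome` names and the `ctab_outcome` function are hypothetical and only mirror the outcomes the slide describes.

```python
from enum import Enum


class Outcome(Enum):
    # Hypothetical labels for the outcomes described in the slide.
    REJECTED = "structure not submitted"
    STORED_2D = "2D CTab stored in the field meant for the 3D CTab"
    STORED_3D_UNVERIFIED = "auto-generated 3D CTab stored, bond orders unverified"
    STORED_3D = "3D CTab stored"


def ctab_outcome(has_2d_ctab: bool, made_3d_ctab: bool) -> Outcome:
    """Map the two conditions named in the slide to a submission outcome."""
    if made_3d_ctab:
        # With a 2D template the bond orders come from known chemistry;
        # without one they were inferred automatically and may be wrong.
        return Outcome.STORED_3D if has_2d_ctab else Outcome.STORED_3D_UNVERIFIED
    # 3D generation failed: fall back to the 2D CTab if there is one,
    # because having correct chemistry is the higher priority.
    return Outcome.STORED_2D if has_2d_ctab else Outcome.REJECTED
```

A submission report could then print `ctab_outcome(...).value` for each ligand.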
Automatic Structure Submission
- DesertSci provides robust, automated methods that enable Proasis2 to be regularly updated with new public domain structures
- Proasis2 software can automatically:
- parse Header and Coordinate Sections of pdb files
- detect to which project a structure belongs
- identify any ligand(s) in the binding site using a structure overlay method
- identify multimeric systems
- Unclassified structures can be easily classified at a later time
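As a rough illustration of the parsing step listed above, the sketch below reads HEADER, ATOM and HETATM records using the fixed column positions of the PDB format. The class and function names are hypothetical. `ligand_residues` is a deliberately simple stand-in (non-water HETATM residues) and is not the structure overlay method that Proasis2 uses to identify ligands.

```python
from dataclasses import dataclass, field


@dataclass
class Atom:
    hetero: bool      # True for HETATM records (ligands, waters, ions)
    name: str
    res_name: str
    chain: str
    res_seq: int
    xyz: tuple


@dataclass
class ParsedPdb:
    id_code: str = ""
    classification: str = ""
    atoms: list = field(default_factory=list)


def parse_pdb(lines):
    """Read HEADER, ATOM and HETATM records from fixed-width PDB lines."""
    pdb = ParsedPdb()
    for line in lines:
        record = line[:6].strip()
        if record == "HEADER":
            pdb.classification = line[10:50].strip()   # columns 11-50
            pdb.id_code = line[62:66].strip()          # columns 63-66
        elif record in ("ATOM", "HETATM"):
            pdb.atoms.append(Atom(
                hetero=(record == "HETATM"),
                name=line[12:16].strip(),              # columns 13-16
                res_name=line[17:20].strip(),          # columns 18-20
                chain=line[21],                        # column 22
                res_seq=int(line[22:26]),              # columns 23-26
                xyz=(float(line[30:38]),               # columns 31-54
                     float(line[38:46]),
                     float(line[46:54])),
            ))
    return pdb


def ligand_residues(pdb, ignore=("HOH",)):
    """Simplified stand-in: HETATM residues other than water."""
    return sorted({(a.chain, a.res_seq, a.res_name)
                   for a in pdb.atoms
                   if a.hetero and a.res_name not in ignore})
```

Called on the lines of a pdb file, `parse_pdb` returns the header fields and atom list. `ligand_residues` then gives a first guess at the bound ligands.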
Database Administration
- Comprehensive database administration facilities
- Proasis2 database can be updated through web pages, scripts, other connection clients
- Web pages enable database entries to be easily added, deleted, and modified
Security Issues
- Proasis2 is an intranet system protected by in-house firewalls and access protocols
- User and Administration Web pages are kept separate, enabling control of usage privileges
- Administration Web pages are password protected - users can only edit their own structures
- furthermore, groups can be defined so that all users within a group can edit one-another's structures
- All structures are backed-up in XML format - minimises risk of losing data
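The editing rule described above (owners edit their own structures, and members of a shared group can edit one another's) can be sketched as a single check. The `AccessControl` class and its fields are hypothetical; the slides do not say how Proasis2 stores ownership or groups.

```python
from dataclasses import dataclass, field


@dataclass
class AccessControl:
    # Hypothetical in-memory stand-ins for database tables.
    owners: dict = field(default_factory=dict)   # structure id -> owning user
    groups: dict = field(default_factory=dict)   # group name -> set of users

    def can_edit(self, user, structure_id):
        """True if user owns the structure or shares a group with its owner."""
        owner = self.owners.get(structure_id)
        if owner is None:
            return False          # unknown structure: nothing to edit
        if owner == user:
            return True
        return any(user in members and owner in members
                   for members in self.groups.values())
```

For example, with `owners={"s1": "alice"}` and `groups={"xtal": {"alice", "bob"}}`, both alice and bob may edit `s1`, while a user outside the group may not.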
Proasis2: Data Out
Backing-Up With XML
- XML (Extensible Markup Language) - robust framework for transfer of information
- All data about a structure, including original pdb file, can be backed-up to disk in XML files
- Proasis2 XML files can be easily transferred between different database installations
- XML parsing done using SAX (Simple API for XML) events
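A minimal sketch of event-driven parsing with Python's standard `xml.sax` module is shown below. The element names (`structure`, `title`, `method`) and the flat layout are assumptions made for illustration; the actual Proasis2 XML schema is not described in these slides.

```python
import xml.sax


class StructureHandler(xml.sax.ContentHandler):
    """Collect the text of each child element of a flat <structure> record."""

    def __init__(self):
        super().__init__()
        self.fields = {}
        self._tag = None       # element currently being read, if any
        self._chunks = []      # text pieces received for that element

    def startElement(self, name, attrs):
        if name == "structure":
            self.fields["id"] = attrs.get("id", "")
        else:
            self._tag = name
            self._chunks = []

    def characters(self, content):
        # SAX may deliver the text of one element in several pieces.
        if self._tag is not None:
            self._chunks.append(content)

    def endElement(self, name):
        if name == self._tag:
            self.fields[name] = "".join(self._chunks).strip()
        self._tag = None


def read_backup(xml_text):
    """Parse one backup record and return its fields as a dict."""
    handler = StructureHandler()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.fields
```

Because SAX streams events rather than building a tree, this style suits large backup files that embed a whole pdb file.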
Structure Viewing
- Individual structures can be retrieved based on their identifier
- Proasis2 delivers graphical representation of protein structure data at the click of a button
- Fully operational molecular graphics applications are automatically launched on the desk-top
- Structure views correspond to those most widely used in the molecular modelling/rational drug design community
- The major graphics packages are supported and many visualization modes are available for whole protein or binding site region
- HTML link to a structure view can potentially be created from any web page on an intranet site
The Classification of Structures into MedChem Projects
- A key objective of Proasis2 has been to organise protein structure data specifically for life sciences research
- From the very beginning, structures have been organised into MedChem projects
- Projects are automatically updated with public structures from RCSB and manually updated with in-house structures by crystallographers
- Once a project is selected, ALL structures relevant to a project can be easily retrieved and explored
Project Hierarchies
- Within large pharma, MedChem projects are typically related to one another, for example, Angiogenesis and Kinase targets
- Thus, a hierarchical representation scheme was created for Proasis2
- Database Tables contain project names, project parents, project types
- Simple DFS searching is used for tree traversal
- Javascript used for displaying trees in web pages with a MS Windows Explorer display style
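The parent-pointer table and depth-first traversal described above might look like the sketch below. The row layout (project name, parent name) and the function names are assumptions; project types are omitted for brevity.

```python
from collections import defaultdict


def build_children(rows):
    """rows are (project_name, parent_name) pairs; parent is None for roots."""
    children = defaultdict(list)
    for name, parent in rows:
        children[parent].append(name)
    return children


def dfs(children, root=None, depth=0):
    """Yield (depth, name) pairs in depth-first, pre-order sequence."""
    for name in sorted(children.get(root, [])):
        yield depth, name
        yield from dfs(children, name, depth + 1)
```

The depth value is what a web page would need in order to indent each project in an Explorer-style tree.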
The Bigger Picture - Multiple Hierarchical Classification Schemes
- Proasis2 is able to manage, and maintain, multiple classifications, for example:
- EC Number based classification
- Human Kinome based classification
- Ligand Chemotypes based classification (work in progress)
- Fold based, Motif based, ...
- DesertSci provides robust, automated methods that enable Proasis2 to be regularly updated with new public domain structures
- The intention is to provide access to protein structure data to suit the many different approaches to research and to help invent new ones
Structure Searching
- Text based searching is newly available
- Individual database fields can be searched, eg Title, Author, and/or Compound, or the entire Header Section can be searched
- Sub-structure searching, using DayCart, is newly available
- SMARTS searches and similarity searches will be implemented soon
- Alternative searching technologies could be applied instead, eg, using OpenEye software
- Sequence searching, using Blast, is newly available
- Searching for recently submitted structures is newly available
- All search queries can be fine-tuned:
- all or any subset of projects can be searched
- retrieve in-house and/or public structures
- retrieve xray, nmr, and/or model structures
- a variety of sorting methods can be used
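The query refinements above combine naturally into one parameterised SQL statement. The sketch below uses SQLite from the Python standard library purely for illustration (the slides name Oracle and MySQL). The `structure` table, its columns and the `search` function are hypothetical.

```python
import sqlite3


def search(conn, text, projects=None, sources=("inhouse", "public"),
           methods=("xray", "nmr", "model"), order_by="id"):
    """Title search restricted by project, source and method."""
    # Column names cannot be bound as parameters, so whitelist the sort key.
    if order_by not in {"id", "title", "submitted"}:
        raise ValueError("unsupported sort column")
    sql = ["SELECT id FROM structure WHERE title LIKE ?"]
    params = ["%" + text + "%"]
    filters = (("project", projects), ("source", sources), ("method", methods))
    for column, allowed in filters:
        if allowed is not None:          # None means "do not filter"
            marks = ",".join("?" * len(allowed))
            sql.append(f"AND {column} IN ({marks})")
            params.extend(allowed)
    sql.append(f"ORDER BY {order_by}")
    return [row[0] for row in conn.execute(" ".join(sql), params)]
```

Passing `projects=None` searches all projects. Passing a tuple restricts the search to that subset.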
Web-based Downloading
- The following data can be easily downloaded from the web interface:
- Original PDB
- Curated PDB
- Ligand SDF with/without DataFields
- Structure Factor files
- Topology files
- Sequence in fasta format
- Complete XML
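As an example of one of these download formats, the sketch below writes a FASTA record from a list of three-letter residue names such as those found in a pdb file. The function name and the choice to write unknown residues as `X` are assumptions, not a description of the Proasis2 export.

```python
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def to_fasta(identifier, residues, width=60):
    """Return a FASTA record: a '>' header line, then wrapped sequence lines."""
    # Residues outside the 20 standard amino acids are written as 'X'.
    sequence = "".join(THREE_TO_ONE.get(r.upper(), "X") for r in residues)
    lines = [">" + identifier]
    lines += [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return "\n".join(lines) + "\n"
```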
Align and Overlay Structures
'Science involves discovering the similarities between things that are different and the differences between things that are similar.' A. Schopenhauer
- Examining the similarities and differences between protein-ligand complexes is FUNDAMENTAL in rational drug design
- Structure superimpositions are usually very time-consuming
- Using Proasis2, overlays are made easy
Protein Structure Alignment and Overlay using Proasis2
- There are numerous ways to overlay structures:
- sequence-based alignments and backbone superimpositions
- Secondary structure element based overlays
- Ligand-based overlays
- Multiple alignment versus pairwise
- By default, Proasis2 software uses full sequence alignments (or binding site residues) for structures within protein families, and RMS superimpositions of alpha carbon atoms
- Multiple structures are overlaid via multiple pairwise superimpositions onto a common reference structure
- All monomers within oligomers can be overlaid
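The first half of the default overlay, a pairwise sequence alignment that decides which residues to superimpose, can be sketched with a textbook Needleman-Wunsch global alignment. The scoring values are arbitrary assumptions and the alignment method Proasis2 uses is not specified here. The RMS superimposition of the paired alpha carbons is not shown.

```python
def align(seq_a, seq_b, match=1, mismatch=-1, gap=-2):
    """Global alignment; returns (score, list of aligned index pairs)."""
    rows, cols = len(seq_a) + 1, len(seq_b) + 1
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap

    def sub(i, j):
        # Substitution score for aligning seq_a[i-1] with seq_b[j-1].
        return match if seq_a[i - 1] == seq_b[j - 1] else mismatch

    for i in range(1, rows):
        for j in range(1, cols):
            score[i][j] = max(score[i - 1][j - 1] + sub(i, j),
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Trace back from the bottom-right corner, recording aligned residues.
    pairs = []
    i, j = len(seq_a), len(seq_b)
    while i > 0 and j > 0:
        if score[i][j] == score[i - 1][j - 1] + sub(i, j):
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    pairs.reverse()
    return score[-1][-1], pairs
```

Each returned pair names one residue in each structure. The alpha carbon coordinates of those residues would be the input to the superimposition, repeated for every structure against the common reference.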
Advanced Protein Structure Overlays
- Key features:
- Overlays can be done based on:
- full sequence alignments
- subsets of individually selected amino acid residues
- small molecule similarity, eg kinases JNK3 and LCK (1jnk 1qpc)
- Any subset of database structures can be overlaid
- Fine-control over size of the binding site displayed
- All protein chains can be displayed or just the overlaid ligands
- Results can be downloaded to a local file
Small-Molecule Based Alignments
- A fast method for creating protein overlays that can be very useful as a first step in generating fine tuned solutions, eg, for Kinases
- Incorporates Spinifex - DesertSci's package for topological similarity based on maximum common substructures
- Uses algorithms based upon clique detection (for finding the Maximum Overlapping Set, MOS, in an edge induced correspondence graph) and chemical heuristics (for minimising search space)
- Raymond et al., 'Heuristics for Similarity Searching of Chemical Graphs Using a Maximum Common Edge Subgraph Algorithm', J Chem Inf Comput Sci, 2002
- Examples ...
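To make the clique-detection idea concrete, the sketch below builds a correspondence (modular product) graph for two small molecules and finds its largest clique with the Bron-Kerbosch algorithm. It is a simplification, not Spinifex: it matches atoms rather than bonds, so it finds a maximum common induced subgraph rather than the edge-based one described above. It also uses no chemical heuristics and ignores bond orders. All names are hypothetical.

```python
from itertools import combinations


def bond_set(*pairs):
    """Helper: undirected bonds as a set of frozensets of atom indices."""
    return {frozenset(p) for p in pairs}


def modular_product(atoms_a, bonds_a, atoms_b, bonds_b):
    """Nodes pair up same-element atoms; edges join compatible pairings."""
    nodes = [(i, j) for i, a in enumerate(atoms_a)
             for j, b in enumerate(atoms_b) if a == b]
    adj = {n: set() for n in nodes}
    for (i, j), (k, l) in combinations(nodes, 2):
        if i != k and j != l:
            # Compatible if the two atom pairs are bonded in both
            # molecules or in neither.
            if (frozenset((i, k)) in bonds_a) == (frozenset((j, l)) in bonds_b):
                adj[(i, j)].add((k, l))
                adj[(k, l)].add((i, j))
    return adj


def max_clique(adj):
    """Bron-Kerbosch with pivoting; returns one largest clique, sorted."""
    best = []

    def expand(r, p, x):
        nonlocal best
        if not p and not x:
            if len(r) > len(best):
                best = list(r)
            return
        pivot = max(p | x, key=lambda v: len(adj[v] & p))
        for v in list(p - adj[pivot]):
            expand(r + [v], p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand([], set(adj), set())
    return sorted(best)
```

Each clique member `(i, j)` maps atom `i` of the first molecule onto atom `j` of the second. Those matched atoms could then drive a ligand-based overlay.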
Small Molecule Alignments - Example 1
- A simple example showing the Maximum Overlapping Set for a pair of molecules
Small Molecule Alignments - Example 2
- An example showing matching ring systems for COX2 inhibitors
Protein-Ligand Binding Interactions
- Proasis2 enables exploration of protein-ligand close contacts in a binding site
- Interactions computed using a close-packing like approach with atoms treated as non-spherical
- Using Proasis2, one can easily explore hydrogen bonding networks, explore hydrophobic interactions, and identify how ligand binding affinity might be improved
Highlighting of Residues Involved with Protein-Ligand Binding
- Using Proasis2, it is possible to highlight residues along a protein sequence that make contact with the ligand
- Residues can be differentiated depending on whether the ligand interacts with sidechain atoms, with mainchain atoms, or both mainchain and sidechain atoms
- Useful in a range of applications such as: exploring selectivity issues, comparing multiple complexes, analysing the results from docking studies
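The mainchain/sidechain distinction above can be sketched as follows. This uses a plain distance cutoff, which is a stand-in for, and much cruder than, the close-packing approach with non-spherical atoms that Proasis2 is described as using. The 4.0 Angstrom default, the input tuple layout and the function name are all assumptions.

```python
import math

# Backbone atom names in standard PDB nomenclature.
BACKBONE = {"N", "CA", "C", "O"}


def contact_residues(protein_atoms, ligand_xyz, cutoff=4.0):
    """Classify residues near the ligand as mainchain, sidechain or both.

    protein_atoms: iterable of (chain, res_seq, atom_name, (x, y, z)).
    ligand_xyz: list of (x, y, z) ligand atom coordinates.
    """
    found = {}
    for chain, res_seq, atom_name, xyz in protein_atoms:
        if any(math.dist(xyz, lig) <= cutoff for lig in ligand_xyz):
            kind = "mainchain" if atom_name in BACKBONE else "sidechain"
            found.setdefault((chain, res_seq), set()).add(kind)
    return {res: ("both" if len(kinds) == 2 else next(iter(kinds)))
            for res, kinds in found.items()}
```

The resulting dictionary is keyed by (chain, residue number), which is enough to colour residues along a displayed sequence.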