

|
DATABASE STRUCTURE AND ACCESS. The current implementation of the ProtCom database consists of two parts: two-chain protein-protein complexes (251 entries) and two-domain single-chain proteins (708 entries). The domain proteins were selected from single-chain proteins split into domains with the PDP program, keeping only those proteins that were parsed into two domains with one non-rigid link between them. The pdb files of the domain structures in the database (marked by the suffix _com after the pdb id code) are modified relative to the original pdb files: 2 residues in the link between the domains are deleted, and new chain designators, A and B, are assigned to the atoms belonging to the first and second domains, respectively.
The proteins in the database were selected from X-ray structures in the PDB data bank. Hereafter, A and B denote the larger and smaller parts of a database entry, respectively; in general, these may not coincide with the experimental chain designators in the pdb file. Entries satisfy the following criteria: (i) sequence identity between the A and B parts of the entry should be less than 95 %, (ii) sequence identity between the A and B parts of any two entries i and j (i.e. between pairs [Ai, Aj] and [Bi, Bj] or between pairs [Ai, Bj] and [Bi, Aj]) should be less than 95 %, (iii) the area of the interface between the A and B parts should be larger than 250 Å² and less than 50 % of the accessible surface area of either component, (iv) there should be at least 2 secondary structure elements (helix or strand) in either component.
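The per-entry criteria above can be sketched as a simple filter. This is only an illustration, not the code used to build ProtCom: the `Part` class, its field names and the function name are hypothetical, "either component" is read here as "each component", and criterion (ii), which compares pairs of entries, is left out.

```python
from dataclasses import dataclass


@dataclass
class Part:
    """One part (A or B) of a candidate entry. Field names are hypothetical."""
    surface_area: float  # accessible surface area, in square angstroms
    n_helices: int
    n_strands: int


def passes_entry_criteria(a, b, identity_ab, interface_area):
    """Check criteria (i), (iii) and (iv) for one candidate entry.

    identity_ab is the A-B sequence identity in percent and
    interface_area is the A-B interface area in square angstroms.
    Criterion (ii) needs all other entries and is not checked here.
    """
    # (i) the A and B parts must be less than 95 % identical
    if identity_ab >= 95.0:
        return False
    # (iii) interface larger than 250 square angstroms ...
    if interface_area <= 250.0:
        return False
    # ... and smaller than half the surface area of each part (assumed reading)
    if interface_area >= 0.5 * min(a.surface_area, b.surface_area):
        return False
    # (iv) at least 2 secondary structure elements in each part (assumed reading)
    for part in (a, b):
        if part.n_helices + part.n_strands < 2:
            return False
    return True
```

For example, with hypothetical parts `Part(5000.0, 2, 1)` and `Part(3000.0, 0, 2)`, an identity of 30 % and an interface of 600 square angstroms would pass, while the same pair with an interface of 200 square angstroms would not.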
Files in the database are grouped into three folders: (i) the PDB folder with the 3D structure in pdb format, (ii) the SEQ folder with the sequence file (in FASTA format) and (iii) the INF folder where all other relevant information is stored. Each file name consists of the 4-symbol pdb code with the extension pdb (for structure files), seq (for sequence files) or inf (for information files). Information can be displayed either as a table with the minimum relevant information for a list of proteins or as a plain text screen with all available information for a single entry. The table view links to the full-information display of each listed protein. Two options are available for downloads. The first is a plain text file containing either the pdb 3D structure or the full information set for a single protein. The second is a tar.gz archive containing the full content of the database or the results of the user’s search or selection. A complementary download is available for the subset of the full database at the 40 % sequence identity level. We used this subset previously for a jack-knife test of the homology-based method for predicting the 3D structure of protein complexes.
The incorporated search engine (a collection of JavaScript and Perl programs) offers a choice of search options in a simple, user-friendly manner. If the selected search criterion is numerical, the user can search for values equal to (within a 5 % tolerance), larger than, smaller than or between the desired value(s). The search results can be sorted in ascending or descending order by a number of parameters. Searches can run over the whole database, only the 2-chain complexes or only the 2-domain structures. The user can also choose either an AND or an OR conjunction between the search criteria.
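The numerical search modes and the AND/OR conjunction described above can be sketched as follows. This is a minimal illustration, not the database's actual JavaScript/Perl code: the function names and mode strings are hypothetical, the 5 % tolerance is assumed to be relative to the requested value, and "between" is assumed to include both bounds.

```python
def matches_numeric(value, mode, target, upper=None, tolerance=0.05):
    """Test one numerical criterion against a stored value.

    mode is one of "equal", "larger", "smaller" or "between"
    (hypothetical names for the four options offered by the search form).
    """
    if mode == "equal":
        # "equal" means within 5 % of the requested value (assumed relative)
        return abs(value - target) <= tolerance * abs(target)
    if mode == "larger":
        return value > target
    if mode == "smaller":
        return value < target
    if mode == "between":
        if upper is None:
            raise ValueError("'between' needs an upper bound")
        return target <= value <= upper
    raise ValueError(f"unknown mode: {mode}")


def combine(results, conjunction="AND"):
    """Join the outcomes of several criteria with AND or OR."""
    if conjunction == "AND":
        return all(results)
    if conjunction == "OR":
        return any(results)
    raise ValueError(f"unknown conjunction: {conjunction}")
```

A query such as "resolution equal to 2.0 AND interface area larger than 500" would then be `combine([matches_numeric(res, "equal", 2.0), matches_numeric(area, "larger", 500.0)], "AND")`, where `res` and `area` stand for the values stored for an entry.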
Access to the database is free of charge, but users are required to cite the original paper if use of the database results in a publication. Users can also send suggestions and comments to the authors of the database.
STATE OF THE DATABASE. As of February 2006, the database contains 959 entries, of which 251 are two-chain protein-protein complexes and 708 are two-domain structures. Each entry holds information both for the whole complex (pdb id, the name of the protein as given after the TITLE keyword in the pdb file, resolution of the X-ray structure, calculated absolute interface area) and separately for the A and B parts (total number of residues, number of residues on the interface, relative interface area, list of the interfacial residues and number of helices and strands). Searching and sorting of search results are available by pdb id, structure name, X-ray resolution, absolute interface area, total number of residues in the A and B parts and total number of interfacial residues in the A and B parts. In addition, the user can search for the presence of up to 5 specific residues on the interface in the A and B parts and for the presence of a specific combination of up to 5 residues. For instance, if the search criterion is “search for presence of Ala, Glu, Trp and Lys residues”, the search result will contain all structures with any of these residues on the interface. If the user instead searches for the presence of the combination of Ala, Glu, Trp and Lys residues, the search result will contain only structures with all four residues present on the interface. In this case, the table view of the search results displays additional information about the residue combination found.
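The difference between the two residue searches (any of the listed residues versus the full combination) can be sketched with set operations. This is an illustrative sketch only: the function names are hypothetical, and it assumes the interfacial residues of one part are available as a list of three-letter codes.

```python
MAX_RESIDUES = 5  # the search form accepts up to 5 residues


def _normalise(residues):
    """Upper-case residue codes and enforce the 5-residue limit."""
    wanted = {r.upper() for r in residues}
    if len(wanted) > MAX_RESIDUES:
        raise ValueError("at most 5 residues can be searched for")
    return wanted


def has_any_residue(interface, wanted):
    """True if at least one requested residue type is on the interface."""
    present = {r.upper() for r in interface}
    return not _normalise(wanted).isdisjoint(present)


def has_residue_combination(interface, wanted):
    """True only if every requested residue type is on the interface."""
    present = {r.upper() for r in interface}
    return _normalise(wanted).issubset(present)
```

For an interface listed as `["ALA", "GLU", "SER"]`, searching for Ala, Glu, Trp and Lys succeeds with `has_any_residue` but fails with `has_residue_combination`, because Trp and Lys are absent.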
FUTURE DIRECTIONS. We plan to expand and improve the database in several directions. First, we will develop an automatic update tool that adds new structures as they appear in the PDB data bank, provided they satisfy our selection criteria. Second, we plan to include multi-chain complexes and multi-domain structures in the database. Further, the database structure allows easy incorporation of new information and search parameters, which will undoubtedly arise from the feedback of database users.
POSSIBLE APPLICATIONS. The database is intended to provide a pool of templates for modeling 3D structures of protein-protein complexes. Each downloadable entry contains both the structure and the sequence of a complex (or domain-domain structure). The database can also be used for docking experiments, for evaluating interfacial properties and for selecting sets of complexes with particular interfacial properties. The selection can also be made with respect to the resolution of the PDB structure. |
This database was created and is maintained by Petras Kundrotas