3DTF Help
The first mode of running 3DTF (Binding site analysis) reports information about the DNA-protein interface, which can then be used in a user-defined calculation of PWMs. It is fast, and it is recommended to run it first for every new PDB file when the user is not sure whether the structure is suitable for 3DTF calculations. The analysis program will:
Print information about the chains, the numbers of residues/bases and atoms, and the number of paired bases in the PDB file;
On each strand, detect the bases that contact the protein and print their numbers, together with their paired bases from the other strands;
Try to define the binding site as a contiguous region of base pairs that contact the protein.
The second mode is most appropriate when the interface can be defined unambiguously. The automated mode selects the binding site on the first appropriate strand with ascending base numbering and defines the site as extending from the first base pair that contacts the protein to the last one, although the maximal length is limited to 30.
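The site definition used by the automated mode can be illustrated with a minimal sketch. The function name and its input (a list of contacting base numbers on one strand) are hypothetical, and the way an over-long site is shortened is an assumption: this page only states that the length is limited to 30, not which end is cut.

```python
MAX_SITE_LENGTH = 30  # maximal binding site length stated on this help page


def define_site(contacting_bases, max_length=MAX_SITE_LENGTH):
    """Return (first, last) base numbers of the binding site on one strand.

    contacting_bases: numbers of the bases that contact the protein, taken
    from a strand with ascending base numbering (hypothetical input).
    """
    if not contacting_bases:
        return None  # no protein contacts, so no site can be defined
    first = min(contacting_bases)
    last = max(contacting_bases)
    # Assumption: a site longer than the limit is cut at its far end.
    last = min(last, first + max_length - 1)
    return first, last
```

For example, define_site([5, 7, 12]) gives (5, 12), a site of 8 base pairs that also includes the non-contacting bases between the first and the last contact.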
In the third mode the interface is defined by the user, who should make sure that the submitted starting bases are paired. It is also important that the first submitted strand/chain is a strand with ascending base numbering (as a rule, the downstream 5'->3' strand in the PDB file). The user can inspect the output of the binding site analysis program to check that the input parameters are correct.
Calculation time for small complexes (10 bp binding site, 60 aa protein) is about 1 minute and increases mainly with the length of the binding site.
Selecting the calculation mode "long" invokes calculations with a larger number of random sequences, which ensures that not only the PWM but also the full model converges. The full model gives relative weights of positions in addition to the PWM. These weights appear in the last column of the calculated PWM and represent the relative contributions of the positions to the binding energy, normalized to 1000. Here is an example of results from calculations based on 1gcc:
AC M3D000
P0 A C G T 195.3
01 33 960 1 5 C 72
02 0 557 79 363 Y 72
03 0 0 999 0 G 110
04 0 999 0 0 C 137
05 0 1000 0 0 C 214
06 0 0 999 0 G 106
07 854 127 14 4 A 67
08 0 787 0 212 C 77
09 1 990 8 0 C 78
10 459 113 411 15 R 63
XX
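To work with such output programmatically, the position rows can be read with a few lines of Python. This is a minimal sketch, not part of 3DTF: the function name is hypothetical, and it assumes that every position row has exactly seven whitespace-separated fields (position, A, C, G, T, consensus letter, weight), as in the "long" mode example above.

```python
EXAMPLE = """\
AC M3D000
P0 A C G T 195.3
01 33 960 1 5 C 72
02 0 557 79 363 Y 72
03 0 0 999 0 G 110
04 0 999 0 0 C 137
05 0 1000 0 0 C 214
06 0 0 999 0 G 106
07 854 127 14 4 A 67
08 0 787 0 212 C 77
09 1 990 8 0 C 78
10 459 113 411 15 R 63
XX
"""


def parse_matrix(text):
    """Return one dict per matrix position: counts, consensus and weight."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # Position rows start with a numeric label such as "01";
        # the AC, P0 and XX lines are skipped.
        if len(fields) == 7 and fields[0].isdigit():
            rows.append({
                "pos": int(fields[0]),
                "counts": dict(zip("ACGT", map(int, fields[1:5]))),
                "consensus": fields[5],
                "weight": int(fields[6]),
            })
    return rows


rows = parse_matrix(EXAMPLE)
total_weight = sum(row["weight"] for row in rows)
heaviest = max(rows, key=lambda row: row["weight"])
print(total_weight, heaviest["pos"])
```

For the 1gcc example the weights add up to 996, and position 05 carries the largest weight (214).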
Users are advised to avoid submitting unnecessarily large complexes, as only contacts closer than 10 Å contribute to the energy and hence to the PWM. There are limits on the size of the submitted structure file: the number of chains must be < 21, the number of residues/bases in each chain < 1000, and the number of atoms in each chain < 10000. In addition, the length of the binding site cannot be larger than 30.
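A structure file can be checked against these limits before submission. The sketch below is only an illustration: the function name is hypothetical, it reads the fixed columns of PDB ATOM/HETATM records, and it assumes that both record types count towards the limits, which this page does not specify.

```python
from collections import defaultdict

# Limits quoted from this help page.
MAX_CHAINS = 20      # number of chains < 21
MAX_RESIDUES = 999   # residues/bases per chain < 1000
MAX_ATOMS = 9999     # atoms per chain < 10000


def check_limits(pdb_lines):
    """Return a list of limit violations (empty if the file is within limits)."""
    atoms = defaultdict(int)
    residues = defaultdict(set)
    for line in pdb_lines:
        # Assumption: both ATOM and HETATM records count towards the limits.
        if line.startswith(("ATOM", "HETATM")) and len(line) > 26:
            chain = line[21]                  # chain ID, PDB column 22
            atoms[chain] += 1
            residues[chain].add(line[22:27])  # residue number + insertion code
    problems = []
    if len(atoms) > MAX_CHAINS:
        problems.append(f"{len(atoms)} chains (limit {MAX_CHAINS})")
    for chain in sorted(atoms):
        n_res = len(residues[chain])
        if n_res > MAX_RESIDUES:
            problems.append(f"chain {chain}: {n_res} residues (limit {MAX_RESIDUES})")
        if atoms[chain] > MAX_ATOMS:
            problems.append(f"chain {chain}: {atoms[chain]} atoms (limit {MAX_ATOMS})")
    return problems
```

A typical call would be check_limits(open("complex.pdb")), where the file name is a placeholder; an empty list means that none of the three limits is exceeded.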
Submitted structure files should not have nonstandard or negative residue numbering, because such numbering will not be processed correctly by the server. Please fix the numbering beforehand, for example with the convpdb.pl tool from the MMTSB Tool Set.
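Problematic numbering can be located before submission with a short script. This is a hedged sketch with a hypothetical function name. The page does not define "nonstandard", so the sketch assumes it covers residue numbers that are not integers and residues carrying an insertion code (PDB column 27).

```python
def find_numbering_problems(pdb_lines):
    """Return sorted (chain, residue label, reason) tuples for suspect residues."""
    problems = set()
    for line in pdb_lines:
        if not line.startswith(("ATOM", "HETATM")) or len(line) < 27:
            continue
        chain = line[21]                # chain ID, PDB column 22
        number = line[22:26].strip()    # residue number, PDB columns 23-26
        icode = line[26].strip()        # insertion code, PDB column 27
        label = number + icode
        try:
            value = int(number)
        except ValueError:
            problems.add((chain, label, "non-integer residue number"))
            continue
        if value < 0:
            problems.add((chain, label, "negative residue number"))
        if icode:
            # Assumption: insertion codes count as nonstandard numbering.
            problems.add((chain, label, "insertion code"))
    return sorted(problems)
```

An empty result means that, under these assumptions, no residue needs renumbering.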
Repeatedly calculated PWMs may differ slightly from one another even for the same structure, because some of the sequences used by the method are generated randomly. In the "short" calculation mode the relative weights of positions will not converge, i.e. they will differ significantly from one calculation to another.
Chain IDs submitted in the third mode are treated as case-sensitive, i.e. please specify Chain IDs exactly as they appear in the PDB file.
The PWM derivation method is described here:
Denitsa Alamanova, Philip Stegmaier and Alexander Kel,
Creating PWMs of transcription factors using 3D structure-based computation of protein-DNA free binding energies,
BMC Bioinformatics 2010, 11:225
The all-atom statistical potential function for protein-DNA interactions used in the method is described here:
Robertson TA, Varani G: An All-Atom, Distance-Dependent Scoring Function for the Prediction of Protein-DNA Interactions From Structure.
PROTEINS: Structure, Function, and Bioinformatics 2007, 66:359-374