TRANSPATH® Report 1, 0001 (2003)
Application of automatic Pfam annotation to TRANSPATH®
Philip Stegmaier, TRANSPATH Team, BIOBASE GmbH, Halchtersche Strasse 33, D-38304 Wolfenbuettel, Germany
Protein sequences from TRANSPATH® release 3.3 were searched with sequence family models from Pfam release 7.5 [1]. A subset of the Pfam-A database was derived by parsing the swisspfam file of the current Pfam release, available from the Pfam FTP site, and extracting the relevant models from the database. TRANSPATH® proteins were searched with this subset, and sequence family hits are documented with a general E-value cut-off of 0.5. Overlapping matches are tolerated as long as the overlap does not exceed 10% of the length of each of the affected hits; otherwise, only the model with the lower E-value is considered. Both the extraction of models from the Pfam-A database and the search of TRANSPATH® proteins with the derived Pfam subset were performed with HMMER 2.2g [2][3].

For each Pfam family shown in the match display of a TRANSPATH® molecule, the raw score and the E-value are given. The raw score is the score of the alignment between the matched region and the model; it can be negative and increases with the quality of a match. The E-value reports the estimated significance of a hit; it is always greater than or equal to zero and decreases as the quality of a match increases. Empirically, matches with an E-value of 0.1 or below, as computed with this software, can be considered significant even if their raw score is negative. However, true matches can also have much higher E-values.

For 2938 of the 3088 TRANSPATH® sequences, at least one of the 410 Pfam sequence families was documented.
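The hit-filtering rules described above (E-value cut-off of 0.5, overlaps tolerated up to 10% of each hit's length, otherwise the lower E-value wins, and the empirical significance threshold of 0.1) can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the procedure actually used for TRANSPATH®: the `Hit` record layout is hypothetical, the cut-off is assumed to be inclusive, overlaps are assumed to be resolved greedily from the best E-value downwards, and parsing of HMMER 2.2g output is omitted. The demo hits use invented family names and coordinates.

```python
from dataclasses import dataclass

# Thresholds stated in the report; treating them as inclusive is an assumption.
E_VALUE_CUTOFF = 0.5          # hits with a larger E-value are not documented
MAX_OVERLAP_FRACTION = 0.10   # tolerated overlap, relative to each hit's length
SIGNIFICANT_E_VALUE = 0.1     # empirical significance threshold


@dataclass(frozen=True)
class Hit:
    """One Pfam family hit on a protein (hypothetical record layout)."""
    family: str
    start: int        # first matched residue, 1-based inclusive
    end: int          # last matched residue, 1-based inclusive
    raw_score: float
    e_value: float

    @property
    def length(self) -> int:
        return self.end - self.start + 1


def overlap(a: Hit, b: Hit) -> int:
    """Number of residues shared by two hits (0 if they are disjoint)."""
    return max(0, min(a.end, b.end) - max(a.start, b.start) + 1)


def compatible(a: Hit, b: Hit) -> bool:
    """True if the overlap is at most 10% of the length of each hit."""
    ov = overlap(a, b)
    return (ov <= MAX_OVERLAP_FRACTION * a.length
            and ov <= MAX_OVERLAP_FRACTION * b.length)


def filter_hits(hits: list[Hit]) -> list[Hit]:
    """Apply the E-value cut-off, then keep hits in order of increasing
    E-value, discarding any hit that overlaps an already kept hit too much
    (assumed greedy resolution in favour of the lower E-value)."""
    candidates = sorted((h for h in hits if h.e_value <= E_VALUE_CUTOFF),
                        key=lambda h: h.e_value)
    kept: list[Hit] = []
    for hit in candidates:
        if all(compatible(hit, k) for k in kept):
            kept.append(hit)
    return sorted(kept, key=lambda h: h.start)  # report in sequence order


def is_significant(hit: Hit) -> bool:
    """Empirical rule: E <= 0.1 counts as significant, whatever the raw score."""
    return hit.e_value <= SIGNIFICANT_E_VALUE


if __name__ == "__main__":
    demo = [
        Hit("FamA", 10, 100, 85.2, 1e-24),
        Hit("FamB", 95, 160, 40.1, 3e-10),  # 6-residue overlap with FamA: tolerated
        Hit("FamC", 50, 140, -2.3, 0.08),   # 51-residue overlap with FamA: dropped
        Hit("FamD", 200, 250, -5.0, 0.9),   # above the E-value cut-off: dropped
    ]
    for h in filter_hits(demo):
        print(h.family, h.e_value, is_significant(h))
```

Sorting by E-value before checking overlaps means a weaker hit can never displace a stronger one. A strictly pairwise comparison could give different results when three or more hits overlap in a chain, so the greedy order here is a design choice rather than something the report specifies.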
[1] Bateman A, Birney E, Cerruti L, Durbin R, Etwiller L, Eddy SR, Griffiths-Jones S, Howe KL, Marshall M, Sonnhammer ELL. Nucleic Acids Res. 30:276-280 (2002). PMID: 11752314
[2] Eddy SR. HMMER: Profile hidden Markov models for biological sequence analysis (http://hmmer.wustl.edu/).
[3] Eddy SR. Profile hidden Markov models. Bioinformatics 14:755-763 (1998). PMID: 9918945