TFBLAST - Documentation

What is TFBLAST?

The TFBLAST program provides a tool for searching the TRANSFAC Factor Table. The database search uses the BLAST algorithm: BLASTX for nucleotide query sequences and BLASTP for peptide query sequences.

If the input consists of one or more nucleotide sequences, BLASTX translates them into protein sequences in all six reading frames (three on each strand of the DNA sequence). If the input consists of peptide sequences, they are used directly for the database search. The resulting protein sequences are then compared to the protein sequences in the TRANSFAC Factor Table. By setting threshold parameters, the output of the BLAST algorithm can be filtered to exclude hits falling below a defined identity, score or length threshold.
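
The translation of a nucleotide query in all reading frames on both strands can be illustrated with a short Python sketch. This is not the BLASTX implementation; it only shows the idea, assuming the standard genetic code and an unambiguous A/C/G/T sequence. The function names and the frame labels ("+1" to "-3") are chosen for this illustration.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of an upper-case DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate complete codons; unknown codons become 'X'."""
    codons = (dna[i:i + 3] for i in range(0, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def six_frame_translation(dna):
    """Translate a DNA sequence in three frames on each strand."""
    dna = dna.upper()
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


if __name__ == "__main__":
    for frame, protein in six_frame_translation("ATGGCCTAA").items():
        print(frame, protein)
```

Each of the six resulting protein sequences is a candidate for comparison against the protein sequences in the database; stop codons are shown as "*".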

The BLAST Algorithm

The BLAST algorithm ("Basic Local Alignment Search Tool") is a set of similarity search programs designed to explore all of the available sequence databases regardless of whether the query is protein or DNA. The BLAST programs have been designed for speed, with a minimal sacrifice of sensitivity to distant sequence relationships. The scores assigned in a BLAST search have a well-defined statistical interpretation, making real matches easier to distinguish from random background hits. BLAST uses a heuristic algorithm which seeks local as opposed to global alignments and is therefore able to detect relationships among sequences which share only isolated regions of similarity.
[cf. http://www.ncbi.nlm.nih.gov/blast/]

The search tools used by the TFBLAST program are BLASTX and BLASTP, version 2.0.13 (May 26, 2000). BLAST is freely distributed; downloads are available at
ftp://ftp.ncbi.nlm.nih.gov/blast/executables/.

BLAST Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schaffer, Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997), "Gapped BLAST and PSI-BLAST: a new generation of protein database search programs", Nucleic Acids Res. 25:3389-3402.

Variables in the TFBLAST Input Form

SEQUENCES
The sequence formats accepted as TFBLAST input are determined by the "READSEQ" program, which reformats every input before the search in the TRANSFAC Factor Table is run.
Formats which readseq currently understands:
- IG/Stanford, used by Intelligenetics and others
- GenBank/GB, genbank flatfile format
- NBRF format
- EMBL, EMBL flatfile format
- GCG, single sequence format of GCG software
- DNAStrider, for common Mac program
- Fitch format, limited use
- Pearson/Fasta, a common format used by Fasta programs and others
- Phylip3.2, sequential format for Phylip programs
- Phylip, interleaved format for Phylip programs (v3.3, v3.4)
- Plain/Raw, sequence data only (no name, document, numbering)
- MSF, multi sequence format used by GCG software
- PAUP's multiple sequence (NEXUS) format
- PIR/CODATA format used by PIR
- ASN.1 format used by NCBI
The preferred format for BLAST searches is FASTA!
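
For readers unfamiliar with FASTA, the following Python sketch shows what a FASTA record looks like: a header line starting with ">" followed by the sequence on one or more lines. The helper name to_fasta, the record name "my_query" and the line width of 60 characters are assumptions made for this illustration, not requirements of TFBLAST or READSEQ.

```python
def to_fasta(name, sequence, width=60):
    """Format one sequence as a FASTA record.

    Whitespace inside the sequence is removed, and the sequence is
    wrapped to `width` characters per line (an illustrative choice).
    """
    sequence = "".join(sequence.split())
    lines = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return "\n".join([f">{name}"] + lines) + "\n"


if __name__ == "__main__":
    # Hypothetical record name and sequence, for illustration only.
    print(to_fasta("my_query", "ATGGCC TAAATG GCCTAA", width=10), end="")
```

Several records of this form can be concatenated to submit multiple sequences in one input.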

TITLE
You may enter any text that helps you identify the TFBLAST search results.

SEQUENCE TYPE
There are two sequence types: "Nucleotide" and "Peptide". The default setting is "Nucleotide"; there is NO automatic sequence type recognition! For "Nucleotide" sequences BLASTX is used as the search algorithm; for "Peptide" sequences BLASTP is used.
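
The relationship between the selected sequence type and the search program can be summarised in a small Python sketch. The function and dictionary names are hypothetical and serve only as illustration; they are not part of TFBLAST.

```python
# Mapping of the form's sequence type to the BLAST program used.
PROGRAM_BY_TYPE = {
    "Nucleotide": "blastx",  # translated nucleotide query vs. protein database
    "Peptide": "blastp",     # protein query vs. protein database
}


def select_program(sequence_type="Nucleotide"):
    """Return the BLAST program for a sequence type.

    The default mirrors the form's default setting. The type is never
    guessed from the sequence itself, so an unknown value is an error.
    """
    try:
        return PROGRAM_BY_TYPE[sequence_type]
    except KeyError:
        raise ValueError(f"unknown sequence type: {sequence_type!r}") from None
```

Because the type is taken from the form and not from the data, a peptide sequence submitted with the default setting would be treated as a nucleotide sequence.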

IDENTITY THRESHOLD
To exclude search hits with low sequence identity, you can set your own identity threshold. BLAST hits falling below the threshold remain visible in the Original BLAST Results but are excluded from the web page.

SCORE THRESHOLD
To exclude search hits with low BLAST scores, you can set your own BLAST score threshold. BLAST hits falling below the threshold remain visible in the Original BLAST Results but are excluded from the web page.

LENGTH THRESHOLD
To exclude short search hits, you can set your own sequence length threshold. BLAST hits falling below the threshold remain visible in the Original BLAST Results but are excluded from the web page.
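
The combined effect of the identity, score and length thresholds can be sketched in Python as follows. This is a minimal illustration, not the TFBLAST code: the Hit fields, the hit names, the example values, the use of percent for identity, and the choice that a hit exactly at a threshold is kept are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """One BLAST hit (hypothetical fields for illustration)."""
    name: str
    identity: float  # percent identity of the alignment (assumed unit)
    score: float     # BLAST score
    length: int      # alignment length


def filter_hits(hits, min_identity=0.0, min_score=0.0, min_length=0):
    """Keep only hits that do not fall below any of the thresholds."""
    return [
        hit for hit in hits
        if hit.identity >= min_identity
        and hit.score >= min_score
        and hit.length >= min_length
    ]


if __name__ == "__main__":
    # Made-up example hits; names and values are not real results.
    hits = [
        Hit("hit_a", identity=92.0, score=310.0, length=150),
        Hit("hit_b", identity=35.0, score=48.0, length=120),
        Hit("hit_c", identity=88.0, score=95.0, length=22),
    ]
    for hit in filter_hits(hits, min_identity=50.0, min_score=60.0, min_length=30):
        print(hit.name)
```

In this sketch the unfiltered list corresponds to the Original BLAST Results, and the filtered list corresponds to what remains on the web page.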
