TFBLAST - Documentation

What is TFBLAST?

TFBLAST was written as a tool for searching the TRANSFAC Factor Table. The database search uses the BLAST algorithm: BLASTX for nucleotide query sequences and BLASTP for peptide query sequences.

If the input consists of one or more nucleotide sequences, BLASTX translates them into protein sequences in all six reading frames (three on each strand of the DNA sequence). Peptide input is used directly for the database search. The input sequences are then compared with the protein sequences in the TRANSFAC Factor Table. Threshold parameters can be set to filter the BLAST output, excluding hits that fall below a defined identity, score, or length threshold.
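The two steps described above, six-frame translation and threshold filtering, can be illustrated with a minimal Python sketch. This is not the TFBLAST or BLASTX implementation. The standard genetic code is assumed, and the names `six_frame_translations`, `Hit`, and `filter_hits`, as well as the fields of `Hit`, are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass

# Standard genetic code, codons enumerated in T, C, A, G order ("*" = stop).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def reverse_complement(dna):
    """Return the reverse complement of a DNA string (A, C, G, T only)."""
    return dna.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna):
    """Translate complete codons from position 0; unknown codons become 'X'."""
    dna = dna.upper()
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translations(dna):
    """Translate a DNA sequence in three frames on each strand."""
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


@dataclass
class Hit:
    """Hypothetical record for one database hit (not a TFBLAST data structure)."""
    factor_id: str    # illustrative identifier of the matched factor
    identity: float   # percent identity of the alignment
    score: float      # alignment score
    length: int       # alignment length


def filter_hits(hits, min_identity=0.0, min_score=0.0, min_length=0):
    """Keep only hits that reach all three thresholds."""
    return [
        h for h in hits
        if h.identity >= min_identity
        and h.score >= min_score
        and h.length >= min_length
    ]


if __name__ == "__main__":
    for frame, protein in six_frame_translations("ATGGCCTAA").items():
        print(frame, protein)
```

In this sketch a hit is dropped as soon as it falls below any one of the three thresholds, which mirrors the filtering described above; the actual threshold semantics of TFBLAST may differ in detail.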

 
 

The BLAST Algorithm

The BLAST algorithm ("Basic Local Alignment Search Tool") is a set of similarity search programs designed to explore all of the available sequence databases regardless of whether the query is protein or DNA. The BLAST programs have been designed for speed, with a minimal sacrifice of sensitivity to distant sequence relationships. The scores assigned in a BLAST search have a well-defined statistical interpretation, making real matches easier to distinguish from random background hits. BLAST uses a heuristic algorithm which seeks local as opposed to global alignments and is therefore able to detect relationships among sequences which share only isolated regions of similarity.
[cf. http://www.ncbi.nlm.nih.gov/blast/]

The search tools used by TFBLAST are the BLASTX and BLASTP programs, version 2.0.13 (May 26, 2000). BLAST is freely distributed; downloads are available at
ftp://ftp.ncbi.nlm.nih.gov/blast/executables/.

BLAST Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schäffer, Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997), "Gapped BLAST and PSI-BLAST: a new generation of protein database search programs", Nucleic Acids Res. 25:3389-3402.

 
 

Variables in the TFBLAST Input Form

  • SEQUENCES
    The sequence formats accepted as TFBLAST input are determined by the "READSEQ" program. Before the search in the TRANSFAC Factor Table is run, all input is reformatted by this tool.

    Formats which readseq currently understands:

    • IG/Stanford, used by Intelligenetics and others
    • GenBank/GB, genbank flatfile format
    • NBRF format
    • EMBL, EMBL flatfile format
    • GCG, single sequence format of GCG software
    • DNAStrider, for common Mac program
    • Fitch format, limited use
    • Pearson/Fasta, a common format used by Fasta programs and others
    • Phylip3.2, sequential format for Phylip programs
    • Phylip, interleaved format for Phylip programs (v3.3, v3.4)
    • Plain/Raw, sequence data only (no name, document, numbering)
    • MSF multi sequence format used by GCG software
    • PAUP's multiple sequence (NEXUS) format
    • PIR/CODATA format used by PIR
    • ASN.1 format used by NCBI
    Note: the preferred format for BLAST searches is FASTA.
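Because FASTA is the preferred input format, the following minimal Python sketch shows how a raw sequence might be wrapped as a FASTA record and read back. The function names `to_fasta` and `parse_fasta`, the record name `query1`, and the 60-character line width are assumptions made for illustration; they are not part of TFBLAST or READSEQ.

```python
def to_fasta(name, sequence, width=60):
    """Format one sequence as a FASTA record: a '>' header line plus wrapped sequence lines."""
    seq = "".join(sequence.split()).upper()  # drop whitespace, normalise case
    lines = [seq[i:i + width] for i in range(0, len(seq), width)]
    return ">" + name + "\n" + "\n".join(lines) + "\n"


def parse_fasta(text):
    """Parse FASTA text into a list of (name, sequence) tuples."""
    records = []
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if name is not None:
                records.append((name, "".join(chunks)))
            name, chunks = line[1:].strip(), []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


if __name__ == "__main__":
    record = to_fasta("query1", "atggcc taa", width=5)
    print(record, end="")
    print(parse_fasta(record))
```

Several records can be concatenated in one input, since each header line beginning with `>` starts a new sequence.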

