TRANSFAC® Release 7.0 - Documentation
Site: Criteria
The first criterion for a site to be included in TRANSFAC® is protein
binding, the second is function. Assigned to each site is an unambiguous
accession number and an identifier. The latter is composed of a species
code (e.g., HS for human), a code for the gene description,
and a consecutive number for each entry referring to a particular gene.
Thus, HS$BAC_02 refers to the 2nd entry for the human gene for beta-actin.
The description of a gene is the name of the gene itself or of its
product, whichever term is more common.
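
The identifier layout described above can be illustrated with a short
Python sketch. It assumes that the species code ends at the "$" sign and
that the entry number follows the last underscore, as in HS$BAC_02; the
names SiteIdentifier and parse_site_identifier are hypothetical and not
part of TRANSFAC®.

```python
from typing import NamedTuple


class SiteIdentifier(NamedTuple):
    species_code: str   # e.g. "HS" for human
    gene_code: str      # code for the gene description, e.g. "BAC"
    entry_number: int   # consecutive number of the entry for this gene


def parse_site_identifier(identifier: str) -> SiteIdentifier:
    """Split an identifier such as 'HS$BAC_02' into its three parts.

    Assumption: '$' separates the species code from the rest, and the
    last '_' separates the gene code from the entry number.
    """
    species_code, dollar, rest = identifier.partition("$")
    if not dollar or not species_code:
        raise ValueError(f"missing species code in {identifier!r}")
    gene_code, underscore, number = rest.rpartition("_")
    if not underscore or not gene_code or not number.isdecimal():
        raise ValueError(f"missing gene code or entry number in {identifier!r}")
    return SiteIdentifier(species_code, gene_code, int(number))


if __name__ == "__main__":
    print(parse_site_identifier("HS$BAC_02"))
```

Splitting on the last underscore rather than the first keeps the sketch
tolerant of gene codes that might themselves contain an underscore.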
The positions have preferably been taken from DNase I footprinting
studies, if available. The next preference is for chemical
modifications, the last for gel retardation assays. If the positional
information differs between the two DNA strands, the more upstream
position has been taken for the 5' border and the more downstream
position for the 3' border of the site. Unless stated otherwise in the
S1 field, the position numbers are relative to the transcription start site.
Occasionally (or normally for yeast genes due to their generally more
heterogeneous cap site), they may refer to the translation start codon
stated as 1:ATG. Other reference systems such as defined restriction
sites may be indicated as well. If SF and ST are both 0, the cited
references give no positions. If SF has a negative or positive value
but ST is 0, no precise boundaries of the site have been given; the
site has instead been located "around position [SF]".
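
The position conventions above can be sketched in Python as follows. This
is an illustration only: it assumes that both strands are numbered in the
same reference system, so that a smaller number is further upstream, and
the function names merge_strand_borders and describe_position are
hypothetical.

```python
from typing import Tuple


def merge_strand_borders(
    upper: Tuple[int, int], lower: Tuple[int, int]
) -> Tuple[int, int]:
    """Combine the (5' border, 3' border) pairs reported for the two strands.

    Assumption: smaller numbers are further upstream. The more upstream
    position becomes the 5' border, the more downstream one the 3' border.
    """
    return min(upper[0], lower[0]), max(upper[1], lower[1])


def describe_position(sf: int, st: int) -> str:
    """Interpret an SF/ST pair according to the rules stated in the text."""
    if sf == 0 and st == 0:
        return "no positions given by the cited references"
    if st == 0:
        return f"located around position {sf}"
    return f"from {sf} to {st}"


if __name__ == "__main__":
    # Hypothetical footprint borders on the two strands.
    print(merge_strand_borders((-120, -98), (-123, -101)))
    print(describe_position(0, 0))
    print(describe_position(-85, 0))
    print(describe_position(-123, -98))
```

The case of SF being 0 while ST is not is not covered by the text; the
sketch simply reports it as an ordinary range.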
The sequences depicted have been taken from the literature. Some
conflicts with sequences in the EMBL data library are mentioned in the
comment field. If the site borders differ between the two strands,
only the overlapping sequence is given. When the authors
emphasized a certain sequence motif within a sequence, it is written
in capitals while the rest of the sequence is shown in lowercase
letters.
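
The sequence conventions above can be illustrated with a small Python
sketch. The example sequence and border values are made up, the function
names are hypothetical, and the sketch assumes that both strands are
numbered in the same reference system with smaller numbers further
upstream.

```python
import re
from typing import List, Optional, Tuple


def emphasized_motifs(site_sequence: str) -> List[str]:
    """Return the runs of capital letters, i.e. the emphasized motifs."""
    return re.findall(r"[A-Z]+", site_sequence)


def overlapping_span(
    upper: Tuple[int, int], lower: Tuple[int, int]
) -> Optional[Tuple[int, int]]:
    """Return the region covered on both strands, or None if there is none.

    Assumption: each pair is (5' border, 3' border) with 5' <= 3'.
    """
    start = max(upper[0], lower[0])
    end = min(upper[1], lower[1])
    if start > end:
        return None
    return start, end


if __name__ == "__main__":
    # Hypothetical site sequence with one emphasized motif.
    print(emphasized_motifs("ctgaTGACTCAgcc"))
    # Hypothetical borders on the two strands.
    print(overlapping_span((-120, -98), (-123, -101)))
```

Note that the sequence uses the overlap of the two strands, whereas the
position fields use the outermost borders.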
Cross-references to the EMBL data library also give the positions of
the TRANSFAC® site within the EMBL sequence, with negative numbers
indicating the complementary strand.
The factor that binds to this sequence element is given with its
accession number in the TRANSFAC® FACTOR table, (one of) its name(s)
(see the FACTOR table for possible synonyms), and a "quality" value
ranging from 1 to 6 that reflects the experimental reliability of the
protein-DNA interaction. These values have the following meanings:
1. Functionally confirmed factor binding site
2. Binding of pure protein (purified or recombinant)
3. Immunologically characterized binding activity of a cellular extract
4. Binding activity characterized via a known binding sequence
5. Binding of uncharacterized extract protein to a bona fide element
6. No quality assigned
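
For programmatic use, the quality values can be kept in a lookup table.
The Python sketch below assumes that the six meanings listed above
correspond, in order, to the values 1 to 6; the names QUALITY_MEANINGS,
describe_quality and has_assigned_quality are hypothetical.

```python
# Assumption: the meanings are listed in the order of the values 1 to 6.
QUALITY_MEANINGS = {
    1: "Functionally confirmed factor binding site",
    2: "Binding of pure protein (purified or recombinant)",
    3: "Immunologically characterized binding activity of a cellular extract",
    4: "Binding activity characterized via a known binding sequence",
    5: "Binding of uncharacterized extract protein to a bona fide element",
    6: "No quality assigned",
}


def describe_quality(value: int) -> str:
    """Return the meaning of a quality value, rejecting values outside 1-6."""
    if value not in QUALITY_MEANINGS:
        raise ValueError(f"quality value must be between 1 and 6, got {value!r}")
    return QUALITY_MEANINGS[value]


def has_assigned_quality(value: int) -> bool:
    """True for the values 1-5, False for 6 ('No quality assigned')."""
    describe_quality(value)  # validates the range
    return value != 6


if __name__ == "__main__":
    print(describe_quality(2))
    print(has_assigned_quality(6))
```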
The cellular protein source used to identify a particular site is included in the
SITE table as well.