TRANSFAC® Release 7.0 - Documentation
Matrix: Contents
The MATRIX table contains 398 nucleotide distribution matrices of aligned binding sequences.
These sequences may have been obtained from in vitro selection studies or may be binding sites compiled from
genes. The source is appropriately indicated. The matrix entries have an identifier that indicates
one of six groups of biological species (V$, vertebrates; I$, insects; P$, plants; F$, fungi; N$, nematodes;
B$, bacteria), followed by an acronym for the factor the matrix refers to, and a consecutive number discriminating
between different matrices for the same factor. Thus, V$OCT1_02 indicates the second matrix for the vertebrate Oct-1 factor.
For matrices that have been generated from TRANSFAC® SITE entries linked to a
certain transcription factor, the ID ends not with a consecutive number but with an abbreviation of
the lowest quality among the sites used to construct the matrix. E.g., V$CREB_Q2 is a matrix constructed from CREB binding
sites of quality 2 or better. Finally, a matrix with an ID like V$AP1_C has been derived
from a "consensus description" constructed with the aid of ConsIndex (Frech et al., Nucleic Acids Res. 21:1655-1664, 1993).
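The identifier scheme described above can be illustrated with a minimal Python sketch. The function name parse_matrix_id, the returned dictionary keys, and the assumption that the suffix is whatever follows the last underscore are illustrative choices, not part of TRANSFAC®; suffixes other than the three kinds named above are simply reported as "unknown".

```python
import re

# Group prefixes as listed in the description above.
GROUPS = {
    "V": "vertebrates",
    "I": "insects",
    "P": "plants",
    "F": "fungi",
    "N": "nematodes",
    "B": "bacteria",
}


def parse_matrix_id(matrix_id):
    """Split a matrix ID such as 'V$OCT1_02' into its parts (hypothetical helper)."""
    # Assumption: the suffix is the text after the LAST underscore.
    m = re.fullmatch(r"([VIPFNB])\$(.+)_([^_]+)", matrix_id)
    if m is None:
        raise ValueError(f"not a recognizable matrix ID: {matrix_id!r}")
    prefix, factor, suffix = m.groups()

    if suffix.isdigit():
        kind = "consecutive number"
    elif re.fullmatch(r"Q\d+", suffix):
        kind = "site quality"
    elif suffix == "C":
        kind = "consensus description"
    else:
        kind = "unknown"

    return {
        "group": GROUPS[prefix],
        "factor": factor,
        "suffix": suffix,
        "suffix_kind": kind,
    }


if __name__ == "__main__":
    for example in ("V$OCT1_02", "V$CREB_Q2", "V$AP1_C"):
        print(example, parse_matrix_id(example))
```

Splitting at the last underscore keeps the sketch tolerant of factor acronyms that might themselves contain an underscore; whether such acronyms occur is not stated here.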
The matrix area gives the nucleotide frequencies observed in aligned binding sites of
the corresponding transcription factor (or, more generally, in aligned sites of the described function);
an additional column depicts the IUPAC string consensus derived from the matrix according to the following rules
(adapted from Cavener, Nucleic Acids Res. 15:1353-1361, 1987): a single nucleotide is shown if its frequency is greater
than 50% and at least twice as high as the second most frequent nucleotide. A double-degenerate code indicates that the
corresponding two nucleotides occur in more than 75% of the underlying sequences but each of them is present in less than 50%.
Usage of triple-degenerate codes is restricted to those positions where one of the nucleotides did not show up at all in the
sequence set and none of the afore-mentioned rules applies. All other frequency distributions are represented by the letter "N".
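The consensus rules above can be sketched in Python as follows. This is a literal reading of the rules as worded here, not the code used to build the database. The function name consensus_letter is hypothetical, and two points are assumptions: the triple-degenerate code is applied only when exactly one nucleotide has a count of zero, and a position with no counts at all is reported as "N".

```python
# IUPAC codes for sets of nucleotides (standard nomenclature).
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C",
    frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AC"): "M", frozenset("AG"): "R", frozenset("AT"): "W",
    frozenset("CG"): "S", frozenset("CT"): "Y", frozenset("GT"): "K",
    frozenset("ACG"): "V", frozenset("ACT"): "H",
    frozenset("AGT"): "D", frozenset("CGT"): "B",
}


def consensus_letter(a, c, g, t):
    """Derive one consensus letter from the counts of A, C, G, T at a position."""
    counts = {"A": a, "C": c, "G": g, "T": t}
    total = sum(counts.values())
    if total == 0:
        return "N"  # assumption: an empty position carries no information

    # Nucleotides ordered from most to least frequent.
    ranked = sorted(counts, key=counts.get, reverse=True)
    first, second = counts[ranked[0]], counts[ranked[1]]

    # Rule 1: more than 50% and at least twice the second most frequent.
    if first > 0.5 * total and first >= 2 * second:
        return ranked[0]

    # Rule 2: two nucleotides together exceed 75%, each below 50%.
    if first + second > 0.75 * total and first < 0.5 * total and second < 0.5 * total:
        return IUPAC[frozenset(ranked[:2])]

    # Rule 3 (assumption: exactly one nucleotide is absent).
    present = [n for n in counts if counts[n] > 0]
    if len(present) == 3:
        return IUPAC[frozenset(present)]

    # Everything else.
    return "N"


if __name__ == "__main__":
    for row in [(8, 1, 1, 0), (4, 4, 1, 1), (0, 4, 3, 3), (3, 3, 2, 2)]:
        print(row, consensus_letter(*row))
```

Under this literal reading, a position such as A=6, C=4, G=0, T=0 satisfies none of the first three rules and yields "N"; the original Cavener rules may treat such a case differently.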
Fields
AC Accession no.
XX
ID Identifier
XX
DT Date; author
XX
NA Name of the binding factor
XX
DE Short factor description
XX
BF List of linked factor entries
XX
PO A C G T Position within the aligned sequences,
01 frequency of A, C, G, T residues, resp.;
02 last column: deduced consensus in
03 IUPAC 15-letter code
XX
BA Statistical basis
XX
BS Factor binding sites underlying the matrix
BS (SITE accession no.; start position for matrix sequence; length of sequence used;
BS number of gaps inserted; strand orientation)
XX
CC Comments
XX
RX MEDLINE ID
RN Reference no.
RA Reference authors
RT Reference title
RL Reference data
XX
//
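
The field layout above suggests a simple line-oriented reader. The following Python sketch assumes that each line starts with a two-character code followed by whitespace, that "XX" lines are separators, that "//" ends an entry, and that matrix rows begin with a numeric position followed by four frequencies and the consensus letter. The function name parse_matrix_entry and the sample entry TOY_ENTRY (including its accession number and values) are invented for illustration and do not come from the database.

```python
# Invented sample entry, for illustration only (not real TRANSFAC data).
TOY_ENTRY = """\
AC  M00000
XX
ID  V$TOY_01
XX
PO  A C G T
01  8 1 1 0 A
02  4 4 1 1 M
XX
//
"""


def parse_matrix_entry(text):
    """Return (fields, matrix) for one entry of the layout described above."""
    fields = {}   # line code -> list of values, in order of appearance
    matrix = []   # one dict per matrix row
    for raw in text.splitlines():
        line = raw.rstrip()
        if not line or line == "XX":
            continue              # blank line or separator
        if line == "//":
            break                 # end of entry
        code, _, value = line.partition(" ")
        value = value.strip()
        if code.isdigit():
            # Matrix row: position, four frequencies, optional consensus letter.
            parts = value.split()
            matrix.append({
                "pos": int(code),
                "counts": [float(x) for x in parts[:4]],
                "consensus": parts[4] if len(parts) > 4 else None,
            })
        elif code in ("PO", "P0"):
            # Matrix header line; accepting the spelling "P0" is an assumption.
            continue
        else:
            fields.setdefault(code, []).append(value)
    return fields, matrix


if __name__ == "__main__":
    fields, matrix = parse_matrix_entry(TOY_ENTRY)
    print(fields)
    print(matrix)
```

Values are collected in lists because a code such as BS or CC may occur on several lines of one entry.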