Gene Regulation

TRANSFAC® Release 7.0 - Documentation

Matrix: Contents

The MATRIX table contains 398 nucleotide distribution matrices of aligned binding sequences. These sequences may have been obtained by in vitro selection studies or may be compiled sites of genes; the source is indicated in each entry.

Each matrix entry has an identifier that begins with a code for one of six groups of biological species (V$, vertebrates; I$, insects; P$, plants; F$, fungi; N$, nematodes; B$, bacteria), followed by an acronym for the factor the matrix refers to and a consecutive number that distinguishes different matrices for the same factor. Thus, V$OCT1_02 indicates the second matrix for the vertebrate Oct-1 factor. For matrices generated from TRANSFAC® SITE entries connected to a certain transcription factor, the ID ends not with a consecutive number but with an abbreviation of the lowest quality among the sites used to construct the matrix. E.g., V$CREB_Q2 is a matrix constructed from CREB binding sites of quality 2 or better. Finally, a matrix with an ID like V$AP1_C has been derived from a "consensus description" constructed with the aid of ConsIndex (Frech et al., Nucleic Acids Res. 21:1655-1664, 1993).

The matrix area gives the nucleotide frequencies observed in aligned binding sites of the corresponding transcription factor (or, more generally, in aligned sites of the described function). An additional column shows the IUPAC consensus string derived from the matrix according to the following rules (adapted from Cavener, Nucleic Acids Res. 15:1353-1361, 1987). A single nucleotide is shown if its frequency is greater than 50% and at least twice as high as that of the second most frequent nucleotide. A double-degenerate code indicates that the corresponding two nucleotides together occur in more than 75% of the underlying sequences but each of them is present in less than 50%. Triple-degenerate codes are used only at positions where one of the nucleotides did not show up at all in the sequence set and neither of the preceding rules applies. All other frequency distributions are represented by the letter "N".
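The consensus rules above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the code used to build TRANSFAC: the function names consensus_letter and consensus_string are hypothetical, the two nucleotides tested for a double-degenerate code are assumed to be the two most frequent ones, and "did not show up at all" is read as exactly one nucleotide having a count of zero.

```python
# IUPAC codes for every non-empty subset of A, C, G, T (15 letters in total).
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C",
    frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AC"): "M", frozenset("AG"): "R", frozenset("AT"): "W",
    frozenset("CG"): "S", frozenset("CT"): "Y", frozenset("GT"): "K",
    frozenset("ACG"): "V", frozenset("ACT"): "H",
    frozenset("AGT"): "D", frozenset("CGT"): "B",
    frozenset("ACGT"): "N",
}


def consensus_letter(a, c, g, t):
    """Return the consensus letter for one matrix row (counts of A, C, G, T)."""
    counts = {"A": a, "C": c, "G": g, "T": t}
    total = sum(counts.values())
    if total <= 0:
        return "N"  # assumption: an empty row carries no information

    # Rank nucleotides by count, most frequent first.
    ranked = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    (first, f1), (second, f2) = ranked[0], ranked[1]

    # Rule 1: single nucleotide, > 50% and at least twice the runner-up.
    if f1 > 0.5 * total and f1 >= 2 * f2:
        return first

    # Rule 2: two nucleotides together > 75%, each of them < 50%.
    if f1 + f2 > 0.75 * total and f1 < 0.5 * total and f2 < 0.5 * total:
        return IUPAC[frozenset((first, second))]

    # Rule 3: triple-degenerate code if exactly one nucleotide is absent
    # (assumed reading of "did not show up at all").
    present = [base for base, count in counts.items() if count > 0]
    if len(present) == 3:
        return IUPAC[frozenset(present)]

    # Everything else is "N".
    return "N"


def consensus_string(rows):
    """Join the per-position letters for an iterable of (A, C, G, T) rows."""
    return "".join(consensus_letter(*row) for row in rows)


# Illustrative (made-up) rows, not taken from any TRANSFAC entry.
print(consensus_string([(10, 0, 0, 0), (4, 4, 1, 1), (5, 3, 2, 0)]))  # AMV
```

Note that a row such as (6, 4, 0, 0) yields "N" under this literal reading: the top nucleotide is neither twice as frequent as the second nor below 50%, and two nucleotides rather than one are absent.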

Fields

AC        Accession no.
XX
ID        Identifier
XX
DT        Date; author
XX
NA        Name of the binding factor
XX
DE        Short factor description
XX
BF        List of linked factor entries
XX

PO        A   C   G   T      Position within the aligned sequences,
01                           frequency of A, C, G, T residues, resp.;
02                           last column: deduced consensus in
03                           IUPAC 15-letter code
XX
BA        Statistical basis
XX
BS        Factor binding sites underlying the matrix
BS        (SITE accession no.; start position for matrix sequence;
BS        length of sequence used; number of gaps inserted;
BS        strand orientation)
XX
CC        Comments
XX
RX        MEDLINE ID
RN        Reference no.
RA        Reference authors
RT        Reference title
RL        Reference data
XX
//
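
A reader for records laid out as in the field list above might look like the following Python sketch. It is not an official TRANSFAC parser. It assumes that every line starts with a two-letter field code (or a position number inside the matrix area), that "XX" lines are separators, that "//" ends a record, and that a line starting with whitespace continues the previous field. The names parse_matrix_records and split_matrix_id are hypothetical, and the identifier pattern only covers the three ID forms described in the Contents section.

```python
import re

# Species group prefixes as listed in the Contents section.
SPECIES_GROUPS = {
    "V": "vertebrates", "I": "insects", "P": "plants",
    "F": "fungi", "N": "nematodes", "B": "bacteria",
}

# Group letter, "$", factor acronym, "_", suffix (split at the last underscore).
ID_PATTERN = re.compile(r"^(?P<group>[A-Z])\$(?P<factor>.+)_(?P<suffix>[^_]+)$")


def split_matrix_id(identifier):
    """Split an ID such as V$OCT1_02 into species group, factor and suffix."""
    match = ID_PATTERN.match(identifier)
    if match is None or match.group("group") not in SPECIES_GROUPS:
        raise ValueError(f"unrecognised matrix identifier: {identifier!r}")
    suffix = match.group("suffix")
    if suffix.isdigit():
        kind = "consecutive number"
    elif suffix[0] == "Q" and suffix[1:].isdigit():
        kind = "minimum site quality"
    elif suffix == "C":
        kind = "consensus description"
    else:
        kind = "unknown"  # assumption: other suffix forms are not interpreted
    return {
        "group": SPECIES_GROUPS[match.group("group")],
        "factor": match.group("factor"),
        "suffix": suffix,
        "kind": kind,
    }


def parse_matrix_records(text):
    """Yield one dict per record; only records closed by '//' are yielded."""
    record = {"fields": {}, "matrix": []}
    last_code = None
    for raw in text.splitlines():
        line = raw.rstrip()
        if not line or line == "XX":
            continue  # blank lines and XX separators carry no data
        if line == "//":
            yield record
            record = {"fields": {}, "matrix": []}
            last_code = None
            continue
        if raw[0].isspace():
            # Assumed continuation of the previous field line.
            if last_code is not None:
                record["fields"][last_code][-1] += " " + line.strip()
            continue
        code, _, value = line.partition(" ")
        value = value.strip()
        if code == "PO":
            continue  # header row of the matrix area (A C G T)
        if code.isdigit():
            # Matrix row: position, four frequencies, optional consensus letter.
            parts = value.split()
            counts = tuple(float(x) for x in parts[:4])
            letter = parts[4] if len(parts) > 4 else ""
            record["matrix"].append((int(code), counts, letter))
        else:
            record["fields"].setdefault(code, []).append(value)
            last_code = code


print(split_matrix_id("V$CREB_Q2")["kind"])  # minimum site quality
```

Field values are kept as lists of strings because codes such as BS and BF can occur on several lines within one record.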