Gene Regulation

TRANSFAC® Release 7.0 - Documentation

Site

As outlined above, SITE gives information on individual (putatively) regulatory protein binding sites. In this release, it contains 7915 entries. Of these, 6360 refer to sites within 1504 eukaryotic genes from species ranging from yeast to human. The table also comprises 1295 artificial sequences that resulted from mutagenesis studies, from in vitro selection procedures starting from random oligonucleotide mixtures, or from specific theoretical considerations. Finally, there are 260 entries with consensus binding sequences given in IUPAC code, many of them taken from the compilation of Faisst and Meyer (Nucleic Acids Res. 20:3-26, 1992). The symbols used in these consensus sequences in addition to A, C, G, and T are:

W	= A or T	S	= C or G
R	= A or G	Y	= C or T
K	= G or T	M	= A or C
B	= C, G, or T	D	= A, G, or T
H	= A, C, or T	V	= A, C, or G
N	= A, C, G, or T
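
The symbol table above maps directly onto a lookup structure. The following is a minimal sketch, not part of TRANSFAC itself, of how a consensus string written in these codes could be converted into a regular expression for scanning a sequence. The names IUPAC_TO_BASES and consensus_to_regex, and the example string TGASTCA, are illustrative assumptions and are not taken from the SITE table.

```python
import re

# Nucleotide sets for each symbol, copied from the table above.
IUPAC_TO_BASES = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "W": "AT", "S": "CG",
    "R": "AG", "Y": "CT",
    "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT",
    "H": "ACT", "V": "ACG",
    "N": "ACGT",
}


def consensus_to_regex(consensus):
    """Compile an IUPAC consensus string into a regular expression.

    Hypothetical helper for illustration; raises KeyError on symbols
    that are not in the table.
    """
    parts = []
    for symbol in consensus.upper():
        bases = IUPAC_TO_BASES[symbol]
        # A plain base stays as is; a degenerate symbol becomes a character class.
        parts.append(bases if len(bases) == 1 else "[" + bases + "]")
    return re.compile("".join(parts))


# Illustrative consensus string (an assumption, not a SITE entry).
pattern = consensus_to_regex("TGASTCA")
matches_c = pattern.fullmatch("TGACTCA") is not None  # S allows C
matches_a = pattern.fullmatch("TGAATCA") is not None  # S does not allow A
```

The sketch handles one strand only; scanning the reverse complement as well would need an additional step.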

A number of consensus sequences have been generated by the TRANSFAC® team, generally derived from the profiles stored in the MATRIX table. Here, the degenerate codes are assigned according to the following rules (adapted from Cavener, Nucleic Acids Res. 15:1353-1361, 1987). A single nucleotide is shown if its frequency is greater than 50% and at least twice as high as that of the second most frequent nucleotide. A double-degenerate code indicates that the corresponding two nucleotides together occur in more than 75% of the underlying sequences, but each of them is present in less than 50%. Triple-degenerate codes are restricted to positions where one of the nucleotides does not appear at all in the sequence set and none of the aforementioned rules applies.
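
The rules above can be sketched for a single matrix position as follows. This is a literal reading of the text under stated assumptions, not the TRANSFAC team's actual implementation: the function name consensus_symbol is hypothetical, positions matching none of the three rules are assumed to fall back to N (the text does not say so explicitly), and the triple-degenerate rule is assumed to require exactly one absent nucleotide.

```python
# IUPAC code for each set of nucleotides (from the symbol table).
SET_TO_CODE = {
    frozenset("AT"): "W", frozenset("CG"): "S",
    frozenset("AG"): "R", frozenset("CT"): "Y",
    frozenset("GT"): "K", frozenset("AC"): "M",
    frozenset("CGT"): "B", frozenset("AGT"): "D",
    frozenset("ACT"): "H", frozenset("ACG"): "V",
}


def consensus_symbol(counts):
    """Return the consensus symbol for one position.

    counts maps 'A', 'C', 'G', 'T' to observed counts. Integer
    arithmetic is used so the thresholds are exact.
    """
    total = sum(counts.get(n, 0) for n in "ACGT")
    if total == 0:
        raise ValueError("position has no observations")
    ranked = sorted(((counts.get(n, 0), n) for n in "ACGT"), reverse=True)
    (c1, n1), (c2, n2) = ranked[0], ranked[1]

    # Rule 1: frequency > 50% and at least twice the second most frequent.
    if 2 * c1 > total and c1 >= 2 * c2:
        return n1
    # Rule 2: two nucleotides together > 75%, each of them < 50%.
    if 4 * (c1 + c2) > 3 * total and 2 * c1 < total and 2 * c2 < total:
        return SET_TO_CODE[frozenset((n1, n2))]
    # Rule 3: exactly one nucleotide absent (assumption: exactly one).
    absent = [n for n in "ACGT" if counts.get(n, 0) == 0]
    if len(absent) == 1:
        return SET_TO_CODE[frozenset("ACGT") - {absent[0]}]
    # Assumed fallback when no rule applies.
    return "N"


single = consensus_symbol({"A": 8, "C": 1, "G": 1, "T": 0})
double = consensus_symbol({"A": 4, "C": 1, "G": 4, "T": 1})
triple = consensus_symbol({"A": 4, "C": 3, "G": 3, "T": 0})
fallback = consensus_symbol({"A": 3, "C": 3, "G": 2, "T": 2})
```

Applying the function to each column of a count matrix and joining the results would give a consensus string in the notation of the symbol table.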

Collectively, SITE contains 6321 sequences with 116035 nucleotides.