TRANSPATH Professional FAQ

TRANSPATH - FREQUENTLY ASKED QUESTIONS

Data content
Hyperlinks to external databases
Using TRANSPATH® Professional
Software

What percentage of the known universe of signal transduction molecules (and by category if available, particularly kinases) are currently annotated in TP?

We can give an approximate answer for human entries. As a rule of thumb, if you look at 30,000 genes in the human genome, without considering splice variants:

About 3,700, or 12.2%, of the known genes in the human genome were classified in GO as "signal transduction", and 868, or 2.8% of the known portion, were classified as "kinase". TRANSPATH contains 299 human protein kinases, or about 34% of all known kinases. At the moment there are 2,875 entries for human proteins in TRANSPATH® Professional, which corresponds to about 77% of all known genes relevant to signal transduction.

[Numbers are from: J. Craig Venter et al. (2001), "The Sequence of the Human Genome", Science 291, 1304-1351]
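
The coverage figures above can be re-derived from the quoted counts. The following is only a small sketch of that arithmetic; the variable and function names are illustrative, and comparing protein entries with gene counts remains the same rule-of-thumb approximation used in the answer.

```python
# Counts quoted in the answer above (Venter et al. 2001 and TRANSPATH).
signal_transduction_genes = 3700  # human genes classified in GO as "signal transduction"
known_kinases = 868               # human genes classified as "kinase"
transpath_kinases = 299           # human protein kinases in TRANSPATH
transpath_human_entries = 2875    # human protein entries in TRANSPATH Professional


def coverage(covered: int, total: int) -> float:
    """Return covered/total as a percentage, rounded to one decimal place."""
    return round(100 * covered / total, 1)


kinase_coverage = coverage(transpath_kinases, known_kinases)
signal_coverage = coverage(transpath_human_entries, signal_transduction_genes)

print(f"kinases covered: {kinase_coverage}%")
print(f"signal transduction genes covered: {signal_coverage}%")
```

Both results round to the approximate values given in the text.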

Where do I find how many human molecules TRANSPATH contains?

Specific numbers are presented under "information" -> "statistics". Besides the count of human molecules that TRANSPATH contains, you will find other species listed, the numbers of links to external databases, and more. Another way to retrieve specific numbers is to use our search engine, which supports differentiated and complex search queries. The documentation pages help you to fill in the correct terms and to choose the appropriate fields.

What information is included in TRANSPATH and who makes those decisions? What proteins receive priority?

In principle we read the scientific literature on a certain topic. Usually the most recent reviews are used to identify the important signal transduction molecules in a specific context. Papers cited in those reviews are processed, and additional papers concerning these molecules are searched for in databases such as PubMed. Then we start a combined search in PubMed (co-occurrence and also interaction/binding of one molecule of interest). Search results in PubMed, of course, have to be carefully reviewed by hand to select the relevant literature. Terms such as binding/interaction/phosphorylation/complex should appear in the text. Furthermore, we ignore papers that lack information about the origin of cDNA constructs or recombinant proteins (e.g. from human, mouse, etc.), unless some other interesting information is mentioned. The more recent primary papers (past 2 years if possible) and those that show the most interactions between molecules are selected.

The direction of our annotation is derived from the needs of customers, reflects the current research focus, and has to meet internal requirements (filling in data gaps, scientific relevance, integration of our databases). It is decided by the product managers and annotation coordinators. We also offer service contracts for in-depth annotation of specific topics, in which the customer commissions the integration of specific data.

How far back are you planning to go in literature coverage?

We will not go back in time to cover all of the old and often outdated literature, but we will concentrate rather on the current state of the art. Basically, we will take in any paper that has been shown to play a central role for some molecule or pathway by being repeatedly referenced in current reviews or central articles.

How often is a given entry revisited and updated?

Updating every single entry for each release is not possible, since we focus our annotation on specific topics. However, as signaling molecules in a given pathway are usually highly connected to other pathways, reactions and molecules are typically updated at least once a year (largely depending on the topic). So we cannot offer a complete update of all entries, but rather the most recent picture of our current topics. Furthermore, if more information about a particular issue is required, the customer can make use of service contracts (see above).

TRANSPATH has a pathway named 'magic pathway' that includes IP3 (MO000000332) and PKAc (MO000000001) as you know. What is this pathway and why is it included?

This is right - this pathway has been included simply as a joke! You may consider this a German (or even international?) tradition: in the "Pschyrembel", a highly renowned and serious medical dictionary, you will find at least one entry that is not serious! It is the "Steinlaus", a fictitious creature invented by a comedian. Hence, what the "Steinlaus" is for the "Pschyrembel", the "magic pathway" is for TRANSPATH! Nothing to be taken seriously! Something to make people smile!

Why did you introduce a new reaction hierarchy in the database version 5.1? What use is there in introducing pathway and evidence chains?

With our database we soon realized that the connectivity is really high. It is therefore very important to set filters in order to restrict molecules to pathway-relevant content. The other important reason is that we want to depict the physiological relevance of the data. Even if protein A interacts with protein B in vitro, they might not interact in real cells. On the other hand, they might interact under physiological conditions, but without any function (no signal transfer). Hence it is important to assign pathways as a kind of sorting in order to depict reality.

Could you describe the difference between semantic, pathway step, and molecular evidence reactions?

Pathway step reactions were developed in order to abstract all knowledge on one reaction level. The very bottom reaction level, "molecular evidence", consists of those reactions that depict real experiments (species-specific, with methods, material, etc.). From these we abstract the reaction level "pathway step" in order to summarize what has been shown in experiments. In this way reactions from different species are merged into another reaction level. This pathway level is simplified once more in the "semantic level". Here you find the notation that is most frequently used in the scientific community, and therefore understood best: A -> B.
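
The relation between the three levels can be pictured with a small sketch. This is not the TRANSPATH data model: the record layout, the function names and the example molecules A, B and C are assumptions made purely for illustration, and real pathway steps merge species-specific molecules rather than identical names.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class MolecularEvidence:
    """One experimentally shown reaction (hypothetical record layout)."""
    reactant: str
    product: str
    species: str
    method: str


def to_pathway_steps(evidence):
    """Merge evidence from different species into species-independent steps."""
    steps = defaultdict(list)
    for ev in evidence:
        steps[(ev.reactant, ev.product)].append(ev)
    return dict(steps)


def to_semantic(pathway_steps):
    """Reduce each pathway step to the familiar 'A -> B' notation."""
    return sorted(f"{a} -> {b}" for a, b in pathway_steps)


evidence = [
    MolecularEvidence("A", "B", "human", "co-immunoprecipitation"),
    MolecularEvidence("A", "B", "mouse", "kinase assay"),
    MolecularEvidence("B", "C", "human", "kinase assay"),
]
steps = to_pathway_steps(evidence)
semantic = to_semantic(steps)
print(semantic)
```

In this toy example the human and mouse experiments for A and B collapse into a single pathway step, and each step maps to one semantic arrow.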

Can you explain the idea and function of decomposition reactions?

The purpose of decomposition reactions is to clarify the active components in a big complex, that is, to describe the crucial events within a huge molecular complex. A decomposition reaction is specific to a pathway step reaction. If a pathway step reaction is very complicated, we try to make it clear by creating and attaching a decomposition reaction. Decomposition reactions are therefore always an appendix to a pathway step reaction and should never stand alone. Decomposition reactions can be binding reactions as well as phosphorylation/dephosphorylation events, exchange, etc.

Do you consider kinetic parameters for reactions?

If we encounter such parameters in the literature, we store this data as a comment with the category kinetics. But this kind of information is rare and may often be published in papers other than the ones we choose according to signal transduction information.

What about the mutations, disease implications, and annotating around drug interactions with molecules?

All this is planned for the future, and a database module for genotype-phenotype mapping and diseases in pathways is under construction. Right now, the focus is on covering as much of the normal pathways as possible. Nevertheless, it is still possible to search for disease-related molecules with a selection of keywords.

What is done for quality assurance of TRANSPATH?

All data is checked for consistency, completeness of the entries, correct spelling, and compliance with our standards prior to each release. Also, our database client software helps to maintain quality by permitting only specific combinations of data input (pull-down menus, defined vocabulary).

Will TRANSPATH entries link to GenBank for cloning purposes?

In fact, TRANSPATH already links to EMBL, which is synchronized with GenBank and DDBJ. Some entries in TRANSPATH represent families or complexes, and for these it is not possible to give an EMBL accession number. Accession numbers are provided for species-specific protein entries.

Will there be extensive sequence annotation in the molecule entries? How about links to PROSITE?

TRANSPATH is a database dealing with signaling networks and pathways, and with reactions and interactions.

There are plenty of protein databases, such as SWISS-PROT and PROSITE, that provide in-depth structural annotation for the molecules, and we do not want to duplicate these efforts. Instead, we focus on the reactions and relations between molecules, and link molecules to these databases. There are currently about 900 proteins linked to PROSITE and about 4,000 linked to SWISS-PROT.

To which external databases does TRANSPATH provide hyperlinks?

Links are present for a large number of databases. The focus lies on linking TRANSPATH with SwissProt, EMBL, InterPro, Entrez Gene, UniGene, GO, DIP, BIND, and HyperCLDB. For some of the molecules, links to PDB, PROSITE, FlyBase, MGD, and others are also provided.
As a new feature for release 3.1, molecules are linked to Affymetrix microarray probe set identifiers.

I have 2 questions:
1) What's the best way to split out the database external hyperlinks from molecule.DR.idx?
2) Am I correct to look for the number of DRs for each molecule in molecule.fieldcounts.idx?

You selected the correct delimiter '.', but you have to take care of accession numbers that themselves contain a '.', such as those from UNIGENE. Unfortunately, molecule.fieldcounts.idx cannot be used for verifying DR.
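
One way to handle accession numbers containing a '.' is to split only at the first delimiter. The sketch below assumes, purely for illustration, that a link is stored as "DATABASE.ACCESSION"; the actual layout of molecule.DR.idx is not described here and may differ, and the function name and example values are hypothetical.

```python
def split_dr(entry: str) -> tuple[str, str]:
    """Split a 'DATABASE.ACCESSION' string at the first '.' only.

    Assumed layout (hypothetical): the database name comes first and
    contains no '.', so any further '.' belongs to the accession number.
    """
    database, _, accession = entry.partition(".")
    return database, accession


# A naive split breaks a UniGene-style accession into two pieces ...
naive = "UNIGENE.Hs.1234".split(".")
# ... whereas splitting at the first delimiter keeps it intact.
safe = split_dr("UNIGENE.Hs.1234")
print(naive, safe)
```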

How many genes can the ArrayAnalyzer™ (AA) handle?

The limiting factor for interpreting microarray results is not the AA, but the search for the molecule of interest. At the moment, lists with up to 1000 entities can be processed (Pentium III, 700 MHz). The AA can handle this number and more. However, there are efforts to increase this capacity in future releases.

How do I get all transcription factors that regulate a given gene?

Search for the gene of interest. In the gene entry you will find all transcription factors that regulate this gene, via all linked reactions (transregulation). If you would like to visualize this, just start the PathwayBuilder from this gene entry and select one upstream step. For further characterization of the relation between transcription factor and gene (e.g. the specific DNA-binding site), it is useful to consult TRANSFAC, the database on transcription factors and their DNA-binding sites.

How do I get all genes that are regulated by one transcription factor?

Search for the transcription factor of interest in the molecule table. Here you will find all genes regulated by your transcription factor.

Is it possible to find a protein in TRANSPATH if I know little about it except its molecular weight and its isoelectric point?

Indeed, this is possible. For each molecule in our database for which a sequence is known, the sequence and its calculated molecular weight and isoelectric point are stored. To find a candidate molecule with a specific molecular weight and isoelectric point, search in the molecule table using the quick search field "sequence length, molecular weight" for search term 1 and "isoelectric point" for search term 2.

How does one find, given an Entrez Gene ID, all the molecule IDs (MO00xxxxx), either of a single protein entity (e.g. A) or of a complex of which protein A is a part?

To do this, search for the molecule with your Entrez Gene ID, selecting "external database hyperlinks" in the quick search field. Open the molecule entry linked to this particular Entrez Gene ID. In this entry you will find the desired information in the lines "modified form" and "complexes".

How can I find the connection between a signaling molecule and target transcription factors?

Start from the molecule of interest, enter the PathwayBuilder™ interface, choose "show shortest path to", and select transcription factors from the pre-defined target group list. If the result page is empty, you can extend the range of the search directly in the visualization. Transcription factors will be highlighted in a grey-blue colour. This procedure can be adapted to other target groups, to saved results of queries (e.g. microarray data sets), or to matches for free-text search terms as well.
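
The idea of a shortest-path search to a target group, with a range that can be extended when nothing is found, can be sketched as a breadth-first search. This is not the PathwayBuilder implementation; the function name, the max_steps parameter and the toy network are assumptions for illustration only.

```python
from collections import deque


def shortest_path_to_group(graph, start, targets, max_steps=None):
    """Breadth-first search from start to the nearest node in targets.

    graph maps each node to the nodes directly downstream of it.
    Returns the path as a list of nodes, or None if no target is
    reachable within max_steps edges (None means no limit).
    """
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node in targets:
            return path
        # Stop expanding once the allowed search range is used up.
        if max_steps is not None and len(path) - 1 >= max_steps:
            continue
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


# Toy network with hypothetical molecule names.
network = {
    "receptor": ["adaptor", "kinase1"],
    "adaptor": ["kinase2"],
    "kinase1": ["kinase2"],
    "kinase2": ["tf_a"],
    "tf_a": [],
}
transcription_factors = {"tf_a", "tf_b"}

path = shortest_path_to_group(network, "receptor", transcription_factors)
too_short = shortest_path_to_group(network, "receptor", transcription_factors, max_steps=2)
print(path, too_short)
```

With a range of two steps the search finds nothing, which corresponds to an empty result page; extending the range reveals the three-step path.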

Your website says that TRANSPATH connects data of the TRANSFAC database on transcriptional control with information about relevant signal transduction pathways. Is TRANSPATH a standalone database or would you recommend purchasing it in conjunction with the TRANSFAC database as well?

TRANSPATH is indeed a stand-alone database. The data on transcriptional control is also included in TRANSPATH, but without detailed information. If you are interested in specific promoter sequence patterns, i.e. DNA-binding sites of specific transcription factors, then TRANSFAC can be very useful. If you are just interested in the connection or interaction network of transcription factors, genes and other signaling molecules, TRANSPATH alone will be fine.

Is it possible to show the gene-regulatory network for specified molecules?

Yes, indeed. The easiest way to do this is to search for the molecules (genes) in the TRANSPATH® database first, then start the ArrayAnalyzer to search for connecting key molecules, and finally start the visualization tool PathwayBuilder from one of the suggested key molecules. The key molecule will be shown in magenta; all specified molecules (or genes) will appear in blue. For a detailed description, please also have a look at the help pages of the ArrayAnalyzer.

Detecting key molecules with help of the ArrayAnalyzer and subsequent visualization by the PathwayBuilder does not result in the desired network. What can I do to expand/change the network to a specific set of molecules/genes with their expression levels?

The visualization tool PathwayBuilder™ contains many options that can be adjusted. Visualization can be started from various points: from molecules, reactions and genes, and even from the result list of the molecule search and of the ArrayAnalyzer™ key molecule search. Once you have identified a starting molecule from which the network visualization will be interesting, the network to a specific set of molecules/genes can be shown by clicking on the options "shortest path to", "saved search results", and either "molecule" or "genes". All molecules/genes that were searched for in your last molecule/gene search will be depicted in blue and sized according to the given expression level. Furthermore, the range of the network can be set manually by choosing "upstream/downstream" steps. Further parameters, "show modified forms" and "show unmodified forms", expand the network by following signaling branches connected by phosphorylated molecules or complexes. (Please have a look at the documentation for the PathwayBuilder for further descriptions.)

Why are there different node sizes in the PathwayBuilder™ output modes signal flow and clustered map?

The size of a node symbolizes the value of the expression level attached to that molecule. If you are using the ArrayAnalyzer™, you can in this way easily find the genes with the highest expression levels in the signaling network. For further information, please see also the documentation.

I have generated a signaling pathway map. It was fairly complex with a number of upstream and downstream levels. I had to scroll through the file to see the whole pathway on the screen. I tried to save the file, but what was saved to the desktop was not the picture but the HTML file. Is there any way to print this out?

You can (in both Internet Explorer and Netscape) click with your right mouse button on the picture, then choose "Save image as". In Netscape you can also choose "View image" instead, then hit the print button.

If you generated a "CASCADE" (tree output), just hit the print button.

What are your plans for the visualization?

We are currently exploring sophisticated ways to lay out the networks automatically, trying to make them look even more like what people are used to from the literature (e.g. positioning molecules with respect to their intracellular location, such as plasma membrane or nucleus). We are collaborating with experts in graph visualization to achieve standard-setting results.
Also, we are working on adding a zoom function to the PathwayBuilder™.

What is the expected date that an annotation client will be available for TRANSPATH?

Client software for customers and the relational version of TRANSPATH are not available at present, for two reasons:

Annotating for TRANSPATH requires training in the standard format and the underlying knowledge.
We offer service contracts for the annotation of specific data.

What about BLAST for TRANSPATH?

A BLAST-compatible format of TRANSPATH data is provided (at cgi-bin/dat/blast/ in your package), so that you can use it with your BLAST in-house system.

Is TRANSPATH available in XML format?

Curated and quality-checked XML files have been available since release 3.1.