ArrayAnalyzer
Identification of key entities (pathways, molecules, genes, ontology terms) with the ArrayAnalyzer and TRANSPATH®
TRANSPATH® Professional and the integrated tool ArrayAnalyzer let you search with lists of data, e.g. unique gene/molecule identifiers from Affymetrix, Entrez Gene, Swiss-Prot etc., with attached expression levels from gene expression array experiments. The result set can be analyzed for common molecules or genes in the immediate or distant vicinity within the signaling network (Key node analysis). For the hits, maps can be generated with the PathwayBuilder to visualize pathways and the crosstalk between them. Alternatively, a Functional group analysis can be performed using ontologies (Gene Ontology (GO), Cytomer).
The search interface for the ArrayAnalyzer can be opened from the menu on the left. You can match your data against the contents of either the molecule table or the gene table of TRANSPATH®. The search engine uses indices for nine types of gene identifiers (Table 1), which makes queries very fast. A data set list should contain one ID per line. One or more expression level values can be attached to the identifier behind a separator, which can be a tab or another character (the default is the pipe symbol |). Choose the type of separator with the check boxes above the search term field. An example list entered into the ArrayAnalyzer interface is shown in Fig.1. Text files containing query lists in the required format can also be uploaded. Instead of querying with new data, a previously saved query can be viewed and used as an ArrayAnalyzer input list.
Table 1: Applicable gene/molecule identifiers and the required formats
Input lists can also be obtained from 'normal' queries with the molecule or gene table search engines: above each results list is a link that transfers the data to an input list.
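The input list format described above (one identifier per line, optional expression values behind a separator) can be illustrated with a short parser. This is a minimal sketch, not the ArrayAnalyzer's own code: it assumes that several expression values on one line are separated by the same character as the identifier, and the helper name `parse_input_list` and the expression values in the sample are hypothetical.

```python
def parse_input_list(text, separator="|"):
    """Hypothetical helper: split each line into (identifier, [expression values])."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # blank lines carry no identifier
        identifier, *raw_values = (field.strip() for field in line.split(separator))
        # Assumption: additional values use the same separator as the identifier.
        values = [float(value) for value in raw_values if value]
        entries.append((identifier, values))
    return entries


# Affymetrix probe set IDs with made-up expression values.
sample = """1007_s_at|2.5
1053_at|-1.2|0.4
117_at
"""
print(parse_input_list(sample))
# [('1007_s_at', [2.5]), ('1053_at', [-1.2, 0.4]), ('117_at', [])]
```

For a tab-separated list, pass `separator="\t"`.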
Fig.1 Search engine interface with a sample data list
After the query is submitted, the input list is generated (Fig.2). The list can be saved, which is especially convenient if you want to perform several different ArrayAnalyzer operations. The entries in the list can be sorted in alphanumeric order by clicking on a column heading. Each molecule/gene can be viewed as a flatfile entry by clicking on its accession number, or displayed with the PathwayBuilder as usual. Two formats (gif, svg) can be selected to retrieve a pathway map with standard parameter settings, giving a quick overview of the respective signaling network. Of course, you can choose your own parameter settings via the PathwayBuilder interface at a later stage. To obtain the results list in plain text format, click on the link above the results list. All data for which there is no matching entry in TRANSPATH® are displayed at the bottom of the page.
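The split into matched entries and unmatched data shown at the bottom of the page can be pictured as a lookup against an identifier index. In this sketch the dictionary `index`, the accession numbers in it and the helper `match_input_list` are hypothetical stand-ins for the TRANSPATH® identifier indices; the real matching happens on the server.

```python
def match_input_list(identifiers, index):
    """Hypothetical helper: split identifiers into matched (id, accession) pairs and unmatched ids."""
    matched, unmatched = [], []
    for identifier in identifiers:
        accession = index.get(identifier)
        if accession is None:
            unmatched.append(identifier)  # would be listed at the bottom of the page
        else:
            matched.append((identifier, accession))
    matched.sort(key=lambda pair: pair[1])  # alphanumeric order by accession
    return matched, unmatched


# Made-up accession numbers, for illustration only.
index = {"P04637": "MO0001", "1007_s_at": "MO0002"}
print(match_input_list(["1007_s_at", "P04637", "XYZ"], index))
```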
Fig.2 Search result: Input list for the ArrayAnalyzer
The input list can be used to analyze TRANSPATH® data in different ways:
Key node analysis
By default, all entries of an input list are included in the analysis and are labeled 'try to reach'. To refine the analysis, a target entry can be marked as 'must not reach': all nodes that can reach this target within the selected distance are then excluded from the analysis. This often has a strong restrictive effect, whereas 'ignore' excludes only the target entry itself. Changes to the flag settings must always be submitted with the Apply button, even if only one line has changed (Fig.2). Three parameters can be set for the Key node analysis.
Fig.3 Key node options
Distance (Fig.3) determines the maximum number of steps between molecules in the analysis ('molecule -> reaction -> molecule' counts as one step). It thus defines the search radius around each molecule (or gene) of the results list. With the network pull-down list you can select the general pathway orientation (upstream or downstream) and whether to follow only the reactions connected with the entries of your set (including or excluding gene expression reactions), or to include links to superfamilies, modified forms or complexes as well. The distance between a molecule and its superfamily, modified form or complex is considered to be zero, so these connections do not use up any of the selected maximum distance to key nodes. Including superfamilies, modified forms or complexes therefore increases the chance of finding connections to other nodes.
By default, all key nodes (molecules or genes) in the analysis results list are ordered by their significance score. This score relates the number of connected relevant nodes (i.e. the nodes that correspond to the molecule/gene list from the initial query) to the number of non-relevant nodes (i.e. molecules/genes that can be reached from the key node but are not in the initial list):
Expression level values attached to a molecule/gene list can optionally influence the score (weighted score) and thus the ranking of the identified key nodes. Positive and negative expression values are normalized, and the weight increases with the value. 'Small molecules' such as ATP or ADP are excluded from the analysis by default.
As an example, if you choose the direction 'upstream' and a maximum distance of '5', the result will contain the most significant upstream molecules that can reach the largest number of molecules from your data set within at most 5 steps.
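The search radius, the zero-distance links and the 'must not reach'/'ignore' flags can be sketched as a 0-1 breadth-first search from a candidate key node. This is a hedged illustration under several assumptions, not the ArrayAnalyzer algorithm: reactions are collapsed into direct molecule-to-molecule edges (one step each), superfamily/modified-form/complex links are zero-cost edges, the candidate does not count itself, and the significance score is not computed. The sketch only returns the counts N (relevant nodes reached) and M (non-relevant nodes reached). `step_edges` must point in the direction the search travels from the candidate: for upstream key nodes that is the reaction direction, for downstream key nodes the reversed edges. All function names are hypothetical.

```python
from collections import deque

# Hypothetical sketch: reactions collapsed into molecule -> molecule edges (cost 1),
# superfamily / modified-form / complex links are zero-cost edges.


def reverse(edges):
    """Reverse a directed adjacency dict (for downstream key nodes)."""
    reversed_edges = {}
    for source, targets in edges.items():
        for target in targets:
            reversed_edges.setdefault(target, []).append(source)
    return reversed_edges


def reachable_within(start, step_edges, zero_edges, max_distance):
    """0-1 BFS returning {node: distance} for all nodes within max_distance steps."""
    best = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        dist = best[node]
        # Zero-cost links keep the same distance, so they go to the front of the queue.
        for neighbor in zero_edges.get(node, ()):
            if best.get(neighbor, max_distance + 1) > dist:
                best[neighbor] = dist
                queue.appendleft(neighbor)
        # Reaction steps cost 1 and are only followed while the radius allows it.
        if dist < max_distance:
            for neighbor in step_edges.get(node, ()):
                if best.get(neighbor, max_distance + 1) > dist + 1:
                    best[neighbor] = dist + 1
                    queue.append(neighbor)
    return best


def key_node_counts(candidate, step_edges, zero_edges, max_distance, flags,
                    small_molecules=frozenset()):
    """Return (N, M) for a candidate, or None if it reaches a 'must not reach' entry.

    flags maps input-list entries to 'try to reach', 'must not reach' or 'ignore'.
    """
    reached = reachable_within(candidate, step_edges, zero_edges, max_distance)
    reached.pop(candidate)  # assumption: the candidate does not count itself
    if any(flags.get(node) == "must not reach" for node in reached):
        return None
    hits = misses = 0
    for node in reached:
        if node in small_molecules or flags.get(node) == "ignore":
            continue  # excluded from both counts
        if flags.get(node) == "try to reach":
            hits += 1    # relevant node (N)
        else:
            misses += 1  # non-relevant node (M)
    return hits, misses
```

With a small made-up network where K links to A (one step) and K forms a complex Kc (zero steps), A, C and D on the input list reach the counts one would expect: increasing `max_distance` adds more distant hits, and marking a reachable entry 'must not reach' removes the candidate altogether.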
Redundant key nodes are removed from a result list (example: in A->B->(C,D,E) cases, A is skipped and only B->(C,D,E) is kept). Please note that in some rare cases A->B->(C,D,E) is not removed. In such cases there is another key node directly below A whose targets lie beyond the search radius when starting from A, so it is not displayed in the map. These cases are therefore not truly redundant and are not removed, although they appear redundant on the map. The algorithm counts the number of other key nodes directly linked to A; only if this number is 1 (e.g. only node B) is A removed.
The analysis result is displayed as an output list of molecules/genes with three new attributes: the score (significance) of the entry, the N value (number of hits in the network) and the number of non-relevant reachable nodes (M) (see above and Fig.4).
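The redundancy rule (remove A when exactly one other key node is directly linked to it) can be sketched as follows. Assumptions: 'directly linked' means a direct successor in the adjacency dict used for the search, and the rule is evaluated once against the original key node list; the function name is hypothetical.

```python
def remove_redundant_key_nodes(key_nodes, step_edges):
    """Hypothetical helper: drop key node A if exactly one other key node is directly linked to it."""
    key_set = set(key_nodes)
    kept = []
    for node in key_nodes:
        # Count the other key nodes that are direct neighbors of this node.
        linked = {n for n in step_edges.get(node, ()) if n in key_set and n != node}
        if len(linked) != 1:
            kept.append(node)
    return kept


# A->B->(C,D,E): A links only to key node B and is dropped.
print(remove_redundant_key_nodes(["A", "B"], {"A": ["B"], "B": ["C", "D", "E"]}))  # ['B']
```

If A is directly linked to a second key node as well (the rare case described above), it is kept.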
Fig.4 Key node analysis result
Each of the listed entries can be displayed with detailed information or used as a starting point in the PathwayBuilder.
In the 'signal flow', 'clustered map' and 'list' visualization modes, all molecule/gene nodes that match your original result set are highlighted in light blue/grey (Fig.5), except the starting node and the 'molecules bordering the search'. If an expression level value has been attached to a search term, nodes with positive values are shown on a red scale and nodes with negative values on a green scale.
Fig.5 Map with highlighted matched molecules and different expression levels |
Each output list can be saved and/or transferred to an input list by clicking on the link 'Take as ArrayAnalyzer input' (Fig.4).
Network cluster analysis
The cluster analysis can be used to identify common subnetworks for a given molecule/gene list. The algorithm tries to connect each pair of items in the list. The cluster separation degree determines how strongly the network is divided into clusters and therefore the cluster size: each edge is assigned a betweenness value, edges with a high value are more likely to be removed, and the higher the degree, the more edges are removed. Too low a degree yields one big cluster, which is also difficult to visualize; too high a degree can leave the input set unclustered. The size of the input list also influences the choice: large inputs usually require higher separation degrees. NetPro data (if licensed) are not included in the analysis.
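Repeatedly removing the edge with the highest betweenness until the network falls apart is the idea behind the Girvan–Newman method. The sketch below applies that method to an undirected molecule network, with edge betweenness computed by Brandes' algorithm. It assumes, hypothetically, that the separation degree is simply the number of highest-betweenness edges removed, with betweenness recomputed after each removal; how the tool actually maps the degree to edge removals is not documented here, and building the connecting network from the input pairs is left out.

```python
from collections import deque


def edge_betweenness(adj):
    """Brandes' algorithm: edge betweenness for an undirected, unweighted graph."""
    scores = {}
    for source in adj:
        # BFS from source, counting shortest paths (sigma) and their predecessors.
        order, preds = [], {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)
        dist = dict.fromkeys(adj, -1)
        sigma[source], dist[source] = 1, 0
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate path credit from the farthest nodes back towards the source.
        delta = dict.fromkeys(adj, 0.0)
        for w in reversed(order):
            for v in preds[w]:
                credit = sigma[v] / sigma[w] * (1.0 + delta[w])
                edge = frozenset((v, w))
                scores[edge] = scores.get(edge, 0.0) + credit
                delta[v] += credit
    # Every undirected shortest path was counted once from each end.
    return {edge: value / 2 for edge, value in scores.items()}


def connected_components(adj):
    """Return the connected components of an undirected adjacency dict as sets."""
    seen, components = set(), []
    for start in adj:
        if start in seen:
            continue
        component, queue = set(), deque([start])
        seen.add(start)
        while queue:
            v = queue.popleft()
            component.add(v)
            for w in adj[v]:
                if w not in seen:
                    seen.add(w)
                    queue.append(w)
        components.append(component)
    return components


def cluster(edges, separation_degree):
    """Girvan-Newman style: remove the top-betweenness edge `separation_degree` times."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    for _ in range(separation_degree):
        scores = edge_betweenness(adj)
        if not scores:
            break  # no edges left to remove
        a, b = max(scores, key=scores.get)
        adj[a].discard(b)
        adj[b].discard(a)
    return connected_components(adj)
```

For two triangles joined by a single bridge, one removal (degree 1) cuts the bridge and yields the two triangles, while degree 0 leaves one cluster, mirroring the "too low a degree" case above.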
Fig.6 Network cluster analysis options |
Functional group analysis
This analysis function clusters molecules from the input list according to common associated ontology terms or their appearance in canonical TRANSPATH® pathways. With the parameters max. p-value and min. n-list, you can restrict the output to cases with a low p-value (e.g. < 0.001), which can be considered statistically significant (although for such conclusions a multiple testing correction should be taken into account), and/or to cases with a high number of hits.
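Restricting the output with max. p-value and min. n-list amounts to a simple filter over the group results. The record type `GroupResult`, its field names and the optional Bonferroni adjustment are assumptions for illustration; the text only recommends taking multiple testing into account and does not say which correction, if any, the tool applies.

```python
from dataclasses import dataclass


@dataclass
class GroupResult:
    """Hypothetical record for one functional group (ontology term or pathway)."""
    term: str
    n_list: int     # number of input-list molecules found in the group (hits)
    p_value: float  # overrepresentation p-value


def filter_groups(results, max_p_value=0.001, min_n_list=1, bonferroni=False):
    """Keep groups with p <= max_p_value and at least min_n_list hits, lowest p first."""
    tests = len(results)
    kept = []
    for result in results:
        # Optional Bonferroni correction: multiply by the number of groups tested, cap at 1.
        p = min(1.0, result.p_value * tests) if bonferroni else result.p_value
        if p <= max_p_value and result.n_list >= min_n_list:
            kept.append(result)
    return sorted(kept, key=lambda r: r.p_value)
```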
Fig.7 Functional group analysis against several ontologies |
The resulting output list contains the terms or pathways together with the molecule clusters and statistical parameters (Fig.8). The list can be saved as usual and single or multiple clusters can be selected and transferred to an input list for further analysis by ticking the check boxes and clicking on the Take hits button. |
Fig.8 Gene Ontology output list
Group size (K) is the number of molecules that constitute a functional group in three analysis modes:
The p-value is the probability of obtaining the observed number of hits in a group purely by chance. Two values are calculated: the overrepresentation p-value (+) and the underrepresentation p-value (-). The p-values are calculated from the binomial distribution using the following formulas:
The size of the list of matching molecules (L) is the number of molecules from the input list that matched. All molecules (A) is the number of molecules in all groups together; it is calculated separately for each analysis mode.
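One standard way to compute both p-values from the binomial distribution with the quantities L, K and A is sketched below. It assumes that the number of hits n in a group is modeled as X ~ Binomial(L, K/A), with the overrepresentation p-value taken as P(X ≥ n) and the underrepresentation p-value as P(X ≤ n); treat it as an illustration, not as the exact formulas used by the tool. The function name is hypothetical.

```python
from math import comb


def binomial_p_values(n, L, K, A):
    """Return (p_over, p_under) for n hits, assuming X ~ Binomial(L, K / A)."""
    p = K / A  # assumed chance that a matched molecule belongs to the group
    pmf = [comb(L, i) * p**i * (1 - p) ** (L - i) for i in range(L + 1)]
    p_over = sum(pmf[n:])        # P(X >= n): overrepresentation (+)
    p_under = sum(pmf[: n + 1])  # P(X <= n): underrepresentation (-)
    return p_over, p_under
```

Both tails include the observed count n, so p_over + p_under exceeds 1 by exactly P(X = n).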