ResA is a novel tool for easy and straightforward enrichment and regulation analysis of annotations deposited in online resources such as KEGG, Gene Ontology and Pfam, or of any other kind of classification. Its resampling algorithm provides an empirical, non-parametric approach to determining the statistical significance of samples defined by annotations in any experimental distribution. Results are presented in readily accessible, navigable table views accompanied by relevant information for statistical inference. The tool can analyze multiple types of annotations in a single run and includes a Gene Ontology annotation feature. Results generated by ResA complement those from existing related tools.
|1||Insert data by copy and paste into b or upload your file (see Data format).|
|2||If your data already contains annotation columns, set their number in c.|
|3||Add GO annotation if desired and if your data contains gene symbols and/or UniProt identifiers. Set the organism (d) and the scope of the annotation (e and f). The annotation is based on the gene and UniProt identifiers provided by UniProt-GOA.|
|4||Choose the statistic to be evaluated in g: either regulation or enrichment (see The statistical estimators).|
|5||Set the number of resamples in h. This value directly determines the resolution of the p-value, and the running time increases linearly with it. The default of 1,000 resamples is a compromise between accuracy and running time.|
|6||Set the minimum size (number of experimental values) of the terms in i, depending on your interest in rare annotations.|
|7||By default, a generalized Pareto distribution is fitted to the tails of the resampling distribution (j). This increases the resolution of the p-values and results in fewer zero values. Deactivating this feature speeds up the analysis but limits the resolution of the p-value to that given by the number of resamples.|
|8||We recommend setting an identifier for the analysis and giving an email address (k) to receive a link to the progress and results.|
|9||Start your analysis with 'Go!' (l). You will see the progress of the analysis, and the email will be sent to the given address.|
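The effect of the resample count set in step 5 can be illustrated with a minimal Python sketch. It is not ResA's implementation: the function names, the one-sided test direction, sampling without replacement and the synthetic data are all assumptions made for illustration. The sketch shows why a p-value estimated from n resamples can only be a multiple of 1/n, which is also why zero values appear when no tail model is fitted.

```python
import random
import statistics


def regulation(values):
    """Mean divided by standard deviation (assumes the values are not all equal)."""
    return statistics.mean(values) / statistics.stdev(values)


def resampling_p_value(population, sample, n_resamples=1000, seed=0):
    """One-sided empirical p-value for an unusually high regulation score.

    Draws n_resamples random samples of the same size as `sample` from
    `population` (without replacement, an assumption of this sketch) and
    counts how often the resampled statistic reaches the observed one.
    """
    rng = random.Random(seed)
    observed = regulation(sample)
    hits = 0
    for _ in range(n_resamples):
        resample = rng.sample(population, len(sample))
        if regulation(resample) >= observed:
            hits += 1
    # The result is always a multiple of 1 / n_resamples.
    return hits / n_resamples


# Synthetic data for illustration only: 500 values, and a hypothetical
# "term" made of the 20 largest values.
data_rng = random.Random(42)
population = [data_rng.gauss(0.0, 1.0) for _ in range(500)]
sample = sorted(population)[-20:]

p = resampling_p_value(population, sample, n_resamples=1000)
print(p)
```

Doubling `n_resamples` halves the smallest non-zero p-value the loop can report and roughly doubles its running time, which matches the trade-off described in step 5.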
Data must be tab-separated and may contain gene symbols and UniProt identifiers, which are used for the Gene Ontology annotation. Titles of the annotation columns are used to discriminate between different types of annotation in the results. Multiple identifiers or annotations in one column must be separated by semicolons.
Most importantly, the first column must contain the experimental value (log fold change, isotope incorporation, intensity, ...) and the provided annotations must be the last columns. The columns in between may hold any other information, including identifiers such as gene symbols or UniProt identifiers that can be used for Gene Ontology annotation.
Examples of valid input data:
|This dataset contains experimental values and one column of annotation. The number of provided annotations must be set to 1. Gene Ontology annotation cannot be performed with this dataset.|
|This dataset provides experimental values and gene symbols. Gene symbols and UniProt identifiers can be used for Gene Ontology annotation. The respective organism must be set. Because this dataset contains no further annotations, the number of provided annotations must be set to 0.|
|This dataset is similar to the dataset used for the example analysis. It contains gene symbols and UniProt identifiers, which can be used for Gene Ontology annotation. The respective organism must be set. In addition, it contains three annotations, so the number of provided annotations must be set to 3. The column 'Protein Names' is ignored during the analysis but is visible in the results.|
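The column layout described above can be checked before uploading with a short Python sketch. It is only an illustration of the stated format, not code from ResA: the function name `parse_resa_table`, the presence of a header line, and the example column titles and values are assumptions (the values are made up).

```python
import csv
import io


def parse_resa_table(text, n_annotation_columns):
    """Split tab-separated rows into value, middle columns and annotations.

    Assumes a header line, the experimental value in the first column and
    the annotations in the last `n_annotation_columns` columns.
    """
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    header = next(reader)
    first_annotation = len(header) - n_annotation_columns
    rows = []
    for fields in reader:
        if not fields:  # skip empty lines
            continue
        # Pad short rows so that every header column has a field.
        fields = fields + [""] * (len(header) - len(fields))
        annotations = {
            header[i]: [t.strip() for t in fields[i].split(";") if t.strip()]
            for i in range(first_annotation, len(header))
        }
        rows.append({
            "value": float(fields[0]),
            "info": dict(zip(header[1:first_annotation],
                             fields[1:first_annotation])),
            "annotations": annotations,
        })
    return rows


# Made-up example: one value column, one identifier column, two annotations.
example = (
    "Value\tGene\tKEGG\tPfam\n"
    "0.42\tgeneA\tpathway 1;pathway 2\tfamily X\n"
    "0.17\tgeneB\t\tfamily Y\n"
)
rows = parse_resa_table(example, n_annotation_columns=2)
for row in rows:
    print(row["value"], row["info"], row["annotations"])
```

Setting `n_annotation_columns` to 0 treats every column after the first as plain information, which corresponds to the second example dataset.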
|Regulation (x̄/SD)||:||Regulation based on mean divided by standard deviation|
|Mean divided by standard deviation evaluates regulation better than mean alone because the sample mean increases in significance when accompanied by a low standard deviation.|
|:||Enrichment based on standard deviation|
|:||Enrichment based on the coefficient of variation|
|:||Enrichment based on the width of the interval from the 10th to 90th percentile|
|ResA3 evaluates enrichment with respect to the width of the samples. The [10,90] estimator is insensitive to outliers, and the width is visualized by the corresponding marks of the seven-figure summary.|
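The four estimators listed above can be written down in a few lines of Python. This is a minimal sketch under stated assumptions rather than ResA's own code: the function names, the use of the sample standard deviation, and the linear-interpolation percentile rule are choices made here for illustration, and the numbers are made up.

```python
import statistics


def percentile(values, q):
    """Percentile (q in 0..100) by linear interpolation between ranks."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100.0
    lower = int(pos)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (pos - lower)


def estimators(values):
    """Return the regulation and enrichment estimators for one sample."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return {
        "regulation": mean / sd,   # mean divided by standard deviation
        "sd": sd,                  # standard deviation
        "cv": sd / mean,           # coefficient of variation (mean must not be 0)
        "width_10_90": percentile(values, 90) - percentile(values, 10),
    }


# Made-up sample, and the same sample with one extreme outlier.
term = [float(v) for v in range(1, 12)]
with_outlier = term[:-1] + [1000.0]

print(estimators(term))
print(estimators(with_outlier))
```

Comparing the two printed results shows the point made about the [10,90] estimator: the outlier inflates the standard deviation and the coefficient of variation, while the width of the 10th to 90th percentile interval stays the same.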
The seven-figure summary diagrams that appear in the results show the characteristics of the population (grey) and of the individual sample (red) distributions. They are comparable to box plots but have additional marks at the 10th and 90th percentiles.
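For readers who want to reproduce such a diagram from their own data, the following Python sketch computes seven marks. Which seven figures ResA actually draws is not stated here, so the choice of minimum, 10th, 25th, 50th, 75th and 90th percentiles and maximum is an assumption, as are the function name and the made-up values.

```python
import statistics


def seven_figure_summary(values):
    """Minimum, 10th, 25th, 50th, 75th, 90th percentile and maximum.

    The choice of these seven marks is an assumption of this sketch:
    the five box-plot figures plus the 10th and 90th percentiles.
    """
    # 99 cut points; cuts[q - 1] is the q-th percentile.
    cuts = statistics.quantiles(values, n=100, method="inclusive")
    return (min(values), cuts[9], cuts[24], cuts[49],
            cuts[74], cuts[89], max(values))


# Made-up values standing in for a population and one annotation term.
population = [float(v) for v in range(1, 12)]
term = [4.0, 5.0, 5.5, 6.0, 6.5, 7.0]

print(seven_figure_summary(population))
print(seven_figure_summary(term))
```

Printing the summary of a term next to that of the whole population gives the numbers behind the red and grey marks, respectively.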
This data was generated in our lab at the Max Planck Institute for Heart and Lung Research using stable isotope labeling and mass spectrometry. To compare turnover rates of proteins, we fed two mice a diet containing heavy lysine (13C6-lysine) for two weeks. The level of heavy lysine label in an individual protein after a given period of time depends on the turnover rate of that protein. The dataset contains the mean relative isotope incorporation (RII) of the two mice as the quantity of interest, and gene and protein identifiers along with KEGG, Pfam and InterPro annotations generated by the Perseus tool.
ResA was used to determine which functional or logical annotations contain proteins with significantly low and high mean heavy lysine incorporation within the complete dataset.
This analysis corresponds to finding significantly low or high ratios if carried out with log-ratio distributions. The example results were generated using the following settings:
ResA was used to determine which functional or logical annotations contain proteins with an exceptionally narrow distribution within the complete dataset.
The example results were generated using the following settings: