To perform overrepresentation analysis of metabolite annotations with MBROLE, you have to:
Provide a set of compounds.
Select the annotations (vocabularies) you want to analyze.
Provide a background set.
You can test the server right away with a provided example by clicking the Load example button in the Analysis section.
1. Compound set
The compound set is the list of compounds you are interested in (e.g. a set of metabolites identified in your experiment). You can either copy and paste a list of IDs or upload a file (MBROLE expects tabs or new lines as ID separators).
See the Conversion section for further information on supported IDs.
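If you prepare the compound list programmatically, the following minimal sketch shows one way to normalize it before pasting or uploading. It only relies on the separator rule stated above (tabs or new lines); the function name `parse_ids` and the removal of duplicates are assumptions made for illustration, not a description of how MBROLE processes input.

```python
import re

def parse_ids(text):
    """Split raw input on tabs or line breaks, dropping blanks and repeated IDs."""
    seen = set()
    ids = []
    for token in re.split(r"[\t\r\n]+", text):
        token = token.strip()
        if token and token not in seen:
            seen.add(token)
            ids.append(token)
    return ids

# Example input mixing tabs, new lines, an empty line and a repeated ID.
raw = "C00008\tC00020\nHMDB0000064\n\nC00008\n"
print(parse_ids(raw))
```

The resulting list can be written back with one ID per line, which is a valid input for the form.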
2. Annotations
Annotations are organized into several vocabularies extracted from metabolic and chemical databases. Select the vocabulary (or vocabularies) you want to use for the enrichment analysis. You can select multiple annotations at once by clicking on them while holding the Ctrl key. You can also use the search box to find annotations. Annotations that are already selected appear in a lighter color in the menu. To remove an annotation or vocabulary from the list, click the "x" beside it.
The vocabularies included in this version of MBROLE are the following:
Pathways: from BioCyc, HMDB, KEGG, PathBank, Reactome, PharmGKB and YMDB.
Modules: from KEGG.
Pathway Classes: from Reactome.
Protein interactions: from BioCyc, HMDB and PathBank.
Biological Roles: from ChEBI and KEGG.
Role and Physiological effect: from HMDB ontology.
Gene Ontology Terms: from CTD.
Tissue: from HMDB (e.g. Bladder or Bone Marrow).
Biofluid: from HMDB (e.g. Blood or Urine).
Cellular: from YMDB, ECMDB and HMDB.
Disposition: from the HMDB ontology.
Indications: from TTD (e.g. Hypoglycemia or Gastritis).
Pharmacological actions: from MeSH (e.g. Antiviral Agents).
Anatomical Therapeutic Chemical Classification (ATC): from KEGG.
Disease Classification: from KEGG (e.g. Skin diseases or Parasitic infections).
Diseases: from HMDB and KEGG.
Chemical Taxonomy: from LMSD and HMDB.
Chemical types: from BioCyc.
UniProt Keywords and Panther Protein Class: from chemical-protein interactions (BioCyc, HMDB, and PathBank).
MeSH terms: from co-citations in PubMed with HMDB compounds and MeSH chemicals.
3. Background set
MBROLE needs a background set of compounds to compute statistics. This background set is the reference against which the enrichment of a given annotation in your compounds of interest is assessed for significance.
A given annotation could have a high frequency in your set of compounds simply because it is frequent in the whole metabolome. This is why a reference set is needed for assessing significance. This background set should contain all the compounds that could hypothetically be identified in your experiment (in a global profiling experiment, the background set will correspond to the whole metabolome).
Therefore, you can choose which type of background set is better depending on your data and/or the information you want to obtain:
Full database: MBROLE will use as background all the compounds in the database(s) associated with the selected annotations.
Organism-dependent: MBROLE will use the compounds associated with the selected organism as the background set. Organism-specific annotations are marked with an asterisk (*). Note that if you select annotations from an organism-specific database for an organism other than the selected one, those annotations won't be considered (e.g. Diseases from HMDB with Mus musculus as background organism). For the rest of the annotations, the background will be all the compounds in the corresponding database.
Use my own background set: paste a list or upload a file with the IDs you want to use as background. Note that this option takes significantly more time than the others, as every ID has to be checked against the database. Also, only supported compound IDs are valid inputs.
Once the fields in the input form are completed, press the Enrichment analysis button. If an error is found in the input, a red text box shows up at the top describing it.
In this new version, results are separated into two categories, the same ones used to organize the vocabularies in the Annotations section of the analysis page. A new navigation bar lets you switch between the results obtained from Direct and Indirect annotations. Each of these sections contains a single table, which can be sorted by clicking on its headers. By default, the tables are sorted by decreasing significance (increasing FDR). Each annotation (keyword) occupies one row, with the following columns:
Annotation shows the annotation (keyword) itself and, if available, a link to its entry in the source database.
Category shows to which vocabulary the keyword belongs (take a look at the classification of annotation vocabularies in the Annotations section of the user manual).
Set shows the number of compounds in your input that have annotations in that category.
In set shows the number of your compounds that have this particular annotation.
Background shows the total number of compounds with annotations in that category.
In background shows the number of compounds from the background with this particular annotation.
p-value shows the result of the enrichment test for that annotation. It is calculated from the values in Set, In set, Background, and In background.
FDR shows the adjusted p-value calculated as False Discovery Rate (FDR).
The following two columns show up if you select "Show compounds in results table" at the top of the page.
Submitted IDs shows the IDs from your set that have that particular annotation (keyword).
Matching IDs shows the IDs equivalent to those in your set that actually matched the annotation. For example, if you provided KEGG IDs and selected Pathways from HMDB as vocabulary, this column will show the equivalent IDs from HMDB involved in the corresponding pathways.
MBROLE computes the statistical significance of each annotation found, using the background set. This is reported as a p-value: the probability of finding at least as many compounds with that annotation in a random set of the same size drawn from the background set.
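The probability described above corresponds to the upper tail of a hypergeometric distribution (sampling without replacement). The sketch below computes it from the four table columns, under the assumption that this is the model being used; the manual does not name the test, and the function name and example numbers are illustrative only.

```python
from math import comb

def enrichment_p_value(in_set, set_size, in_background, background_size):
    """P(X >= in_set) when set_size compounds are drawn without replacement
    from background_size compounds, in_background of which carry the annotation."""
    upper = min(set_size, in_background)
    total = comb(background_size, set_size)
    # Sum the probability of every outcome at least as extreme as the observed one.
    tail = sum(
        comb(in_background, k) * comb(background_size - in_background, set_size - k)
        for k in range(in_set, upper + 1)
    )
    return tail / total

# Illustrative numbers: 5 of 20 input compounds are annotated,
# against 50 annotated compounds in a background of 1000.
print(enrichment_p_value(5, 20, 50, 1000))
```

The arguments map to the In set, Set, In background and Background columns, in that order.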
This p-value is also adjusted for multiple testing using the false discovery rate (FDR) method of Benjamini and Hochberg (1995)*. Both the p-value and the adjusted p-value are reported for each annotation and, when possible, annotations and compound IDs are hyperlinked to their corresponding source database.
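For reference, the Benjamini-Hochberg adjustment can be sketched in a few lines. This is a generic implementation of the published procedure, not MBROLE's own code; in particular, which p-values are adjusted together (for example, all annotations of one vocabulary) is not specified here and is left to the caller.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that adjusted values
    # never decrease as the raw p-value increases.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```

For the four illustrative p-values above, the adjusted values are approximately 0.02, 0.04, 0.04 and 0.02.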
For KEGG pathways, the compounds in the input set are highlighted in the pathway representation opened by the hyperlink. P-values are colored according to their significance level (green for values <0.001, red for values ≥0.05, and orange for the rest). Hover the mouse over a "Category" cell to see the number of metabolites used as background as well as the number of associated metabolites found in it. You can see an example of this in Fig. 2 of the Tutorial.
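If you want to reproduce the same color coding when post-processing downloaded results, the thresholds above translate directly into a small helper. The function name is hypothetical; only the cut-offs come from this manual.

```python
def p_value_color(p):
    """Map a p-value to the color used in the results table."""
    if p < 0.001:
        return "green"
    if p >= 0.05:
        return "red"
    return "orange"

for p in (0.0005, 0.001, 0.049, 0.05):
    print(p, p_value_color(p))
```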
For practical purposes, MBROLE only shows those annotations found in three or more metabolites of the input set.
If your query does not return any significant result, or you are interested in the annotations that were filtered out, you can download the whole set of results by clicking the "Download all annotations" button. The generated file is a plain-text TSV file that can be processed further with any spreadsheet program.
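The downloaded TSV file can also be processed with a script. The sketch below assumes the file has a header row whose column names match the results table ("Annotation", "In set", "FDR"); check the actual header of your file, since the exact names are an assumption here. The sample rows are placeholders, not real results.

```python
import csv
import io

def load_results(handle, min_in_set=3, max_fdr=0.05):
    """Read a tab-separated results file and keep rows passing both thresholds."""
    reader = csv.DictReader(handle, delimiter="\t")
    kept = [
        row for row in reader
        if int(row["In set"]) >= min_in_set and float(row["FDR"]) < max_fdr
    ]
    kept.sort(key=lambda row: float(row["FDR"]))
    return kept

# Placeholder data standing in for a downloaded file (assumed column names).
sample = (
    "Annotation\tIn set\tFDR\n"
    "annotation_A\t5\t0.002\n"
    "annotation_B\t2\t0.010\n"
    "annotation_C\t4\t0.200\n"
)
rows = load_results(io.StringIO(sample))
print([row["Annotation"] for row in rows])
```

To inspect annotations found in fewer than three input metabolites, lower `min_in_set`; with a real file, replace the `io.StringIO` object with `open(path, newline="")`.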
Another novelty in this version of MBROLE is the generation of plots of the results. You can generate dot plots and bar plots by clicking on their corresponding buttons. Doing so will display in a new tab one figure for every vocabulary selected showing the most significant annotations (FDR < 0.05). The number of annotations in each plot is limited to twenty and it is indicated in the title of the plot. In the event that no significant annotations are found for a vocabulary, its plot won't be generated.
Both types of plots show the False Discovery Rate of each annotation, with a color code illustrating its enrichment ratio. In the dot plots, the size of the dot correlates with the number of compounds from the input set matching the annotation.
The enrichment ratio is calculated as the number of compounds with a given annotation divided by the expected value, where the expected value is the product of the number of input compounds found in a vocabulary and the proportion of background compounds carrying that annotation.
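As a worked example, the ratio can be computed from the four count columns of the results table. This sketch reads "expected value" as Set × (In background / Background), which is one reasonable interpretation of the definition above; the function name and the numbers are illustrative.

```python
def enrichment_ratio(in_set, set_size, in_background, background_size):
    """Observed count divided by the count expected from the background frequency."""
    expected = set_size * (in_background / background_size)
    if expected == 0:
        raise ValueError("annotation is absent from the background; ratio undefined")
    return in_set / expected

# 5 of 20 input compounds carry the annotation; 50 of 1000 background compounds do.
# Expected count is 20 * 0.05 = 1, so the annotation is about 5 times
# more frequent in the input than in the background.
print(enrichment_ratio(5, 20, 50, 1000))
```

A ratio above 1 means the annotation is more frequent in the input set than expected from the background.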
You can now choose between two types of ID conversion when running an analysis:
Direct conversion (option selected by default): searches for equivalent IDs when the source database of the selected annotation provides the corresponding cross-references. When this option is selected, you need to indicate the type of ID, and the list of annotations is updated to disable those whose source databases do not include such cross-references.
Automatic conversion: allows mixing IDs from different sources. If the source database of the annotations selected does not include cross-references of the input IDs, a subsequent search will look for an intermediate chemical ID that relates the input ID and the IDs used for the corresponding annotation.
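The difference between the two modes can be illustrated with a toy lookup. Everything in this sketch is hypothetical: the function, the table layout and the choice of PubChem as the intermediate ID are assumptions made to show the idea of a one-step versus a two-step lookup, and do not describe MBROLE's internal tables. The glucose IDs for MeSH and PubChem are the ones listed in this manual; C00031 is KEGG's D-glucose entry.

```python
def convert(compound_id, source, target, xrefs, intermediates=()):
    """Return (target_id, route), or (None, None) when no mapping is found."""
    # Direct conversion: the cross-reference exists in a single table.
    direct = xrefs.get((source, target), {}).get(compound_id)
    if direct is not None:
        return direct, "direct"
    # Automatic conversion: go through an intermediate ID type.
    for mid in intermediates:
        mid_id = xrefs.get((source, mid), {}).get(compound_id)
        if mid_id is None:
            continue
        hit = xrefs.get((mid, target), {}).get(mid_id)
        if hit is not None:
            return hit, "via " + mid
    return None, None

# Toy cross-reference tables (hypothetical content and layout).
toy_xrefs = {
    ("mesh", "pubchem"): {"D005947": "5793"},
    ("pubchem", "kegg"): {"5793": "C00031"},
}
print(convert("D005947", "mesh", "pubchem", toy_xrefs))
print(convert("D005947", "mesh", "kegg", toy_xrefs, intermediates=["pubchem"]))
```

In this toy setup the first call succeeds directly, while the second only succeeds because an intermediate ID links the two databases.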
MBROLE includes an independent conversion utility that implements the Direct conversion described above. This utility also accepts chemical names as input and converts them to any of the supported IDs. It is accessible via the "Conversion" link at the top of MBROLE pages.
BioCyc compounds (e.g. CPLX3O-77 for phosphofructokinase).
CAS Registry Number (e.g. 26566-61-0 for galactose)
ChEBI accessions (e.g. CHEBI:34590 for bromobutide).
ChemSpider IDs (e.g. 733 for glycerin)
ECMDB metabolites (e.g. ECMDB00039 for butyric acid).
HMDB metabolites (e.g. HMDB0000064 for creatine).
KEGG (e.g. C00008 for ADP).
LIPID MAPS® Structure Database IDs (e.g. LMST01010001 for cholesterol)
MeSH chemicals (e.g. D005947 for glucose)
PubChem Compounds (e.g. 5793 for glucose)
TTD (e.g. D0J7XL for Gramicidin D).
YMDB metabolites (e.g. YMDB00003 for urea).
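When a list mixes several ID types, a quick format check can help you spot typos before submitting. The patterns below are inferred only from the examples in the list above, so they are a rough heuristic and not the validation MBROLE applies. Purely numeric IDs cannot be told apart (ChemSpider and PubChem both use them), and BioCyc IDs are too varied to match from a single example, so they fall through to "unknown".

```python
import re

# Patterns guessed from the example IDs in this manual (assumptions).
ID_PATTERNS = [
    ("ChEBI", re.compile(r"^CHEBI:\d+$")),
    ("ECMDB", re.compile(r"^ECMDB\d+$")),
    ("HMDB", re.compile(r"^HMDB\d+$")),
    ("YMDB", re.compile(r"^YMDB\d+$")),
    ("LIPID MAPS", re.compile(r"^LM[A-Z]{2}\w+$")),
    ("KEGG", re.compile(r"^C\d{5}$")),
    ("MeSH", re.compile(r"^D\d{6,}$")),
    ("TTD", re.compile(r"^D[0-9A-Z]{5}$")),
    ("CAS", re.compile(r"^\d{2,7}-\d{2}-\d$")),
    ("numeric (ChemSpider or PubChem)", re.compile(r"^\d+$")),
]

def guess_id_type(compound_id):
    """Return the first ID type whose pattern matches, or 'unknown'."""
    for label, pattern in ID_PATTERNS:
        if pattern.match(compound_id):
            return label
    return "unknown"

for example in ["CHEBI:34590", "C00008", "D005947", "D0J7XL", "26566-61-0", "733", "CPLX3O-77"]:
    print(example, "->", guess_id_type(example))
```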
Current annotations available in MBROLE, classified according to the database they were obtained from:
Human Metabolome Database (HMDB) metabolites
HMDB ontology: hierarchical vocabulary divided into four general categories: Disposition, Physiological effect, Process and Role.
Diseases: e.g. Obesity, Kidney disease, or Pyruvate dehydrogenase deficiency.
Localization in biofluid (e.g. Blood, Urine), tissue (e.g. Bladder, Bone Marrow, Neuron) or cellular (e.g. Extracellular, Membrane, Mitochondria).
Chemical Classification: Hierarchical classification of chemical entities.
Biological role: played by the compound or part thereof in three different contexts: biological (e.g. toxin, immunomodulator, antimicrobial agent), chemical (e.g. antioxidant, surfactant, acceptor) and application (e.g. agrochemical, pesticide, dye).
Pathways related to the metabolite and its position in the biological network.
Protein interaction for Uniprot.
Pharmacological actions: a broad range of chemical actions and uses (e.g. Luminescent, Antineoplastic or Bronchodilator).
MeSH hierarchy: categorical classification of chemicals (e.g. D03 for Heterocyclic Compounds or D03.438.221 for Benzoxazoles).
Co-citations of chemicals: Terms from the literature frequently mentioned together with chemicals.
MBROLE integrates information from multiple databases including chemical and biological information related to chemical compounds. In most cases these databases are public and their contents are freely available for non-commercial use. This is not the case with BioCyc and KEGG which require a paid subscription to access their most recent data.
This version of MBROLE analyzes annotations from the following databases:
Kyoto Encyclopedia of Genes and Genomes (KEGG), release 102 (April, 2022)
All data integrated in MBROLE is stored in a relational database that is queried based on the user input. The statistical calculations and the generation of plots are handled by Python scripts.
You can find more details in the publication of this version of MBROLE.
MBROLE uses languages that are standard for web development and is expected to work in any web browser and operating system. So far, it has been checked to work with the following browsers:
Mozilla Firefox: on Windows 10, Linux and macOS
Google Chrome: on Windows 10 and Linux
Microsoft Edge: on Windows 10
Safari: on macOS
The following document contains two detailed use cases to highlight the potential of the server in real-world analysis scenarios: mbrole3_use_case
This tutorial will teach you how to perform a whole analysis with MBROLE. You can also use the "Load example" button in the analysis page to reproduce it.
Upload the file containing the metabolite IDs or copy and paste them into the text area. Note that IDs should be separated by tabs or new lines. Also, since the set contains chemical IDs from different sources, check the option: MBROLE automatic conversion.
In the next section, Annotations, choose the following annotations: Biological Roles (ChEBI), Diseases (KEGG), GO Terms (CTD), Pathways (KEGG), MeSH terms-MeSH chemicals and Panther Protein Class (HMDB).
In the Background Set section, select Organism dependent. A drop-down list with all the available organisms appears. Search for and select Homo sapiens.
Click on the Enrichment analysis button to start the analysis.
Once the calculations have finished, you will get a results table with six columns summarizing the results. The table is sorted by increasing "FDR" (from most to least significant). Each row contains information for a single annotation, as explained in the Results section of the user manual. Results for Direct and Indirect annotations are shown in separate sections.
This set of metabolites is given in terms of chemical names so they cannot be used directly in the analysis. In this example we will convert them to HMDB, KEGG and ChEBI IDs so as to perform the analysis with them.
Go to the Conversion utility and upload the file or copy-paste its content and select: "ChEBI compounds", "HMDB metabolites" and "KEGG compounds".
After you click the "Search IDs" button you will get a table with four columns (see Fig. 3). The table includes the submitted metabolite, the equivalent ID(s) and its source database.
This table can be downloaded as a TSV text file by clicking the "Download results" button. Once you have checked that the name-to-ID conversion is correct (and fixed it where needed), you can copy and paste the column with the compound IDs into the Analysis form to analyze them.