{"id":1643,"date":"2024-05-29T22:32:15","date_gmt":"2024-05-29T20:32:15","guid":{"rendered":"https:\/\/paul-regnier.fr\/?page_id=1643"},"modified":"2026-02-27T15:42:10","modified_gmt":"2026-02-27T14:42:10","slug":"tutoriel-amocati","status":"publish","type":"page","link":"https:\/\/paul-regnier.fr\/en_gb\/tutoriel-amocati\/","title":{"rendered":"AMOCATI Tutorial"},"content":{"rendered":"<p class=\"has-text-align-center has-black-color has-text-color has-large-font-size wp-block-paragraph\"><strong>AMOCATI: Algorithmic Meta-analysis Of Clinical And Transcriptomic Information<\/strong><\/p>\n\n\n\n<p class=\"has-text-align-left has-background wp-block-paragraph\" style=\"background-color:#c6cbe1;font-size:16px\"><strong>Warning: <strong>this tutorial is only available in English, even if you choose the French language at the bottom of the screen. Thank you for your understanding<\/strong>.<\/strong><\/p>\n\n\n\n<p class=\"wp-block-paragraph\" style=\"font-size:16px\"><code>AMOCATI<\/code> is an R package that analyzes transcriptome-based datasets and, more specifically, quantifies how a given gene and\/or gene signature impacts the overall survival of patients. 
For convenience, <code>AMOCATI<\/code> can download data directly from the Genomic Data Commons (GDC) repository, and more precisely cancer datasets from the TCGA, TARGET and CGCI projects.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" style=\"font-size:28px\"><strong>Table of contents<\/strong><\/h2>\n\n\n\n<ol style=\"font-size:24px\" class=\"wp-block-list\">\n<li><a href=\"#1\" data-type=\"internal\">Prerequisites<\/a>\n<ol style=\"font-size:18px\" class=\"wp-block-list\">\n<li><a href=\"#1.1\">Publication<\/a><\/li>\n\n\n\n<li><a href=\"#1.2\">R environment introduction and installation<\/a><\/li>\n\n\n\n<li><a href=\"#1.3\">AMOCATI R package installation and update<\/a><\/li>\n\n\n\n<li><a href=\"#1.4\" data-type=\"internal\" data-id=\"#1.2\">Load AMOCATI<\/a><\/li>\n<\/ol>\n<\/li>\n\n\n\n<li><a href=\"#2\">Workspace directory setup<\/a><\/li>\n\n\n\n<li><a href=\"#3\">Download and process a dataset of interest<\/a><\/li>\n\n\n\n<li><a href=\"#4\">Launch the metaResults analysis<\/a><\/li>\n\n\n\n<li><a href=\"#5\">Extract the Classification Signature<\/a><\/li>\n\n\n\n<li><a href=\"#6\">Compute patient-wise the Quantitative Scores and Clinical Scores of the Classification Signature<\/a><\/li>\n\n\n\n<li><a href=\"#7\">Separate the patients of the cohort according to the Quantitative Score or the Clinical Score of the Classification Signature<\/a><\/li>\n\n\n\n<li><a href=\"#8\">Use of custom gene signatures instead of the Classification Signature<\/a><\/li>\n\n\n\n<li><a href=\"#9\">Adding distinguishing genes as a new layer of complexity for custom signatures<\/a><\/li>\n\n\n\n<li><a href=\"#10\">Further analyses<\/a><\/li>\n\n\n\n<li><a href=\"#11\">Miscellaneous<\/a><\/li>\n\n\n\n<li><a href=\"#12\">Citation<\/a><\/li>\n<\/ol>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"1\" style=\"font-size:28px\"><strong>1) Prerequisites<\/strong><\/h2>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"1.1\" style=\"font-size:22px\"><strong>1.1) 
Publication<\/strong><\/h3>\n\n\n\n<p class=\"wp-block-paragraph\" style=\"font-size:16px\">We strongly encourage users to read our associated publication carefully before using <code>AMOCATI<\/code>. The methodology behind <code>AMOCATI<\/code> can seem rather complex at first sight, but we have tried our best to make it as clear as possible. The manuscript and its associated supplemental resources will help users understand how <code>AMOCATI<\/code> works, what the main steps of its workflow are and how to apply it to real-life datasets to address a biological question.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"1.2\" style=\"font-size:22px\"><strong>1.2) R environment introduction and installation<\/strong><\/h3>\n\n\n\n<p class=\"wp-block-paragraph\" style=\"font-size:16px\">To make the installation of the R programming language and the RStudio development software easier for new or beginner users, we highly recommend the following resource, entitled <a href=\"https:\/\/bookdown.org\/ndphillips\/YaRrr\/\">\u201cYaRrr! The Pirate&rsquo;s Guide to R\u201d<\/a>. New users should at least read the first (\u201cPreface\u201d) and second (\u201cGetting Started\u201d) sections, as they provide clear and straightforward instructions on how to set up R and RStudio on Windows and macOS operating systems. 
These sections will allow users to install <code>AMOCATI<\/code> correctly and launch it without issues.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"1.3\" style=\"font-size:22px\"><strong>1.3) AMOCATI R package installation and update<\/strong><\/h3>\n\n\n\n<p class=\"wp-block-paragraph\" style=\"font-size:16px\">The <code>AMOCATI<\/code> package can be installed or updated with the following commands:<\/p>\n\n\n\n<pre class=\"wp-block-code\" style=\"font-size:16px\"><code><em># The following line can be skipped if the devtools package is already installed\n<\/em>\n<strong>install.packages(<\/strong>\"devtools\"<strong>)<\/strong>\n\n<em># Load the devtools package\n<\/em>\n<strong>library(<\/strong>\"devtools\"<strong>)<\/strong>\n\n<em># Install AMOCATI from its GitHub repository\n<\/em>\n<strong>devtools::install_github(<\/strong>\"PaulRegnier\/AMOCATI\", build_vignettes = TRUE<strong>)<\/strong>\n\n<em># Update AMOCATI from its GitHub repository\n<\/em>\n<strong>devtools::install_github(<\/strong>\"PaulRegnier\/AMOCATI\", build_vignettes = TRUE, force = TRUE<strong>)<\/strong><\/code><\/pre>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"1.4\" style=\"font-size:22px\"><strong>1.4) Load AMOCATI<\/strong><\/h3>\n\n\n\n<p class=\"wp-block-paragraph\" style=\"font-size:16px\">To load <code>AMOCATI<\/code>, simply enter the following command in the R console:<\/p>\n\n\n\n<pre class=\"wp-block-code has-black-color has-text-color\" style=\"font-size:16px\"><code><strong>library(<\/strong>\"AMOCATI\"<strong>)<\/strong><\/code><\/pre>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"2\" style=\"font-size:28px\"><strong>2) Workspace directory setup<\/strong><\/h2>\n\n\n\n<p class=\"wp-block-paragraph\" style=\"font-size:16px\">Before launching the actual analysis, users need to select and set up their working directory, which <code>AMOCATI<\/code> will use throughout its workflow:<\/p>\n\n\n\n<pre class=\"wp-block-code\" style=\"font-size:16px\"><code><em># Select the right working 
directory\n<\/em>\nworkingDirectory = <strong>file.path(<\/strong>\"YOUR\", \"PATH\", \"HERE\"<strong>)<\/strong>\n<strong>setwd(<\/strong>workingDirectory<strong>)<\/strong>\n\n<em># Construct the actual workspace\n<\/em>\n<strong>resetWorkspace(<\/strong>\n    <strong>eraseEntireRMemory<\/strong> = FALSE,\n    <strong>verbose<\/strong> = TRUE\n<strong>)<\/strong><\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\" style=\"font-size:16px\">This function creates a set of folders and subfolders in which different files will be written throughout the <code>AMOCATI<\/code> workflow.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"3\" style=\"font-size:28px\"><strong>3) Download and process a dataset of interest<\/strong><\/h2>\n\n\n\n<p class=\"wp-block-paragraph\" style=\"font-size:16px\">First, users should determine which dataset should be analyzed. For convenience, <code>AMOCATI<\/code> allows users to directly download cancer datasets from the GDC repository (notably from the TCGA, TARGET and CGCI projects).<\/p>\n\n\n\n<p class=\"wp-block-paragraph\" style=\"font-size:16px\">To use such a dataset, run the following command to list the datasets available for download:<\/p>\n\n\n\n<pre class=\"wp-block-code\" style=\"font-size:16px\"><code><em># List projects and associated parameters\n<\/em>\n<strong>listProjectsAttributes()<\/strong><\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\" style=\"font-size:16px\">This will output something similar to:<\/p>\n\n\n\n<pre id=\"rstudio_console_output\" class=\"wp-block-preformatted\" style=\"font-size:16px\">          ProjectID                                                                   ProjectName\n1     CGCI-HTMCP-CC               HIV+ Tumor Molecular Characterization Project - Cervical Cancer\n2  CGCI-HTMCP-DLBCL HIV+ Tumor Molecular Characterization Project - Diffuse Large B-Cell Lymphoma\n3     CGCI-HTMCP-LC                   HIV+ Tumor Molecular Characterization Project 
- Lung Cancer\n4     TARGET-ALL-P1                                        Acute Lymphoblastic Leukemia - Phase I\n5     TARGET-ALL-P2                                       Acute Lymphoblastic Leukemia - Phase II\n6     TARGET-ALL-P3                                      Acute Lymphoblastic Leukemia - Phase III\n7        TARGET-AML                                                        Acute Myeloid Leukemia\n8       TARGET-CCSK                                              Clear Cell Sarcoma of the Kidney\n9        TARGET-NBL                                                                 Neuroblastoma\n10        TARGET-OS                                                                  Osteosarcoma\n11        TARGET-RT                                                                Rhabdoid Tumor\n12        TARGET-WT                                                         High-Risk Wilms Tumor\n13         TCGA-ACC                                                      Adrenocortical Carcinoma\n14        TCGA-BLCA                                                  Bladder Urothelial Carcinoma\n15        TCGA-BRCA                                                     Breast Invasive Carcinoma\n16        TCGA-CESC              Cervical Squamous Cell Carcinoma and Endocervical Adenocarcinoma\n17        TCGA-CHOL                                                            Cholangiocarcinoma\n18        TCGA-COAD                                                          Colon Adenocarcinoma\n19        TCGA-DLBC                               Lymphoid Neoplasm Diffuse Large B-cell Lymphoma\n20        TCGA-ESCA                                                          Esophageal Carcinoma\n21         TCGA-GBM                                                       Glioblastoma Multiforme\n22        TCGA-HNSC                                         Head and Neck Squamous Cell Carcinoma\n23        TCGA-KICH                                                            Kidney Chromophobe\n24   
     TCGA-KIRC                                             Kidney Renal Clear Cell Carcinoma\n25        TCGA-KIRP                                         Kidney Renal Papillary Cell Carcinoma\n26        TCGA-LAML                                                        Acute Myeloid Leukemia\n27         TCGA-LGG                                                      Brain Lower Grade Glioma\n28        TCGA-LIHC                                                Liver Hepatocellular Carcinoma\n29        TCGA-LUAD                                                           Lung Adenocarcinoma\n30        TCGA-LUSC                                                  Lung Squamous Cell Carcinoma\n31        TCGA-MESO                                                                  Mesothelioma\n32          TCGA-OV                                             Ovarian Serous Cystadenocarcinoma\n33        TCGA-PAAD                                                     Pancreatic Adenocarcinoma\n34        TCGA-PCPG                                            Pheochromocytoma and Paraganglioma\n35        TCGA-PRAD                                                       Prostate Adenocarcinoma\n36        TCGA-READ                                                         Rectum Adenocarcinoma\n37        TCGA-SARC                                                                       Sarcoma\n38        TCGA-SKCM                                                       Skin Cutaneous Melanoma\n39        TCGA-STAD                                                        Stomach Adenocarcinoma\n40        TCGA-TGCT                                                   Testicular Germ Cell Tumors\n41        TCGA-THCA                                                             Thyroid Carcinoma\n42        TCGA-THYM                                                                       Thymoma\n43        TCGA-UCEC                                          Uterine Corpus Endometrial Carcinoma\n44         TCGA-UCS      
                                                  Uterine Carcinosarcoma\n45         TCGA-UVM                                                                Uveal Melanoma<\/pre>\n\n\n\n<p class=\"has-text-align-left has-background wp-block-paragraph\" style=\"background-color:#c6cbe1;font-size:16px\"><strong>Of note, users who want to download a TCGA or CGCI dataset must use the dedicated <code>TCGA_CGCI.download()<\/code>, <code>TCGA_CGCI.createMetaMapping()<\/code> and <code>TCGA_CGCI.pool()<\/code> functions, as described below. Conversely, users who want to use a TARGET dataset must use the <code>TARGET.download()<\/code>, <code>TARGET.createMetaMapping()<\/code> and <code>TARGET.pool()<\/code> functions.<\/strong><\/p>\n\n\n\n<p class=\"wp-block-paragraph\" style=\"font-size:16px\">For the rest of this tutorial, we will use the cholangiocarcinoma dataset from the TCGA project (<code>ProjectID = TCGA-CHOL<\/code>).<\/p>\n\n\n\n<pre class=\"wp-block-code\" style=\"font-size:16px\"><code><em># Download the associated RNA-Seq and clinical data\n<\/em>\n<strong>TCGA_CGCI.download(projectID<\/strong> = \"TCGA-CHOL\"<strong>)<\/strong>\n\n<em># Create a metamapping file which links RNA-Seq and clinical data to the right patients\n<\/em>\n<strong>TCGA_CGCI.createMetaMapping(verbose<\/strong> = TRUE<strong>)<\/strong>\n\n<em># Finally pool, process and export data in an all-in-one file\n<\/em>\n<strong>TCGA_CGCI.pool(verbose<\/strong> = TRUE<strong>)<\/strong><\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\" style=\"font-size:16px\">Because they depend on the GDC API, the <code><strong>TCGA_CGCI.download()<\/strong><\/code> and <strong><code>TARGET.download()<\/code><\/strong> functions may stop partway through a download and throw errors. 
In this case, simply run the command again: a built-in mechanism prevents files that were already downloaded (both RNA-Seq and clinical data) from being downloaded again.<\/p>\n\n\n\n<p class=\"has-text-align-left has-background wp-block-paragraph\" style=\"background-color:#c6cbe1;font-size:16px\"><strong>Users can also always use their own dataset, as long as it follows the correct format (see <code>Figure 1<\/code> below): the <code>fullData.data<\/code> file (located in the <code>output &gt; data<\/code> folder) should be a tabulation-delimited plain text file, where the 1<sup>st<\/sup> column is entitled <code>CaseUUID<\/code> and lists the unique identifier of each patient, the 2<sup>nd<\/sup> column is entitled <code>vitalStatus<\/code> and lists the vital status of each patient (either <code>Alive<\/code> or <code>Dead<\/code>), the 3<sup>rd<\/sup> column is entitled <code>survivedDays<\/code> and lists the number of days survived after the diagnosis, and the subsequent columns list the expression values for each gene (HGNC format) of the transcriptome.<\/strong><\/p>\n\n\n<div class=\"wp-block-image\">\n<figure data-wp-context=\"{&quot;imageId&quot;:&quot;6a5597d1ca676&quot;}\" data-wp-interactive=\"core\/image\" data-wp-key=\"6a5597d1ca676\" class=\"aligncenter size-full is-resized wp-lightbox-container\"><img loading=\"lazy\" decoding=\"async\" width=\"1218\" height=\"843\" data-wp-class--hide=\"state.isContentHidden\" data-wp-class--show=\"state.isContentVisible\" data-wp-init=\"callbacks.setButtonStyles\" data-wp-on--click=\"actions.showLightbox\" data-wp-on--load=\"callbacks.setButtonStyles\" data-wp-on--pointerdown=\"actions.preloadImage\" data-wp-on--pointerenter=\"actions.preloadImageWithDelay\" data-wp-on--pointerleave=\"actions.cancelPreload\" data-wp-on-window--resize=\"callbacks.setButtonStyles\" sizes=\"auto, (max-width: 1218px) 100vw, 1218px\" 
src=\"https:\/\/paul-regnier.fr\/wp-content\/uploads\/2024\/05\/AMOCATI_dataFileFormat.png\" alt=\"\" class=\"wp-image-1666\" style=\"width:768px\" srcset=\"https:\/\/paul-regnier.fr\/wp-content\/uploads\/2024\/05\/AMOCATI_dataFileFormat.png 1218w, https:\/\/paul-regnier.fr\/wp-content\/uploads\/2024\/05\/AMOCATI_dataFileFormat-300x208.png 300w, https:\/\/paul-regnier.fr\/wp-content\/uploads\/2024\/05\/AMOCATI_dataFileFormat-1024x709.png 1024w, https:\/\/paul-regnier.fr\/wp-content\/uploads\/2024\/05\/AMOCATI_dataFileFormat-768x532.png 768w, https:\/\/paul-regnier.fr\/wp-content\/uploads\/2024\/05\/AMOCATI_dataFileFormat-18x12.png 18w\" \/><button\n\t\t\tclass=\"lightbox-trigger\"\n\t\t\ttype=\"button\"\n\t\t\taria-haspopup=\"dialog\"\n\t\t\tdata-wp-bind--aria-label=\"state.thisImage.triggerButtonAriaLabel\"\n\t\t\tdata-wp-init=\"callbacks.initTriggerButton\"\n\t\t\tdata-wp-on--click=\"actions.showLightbox\"\n\t\t\tdata-wp-style--right=\"state.thisImage.buttonRight\"\n\t\t\tdata-wp-style--top=\"state.thisImage.buttonTop\"\n\t\t>\n\t\t\t<svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" width=\"12\" height=\"12\" fill=\"none\" viewbox=\"0 0 12 12\">\n\t\t\t\t<path fill=\"#fff\" d=\"M2 0a2 2 0 0 0-2 2v2h1.5V2a.5.5 0 0 1 .5-.5h2V0H2Zm2 10.5H2a.5.5 0 0 1-.5-.5V8H0v2a2 2 0 0 0 2 2h2v-1.5ZM8 12v-1.5h2a.5.5 0 0 0 .5-.5V8H12v2a2 2 0 0 1-2 2H8Zm2-12a2 2 0 0 1 2 2v2h-1.5V2a.5.5 0 0 0-.5-.5H8V0h2Z\" \/>\n\t\t\t<\/svg>\n\t\t<\/button><\/figure>\n<\/div>\n\n\n<p class=\"wp-block-paragraph\" style=\"font-size:16px\"><strong>Figure 1 &#8211; File format to respect for <code>AMOCATI<\/code> data (click on the image to open in fullscreen).<\/strong><\/p>\n\n\n\n<p class=\"wp-block-paragraph\" style=\"font-size:16px\">Importantly, users can also use datasets unrelated to cancer if they wish to. 
The only important criterion is that the data should represent measurable features (one per column) for patients (one per line) with survival\/relapse\/event information (2<sup>nd<\/sup> and 3<sup>rd<\/sup> columns). The <code>Alive<\/code> and <code>Dead<\/code> values of the <code>vitalStatus<\/code> column can easily be repurposed to encode any other event type (although neither the column name nor the words <code>Alive<\/code> and <code>Dead<\/code> should be changed, for the sake of compatibility with <code>AMOCATI<\/code>).<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"4\" style=\"font-size:28px\"><strong>4) Launch the metaResults analysis<\/strong><\/h2>\n\n\n\n<p class=\"wp-block-paragraph\" style=\"font-size:16px\">The next step is to compute the <code>metaResults<\/code> associated with this dataset. In short, the function shown here randomly samples the dataset a given number of times (bootstrapping approach) and then computes, summarizes and outputs different metrics for each gene, which make it possible to estimate and classify its impact on the overall survival of patients (see the publication for more details about the algorithm, as well as for a graphical representation of what it actually does):<\/p>\n\n\n\n<pre class=\"wp-block-code\" style=\"font-size:16px\"><code><strong>createMetaResults(<\/strong>\n    <strong>selectedGenesOnly<\/strong> = FALSE,\n    <strong>verbose<\/strong> = TRUE,\n    <strong>signaturesMode<\/strong> = FALSE,\n    <strong>minNumberOfPatientsPerGroup<\/strong> = 3,\n    <strong>unsollicitedCores<\/strong> = 2,\n    <strong>iterationsPerCluster<\/strong> = 16,\n    <strong>genesCutoff<\/strong> = 20\n<strong>)<\/strong><\/code><\/pre>\n\n\n\n<p class=\"has-text-align-left has-background wp-block-paragraph\" style=\"background-color:#c6cbe1;font-size:16px\"><strong>Of note, this step can take a long time to complete and can be rather computationally intensive. 
Please therefore set the number of <code>unsollicitedCores<\/code> to a reasonable value (RAM requirements can be substantial).<\/strong><\/p>\n\n\n\n<p class=\"wp-block-paragraph\" style=\"font-size:16px\">This function outputs a tabulation-delimited text file named <code>metaResults.meta<\/code> and located in the <code>output &gt; metaResults<\/code> folder.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\" style=\"font-size:16px\">If desired, this analysis can be performed only on a given selection of genes, in order to drastically reduce the computation time. To do so, users can provide a list of genes to use and set the <code>selectedGenesOnly = TRUE<\/code> argument. In this case, the output file will be named <code>metaResults_selectedGenes.meta<\/code> and will be located in the <code>output &gt; metaResults<\/code> folder, as previously described. Additionally, this mode of analysis will also generate other results, notably a <code>selectedGenes.zip<\/code> file containing the associated survival tables and curves in the <code>output &gt; metaResults &gt; selectedGenes<\/code> folder. Please note that the tabulation-delimited text file containing the selected gene(s) to use should follow a precise format, although it can have any name (<code>*.txt<\/code>): this file should contain a single-column table, with the first line named <code>HGNC_GeneSymbol<\/code> and the subsequent ones indicating the actual genes to use. These genes must be in the HGNC format. 
This file must be located in the <code>output &gt; data &gt; input<\/code> folder.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"5\" style=\"font-size:28px\"><strong>5) Extract the Classification Signature<\/strong><\/h2>\n\n\n\n<p class=\"wp-block-paragraph\" style=\"font-size:16px\">Next, users should plot two metrics output by the previous <strong>createMetaResults()<\/strong> function in order to select the genes that combine a strong impact on survival with a low variability across the performed bootstrapping iterations.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\" style=\"font-size:16px\">First, users should display the two metrics that will help to delineate the genes that will compose the Classification Signature:<\/p>\n\n\n\n<pre class=\"wp-block-code\" style=\"font-size:16px\"><code><strong>getClassificationSignature(<\/strong>\n    <strong>GeneScoreThreshold<\/strong> = NULL,\n    <strong>Gene_SNR_ExpressionThreshold<\/strong> = NULL,\n    <strong>exportSignature<\/strong> = FALSE,\n    <strong>verbose<\/strong> = TRUE\n<strong>)<\/strong><\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\" style=\"font-size:16px\">After that, users can choose the associated thresholds and see how they affect the resulting signature (see <strong>Figure 2<\/strong> below):<\/p>\n\n\n\n<pre class=\"wp-block-code\" style=\"font-size:16px\"><code>GS_threshold = 0.5\nSNR_threshold = 2\n\n<strong>getClassificationSignature(<\/strong>\n    <strong>GeneScoreThreshold<\/strong> = GS_threshold,\n    <strong>Gene_SNR_ExpressionThreshold<\/strong> = SNR_threshold,\n    <strong>exportSignature<\/strong> = FALSE,\n    <strong>verbose<\/strong> = TRUE\n<strong>)<\/strong><\/code><\/pre>\n\n\n<div class=\"wp-block-image\">\n<figure data-wp-context=\"{&quot;imageId&quot;:&quot;6a5597d1cb882&quot;}\" data-wp-interactive=\"core\/image\" data-wp-key=\"6a5597d1cb882\" class=\"aligncenter size-full is-resized wp-lightbox-container\"><img loading=\"lazy\" 
decoding=\"async\" width=\"1950\" height=\"1613\" data-wp-class--hide=\"state.isContentHidden\" data-wp-class--show=\"state.isContentVisible\" data-wp-init=\"callbacks.setButtonStyles\" data-wp-on--click=\"actions.showLightbox\" data-wp-on--load=\"callbacks.setButtonStyles\" data-wp-on--pointerdown=\"actions.preloadImage\" data-wp-on--pointerenter=\"actions.preloadImageWithDelay\" data-wp-on--pointerleave=\"actions.cancelPreload\" data-wp-on-window--resize=\"callbacks.setButtonStyles\" sizes=\"auto, (max-width: 1950px) 100vw, 1950px\" src=\"https:\/\/paul-regnier.fr\/wp-content\/uploads\/2024\/05\/AMOCATI_getClassificationSignature.png\" alt=\"\" class=\"wp-image-1686\" style=\"width:768px\" srcset=\"https:\/\/paul-regnier.fr\/wp-content\/uploads\/2024\/05\/AMOCATI_getClassificationSignature.png 1950w, https:\/\/paul-regnier.fr\/wp-content\/uploads\/2024\/05\/AMOCATI_getClassificationSignature-300x248.png 300w, https:\/\/paul-regnier.fr\/wp-content\/uploads\/2024\/05\/AMOCATI_getClassificationSignature-1024x847.png 1024w, https:\/\/paul-regnier.fr\/wp-content\/uploads\/2024\/05\/AMOCATI_getClassificationSignature-768x635.png 768w, https:\/\/paul-regnier.fr\/wp-content\/uploads\/2024\/05\/AMOCATI_getClassificationSignature-1536x1271.png 1536w, https:\/\/paul-regnier.fr\/wp-content\/uploads\/2024\/05\/AMOCATI_getClassificationSignature-15x12.png 15w\" \/><button\n\t\t\tclass=\"lightbox-trigger\"\n\t\t\ttype=\"button\"\n\t\t\taria-haspopup=\"dialog\"\n\t\t\tdata-wp-bind--aria-label=\"state.thisImage.triggerButtonAriaLabel\"\n\t\t\tdata-wp-init=\"callbacks.initTriggerButton\"\n\t\t\tdata-wp-on--click=\"actions.showLightbox\"\n\t\t\tdata-wp-style--right=\"state.thisImage.buttonRight\"\n\t\t\tdata-wp-style--top=\"state.thisImage.buttonTop\"\n\t\t>\n\t\t\t<svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" width=\"12\" height=\"12\" fill=\"none\" viewbox=\"0 0 12 12\">\n\t\t\t\t<path fill=\"#fff\" d=\"M2 0a2 2 0 0 0-2 2v2h1.5V2a.5.5 0 0 1 .5-.5h2V0H2Zm2 10.5H2a.5.5 0 0 
1-.5-.5V8H0v2a2 2 0 0 0 2 2h2v-1.5ZM8 12v-1.5h2a.5.5 0 0 0 .5-.5V8H12v2a2 2 0 0 1-2 2H8Zm2-12a2 2 0 0 1 2 2v2h-1.5V2a.5.5 0 0 0-.5-.5H8V0h2Z\" \/>\n\t\t\t<\/svg>\n\t\t<\/button><\/figure>\n<\/div>\n\n\n<p class=\"wp-block-paragraph\"><strong>Figure 2 &#8211; <code>GeneScoreThreshold<\/code> and <code>Gene_SNR_ExpressionThreshold<\/code> are set visually (click on the image to open in fullscreen).<\/strong><\/p>\n\n\n\n<p class=\"wp-block-paragraph\" style=\"font-size:16px\">Once the thresholds are set appropriately, users should export the final gene signature:<\/p>\n\n\n\n<pre class=\"wp-block-code\" style=\"font-size:16px\"><code><strong>getClassificationSignature(<\/strong>\n    <strong>GeneScoreThreshold<\/strong> = GS_threshold,\n    <strong>Gene_SNR_ExpressionThreshold<\/strong> = SNR_threshold,\n    <strong>exportSignature<\/strong> = TRUE,\n    <strong>verbose<\/strong> = TRUE\n<strong>)<\/strong><\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\" style=\"font-size:16px\">The resulting Classification Signature is saved in the <code>output &gt; signatures &gt; classificationSignature.sign<\/code> file, along with the associated plot in PDF format. <code>*.sign<\/code> files are simply tabulation-delimited plain text files, where each column represents a gene signature (each gene being in the HGNC format), with a short description in the 1<sup>st<\/sup> line. 
Therefore, several signatures can be aggregated within a single <code>*.sign<\/code> file (see later in this tutorial).<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"6\" style=\"font-size:28px\"><strong>6) Compute patient-wise the Quantitative Scores and Clinical Scores of the Classification Signature<\/strong><\/h2>\n\n\n\n<p class=\"wp-block-paragraph\" style=\"font-size:16px\"><code>AMOCATI<\/code> can then compute, for each patient, both the Quantitative Score and the Clinical Score of the Classification Signature. This step uses the previously determined Classification Signature and the <code>metaResults.meta<\/code> file:<\/p>\n\n\n\n<pre class=\"wp-block-code\" style=\"font-size:16px\"><code><strong>applySignature(<\/strong>\n    <strong>signatureUsed<\/strong> = \"classification\",\n    <strong>verbose<\/strong> = TRUE\n<strong>)<\/strong><\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\" style=\"font-size:16px\">In this setting, the <code><strong>applySignature()<\/strong><\/code> function outputs a <code>*.apply<\/code> file in the <code>output &gt; apply<\/code> folder. 
This tabulation-delimited text file contains the Quantitative Scores and the Clinical Scores for the Classification Signature for each patient.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"7\" style=\"font-size:28px\"><strong>7) Separate the patients of the cohort according to the Quantitative Score or the Clinical Score of the Classification Signature<\/strong><\/h2>\n\n\n\n<p class=\"wp-block-paragraph\" style=\"font-size:16px\">Next, we can separate the patients of the cohort according to their previously computed Clinical Score for the Classification Signature:<\/p>\n\n\n\n<pre class=\"wp-block-code\" style=\"font-size:16px\"><code><strong>separatePatients(<\/strong>\n    <strong>applyFileUsed<\/strong> = \"classification\",\n    <strong>metricToUse<\/strong> = \"CS\",\n    <strong>verbose<\/strong> = TRUE\n<strong>)<\/strong><\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\" style=\"font-size:16px\">Basically, this function takes as input the previously generated *.apply file as well as the <code>metaResults.meta<\/code> file.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\" style=\"font-size:16px\">If desired, users can also use the Quantitative Score to attempt to separate the patients of the cohort, although this generally leads to a poor separation of patients:<\/p>\n\n\n\n<pre class=\"wp-block-code\" style=\"font-size:16px\"><code><strong>separatePatients(<\/strong>\n    <strong>applyFileUsed<\/strong> = \"classification\",\n    <strong>metricToUse<\/strong> = \"QS\",\n    <strong>verbose<\/strong> = TRUE\n<strong>)<\/strong><\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\" style=\"font-size:16px\">The aim of this function is to separate patients into 2 groups: the ones with the highest values (called Long-Term Survivors if the Clinical Score is used) and the ones with the lowest values (called Short-Term Survivors if the Clinical Score is used). 
Please read the publication for the full details.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\" style=\"font-size:16px\">This function outputs several files in the <code>output &gt; class &gt; classificationSignature<\/code> folder: PDF files containing the ROC curve as well as the survival curves, a tabulation-delimited <code>*.class<\/code> file which is basically a <code>*.apply<\/code> file with an additional column indicating which group (LTS or STS) each patient belongs to, and several tabulation-delimited text files with different information and statistics about the two generated groups, the ROC curves, etc.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"8\" style=\"font-size:28px\"><strong>8) Use of custom gene signatures instead of the Classification Signature<\/strong><\/h2>\n\n\n\n<p class=\"wp-block-paragraph\" style=\"font-size:16px\">If desired, users have the possibility to use their own custom gene signatures instead of the determined Classification Signature. The process to follow is exactly the same as for steps 6) and 7), except that all the signatures of interest should be written in the <code>output &gt; signatures &gt; customSignatures.sign<\/code> file, which should follow the same file format as previously described for the <code>classificationSignature.sign<\/code> file:<\/p>\n\n\n\n<pre class=\"wp-block-code\" style=\"font-size:16px\"><code><em># Compute the Clinical Scores and the Quantitative Scores for each custom gene signature and for each patient\n<\/em>\n<strong>applySignature(<\/strong>\n    <strong>signatureUsed<\/strong> = \"custom\",\n    <strong>verbose<\/strong> = TRUE\n<strong>)<\/strong><\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\" style=\"font-size:16px\">Technically, in this context, this function performs exactly the same tasks as in step 6), but organizes the results in different output folders: <code>output &gt; customSignatures &gt; fullTables<\/code> for the <code>*.apply<\/code> files as previously 
described and <code>output &gt; customSignatures &gt; synthesis<\/code> for a tabulation-delimited text file which contains the Pearson&rsquo;s and Spearman&rsquo;s correlation coefficients as well as their associated p-values and the coefficient determined for a linear regression between the Quantitative Scores and Clinical Scores for each custom signature. This will help users to determine which of their own gene signatures are positively or negatively associated with survival, and with what magnitude.<\/p>\n\n\n\n<pre class=\"wp-block-code\" style=\"font-size:16px\"><code><em># Use the Clinical Scores obtained for each of the custom signatures to separate the patients of the cohort\n<\/em>\n<strong>separatePatients(<\/strong>\n    <strong>applyFileUsed<\/strong> = \"custom\",\n    <strong>metricToUse<\/strong> = \"CS\",\n    <strong>verbose<\/strong> = TRUE\n<strong>)<\/strong><\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\" style=\"font-size:16px\">This function also behaves exactly like the one described in step 7), but creates one subdirectory per custom signature in the <code>output &gt; class<\/code> folder.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"9\" style=\"font-size:28px\"><strong>9) Adding distinguishing genes as a new layer of complexity for custom signatures<\/strong><\/h2>\n\n\n\n<p class=\"wp-block-paragraph\" style=\"font-size:16px\">If desired, users have the possibility to use one or several distinguishing gene(s) during the computation of the Clinical Score and Quantitative Score as well as during the separation of patients. Briefly, a distinguishing gene is a gene whose expression level users want to use to further stratify the impact of the gene signatures on the overall survival (only for the custom signatures). This simply adds a new column to the resulting tables, which allows users to see more clearly whether a low, intermediate or high expression actually impacts survival. 
The process is exactly the same as in steps 6), 7) and 8):<\/p>\n\n\n\n<pre class=\"wp-block-code\" style=\"font-size:16px\"><code><em># Compute the Clinical Score and the Quantitative Score on the custom gene signatures using one or more gene(s) as distinguishing gene(s)\n<\/em>\n<strong>applySignature(<\/strong>\n    <strong>signatureUsed<\/strong> = \"custom\",\n    <strong>distinguishingGenes<\/strong> = TRUE,\n    <strong>verbose<\/strong> = TRUE\n<strong>)<\/strong><\/code><\/pre>\n\n\n\n<p class=\"has-text-align-left has-background wp-block-paragraph\" style=\"background-color:#c6cbe1;font-size:16px\"><strong>Please note that the tab-delimited text file containing the distinguishing gene(s) must follow a precise format, although it can have any name (<code>*.txt<\/code>): it should contain a single-column table whose first line is <code>HGNC_GeneSymbol<\/code> and whose subsequent lines list the genes to use. These genes must be given as HGNC gene symbols. This file must be located in the <code>output &gt; apply &gt; input<\/code> folder. 
<\/strong><\/p>\n\n\n\n<p class=\"wp-block-paragraph\" style=\"font-size:16px\">This function performs computations similar to those presented in step 8), but outputs more results, organized in additional folders:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li style=\"font-size:16px\">In the <code>output &gt; apply &gt; customSignatures &gt; fullTables<\/code> folder, there will be zip files (one per distinguishing gene used) containing the actual <code>*.apply<\/code> files as previously described in step 6)<\/li>\n\n\n\n<li style=\"font-size:16px\">In the <code>output &gt; apply &gt; customSignatures &gt; synthesis<\/code> folder, there will be one subfolder per distinguishing gene, containing a tab-delimited text file describing the mean Clinical Scores and Quantitative Scores for the low-, intermediate- and high-expressing groups of the distinguishing gene of interest, as well as the associated p-values for each possible comparison<\/li>\n\n\n\n<li style=\"font-size:16px\">In the <code>output &gt; apply &gt; customSignatures &gt; CS<\/code> folder, there will be zip files (one per distinguishing gene used) containing PDF graphs showing the variation of the Clinical Scores for each signature relative to the low, intermediate or high expression of the current distinguishing gene, as well as tab-delimited text files which statistically describe these graphs (p-values of comparisons, mean, median, etc.)<\/li>\n\n\n\n<li style=\"font-size:16px\">In the <code>output &gt; apply &gt; customSignatures &gt; QS<\/code> folder, there will be zip files (one per distinguishing gene used) containing PDF graphs showing the variation of the Quantitative Scores for each signature relative to the low, intermediate or high expression of the current distinguishing gene, as well as tab-delimited text files which statistically describe these graphs (p-values of comparisons, mean, median, etc.)<\/li>\n\n\n\n<li 
style=\"font-size:16px\">In the <code>output &gt; apply &gt; customSignatures &gt; CS VS QS<\/code> folder, there will be zip files (one per distinguishing gene used) containing PDF graphs showing the correlation between the Clinical Scores and the Quantitative Scores for each signature, highlighting the low, intermediate or high expression of the current distinguishing gene, as well as tab-delimited text files which statistically describe these graphs (Pearson&rsquo;s and Spearman&rsquo;s correlation coefficients as well as linear regression coefficients)<\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"10\" style=\"font-size:28px\"><strong>10) Further analyses<\/strong><\/h2>\n\n\n\n<p class=\"wp-block-paragraph\" style=\"font-size:16px\">If needed, users can analyze the datasets in greater depth, as we did in our associated publication. To do so, they can use existing tools such as the <code>limma<\/code> R package for differential analyses of gene or pathway expression, the <code>GSVA<\/code> R package for the enrichment analysis of gene sets (pathways) and the <code>igraph<\/code> R package for the generation of network-based graphs. The use of these packages is not detailed here, as they are not officially implemented in <code>AMOCATI<\/code>. 
We invite users to read the respective vignettes and documentation of these packages.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"11\" style=\"font-size:28px\"><strong>11) Miscellaneous<\/strong><\/h2>\n\n\n\n<p class=\"wp-block-paragraph\" style=\"font-size:16px\"><code>AMOCATI<\/code> also offers the possibility to split a given dataset into two parts, which is useful for conducting analyses in an intra-cohort discovery\/validation setting:<\/p>\n\n\n\n<pre class=\"wp-block-code\" style=\"font-size:16px\"><code><strong>splitData(<\/strong>\n    <strong>datasetAProportion<\/strong> = 0.5,\n    <strong>verbose<\/strong> = TRUE\n<strong>)<\/strong><\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\" style=\"font-size:16px\">This function simply generates two new <code>*.data<\/code> sub-datasets (<code>fullData_datasetA.data<\/code> and <code>fullData_datasetB.data<\/code>) from the original <code>output &gt; data &gt; fullData.data<\/code> file and saves them in the same location.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"12\" style=\"font-size:28px\"><strong>12) Citation<\/strong><\/h2>\n\n\n\n<p class=\"wp-block-paragraph\" style=\"font-size:16px\">If you use <code>AMOCATI<\/code> in your work, we kindly ask you to cite our bioRxiv manuscript:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li style=\"font-size:16px\"><a href=\"https:\/\/www.biorxiv.org\/content\/10.1101\/2024.06.18.596859\" target=\"_blank\" rel=\"noreferrer noopener\">bioRxiv<\/a><\/li>\n\n\n\n<li style=\"font-size:16px\">DOI: <a href=\"https:\/\/doi.org\/10.1101\/2024.06.18.596859\" target=\"_blank\" rel=\"noreferrer noopener\">10.1101\/2024.06.18.596859<\/a><\/li>\n<\/ul>","protected":false},"excerpt":{"rendered":"<p>AMOCATI: Algorithmic Meta-analysis Of Clinical And Transcriptomic Information Warning: this tutorial is only available in English, even if you choose the French language at the bottom of the screen. Thank you for your understanding. 
AMOCATI is a R-written package which aims to analyze transcriptome-based datasets, and more specifically quantify how [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"parent":0,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"page-templates\/template-fullwidth.php","meta":{"_themeisle_gutenberg_block_has_review":false,"footnotes":""},"class_list":["post-1643","page","type-page","status-publish","hentry"],"_links":{"self":[{"href":"https:\/\/paul-regnier.fr\/en_gb\/wp-json\/wp\/v2\/pages\/1643","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/paul-regnier.fr\/en_gb\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/paul-regnier.fr\/en_gb\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/paul-regnier.fr\/en_gb\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/paul-regnier.fr\/en_gb\/wp-json\/wp\/v2\/comments?post=1643"}],"version-history":[{"count":74,"href":"https:\/\/paul-regnier.fr\/en_gb\/wp-json\/wp\/v2\/pages\/1643\/revisions"}],"predecessor-version":[{"id":2041,"href":"https:\/\/paul-regnier.fr\/en_gb\/wp-json\/wp\/v2\/pages\/1643\/revisions\/2041"}],"wp:attachment":[{"href":"https:\/\/paul-regnier.fr\/en_gb\/wp-json\/wp\/v2\/media?parent=1643"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}