{"id":395,"date":"2022-12-01T11:31:20","date_gmt":"2022-12-01T10:31:20","guid":{"rendered":"https:\/\/paul-regnier.fr\/?page_id=395"},"modified":"2026-01-29T10:29:33","modified_gmt":"2026-01-29T09:29:33","slug":"tutoriel-picaflow","status":"publish","type":"page","link":"https:\/\/paul-regnier.fr\/en_gb\/tutoriel-picaflow\/","title":{"rendered":"PICAFlow tutorial"},"content":{"rendered":"<p class=\"has-text-align-center has-black-color has-text-color has-large-font-size wp-block-paragraph\"><strong>PICAFlow: Pipeline for Integrative and Comprehensive Analysis of flow\/mass cytometry data<\/strong><\/p>\n\n\n\n<p class=\"has-background wp-block-paragraph\" style=\"background-color:#c6cbe1;font-size:16px\"><strong>Warning: this tutorial is only available in English, even if you choose the French language at the bottom of the screen. Thank you for your understanding.<\/strong><\/p>\n\n\n\n<p class=\"has-black-color has-text-color wp-block-paragraph\" style=\"font-size:16px\"><code>PICAFlow<\/code> is an R package that processes cytometry data, from raw FCS files through to a deep and comprehensive analysis of the key underlying messages.<\/p>\n\n\n<div class=\"wp-block-image is-style-default\">\n<figure data-wp-context=\"{&quot;imageId&quot;:&quot;6a559a1907211&quot;}\" data-wp-interactive=\"core\/image\" data-wp-key=\"6a559a1907211\" class=\"aligncenter size-full is-resized wp-lightbox-container\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"1024\" data-wp-class--hide=\"state.isContentHidden\" data-wp-class--show=\"state.isContentVisible\" data-wp-init=\"callbacks.setButtonStyles\" data-wp-on--click=\"actions.showLightbox\" data-wp-on--load=\"callbacks.setButtonStyles\" data-wp-on--pointerdown=\"actions.preloadImage\" data-wp-on--pointerenter=\"actions.preloadImageWithDelay\" data-wp-on--pointerleave=\"actions.cancelPreload\" data-wp-on-window--resize=\"callbacks.setButtonStyles\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" 
src=\"https:\/\/paul-regnier.fr\/wp-content\/uploads\/2023\/06\/craiyon_170831_Scientist_Pikachu_in_a_lab_coat_and_goggles_with_bright_smile.png\" alt=\"\" class=\"wp-image-981\" style=\"width:484px;height:484px\" srcset=\"https:\/\/paul-regnier.fr\/wp-content\/uploads\/2023\/06\/craiyon_170831_Scientist_Pikachu_in_a_lab_coat_and_goggles_with_bright_smile.png 1024w, https:\/\/paul-regnier.fr\/wp-content\/uploads\/2023\/06\/craiyon_170831_Scientist_Pikachu_in_a_lab_coat_and_goggles_with_bright_smile-300x300.png 300w, https:\/\/paul-regnier.fr\/wp-content\/uploads\/2023\/06\/craiyon_170831_Scientist_Pikachu_in_a_lab_coat_and_goggles_with_bright_smile-150x150.png 150w, https:\/\/paul-regnier.fr\/wp-content\/uploads\/2023\/06\/craiyon_170831_Scientist_Pikachu_in_a_lab_coat_and_goggles_with_bright_smile-768x768.png 768w, https:\/\/paul-regnier.fr\/wp-content\/uploads\/2023\/06\/craiyon_170831_Scientist_Pikachu_in_a_lab_coat_and_goggles_with_bright_smile-12x12.png 12w\" \/><button\n\t\t\tclass=\"lightbox-trigger\"\n\t\t\ttype=\"button\"\n\t\t\taria-haspopup=\"dialog\"\n\t\t\tdata-wp-bind--aria-label=\"state.thisImage.triggerButtonAriaLabel\"\n\t\t\tdata-wp-init=\"callbacks.initTriggerButton\"\n\t\t\tdata-wp-on--click=\"actions.showLightbox\"\n\t\t\tdata-wp-style--right=\"state.thisImage.buttonRight\"\n\t\t\tdata-wp-style--top=\"state.thisImage.buttonTop\"\n\t\t>\n\t\t\t<svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" width=\"12\" height=\"12\" fill=\"none\" viewbox=\"0 0 12 12\">\n\t\t\t\t<path fill=\"#fff\" d=\"M2 0a2 2 0 0 0-2 2v2h1.5V2a.5.5 0 0 1 .5-.5h2V0H2Zm2 10.5H2a.5.5 0 0 1-.5-.5V8H0v2a2 2 0 0 0 2 2h2v-1.5ZM8 12v-1.5h2a.5.5 0 0 0 .5-.5V8H12v2a2 2 0 0 1-2 2H8Zm2-12a2 2 0 0 1 2 2v2h-1.5V2a.5.5 0 0 0-.5-.5H8V0h2Z\" \/>\n\t\t\t<\/svg>\n\t\t<\/button><figcaption class=\"wp-element-caption\"><strong>An AI-generated visual representation of what <code>PICAFlow<\/code> could be in real life. 
Drawn by Craiyon.<\/strong><\/figcaption><\/figure>\n<\/div>\n\n\n<h2 class=\"wp-block-heading\" style=\"font-size:28px\"><strong>Table of contents<\/strong><\/h2>\n\n\n\n<ol style=\"font-size:24px\" class=\"wp-block-list\">\n<li style=\"font-size:28px\"><a href=\"#1\" data-type=\"internal\">Prerequisites<\/a>\n<ol style=\"font-size:18px\" class=\"wp-block-list\">\n<li style=\"font-size:22px\"><a href=\"#1.1\">PICAFlow R package installation<\/a><\/li>\n\n\n\n<li style=\"font-size:22px\"><a href=\"#1.2\">Troubleshooting<\/a><\/li>\n\n\n\n<li style=\"font-size:22px\"><a href=\"#1.3\" data-type=\"internal\" data-id=\"#1.2\">Load PICAFlow<\/a><\/li>\n\n\n\n<li style=\"font-size:22px\"><a href=\"#1.4\">Update PICAFlow<\/a><\/li>\n\n\n\n<li style=\"font-size:22px\"><a href=\"#1.5\">Example dataset<\/a><\/li>\n<\/ol>\n<\/li>\n\n\n\n<li style=\"font-size:28px\"><a href=\"#2\">Pre-processing<\/a>\n<ol style=\"font-size:18px\" class=\"wp-block-list\">\n<li style=\"font-size:22px\"><a href=\"#2.1\">Working directory setup<\/a><\/li>\n\n\n\n<li style=\"font-size:22px\"><a href=\"#2.2\">Convert FCS files to rds data files<\/a><\/li>\n\n\n\n<li style=\"font-size:22px\"><a href=\"#2.3\">Pre-gating (optional)<\/a>\n<ol class=\"wp-block-list\">\n<li style=\"font-size:20px\"><a href=\"#2.3.1\">Merge samples<\/a><\/li>\n\n\n\n<li style=\"font-size:20px\"><a href=\"#2.3.2\">Gate cells<\/a><\/li>\n\n\n\n<li style=\"font-size:20px\"><a href=\"#2.3.3\">Reexport individual rds files<\/a><\/li>\n<\/ol>\n<\/li>\n\n\n\n<li style=\"font-size:22px\"><a href=\"#2.4\">Subset data<\/a><\/li>\n<\/ol>\n<\/li>\n\n\n\n<li style=\"font-size:28px\"><a href=\"#3\">Transformation and compensation<\/a>\n<ol style=\"font-size:18px\" class=\"wp-block-list\">\n<li style=\"font-size:22px\"><a href=\"#3.1\">Visually determine optimal transformation parameters<\/a><\/li>\n\n\n\n<li style=\"font-size:22px\"><a href=\"#3.2\">Apply transformation<\/a><\/li>\n\n\n\n<li style=\"font-size:22px\"><a 
href=\"#3.3\">Visually edit compensation matrix (optional)<\/a><\/li>\n\n\n\n<li style=\"font-size:22px\"><a href=\"#3.4\">Compensate data (optional)<\/a><\/li>\n\n\n\n<li style=\"font-size:22px\"><a href=\"#3.5\">Export data per parameter<\/a><\/li>\n<\/ol>\n<\/li>\n\n\n\n<li style=\"font-size:28px\"><a href=\"#4\">Normalization<\/a>\n<ol style=\"font-size:18px\" class=\"wp-block-list\">\n<li style=\"font-size:22px\"><a href=\"#4.1\">Plot transformed signal densities<\/a><\/li>\n\n\n\n<li style=\"font-size:22px\"><a href=\"#4.2\">Peaks analysis<\/a><\/li>\n\n\n\n<li style=\"font-size:22px\"><a href=\"#4.3\">Compute peaks data for new samples but keep previously generated peaks data (optional)<\/a><\/li>\n\n\n\n<li style=\"font-size:22px\"><a href=\"#4.4\">Fine tune the peaks interactively (optional)<\/a><\/li>\n\n\n\n<li style=\"font-size:22px\"><a href=\"#4.5\">Synthesize peaks information<\/a><\/li>\n\n\n\n<li style=\"font-size:22px\"><a href=\"#4.6\">Normalize data<\/a><\/li>\n\n\n\n<li style=\"font-size:22px\"><a href=\"#4.7\">Plot normalized signal densities<\/a><\/li>\n\n\n\n<li style=\"font-size:22px\"><a href=\"#4.8\">Merge normalized data<\/a><\/li>\n<\/ol>\n<\/li>\n\n\n\n<li style=\"font-size:28px\"><a href=\"#5\">Gating<\/a><\/li>\n\n\n\n<li style=\"font-size:28px\"><a href=\"#6\">Downsample, rescale and split data<\/a>\n<ol style=\"font-size:18px\" class=\"wp-block-list\">\n<li style=\"font-size:22px\"><a href=\"#6.1\">Downsample and rescale data<\/a><\/li>\n\n\n\n<li style=\"font-size:22px\"><a href=\"#6.2\">Split data (optional)<\/a><\/li>\n<\/ol>\n<\/li>\n\n\n\n<li style=\"font-size:28px\"><a href=\"#7\">UMAP dimensionality reduction<\/a>\n<ol style=\"font-size:18px\" class=\"wp-block-list\">\n<li style=\"font-size:22px\"><a href=\"#7.1\">Initial preparation<\/a><\/li>\n\n\n\n<li style=\"font-size:22px\"><a href=\"#7.2\">Determination of optimal UMAP hyperparameters<\/a><\/li>\n\n\n\n<li style=\"font-size:22px\"><a href=\"#7.3\">Generate UMAP model 
on downsampled dataset<\/a><\/li>\n\n\n\n<li style=\"font-size:22px\"><a href=\"#7.4\">Apply UMAP model to the remaining data<\/a><\/li>\n\n\n\n<li style=\"font-size:22px\"><a href=\"#7.5\">Combine and export UMAP data<\/a><\/li>\n<\/ol>\n<\/li>\n\n\n\n<li style=\"font-size:28px\"><a href=\"#8\">Export FCS files<\/a><\/li>\n\n\n\n<li style=\"font-size:28px\"><a href=\"#9\">Clustering<\/a>\n<ol style=\"font-size:22px\" class=\"wp-block-list\">\n<li style=\"font-size:22px\"><a href=\"#9.1\">Test parameters normality<\/a><\/li>\n\n\n\n<li style=\"font-size:22px\"><a href=\"#9.2\">Apply a clustering method<\/a>\n<ol class=\"wp-block-list\">\n<li style=\"font-size:20px\"><a href=\"#9.2.1\">Using hierarchical clustering + k-nearest neighbors approach<\/a>\n<ol class=\"wp-block-list\">\n<li style=\"font-size:18px\"><a href=\"#9.2.1.1\">Initial clustering on training dataset<\/a><\/li>\n\n\n\n<li style=\"font-size:18px\"><a href=\"#9.2.1.2\">Final clustering on training dataset<\/a><\/li>\n\n\n\n<li style=\"font-size:18px\"><a href=\"#9.2.1.3\">Apply clustering model on validation dataset<\/a><\/li>\n\n\n\n<li style=\"font-size:18px\"><a href=\"#9.2.1.4\">Determine binary thresholds<\/a><\/li>\n\n\n\n<li style=\"font-size:18px\"><a href=\"#9.2.1.5\">Collapse phenotypically close clusters<\/a><\/li>\n<\/ol>\n<\/li>\n\n\n\n<li style=\"font-size:20px\"><a href=\"#9.2.2\">Using FlowSOM method<\/a><\/li>\n\n\n\n<li style=\"font-size:20px\"><a href=\"#9.2.3\">Using PhenoGraph method<\/a><\/li>\n<\/ol>\n<\/li>\n\n\n\n<li style=\"font-size:22px\"><a href=\"#9.3\">Visualize clusters on UMAP dimensions<\/a><\/li>\n\n\n\n<li style=\"font-size:22px\"><a href=\"#9.4\">Export clusters-associated statistics and plots<\/a><\/li>\n\n\n\n<li style=\"font-size:22px\"><a href=\"#9.5\">Export clusters-associated heatmaps<\/a><\/li>\n<\/ol>\n<\/li>\n\n\n\n<li style=\"font-size:28px\"><a href=\"#10\">Metadata integration and analysis<\/a>\n<ol style=\"font-size:18px\" 
class=\"wp-block-list\">\n<li style=\"font-size:22px\"><a href=\"#10.1\">Open and merge datasets<\/a><\/li>\n\n\n\n<li style=\"font-size:22px\"><a href=\"#10.2\">I have metadata available for my dataset<\/a>\n<ol class=\"wp-block-list\">\n<li style=\"font-size:20px\"><a href=\"#10.2.1\">Metadata integration<\/a><\/li>\n\n\n\n<li style=\"font-size:20px\"><a href=\"#10.2.2\">Subset merged data<\/a><\/li>\n\n\n\n<li style=\"font-size:20px\"><a href=\"#10.2.3\">UMAP dimensionality reduction of cell cluster abundances<\/a><\/li>\n\n\n\n<li style=\"font-size:20px\"><a href=\"#10.2.4\">Remove outliers (optional)<\/a><\/li>\n\n\n\n<li style=\"font-size:20px\"><a href=\"#10.2.5\">Include final UMAP embeddings to the dataset<\/a><\/li>\n\n\n\n<li style=\"font-size:20px\"><a href=\"#10.2.6\">Hierarchical clustering of UMAP projection<\/a><\/li>\n\n\n\n<li style=\"font-size:20px\"><a href=\"#10.2.7\">Prepare data for export<\/a><\/li>\n\n\n\n<li style=\"font-size:20px\"><a href=\"#10.2.8\">Export merged data, boxplots and feature tables<\/a><\/li>\n\n\n\n<li style=\"font-size:20px\"><a href=\"#10.2.9\">ROC analysis<\/a><\/li>\n<\/ol>\n<\/li>\n\n\n\n<li style=\"font-size:22px\"><a href=\"#10.3\">I do not have metadata available for my dataset<\/a><\/li>\n<\/ol>\n<\/li>\n\n\n\n<li style=\"font-size:28px\"><a href=\"#11\">Parameters export<\/a><\/li>\n\n\n\n<li style=\"font-size:28px\"><a href=\"#12\">Acknowledgements<\/a><\/li>\n\n\n\n<li style=\"font-size:28px\"><a href=\"#13\">Citation<\/a><\/li>\n<\/ol>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"1\" style=\"font-size:28px\"><strong>1) Prerequisites<\/strong><\/h2>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"1.1\" style=\"font-size:22px\"><strong>1.1) PICAFlow R package installation<\/strong><\/h3>\n\n\n\n<p class=\"wp-block-paragraph\" style=\"font-size:16px\"><code>PICAFlow<\/code> package can be installed with the following commands:<\/p>\n\n\n\n<pre class=\"wp-block-code\" style=\"font-size:16px\"><code><em># The following 
line can be skipped if the devtools package is already installed\n<\/em>\n<strong>install.packages(<\/strong>\"devtools\"<strong>)<\/strong>\n\n<em># Load the devtools package\n<\/em>\n<strong>library(<\/strong>\"devtools\"<strong>)<\/strong>\n\n<em># Install\/update PICAFlow from GitHub repository\n<\/em>\n<strong>devtools::install_github(<\/strong>\"PaulRegnier\/PICAFlow\", force = TRUE<strong>)<\/strong><\/code><\/pre>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"1.2\" style=\"font-size:22px\"><strong>1.2) Troubleshooting<\/strong><\/h3>\n\n\n\n<p class=\"wp-block-paragraph\" style=\"font-size:16px\">In the case the installation of <code>PICAFlow<\/code> is not successful, please check the following points:<\/p>\n\n\n\n<ul style=\"font-size:16px\" class=\"wp-block-list\">\n<li style=\"font-size:16px\">If you run R under the Windows operating system, do you have the <code>Rtools<\/code> utility correctly installed and in the right compatible version?\n<ul class=\"wp-block-list\">\n<li style=\"font-size:16px\">Please visit <a href=\"https:\/\/cran.r-project.org\/bin\/windows\/Rtools\/\">this link<\/a> to get more information about <code>Rtools<\/code> and install it<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n\n\n\n<ul style=\"font-size:16px\" class=\"wp-block-list\">\n<li style=\"font-size:16px\">Do you have a working version of the <code>Git<\/code> utility installed on your operating system?\n<ul class=\"wp-block-list\">\n<li style=\"font-size:16px\">Please visit <a href=\"https:\/\/git-scm.com\/downloads\">this link<\/a> to get more information about <code>Git<\/code> and install it<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n\n\n\n<ul class=\"wp-block-list\">\n<li style=\"font-size:16px\">Do you have a working version of the <code>devtools<\/code> R package installed on your computer?\n<ul class=\"wp-block-list\">\n<li style=\"font-size:16px\">If you do not know, run the following command to force the (re)installation of <code>devtools<\/code>:<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n\n\n\n<pre 
class=\"wp-block-code\" style=\"font-size:16px\"><code><strong>install.packages(<\/strong>\"devtools\", force = TRUE<strong>)<\/strong>\n<\/code><\/pre>\n\n\n\n<ul class=\"wp-block-list\">\n<li style=\"font-size:16px\">If none of the above helps, consider manually installing the packages that are not hosted directly on the CRAN servers:<\/li>\n<\/ul>\n\n\n\n<pre class=\"wp-block-code\" style=\"font-size:16px\"><code><em># Install packages that are mandatory:\n<\/em>\n<strong>install.packages(<\/strong>\"BiocManager\", force = TRUE<strong>)<\/strong>\n<strong>install.packages(<\/strong>\"remotes\", force = TRUE<strong>)<\/strong>\n\n<em># Install packages hosted by the Bioconductor repository:\n<\/em>\n<strong>BiocManager::install(<\/strong>\"Biobase\", force = TRUE<strong>)<\/strong>\n<strong>BiocManager::install(<\/strong>\"flowCore\", force = TRUE<strong>)<\/strong>\n<strong>BiocManager::install(<\/strong>\"flowWorkspace\", force = TRUE<strong>)<\/strong>\n<strong>BiocManager::install(<\/strong>\"flowStats\", force = TRUE<strong>)<\/strong>\n<strong>BiocManager::install(<\/strong>\"ggcyto\", force = TRUE<strong>)<\/strong>\n<strong>BiocManager::install(<\/strong>\"flowGate\", force = TRUE<strong>)<\/strong>\n<strong>BiocManager::install(<\/strong>\"FlowSOM\", force = TRUE<strong>)<\/strong>\n\n<em># Finally, try to install PICAFlow:\n<\/em>\n<strong>devtools::install_github(<\/strong>\"PaulRegnier\/PICAFlow\", force = TRUE<strong>)<\/strong><\/code><\/pre>\n\n\n\n<p class=\"has-text-align-left has-background wp-block-paragraph\" style=\"background-color:#c6cbe1;font-size:16px\"><strong>Note: in some rare cases, even after these troubleshooting steps, the installation of <code>PICAFlow<\/code> could still fail, notably if users already have the <code>flowWorkspace<\/code> package installed. 
To overcome this problem, users should manually remove the <code>flowWorkspace<\/code> folder from their R library, then reinstall this package alone, using the R command given above for this specific package.<\/strong><\/p>\n\n\n\n<p class=\"wp-block-paragraph\" style=\"font-size:16px\">Because <a href=\"https:\/\/github.com\/PaulRegnier\/PICAFlow\/issues\/1\">some users could be affected by a persistent error related to the <code>FastPG<\/code> package during <code>PICAFlow<\/code> installation under recent macOS versions<\/a>, the <code>FastPG<\/code> package was moved from the <code>Imports<\/code> to the <code>Suggests<\/code> list in the <code>DESCRIPTION<\/code> file of <code>PICAFlow<\/code>. This means that the <code>FastPG<\/code> package is no longer installed by default and is only loaded when needed. This allows macOS users to correctly install and use <code>PICAFlow<\/code> even if the <code>FastPG<\/code> installation persistently fails. Of note, if the <code>FastPG<\/code> package is not installed, then the associated FastPG\/PhenoGraph clustering method will not be available. 
The following command launches the installation of the <code>FastPG<\/code> package:<\/p>\n\n\n\n<pre class=\"wp-block-code\" style=\"font-size:16px\"><code><em># After installing Git on your system, proceed with the installation of a GitHub-hosted package through the devtools installation function:<\/em>\n<em>\n<\/em><strong>devtools::install_github(<\/strong>\"sararselitsky\/FastPG\", force = TRUE<strong>)<\/strong><\/code><\/pre>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"1.3\" style=\"font-size:22px\"><strong>1.3) Load PICAFlow<\/strong><\/h3>\n\n\n\n<p class=\"wp-block-paragraph\" style=\"font-size:16px\">To load <code>PICAFlow<\/code>, simply enter the following command in the R console:<\/p>\n\n\n\n<pre class=\"wp-block-code has-black-color has-text-color\" style=\"font-size:16px\"><code><strong>library(<\/strong>\"PICAFlow\"<strong>)<\/strong><\/code><\/pre>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"1.4\" style=\"font-size:22px\"><strong>1.4) Update PICAFlow<\/strong><\/h3>\n\n\n\n<p class=\"wp-block-paragraph\" style=\"font-size:16px\">Upon loading, <code>PICAFlow<\/code> checks online whether a newer version of the package is available. 
If <code>PICAFlow<\/code> notifies you of an update, we highly recommend updating as soon as possible to benefit from the latest features and fixes.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\" style=\"font-size:16px\">To update <code>PICAFlow<\/code>, simply enter the following command in the R console:<\/p>\n\n\n\n<pre class=\"wp-block-code\" style=\"font-size:16px\"><code><strong>devtools::install_github(<\/strong>\"PaulRegnier\/PICAFlow\", force = TRUE<strong>)<\/strong><\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\" style=\"font-size:16px\">If the installation fails or if the update message keeps appearing after the update, try providing the tag corresponding to the version you want to install:<\/p>\n\n\n\n<pre class=\"wp-block-code\" style=\"font-size:16px\"><code><strong>devtools::install_github(<\/strong>\"PaulRegnier\/PICAFlow@x.x.x\", force = TRUE<strong>)<\/strong> # Where x.x.x represents the version number to install<\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\" style=\"font-size:16px\">You can find the latest version number on the <a href=\"https:\/\/github.com\/PaulRegnier\/PICAFlow\/releases\">GitHub releases<\/a> page.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\" style=\"font-size:16px\">If you still see the update message, please remove the <code>PICAFlow<\/code> package from its installation path and reinstall it from scratch using the aforementioned command.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"1.5\" style=\"font-size:22px\"><strong>1.5) Example dataset<\/strong><\/h3>\n\n\n\n<p class=\"wp-block-paragraph\" style=\"font-size:16px\">For learning and testing purposes, <code>PICAFlow<\/code> includes a dataset composed of 25 FCS files pre-processed using FlowJo version 10.1. These FCS files were obtained after the antibody staining of thawed peripheral blood mononuclear cells (PBMCs) previously isolated from human whole blood. 
The dataset includes 5 healthy donors (labelled as <em>HD<\/em>), 5 patients with Sj\u00f6gren&rsquo;s syndrome (labelled as <em>Sjogren<\/em>), 5 patients with cryoglobulinemia (labelled as <em>Cryo<\/em>), 5 patients with Systemic Lupus Erythematosus (labelled as <em>SLE<\/em>) and 5 patients with Rheumatoid Arthritis (labelled as <em>RA<\/em>).<\/p>\n\n\n\n<p class=\"wp-block-paragraph\" style=\"font-size:16px\">You can download two distinct versions of the test dataset: either the <a href=\"https:\/\/paul-regnier.fr\/picaflow\/PICAFlow_TestDataset_RawData.zip\">raw FCS files version<\/a> (roughly 2.2 GB) or the <a href=\"https:\/\/paul-regnier.fr\/picaflow\/PICAFlow_TestDataset_AlreadyPreprocessedData.zip\">already pre-processed FCS version<\/a> (roughly 1.1 GB, please see below for further explanation of the pre-processing steps that were performed).<\/p>\n\n\n\n<p class=\"wp-block-paragraph\" style=\"font-size:16px\">The staining panel (primarily designed to target and study the B cell compartment) consisted of the following antibodies, which were then detected using a BD LSR Fortessa flow cytometer:<\/p>\n\n\n\n<ul style=\"font-size:16px\" class=\"wp-block-list\">\n<li style=\"font-size:16px\">Anti-FcRL5 coupled with APC fluorochrome<\/li>\n\n\n\n<li style=\"font-size:16px\">Anti-CD24 coupled with APC-Cy7 fluorochrome<\/li>\n\n\n\n<li style=\"font-size:16px\">Anti-CD38 coupled with Alexa Fluor 700 fluorochrome<\/li>\n\n\n\n<li style=\"font-size:16px\">Anti-CXCR5 coupled with BV421 fluorochrome<\/li>\n\n\n\n<li style=\"font-size:16px\">Anti-IgD coupled with BV510 fluorochrome<\/li>\n\n\n\n<li style=\"font-size:16px\">Anti-CD19 coupled with BV605 fluorochrome<\/li>\n\n\n\n<li style=\"font-size:16px\">Anti-G6 coupled with BV650 fluorochrome<\/li>\n\n\n\n<li style=\"font-size:16px\">Anti-FcRL3 coupled with BV711 fluorochrome<\/li>\n\n\n\n<li style=\"font-size:16px\">Anti-CD27 coupled with BV786 fluorochrome<\/li>\n\n\n\n<li style=\"font-size:16px\">Anti-CD95 
coupled with FITC fluorochrome<\/li>\n\n\n\n<li style=\"font-size:16px\">Anti-IgM coupled with PE fluorochrome<\/li>\n\n\n\n<li style=\"font-size:16px\">Anti-CD11c coupled with PE-Cy5 fluorochrome<\/li>\n\n\n\n<li style=\"font-size:16px\">Anti-Tbet coupled with PE-Cy5.5 fluorochrome<\/li>\n\n\n\n<li style=\"font-size:16px\">Anti-CD21 coupled with PE-Cy7 fluorochrome<\/li>\n\n\n\n<li style=\"font-size:16px\">Anti-CD3 coupled with PE-TexasRed fluorochrome<\/li>\n<\/ul>\n\n\n\n<p class=\"wp-block-paragraph\" style=\"font-size:16px\">The pre-processing of the raw FCS data included the following steps:<\/p>\n\n\n\n<ul style=\"font-size:16px\" class=\"wp-block-list\">\n<li>Adjustment of the compensation matrix<\/li>\n\n\n\n<li>Renaming of the samples to follow the required <code>CustomPanelName_Group-SampleGroup_Sample-SampleName<\/code> format<\/li>\n\n\n\n<li>Exclusion of non lymphocyte-shaped and doublet cells<\/li>\n<\/ul>\n\n\n\n<p class=\"wp-block-paragraph\" style=\"font-size:16px\">Importantly, all figures that are shown in this tutorial actually refer to the raw FCS files.<\/p>\n\n\n\n<p class=\"has-text-align-left has-background wp-block-paragraph\" style=\"background-color:#c6cbe1;font-size:16px\"><strong>Please note that to use the full capacities of <code>PICAFlow<\/code>, every FCS file should be renamed following the aforementioned <code>CustomPanelName_Group-SampleGroup_Sample-SampleName<\/code><\/strong> <strong>format (example: <code>PanelBCells_Group-HealthyControl_Sample-PatientXY<\/code>). 
If this is not the case, please use the <code>file.rename()<\/code> function in R to format the file names appropriately.<\/strong><\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"2\" style=\"font-size:28px\"><strong>2) Pre-processing<\/strong><\/h2>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"2.1\" style=\"font-size:22px\"><strong>2.1) Working directory setup<\/strong><\/h3>\n\n\n\n<p class=\"wp-block-paragraph\" style=\"font-size:16px\"><code>PICAFlow<\/code> is designed to work in a self-organized set of directories, which can be created by running the following commands:<\/p>\n\n\n\n<pre class=\"wp-block-code has-black-color has-text-color\" style=\"font-size:16px\"><code><em># Define the working directory path\n<\/em>\nworkingDirectory = <strong>file.path(<\/strong>\"C:\", \"Users\", \"Paul\", \"Lab\", \"R\", \"PICAFlow\"<strong>)<\/strong>\n<strong>setwd(<\/strong>workingDirectory<strong>)<\/strong>\n\n<em># Define a global seed (used in later parts of the tutorial to ensure reproducibility)\n<\/em>\nseed_value = 42\n\n<em># Create the actual working directory tree\n<\/em>\n<strong>setupWorkingDirectory()<\/strong><\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\" style=\"font-size:16px\">The <code><strong>setupWorkingDirectory()<\/strong><\/code> function creates the following directory tree in the <code>workingDirectory<\/code> path:<\/p>\n\n\n\n<ul style=\"font-size:16px\" class=\"wp-block-list\">\n<li style=\"font-size:16px\">input<\/li>\n\n\n\n<li style=\"font-size:16px\">output\n<ul class=\"wp-block-list\">\n<li style=\"font-size:16px\">1_Transformation<\/li>\n\n\n\n<li style=\"font-size:16px\">2_Normalization<\/li>\n\n\n\n<li style=\"font-size:16px\">3_Gating<\/li>\n\n\n\n<li style=\"font-size:16px\">4_Downsampling<\/li>\n\n\n\n<li style=\"font-size:16px\">5_UMAP<\/li>\n\n\n\n<li style=\"font-size:16px\">6_FCS<\/li>\n\n\n\n<li style=\"font-size:16px\">7_Clustering<\/li>\n\n\n\n<li 
style=\"font-size:16px\">8_Analysis<\/li>\n<\/ul>\n<\/li>\n\n\n\n<li style=\"font-size:16px\">rds<\/li>\n<\/ul>\n\n\n\n<p class=\"wp-block-paragraph\" style=\"font-size:16px\">The <code>input<\/code> directory contains the FCS files that you want to analyze. The <code>output<\/code> directory contains the results of the different parts of the analysis process, organized in several subdirectories. The <code>rds<\/code> directory contains intermediate versions of the FCS files generated during the first steps of their processing.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"2.2\" style=\"font-size:22px\"><strong>2.2) Convert FCS files to rds data files<\/strong><\/h3>\n\n\n\n<p class=\"wp-block-paragraph\" style=\"font-size:16px\">The very first step of the workflow consists of converting each FCS file from the <code>input<\/code> directory to an associated rds file. Each rds file contains a single <code>flowFrame<\/code> object (from the <code>flowCore<\/code> package). This step helps to improve the speed and efficiency of the subsequent processing steps:<\/p>\n\n\n\n<pre class=\"wp-block-code\" style=\"font-size:16px\"><code><em># Here, we do not need to use a conversion table\n<\/em>\nparametersConversionTable = NULL\n\n<em># Convert all FCS files to flowFrames contained in rds files\n<\/em>\ntotalParametersList = <strong>convertToRDS(<\/strong>\n    <strong>conversionTable<\/strong> = parametersConversionTable\n<strong>)<\/strong>\n\ntotalParametersList\n\n   Parameter_ID    Parameter_Name Parameter_Description\n1             1              Time  Empty Description #1\n2             2             FSC-A  Empty Description #2\n3             3             FSC-H  Empty Description #3\n4             4             FSC-W  Empty Description #4\n5             5             SSC-A  Empty Description #5\n6             6             SSC-H  Empty Description #6\n7             7             SSC-W  Empty Description #7\n8             8    PE-Texas Red-A                   CD3\n9             9           BV605-A                  CD19\n10           10           BV786-A                  CD27\n11           11          PE-Cy7-A                  CD21\n12           12            FITC-A                  CD95\n13           13          PE-Cy5-A                 CD11c\n14           14           BV421-A                 CXCR5\n15           15        PE-Cy5_5-A                  Tbet\n16           16           BV711-A                 FCRL3\n17           17             APC-A                 FCRL5\n18           18           BV650-A                    G6\n19           19         APC-Cy7-A                  CD24\n20           20 Alexa Fluor 700-A                  CD38\n21           21           BV510-A                   IgD\n22           22              PE-A                   IgM<\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\" style=\"font-size:16px\">Through its <code>conversionTable<\/code> argument, the <code><strong>convertToRDS()<\/strong><\/code> function can rename some (or all) parameters, if needed. It can also delete one or more channels from the dataset (please use with caution). This is typically useful when samples are labelled using two different panels (for instance in different batches) that share the same antibody specificities but use different fluorophore\/metal labels (and <em>vice versa<\/em>). 
Please see the <code>?convertToRDS<\/code> documentation for more information.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\" style=\"font-size:16px\">This function also returns a list (named <code>totalParametersList<\/code> here) of all the parameters present within the dataset, after any renaming, which is useful for deciding which parameters to keep.<\/p>\n\n\n\n<p class=\"has-background wp-block-paragraph\" style=\"background-color:#c6cbe1;font-size:16px\"><strong>Depending on the metadata quality of the dataset to analyze, and especially on whether the acquired channels match exactly across all samples, users may need to re-export new FCS files to correct these problems. Please be aware that some third-party software (FlowJo, for instance) seems to reorder cytometry channels in newly exported FCS files as compared to FCS files from the same experiment which were not processed by that software. This behaviour can make processed FCS files impossible to integrate with the non-processed ones. In this case, remember to process all the FCS files with such software in the same way to avoid any potential mismatch. In all cases, the content of the subsequent <code>totalParametersList<\/code> variable will help users to identify what failed (both for channels and descriptions) and provide clues on how to correct it if necessary.<\/strong><\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"2.3\" style=\"font-size:22px\"><strong>2.3) Pre-gating (optional)<\/strong><\/h3>\n\n\n\n<p class=\"wp-block-paragraph\" style=\"font-size:16px\">One of the very first steps when analyzing cytometry data is to gate on cells of interest, usually lymphocytes, notably using FSC and SSC parameters, but also to remove dead cells and doublets. 
This can be achieved in <code>PICAFlow<\/code> with the steps detailed below.<\/p>\n\n\n\n<p class=\"has-background wp-block-paragraph\" style=\"background-color:#c6cbe1;font-size:16px\"><strong>Note: This section is entirely optional, as users can directly load pre-gated FCS files within the <code>PICAFlow<\/code> workflow.<\/strong><\/p>\n\n\n\n<p class=\"has-background wp-block-paragraph\" style=\"background-color:#c6cbe1;font-size:16px\"><strong>Note 2: This section is mainly relevant for flow cytometry data, although it can be applied to mass cytometry data if desired.<\/strong><\/p>\n\n\n\n<h4 class=\"wp-block-heading\" id=\"2.3.1\" style=\"font-size:20px\"><strong>2.3.1) Merge samples<\/strong><\/h4>\n\n\n\n<p class=\"wp-block-paragraph\" style=\"font-size:16px\">First, we have to create a new <code>rds<\/code> file which will contain the merged individual samples:<\/p>\n\n\n\n<pre class=\"wp-block-code\" style=\"font-size:16px\"><code>totalFlowset = <strong>mergeSamples(<\/strong>\n    <strong>suffix<\/strong> = NULL,\n    <strong>useStructureFromReferenceSample<\/strong> = 1\n<strong>)<\/strong><\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\" style=\"font-size:16px\">The <code>useStructureFromReferenceSample<\/code> argument optionally designates a given sample as the reference for structure and parameter names\/descriptions. This is typically used when channels do not exactly match (notably regarding their acquisition order) despite the fact that the same panel was used and acquired on the same cytometer.<\/p>\n\n\n\n<h4 class=\"wp-block-heading\" id=\"2.3.2\" style=\"font-size:20px\"><strong>2.3.2) Gate cells<\/strong><\/h4>\n\n\n\n<p class=\"has-background wp-block-paragraph\" style=\"background-color:#c6cbe1;font-size:16px\"><strong>Even if complete on its own, this subsection can be seen as a slightly simplified version of section 5, which is entirely dedicated to gating. 
Feel free to read it if you want more information or details.<\/strong><\/p>\n\n\n\n<p class=\"wp-block-paragraph\" style=\"font-size:16px\">The principle is simple, even though many variables need to be defined. Concretely, the <code><strong>gateData()<\/strong><\/code> function needs several arguments to be specified in order to correctly create, compute and apply the desired gate:<\/p>\n\n\n\n<ul style=\"font-size:16px\" class=\"wp-block-list\">\n<li><code>gateName_value<\/code> contains the user-friendly name of the current gate<\/li>\n\n\n\n<li><code>xParameter_value<\/code> contains the parameter name to be used on the x axis<\/li>\n\n\n\n<li><code>yParameter_value<\/code> contains the parameter name to be used on the y axis<\/li>\n\n\n\n<li><code>xlim_value<\/code> contains a vector of size 2 detailing the x values to use as limits for data display<\/li>\n\n\n\n<li><code>ylim_value<\/code> contains a vector of size 2 detailing the y values to use as limits for data display<\/li>\n\n\n\n<li><code>samplesToUse_value<\/code> contains a vector detailing the samples to use for direct display<\/li>\n\n\n\n<li><code>samplesPerPage_value<\/code> contains the number of samples to plot per PDF page<\/li>\n\n\n\n<li><code>inverseGating_value<\/code> contains a boolean defining whether the current gate should be inclusive (<code>FALSE<\/code>) or exclusive\/inverted (<code>TRUE<\/code>)<\/li>\n<\/ul>\n\n\n\n<p class=\"wp-block-paragraph\" style=\"font-size:16px\">These values are stored in a single list named <code>totalGatingParameters_preProcessing<\/code>, in which each new gate is a new element.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\" style=\"font-size:16px\">It is possible to extract the 2nd and 99th percentiles of a given parameter distribution to pre-determine values to feed to the <code><strong>gateData()<\/strong><\/code> function:<\/p>\n\n\n\n<pre class=\"wp-block-code\" style=\"font-size:16px\"><code><em># 
Extract the 2nd and 99th percentiles for the FSC-A parameter distribution from the 1st sample\n<\/em>\n<strong>getParameterLimits(flowset<\/strong> = totalFlowset, <strong>sample<\/strong> = 1, <strong>parameter<\/strong> = \"FSC-A\"<strong>)<\/strong>\n\n<em># Extract the 2nd and 99th percentiles for the SSC-A parameter distribution from the 1st sample\n<\/em>\n<strong>getParameterLimits(flowset<\/strong> = totalFlowset, <strong>sample<\/strong> = 1, <strong>parameter<\/strong> = \"SSC-A\"<strong>)<\/strong><\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\" style=\"font-size:16px\">Next, we can directly use the following commands to set these parameters and draw a gate to delineate lymphocytes using the <code>SSC-A<\/code> and <code>FSC-A<\/code> parameters:<\/p>\n\n\n\n<pre class=\"wp-block-code\" style=\"font-size:16px\"><code>totalStats_preProcessing = <strong>list()<\/strong>\ntotalGatingParameters_preProcessing = <strong>list()<\/strong>\n\n<em># Define some mandatory arguments<\/em>\n\ngateName_value = \"Lymphocytes\"\n\ntotalGatingParameters_preProcessing&#91;&#91;gateName_value]] = <strong>list(<\/strong>\n    <strong>xParameter_value<\/strong> = \"FSC-A\",\n    <strong>yParameter_value<\/strong> = \"SSC-A\",\n    <strong>xlim_value<\/strong> = c(0, 200000),\n    <strong>ylim_value<\/strong> = c(0, 150000),\n    <strong>samplesToUse_value<\/strong> = c(1:6),\n    <strong>samplesPerPage_value<\/strong> = 6,\n    <strong>inverseGating_value<\/strong> = FALSE,\n    <strong>gateName_value<\/strong> = gateName_value\n<strong>)<\/strong><\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\" style=\"font-size:16px\">It is possible to recall the full list of the parameters embedded within the dataset by running the following command:<\/p>\n\n\n\n<pre class=\"wp-block-code\" style=\"font-size:16px\"><code><em># Recall all channel names embedded within the dataset\n<\/em>\n<strong>getAllChannelsInformation()<\/strong><\/code><\/pre>\n\n\n\n<p 
class=\"wp-block-paragraph\" style=\"font-size:16px\">Then, we have to run an interactive R Shiny application implemented in the <code><strong>gateData()<\/strong><\/code> function to create the gate we want and display it on some selected samples:<\/p>\n\n\n\n<pre class=\"wp-block-code\" style=\"font-size:16px\"><code><em># Generate global gate and show on some samples<\/em>\n\nassociatedInfos = <strong>gateData(<\/strong>\n    <strong>flowset<\/strong> = totalFlowset,\n    <strong>sampleToPlot<\/strong> = totalGatingParameters_preProcessing&#91;&#91;gateName_value]]$samplesToUse_value,\n    <strong>xParameter<\/strong> = totalGatingParameters_preProcessing&#91;&#91;gateName_value]]$xParameter_value,\n    <strong>yParameter<\/strong> = totalGatingParameters_preProcessing&#91;&#91;gateName_value]]$yParameter_value,\n    <strong>xlim<\/strong> = totalGatingParameters_preProcessing&#91;&#91;gateName_value]]$xlim_value,\n    <strong>ylim<\/strong> = totalGatingParameters_preProcessing&#91;&#91;gateName_value]]$ylim_value,\n    <strong>inverseGating<\/strong> = totalGatingParameters_preProcessing&#91;&#91;gateName_value]]$inverseGating_value,\n    <strong>gateName<\/strong> = totalGatingParameters_preProcessing&#91;&#91;gateName_value]]$gateName_value,\n    <strong>subset<\/strong> = FALSE,\n    <strong>exportAllPlots<\/strong> = FALSE,\n    <strong>redrawGate<\/strong> = FALSE,\n    <strong>specificGates<\/strong> = NULL,\n    <strong>gatingset<\/strong> = NULL,\n    <strong>generatedGates<\/strong> = NULL,\n    <strong>customBinWidth <\/strong>= 2000\n<strong>)<\/strong><\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\" style=\"font-size:16px\">If the gate is satisfying enough, then we can apply it to all samples and export the subsequent plots:<\/p>\n\n\n\n<pre class=\"wp-block-code\" style=\"font-size:16px\"><code><em># Apply to all samples and export plots<\/em>\n\nassociatedInfos = <strong>gateData(<\/strong>\n    <strong>flowset<\/strong> = totalFlowset,\n   
 <strong>sampleToPlot<\/strong> = totalGatingParameters_preProcessing&#91;&#91;gateName_value]]$samplesToUse_value,\n    <strong>xParameter<\/strong> = totalGatingParameters_preProcessing&#91;&#91;gateName_value]]$xParameter_value,\n    <strong>yParameter<\/strong> = totalGatingParameters_preProcessing&#91;&#91;gateName_value]]$yParameter_value,\n    <strong>xlim<\/strong> = totalGatingParameters_preProcessing&#91;&#91;gateName_value]]$xlim_value,\n    <strong>ylim<\/strong> = totalGatingParameters_preProcessing&#91;&#91;gateName_value]]$ylim_value,\n    <strong>inverseGating<\/strong> = totalGatingParameters_preProcessing&#91;&#91;gateName_value]]$inverseGating_value,\n    <strong>gateName<\/strong> = totalGatingParameters_preProcessing&#91;&#91;gateName_value]]$gateName_value,\n    <strong>subset<\/strong> = FALSE,\n    <strong>exportAllPlots<\/strong> = TRUE,\n    <strong>redrawGate<\/strong> = FALSE,\n    <strong>specificGates<\/strong> = NULL,\n    <strong>gatingset<\/strong> = associatedInfos$gatingset,\n    <strong>generatedGates<\/strong> = NULL,\n    <strong>customBinWidth <\/strong>= 2000\n<strong>)<\/strong><\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\" style=\"font-size:16px\">If needed, we can even change the gate independently and iteratively for specific samples if it does not fit well:<\/p>\n\n\n\n<pre class=\"wp-block-code\" style=\"font-size:16px\"><code><em># If needed, change the gate for selected samples<\/em>\n\nassociatedInfos = <strong>gateData(<\/strong>\n    <strong>flowset<\/strong> = totalFlowset,\n    <strong>sampleToPlot<\/strong> = totalGatingParameters_preProcessing&#91;&#91;gateName_value]]$samplesToUse_value,\n    <strong>xParameter<\/strong> = totalGatingParameters_preProcessing&#91;&#91;gateName_value]]$xParameter_value,\n    <strong>yParameter<\/strong> = totalGatingParameters_preProcessing&#91;&#91;gateName_value]]$yParameter_value,\n    <strong>xlim<\/strong> = 
totalGatingParameters_preProcessing&#91;&#91;gateName_value]]$xlim_value,\n    <strong>ylim<\/strong> = totalGatingParameters_preProcessing&#91;&#91;gateName_value]]$ylim_value,\n    <strong>inverseGating<\/strong> = totalGatingParameters_preProcessing&#91;&#91;gateName_value]]$inverseGating_value,\n    <strong>gateName<\/strong> = totalGatingParameters_preProcessing&#91;&#91;gateName_value]]$gateName_value,\n    <strong>subset<\/strong> = FALSE,\n    <strong>exportAllPlots<\/strong> = TRUE,\n    <strong>redrawGate<\/strong> = TRUE,\n    <strong>specificGates<\/strong> = c(7, 8, 9, 10, 11, 13, 15, 16, 17, 18, 19, 20, 21, 23, 25),\n    <strong>gatingset<\/strong> = associatedInfos$gatingset,\n    <strong>generatedGates<\/strong> = NULL,\n    <strong>customBinWidth <\/strong>= 2000\n<strong>)<\/strong><\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\" style=\"font-size:16px\">When we are happy with our gates, we now have to actually gate the flowset and export gated cells as <code>Gate1<\/code>:<\/p>\n\n\n\n<pre class=\"wp-block-code\" style=\"font-size:16px\"><code><em># Actually gate the flowset and export gated cells\n<\/em>\nGate1 = <strong>gateData(<\/strong>\n    <strong>flowset<\/strong> = totalFlowset,\n    <strong>sampleToPlot<\/strong> = totalGatingParameters_preProcessing&#91;&#91;gateName_value]]$samplesToUse_value,\n    <strong>xParameter<\/strong> = totalGatingParameters_preProcessing&#91;&#91;gateName_value]]$xParameter_value,\n    <strong>yParameter<\/strong> = totalGatingParameters_preProcessing&#91;&#91;gateName_value]]$yParameter_value,\n    <strong>xlim<\/strong> = totalGatingParameters_preProcessing&#91;&#91;gateName_value]]$xlim_value,\n    <strong>ylim<\/strong> = totalGatingParameters_preProcessing&#91;&#91;gateName_value]]$ylim_value,\n    <strong>inverseGating<\/strong> = totalGatingParameters_preProcessing&#91;&#91;gateName_value]]$inverseGating_value,\n    <strong>gateName<\/strong> = 
totalGatingParameters_preProcessing&#91;&#91;gateName_value]]$gateName_value,\n    <strong>subset<\/strong> = TRUE,\n    <strong>exportAllPlots<\/strong> = TRUE,\n    <strong>redrawGate<\/strong> = FALSE,\n    <strong>specificGates<\/strong> = NULL,\n    <strong>gatingset<\/strong> = associatedInfos$gatingset,\n    <strong>generatedGates<\/strong> = associatedInfos$generatedGates,\n    <strong>customBinWidth <\/strong>= 2000\n<strong>)<\/strong><\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\" style=\"font-size:16px\">Finally, we add the information about the generated gates in the <code>totalGatingParameters_preProcessing<\/code> list:<\/p>\n\n\n\n<pre class=\"wp-block-code\" style=\"font-size:16px\"><code>totalGatingParameters_preProcessing&#91;&#91;gateName_value]]$generatedGates = Gate1$generatedGates\n\ntotalStats_preProcessing&#91;&#91;gateName_value]] = Gate1$summary\n\nGate1 = Gate1$flowset<\/code><\/pre>\n\n\n\n<p class=\"has-background wp-block-paragraph\" style=\"background-color:#c6cbe1;font-size:16px\"><strong>Of course, this process can be used iteratively to further gate on cells of interest.<\/strong><\/p>\n\n\n\n<p class=\"wp-block-paragraph\" style=\"font-size:16px\">For instance, in the following commands, we gate within the <code>Gate1<\/code> flowset to extract single cells:<\/p>\n\n\n\n<pre class=\"wp-block-code\" style=\"font-size:16px\"><code><em># Define some mandatory arguments\n<\/em>\ngateName_value = \"Singulets\"\n\ntotalGatingParameters_preProcessing&#91;&#91;gateName_value]] = <strong>list(<\/strong>\n    <strong>xParameter_value<\/strong> = \"FSC-A\",\n    <strong>yParameter_value<\/strong> = \"FSC-W\",\n    <strong>xlim_value<\/strong> = c(0, 200000),\n    <strong>ylim_value<\/strong> = c(0, 200000),\n    <strong>samplesToUse_value<\/strong> = c(1:6),\n    <strong>samplesPerPage_value<\/strong> = 6,\n    <strong>inverseGating_value<\/strong> = FALSE,\n    <strong>gateName_value<\/strong> = gateName_value\n)\n\n<em># Generate 
global gate and show on some samples\n<\/em>\nassociatedInfos = <strong>gateData(<\/strong>\n    <strong>flowset<\/strong> = Gate1,\n    <strong>sampleToPlot<\/strong> = totalGatingParameters_preProcessing&#91;&#91;gateName_value]]$samplesToUse_value,\n    <strong>xParameter<\/strong> = totalGatingParameters_preProcessing&#91;&#91;gateName_value]]$xParameter_value,\n    <strong>yParameter<\/strong> = totalGatingParameters_preProcessing&#91;&#91;gateName_value]]$yParameter_value,\n    <strong>xlim<\/strong> = totalGatingParameters_preProcessing&#91;&#91;gateName_value]]$xlim_value,\n    <strong>ylim<\/strong> = totalGatingParameters_preProcessing&#91;&#91;gateName_value]]$ylim_value,\n    <strong>inverseGating<\/strong> = totalGatingParameters_preProcessing&#91;&#91;gateName_value]]$inverseGating_value,\n    <strong>gateName<\/strong> = totalGatingParameters_preProcessing&#91;&#91;gateName_value]]$gateName_value,\n    <strong>subset<\/strong> = FALSE,\n    <strong>exportAllPlots<\/strong> = FALSE,\n    <strong>redrawGate<\/strong> = FALSE,\n    <strong>specificGates<\/strong> = NULL,\n    <strong>gatingset<\/strong> = NULL,\n    <strong>generatedGates<\/strong> = NULL,\n    <strong>customBinWidth <\/strong>= 2000\n)\n\n<em># Apply to all samples and export plots\n<\/em>\nassociatedInfos = <strong>gateData(<\/strong>\n    <strong>flowset<\/strong> = Gate1,\n    <strong>sampleToPlot<\/strong> = totalGatingParameters_preProcessing&#91;&#91;gateName_value]]$samplesToUse_value,\n    <strong>xParameter<\/strong> = totalGatingParameters_preProcessing&#91;&#91;gateName_value]]$xParameter_value,\n    <strong>yParameter<\/strong> = totalGatingParameters_preProcessing&#91;&#91;gateName_value]]$yParameter_value,\n    <strong>xlim<\/strong> = totalGatingParameters_preProcessing&#91;&#91;gateName_value]]$xlim_value,\n    <strong>ylim<\/strong> = totalGatingParameters_preProcessing&#91;&#91;gateName_value]]$ylim_value,\n    <strong>inverseGating<\/strong> = 
totalGatingParameters_preProcessing&#91;&#91;gateName_value]]$inverseGating_value,\n    <strong>gateName<\/strong> = totalGatingParameters_preProcessing&#91;&#91;gateName_value]]$gateName_value,\n    <strong>subset<\/strong> = FALSE,\n    <strong>exportAllPlots<\/strong> = TRUE,\n    <strong>redrawGate<\/strong> = FALSE,\n    <strong>specificGates<\/strong> = NULL,\n    <strong>gatingset<\/strong> = associatedInfos$gatingset,\n    <strong>generatedGates<\/strong> = NULL,\n    <strong>customBinWidth <\/strong>= 2000\n)\n\n<em># If needed, change the gate for selected samples\n<\/em>\nassociatedInfos = <strong>gateData(<\/strong>\n    <strong>flowset<\/strong> = Gate1,\n    <strong>sampleToPlot<\/strong> = totalGatingParameters_preProcessing&#91;&#91;gateName_value]]$samplesToUse_value,\n    <strong>xParameter<\/strong> = totalGatingParameters_preProcessing&#91;&#91;gateName_value]]$xParameter_value,\n    <strong>yParameter<\/strong> = totalGatingParameters_preProcessing&#91;&#91;gateName_value]]$yParameter_value,\n    <strong>xlim<\/strong> = totalGatingParameters_preProcessing&#91;&#91;gateName_value]]$xlim_value,\n    <strong>ylim<\/strong> = totalGatingParameters_preProcessing&#91;&#91;gateName_value]]$ylim_value,\n    <strong>inverseGating<\/strong> = totalGatingParameters_preProcessing&#91;&#91;gateName_value]]$inverseGating_value,\n    <strong>gateName<\/strong> = totalGatingParameters_preProcessing&#91;&#91;gateName_value]]$gateName_value,\n    <strong>subset<\/strong> = FALSE,\n    <strong>exportAllPlots<\/strong> = TRUE,\n    <strong>redrawGate<\/strong> = TRUE,\n    <strong>specificGates<\/strong> = c(10, 11, 21),\n    <strong>gatingset<\/strong> = associatedInfos$gatingset,\n    <strong>generatedGates<\/strong> = NULL,\n    <strong>customBinWidth <\/strong>= 2000\n)\n\n<em># Actually gate the flowset and export gated cells\n<\/em>\nGate2 = <strong>gateData(<\/strong>\n    <strong>flowset<\/strong> = Gate1,\n    <strong>sampleToPlot<\/strong> = 
totalGatingParameters_preProcessing&#91;&#91;gateName_value]]$samplesToUse_value,\n    <strong>xParameter<\/strong> = totalGatingParameters_preProcessing&#91;&#91;gateName_value]]$xParameter_value,\n    <strong>yParameter<\/strong> = totalGatingParameters_preProcessing&#91;&#91;gateName_value]]$yParameter_value,\n    <strong>xlim<\/strong> = totalGatingParameters_preProcessing&#91;&#91;gateName_value]]$xlim_value,\n    <strong>ylim<\/strong> = totalGatingParameters_preProcessing&#91;&#91;gateName_value]]$ylim_value,\n    <strong>inverseGating<\/strong> = totalGatingParameters_preProcessing&#91;&#91;gateName_value]]$inverseGating_value,\n    <strong>gateName<\/strong> = totalGatingParameters_preProcessing&#91;&#91;gateName_value]]$gateName_value,\n    <strong>subset<\/strong> = TRUE,\n    <strong>exportAllPlots<\/strong> = TRUE,\n    <strong>redrawGate<\/strong> = FALSE,\n    <strong>specificGates<\/strong> = NULL,\n    <strong>gatingset<\/strong> = associatedInfos$gatingset,\n    <strong>generatedGates<\/strong> = associatedInfos$generatedGates,\n    <strong>customBinWidth <\/strong>= 2000\n)\n\n<em># Add information about generated gates to the totalGatingParameters_preProcessing list and their statistics to the totalStats_preProcessing list\n<\/em>\ntotalGatingParameters_preProcessing&#91;&#91;gateName_value]]$generatedGates = Gate2$generatedGates\n\ntotalStats_preProcessing&#91;&#91;gateName_value]] = Gate2$summary\n\nGate2 = Gate2$flowset<\/code><\/pre>\n\n\n\n<p class=\"has-background wp-block-paragraph\" style=\"background-color:#c6cbe1;font-size:16px\"><strong>Of note, we are fully aware that proper removal of doublets in flow cytometry is usually done with FSC-A vs. FSC-H and\/or SSC-A vs. SSC-H combinations. Unfortunately, an inadvertent handling error in the BD FACSDiva software before the acquisition of some batches led to the FSC-H and SSC-H parameters being mislabelled as FSC-W and SSC-W, respectively. 
<\/strong> <\/p>\n\n\n\n<p class=\"wp-block-paragraph\" style=\"font-size:16px\">Then, we want to actually export the gate parameters which were used and their respective statistics as well as the gated cells:<\/p>\n\n\n\n<pre class=\"wp-block-code\" style=\"font-size:16px\"><code><strong>saveRDS(<\/strong>\n    totalGatingParameters_preProcessing,\n    <strong>file.path(<\/strong>\"output\", \"3_Gating\", \"gatingParameters_preProcessing.rds\"<strong>)<\/strong>\n<strong>)<\/strong>\n\n<strong>exportGatingStatistics(<\/strong>\n    <strong>totalStats<\/strong> = totalStats_preProcessing,\n    <strong>filename<\/strong> = \"gatingStatistics_preProcessing\"\n<strong>)<\/strong>\n\n<strong>saveRDS(<\/strong>\n    Gate2,\n    <strong>file.path(<\/strong>\"rds\", \"pooledSamples_gated.rds\"<strong>)<\/strong>\n<strong>)<\/strong><\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\" style=\"font-size:16px\">Finally, we clean up the workspace a bit by deleting several useless <code>rds<\/code> files:<\/p>\n\n\n\n<pre class=\"wp-block-code\" style=\"font-size:16px\"><code><strong>unlink(file.path(<\/strong>\"rds\", \"globalGate_coordinates.rds\"<strong>))<\/strong>\n<strong>unlink(file.path(<\/strong>\"rds\", \"specialGate_coordinates.rds\"<strong>))<\/strong>\n<strong>unlink(file.path(<\/strong>\"rds\", \"pooledSamples.rds\"<strong>))<\/strong>\n<strong>gc()<\/strong><\/code><\/pre>\n\n\n\n<h4 class=\"wp-block-heading\" id=\"2.3.3\" style=\"font-size:20px\"><strong>2.3.3) Reexport individual rds files<\/strong><\/h4>\n\n\n\n<p class=\"wp-block-paragraph\" style=\"font-size:16px\">Afterwards, we need to reexport individual <code>rds<\/code> files from the pooled version:<\/p>\n\n\n\n<pre class=\"wp-block-code\" style=\"font-size:16px\"><code><strong>exportRDSFilesFromPool(<\/strong>\n    <strong>RDSFileToUse<\/strong> = \"pooledSamples_gated\",\n    <strong>coresNumber<\/strong> = 4\n<strong>)<\/strong><\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\" 
style=\"font-size:16px\">Finally, we only have to remove the <code>pooledSamples_gated.rds<\/code> file, as well as other useless elements then tidy up the memory a bit:<\/p>\n\n\n\n<pre class=\"wp-block-code\" style=\"font-size:16px\"><code><strong>rm(<\/strong>Gate1<strong>)<\/strong>\n<strong>rm(<\/strong>Gate2<strong>)<\/strong>\n\n<strong>unlink(file.path(<\/strong>\"rds\", \"pooledSamples_gated.rds\"<strong>))<\/strong>\n\n<strong>gc()<\/strong><\/code><\/pre>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"2.4\" style=\"font-size:22px\"><strong>2.4) Subset data<\/strong><\/h3>\n\n\n\n<p class=\"wp-block-paragraph\" style=\"font-size:16px\">Next, we want to extract the actual parameters of interest to use for the subsequent analyses, such as fluorescence-based information in this case. At this step, you normally do not need to keep other parameters such as Time- or FSC\/SSC-based channels:<\/p>\n\n\n\n<pre class=\"wp-block-code\" style=\"font-size:16px\"><code><em># Define the parameters to keep and their associated custom names\n<\/em>\nparametersToKeep = <strong>c(<\/strong>\n    \"PE-Texas Red-A\",\n    \"BV605-A\",\n    \"BV786-A\",\n    \"PE-Cy7-A\",\n    \"FITC-A\",\n    \"PE-Cy5-A\",\n    \"BV421-A\",\n    \"PE-Cy5_5-A\",\n    \"BV711-A\",\n    \"APC-A\",\n    \"BV650-A\",\n    \"APC-Cy7-A\",\n    \"Alexa Fluor 700-A\",\n    \"BV510-A\",\n    \"PE-A\"\n<strong>)<\/strong>\n\ncustomNames = <strong>c(<\/strong>\n    \"CD3_PETexasRed\",\n    \"CD19_BV605\",\n    \"CD27_BV786\",\n    \"CD21_PECy7\",\n    \"CD95_FITC\",\n    \"CD11c_PECy5\",\n    \"CXCR5_BV421\",\n    \"Tbet_PECy55\",\n    \"FCRL3_BV711\",\n    \"FCRL5_APC\",\n    \"G6_BV650\",\n    \"CD24_APCCy7\",\n    \"CD38_AlexaFluor700\",\n    \"IgD_BV510\",\n    \"IgM_PE\"\n<strong>)<\/strong>\n\n<em># Subset data\n<\/em>\n<strong>subsetData(<\/strong>\n    <strong>parametersToKeep<\/strong> = parametersToKeep,\n    <strong>customNames<\/strong> = customNames\n<strong>)<\/strong><\/code><\/pre>\n\n\n\n<p 
class=\"wp-block-paragraph\" style=\"font-size:16px\">You can use the <code>totalParametersList<\/code> list generated at the previous step to choose the parameters that you want to keep in the subsequent analysis. Alternatively, you can use the <code><strong>getAllChannelsInformation()<\/strong><\/code> function to retrieve them without rerunning the FCS to rds conversion previously described with the <strong><code>convertToRDS()<\/code><\/strong> function.<\/p>\n\n\n\n<p class=\"has-background wp-block-paragraph\" style=\"background-color:#c6cbe1;font-size:16px\"><strong>Please note that the rds files will be overwritten with their subset version.<\/strong><\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"3\" style=\"font-size:28px\"><strong>3) Transformation and compensation<\/strong><\/h2>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"3.1\" style=\"font-size:22px\"><strong>3.1) Visually determine optimal transformation parameters<\/strong><\/h3>\n\n\n\n<p class=\"wp-block-paragraph\" style=\"font-size:16px\">Next, the data need to be transformed to account for the specific distribution of the acquired signals. Usually, the <code>logicle<\/code>, <code>biexponential<\/code> and <code>arcsinh<\/code> methods are used for cytometry data transformation. 
Here, the R Shiny application we included in <code>PICAFlow<\/code> lets you visualize in real time how the transformed data look when any of the parameters governing the <code>logicle<\/code>, <code>biexponential<\/code> or <code>arcsinh<\/code> transformation is modified.<\/p>\n\n\n\n<ul style=\"font-size:16px\" class=\"wp-block-list\">\n<li style=\"font-size:16px\">For the <code>logicle<\/code> transformation, the parameters are:\n<ul class=\"wp-block-list\">\n<li style=\"font-size:16px\"><code>t<\/code> which represents the highest value of the dataset<\/li>\n\n\n\n<li style=\"font-size:16px\"><code>w<\/code> which represents the linearization width (also called slope at 0)<\/li>\n\n\n\n<li style=\"font-size:16px\"><code>m<\/code> which represents the number of decades to use for transformed data<\/li>\n\n\n\n<li style=\"font-size:16px\"><code>a<\/code> which represents a constant to add to transformed data<\/li>\n<\/ul>\n<\/li>\n\n\n\n<li style=\"font-size:16px\">For the <code>biexponential<\/code> transformation (using the <code>f(x) = a*exp(b*(x-w))-c*exp(-d*(x-w))+f<\/code> formula), the parameters are:\n<ul class=\"wp-block-list\">\n<li style=\"font-size:16px\"><code>a<\/code><\/li>\n\n\n\n<li style=\"font-size:16px\"><code>b<\/code><\/li>\n\n\n\n<li style=\"font-size:16px\"><code>c<\/code><\/li>\n\n\n\n<li style=\"font-size:16px\"><code>d<\/code><\/li>\n\n\n\n<li style=\"font-size:16px\"><code>f<\/code> which represents a constant bias for the intercept<\/li>\n\n\n\n<li style=\"font-size:16px\"><code>w<\/code> which represents a constant bias for the 0 point of the data<\/li>\n<\/ul>\n<\/li>\n\n\n\n<li style=\"font-size:16px\">For the <code>arcsinh<\/code> transformation (using the <code>f(x) = asinh(a+b*x)+c<\/code> formula), the parameters are:\n<ul class=\"wp-block-list\">\n<li style=\"font-size:16px\"><code>a<\/code> which represents the shift about 0<\/li>\n\n\n\n<li style=\"font-size:16px\"><code>b<\/code> which represents the scale 
factor<\/li>\n\n\n\n<li style=\"font-size:16px\"><code>c<\/code><\/li>\n<\/ul>\n<\/li>\n<\/ul>\n\n\n\n<p class=\"wp-block-paragraph\" style=\"font-size:16px\">Using the R Shiny interactive application, you can explore the impact of each parameter on the data transformation, for each cytometry parameter separately. Please note that it is possible to use a given transformation for one parameter and a different one for the others, if needed. This depends only on the dataset and on the choices you make.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\" style=\"font-size:16px\">Regarding the <code>logicle<\/code> transformation, we also added an <code>Auto-logicle<\/code> button in the R Shiny application which allows you to apply auto-determined <code>logicle<\/code> parameters instead of the hard-coded default ones. Typically, the auto-determined value for <code>t<\/code> is accurate, and the <code>a<\/code> value almost never needs to be modified, except if your original signal values are very low (lower than or close to 0). In contrast, the auto-determined <code>w<\/code> and <code>m<\/code> values are very frequently suboptimal. You can use the sliders for each parameter to vary its value and directly visualize the results in real time.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\" style=\"font-size:16px\">To have a better overview of the transformation of the whole dataset, we included the <code><strong>mergeSamples()<\/strong><\/code> function, which creates another rds file called <code>pooledSamples.rds<\/code> containing an actual pool of all the individual rds files. 
The variable containing the result of this function should be called <code>fs_shiny<\/code>.<\/p>\n\n\n\n<p class=\"has-background wp-block-paragraph\" style=\"background-color:#c6cbe1;font-size:16px\"><strong>Please note that the <code>pooledSamples.rds<\/code> file can be large and may therefore take several minutes to open.<\/strong><\/p>\n\n\n\n<pre class=\"wp-block-code\" style=\"font-size:16px\"><code><em># Create pooled dataset and launch the R Shiny application\n<\/em>\nfs_shiny = <strong>mergeSamples(<\/strong>\n    <strong>suffix<\/strong> = NULL\n<strong>)<\/strong>\n\n<strong>launchTransformationTuningShinyApp(<\/strong>\n    <strong>fs_shiny<\/strong> = fs_shiny\n<strong>)<\/strong><\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\" style=\"font-size:16px\">Once the visualization window is open (see <strong>Figure 1<\/strong> below for a screenshot), you will be able to choose which dataset to use for visualization (a given sample or the pooled data) and which transformation method to use, as well as directly adjust the transformation parameters. Do not forget to click on the <code>Save current parameter<\/code> button when you are done with a cytometry parameter. Also, do not forget to repeat this operation for all parameters using the dedicated choice list, as the R Shiny application will not create the missing parameters for you. 
Of note, a parameter which has been saved will show the <code>Status: saved<\/code> line above the sliders.<\/p>\n\n\n<div class=\"wp-block-image is-style-default\">\n<figure data-wp-context=\"{&quot;imageId&quot;:&quot;6a559a19108d8&quot;}\" data-wp-interactive=\"core\/image\" data-wp-key=\"6a559a19108d8\" class=\"aligncenter size-full is-resized wp-lightbox-container\"><img loading=\"lazy\" decoding=\"async\" width=\"2538\" height=\"1124\" data-wp-class--hide=\"state.isContentHidden\" data-wp-class--show=\"state.isContentVisible\" data-wp-init=\"callbacks.setButtonStyles\" data-wp-on--click=\"actions.showLightbox\" data-wp-on--load=\"callbacks.setButtonStyles\" data-wp-on--pointerdown=\"actions.preloadImage\" data-wp-on--pointerenter=\"actions.preloadImageWithDelay\" data-wp-on--pointerleave=\"actions.cancelPreload\" data-wp-on-window--resize=\"callbacks.setButtonStyles\" sizes=\"auto, (max-width: 2538px) 100vw, 2538px\" src=\"https:\/\/paul-regnier.fr\/wp-content\/uploads\/2023\/09\/PICAFlow_transformationManualTuning.png\" alt=\"\" class=\"wp-image-1041\" style=\"width:768px;height:undefinedpx\" srcset=\"https:\/\/paul-regnier.fr\/wp-content\/uploads\/2023\/09\/PICAFlow_transformationManualTuning.png 2538w, https:\/\/paul-regnier.fr\/wp-content\/uploads\/2023\/09\/PICAFlow_transformationManualTuning-300x133.png 300w, https:\/\/paul-regnier.fr\/wp-content\/uploads\/2023\/09\/PICAFlow_transformationManualTuning-1024x453.png 1024w, https:\/\/paul-regnier.fr\/wp-content\/uploads\/2023\/09\/PICAFlow_transformationManualTuning-768x340.png 768w, https:\/\/paul-regnier.fr\/wp-content\/uploads\/2023\/09\/PICAFlow_transformationManualTuning-1536x680.png 1536w, https:\/\/paul-regnier.fr\/wp-content\/uploads\/2023\/09\/PICAFlow_transformationManualTuning-2048x907.png 2048w, https:\/\/paul-regnier.fr\/wp-content\/uploads\/2023\/09\/PICAFlow_transformationManualTuning-18x8.png 18w\" 
\/><button\n\t\t\tclass=\"lightbox-trigger\"\n\t\t\ttype=\"button\"\n\t\t\taria-haspopup=\"dialog\"\n\t\t\tdata-wp-bind--aria-label=\"state.thisImage.triggerButtonAriaLabel\"\n\t\t\tdata-wp-init=\"callbacks.initTriggerButton\"\n\t\t\tdata-wp-on--click=\"actions.showLightbox\"\n\t\t\tdata-wp-style--right=\"state.thisImage.buttonRight\"\n\t\t\tdata-wp-style--top=\"state.thisImage.buttonTop\"\n\t\t>\n\t\t\t<svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" width=\"12\" height=\"12\" fill=\"none\" viewbox=\"0 0 12 12\">\n\t\t\t\t<path fill=\"#fff\" d=\"M2 0a2 2 0 0 0-2 2v2h1.5V2a.5.5 0 0 1 .5-.5h2V0H2Zm2 10.5H2a.5.5 0 0 1-.5-.5V8H0v2a2 2 0 0 0 2 2h2v-1.5ZM8 12v-1.5h2a.5.5 0 0 0 .5-.5V8H12v2a2 2 0 0 1-2 2H8Zm2-12a2 2 0 0 1 2 2v2h-1.5V2a.5.5 0 0 0-.5-.5H8V0h2Z\" \/>\n\t\t\t<\/svg>\n\t\t<\/button><\/figure>\n<\/div>\n\n\n<p class=\"wp-block-paragraph\" style=\"font-size:16px\"><strong>Figure 1 &#8211; Manual tuning of parameters for data transformation using the in-house R Shiny interactive application (click on the image to open in fullscreen).<\/strong><\/p>\n\n\n\n<p class=\"wp-block-paragraph\" style=\"font-size:16px\">Once all the cytometry parameters have been processed, do not forget to click on the <code>Export rds<\/code> button to actually export the transformation parameters for each channel in the <code>parametersTransformations.rds<\/code> file located in the <code>output &gt; 1_Transformation<\/code> directory.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\" style=\"font-size:16px\">Also, don&rsquo;t forget to delete the <code>pooledSamples.rds<\/code> and <code>parametersTransformations.rds<\/code> files from the <code>rds<\/code> directory after the transformation parameters are exported, to purge the <code>fs_shiny<\/code> variable and to free unused RAM:<\/p>\n\n\n\n<pre class=\"wp-block-code\" style=\"font-size:16px\"><code><em># Clean up\n<\/em>\n<strong>unlink(file.path(<\/strong>workingDirectory, \"rds\", 
\"pooledSamples.rds\"<strong>))<\/strong>\n<strong>unlink(file.path(<\/strong>workingDirectory, \"rds\", \"parametersTransformations.rds\"<strong>))<\/strong>\n<strong>rm(<\/strong>fs_shiny<strong>)<\/strong>\n<strong>gc()<\/strong><\/code><\/pre>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"3.2\" style=\"font-size:22px\"><strong>3.2) Apply transformation<\/strong><\/h3>\n\n\n\n<p class=\"wp-block-paragraph\" style=\"font-size:16px\">Finally, we can proceed to actual data transformation:<\/p>\n\n\n\n<pre class=\"wp-block-code\" style=\"font-size:16px\"><code><em># Apply the previously determined transformations and associated parameters to data\n<\/em>\n<strong>transformData(<\/strong>\n    <strong>parametersToTransform<\/strong> = parametersToKeep\n<strong>)<\/strong><\/code><\/pre>\n\n\n\n<p class=\"has-background wp-block-paragraph\" style=\"background-color:#c6cbe1;font-size:16px\"><strong>Please note that this function expects a rds file named <code>parametersTransformations.rds<\/code> located in the <code>output &gt; 1_Transformation<\/code> directory. This file must be generated using the R Shiny application as described above. 
All the parameters that are specified in the <code>parametersToTransform<\/code> argument of the <code>transformData()<\/code> function must have a matching transformation and associated parameters in the <code>parametersTransformations.rds<\/code> file.<\/strong><\/p>\n\n\n\n<p class=\"has-background wp-block-paragraph\" style=\"background-color:#c6cbe1;font-size:16px\"><strong>Also, please note that rds files will be updated with their transformed version.<\/strong><\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"3.3\" style=\"font-size:22px\"><strong>3.3) Visually edit compensation matrix (optional)<\/strong><\/h3>\n\n\n\n<p class=\"wp-block-paragraph\" style=\"font-size:16px\">If needed, <code>PICAFlow<\/code> offers the possibility to edit the compensation matrix embedded in the studied samples.<\/p>\n\n\n\n<p class=\"has-background wp-block-paragraph\" style=\"background-color:#c6cbe1;font-size:16px\"><strong>This section and the following one are only relevant for flow cytometry data, even though they can technically be applied to mass cytometry data.<\/strong><\/p>\n\n\n\n<p class=\"wp-block-paragraph\" style=\"font-size:16px\">First, we need to create a new <code>rds<\/code> file which will contain all the individual samples:<\/p>\n\n\n\n<pre class=\"wp-block-code\" style=\"font-size:16px\"><code><strong>fs_shiny = mergeSamples(<\/strong>\n    <strong>suffix<\/strong> = NULL,\n    <strong>useStructureFromReferenceSample<\/strong> = 0\n<strong>)<\/strong><\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\" style=\"font-size:16px\">Then, we only have to launch the dedicated R Shiny application to begin the adjustment process:<\/p>\n\n\n\n<pre class=\"wp-block-code\" style=\"font-size:16px\"><code><strong>launchCompensationTuningShinyApp(<\/strong>\n    <strong>fs_shiny<\/strong> = fs_shiny,\n    <strong>maxEventsNumber<\/strong> = 100000\n<strong>)<\/strong><\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\" style=\"font-size:16px\">Here, users can simply 
edit the compensation values for each possible pair of parameters in an interactive manner. Each slider controls the compensation of one axis and represents the actual compensation values (as percentages divided by 100).<\/p>\n\n\n\n<p class=\"wp-block-paragraph\" style=\"font-size:16px\">When all the necessary adjustments are done, do not forget to click on the <code>Export rds<\/code> button to write a final copy of the <code>compensationParameters.rds<\/code> file in the <code>output &gt; 1_Transformation<\/code> directory.<\/p>\n\n\n\n<p class=\"has-background wp-block-paragraph\" style=\"background-color:#c6cbe1;font-size:16px\"><strong>Please note that when compensations are tuned, no data are overwritten or altered. The live changes seen in the R Shiny application are purely visual.<\/strong><\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"3.4\" style=\"font-size:22px\"><strong>3.4) Compensate data (optional)<\/strong><\/h3>\n\n\n\n<p class=\"wp-block-paragraph\" style=\"font-size:16px\">Afterwards, the compensations must be applied to each file for the desired parameters:<\/p>\n\n\n\n<pre class=\"wp-block-code\" style=\"font-size:16px\"><code><em># First, some clean up\n<\/em>\n<strong>unlink(file.path(<\/strong>\"rds\", \"pooledSamples.rds\"<strong>))<\/strong>\n<strong>unlink(file.path(<\/strong>\"rds\", \"compensationMatrix.rds\"<strong>))<\/strong>\n\n<em># Compensate data\n<\/em>\n<strong>compensateData(<\/strong>\n    <strong>parametersToCompensate<\/strong> = parametersToKeep,\n    <strong>useCustomCompensationMatrix<\/strong> = TRUE\n<strong>)<\/strong><\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\" style=\"font-size:16px\">Of note, <code>rds<\/code> files will be updated with their compensated version.<\/p>\n\n\n\n<p class=\"has-background wp-block-paragraph\" style=\"background-color:#c6cbe1;font-size:16px\"><strong>When the <strong><code>useCustomCompensationMatrix<\/code><\/strong> parameter is set to <code>FALSE<\/code>, the unaltered 
compensation matrices embedded in each FCS file will instead be applied.<\/strong><\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"3.5\" style=\"font-size:22px\"><strong>3.5) Export data per parameter<\/strong><\/h3>\n\n\n\n<p class=\"wp-block-paragraph\" style=\"font-size:16px\">Once the transformation is applied and data are correctly compensated, the dataset structure needs to be modified: all the rds files are pooled and reexported to have only one rds file per cytometry parameter instead of one file per sample.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\" style=\"font-size:16px\">Consequently, each final rds file will contain a <code>FlowSet<\/code> object consisting of a collection of <code>FlowFrame<\/code> objects (one <code>FlowFrame<\/code> per sample). This restructuring step simplifies the subsequent steps of the workflow:<\/p>\n\n\n\n<pre class=\"wp-block-code\" style=\"font-size:16px\"><code><em># Export one rds file per parameter\n<\/em>\n<strong>exportPerParameter(<\/strong>\n    <strong>parametersToExport<\/strong> = parametersToKeep,\n    <strong>nCoresToExploit<\/strong> = 10\n<strong>)<\/strong><\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\" style=\"font-size:16px\">Feel free to change the <code>nCoresToExploit<\/code> parameter to increase\/decrease the parallelization of data export, but keep an eye on RAM consumption during this process!<\/p>\n\n\n\n<p class=\"has-background wp-block-paragraph\" style=\"background-color:#c6cbe1;font-size:16px\"><strong>Please note that the individual rds files for each sample will be deleted after the new rds files are exported.<\/strong><\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"4\" style=\"font-size:28px\"><strong>4) Normalization<\/strong><\/h2>\n\n\n\n<p class=\"wp-block-paragraph\" style=\"font-size:16px\">In order to correct for batch effects as well as unwanted inter-sample heterogeneity, <code>PICAFlow<\/code> also features a normalization step. 
Basically, the principle is very simple: first, we detect the peaks for each channel according to the expected number of peaks we give as input (usually 1 to 3). Then, we use transformation methods to align these peaks across all samples (\u00ab\u00a0low\u00a0\u00bb peaks must align together, and so on for the other one(s)). The impact of the transformation can be checked on density plots that are generated before and after the normalization.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\" style=\"font-size:16px\">We strongly believe that such a normalization approach is critical for cytometry data, as samples from different groups very often show great heterogeneity, even when it is not particularly expected. This is mainly due to the sum of small variations during the sample collection (fresh or thawed samples, quantity of blood\/tissue collected, operator, etc.), staining (number of stained cells, reagent quantity\/quality\/lot, operator) and\/or acquisition processes (lasers, cytometer \u00ab\u00a0cleanliness\u00a0\u00bb, operator, etc.). 
Unsupervised and semi-supervised analysis of unnormalized data, notably using dimensionality reduction methods (see further), could potentially lead to artefactual clusters and identification of cell populations based on groups\/conditions\/samples rather than actual biologically-relevant phenotypes.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\" style=\"font-size:16px\">Consequently, even though the normalization step is not mandatory, we strongly recommend that users perform it.<\/p>\n\n\n\n<p class=\"has-background wp-block-paragraph\" style=\"background-color:#c6cbe1;font-size:16px\"><strong>Of note, other normalization methods (more oriented towards batch correction\/normalization) also exist, such as <strong><a href=\"https:\/\/pubmed.ncbi.nlm.nih.gov\/31633883\/\"><code>CytoNorm<\/code><\/a>, <a href=\"https:\/\/pubmed.ncbi.nlm.nih.gov\/35361793\/\"><code>CyCombine<\/code><\/a> and <a href=\"https:\/\/pubmed.ncbi.nlm.nih.gov\/35177627\/\"><code>CytofIn<\/code><\/a> (which are all available as standalone R packages<\/strong>) and can be of interest in case the approach we offer in <code>PICAFlow<\/code> does not perform well enough or if users prefer another method. However, we have to warn you that because of the huge differences between the design of <code>PICAFlow<\/code> and these methods, they do not integrate properly with our workflow. Among these differences, at this step, data treated with <code>PICAFlow<\/code> are exported as one rds file per parameter instead of one rds file per sample, which is not the case for the aforementioned methods. In addition, they also include features regarding data transformation, preprocessing and downsampling, which are not treated in the same order in our workflow. Together, these incompatibilities prevent us from easily adding these methods directly in <code>PICAFlow<\/code> without modifying the whole data process and handling. 
That said, we are totally open to discussing this with the respective developers of these packages to eventually provide users with a way to only apply the actual normalization algorithms without any other intervention on data, in a way that could also be compatible with <code>PICAFlow<\/code>. Thank you for your understanding.<\/strong><\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"4.1\" style=\"font-size:22px\"><strong>4.1) Plot transformed signal densities<\/strong><\/h3>\n\n\n\n<p class=\"wp-block-paragraph\" style=\"font-size:16px\">The next step of the workflow is to generate a collection of PDF files showing the actual signal densities for each sample and parameter of the dataset:<\/p>\n\n\n\n<pre class=\"wp-block-code\" style=\"font-size:16px\"><code><em># Create density plots for each parameter and sample of the dataset\n<\/em>\n<strong>plotFacets(\n<\/strong>    <strong>parametersToPlot<\/strong> = parametersToKeep,\n    <strong>maxSamplesNbPerPage<\/strong> = 16,\n    <strong>folder<\/strong> = \"logicleTransformed\",\n    <strong>suffix<\/strong> = \"_raw\",\n    <strong>downsample_n<\/strong> = 25000,\n    <strong>nCoresToExploit<\/strong> = 10\n<strong>)<\/strong><\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\" style=\"font-size:16px\">Generated plots (see <strong>Figure 2<\/strong> below) will be exported to the <code>output &gt; 2_Normalization &gt; folder<\/code> directory.<\/p>\n\n\n<div class=\"wp-block-image\">\n<figure data-wp-context=\"{&quot;imageId&quot;:&quot;6a559a1913188&quot;}\" data-wp-interactive=\"core\/image\" data-wp-key=\"6a559a1913188\" class=\"aligncenter size-full is-resized wp-lightbox-container\"><img loading=\"lazy\" decoding=\"async\" width=\"1096\" height=\"1097\" data-wp-class--hide=\"state.isContentHidden\" data-wp-class--show=\"state.isContentVisible\" data-wp-init=\"callbacks.setButtonStyles\" data-wp-on--click=\"actions.showLightbox\" data-wp-on--load=\"callbacks.setButtonStyles\" 
data-wp-on--pointerdown=\"actions.preloadImage\" data-wp-on--pointerenter=\"actions.preloadImageWithDelay\" data-wp-on--pointerleave=\"actions.cancelPreload\" data-wp-on-window--resize=\"callbacks.setButtonStyles\" sizes=\"auto, (max-width: 1096px) 100vw, 1096px\" src=\"https:\/\/paul-regnier.fr\/wp-content\/uploads\/2023\/11\/PICAFlow_transformedData.png\" alt=\"\" class=\"wp-image-1381\" style=\"width:768px\" srcset=\"https:\/\/paul-regnier.fr\/wp-content\/uploads\/2023\/11\/PICAFlow_transformedData.png 1096w, https:\/\/paul-regnier.fr\/wp-content\/uploads\/2023\/11\/PICAFlow_transformedData-300x300.png 300w, https:\/\/paul-regnier.fr\/wp-content\/uploads\/2023\/11\/PICAFlow_transformedData-1024x1024.png 1024w, https:\/\/paul-regnier.fr\/wp-content\/uploads\/2023\/11\/PICAFlow_transformedData-150x150.png 150w, https:\/\/paul-regnier.fr\/wp-content\/uploads\/2023\/11\/PICAFlow_transformedData-768x769.png 768w, https:\/\/paul-regnier.fr\/wp-content\/uploads\/2023\/11\/PICAFlow_transformedData-12x12.png 12w\" \/><button\n\t\t\tclass=\"lightbox-trigger\"\n\t\t\ttype=\"button\"\n\t\t\taria-haspopup=\"dialog\"\n\t\t\tdata-wp-bind--aria-label=\"state.thisImage.triggerButtonAriaLabel\"\n\t\t\tdata-wp-init=\"callbacks.initTriggerButton\"\n\t\t\tdata-wp-on--click=\"actions.showLightbox\"\n\t\t\tdata-wp-style--right=\"state.thisImage.buttonRight\"\n\t\t\tdata-wp-style--top=\"state.thisImage.buttonTop\"\n\t\t>\n\t\t\t<svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" width=\"12\" height=\"12\" fill=\"none\" viewbox=\"0 0 12 12\">\n\t\t\t\t<path fill=\"#fff\" d=\"M2 0a2 2 0 0 0-2 2v2h1.5V2a.5.5 0 0 1 .5-.5h2V0H2Zm2 10.5H2a.5.5 0 0 1-.5-.5V8H0v2a2 2 0 0 0 2 2h2v-1.5ZM8 12v-1.5h2a.5.5 0 0 0 .5-.5V8H12v2a2 2 0 0 1-2 2H8Zm2-12a2 2 0 0 1 2 2v2h-1.5V2a.5.5 0 0 0-.5-.5H8V0h2Z\" \/>\n\t\t\t<\/svg>\n\t\t<\/button><\/figure>\n<\/div>\n\n\n<p class=\"wp-block-paragraph\" style=\"font-size:16px\"><strong>Figure 2 &#8211; Density plots showing the logicle-transformed 
<code>CD3_PETexasRed<\/code> parameter signal <strong>(click on the image to open in fullscreen)<\/strong>.<\/strong><\/p>\n\n\n\n<p class=\"wp-block-paragraph\" style=\"font-size:16px\">As you can see in <strong>Figure 2<\/strong> above, despite the fact that \u00ab\u00a0low\u00a0\u00bb and \u00ab\u00a0high\u00a0\u00bb peaks are approximately in the same value range, they clearly do not match and present both inter- and intra-group heterogeneity.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"4.2\" style=\"font-size:22px\"><strong>4.2) Peaks analysis<\/strong><\/h3>\n\n\n\n<p class=\"wp-block-paragraph\" style=\"font-size:16px\">Afterwards, we want to analyze the peaks for each sample and parameter, to prepare the future normalization step. In practice, we only need to provide the empirically-determined number of peaks (by simple visualization for instance) for each parameter (<code>max.lms.sequence<\/code>) to the <code><strong>analyzePeaks()<\/strong><\/code> function:<\/p>\n\n\n\n<pre class=\"wp-block-code\" style=\"font-size:16px\"><code><em># Define the maximum number of peaks for each parameter\n<\/em>\nmax.lms.sequence = <strong>list(<\/strong>\n    2,\n    1,\n    2,\n    2,\n    2,\n    2,\n    2,\n    1,\n    1,\n    2,\n    2,\n    1,\n    1,\n    1,\n    3\n<strong>)<\/strong>\n\n<strong>names(<\/strong>max.lms.sequence<strong>)<\/strong> = parametersToKeep\n\n<em># Launch the peaks determination for each parameter and sample\n<\/em>\n<strong>analyzePeaks(<\/strong>\n    <strong>parametersToAnalyze<\/strong> = parametersToKeep,\n    <strong>max.lms.sequence<\/strong> = max.lms.sequence,\n    <strong>suffix<\/strong> = \"_raw\",\n    <strong>samplesToDelete<\/strong> = NULL,\n    <strong>nCoresToExploit<\/strong> = 10,\n    <strong>minpeakdistance<\/strong> = 150\n<strong>)<\/strong><\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\" style=\"font-size:16px\">The function will export peaks information to the <code><code>output &gt; 
2_Normalization<\/code><\/code> directory.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\" style=\"font-size:16px\">Normally, if your cells were correctly stained using an optimized protocol and if they were correctly acquired following the flow\/mass cytometer manufacturer&rsquo;s instructions, the automatic identification of peaks should <em>theoretically<\/em> be exact. However, <code>PICAFlow<\/code> still allows users to manually edit these peaks in order to correct wrongly-identified peaks, for instance in samples showing a staining profile which is rather different from the others.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\" style=\"font-size:16px\">To do this, we can open the text files in the <code>peaks<\/code> directory (with Excel for instance) and manually edit the determined peaks (if needed). Please keep in mind that the overall objective of this manual tweaking is to make sure that peaks from the same intensity range will match each other after normalization as precisely as possible. For instance, if a parameter was identified as presenting two distinct peaks (\u00ab\u00a0low\u00a0\u00bb and \u00ab\u00a0high\u00a0\u00bb typically, as you can observe in <strong>Figure 2<\/strong> above), all the determined and\/or manually edited \u00ab\u00a0low\u00a0\u00bb peaks will be considered as \u00ab\u00a0matching\u00a0\u00bb and thus will be aligned during the normalization step. 
The same process applies to all the determined and\/or manually edited \u00ab\u00a0high\u00a0\u00bb peaks.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\" style=\"font-size:16px\">Additionally, users should check for the following problems during the generation\/editing of the peaks data:<\/p>\n\n\n\n<ul style=\"font-size:16px\" class=\"wp-block-list\">\n<li>Two peaks should not be too close in terms of distance\/intensity<\/li>\n\n\n\n<li>Rows containing NA values should display their only peak at the right location (\u00ab\u00a0low\u00a0\u00bb or \u00ab\u00a0high\u00a0\u00bb typically)<\/li>\n<\/ul>\n\n\n\n<p class=\"has-background wp-block-paragraph\" style=\"background-color:#c6cbe1;font-size:16px\"><strong>It is increasingly considered good cytometry practice to include at least one common sample in each batch, in order to better identify and then correct the newly introduced batch effects. The implementation of peak editing in <code>PICAFlow<\/code> allows users to combine it with the use of common control samples. 
In this case, rather than aligning peaks based on all the samples, users can use these control samples as references when editing the peaks.<\/strong><\/p>\n\n\n\n<p class=\"wp-block-paragraph\" style=\"font-size:16px\">Of note, users can regenerate the density plots after the peaks analysis if desired, because the <code><strong>plotFacets()<\/strong><\/code> function can automatically add vertical bars for each sample at the exact location of the determined peaks, provided that the associated peak files are present in the <code><code>output &gt; 2_Normalization &gt; peaks<\/code><\/code> folder (which should be the case if you computed the peaks analysis described in this section) and that the <code>plotPeaks<\/code> argument is set to <code>TRUE<\/code>:<\/p>\n\n\n\n<pre class=\"wp-block-code\" style=\"font-size:16px\"><code><em># Create density plots for each parameter and sample of the dataset\n<\/em>\n<strong>plotFacets(\n<\/strong>    <strong>parametersToPlot<\/strong> = parametersToKeep,\n    <strong>maxSamplesNbPerPage<\/strong> = 16,\n    <strong>folder<\/strong> = \"logicleTransformedWithPeaks\",\n    <strong>suffix<\/strong> = \"_raw\",\n    <strong>downsample_n<\/strong> = 25000,\n    <strong>nCoresToExploit<\/strong> = 10,\n    <strong>plotPeaks<\/strong> = TRUE\n<strong>)<\/strong><\/code><\/pre>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"4.3\" style=\"font-size:22px\"><strong>4.3) Compute peaks data for new samples but keep previously generated peaks data (optional)<\/strong><\/h3>\n\n\n\n<p class=\"wp-block-paragraph\" style=\"font-size:16px\">If necessary, <code>PICAFlow<\/code> also provides a function called <code><strong>keepPreviousPeaksData()<\/strong><\/code> to keep previously computed\/edited peaks for a set of samples. This function is typically used when new samples are added to the dataset, but users do not want to change the already generated peak values for the old samples. 
This can be achieved with the following procedure:<\/p>\n\n\n\n<ol style=\"font-size:16px\" class=\"wp-block-list\">\n<li>Create a <code>refs<\/code> directory within the <code>output &gt; 2_Normalization &gt; peaks<\/code> directory<\/li>\n\n\n\n<li>Copy the old peak files within this <code>refs<\/code> directory<\/li>\n\n\n\n<li>Run the <code><strong>analyzePeaks()<\/strong><\/code> function to find the peaks of your new samples<\/li>\n\n\n\n<li>Run the <code><strong>keepPreviousPeaksData()<\/strong><\/code> function to push back the peak values for the old samples in the newly generated peak files<\/li>\n<\/ol>\n\n\n\n<p class=\"has-background wp-block-paragraph\" style=\"background-color:#c6cbe1;font-size:16px\"><strong>Note: New peak files will overwrite the ones present in the <code>output &gt; 2_Normalization &gt; peaks<\/code> directory.<\/strong><\/p>\n\n\n\n<p class=\"has-background wp-block-paragraph\" style=\"background-color:#c6cbe1;font-size:16px\"><strong>Note 2: If needed, a backup (before editing) of the peak files located in <code>output &gt; 2_Normalization &gt; peaks<\/code> can be found in the <code>output &gt; 2_Normalization &gt; peaks &gt; backup<\/code> directory.<\/strong><\/p>\n\n\n\n<p class=\"has-background wp-block-paragraph\" style=\"background-color:#c6cbe1;font-size:16px\"><strong>Note 3: Please keep in mind that peaks location is greatly dependent on the parameters used during data transformation. 
Therefore, if you decide to add new samples to an existing set of samples, you should consider using the same transformation parameters for the new samples.<\/strong><\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"4.4\" style=\"font-size:22px\"><strong>4.4) Fine-tune the peaks interactively (optional)<\/strong><\/h3>\n\n\n\n<p class=\"wp-block-paragraph\" style=\"font-size:16px\">To visually confirm and\/or precisely tune the peak values that were determined previously, users can use the dedicated Shiny application:<\/p>\n\n\n\n<pre class=\"wp-block-code\" style=\"font-size:16px\"><code><em># Fine tune the peaks using the Shiny application\n<\/em>\n<strong>fineTunePeaks()<\/strong><\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\" style=\"font-size:16px\">Once opened, it is very easy to use. On the left, one can find the visualization controls, which are used to choose the sample and the parameter to display. Below them are shown the number of peaks previously set for the associated parameter as well as sliders (depending on the number of peaks) displaying the actual peak values at their predetermined positions. Each peak can be manually adjusted using the dedicated slider, and can also be unset (by using the dedicated checkbox) to set the peak value to <code>NA<\/code>, which can be useful if a given peak value is difficult to estimate. 
In this case, as designed by the workflow, each <code>NA<\/code> value will be replaced by the mean of the peaks at the same position that were actually defined (i.e. not <code>NA<\/code>).<\/p>\n\n\n\n<p class=\"has-background wp-block-paragraph\" style=\"background-color:#c6cbe1;font-size:16px\"><strong>Note: Any modification of any value directly affects the generated <code>ShinyApp_TunePeaks.rds<\/code> file located in the <strong><code>output &gt; 2_Normalization<\/code><\/strong> directory.<\/strong><\/p>\n\n\n\n<p class=\"wp-block-paragraph\" style=\"font-size:16px\">When the peaks inspection\/correction step is done, one can just close the Shiny application window and proceed with the next paragraph.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\" style=\"font-size:16px\">To finish, edited peak values should be reexported to overwrite the original tabular text files located in the <code>output &gt; 2_Normalization &gt; peaks<\/code> directory. This can be achieved automatically by running the following command:<\/p>\n\n\n\n<pre class=\"wp-block-code\" style=\"font-size:16px\"><code><em># Reexport the tabular text files to take account of the potential new values previously edited\n<\/em>\n<strong>exportRDSPeaksDataToTabularTextFiles()<\/strong><\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\" style=\"font-size:16px\">After this, you are ready to go to the next section.<\/p>\n\n\n\n<p class=\"has-background wp-block-paragraph\" style=\"background-color:#c6cbe1;font-size:16px\"><strong>Note: If one wants to interrupt the <code>fineTunePeaks()<\/code> function in order to resume later, one should close the Shiny application, then make sure to call the <code>exportRDSPeaksDataToTabularTextFiles()<\/code> function to write new tabular text files containing the modifications already done. 
Otherwise, all modifications will be lost because the <code>ShinyApp_TunePeaks.rds<\/code> file will be overwritten if the Shiny application is launched again.<\/strong><\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"4.5\" style=\"font-size:22px\"><strong>4.5) Synthesize peaks information<\/strong><\/h3>\n\n\n\n<p class=\"wp-block-paragraph\" style=\"font-size:16px\">Afterwards, a synthesis of the number of peaks and their values must be computed:<\/p>\n\n\n\n<pre class=\"wp-block-code\" style=\"font-size:16px\"><code><em># Summarize peaks information\n<\/em>\npeaksAnalysis = <strong>synthesizePeaks()<\/strong><\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\" style=\"font-size:16px\">Basically, this function creates a new list of three elements, each containing information about the peaks:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li style=\"font-size:16px\">The <code>info<\/code> element contains general information about the peaks, organized as a table with the following columns:\n<ul class=\"wp-block-list\">\n<li style=\"font-size:16px\"><code>Parameter<\/code> (parameter name)<\/li>\n\n\n\n<li style=\"font-size:16px\"><code>PeaksNb<\/code> (number of peaks specified)<\/li>\n\n\n\n<li style=\"font-size:16px\"><code>PeakType<\/code> (which precise peak is described)<\/li>\n\n\n\n<li style=\"font-size:16px\"><code>Mean<\/code> (the mean of all the peaks for every sample)<\/li>\n<\/ul>\n<\/li>\n\n\n\n<li style=\"font-size:16px\">The <code>best<\/code> element is a named list containing the mean peaks found for each parameter<\/li>\n\n\n\n<li style=\"font-size:16px\">The <code>raw<\/code> element stores the total peaks list for every parameter and sample<\/li>\n<\/ul>\n\n\n\n<p class=\"wp-block-paragraph\" style=\"font-size:16px\">This information should be split into several variables to ease its future use:<\/p>\n\n\n\n<pre class=\"wp-block-code\" style=\"font-size:16px\"><code><em># Write peaks analysis information and extract values for the normalization 
step\n<\/em>\npeaksInfo = peaksAnalysis$info\n\n<strong>write.table(<\/strong>\n    <strong>x<\/strong> = peaksInfo,\n    <strong>file<\/strong> = <strong>file.path(<\/strong>\"output\", \"2_Normalization\", \"peaksAnalysisInformation_allSamples.txt\"<strong>)<\/strong>,\n    <strong>quote<\/strong> = FALSE,\n    <strong>col.names<\/strong> = TRUE,\n    <strong>row.names<\/strong> = FALSE,\n    <strong>sep<\/strong> = \"\\t\"\n<strong>)<\/strong>\n\nbestPeaksList = peaksAnalysis$best\ntotalPeaksList = peaksAnalysis$raw<\/code><\/pre>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"4.6\" style=\"font-size:22px\"><strong>4.6) Normalize data<\/strong><\/h3>\n\n\n\n<p class=\"wp-block-paragraph\" style=\"font-size:16px\">Finally, this information is integrated into the actual normalization process. For convenience, the normalization step can be launched as a \u00ab\u00a0dry run\u00a0\u00bb test, without any values being output at the end (with the <code>try<\/code> argument set to <code>TRUE<\/code>).<\/p>\n\n\n\n<pre class=\"wp-block-code\" style=\"font-size:16px\"><code><em># Launch a test normalization process\n<\/em>\nwarpSet_value = FALSE\ngaussNorm_value = TRUE\n\ntestNormalizationResult = <strong>normalizeData(<\/strong>\n    <strong>try<\/strong> = TRUE,\n    <strong>max.lms.sequence<\/strong> = max.lms.sequence,\n    <strong>suffix<\/strong> = \"_raw\",\n    <strong>base.lms.list<\/strong> = bestPeaksList,\n    <strong>warpSet<\/strong> = warpSet_value,\n    <strong>gaussNorm<\/strong> = gaussNorm_value,\n    <strong>samplesToDelete<\/strong> = NULL,\n    <strong>nCoresToExploit<\/strong> = 8,\n    <strong>custom.lms.list<\/strong> = totalPeaksList\n<strong>)<\/strong>\n\ntestNormalizationResult\n\n&#91;1] \"Parameter: Alexa Fluor 700-A =&gt; No error during normalization.\" \"Parameter: APC-A =&gt; No error during normalization.\"            \n &#91;3] \"Parameter: APC-Cy7-A =&gt; No error during normalization.\"         \"Parameter: BV421-A =&gt; No error 
during normalization.\"          \n &#91;5] \"Parameter: BV510-A =&gt; No error during normalization.\"           \"Parameter: BV605-A =&gt; No error during normalization.\"          \n &#91;7] \"Parameter: BV650-A =&gt; No error during normalization.\"           \"Parameter: BV711-A =&gt; No error during normalization.\"          \n &#91;9] \"Parameter: BV786-A =&gt; No error during normalization.\"           \"Parameter: FITC-A =&gt; No error during normalization.\"           \n&#91;11] \"Parameter: PE-A =&gt; No error during normalization.\"              \"Parameter: PE-Cy5-A =&gt; No error during normalization.\"         \n&#91;13] \"Parameter: PE-Cy5_5-A =&gt; No error during normalization.\"        \"Parameter: PE-Cy7-A =&gt; No error during normalization.\"         \n&#91;15] \"Parameter: PE-Texas Red-A =&gt; No error during normalization.\"<\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\" style=\"font-size:16px\">The <code>testNormalizationResult<\/code> variable will contain a sentence summarizing for each parameter whether a normalization using the provided parameters will lead to an error (or not). This can be seen as a \u00ab\u00a0dry run\u00a0\u00bb, as the normalization is actually performed but without exporting or overwriting anything. 
If everything seems to be alright, users can then proceed to the real normalization work, leading to the generation of new rds files:<\/p>\n\n\n\n<pre class=\"wp-block-code\" style=\"font-size:16px\"><code><em># Apply the real normalization process to the dataset\n<\/em>\nsamplesToDelete = NULL\n\n<strong>normalizeData(<\/strong>\n    <strong>try<\/strong> = FALSE,\n    <strong>max.lms.sequence<\/strong> = max.lms.sequence,\n    <strong>suffix<\/strong> = \"_raw\",\n    <strong>base.lms.list<\/strong> = bestPeaksList,\n    <strong>warpSet<\/strong> = warpSet_value,\n    <strong>gaussNorm<\/strong> = gaussNorm_value,\n    <strong>samplesToDelete<\/strong> = samplesToDelete,\n    <strong>nCoresToExploit<\/strong> = 8,\n    <strong>custom.lms.list<\/strong> = totalPeaksList\n<strong>)<\/strong><\/code><\/pre>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"4.7\" style=\"font-size:22px\"><strong>4.7) Plot normalized signal densities<\/strong><\/h3>\n\n\n\n<p class=\"wp-block-paragraph\" style=\"font-size:16px\">When the process has been completed, users should export new density plots for the normalized signals (see <strong>Figure 3<\/strong> below) for every sample and parameter:<\/p>\n\n\n\n<pre class=\"wp-block-code\" style=\"font-size:16px\"><code><em># Create density plots for each parameter and sample of the dataset\n<\/em>\n<strong>plotFacets(\n<\/strong>    <strong>parametersToPlot<\/strong> = parametersToKeep,\n    <strong>maxSamplesNbPerPage<\/strong> = 16,\n    <strong>folder<\/strong> = \"gaussNormNormalized\",\n    <strong>suffix<\/strong> = \"_normalized\",\n    <strong>downsample_n<\/strong> = 25000,\n    <strong>nCoresToExploit<\/strong> = 10\n<strong>)<\/strong><\/code><\/pre>\n\n\n\n<p class=\"has-background wp-block-paragraph\" style=\"background-color:#c6cbe1;font-size:16px\"><strong>Note: at this point, if needed, users can either perform the normalization step again with some adjustments to the peak values, or even remove samples which do not 
normalize correctly using the <code>samplesToDelete<\/code> argument.<\/strong><\/p>\n\n\n\n<p class=\"has-background wp-block-paragraph\" style=\"background-color:#c6cbe1;font-size:16px\"><strong>Note 2: please remember that if any peak value is modified, users should always follow the instructions provided in section 4.4) before launching a new normalization process with the <code>normalizeData()<\/code> function. Otherwise, the modified peak values will not be taken into account during the new normalization process.<\/strong><\/p>\n\n\n<div class=\"wp-block-image\">\n<figure data-wp-context=\"{&quot;imageId&quot;:&quot;6a559a1916628&quot;}\" data-wp-interactive=\"core\/image\" data-wp-key=\"6a559a1916628\" class=\"aligncenter size-full is-resized wp-lightbox-container\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"1024\" data-wp-class--hide=\"state.isContentHidden\" data-wp-class--show=\"state.isContentVisible\" data-wp-init=\"callbacks.setButtonStyles\" data-wp-on--click=\"actions.showLightbox\" data-wp-on--load=\"callbacks.setButtonStyles\" data-wp-on--pointerdown=\"actions.preloadImage\" data-wp-on--pointerenter=\"actions.preloadImageWithDelay\" data-wp-on--pointerleave=\"actions.cancelPreload\" data-wp-on-window--resize=\"callbacks.setButtonStyles\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" src=\"https:\/\/paul-regnier.fr\/wp-content\/uploads\/2023\/11\/PICAFlow_normalizedData.png\" alt=\"\" class=\"wp-image-1382\" style=\"width:768px\" srcset=\"https:\/\/paul-regnier.fr\/wp-content\/uploads\/2023\/11\/PICAFlow_normalizedData.png 1024w, https:\/\/paul-regnier.fr\/wp-content\/uploads\/2023\/11\/PICAFlow_normalizedData-300x300.png 300w, https:\/\/paul-regnier.fr\/wp-content\/uploads\/2023\/11\/PICAFlow_normalizedData-150x150.png 150w, https:\/\/paul-regnier.fr\/wp-content\/uploads\/2023\/11\/PICAFlow_normalizedData-768x768.png 768w, https:\/\/paul-regnier.fr\/wp-content\/uploads\/2023\/11\/PICAFlow_normalizedData-12x12.png 12w\" 
\/><button\n\t\t\tclass=\"lightbox-trigger\"\n\t\t\ttype=\"button\"\n\t\t\taria-haspopup=\"dialog\"\n\t\t\tdata-wp-bind--aria-label=\"state.thisImage.triggerButtonAriaLabel\"\n\t\t\tdata-wp-init=\"callbacks.initTriggerButton\"\n\t\t\tdata-wp-on--click=\"actions.showLightbox\"\n\t\t\tdata-wp-style--right=\"state.thisImage.buttonRight\"\n\t\t\tdata-wp-style--top=\"state.thisImage.buttonTop\"\n\t\t>\n\t\t\t<svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" width=\"12\" height=\"12\" fill=\"none\" viewbox=\"0 0 12 12\">\n\t\t\t\t<path fill=\"#fff\" d=\"M2 0a2 2 0 0 0-2 2v2h1.5V2a.5.5 0 0 1 .5-.5h2V0H2Zm2 10.5H2a.5.5 0 0 1-.5-.5V8H0v2a2 2 0 0 0 2 2h2v-1.5ZM8 12v-1.5h2a.5.5 0 0 0 .5-.5V8H12v2a2 2 0 0 1-2 2H8Zm2-12a2 2 0 0 1 2 2v2h-1.5V2a.5.5 0 0 0-.5-.5H8V0h2Z\" \/>\n\t\t\t<\/svg>\n\t\t<\/button><\/figure>\n<\/div>\n\n\n<p class=\"wp-block-paragraph\" style=\"font-size:16px\"><strong>Figure 3 &#8211; Density plots showing the normalized <code>CD3_PETexasRed<\/code> parameter signal (click on the image to open in fullscreen).<\/strong><\/p>\n\n\n\n<p class=\"wp-block-paragraph\" style=\"font-size:16px\">As you can see here, peaks were correctly realigned to make respectively all the \u00ab\u00a0low\u00a0\u00bb and \u00ab\u00a0high\u00a0\u00bb peaks match together.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"4.8\" style=\"font-size:22px\"><strong>4.8) Merge normalized data<\/strong><\/h3>\n\n\n\n<p class=\"wp-block-paragraph\" style=\"font-size:16px\">Afterwards, users should merge all the normalized rds files into a single rds file, which will be used for the subsequent steps. 
We also recommend purging unused RAM after the export:<\/p>\n\n\n\n<pre class=\"wp-block-code\" style=\"font-size:16px\"><code><em># Merge all rds files into a single one and clean RAM\n<\/em>\n<strong>mergeParameters(\n<\/strong>    <strong>suffix<\/strong> = \"_normalized\"\n<strong>)<\/strong>\n\n<strong>gc()<\/strong><\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\" style=\"font-size:16px\">This will output a single rds file named <code>2_Normalized.rds<\/code> in the <code>rds<\/code> directory.<\/p>\n\n\n\n<p class=\"has-background wp-block-paragraph\" style=\"background-color:#c6cbe1;font-size:16px\"><strong>Please note that once the merged rds file has been exported, this function deletes all the individual <code>*_raw<\/code> and <code>*_normalized<\/code> rds files.<\/strong><\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"5\" style=\"font-size:28px\"><strong>5) Gating<\/strong><\/h2>\n\n\n\n<p class=\"wp-block-paragraph\" style=\"font-size:16px\">The next major step of the workflow is to gate on cells of interest, if applicable. To do this, <code>PICAFlow<\/code> includes features that allow users to isolate cells of interest based on the expression of selected markers. 
First, users have to import the content of the <code>2_Normalized.rds<\/code> file located in the <code>rds<\/code> directory then prepare the lists that will contain all the useful information about the future gates:<\/p>\n\n\n\n<pre class=\"wp-block-code\" style=\"font-size:16px\"><code><em># Load ungated data and prepare lists that will contain future gate information\n<\/em>\ndataUngated = <strong>readRDS(<\/strong>\n    <strong>file<\/strong> = <strong>file.path(<\/strong>\"rds\", \"2_Normalized.rds\"<strong>)<\/strong>\n<strong>)<\/strong>\n\ntotalStats = <strong>list()<\/strong>\ntotalGatingParameters = <strong>list()<\/strong><\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\" style=\"font-size:16px\">If necessary, users can have a reminder of the markers present in this dataset:<\/p>\n\n\n\n<pre class=\"wp-block-code\" style=\"font-size:16px\"><code><em># Recall all channel names within the dataset\n<\/em>\n<strong>getAllChannelsInformation(<\/strong><strong>)<\/strong><\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\" style=\"font-size:16px\">Then, the principle is simple, even if it needs a lot of variables to be defined. 
Concretely, the <code><strong>gateData()<\/strong><\/code> function needs some arguments to be specified in order to correctly create, compute and apply the desired gate:<\/p>\n\n\n\n<ul style=\"font-size:16px\" class=\"wp-block-list\">\n<li><code>gateName_value<\/code> contains the user-friendly name of the current gate<\/li>\n\n\n\n<li><code>xParameter_value<\/code> contains the parameter name to be used on the x axis<\/li>\n\n\n\n<li><code>yParameter_value<\/code> contains the parameter name to be used on the y axis<\/li>\n\n\n\n<li><code>xlim_value<\/code> contains a vector of size 2 detailing the x values to use as limits for data display<\/li>\n\n\n\n<li><code>ylim_value<\/code> contains a vector of size 2 detailing the y values to use as limits for data display<\/li>\n\n\n\n<li><code>samplesToUse_value<\/code> contains a vector detailing the samples to use for direct display<\/li>\n\n\n\n<li><code>samplesPerPage_value<\/code> contains the number of samples to plot per PDF page<\/li>\n\n\n\n<li><code>inverseGating_value<\/code> contains a boolean defining whether the current gate should be inclusive (<code>FALSE<\/code>) or exclusive\/inverted (<code>TRUE<\/code>)<\/li>\n<\/ul>\n\n\n\n<p class=\"wp-block-paragraph\" style=\"font-size:16px\">All these values are stored in a single list named <code>totalGatingParameters<\/code>, where each new gate is a new element of the list.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\" style=\"font-size:16px\">Users can also extract the 2nd and 99th percentiles of a given parameter distribution, to pre-determine the values to feed to the <code><strong>gateData()<\/strong><\/code> function:<\/p>\n\n\n\n<pre class=\"wp-block-code\" style=\"font-size:16px\"><code><em># Extract the 2nd and 99th percentiles for the FSC-A parameter distribution from the 1st sample<br><\/em><br><strong>getParameterLimits(flowset<\/strong> = totalFlowset, <strong>sample<\/strong> = 1, <strong>parameter<\/strong> = 
\"FSC-A\"<strong>)<\/strong><br><br><em># Extract the 2nd and 99th percentiles for the SSC-A parameter distribution from the 1st sample<br><\/em><br><strong>getParameterLimits(flowset<\/strong> = totalFlowset, <strong>sample<\/strong> = 1, <strong>parameter<\/strong> = \"SSC-A\"<strong>)<\/strong><\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\" style=\"font-size:16px\">For our example, we want to gate on B cells, so we want to define these variables to the following values:<\/p>\n\n\n\n<pre class=\"wp-block-code\" style=\"font-size:16px\"><code>totalStats = <strong>list()<\/strong>\ntotalGatingParameters = <strong>list()<\/strong>\n\n<em># Define gating elements for the current gate<\/em>\n\ngateName_value = \"Bcells\"\n\ntotalGatingParameters&#91;&#91;gateName_value]] = <strong>list(<\/strong>\n    <strong>xParameter_value<\/strong> = \"CD19_BV605\",\n    <strong>yParameter_value<\/strong> = \"CD3_PETexasRed\",\n    <strong>xlim_value<\/strong> = <strong>c(<\/strong>0, 4<strong>)<\/strong>,\n    <strong>ylim_value<\/strong> = <strong>c(<\/strong>0, 4<strong>)<\/strong>,\n    <strong>samplesToUse_value<\/strong> = <strong>c(<\/strong>1:6<strong>)<\/strong>,\n    <strong>samplesPerPage_value<\/strong> = 6,\n    <strong>inverseGating_value<\/strong> = FALSE,\n    <strong>gateName_value<\/strong> = gateName_value\n<strong>)<\/strong><\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\" style=\"font-size:16px\">Then, we have to run an interactive R Shiny application implemented in the <code><strong>gateData()<\/strong><\/code> function to create the polygonal gate we want and display it on some selected samples:<\/p>\n\n\n\n<pre class=\"wp-block-code\" style=\"font-size:16px\"><code><em># Interactively generate a gate, then apply it to the samples, without subset nor PDF plots exportation<\/em>\n\nassociatedInfos = <strong>gateData(<\/strong>\n    <strong>flowset<\/strong> = dataUngated,\n    <strong>sampleToPlot<\/strong> = 
totalGatingParameters&#91;&#91;gateName_value]]$samplesToUse_value,\n    <strong>xParameter<\/strong> = totalGatingParameters&#91;&#91;gateName_value]]$xParameter_value,\n    <strong>yParameter<\/strong> = totalGatingParameters&#91;&#91;gateName_value]]$yParameter_value,\n    <strong>xlim<\/strong> = totalGatingParameters&#91;&#91;gateName_value]]$xlim_value,\n    <strong>ylim<\/strong> = totalGatingParameters&#91;&#91;gateName_value]]$ylim_value,\n    <strong>inverseGating<\/strong> = totalGatingParameters&#91;&#91;gateName_value]]$inverseGating_value,\n    <strong>gateName<\/strong> = totalGatingParameters&#91;&#91;gateName_value]]$gateName_value,\n    <strong>subset<\/strong> = FALSE,\n    <strong>exportAllPlots<\/strong> = FALSE,\n    <strong>redrawGate<\/strong> = FALSE,\n    <strong>specificGates<\/strong> = NULL,\n    <strong>gatingset<\/strong> = NULL,\n    <strong>generatedGates<\/strong> = NULL,\n    <strong>customBinWidth <\/strong>= 0.01\n<strong>)<\/strong><\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\" style=\"font-size:16px\">Please note that both the <code>subset<\/code> and <code>exportAllPlots<\/code> arguments are set to <code>FALSE<\/code> in this first visualization step.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\" style=\"font-size:16px\">Then, if the displayed plots and gate look correct, one can apply the gate to all the samples and export PDF files showing how the gate is positioned (see <strong>Figure 4<\/strong> below):<\/p>\n\n\n\n<pre class=\"wp-block-code\" style=\"font-size:16px\"><code><em># Apply the gate to the samples, without subsetting but with PDF plot export\n<\/em>\nassociatedInfos = <strong>gateData(<\/strong>\n    <strong>flowset<\/strong> = dataUngated,\n    <strong>sampleToPlot<\/strong> = totalGatingParameters&#91;&#91;gateName_value]]$samplesToUse_value,\n    <strong>xParameter<\/strong> = totalGatingParameters&#91;&#91;gateName_value]]$xParameter_value,\n    <strong>yParameter<\/strong> = 
totalGatingParameters&#91;&#91;gateName_value]]$yParameter_value,\n    <strong>xlim<\/strong> = totalGatingParameters&#91;&#91;gateName_value]]$xlim_value,\n    <strong>ylim<\/strong> = totalGatingParameters&#91;&#91;gateName_value]]$ylim_value,\n    <strong>inverseGating<\/strong> = totalGatingParameters&#91;&#91;gateName_value]]$inverseGating_value,\n    <strong>gateName<\/strong> = totalGatingParameters&#91;&#91;gateName_value]]$gateName_value,\n    <strong>subset<\/strong> = FALSE,\n    <strong>exportAllPlots<\/strong> = TRUE,\n    <strong>redrawGate<\/strong> = FALSE,\n    <strong>specificGates<\/strong> = NULL,\n    <strong>gatingset<\/strong> = associatedInfos$gatingset,\n    <strong>generatedGates<\/strong> = NULL,\n    <strong>customBinWidth <\/strong>= 0.01\n<strong>)<\/strong><\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\" style=\"font-size:16px\">Notice here how the <code>subset<\/code> argument is still set to <code>FALSE<\/code> whereas the <code>exportAllPlots<\/code> argument is now set to <code>TRUE<\/code>.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\" style=\"font-size:16px\">The resulting PDF files are then exported to the <code>output &gt; 3_Gating &gt; x=xParameter_value_y=yParameter_value<\/code> subdirectory.<\/p>\n\n\n<div class=\"wp-block-image\">\n<figure data-wp-context=\"{&quot;imageId&quot;:&quot;6a559a1918691&quot;}\" data-wp-interactive=\"core\/image\" data-wp-key=\"6a559a1918691\" class=\"aligncenter size-large is-resized wp-lightbox-container\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"674\" data-wp-class--hide=\"state.isContentHidden\" data-wp-class--show=\"state.isContentVisible\" data-wp-init=\"callbacks.setButtonStyles\" data-wp-on--click=\"actions.showLightbox\" data-wp-on--load=\"callbacks.setButtonStyles\" data-wp-on--pointerdown=\"actions.preloadImage\" data-wp-on--pointerenter=\"actions.preloadImageWithDelay\" data-wp-on--pointerleave=\"actions.cancelPreload\" 
data-wp-on-window--resize=\"callbacks.setButtonStyles\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" src=\"https:\/\/paul-regnier.fr\/wp-content\/uploads\/2023\/11\/PICAFlow_gating-1024x674.png\" alt=\"\" class=\"wp-image-1380\" style=\"width:768px\" srcset=\"https:\/\/paul-regnier.fr\/wp-content\/uploads\/2023\/11\/PICAFlow_gating-1024x674.png 1024w, https:\/\/paul-regnier.fr\/wp-content\/uploads\/2023\/11\/PICAFlow_gating-300x197.png 300w, https:\/\/paul-regnier.fr\/wp-content\/uploads\/2023\/11\/PICAFlow_gating-768x506.png 768w, https:\/\/paul-regnier.fr\/wp-content\/uploads\/2023\/11\/PICAFlow_gating-1536x1011.png 1536w, https:\/\/paul-regnier.fr\/wp-content\/uploads\/2023\/11\/PICAFlow_gating-18x12.png 18w, https:\/\/paul-regnier.fr\/wp-content\/uploads\/2023\/11\/PICAFlow_gating.png 1577w\" \/><button\n\t\t\tclass=\"lightbox-trigger\"\n\t\t\ttype=\"button\"\n\t\t\taria-haspopup=\"dialog\"\n\t\t\tdata-wp-bind--aria-label=\"state.thisImage.triggerButtonAriaLabel\"\n\t\t\tdata-wp-init=\"callbacks.initTriggerButton\"\n\t\t\tdata-wp-on--click=\"actions.showLightbox\"\n\t\t\tdata-wp-style--right=\"state.thisImage.buttonRight\"\n\t\t\tdata-wp-style--top=\"state.thisImage.buttonTop\"\n\t\t>\n\t\t\t<svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" width=\"12\" height=\"12\" fill=\"none\" viewbox=\"0 0 12 12\">\n\t\t\t\t<path fill=\"#fff\" d=\"M2 0a2 2 0 0 0-2 2v2h1.5V2a.5.5 0 0 1 .5-.5h2V0H2Zm2 10.5H2a.5.5 0 0 1-.5-.5V8H0v2a2 2 0 0 0 2 2h2v-1.5ZM8 12v-1.5h2a.5.5 0 0 0 .5-.5V8H12v2a2 2 0 0 1-2 2H8Zm2-12a2 2 0 0 1 2 2v2h-1.5V2a.5.5 0 0 0-.5-.5H8V0h2Z\" \/>\n\t\t\t<\/svg>\n\t\t<\/button><\/figure>\n<\/div>\n\n\n<p class=\"wp-block-paragraph\" style=\"font-size:16px\"><strong>Figure 4 &#8211; A B cells-restricted gate on 6 samples from the dataset<\/strong> <strong>(click on the image to open in fullscreen)<\/strong>.<\/p>\n\n\n\n<p class=\"has-background wp-block-paragraph\" style=\"background-color:#c6cbe1;font-size:16px\"><strong>Please note that the 
<code>gateData()<\/code> function also provides a <code>customBinWidth<\/code> argument, which allows users to manually tune the bin width used when generating X\/Y plots. The default value can sometimes lead to very few displayed squares, which makes the plot hard to read. Please read the documentation of the <code>gateData()<\/code> function to learn more about this argument and how to properly set its value.<\/strong><\/p>\n\n\n\n<p class=\"wp-block-paragraph\" style=\"font-size:16px\">If the gate does not fit some samples well, it can even be redrawn independently and iteratively for these specific samples thanks to the <code>redrawGate<\/code> and <code>specificGates<\/code> arguments of the <code><strong>gateData()<\/strong><\/code> function:<\/p>\n\n\n\n<pre class=\"wp-block-code\" style=\"font-size:16px\"><code><em># Redraw the gate iteratively for some selected samples\n<\/em>\nassociatedInfos = <strong>gateData(<\/strong>\n    <strong>flowset<\/strong> = dataUngated,\n    <strong>sampleToPlot<\/strong> = totalGatingParameters&#91;&#91;gateName_value]]$samplesToUse_value,\n    <strong>xParameter<\/strong> = totalGatingParameters&#91;&#91;gateName_value]]$xParameter_value,\n    <strong>yParameter<\/strong> = totalGatingParameters&#91;&#91;gateName_value]]$yParameter_value,\n    <strong>xlim<\/strong> = totalGatingParameters&#91;&#91;gateName_value]]$xlim_value,\n    <strong>ylim<\/strong> = totalGatingParameters&#91;&#91;gateName_value]]$ylim_value,\n    <strong>inverseGating<\/strong> = totalGatingParameters&#91;&#91;gateName_value]]$inverseGating_value,\n    <strong>gateName<\/strong> = totalGatingParameters&#91;&#91;gateName_value]]$gateName_value,\n    <strong>subset<\/strong> = FALSE,\n    <strong>exportAllPlots<\/strong> = TRUE,\n    <strong>redrawGate<\/strong> = TRUE,\n    <strong>specificGates<\/strong> = c(11, 13, 14),\n    <strong>gatingset<\/strong> = associatedInfos$gatingset,\n    <strong>generatedGates<\/strong> = NULL,\n    <strong>customBinWidth 
<\/strong>= 0.01\n<strong>)<\/strong><\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\" style=\"font-size:16px\">Finally, once the gate fits every sample satisfactorily, one can actually apply it to the samples. The function then returns a list (named <code>Gate1<\/code> here) containing the gated <code>flowSet<\/code> along with additional information:<\/p>\n\n\n\n<pre class=\"wp-block-code\" style=\"font-size:16px\"><code><em># Apply the gate to the samples<\/em>\n\nGate1 = <strong>gateData(<\/strong>\n    <strong>flowset<\/strong> = dataUngated,\n    <strong>sampleToPlot<\/strong> = totalGatingParameters&#91;&#91;gateName_value]]$samplesToUse_value,\n    <strong>xParameter<\/strong> = totalGatingParameters&#91;&#91;gateName_value]]$xParameter_value,\n    <strong>yParameter<\/strong> = totalGatingParameters&#91;&#91;gateName_value]]$yParameter_value,\n    <strong>xlim<\/strong> = totalGatingParameters&#91;&#91;gateName_value]]$xlim_value,\n    <strong>ylim<\/strong> = totalGatingParameters&#91;&#91;gateName_value]]$ylim_value,\n    <strong>inverseGating<\/strong> = totalGatingParameters&#91;&#91;gateName_value]]$inverseGating_value,\n    <strong>gateName<\/strong> = totalGatingParameters&#91;&#91;gateName_value]]$gateName_value,\n    <strong>subset<\/strong> = TRUE,\n    <strong>exportAllPlots<\/strong> = TRUE,\n    <strong>redrawGate<\/strong> = FALSE,\n    <strong>specificGates<\/strong> = NULL,\n    <strong>gatingset<\/strong> = associatedInfos$gatingset,\n    <strong>generatedGates<\/strong> = associatedInfos$generatedGates,\n    <strong>customBinWidth <\/strong>= 0.01\n<strong>)<\/strong><\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\" style=\"font-size:16px\">Notice how both the <code>subset<\/code> and <code>exportAllPlots<\/code> arguments are now set to <code>TRUE<\/code>, which is why the result of the <code><strong>gateData()<\/strong><\/code> function is assigned to a new variable, here <code>Gate1<\/code>.<\/p>\n\n\n\n<p 
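class=\"wp-block-paragraph\" style=\"font-size:16px\">Before going further, it can be useful to check what this last call returned. The following is only a minimal sketch relying on base R functions: it assumes that the returned object is a list containing the <code>flowset<\/code>, <code>summary<\/code> and <code>generatedGates<\/code> elements referred to in this tutorial, and <code>expectedElements<\/code> is an illustrative variable name:<\/p>\n\n\n\n<pre class=\"wp-block-code\" style=\"font-size:16px\"><code><em># Illustrative check only, to be run right after the gateData() call above\n# Assumption: Gate1 is a list with flowset, summary and generatedGates elements\n<\/em>\nexpectedElements = <strong>c(<\/strong>\"flowset\", \"summary\", \"generatedGates\"<strong>)<\/strong>\n\nif (<strong>exists(<\/strong>\"Gate1\"<strong>)<\/strong>) {\n    <em># Expected elements that are missing from Gate1 (character(0) if none)\n<\/em>    <strong>print(setdiff(<\/strong>expectedElements, <strong>names(<\/strong>Gate1<strong>)))<\/strong>\n\n    <em># Compact preview of the gating statistics\n<\/em>    <strong>str(<\/strong>Gate1$summary<strong>)<\/strong>\n}<\/code><\/pre>\n\n\n\n<p 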
class=\"wp-block-paragraph\" style=\"font-size:16px\">One can easily notice here that a single gate can be successfully applied to all the samples of the dataset (or at least to a vast majority of them). This is one consequence of the previously applied data normalization step, which greatly helps to reduce unwanted inter- and intra-sample heterogeneity.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\" style=\"font-size:16px\">Next, we add the information about the generated gates to the <code>totalGatingParameters<\/code> list:<\/p>\n\n\n\n<pre class=\"wp-block-code\" style=\"font-size:16px\"><code>totalGatingParameters&#91;&#91;gateName_value]]$generatedGates = Gate1$generatedGates<\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\" style=\"font-size:16px\">Then, all that remains is to add to the <code>totalStats<\/code> list the related statistics saved in the <code>summary<\/code> element of <code>Gate1<\/code>:<\/p>\n\n\n\n<pre class=\"wp-block-code\" style=\"font-size:16px\"><code>totalStats&#91;&#91;gateName_value]] = Gate1$summary<\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\" style=\"font-size:16px\">After all that, the <code>Gate1<\/code> list can finally be replaced with the <code>flowSet<\/code> content of <code>Gate1<\/code>:<\/p>\n\n\n\n<pre class=\"wp-block-code\" style=\"font-size:16px\"><code><em># Replace Gate1 with the flowset content of Gate1\n<\/em>\nGate1 = Gate1$flowset<\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\" style=\"font-size:16px\">From this point, users have 2 choices: either continue the workflow if only 1 gate is needed, or repeat the gating process with another gate (either parallel to the first one or within the cells that remained after the first gating). To achieve this, simply duplicate the code above and rename the <code>Gate1<\/code> variable to <code>Gate2<\/code>, and so on:<\/p>\n\n\n\n<pre class=\"wp-block-code\" style=\"font-size:16px\"><code><em># Do not directly run this code! 
It is supplied only for illustration and needs to be adapted for the current dataset\n<\/em>\n<em># Please note that the flowset argument can actually be set to any gate already performed (Gate1 for instance), if necessary\n<\/em>\n<em># Define gating elements for the current gate\n<\/em>\ngateName_value = \"XXX\"\n\ntotalGatingParameters&#91;&#91;gateName_value]] = <strong>list(<\/strong>\n    <strong>xParameter_value<\/strong> = \"XXX_XXX\",\n    <strong>yParameter_value<\/strong> = \"XXX_XXX\",\n    <strong>xlim_value<\/strong> = <strong>c(<\/strong>X, X<strong>)<\/strong>,\n    <strong>ylim_value<\/strong> = <strong>c(<\/strong>X, X<strong>)<\/strong>,\n    <strong>samplesToUse_value<\/strong> = <strong>c(<\/strong>1:6<strong>)<\/strong>,\n    <strong>samplesPerPage_value<\/strong> = 6,\n    <strong>inverseGating_value<\/strong> = FALSE,\n    <strong>gateName_value<\/strong> = gateName_value\n<strong>)<\/strong>\n\n<em># Interactively generate a gate, then apply it to the samples, without subset nor PDF plots exportation<\/em>\n\nassociatedInfos = <strong>gateData(<\/strong>\n    <strong>flowset<\/strong> = totalFlowset,\n    <strong>sampleToPlot<\/strong> = totalGatingParameters&#91;&#91;gateName_value]]$samplesToUse_value,\n    <strong>xParameter<\/strong> = totalGatingParameters&#91;&#91;gateName_value]]$xParameter_value,\n    <strong>yParameter<\/strong> = totalGatingParameters&#91;&#91;gateName_value]]$yParameter_value,\n    <strong>xlim<\/strong> = totalGatingParameters&#91;&#91;gateName_value]]$xlim_value,\n    <strong>ylim<\/strong> = totalGatingParameters&#91;&#91;gateName_value]]$ylim_value,\n    <strong>inverseGating<\/strong> = totalGatingParameters&#91;&#91;gateName_value]]$inverseGating_value,\n    <strong>gateName<\/strong> = totalGatingParameters&#91;&#91;gateName_value]]$gateName_value,\n    <strong>subset<\/strong> = FALSE,\n    <strong>exportAllPlots<\/strong> = FALSE,\n    <strong>redrawGate<\/strong> = FALSE,\n    
<strong>specificGates<\/strong> = NULL,\n    <strong>gatingset<\/strong> = NULL,\n    <strong>generatedGates<\/strong> = NULL,\n    <strong>customBinWidth <\/strong>= 0.01\n<strong>)<\/strong>\n\n<em># Apply the gate to the samples, without subset but with PDF plots exportation\n<\/em>\nassociatedInfos = <strong>gateData(<\/strong>\n    <strong>flowset<\/strong> = totalFlowset,\n    <strong>sampleToPlot<\/strong> = totalGatingParameters&#91;&#91;gateName_value]]$samplesToUse_value,\n    <strong>xParameter<\/strong> = totalGatingParameters&#91;&#91;gateName_value]]$xParameter_value,\n    <strong>yParameter<\/strong> = totalGatingParameters&#91;&#91;gateName_value]]$yParameter_value,\n    <strong>xlim<\/strong> = totalGatingParameters&#91;&#91;gateName_value]]$xlim_value,\n    <strong>ylim<\/strong> = totalGatingParameters&#91;&#91;gateName_value]]$ylim_value,\n    <strong>inverseGating<\/strong> = totalGatingParameters&#91;&#91;gateName_value]]$inverseGating_value,\n    <strong>gateName<\/strong> = totalGatingParameters&#91;&#91;gateName_value]]$gateName_value,\n    <strong>subset<\/strong> = FALSE,\n    <strong>exportAllPlots<\/strong> = TRUE,\n    <strong>redrawGate<\/strong> = FALSE,\n    <strong>specificGates<\/strong> = NULL,\n    <strong>gatingset<\/strong> = associatedInfos$gatingset,\n    <strong>generatedGates<\/strong> = NULL,\n    <strong>customBinWidth <\/strong>= 0.01\n<strong>)<\/strong>\n\n<em># Redraw the gate iteratively for some selected samples\n<\/em>\nassociatedInfos = <strong>gateData(<\/strong>\n    <strong>flowset<\/strong> = totalFlowset,\n    <strong>sampleToPlot<\/strong> = totalGatingParameters&#91;&#91;gateName_value]]$samplesToUse_value,\n    <strong>xParameter<\/strong> = totalGatingParameters&#91;&#91;gateName_value]]$xParameter_value,\n    <strong>yParameter<\/strong> = totalGatingParameters&#91;&#91;gateName_value]]$yParameter_value,\n    <strong>xlim<\/strong> = totalGatingParameters&#91;&#91;gateName_value]]$xlim_value,\n    
<strong>ylim<\/strong> = totalGatingParameters&#91;&#91;gateName_value]]$ylim_value,\n    <strong>inverseGating<\/strong> = totalGatingParameters&#91;&#91;gateName_value]]$inverseGating_value,\n    <strong>gateName<\/strong> = totalGatingParameters&#91;&#91;gateName_value]]$gateName_value,\n    <strong>subset<\/strong> = FALSE,\n    <strong>exportAllPlots<\/strong> = TRUE,\n    <strong>redrawGate<\/strong> = TRUE,\n    <strong>specificGates<\/strong> = c(X),\n    <strong>gatingset<\/strong> = associatedInfos$gatingset,\n    <strong>generatedGates<\/strong> = NULL,\n    <strong>customBinWidth <\/strong>= 0.01\n<strong>)<\/strong>\n\n<em># Apply the gate to the samples<\/em>\n\nGate2 = <strong>gateData(<\/strong>\n    <strong>flowset<\/strong> = totalFlowset,\n    <strong>sampleToPlot<\/strong> = totalGatingParameters&#91;&#91;gateName_value]]$samplesToUse_value,\n    <strong>xParameter<\/strong> = totalGatingParameters&#91;&#91;gateName_value]]$xParameter_value,\n    <strong>yParameter<\/strong> = totalGatingParameters&#91;&#91;gateName_value]]$yParameter_value,\n    <strong>xlim<\/strong> = totalGatingParameters&#91;&#91;gateName_value]]$xlim_value,\n    <strong>ylim<\/strong> = totalGatingParameters&#91;&#91;gateName_value]]$ylim_value,\n    <strong>inverseGating<\/strong> = totalGatingParameters&#91;&#91;gateName_value]]$inverseGating_value,\n    <strong>gateName<\/strong> = totalGatingParameters&#91;&#91;gateName_value]]$gateName_value,\n    <strong>subset<\/strong> = TRUE,\n    <strong>exportAllPlots<\/strong> = TRUE,\n    <strong>redrawGate<\/strong> = FALSE,\n    <strong>specificGates<\/strong> = NULL,\n    <strong>gatingset<\/strong> = associatedInfos$gatingset,\n    <strong>generatedGates<\/strong> = associatedInfos$generatedGates,\n    <strong>customBinWidth <\/strong>= 0.01\n<strong>)<\/strong>\n\ntotalGatingParameters&#91;&#91;gateName_value]]$generatedGates = Gate2$generatedGates\ntotalStats&#91;&#91;gateName_value]] = Gate2$summary\nGate2 = 
Gate2$flowset<\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\" style=\"font-size:16px\">Once the gating strategy is complete, all that remains is to export all the gating parameters used as well as the generated gating statistics:<\/p>\n\n\n\n<pre class=\"wp-block-code\" style=\"font-size:16px\"><code><em># Export gating parameters and statistics\n<\/em>\n<strong>saveRDS(<\/strong>\n    totalGatingParameters,\n    <strong>file.path(<\/strong>\"output\", \"3_Gating\", \"gatingParameters.rds\"<strong>))<\/strong>\n\n<strong>exportGatingStatistics(<\/strong>\n    <strong>totalStats<\/strong> = totalStats,\n    <strong>filename<\/strong> = \"gatingStatistics\"\n<strong>)<\/strong><\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\" style=\"font-size:16px\">These functions will respectively output an rds file and a text file in the <code>output &gt; 3_Gating<\/code> directory.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\" style=\"font-size:16px\">Finally, the only thing left to do is to save the most deeply gated <code>GateX<\/code> variable (<code>Gate1<\/code> in our example, but it can be any gate that users want to carry forward), clean up the variables that are no longer needed and then proceed to the next steps:<\/p>\n\n\n\n<pre class=\"wp-block-code\" style=\"font-size:16px\"><code><em># Export the gated rds file and clean up\n<\/em>\n<strong>saveRDS(<\/strong>\n    <strong>object<\/strong> = Gate1,\n    <strong>file<\/strong> = <strong>file.path(<\/strong>\"rds\", \"3_Gated.rds\"<strong>)<\/strong>\n<strong>)<\/strong>\n\n<strong>rm(<\/strong>Gate1<strong>)<\/strong>\n<strong>rm(<\/strong>totalStats<strong>)<\/strong>\n<strong>rm(<\/strong>dataUngated<strong>)<\/strong>\n<strong>rm(<\/strong>totalGatingParameters<strong>)<\/strong>\n<strong>unlink(file.path(<\/strong>\"rds\", \"globalGate_coordinates.rds\"<strong>))<\/strong>\n<strong>unlink(file.path(<\/strong>\"rds\", \"specialGate_coordinates.rds\"<strong>))<\/strong>\n<strong>gc()<\/strong><\/code><\/pre>\n\n\n\n<p 
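class=\"wp-block-paragraph\" style=\"font-size:16px\">As an optional sanity check, one may want to confirm that the exported file exists and can be read back. This is only a minimal sketch using base R functions, not a feature of <code>PICAFlow<\/code>; <code>gatedFile<\/code> and <code>gatedCheck<\/code> are illustrative variable names:<\/p>\n\n\n\n<pre class=\"wp-block-code\" style=\"font-size:16px\"><code><em># Illustrative check only: confirm that the exported rds file exists and can be read back\n<\/em>\ngatedFile = <strong>file.path(<\/strong>\"rds\", \"3_Gated.rds\"<strong>)<\/strong>\n\nif (<strong>file.exists(<\/strong>gatedFile<strong>)<\/strong>) {\n    gatedCheck = <strong>readRDS(<\/strong><strong>file<\/strong> = gatedFile<strong>)<\/strong>\n\n    <em># Display the class of the reloaded object\n<\/em>    <strong>print(class(<\/strong>gatedCheck<strong>))<\/strong>\n\n    <em># Remove the temporary copy to free RAM\n<\/em>    <strong>rm(<\/strong>gatedCheck<strong>)<\/strong>\n}<\/code><\/pre>\n\n\n\n<p 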
class=\"has-background wp-block-paragraph\" style=\"background-color:#c6cbe1;font-size:16px\"><strong>Please note that to ensure the continuity of the workflow, we recommend saving the final rds file as <code>3_Gated.rds<\/code> in the <code>rds<\/code> directory.<\/strong><\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"6\" style=\"font-size:28px\"><strong>6) Downsample, rescale and split data<\/strong><\/h2>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"6.1\" style=\"font-size:22px\"><strong>6.1) Downsample and rescale data<\/strong><\/h3>\n\n\n\n<p class=\"wp-block-paragraph\" style=\"font-size:16px\">The next step of the workflow is to create a downsampled dataset out of the full dataset, which will serve as input for the upcoming UMAP dimensionality reduction analysis. This ensures that any imbalance between groups (regarding their number of samples) and\/or between samples (regarding their number of cells) will not interfere with the dimensionality reduction approach that will be performed in the next step. 
More precisely, in the downsampled dataset, all groups should contribute equally, and each sample should contribute equally within a given group.<\/p>\n\n\n\n<p class=\"has-background wp-block-paragraph\" style=\"background-color:#c6cbe1;font-size:16px\"><strong>Please note that this step also transforms the dataset from a <code>flowSet<\/code> contained in an rds file to actual tabular data embedded in an rds file.<\/strong><\/p>\n\n\n\n<p class=\"wp-block-paragraph\" style=\"font-size:16px\">Before proceeding to the downsampling, the dataset must be loaded and several pieces of information, such as the group of each sample and the final parameters to keep, must be defined, as well as other parameters that will be used by the <code><strong>poolData()<\/strong><\/code> function:<\/p>\n\n\n\n<pre class=\"wp-block-code\" style=\"font-size:16px\"><code><em># Load data\n<\/em>\ndata = <strong>readRDS(<\/strong>\n    <strong>file<\/strong> = <strong>file.path(<\/strong>\"rds\", \"3_Gated.rds\"<strong>)<\/strong>\n<strong>)<\/strong>\n\n<em># Regular expression-assisted groups extraction from sample names\n<\/em>\nextractedGroups = <strong>gsub(<\/strong>\"^(.+)_Group-(.+)_Sample-(.+)$\", \"\\\\2\", data@phenoData@data$name<strong>)<\/strong>\n\n<em># Recall the marker names\n<\/em>\n<strong>flowCore::markernames(<\/strong>data<strong>)<\/strong>\n\n<em># Define the markers that will remain in the pooled dataset\n<\/em>\nparametersToKeepFinal = <strong>c(<\/strong>\n    \"FCRL5_APC\",\n    \"CD24_APCCy7\",\n    \"CD38_AlexaFluor700\",\n    \"CXCR5_BV421\",\n    \"IgD_BV510\",\n    \"G6_BV650\",\n    \"FCRL3_BV711\",\n    \"CD27_BV786\",\n    \"CD95_FITC\",\n    \"IgM_PE\",\n    \"CD11c_PECy5\",\n    \"Tbet_PECy55\",\n    \"CD21_PECy7\"<strong>)<\/strong>\n\n<em># Define the values that will be used for data downsampling\n<\/em>\ndownsampleMinEvents_value = 20000\nmaxCellsNb = 750000\nestimateThreshold_value = FALSE\n\n<em># Perform data downsampling and 
restructuring\n<\/em>\npoolData = <strong>poolData(<\/strong>\n    <strong>flowSet<\/strong> = data,\n    <strong>groupVector<\/strong> = extractedGroups,\n    <strong>parametersToKeep<\/strong> = parametersToKeepFinal,\n    <strong>downsampleMinEvents<\/strong> = downsampleMinEvents_value,\n    <strong>rescale<\/strong> = TRUE,\n    <strong>rescale_min<\/strong> = 1,\n    <strong>rescale_max<\/strong> = 10,\n    <strong>maxCellsNb<\/strong> = maxCellsNb,\n    <strong>estimateThreshold<\/strong> = estimateThreshold_value,\n    <strong>coresNumber<\/strong> = 2\n<strong>)<\/strong><\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\" style=\"font-size:16px\">Concretely, this function splits the samples into groups (here based on their respective names), extracts the parameters of interest, downsamples the dataset, rescales the data and restructures it into a tabular form.<\/p>\n\n\n\n<p class=\"has-background wp-block-paragraph\" style=\"background-color:#c6cbe1;font-size:16px\"><strong>Note: A cell which is not selected to be part of the downsampled dataset is NOT actually discarded. Downsampling is simply represented by a flag within the final data table, which states whether the cell is part of the downsampled dataset or not.<\/strong><\/p>\n\n\n\n<p class=\"has-background wp-block-paragraph\" style=\"background-color:#c6cbe1;font-size:16px\"><strong>Note 2: The <strong><code>downsampleMinEvents<\/code><\/strong> argument of the <code>poolData()<\/code> function represents the minimum number of cells required for a sample to be included in the downsampled dataset. 
This value is totally unrelated to the <code>downsample<\/code> argument of the <code>determineOptimalUMAPParameters()<\/code> function that you will see in the 7.2) section, which rather <strong>represents the number of cells to pick from the downsampled dataset in order to determine the best <strong><code>n_neighbors<\/code><\/strong> and <code>min_dist<\/code> UMAP-related hyperparameters<\/strong>. These values can of course be set independently from each other.<\/strong><\/p>\n\n\n\n<p class=\"has-background wp-block-paragraph\" style=\"background-color:#c6cbe1;font-size:16px\"><strong>Note 3: Users can also let the function estimate the most appropriate threshold for the <code>downsampleMinEvents<\/code> parameter, by setting the <code>estimateThreshold<\/code> argument to <code>TRUE<\/code>. In this case, the function will not output the dataset, but will rather export a PDF file named <code>CutThreshold-vs-FinalDatasetCellNumber.pdf<\/code> in the <code>output &gt; 4_Downsampling<\/code> directory, showing the final number of cells obtained in the downsampled dataset according to the <code>downsampleMinEvents<\/code> threshold used. Once an appropriate threshold is selected, users can pass it to the <code>poolData()<\/code> function and set the <code>estimateThreshold<\/code> argument to <code>FALSE<\/code>.<\/strong><\/p>\n\n\n\n<p class=\"has-background wp-block-paragraph\" style=\"background-color:#c6cbe1;font-size:16px\"><strong>Note 4: Rescaling is performed independently for each parameter by applying the following function: <code>f(x) = (((rescale_max - rescale_min)\/(max_old - min_old)) * (x - max_old)) + rescale_max<\/code> where <code>min_old<\/code> and <code>max_old<\/code> denote the minimum and maximum of the parameter before rescaling, and <code>rescale_min<\/code> and <code>rescale_max<\/code> can be adjusted by users. If needed (but this is not recommended), rescaling can be disabled by setting the <code>rescale<\/code> argument to <code>FALSE<\/code>. 
Rescaling was implemented to prevent highly expressed parameters from overwhelming weakly expressed ones simply because of their magnitude<\/strong>.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\" style=\"font-size:16px\">Once the downsampled data is generated, all that remains is to save it, along with the associated downsampling log, and free up the RAM:<\/p>\n\n\n\n<pre class=\"wp-block-code\" style=\"font-size:16px\"><code><em># Export downsampling data and log\n<\/em>\n<strong>exportDownsamplingOutput(<\/strong>\n    <strong>poolData<\/strong> = poolData\n<strong>)<\/strong>\n\n<em># Clean up\n<\/em>\n<strong>rm(<\/strong>data<strong>)<\/strong>\n<strong>rm(<\/strong>poolData<strong>)<\/strong>\n<strong>gc()<\/strong><\/code><\/pre>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"6.2\" style=\"font-size:22px\"><strong>6.2) Split data (optional)<\/strong><\/h3>\n\n\n\n<p class=\"wp-block-paragraph\" style=\"font-size:16px\">At this point, users can generate two subdatasets from the downsampled dataset. Of note, this step is not mandatory. It is typically used when one wants to generate distinct training and validation datasets for downstream analysis:<\/p>\n\n\n\n<pre class=\"wp-block-code\" style=\"font-size:16px\"><code><em># Define the proportion of the training dataset\n<\/em>\ntrainingDatasetProportion_value = 0.75\n\n<em># Split the full downsampled dataset\n<\/em>\n<strong>splitDataset(<\/strong>\n    <strong>trainingDatasetProportion<\/strong> = trainingDatasetProportion_value\n<strong>)<\/strong><\/code><\/pre>\n\n\n\n<p class=\"has-background wp-block-paragraph\" style=\"background-color:#c6cbe1;font-size:16px\"><strong>Please note that this function does not directly split the samples between the two subdatasets, but rather the cells themselves. 
This means that, in the end, each subdataset will contain the same number of samples as the original dataset, but will not contain the same cells: some will be assigned to the training subdataset, whereas the others will be assigned to the validation subdataset, according to the specified <code>trainingDatasetProportion<\/code> argument (the fraction of cells assigned to the training subdataset, between 0 and 1).<\/strong><\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"7\" style=\"font-size:28px\"><strong>7) UMAP dimensionality reduction<\/strong><\/h2>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"7.1\" style=\"font-size:22px\"><strong>7.1) Initial preparation<\/strong><\/h3>\n\n\n\n<p class=\"wp-block-paragraph\" style=\"font-size:16px\">The next step to perform is the dimensionality reduction analysis, here using the UMAP algorithm. Users can select which downsampled dataset to use (<code>datasetToUse = \"full\"<\/code> for the full downsampled dataset, <code>datasetToUse = \"training\"<\/code> for the training downsampled subdataset or <code>datasetToUse = \"validation\"<\/code> for the validation downsampled subdataset):<\/p>\n\n\n\n<pre class=\"wp-block-code\" style=\"font-size:16px\"><code><em># Define the dataset to use for UMAP computation\n<\/em>\ndatasetToUse_value = \"full\"\n\n<em># Open the selected dataset\n<\/em>\ndata = <strong>openDownsampledData(<\/strong>\n    <strong>datasetToUse<\/strong> = datasetToUse_value\n<strong>)<\/strong><\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\" style=\"font-size:16px\">Then, the cells that were labelled as downsampled are separated from the other ones:<\/p>\n\n\n\n<pre class=\"wp-block-code\" style=\"font-size:16px\"><code><em># Split downsampled and not downsampled cells\n<\/em>\ndataSampled = data&#91;data$state == \"sampled\",]\ndataNotSampled = data&#91;data$state == \"notSampled\",]\n\n<em># Create a new variable from downsampled cells\n<\/em>\ndataSampled_umap = dataSampled\n\n<em># Clean 
up\n<\/em>\n<strong>rm(<\/strong>data<strong>)<\/strong>\n<strong>gc()<\/strong><\/code><\/pre>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"7.2\" style=\"font-size:22px\"><strong>7.2) Determination of optimal UMAP hyperparameters<\/strong><\/h3>\n\n\n\n<p class=\"wp-block-paragraph\" style=\"font-size:16px\">The first UMAP step is run several times on the downsampled cells (or, if users consider them too numerous, on a subset of these cells whose size is set through the <code>downsample<\/code> argument), using different values of the UMAP-specific hyperparameters <code>n_neighbors<\/code> and <code>min_dist<\/code>. Users can supply several values for each hyperparameter, and the resulting UMAP plots are written to a PDF file (see <strong>Figure 5<\/strong> below):<\/p>\n\n\n\n<pre class=\"wp-block-code\" style=\"font-size:16px\"><code><em># Define the number of cells to pick from the downsampled dataset for hyperparameter testing\n<\/em>\ndownsample_number = 20000\n\n<em># Define UMAP hyperparameters to test\n<\/em>\nnNeighborsToTest = <strong>c(<\/strong>\n    <strong>round(<\/strong>downsample_number\/256<strong>)<\/strong>,\n    <strong>round(<\/strong>downsample_number\/128<strong>)<\/strong>,\n    <strong>round(<\/strong>downsample_number\/64<strong>)<\/strong>,\n    <strong>round(<\/strong>downsample_number\/32<strong>)<\/strong>,\n    <strong>round(<\/strong>downsample_number\/16<strong>)<\/strong>\n<strong>)<\/strong>\n\nminDistToTest = <strong>c(<\/strong>0, 0.1, 0.2<strong>)<\/strong>\n\n<em># Run UMAP analysis on every pair of defined hyperparameters\n<\/em>\n<strong>determineOptimalUMAPParameters(<\/strong>\n    <strong>data<\/strong> = dataSampled,\n    <strong>nNeighborsToTest<\/strong> = nNeighborsToTest,\n    <strong>minDistToTest<\/strong> = minDistToTest,\n    <strong>downsample<\/strong> = downsample_number,\n    <strong>datasetFolder<\/strong> = datasetToUse_value\n<strong>)<\/strong><\/code><\/pre>\n\n\n<div 
class=\"wp-block-image\">\n<figure data-wp-context=\"{&quot;imageId&quot;:&quot;6a559a191bc2c&quot;}\" data-wp-interactive=\"core\/image\" data-wp-key=\"6a559a191bc2c\" class=\"aligncenter size-large is-resized wp-lightbox-container\"><img loading=\"lazy\" decoding=\"async\" width=\"611\" height=\"1024\" data-wp-class--hide=\"state.isContentHidden\" data-wp-class--show=\"state.isContentVisible\" data-wp-init=\"callbacks.setButtonStyles\" data-wp-on--click=\"actions.showLightbox\" data-wp-on--load=\"callbacks.setButtonStyles\" data-wp-on--pointerdown=\"actions.preloadImage\" data-wp-on--pointerenter=\"actions.preloadImageWithDelay\" data-wp-on--pointerleave=\"actions.cancelPreload\" data-wp-on-window--resize=\"callbacks.setButtonStyles\" sizes=\"auto, (max-width: 611px) 100vw, 611px\" src=\"https:\/\/paul-regnier.fr\/wp-content\/uploads\/2023\/11\/PICAFlow_UMAP_parametersSelection-611x1024.png\" alt=\"\" class=\"wp-image-1383\" style=\"width:768px\" srcset=\"https:\/\/paul-regnier.fr\/wp-content\/uploads\/2023\/11\/PICAFlow_UMAP_parametersSelection-611x1024.png 611w, https:\/\/paul-regnier.fr\/wp-content\/uploads\/2023\/11\/PICAFlow_UMAP_parametersSelection-179x300.png 179w, https:\/\/paul-regnier.fr\/wp-content\/uploads\/2023\/11\/PICAFlow_UMAP_parametersSelection-7x12.png 7w, https:\/\/paul-regnier.fr\/wp-content\/uploads\/2023\/11\/PICAFlow_UMAP_parametersSelection.png 703w\" \/><button\n\t\t\tclass=\"lightbox-trigger\"\n\t\t\ttype=\"button\"\n\t\t\taria-haspopup=\"dialog\"\n\t\t\tdata-wp-bind--aria-label=\"state.thisImage.triggerButtonAriaLabel\"\n\t\t\tdata-wp-init=\"callbacks.initTriggerButton\"\n\t\t\tdata-wp-on--click=\"actions.showLightbox\"\n\t\t\tdata-wp-style--right=\"state.thisImage.buttonRight\"\n\t\t\tdata-wp-style--top=\"state.thisImage.buttonTop\"\n\t\t>\n\t\t\t<svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" width=\"12\" height=\"12\" fill=\"none\" viewbox=\"0 0 12 12\">\n\t\t\t\t<path fill=\"#fff\" d=\"M2 0a2 2 0 0 0-2 2v2h1.5V2a.5.5 0 0 1 
.5-.5h2V0H2Zm2 10.5H2a.5.5 0 0 1-.5-.5V8H0v2a2 2 0 0 0 2 2h2v-1.5ZM8 12v-1.5h2a.5.5 0 0 0 .5-.5V8H12v2a2 2 0 0 1-2 2H8Zm2-12a2 2 0 0 1 2 2v2h-1.5V2a.5.5 0 0 0-.5-.5H8V0h2Z\" \/>\n\t\t\t<\/svg>\n\t\t<\/button><\/figure>\n<\/div>\n\n\n<p class=\"wp-block-paragraph\" style=\"font-size:16px\"><strong>Figure 5 &#8211; UMAP plots to determine the optimal values of <code>n_neighbors<\/code> and <code>min_dist<\/code> parameters <strong>(click on the image to open in fullscreen)<\/strong>.<\/strong><\/p>\n\n\n\n<p class=\"has-background wp-block-paragraph\" style=\"background-color:#c6cbe1;font-size:16px\"><strong>Note: The <strong><code>n_neighbors<\/code> and <code>min_dist<\/code><\/strong> values specified in the previous code are for illustration only. Users can actually specify any values they want to test. As there are no fixed values that will suit all datasets, users are invited to read the <a href=\"https:\/\/umap-learn.readthedocs.io\/en\/latest\/parameters.html\">related documentation of the <code>UMAP<\/code> package<\/a> in order to understand what these values refer to and how to set them appropriately.<\/strong><\/p>\n\n\n\n<p class=\"wp-block-paragraph\" style=\"font-size:16px\">Users should finally open the resulting PDF and manually choose the pair of <code>n_neighbors<\/code> and <code>min_dist<\/code> hyperparameters which leads to the best\/clearest separation of cells. Regarding the example provided here, we can see that increasing the <code>n_neighbors<\/code> value (from top to bottom) does not seem to greatly affect the UMAP output. Therefore, we arbitrarily choose <code>n_neighbors = 312<\/code> here (the middle value of the ones tested). Likewise, increasing the <code>min_dist<\/code> value (from left to right) seems to slightly increase the spreading of the cells in the 2D space, which may help to better distinguish the cell clusters. 
Therefore, we arbitrarily choose <code>min_dist = 0.2<\/code> here.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\" style=\"font-size:16px\">If necessary, users can of course rerun the previous code with other <code>n_neighbors<\/code> and <code>min_dist<\/code> values to refine them further.<\/p>\n\n\n\n<p class=\"has-background wp-block-paragraph\" style=\"background-color:#c6cbe1;font-size:16px\"><strong>Note: The <strong><strong><code>downsample<\/code> argument of the <code>determineOptimalUMAPParameters()<\/code> function<\/strong><\/strong> represents the number of cells to pick from the downsampled dataset in order to determine the best <strong><strong><code>n_neighbors<\/code><\/strong><\/strong> and <code>min_dist<\/code> UMAP-related hyperparameters. This value is totally unrelated to the <code>downsampleMinEvents<\/code> argument from the <code>poolData()<\/code> function that you already saw in the 6.1) section, which rather <strong>represents the minimum number of cells required for a sample to be included in the downsampled dataset<\/strong>. 
These values can of course be set independently from each other.<\/strong><\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"7.3\" style=\"font-size:22px\"><strong>7.3) Generate UMAP model on downsampled dataset<\/strong><\/h3>\n\n\n\n<p class=\"wp-block-paragraph\" style=\"font-size:16px\">Once users have selected the optimal <code>n_neighbors<\/code> and <code>min_dist<\/code> hyperparameters, we can proceed to the \u00ab\u00a0definitive\u00a0\u00bb UMAP computation on the whole downsampled subdataset:<\/p>\n\n\n\n<pre class=\"wp-block-code\" style=\"font-size:16px\"><code><em># Define the number of cores to use\n<\/em>\ncoresNumber = 8\n\n<em># Define the actual UMAP parameters to use (values chosen in section 7.2)\n<\/em>\nmin_dist_value = 0.2\nn_neighbors_value = 312\n\n<em># Apply UMAP dimensionality reduction to the whole downsampled dataset\n<\/em>\ndataSampled_umap_out = <strong>UMAP_downsampledDataset(<\/strong>\n    <strong>data<\/strong> = dataSampled,\n    <strong>n_threads<\/strong> = coresNumber,\n    <strong>min_dist<\/strong> = min_dist_value,\n    <strong>n_neighbors<\/strong> = n_neighbors_value\n<strong>)<\/strong>\n\n<em># Extract UMAP embeddings and add them to dataSampled_umap\n<\/em>\ndataSampled_umap$UMAP_1 = dataSampled_umap_out$embedding&#91;, 1]\ndataSampled_umap$UMAP_2 = dataSampled_umap_out$embedding&#91;, 2]\n\n<em># Clean up\n<\/em>\n<strong>rm(<\/strong>dataSampled<strong>)<\/strong>\n<strong>gc()<\/strong><\/code><\/pre>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"7.4\" style=\"font-size:22px\"><strong>7.4) Apply UMAP model to the remaining data<\/strong><\/h3>\n\n\n\n<p class=\"wp-block-paragraph\" style=\"font-size:16px\">Afterwards, users should apply the UMAP model previously generated on the downsampled subdataset to the remaining cells:<\/p>\n\n\n\n<pre class=\"wp-block-code\" style=\"font-size:16px\"><code><em># Apply UMAP model to the remaining data\n<\/em>\ndataNotSampled = <strong>UMAPFlowset(<\/strong>\n    <strong>data<\/strong> = dataNotSampled,\n    
<strong>model<\/strong> = dataSampled_umap_out,\n    <strong>chunksMaxSize<\/strong> = 500000,\n    <strong>n_threads<\/strong> = coresNumber\n<strong>)<\/strong><\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\" style=\"font-size:16px\">This step generates UMAP embeddings for all the cells of the dataset, regardless of its size, so that the information carried by the many remaining cells is not lost.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"7.5\" style=\"font-size:22px\"><strong>7.5) Combine and export UMAP data<\/strong><\/h3>\n\n\n\n<p class=\"wp-block-paragraph\" style=\"font-size:16px\">Finally, one can merge the UMAP embeddings obtained from the downsampled and not downsampled subdatasets and save the result in a single rds file:<\/p>\n\n\n\n<pre class=\"wp-block-code\" style=\"font-size:16px\"><code><em># Combine the UMAP data obtained from the downsampled and not downsampled datasets\n<\/em>\ndataTotal_umap = <strong>rbind(<\/strong>dataSampled_umap, dataNotSampled<strong>)<\/strong>\n<strong>rm(<\/strong>dataSampled_umap<strong>)<\/strong>\n<strong>rm(<\/strong>dataNotSampled<strong>)<\/strong>\n<strong>gc()<\/strong>\n\n<em># Export resulting UMAP data\n<\/em>\n<strong>saveRDS(<\/strong>\n    <strong>object<\/strong> = dataTotal_umap,\n    <strong>file<\/strong> = <strong>file.path(<\/strong>\"rds\", <strong>paste(<\/strong>\"5_UMAP_\", datasetToUse_value, \".rds\", <strong>sep<\/strong> = \"\"<strong>))<\/strong>\n<strong>)<\/strong><\/code><\/pre>\n\n\n\n<p class=\"has-background wp-block-paragraph\" style=\"background-color:#c6cbe1;font-size:16px\"><strong>Please note that to ensure the continuity of the <code>PICAFlow<\/code> workflow, the resulting rds file should be exported as <code>5_UMAP_\"datasetToUse\".rds<\/code> in the <code>rds<\/code> directory.<\/strong><\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"8\" style=\"font-size:28px\"><strong>8) Export FCS files<\/strong><\/h2>\n\n\n\n<p class=\"wp-block-paragraph\" 
style=\"font-size:16px\">Here, users can export fresh FCS files containing the normalized data as well as other information such as the UMAP parameters, whether or not each cell belongs to the UMAP training subset, etc.:<\/p>\n\n\n\n<pre class=\"wp-block-code\" style=\"font-size:16px\"><code><em># Export new FCS files with added parameters\n<\/em>\n<strong>exportFCS(<\/strong>\n    <strong>data<\/strong> = dataTotal_umap,\n    <strong>datasetFolder<\/strong> = datasetToUse_value\n<strong>)<\/strong>\n\n<em># Clean up\n<\/em>\n<strong>rm(<\/strong>dataTotal_umap<strong>)<\/strong>\n<strong>gc()<\/strong><\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\" style=\"font-size:16px\">This function outputs FCS files in three ways simultaneously: one file for the entire dataset, one file per sample and one file per group. It also exports a text file containing the correspondence between sample IDs and actual sample names and groups. All these files are saved to the <code>output &gt; 6_FCS<\/code> directory.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\" style=\"font-size:16px\">Users can stop the workflow here and, for instance, switch to more conventional third-party cytometry analysis software if they think unsupervised or semi-supervised determination of cell clusters is not of interest for their study. 
However, we recommend that users follow the next steps of the <code>PICAFlow<\/code> workflow in order to unravel the full potential of their dataset, which includes: cell clustering, identification\/discovery of unknown phenotypes, statistical analysis, visual representations, etc.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"9\" style=\"font-size:28px\"><strong>9) Clustering<\/strong><\/h2>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"9.1\" style=\"font-size:22px\"><strong>9.1) Test parameters normality<\/strong><\/h3>\n\n\n\n<p class=\"wp-block-paragraph\" style=\"font-size:16px\">Before performing cell clustering, users need to define which metric (mean or median) to use for the upcoming analyses and (future) cell clustering. To this end, we implemented a simple function that tests the normality of the data distribution for each parameter:<\/p>\n\n\n\n<pre class=\"wp-block-code\" style=\"font-size:16px\"><code><em># Open the dataset of interest\n<\/em>\ndata = <strong>readRDS(<\/strong>\n    <strong>file<\/strong> = <strong>file.path(<\/strong>\"rds\", <strong>paste(<\/strong>\"5_UMAP_\", datasetToUse_value, \".rds\", <strong>sep<\/strong> = \"\"<strong>))<\/strong>\n<strong>)<\/strong>\n\n<em># Check the normality of each parameter signal distribution\n<\/em>\n<strong>testDatasetNormality(<\/strong>\n    <strong>data<\/strong> = data,\n    <strong>parametersToAnalyze<\/strong> = parametersToKeepFinal,\n    <strong>datasetFolder<\/strong> = datasetToUse_value\n)\n\n                   ShapiroWilk_statistic                                                ShapiroWilk_pValue isDistributionNormal\nFCRL5_APC                      0.9744255 0.000000000000000000000000000049624313831946244490386788150715347                   No\nCD24_APCCy7                    0.9710242 0.000000000000000000000000000001195606518319042770991805779701167                   No\nCD38_AlexaFluor700             0.9960941 0.000000000300119648778022816764873836881122315389802679419517517    
               No\nCXCR5_BV421                    0.9303355 0.000000000000000000000000000000000000000000253665817914606104167                   No\nIgD_BV510                      0.9674000 0.000000000000000000000000000000032206769080820776480263839536278                   No\nG6_BV650                       0.8649523 0.000000000000000000000000000000000000000000000000000002119560403                   No\nFCRL3_BV711                    0.9797110 0.000000000000000000000000037452476868805047592776713560880352816                   No\nCD27_BV786                     0.9760377 0.000000000000000000000000000332012292779698060547072246961874953                   No\nCD95_FITC                      0.9219721 0.000000000000000000000000000000000000000000003975458606023709201                   No\nIgM_PE                         0.8410288 0.000000000000000000000000000000000000000000000000000000002491257                   No\nCD11c_PECy5                    0.9629720 0.000000000000000000000000000000000582435920560752426510323087605                   No\nTbet_PECy55                    0.9513738 0.000000000000000000000000000000000000074980961372048744078069871                   No\nCD21_PECy7                     0.9215428 0.000000000000000000000000000000000000000000003243880502481798542                   No\n<\/code><\/pre>\n\n\n\n<p class=\"has-background wp-block-paragraph\" style=\"background-color:#c6cbe1;font-size:16px\"><strong>Please note that if many parameters do not follow a normal distribution, it can be worthwhile to perform the following analyses on the median instead of the mean, the median being more robust when the normality assumption is not met and\/or when extreme values are abundant. This choice is stored in the <code>metricToUse_value<\/code> variable, which should be equal to either <code>mean<\/code> or <code>median<\/code>. 
This variable will be used in many subsequent analyses from sections 9) and 10).<\/strong><\/p>\n\n\n\n<pre class=\"wp-block-code\" style=\"font-size:16px\"><code><em># Choose the adequate metric to use in further analyses (either <\/em>`<em>mean<\/em>`<em> or <\/em>`<em>median<\/em>`<em>) <\/em>\n\nmetricToUse_value = \"median\"<\/code><\/pre>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"9.2\" style=\"font-size:22px\"><strong>9.2) Apply a clustering method<\/strong><\/h3>\n\n\n\n<p class=\"wp-block-paragraph\" style=\"font-size:16px\">This step discovers cell clusters within the dataset using one of the approaches implemented in <code>PICAFlow<\/code>. Here, users are totally free to choose any method they like among the ones that are currently implemented within <code>PICAFlow<\/code>. At the moment, we offer the possibility to use either our own approach, FlowSOM or PhenoGraph, but other methods could be added later if needed.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\" style=\"font-size:16px\">As a consequence, users should follow only one clustering method among the ones described in the 9.2 section.<\/p>\n\n\n\n<h4 class=\"wp-block-heading\" id=\"9.2.1\" style=\"font-size:20px\"><strong>9.2.1) Using hierarchical clustering + k-nearest neighbors approach<\/strong><\/h4>\n\n\n\n<p class=\"wp-block-paragraph\" style=\"font-size:16px\">This approach is the one which was initially developed when <code>PICAFlow<\/code> was created. Basically, we compute a standard hierarchical clustering analysis on a given subset of the downsampled dataset, which allows users to choose (see below) the optimal number of clusters to retain. Afterwards, we apply the clustering obtained on this subset of the downsampled dataset to the remaining cells, which were not used, by means of a k-nearest neighbors modelling method. 
Finally, using manually determined thresholds between low- and high-expressing cells for each parameter, we collapse the phenotypically close clusters to reduce the total number of clusters to a human-understandable one (typically dozens instead of hundreds or thousands).<\/p>\n\n\n\n<p class=\"has-background wp-block-paragraph\" style=\"background-color:#c6cbe1;font-size:16px\"><strong>Please note that this specific approach combines both supervised and unsupervised steps, in a way that we like to call a \u00ab\u00a0guided semi-supervised\u00a0\u00bb approach, which offers flexibility, tunability and robust cluster identification. Concretely, users will sometimes have to manually determine parameters or values that are dataset-dependent using dedicated functions and command lines already implemented in <code>PICAFlow<\/code>.<\/strong><\/p>\n\n\n\n<h5 class=\"wp-block-heading\" id=\"9.2.1.1\" style=\"font-size:18px\"><strong>9.2.1.1) Initial clustering on training dataset<\/strong><\/h5>\n\n\n\n<p class=\"wp-block-paragraph\" style=\"font-size:16px\">The next step is to identify clusters within the cells, based on their phenotype. 
To this end, we begin by setting some variables used to perform an initial clustering step on a subset of cells coming from the downsampled subdataset (called \u00ab\u00a0training\u00a0\u00bb here):<\/p>\n\n\n\n<pre class=\"wp-block-code\" style=\"font-size:16px\"><code><em># Define the parameters to use for initial clustering\n<\/em>\nsubsetDownsampled_value = 45000\nclusterMinPercentage_value = 0.25\n\n<em># Perform initial clustering on training data\n<\/em>\ninitialClusteringTraining_out = <strong>initialClusteringTraining(<\/strong>\n    <strong>data<\/strong> = data,\n    <strong>parametersToAnalyze<\/strong> = parametersToKeepFinal,\n    <strong>subsetDownsampled<\/strong> = subsetDownsampled_value,\n    <strong>clusterMinPercentage<\/strong> = clusterMinPercentage_value\n<strong>)<\/strong><\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\" style=\"font-size:16px\">Then, we need to plot the result of this function (see <strong>Figure 6<\/strong> below), which will display an interactive <code>ggplotly<\/code> plot:<\/p>\n\n\n\n<pre class=\"wp-block-code\" style=\"font-size:16px\"><code><em># Plot an interactive ggplotly plot for the initial clustering step\n<\/em>\n<strong>plotInitialClusteringTraining(<\/strong>\n    <strong>initialClustering<\/strong> = initialClusteringTraining_out\n<strong>)<\/strong><\/code><\/pre>\n\n\n<div class=\"wp-block-image\">\n<figure data-wp-context=\"{&quot;imageId&quot;:&quot;6a559a191e475&quot;}\" data-wp-interactive=\"core\/image\" data-wp-key=\"6a559a191e475\" class=\"aligncenter size-large is-resized wp-lightbox-container\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"821\" data-wp-class--hide=\"state.isContentHidden\" data-wp-class--show=\"state.isContentVisible\" data-wp-init=\"callbacks.setButtonStyles\" data-wp-on--click=\"actions.showLightbox\" data-wp-on--load=\"callbacks.setButtonStyles\" data-wp-on--pointerdown=\"actions.preloadImage\" data-wp-on--pointerenter=\"actions.preloadImageWithDelay\" 
data-wp-on--pointerleave=\"actions.cancelPreload\" data-wp-on-window--resize=\"callbacks.setButtonStyles\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" src=\"https:\/\/paul-regnier.fr\/wp-content\/uploads\/2023\/11\/PICAFlow_clusteringInitialInteractivePlot-1024x821.png\" alt=\"\" class=\"wp-image-1385\" style=\"width:768px\" srcset=\"https:\/\/paul-regnier.fr\/wp-content\/uploads\/2023\/11\/PICAFlow_clusteringInitialInteractivePlot-1024x821.png 1024w, https:\/\/paul-regnier.fr\/wp-content\/uploads\/2023\/11\/PICAFlow_clusteringInitialInteractivePlot-300x241.png 300w, https:\/\/paul-regnier.fr\/wp-content\/uploads\/2023\/11\/PICAFlow_clusteringInitialInteractivePlot-768x616.png 768w, https:\/\/paul-regnier.fr\/wp-content\/uploads\/2023\/11\/PICAFlow_clusteringInitialInteractivePlot-15x12.png 15w, https:\/\/paul-regnier.fr\/wp-content\/uploads\/2023\/11\/PICAFlow_clusteringInitialInteractivePlot.png 1312w\" \/><button\n\t\t\tclass=\"lightbox-trigger\"\n\t\t\ttype=\"button\"\n\t\t\taria-haspopup=\"dialog\"\n\t\t\tdata-wp-bind--aria-label=\"state.thisImage.triggerButtonAriaLabel\"\n\t\t\tdata-wp-init=\"callbacks.initTriggerButton\"\n\t\t\tdata-wp-on--click=\"actions.showLightbox\"\n\t\t\tdata-wp-style--right=\"state.thisImage.buttonRight\"\n\t\t\tdata-wp-style--top=\"state.thisImage.buttonTop\"\n\t\t>\n\t\t\t<svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" width=\"12\" height=\"12\" fill=\"none\" viewbox=\"0 0 12 12\">\n\t\t\t\t<path fill=\"#fff\" d=\"M2 0a2 2 0 0 0-2 2v2h1.5V2a.5.5 0 0 1 .5-.5h2V0H2Zm2 10.5H2a.5.5 0 0 1-.5-.5V8H0v2a2 2 0 0 0 2 2h2v-1.5ZM8 12v-1.5h2a.5.5 0 0 0 .5-.5V8H12v2a2 2 0 0 1-2 2H8Zm2-12a2 2 0 0 1 2 2v2h-1.5V2a.5.5 0 0 0-.5-.5H8V0h2Z\" \/>\n\t\t\t<\/svg>\n\t\t<\/button><\/figure>\n<\/div>\n\n\n<p class=\"wp-block-paragraph\" style=\"font-size:16px\"><strong>Figure 6 &#8211; Interactive plot helping to choose the best cutoff to cut the hierarchical clustering tree <strong>(click on the image to open in 
fullscreen)<\/strong>.<\/strong><\/p>\n\n\n\n<p class=\"wp-block-paragraph\" style=\"font-size:16px\">This plot shows the relation between the number of identified clusters with an abundance greater than <code>clusterMinPercentage<\/code> (x axis) and the percentage of clusters showing an abundance greater than <code>clusterMinPercentage<\/code> (y axis) according to the threshold value used to cut the hierarchical tree (<code>h<\/code>). Typically, the ID value associated with the point where the x axis reaches a maximum should be considered as optimal for the tree cutting (see <strong>Figure 7<\/strong> below).<\/p>\n\n\n<div class=\"wp-block-image\">\n<figure data-wp-context=\"{&quot;imageId&quot;:&quot;6a559a191ef38&quot;}\" data-wp-interactive=\"core\/image\" data-wp-key=\"6a559a191ef38\" class=\"aligncenter size-large is-resized wp-lightbox-container\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"821\" data-wp-class--hide=\"state.isContentHidden\" data-wp-class--show=\"state.isContentVisible\" data-wp-init=\"callbacks.setButtonStyles\" data-wp-on--click=\"actions.showLightbox\" data-wp-on--load=\"callbacks.setButtonStyles\" data-wp-on--pointerdown=\"actions.preloadImage\" data-wp-on--pointerenter=\"actions.preloadImageWithDelay\" data-wp-on--pointerleave=\"actions.cancelPreload\" data-wp-on-window--resize=\"callbacks.setButtonStyles\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" src=\"https:\/\/paul-regnier.fr\/wp-content\/uploads\/2023\/11\/PICAFlow_clusteringInitialInteractivePlot_withThreshold-1-1024x821.png\" alt=\"\" class=\"wp-image-1387\" style=\"width:768px\" srcset=\"https:\/\/paul-regnier.fr\/wp-content\/uploads\/2023\/11\/PICAFlow_clusteringInitialInteractivePlot_withThreshold-1-1024x821.png 1024w, https:\/\/paul-regnier.fr\/wp-content\/uploads\/2023\/11\/PICAFlow_clusteringInitialInteractivePlot_withThreshold-1-300x241.png 300w, 
https:\/\/paul-regnier.fr\/wp-content\/uploads\/2023\/11\/PICAFlow_clusteringInitialInteractivePlot_withThreshold-1-768x616.png 768w, https:\/\/paul-regnier.fr\/wp-content\/uploads\/2023\/11\/PICAFlow_clusteringInitialInteractivePlot_withThreshold-1-15x12.png 15w, https:\/\/paul-regnier.fr\/wp-content\/uploads\/2023\/11\/PICAFlow_clusteringInitialInteractivePlot_withThreshold-1.png 1313w\" \/><button\n\t\t\tclass=\"lightbox-trigger\"\n\t\t\ttype=\"button\"\n\t\t\taria-haspopup=\"dialog\"\n\t\t\tdata-wp-bind--aria-label=\"state.thisImage.triggerButtonAriaLabel\"\n\t\t\tdata-wp-init=\"callbacks.initTriggerButton\"\n\t\t\tdata-wp-on--click=\"actions.showLightbox\"\n\t\t\tdata-wp-style--right=\"state.thisImage.buttonRight\"\n\t\t\tdata-wp-style--top=\"state.thisImage.buttonTop\"\n\t\t>\n\t\t\t<svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" width=\"12\" height=\"12\" fill=\"none\" viewbox=\"0 0 12 12\">\n\t\t\t\t<path fill=\"#fff\" d=\"M2 0a2 2 0 0 0-2 2v2h1.5V2a.5.5 0 0 1 .5-.5h2V0H2Zm2 10.5H2a.5.5 0 0 1-.5-.5V8H0v2a2 2 0 0 0 2 2h2v-1.5ZM8 12v-1.5h2a.5.5 0 0 0 .5-.5V8H12v2a2 2 0 0 1-2 2H8Zm2-12a2 2 0 0 1 2 2v2h-1.5V2a.5.5 0 0 0-.5-.5H8V0h2Z\" \/>\n\t\t\t<\/svg>\n\t\t<\/button><\/figure>\n<\/div>\n\n\n<p class=\"wp-block-paragraph\" style=\"font-size:16px\"><strong>Figure 7 &#8211; Plot overlaying the best cutoff to cut the hierarchical clustering tree <strong>(click on the image to open in fullscreen)<\/strong>.<\/strong><\/p>\n\n\n\n<p class=\"wp-block-paragraph\" style=\"font-size:16px\">In our example, the optimal value for the tree cutting (<code>h = 3.45<\/code>) is reached at the <code>ID = 50<\/code> index value and generates a total of 544 clusters (of which 113 show a minimum abundance of 0.25%).<\/p>\n\n\n\n<h5 class=\"wp-block-heading\" id=\"9.2.1.2\" style=\"font-size:18px\"><strong>9.2.1.2) Final clustering on training dataset<\/strong><\/h5>\n\n\n\n<p class=\"wp-block-paragraph\" style=\"font-size:16px\">Once the optimal cutoff 
is identified, we can run the \u00ab\u00a0definitive\u00a0\u00bb cell clustering on the whole downsampled subdataset:<\/p>\n\n\n\n<pre class=\"wp-block-code\" style=\"font-size:16px\"><code><em># Here, the plot shows that the ID = 50 value is the best threshold\n<\/em>\ncutoff_value = 50\n\n<em># Once the cutoff is chosen, close the current plot\n<\/em>\n<strong>dev.off()<\/strong>\n\n<em># Perform final clustering on training data\n<\/em>\nfinalClusteringTraining_out = <strong>finalClusteringTraining(<\/strong>\n    <strong>initialClusteringData<\/strong> = initialClusteringTraining_out,\n    <strong>clusterMinPercentage<\/strong> = clusterMinPercentage_value,\n    <strong>cutoff<\/strong> = cutoff_value\n)<\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\" style=\"font-size:16px\">To keep a trace of this value, we can export a plot showing the cutoff colored in red on the previous interactive plot (see <strong>Figure 8<\/strong> below):<\/p>\n\n\n\n<pre class=\"wp-block-code\" style=\"font-size:16px\"><code><em># Plot the graph for the final clustering step\n<\/em>\n<strong>plotFinalClusteringTraining(<\/strong>\n    <strong>initialClustering<\/strong> = initialClusteringTraining_out,\n    <strong>finalClustering<\/strong> = finalClusteringTraining_out\n<strong>)<\/strong><\/code><\/pre>\n\n\n<div class=\"wp-block-image\">\n<figure data-wp-context=\"{&quot;imageId&quot;:&quot;6a559a191fe47&quot;}\" data-wp-interactive=\"core\/image\" data-wp-key=\"6a559a191fe47\" class=\"aligncenter size-full is-resized wp-lightbox-container\"><img loading=\"lazy\" decoding=\"async\" width=\"1092\" height=\"1094\" data-wp-class--hide=\"state.isContentHidden\" data-wp-class--show=\"state.isContentVisible\" data-wp-init=\"callbacks.setButtonStyles\" data-wp-on--click=\"actions.showLightbox\" data-wp-on--load=\"callbacks.setButtonStyles\" data-wp-on--pointerdown=\"actions.preloadImage\" data-wp-on--pointerenter=\"actions.preloadImageWithDelay\" 
data-wp-on--pointerleave=\"actions.cancelPreload\" data-wp-on-window--resize=\"callbacks.setButtonStyles\" sizes=\"auto, (max-width: 1092px) 100vw, 1092px\" src=\"https:\/\/paul-regnier.fr\/wp-content\/uploads\/2023\/11\/PICAFlow_clusteringInitialInteractivePlot_withThresholdRed.png\" alt=\"\" class=\"wp-image-1388\" style=\"aspect-ratio:1.004120879120879;object-fit:cover;width:768px\" srcset=\"https:\/\/paul-regnier.fr\/wp-content\/uploads\/2023\/11\/PICAFlow_clusteringInitialInteractivePlot_withThresholdRed.png 1092w, https:\/\/paul-regnier.fr\/wp-content\/uploads\/2023\/11\/PICAFlow_clusteringInitialInteractivePlot_withThresholdRed-300x300.png 300w, https:\/\/paul-regnier.fr\/wp-content\/uploads\/2023\/11\/PICAFlow_clusteringInitialInteractivePlot_withThresholdRed-1022x1024.png 1022w, https:\/\/paul-regnier.fr\/wp-content\/uploads\/2023\/11\/PICAFlow_clusteringInitialInteractivePlot_withThresholdRed-150x150.png 150w, https:\/\/paul-regnier.fr\/wp-content\/uploads\/2023\/11\/PICAFlow_clusteringInitialInteractivePlot_withThresholdRed-768x769.png 768w, https:\/\/paul-regnier.fr\/wp-content\/uploads\/2023\/11\/PICAFlow_clusteringInitialInteractivePlot_withThresholdRed-12x12.png 12w\" \/><button\n\t\t\tclass=\"lightbox-trigger\"\n\t\t\ttype=\"button\"\n\t\t\taria-haspopup=\"dialog\"\n\t\t\tdata-wp-bind--aria-label=\"state.thisImage.triggerButtonAriaLabel\"\n\t\t\tdata-wp-init=\"callbacks.initTriggerButton\"\n\t\t\tdata-wp-on--click=\"actions.showLightbox\"\n\t\t\tdata-wp-style--right=\"state.thisImage.buttonRight\"\n\t\t\tdata-wp-style--top=\"state.thisImage.buttonTop\"\n\t\t>\n\t\t\t<svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" width=\"12\" height=\"12\" fill=\"none\" viewbox=\"0 0 12 12\">\n\t\t\t\t<path fill=\"#fff\" d=\"M2 0a2 2 0 0 0-2 2v2h1.5V2a.5.5 0 0 1 .5-.5h2V0H2Zm2 10.5H2a.5.5 0 0 1-.5-.5V8H0v2a2 2 0 0 0 2 2h2v-1.5ZM8 12v-1.5h2a.5.5 0 0 0 .5-.5V8H12v2a2 2 0 0 1-2 2H8Zm2-12a2 2 0 0 1 2 2v2h-1.5V2a.5.5 0 0 0-.5-.5H8V0h2Z\" 
\/>\n\t\t\t<\/svg>\n\t\t<\/button><\/figure>\n<\/div>\n\n\n<p class=\"wp-block-paragraph\" style=\"font-size:16px\"><strong>Figure 8 &#8211; Plot showing the best cutoff to cut the hierarchical clustering tree<\/strong> <strong><strong>(click on the image to open in fullscreen)<\/strong><\/strong>.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\" style=\"font-size:16px\">We can also export a plot showing how the clustering of the downsampled subdataset overlays with the previously determined UMAP coordinates (see <strong>Figure 9<\/strong> below):<\/p>\n\n\n\n<pre class=\"wp-block-code\" style=\"font-size:16px\"><code><em># Extract the clustered data\n<\/em>\ndataTrainingClustered = finalClusteringTraining_out$dataTraining\n\n<em># Plot UMAP graph with final clustering projection on training dataset\n<\/em>\n<strong>plotUMAP_projectionTraining(<\/strong>\n    <strong>dataTrainingClustered<\/strong> = dataTrainingClustered,\n    <strong>datasetFolder<\/strong> = datasetToUse_value\n<strong>)<\/strong><\/code><\/pre>\n\n\n<div class=\"wp-block-image\">\n<figure data-wp-context=\"{&quot;imageId&quot;:&quot;6a559a19209c3&quot;}\" data-wp-interactive=\"core\/image\" data-wp-key=\"6a559a19209c3\" class=\"aligncenter size-large is-resized wp-lightbox-container\"><img loading=\"lazy\" decoding=\"async\" width=\"1019\" height=\"1024\" data-wp-class--hide=\"state.isContentHidden\" data-wp-class--show=\"state.isContentVisible\" data-wp-init=\"callbacks.setButtonStyles\" data-wp-on--click=\"actions.showLightbox\" data-wp-on--load=\"callbacks.setButtonStyles\" data-wp-on--pointerdown=\"actions.preloadImage\" data-wp-on--pointerenter=\"actions.preloadImageWithDelay\" data-wp-on--pointerleave=\"actions.cancelPreload\" data-wp-on-window--resize=\"callbacks.setButtonStyles\" sizes=\"auto, (max-width: 1019px) 100vw, 1019px\" src=\"https:\/\/paul-regnier.fr\/wp-content\/uploads\/2023\/11\/PICAFlow_clusteringTraining_overlayUMAP-1019x1024.png\" alt=\"\" class=\"wp-image-1389\" 
style=\"width:768px\" srcset=\"https:\/\/paul-regnier.fr\/wp-content\/uploads\/2023\/11\/PICAFlow_clusteringTraining_overlayUMAP-1019x1024.png 1019w, https:\/\/paul-regnier.fr\/wp-content\/uploads\/2023\/11\/PICAFlow_clusteringTraining_overlayUMAP-300x300.png 300w, https:\/\/paul-regnier.fr\/wp-content\/uploads\/2023\/11\/PICAFlow_clusteringTraining_overlayUMAP-150x150.png 150w, https:\/\/paul-regnier.fr\/wp-content\/uploads\/2023\/11\/PICAFlow_clusteringTraining_overlayUMAP-768x772.png 768w, https:\/\/paul-regnier.fr\/wp-content\/uploads\/2023\/11\/PICAFlow_clusteringTraining_overlayUMAP-12x12.png 12w, https:\/\/paul-regnier.fr\/wp-content\/uploads\/2023\/11\/PICAFlow_clusteringTraining_overlayUMAP.png 1092w\" \/><button\n\t\t\tclass=\"lightbox-trigger\"\n\t\t\ttype=\"button\"\n\t\t\taria-haspopup=\"dialog\"\n\t\t\tdata-wp-bind--aria-label=\"state.thisImage.triggerButtonAriaLabel\"\n\t\t\tdata-wp-init=\"callbacks.initTriggerButton\"\n\t\t\tdata-wp-on--click=\"actions.showLightbox\"\n\t\t\tdata-wp-style--right=\"state.thisImage.buttonRight\"\n\t\t\tdata-wp-style--top=\"state.thisImage.buttonTop\"\n\t\t>\n\t\t\t<svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" width=\"12\" height=\"12\" fill=\"none\" viewbox=\"0 0 12 12\">\n\t\t\t\t<path fill=\"#fff\" d=\"M2 0a2 2 0 0 0-2 2v2h1.5V2a.5.5 0 0 1 .5-.5h2V0H2Zm2 10.5H2a.5.5 0 0 1-.5-.5V8H0v2a2 2 0 0 0 2 2h2v-1.5ZM8 12v-1.5h2a.5.5 0 0 0 .5-.5V8H12v2a2 2 0 0 1-2 2H8Zm2-12a2 2 0 0 1 2 2v2h-1.5V2a.5.5 0 0 0-.5-.5H8V0h2Z\" \/>\n\t\t\t<\/svg>\n\t\t<\/button><\/figure>\n<\/div>\n\n\n<p class=\"wp-block-paragraph\" style=\"font-size:16px\"><strong>Figure 9 &#8211; Plot showing UMAP parameters of the downsampled subdataset overlaid with the hierarchical clustering previously performed <strong>(click on the image to open in fullscreen)<\/strong>.<\/strong><\/p>\n\n\n\n<h5 class=\"wp-block-heading\" id=\"9.2.1.3\" style=\"font-size:18px\"><strong>9.2.1.3) Apply clustering model on validation dataset<\/strong><\/h5>\n\n\n\n<p 
class=\"wp-block-paragraph\" style=\"font-size:16px\">Now, we can apply the clustering model previously generated on the remaining cells of the downsampled subdataset (called \u00ab\u00a0validation\u00a0\u00bb here):<\/p>\n\n\n\n<pre class=\"wp-block-code\" style=\"font-size:16px\"><code><em># Identify rows that were not already taken for the clustering model generation\n<\/em>\nrowsDataValidation = <strong>which(rownames(<\/strong>data<strong>)<\/strong> %in% <strong>rownames(<\/strong>dataTrainingClustered<strong>)<\/strong> == FALSE<strong>)<\/strong>\n\n<em># Extract the validation dataset\n<\/em>\ndataValidation = data&#91;rowsDataValidation, ]\n\n<em># Define parameters to use for the clustering model application\n<\/em>\ncoresNumber_value = 8\nchunksMaxSize_value = 100000\n\n<em># Apply the clustering model to the validation dataset\n<\/em>\nclusteredFullData = <strong>applyClusterModel(<\/strong>\n    <strong>dataTraining<\/strong> = dataTrainingClustered,\n    <strong>dataValidation<\/strong> = dataValidation,\n    <strong>parametersToUse<\/strong> = parametersToKeepFinal,\n    <strong>coresNumber<\/strong> = coresNumber_value,\n    <strong>chunksMaxSize<\/strong> = chunksMaxSize_value\n<strong>)<\/strong><\/code><\/pre>\n\n\n\n<h5 class=\"wp-block-heading\" id=\"9.2.1.4\" style=\"font-size:18px\"><strong>9.2.1.4) Determine binary thresholds<\/strong><\/h5>\n\n\n\n<p class=\"wp-block-paragraph\" style=\"font-size:16px\">Then, we need to determine a threshold for separating \u00ab\u00a0low\u00a0\u00bb and \u00ab\u00a0high\u00a0\u00bb (or \u00ab\u00a0negative\u00a0\u00bb and \u00ab\u00a0positive\u00a0\u00bb) cells for each marker of the dataset. 
To do this, we first open the dataset, then choose a number of cells to display on the plot, and finally create an interactive <code>ggplotly<\/code> plot (see <strong>Figure 10<\/strong> below):<\/p>\n\n\n\n<pre class=\"wp-block-code\" style=\"font-size:16px\"><code><em># Open the dataset of interest\n<\/em>\ndataThresholds = <strong>openUMAPData(<\/strong>\n    <strong>datasetToUse<\/strong> = datasetToUse_value\n<strong>)<\/strong>\n\n<em># Define the maximum number of cells to display on each graph\n<\/em>\ndisplayedCells_value = 100000\n\n<em># Plot the density plot for the parameter of interest\n<\/em>\n<strong>determineParameterThreshold(<\/strong>\n    <strong>data<\/strong> = dataThresholds,\n    <strong>parameter<\/strong> = parametersToKeepFinal&#91;1],\n    <strong>displayedCells<\/strong> = displayedCells_value\n<strong>)<\/strong>\n\n<em># Close the current active plot\n<\/em>\n<strong>dev.off()<\/strong><\/code><\/pre>\n\n\n<div class=\"wp-block-image\">\n<figure data-wp-context=\"{&quot;imageId&quot;:&quot;6a559a192174b&quot;}\" data-wp-interactive=\"core\/image\" data-wp-key=\"6a559a192174b\" class=\"aligncenter size-large is-resized wp-lightbox-container\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"817\" data-wp-class--hide=\"state.isContentHidden\" data-wp-class--show=\"state.isContentVisible\" data-wp-init=\"callbacks.setButtonStyles\" data-wp-on--click=\"actions.showLightbox\" data-wp-on--load=\"callbacks.setButtonStyles\" data-wp-on--pointerdown=\"actions.preloadImage\" data-wp-on--pointerenter=\"actions.preloadImageWithDelay\" data-wp-on--pointerleave=\"actions.cancelPreload\" data-wp-on-window--resize=\"callbacks.setButtonStyles\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" src=\"https:\/\/paul-regnier.fr\/wp-content\/uploads\/2023\/11\/PICAFlow_thresholdsDeterminationParameters-1024x817.png\" alt=\"\" class=\"wp-image-1391\" style=\"width:768px\" 
srcset=\"https:\/\/paul-regnier.fr\/wp-content\/uploads\/2023\/11\/PICAFlow_thresholdsDeterminationParameters-1024x817.png 1024w, https:\/\/paul-regnier.fr\/wp-content\/uploads\/2023\/11\/PICAFlow_thresholdsDeterminationParameters-300x239.png 300w, https:\/\/paul-regnier.fr\/wp-content\/uploads\/2023\/11\/PICAFlow_thresholdsDeterminationParameters-768x613.png 768w, https:\/\/paul-regnier.fr\/wp-content\/uploads\/2023\/11\/PICAFlow_thresholdsDeterminationParameters-15x12.png 15w, https:\/\/paul-regnier.fr\/wp-content\/uploads\/2023\/11\/PICAFlow_thresholdsDeterminationParameters.png 1318w\" \/><button\n\t\t\tclass=\"lightbox-trigger\"\n\t\t\ttype=\"button\"\n\t\t\taria-haspopup=\"dialog\"\n\t\t\tdata-wp-bind--aria-label=\"state.thisImage.triggerButtonAriaLabel\"\n\t\t\tdata-wp-init=\"callbacks.initTriggerButton\"\n\t\t\tdata-wp-on--click=\"actions.showLightbox\"\n\t\t\tdata-wp-style--right=\"state.thisImage.buttonRight\"\n\t\t\tdata-wp-style--top=\"state.thisImage.buttonTop\"\n\t\t>\n\t\t\t<svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" width=\"12\" height=\"12\" fill=\"none\" viewbox=\"0 0 12 12\">\n\t\t\t\t<path fill=\"#fff\" d=\"M2 0a2 2 0 0 0-2 2v2h1.5V2a.5.5 0 0 1 .5-.5h2V0H2Zm2 10.5H2a.5.5 0 0 1-.5-.5V8H0v2a2 2 0 0 0 2 2h2v-1.5ZM8 12v-1.5h2a.5.5 0 0 0 .5-.5V8H12v2a2 2 0 0 1-2 2H8Zm2-12a2 2 0 0 1 2 2v2h-1.5V2a.5.5 0 0 0-.5-.5H8V0h2Z\" \/>\n\t\t\t<\/svg>\n\t\t<\/button><\/figure>\n<\/div>\n\n\n<p class=\"wp-block-paragraph\" style=\"font-size:16px\"><strong>Figure 10 &#8211; Density plot of the first parameter (<code>FCRL5_APC<\/code>) of the current dataset <strong>(click on the image to open in fullscreen)<\/strong>.<\/strong><\/p>\n\n\n\n<p class=\"has-background wp-block-paragraph\" style=\"background-color:#c6cbe1;font-size:16px\"><strong>Please note that the code above displays only the first parameter (<code>FCRL5_APC<\/code>) of the <code>parametersToKeepFinal<\/code> variable: users have to repeat it for each parameter of the 
dataset.<\/strong><\/p>\n\n\n\n<p class=\"wp-block-paragraph\" style=\"font-size:16px\">The principle is to manually identify the threshold to use for each parameter, then record these values in a new vector, in the same order as the parameters listed in <code>parametersToKeepFinal<\/code>:<\/p>\n\n\n\n<pre class=\"wp-block-code\" style=\"font-size:16px\"><code><em># Save the visually-determined thresholds for each parameter in a vector\n<\/em>\nthresholds = <strong>c(<\/strong>\n    6.42,\n    5.71,\n    7.42,\n    4.08,\n    6.28,\n    6.8,\n    6.9,\n    6.74,\n    5.11,\n    7.05,\n    6.47,\n    4.94,\n    5.19\n<strong>)<\/strong>\n\nparametersToUseThresholds = <strong>data.frame(<\/strong>\n    <strong>parameter<\/strong> = parametersToKeepFinal,\n    <strong>threshold<\/strong> = thresholds\n<strong>)<\/strong><\/code><\/pre>\n\n\n\n<h5 class=\"wp-block-heading\" id=\"9.2.1.5\" style=\"font-size:18px\"><strong>9.2.1.5) Collapse phenotypically close clusters<\/strong><\/h5>\n\n\n\n<p class=\"wp-block-paragraph\" style=\"font-size:16px\">Once the previous step is done, we can finally collapse the clusters that show a very similar phenotype, using the binary thresholds we just determined:<\/p>\n\n\n\n<pre class=\"wp-block-code\" style=\"font-size:16px\"><code><em># Collapse the phenotypically close clusters\n<\/em>\nclusteredFullData_final = <strong>collapseCloseClusters(<\/strong>\n    <strong>data<\/strong> = clusteredFullData,\n    <strong>parametersToUse<\/strong> = parametersToKeepFinal,\n    <strong>parametersToUseThresholds<\/strong> = parametersToUseThresholds,\n    <strong>metricUsed<\/strong> = metricToUse_value,\n    <strong>datasetFolder<\/strong> = datasetToUse_value,\n    <strong>customClusteringCut<\/strong> = 0\n<strong>)<\/strong><\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\" style=\"font-size:16px\">This step greatly reduces the number of final clusters, which simplifies data analysis and cluster labelling. 
Please note that this function will save a new rds file called <code>7_Clustered_datasetFolder.rds<\/code> in the <code>rds<\/code> directory.<\/p>\n\n\n\n<p class=\"has-background wp-block-paragraph\" style=\"background-color:#c6cbe1;font-size:16px\"><strong>If the final number of collapsed clusters is still too high (typically more than 50 to 70, depending on the number of cells and parameters involved, but also on users and their overall view of cell clustering), users can set the <code>customClusteringCut<\/code> argument to a value greater than <code>0<\/code> (the default value) in order to further increase the strength of the collapsing. To help users choose the optimal threshold, a PDF file containing the dendrogram of the cell clusters as well as the position of the chosen value is exported in the <code>output &gt; 7_Clustering<\/code> directory.<\/strong><\/p>\n\n\n\n<h4 class=\"wp-block-heading\" id=\"9.2.2\" style=\"font-size:20px\"><strong>9.2.2) Using FlowSOM method<\/strong><\/h4>\n\n\n\n<p class=\"wp-block-paragraph\" style=\"font-size:16px\">If users do not want to use our own approach for cell clustering, they can use another cell clustering method already implemented in <code>PICAFlow<\/code>, such as <code>FlowSOM<\/code>. 
Please note that this section is an alternative to section 9.2.1): use one or the other, but not both.<\/p>\n\n\n\n<pre class=\"wp-block-code\" style=\"font-size:16px\"><code><em># Cluster the cells using FlowSOM method\n<\/em>\nclusteredFullData_final = <strong>FlowSOM_clustering(<\/strong>\n    <strong>data<\/strong> = data,\n    <strong>parametersToUse<\/strong> = parametersToKeepFinal,\n    <strong>seed<\/strong> = seed_value,\n    <strong>maxMeta<\/strong> = 90,\n    <strong>datasetFolder<\/strong> = datasetToUse_value\n<strong>)<\/strong><\/code><\/pre>\n\n\n\n<p class=\"has-background wp-block-paragraph\" style=\"background-color:#c6cbe1;font-size:16px\"><strong>Note: The <code>maxMeta<\/code> argument is defined in the <code>FlowSOM()<\/code> original function as the maximum number of clusters to try out for meta-clustering. Please see <a href=\"https:\/\/bioconductor.org\/packages\/release\/bioc\/html\/FlowSOM.html\">this link<\/a> (as well as the documentation of the function itself) for further information about the FlowSOM clustering method.<\/strong><\/p>\n\n\n\n<h4 class=\"wp-block-heading\" id=\"9.2.3\" style=\"font-size:20px\"><strong>9.2.3) Using PhenoGraph method<\/strong><\/h4>\n\n\n\n<p class=\"wp-block-paragraph\" style=\"font-size:16px\">If users do not want to use our own approach for cell clustering, they can use another cell clustering method already implemented in <code>PICAFlow<\/code>, such as <code>PhenoGraph<\/code> (thanks to the <code>FastPG<\/code> R package). 
Please note that this section is an alternative to section 9.2.1): use one or the other, but not both.<\/p>\n\n\n\n<pre class=\"wp-block-code\" style=\"font-size:16px\"><code><em># Cluster the cells using PhenoGraph method<\/em>\n\nclusteredFullData_final = <strong>FastPhenoGraph_clustering(<\/strong>\n    <strong>data<\/strong> = data,\n    <strong>parametersToUse<\/strong> = parametersToKeepFinal,\n    <strong>k<\/strong> = 100,\n    <strong>coresNumber<\/strong> = 10,\n    <strong>datasetFolder<\/strong> = datasetToUse_value\n<strong>)<\/strong><\/code><\/pre>\n\n\n\n<p class=\"has-background wp-block-paragraph\" style=\"background-color:#c6cbe1;font-size:16px\"><strong>Note: The <code>k<\/code> argument is defined in the <code>fastCluster()<\/code> original function as the local neighborhood size (i.e. the number of nearest neighbors) to use for the generation of clusters. Please see <a href=\"https:\/\/github.com\/sararselitsky\/FastPG\">this link<\/a> for further information about the <code>FastPG<\/code> implementation of the PhenoGraph method.<\/strong><\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"9.3\" style=\"font-size:22px\"><strong>9.3) Visualize clusters on UMAP dimensions<\/strong><\/h3>\n\n\n\n<p class=\"wp-block-paragraph\" style=\"font-size:16px\">Once the cells have been clustered (using any method implemented within <code>PICAFlow<\/code>), users can project the determined clusters over the UMAP coordinates previously computed (see <strong>Figure 11<\/strong> below):<\/p>\n\n\n\n<pre class=\"wp-block-code\" style=\"font-size:16px\"><code><em># Define the maximum number of cells to display on each graph\n<\/em>\ndisplayedCells_value = 100000\n\n<em># Plot UMAP graph with final clustering projection on the whole dataset\n<\/em>\n<strong>plotUMAP_projectionFinalClusters(<\/strong>\n    <strong>data<\/strong> = clusteredFullData_final,\n    <strong>displayedCells<\/strong> = displayedCells_value,\n    
<strong>datasetFolder<\/strong> = datasetToUse_value\n)<\/code><\/pre>\n\n\n<div class=\"wp-block-image\">\n<figure data-wp-context=\"{&quot;imageId&quot;:&quot;6a559a19227be&quot;}\" data-wp-interactive=\"core\/image\" data-wp-key=\"6a559a19227be\" class=\"aligncenter size-large is-resized wp-lightbox-container\"><img loading=\"lazy\" decoding=\"async\" width=\"1022\" height=\"1024\" data-wp-class--hide=\"state.isContentHidden\" data-wp-class--show=\"state.isContentVisible\" data-wp-init=\"callbacks.setButtonStyles\" data-wp-on--click=\"actions.showLightbox\" data-wp-on--load=\"callbacks.setButtonStyles\" data-wp-on--pointerdown=\"actions.preloadImage\" data-wp-on--pointerenter=\"actions.preloadImageWithDelay\" data-wp-on--pointerleave=\"actions.cancelPreload\" data-wp-on-window--resize=\"callbacks.setButtonStyles\" sizes=\"auto, (max-width: 1022px) 100vw, 1022px\" src=\"https:\/\/paul-regnier.fr\/wp-content\/uploads\/2023\/11\/PICAFlow_clusteringAllData_overlayUMAP-1022x1024.png\" alt=\"\" class=\"wp-image-1392\" style=\"width:768px\" srcset=\"https:\/\/paul-regnier.fr\/wp-content\/uploads\/2023\/11\/PICAFlow_clusteringAllData_overlayUMAP-1022x1024.png 1022w, https:\/\/paul-regnier.fr\/wp-content\/uploads\/2023\/11\/PICAFlow_clusteringAllData_overlayUMAP-300x300.png 300w, https:\/\/paul-regnier.fr\/wp-content\/uploads\/2023\/11\/PICAFlow_clusteringAllData_overlayUMAP-150x150.png 150w, https:\/\/paul-regnier.fr\/wp-content\/uploads\/2023\/11\/PICAFlow_clusteringAllData_overlayUMAP-768x769.png 768w, https:\/\/paul-regnier.fr\/wp-content\/uploads\/2023\/11\/PICAFlow_clusteringAllData_overlayUMAP-12x12.png 12w, https:\/\/paul-regnier.fr\/wp-content\/uploads\/2023\/11\/PICAFlow_clusteringAllData_overlayUMAP.png 1091w\" 
\/><button\n\t\t\tclass=\"lightbox-trigger\"\n\t\t\ttype=\"button\"\n\t\t\taria-haspopup=\"dialog\"\n\t\t\tdata-wp-bind--aria-label=\"state.thisImage.triggerButtonAriaLabel\"\n\t\t\tdata-wp-init=\"callbacks.initTriggerButton\"\n\t\t\tdata-wp-on--click=\"actions.showLightbox\"\n\t\t\tdata-wp-style--right=\"state.thisImage.buttonRight\"\n\t\t\tdata-wp-style--top=\"state.thisImage.buttonTop\"\n\t\t>\n\t\t\t<svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" width=\"12\" height=\"12\" fill=\"none\" viewbox=\"0 0 12 12\">\n\t\t\t\t<path fill=\"#fff\" d=\"M2 0a2 2 0 0 0-2 2v2h1.5V2a.5.5 0 0 1 .5-.5h2V0H2Zm2 10.5H2a.5.5 0 0 1-.5-.5V8H0v2a2 2 0 0 0 2 2h2v-1.5ZM8 12v-1.5h2a.5.5 0 0 0 .5-.5V8H12v2a2 2 0 0 1-2 2H8Zm2-12a2 2 0 0 1 2 2v2h-1.5V2a.5.5 0 0 0-.5-.5H8V0h2Z\" \/>\n\t\t\t<\/svg>\n\t\t<\/button><\/figure>\n<\/div>\n\n\n<p class=\"wp-block-paragraph\" style=\"font-size:16px\"><strong>Figure 11 &#8211; Plot showing UMAP parameters of the whole dataset overlaid with the collapsed clusters previously determined using the method detailed in section 9.2.1) <strong>(click on the image to open in fullscreen)<\/strong>.<\/strong><\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"9.4\" style=\"font-size:22px\"><strong>9.4) Export clusters-associated statistics and plots<\/strong><\/h3>\n\n\n\n<p class=\"wp-block-paragraph\" style=\"font-size:16px\">Next, we can use our newly determined clusters to export:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li style=\"font-size:16px\">Plots showing the UMAP coordinates of each cluster for \u00ab\u00a0training\u00a0\u00bb and \u00ab\u00a0validation\u00a0\u00bb subdatasets (applicable only to the clustering method detailed in section 9.2.1)) as well as for the whole dataset (applicable to all the clustering methods implemented in <code>PICAFlow<\/code>) (see <strong>Figures 12-14<\/strong> below)<\/li>\n\n\n\n<li style=\"font-size:16px\">Text files containing information about cluster abundances (per sample and per group) as well as 
their phenotypes<\/li>\n<\/ul>\n\n\n\n<pre class=\"wp-block-code\" style=\"font-size:16px\"><code><em># Define several values for the next commands \n<\/em>\nfolder_value = \"clusters\"\ncoresNumber_value = 6\nprefix_value = \"MixB\"\nmaxCellsPerPlot_value = 1000\n\n<em># Open the dataset of interest\n<\/em>\nclusteredFullData_final = <strong>openClusteredFullDataCollapsed(<\/strong>\n    <strong>datasetToUse<\/strong> = datasetToUse_value\n<strong>)<\/strong>\n\n<em># Export clusters-associated statistics and plots\n<\/em>\n<strong>exportClustersStatsAndPlots(<\/strong>\n    <strong>data<\/strong> = clusteredFullData_final,\n    <strong>folder<\/strong> = folder_value,\n    <strong>parametersToUse<\/strong> = parametersToKeepFinal,\n    <strong>coresNumber<\/strong> = coresNumber_value,\n    <strong>prefix<\/strong> = prefix_value,\n    <strong>maxCellsPerPlot<\/strong> = maxCellsPerPlot_value,\n    <strong>metricUsed<\/strong> = metricToUse_value,\n    <strong>datasetFolder<\/strong> = datasetToUse_value\n<strong>)<\/strong><\/code><\/pre>\n\n\n<div class=\"wp-block-image\">\n<figure data-wp-context=\"{&quot;imageId&quot;:&quot;6a559a192313e&quot;}\" data-wp-interactive=\"core\/image\" data-wp-key=\"6a559a192313e\" class=\"aligncenter size-large is-resized wp-lightbox-container\"><img loading=\"lazy\" decoding=\"async\" width=\"1022\" height=\"1024\" data-wp-class--hide=\"state.isContentHidden\" data-wp-class--show=\"state.isContentVisible\" data-wp-init=\"callbacks.setButtonStyles\" data-wp-on--click=\"actions.showLightbox\" data-wp-on--load=\"callbacks.setButtonStyles\" data-wp-on--pointerdown=\"actions.preloadImage\" data-wp-on--pointerenter=\"actions.preloadImageWithDelay\" data-wp-on--pointerleave=\"actions.cancelPreload\" data-wp-on-window--resize=\"callbacks.setButtonStyles\" sizes=\"auto, (max-width: 1022px) 100vw, 1022px\" src=\"https:\/\/paul-regnier.fr\/wp-content\/uploads\/2023\/11\/PICAFlow_clusteringAllData_trainingVisualization-1022x1024.png\" 
alt=\"\" class=\"wp-image-1393\" style=\"width:768px\" srcset=\"https:\/\/paul-regnier.fr\/wp-content\/uploads\/2023\/11\/PICAFlow_clusteringAllData_trainingVisualization-1022x1024.png 1022w, https:\/\/paul-regnier.fr\/wp-content\/uploads\/2023\/11\/PICAFlow_clusteringAllData_trainingVisualization-300x300.png 300w, https:\/\/paul-regnier.fr\/wp-content\/uploads\/2023\/11\/PICAFlow_clusteringAllData_trainingVisualization-150x150.png 150w, https:\/\/paul-regnier.fr\/wp-content\/uploads\/2023\/11\/PICAFlow_clusteringAllData_trainingVisualization-768x769.png 768w, https:\/\/paul-regnier.fr\/wp-content\/uploads\/2023\/11\/PICAFlow_clusteringAllData_trainingVisualization-12x12.png 12w, https:\/\/paul-regnier.fr\/wp-content\/uploads\/2023\/11\/PICAFlow_clusteringAllData_trainingVisualization.png 1088w\" \/><button\n\t\t\tclass=\"lightbox-trigger\"\n\t\t\ttype=\"button\"\n\t\t\taria-haspopup=\"dialog\"\n\t\t\tdata-wp-bind--aria-label=\"state.thisImage.triggerButtonAriaLabel\"\n\t\t\tdata-wp-init=\"callbacks.initTriggerButton\"\n\t\t\tdata-wp-on--click=\"actions.showLightbox\"\n\t\t\tdata-wp-style--right=\"state.thisImage.buttonRight\"\n\t\t\tdata-wp-style--top=\"state.thisImage.buttonTop\"\n\t\t>\n\t\t\t<svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" width=\"12\" height=\"12\" fill=\"none\" viewbox=\"0 0 12 12\">\n\t\t\t\t<path fill=\"#fff\" d=\"M2 0a2 2 0 0 0-2 2v2h1.5V2a.5.5 0 0 1 .5-.5h2V0H2Zm2 10.5H2a.5.5 0 0 1-.5-.5V8H0v2a2 2 0 0 0 2 2h2v-1.5ZM8 12v-1.5h2a.5.5 0 0 0 .5-.5V8H12v2a2 2 0 0 1-2 2H8Zm2-12a2 2 0 0 1 2 2v2h-1.5V2a.5.5 0 0 0-.5-.5H8V0h2Z\" \/>\n\t\t\t<\/svg>\n\t\t<\/button><\/figure>\n<\/div>\n\n\n<p class=\"wp-block-paragraph\" style=\"font-size:16px\"><strong>Figure 12 &#8211; Plot showing UMAP parameters of a given cluster, only for the cells that were used for the \u00ab\u00a0training\u00a0\u00bb step of the hierarchical clustering <strong>(click on the image to open in fullscreen)<\/strong>.<\/strong><\/p>\n\n\n<div class=\"wp-block-image\">\n<figure 
data-wp-context=\"{&quot;imageId&quot;:&quot;6a559a192383f&quot;}\" data-wp-interactive=\"core\/image\" data-wp-key=\"6a559a192383f\" class=\"aligncenter size-large is-resized wp-lightbox-container\"><img loading=\"lazy\" decoding=\"async\" width=\"1022\" height=\"1024\" data-wp-class--hide=\"state.isContentHidden\" data-wp-class--show=\"state.isContentVisible\" data-wp-init=\"callbacks.setButtonStyles\" data-wp-on--click=\"actions.showLightbox\" data-wp-on--load=\"callbacks.setButtonStyles\" data-wp-on--pointerdown=\"actions.preloadImage\" data-wp-on--pointerenter=\"actions.preloadImageWithDelay\" data-wp-on--pointerleave=\"actions.cancelPreload\" data-wp-on-window--resize=\"callbacks.setButtonStyles\" sizes=\"auto, (max-width: 1022px) 100vw, 1022px\" src=\"https:\/\/paul-regnier.fr\/wp-content\/uploads\/2023\/11\/PICAFlow_clusteringAllData_validationVisualization-1022x1024.png\" alt=\"\" class=\"wp-image-1394\" style=\"width:768px\" srcset=\"https:\/\/paul-regnier.fr\/wp-content\/uploads\/2023\/11\/PICAFlow_clusteringAllData_validationVisualization-1022x1024.png 1022w, https:\/\/paul-regnier.fr\/wp-content\/uploads\/2023\/11\/PICAFlow_clusteringAllData_validationVisualization-300x300.png 300w, https:\/\/paul-regnier.fr\/wp-content\/uploads\/2023\/11\/PICAFlow_clusteringAllData_validationVisualization-150x150.png 150w, https:\/\/paul-regnier.fr\/wp-content\/uploads\/2023\/11\/PICAFlow_clusteringAllData_validationVisualization-768x769.png 768w, https:\/\/paul-regnier.fr\/wp-content\/uploads\/2023\/11\/PICAFlow_clusteringAllData_validationVisualization-12x12.png 12w, https:\/\/paul-regnier.fr\/wp-content\/uploads\/2023\/11\/PICAFlow_clusteringAllData_validationVisualization.png 1088w\" 
\/><button\n\t\t\tclass=\"lightbox-trigger\"\n\t\t\ttype=\"button\"\n\t\t\taria-haspopup=\"dialog\"\n\t\t\tdata-wp-bind--aria-label=\"state.thisImage.triggerButtonAriaLabel\"\n\t\t\tdata-wp-init=\"callbacks.initTriggerButton\"\n\t\t\tdata-wp-on--click=\"actions.showLightbox\"\n\t\t\tdata-wp-style--right=\"state.thisImage.buttonRight\"\n\t\t\tdata-wp-style--top=\"state.thisImage.buttonTop\"\n\t\t>\n\t\t\t<svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" width=\"12\" height=\"12\" fill=\"none\" viewbox=\"0 0 12 12\">\n\t\t\t\t<path fill=\"#fff\" d=\"M2 0a2 2 0 0 0-2 2v2h1.5V2a.5.5 0 0 1 .5-.5h2V0H2Zm2 10.5H2a.5.5 0 0 1-.5-.5V8H0v2a2 2 0 0 0 2 2h2v-1.5ZM8 12v-1.5h2a.5.5 0 0 0 .5-.5V8H12v2a2 2 0 0 1-2 2H8Zm2-12a2 2 0 0 1 2 2v2h-1.5V2a.5.5 0 0 0-.5-.5H8V0h2Z\" \/>\n\t\t\t<\/svg>\n\t\t<\/button><\/figure>\n<\/div>\n\n\n<p class=\"wp-block-paragraph\" style=\"font-size:16px\"><strong>Figure 13 &#8211; Plot showing UMAP parameters of a given cluster, only for the cells that were used for the \u00ab\u00a0validation\u00a0\u00bb step of the hierarchical clustering <strong>(click on the image to open in fullscreen)<\/strong>.<\/strong><\/p>\n\n\n<div class=\"wp-block-image\">\n<figure data-wp-context=\"{&quot;imageId&quot;:&quot;6a559a1923fe8&quot;}\" data-wp-interactive=\"core\/image\" data-wp-key=\"6a559a1923fe8\" class=\"aligncenter size-large is-resized wp-lightbox-container\"><img loading=\"lazy\" decoding=\"async\" width=\"1022\" height=\"1024\" data-wp-class--hide=\"state.isContentHidden\" data-wp-class--show=\"state.isContentVisible\" data-wp-init=\"callbacks.setButtonStyles\" data-wp-on--click=\"actions.showLightbox\" data-wp-on--load=\"callbacks.setButtonStyles\" data-wp-on--pointerdown=\"actions.preloadImage\" data-wp-on--pointerenter=\"actions.preloadImageWithDelay\" data-wp-on--pointerleave=\"actions.cancelPreload\" data-wp-on-window--resize=\"callbacks.setButtonStyles\" sizes=\"auto, (max-width: 1022px) 100vw, 1022px\" 
src=\"https:\/\/paul-regnier.fr\/wp-content\/uploads\/2023\/11\/PICAFlow_clusteringAllData_allDataVisualization-1022x1024.png\" alt=\"\" class=\"wp-image-1395\" style=\"width:768px\" srcset=\"https:\/\/paul-regnier.fr\/wp-content\/uploads\/2023\/11\/PICAFlow_clusteringAllData_allDataVisualization-1022x1024.png 1022w, https:\/\/paul-regnier.fr\/wp-content\/uploads\/2023\/11\/PICAFlow_clusteringAllData_allDataVisualization-300x300.png 300w, https:\/\/paul-regnier.fr\/wp-content\/uploads\/2023\/11\/PICAFlow_clusteringAllData_allDataVisualization-150x150.png 150w, https:\/\/paul-regnier.fr\/wp-content\/uploads\/2023\/11\/PICAFlow_clusteringAllData_allDataVisualization-768x769.png 768w, https:\/\/paul-regnier.fr\/wp-content\/uploads\/2023\/11\/PICAFlow_clusteringAllData_allDataVisualization-12x12.png 12w, https:\/\/paul-regnier.fr\/wp-content\/uploads\/2023\/11\/PICAFlow_clusteringAllData_allDataVisualization.png 1090w\" \/><button\n\t\t\tclass=\"lightbox-trigger\"\n\t\t\ttype=\"button\"\n\t\t\taria-haspopup=\"dialog\"\n\t\t\tdata-wp-bind--aria-label=\"state.thisImage.triggerButtonAriaLabel\"\n\t\t\tdata-wp-init=\"callbacks.initTriggerButton\"\n\t\t\tdata-wp-on--click=\"actions.showLightbox\"\n\t\t\tdata-wp-style--right=\"state.thisImage.buttonRight\"\n\t\t\tdata-wp-style--top=\"state.thisImage.buttonTop\"\n\t\t>\n\t\t\t<svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" width=\"12\" height=\"12\" fill=\"none\" viewbox=\"0 0 12 12\">\n\t\t\t\t<path fill=\"#fff\" d=\"M2 0a2 2 0 0 0-2 2v2h1.5V2a.5.5 0 0 1 .5-.5h2V0H2Zm2 10.5H2a.5.5 0 0 1-.5-.5V8H0v2a2 2 0 0 0 2 2h2v-1.5ZM8 12v-1.5h2a.5.5 0 0 0 .5-.5V8H12v2a2 2 0 0 1-2 2H8Zm2-12a2 2 0 0 1 2 2v2h-1.5V2a.5.5 0 0 0-.5-.5H8V0h2Z\" \/>\n\t\t\t<\/svg>\n\t\t<\/button><\/figure>\n<\/div>\n\n\n<p class=\"wp-block-paragraph\" style=\"font-size:16px\"><strong>Figure 14 &#8211; Plot showing UMAP parameters of a given cluster for all the cells of the dataset <strong>(click on the image to open in 
fullscreen)<\/strong>.<\/strong><\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"9.5\" style=\"font-size:22px\"><strong>9.5) Export clusters-associated heatmaps<\/strong><\/h3>\n\n\n\n<p class=\"wp-block-paragraph\" style=\"font-size:16px\">It is also possible to export heatmaps showing the cluster phenotypes (by marker) and abundances (by group or by sample) (see <strong>Figures 15-16<\/strong> below):<\/p>\n\n\n\n<pre class=\"wp-block-code\" style=\"font-size:16px\"><code><em># Export clusters-associated heatmap (phenotypes by marker)<\/em>\n\n<em># Note: If you followed the hierarchical clustering + k-nearest neighbors approach for cell clustering (as detailed in subsection 9.2.1) of this tutorial), then you have already set the `thresholds` variable and you can directly run the following code.\n<\/em>\n<em># Otherwise, you have to set the `thresholds` argument to `NULL`. This will prevent the binary heatmaps (which use these thresholds) from being generated.\n<\/em>\n<strong>clustersPhenotypesHeatmap(<\/strong>\n    <strong>prefix<\/strong> = prefix_value,\n    <strong>thresholds<\/strong> = thresholds,\n    <strong>metricUsed<\/strong> = metricToUse_value,\n    <strong>datasetFolder<\/strong> = datasetToUse_value\n<strong>)<\/strong>\n\n<em># Export clusters-associated heatmap (percentages by group)\n<\/em>\n<strong>clustersPercentagesHeatmap(<\/strong>\n    <strong>prefix<\/strong> = prefix_value,\n    <strong>metricUsed<\/strong> = metricToUse_value,\n    <strong>datasetFolder<\/strong> = datasetToUse_value,\n    <strong>mode<\/strong> = \"group\"\n<strong>)<\/strong>\n\n<em># Export clusters-associated heatmap (percentages by sample)\n<\/em>\n<strong>clustersPercentagesHeatmap(<\/strong>\n    <strong>prefix<\/strong> = prefix_value,\n    <strong>metricUsed<\/strong> = metricToUse_value,\n    <strong>datasetFolder<\/strong> = datasetToUse_value,\n    <strong>mode<\/strong> = \"sample\"\n<strong>)<\/strong><\/code><\/pre>\n\n\n<div 
class=\"wp-block-image\">\n<figure data-wp-context=\"{&quot;imageId&quot;:&quot;6a559a192493b&quot;}\" data-wp-interactive=\"core\/image\" data-wp-key=\"6a559a192493b\" class=\"aligncenter size-full is-resized wp-lightbox-container\"><img loading=\"lazy\" decoding=\"async\" width=\"1707\" height=\"1205\" data-wp-class--hide=\"state.isContentHidden\" data-wp-class--show=\"state.isContentVisible\" data-wp-init=\"callbacks.setButtonStyles\" data-wp-on--click=\"actions.showLightbox\" data-wp-on--load=\"callbacks.setButtonStyles\" data-wp-on--pointerdown=\"actions.preloadImage\" data-wp-on--pointerenter=\"actions.preloadImageWithDelay\" data-wp-on--pointerleave=\"actions.cancelPreload\" data-wp-on-window--resize=\"callbacks.setButtonStyles\" sizes=\"auto, (max-width: 1707px) 100vw, 1707px\" src=\"https:\/\/paul-regnier.fr\/wp-content\/uploads\/2023\/11\/PICAFlow_heatmapClustersPhenotypes.png\" alt=\"\" class=\"wp-image-1396\" style=\"width:768px\" srcset=\"https:\/\/paul-regnier.fr\/wp-content\/uploads\/2023\/11\/PICAFlow_heatmapClustersPhenotypes.png 1707w, https:\/\/paul-regnier.fr\/wp-content\/uploads\/2023\/11\/PICAFlow_heatmapClustersPhenotypes-300x212.png 300w, https:\/\/paul-regnier.fr\/wp-content\/uploads\/2023\/11\/PICAFlow_heatmapClustersPhenotypes-1024x723.png 1024w, https:\/\/paul-regnier.fr\/wp-content\/uploads\/2023\/11\/PICAFlow_heatmapClustersPhenotypes-768x542.png 768w, https:\/\/paul-regnier.fr\/wp-content\/uploads\/2023\/11\/PICAFlow_heatmapClustersPhenotypes-1536x1084.png 1536w, https:\/\/paul-regnier.fr\/wp-content\/uploads\/2023\/11\/PICAFlow_heatmapClustersPhenotypes-18x12.png 18w\" 
\/><button\n\t\t\tclass=\"lightbox-trigger\"\n\t\t\ttype=\"button\"\n\t\t\taria-haspopup=\"dialog\"\n\t\t\tdata-wp-bind--aria-label=\"state.thisImage.triggerButtonAriaLabel\"\n\t\t\tdata-wp-init=\"callbacks.initTriggerButton\"\n\t\t\tdata-wp-on--click=\"actions.showLightbox\"\n\t\t\tdata-wp-style--right=\"state.thisImage.buttonRight\"\n\t\t\tdata-wp-style--top=\"state.thisImage.buttonTop\"\n\t\t>\n\t\t\t<svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" width=\"12\" height=\"12\" fill=\"none\" viewbox=\"0 0 12 12\">\n\t\t\t\t<path fill=\"#fff\" d=\"M2 0a2 2 0 0 0-2 2v2h1.5V2a.5.5 0 0 1 .5-.5h2V0H2Zm2 10.5H2a.5.5 0 0 1-.5-.5V8H0v2a2 2 0 0 0 2 2h2v-1.5ZM8 12v-1.5h2a.5.5 0 0 0 .5-.5V8H12v2a2 2 0 0 1-2 2H8Zm2-12a2 2 0 0 1 2 2v2h-1.5V2a.5.5 0 0 0-.5-.5H8V0h2Z\" \/>\n\t\t\t<\/svg>\n\t\t<\/button><\/figure>\n<\/div>\n\n\n<p class=\"wp-block-paragraph\" style=\"font-size:16px\"><strong>Figure 15 &#8211; Heatmaps showing the row-scaled phenotypes of each determined final cluster <strong>(click on the image to open in fullscreen)<\/strong>.<\/strong><\/p>\n\n\n\n<p class=\"has-background wp-block-paragraph\" style=\"background-color:#c6cbe1;font-size:16px\"><strong>If you chose the clustering method implemented by default in <code>PICAFlow<\/code> (neither <code>FlowSOM<\/code> nor <code>PhenoGraph<\/code>), a second heatmap will be generated alongside the one presented in Figure 15. It contains the same information, simplified to a \u00ab\u00a0binary heatmap\u00a0\u00bb using the <code>thresholds<\/code> variable that you set up earlier. 
If you did not follow our method, then this <code>thresholds<\/code> variable should be set to <code>NULL<\/code>, as described in the previous code section.<\/strong><\/p>\n\n\n<div class=\"wp-block-image\">\n<figure data-wp-context=\"{&quot;imageId&quot;:&quot;6a559a19252f8&quot;}\" data-wp-interactive=\"core\/image\" data-wp-key=\"6a559a19252f8\" class=\"aligncenter size-large is-resized wp-lightbox-container\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"536\" data-wp-class--hide=\"state.isContentHidden\" data-wp-class--show=\"state.isContentVisible\" data-wp-init=\"callbacks.setButtonStyles\" data-wp-on--click=\"actions.showLightbox\" data-wp-on--load=\"callbacks.setButtonStyles\" data-wp-on--pointerdown=\"actions.preloadImage\" data-wp-on--pointerenter=\"actions.preloadImageWithDelay\" data-wp-on--pointerleave=\"actions.cancelPreload\" data-wp-on-window--resize=\"callbacks.setButtonStyles\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" src=\"https:\/\/paul-regnier.fr\/wp-content\/uploads\/2023\/11\/PICAFlow_heatmapClustersAbundances-1024x536.png\" alt=\"\" class=\"wp-image-1397\" style=\"width:768px\" srcset=\"https:\/\/paul-regnier.fr\/wp-content\/uploads\/2023\/11\/PICAFlow_heatmapClustersAbundances-1024x536.png 1024w, https:\/\/paul-regnier.fr\/wp-content\/uploads\/2023\/11\/PICAFlow_heatmapClustersAbundances-300x157.png 300w, https:\/\/paul-regnier.fr\/wp-content\/uploads\/2023\/11\/PICAFlow_heatmapClustersAbundances-768x402.png 768w, https:\/\/paul-regnier.fr\/wp-content\/uploads\/2023\/11\/PICAFlow_heatmapClustersAbundances-1536x805.png 1536w, https:\/\/paul-regnier.fr\/wp-content\/uploads\/2023\/11\/PICAFlow_heatmapClustersAbundances-18x9.png 18w, https:\/\/paul-regnier.fr\/wp-content\/uploads\/2023\/11\/PICAFlow_heatmapClustersAbundances.png 1659w\" 
\/><button\n\t\t\tclass=\"lightbox-trigger\"\n\t\t\ttype=\"button\"\n\t\t\taria-haspopup=\"dialog\"\n\t\t\tdata-wp-bind--aria-label=\"state.thisImage.triggerButtonAriaLabel\"\n\t\t\tdata-wp-init=\"callbacks.initTriggerButton\"\n\t\t\tdata-wp-on--click=\"actions.showLightbox\"\n\t\t\tdata-wp-style--right=\"state.thisImage.buttonRight\"\n\t\t\tdata-wp-style--top=\"state.thisImage.buttonTop\"\n\t\t>\n\t\t\t<svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" width=\"12\" height=\"12\" fill=\"none\" viewbox=\"0 0 12 12\">\n\t\t\t\t<path fill=\"#fff\" d=\"M2 0a2 2 0 0 0-2 2v2h1.5V2a.5.5 0 0 1 .5-.5h2V0H2Zm2 10.5H2a.5.5 0 0 1-.5-.5V8H0v2a2 2 0 0 0 2 2h2v-1.5ZM8 12v-1.5h2a.5.5 0 0 0 .5-.5V8H12v2a2 2 0 0 1-2 2H8Zm2-12a2 2 0 0 1 2 2v2h-1.5V2a.5.5 0 0 0-.5-.5H8V0h2Z\" \/>\n\t\t\t<\/svg>\n\t\t<\/button><\/figure>\n<\/div>\n\n\n<p class=\"wp-block-paragraph\" style=\"font-size:16px\"><strong>Figure 16 &#8211; Heatmap showing the abundances of each determined final cluster in each group <strong>(click on the image to open in fullscreen)<\/strong>.<\/strong><\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"10\" style=\"font-size:28px\"><strong>10) Metadata integration and analysis<\/strong><\/h2>\n\n\n\n<p class=\"wp-block-paragraph\" style=\"font-size:16px\">From here, we can integrate metadata into the processed dataset in order to add new information and layers of complexity, which can help to analyze the clusters more precisely and, if needed, remove outliers (see below).<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"10.1\" style=\"font-size:22px\"><strong>10.1) Open and merge datasets<\/strong><\/h3>\n\n\n\n<p class=\"wp-block-paragraph\" style=\"font-size:16px\">Before beginning, please note that the whole following process takes place within the <code>output &gt; 8_Analysis<\/code> directory. 
It is also mandatory to copy\/paste at least one <code>xxx_clustersPercentages.txt<\/code> file from the <code>output &gt; 7_Clustering<\/code> directory to the <code>output &gt; 8_Analysis<\/code> directory, as this file is the basis of this whole section of the workflow. If needed, users can provide more than one text file: this typically happens when more than one antibody panel is used on the same samples.<\/p>\n\n\n\n<p class=\"has-background wp-block-paragraph\" style=\"background-color:#c6cbe1;font-size:16px\"><strong>Please note that these automatically generated <code>xxx_clustersPercentages.txt<\/code> files contain two similar but distinct columns: <code>SampleCorrected<\/code> and <code>Sample<\/code>. The <code>Sample<\/code> column contains the original sample name (matching the name of the FCS file), whereas the <code>SampleCorrected<\/code> column can be manually edited if necessary to correct a typo or change the formatting of the sample names.<\/strong><\/p>\n\n\n\n<p class=\"wp-block-paragraph\" style=\"font-size:16px\">First, we have to select which column of the text file actually contains the sample names, then proceed to merge the text files:<\/p>\n\n\n\n<pre class=\"wp-block-code\" style=\"font-size:16px\"><code><em># Define the column to use for data merging\n<\/em>\nsampleNamesColumn_value = \"SampleCorrected\"\n\n<em># Open and merge the text files of interest\n<\/em>\ndata = <strong>mergeData(<\/strong>\n    <strong>pattern<\/strong> = \"(.*).txt\",\n    <strong>sampleNamesColumn<\/strong> = sampleNamesColumn_value\n<strong>)<\/strong><\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\" style=\"font-size:16px\">Although highly recommended in order to benefit from all the features included in <code>PICAFlow<\/code>, the following section 10.2) is technically optional, as it relies heavily on importing metadata associated with the dataset of interest. 
If you do not have any metadata to use, section 10.2) does not apply. In this case, please proceed to section 10.3).<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"10.2\" style=\"font-size:22px\"><strong>10.2) I have metadata available for my dataset<\/strong><\/h3>\n\n\n\n<p class=\"wp-block-paragraph\" id=\"10.2.1\" style=\"font-size:20px\"><strong>10.2.1) Metadata integration<\/strong><\/p>\n\n\n\n<p class=\"wp-block-paragraph\" style=\"font-size:16px\">First, we have to copy\/paste an Excel file named <code>metadata.xlsx<\/code>, containing the actual metadata associated with the samples, into the <code>output &gt; 8_Analysis<\/code> directory. For instance, if the samples are from human blood, this file will probably contain information about the patients: ID, name, disease, remarks, biological features, gender, etc. This Excel file has to follow a simple format: the first row should contain the column names, and each subsequent row refers to a single sample. There can be as many columns and rows as desired, but only the rows matching the actual samples will be kept.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\" style=\"font-size:16px\">For the example dataset provided in this tutorial, you can download and use the associated <a href=\"https:\/\/paul-regnier.fr\/wp-content\/uploads\/2023\/11\/PICAFlow_test_dataset_metadata.xlsx\">PICAFlow_test_dataset_metadata.xlsx<\/a> file. 
Do not forget to rename it to <code>metadata.xlsx<\/code> before use!<\/p>\n\n\n\n<p class=\"wp-block-paragraph\" style=\"font-size:16px\">Concretely, we only have to define the column of the <code>metadata.xlsx<\/code> file which contains the sample names that will match with the previously defined <code>sampleNamesColumn<\/code> used in the <code><strong>mergeData()<\/strong><\/code> function:<\/p>\n\n\n\n<pre class=\"wp-block-code\" style=\"font-size:16px\"><code><em># Define the column to use for metadata merging\n<\/em>\nmergeColumn_value = \"General.Code_Disease_Groups\"\nreplaceColumn_value = NULL\n\n<em># Merge the data and metadata\n<\/em>\ndata = <strong>mergeMetadata(<\/strong>\n    <strong>data<\/strong> = data,\n    <strong>mergeColumn<\/strong> = mergeColumn_value,\n    <strong>replaceColumn<\/strong> = replaceColumn_value\n<strong>)<\/strong><\/code><\/pre>\n\n\n\n<p class=\"has-background wp-block-paragraph\" style=\"background-color:#c6cbe1;font-size:16px\"><strong>Note: the match should be exact between <code>mergeColumn<\/code> and <code>sampleNamesColumn<\/code>!<\/strong><\/p>\n\n\n\n<p class=\"has-background wp-block-paragraph\" style=\"background-color:#c6cbe1;font-size:16px\"><strong>Note 2: the <code>replaceColumn<\/code> argument can be used to rename sample rows after merging is done. 
This can be helpful if you have several clinical subgroups among the samples of a given group, for instance.<\/strong><\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"10.2.2\" style=\"font-size:20px\"><strong>10.2.2) Subset merged data<\/strong><\/h3>\n\n\n\n<p class=\"wp-block-paragraph\" style=\"font-size:16px\">Next, we subset the merged data to keep only the desired features that will be included in the next UMAP analysis, both at the data level (with the <code>columnsToKeepData<\/code> argument) and at the metadata level (with the <code>columnsToKeepMetadata<\/code> argument):<\/p>\n\n\n\n<pre class=\"wp-block-code\" style=\"font-size:16px\"><code><em># Define which data and metadata columns to keep\n<\/em>\ncolumnsToKeepData_value = \"MixB\"\ncolumnsToKeepMetadata_value = NULL\n\n<em># Subset the merged data\n<\/em>\ndataOut = <strong>subsetDataUMAP(<\/strong>\n    <strong>data<\/strong> = data,\n    <strong>columnsToKeepData<\/strong> = columnsToKeepData_value,\n    <strong>columnsToKeepMetadata<\/strong> = columnsToKeepMetadata_value,\n    <strong>isColumnsToKeepDataRegex<\/strong> = TRUE\n<strong>)<\/strong><\/code><\/pre>\n\n\n\n<p class=\"has-background wp-block-paragraph\" style=\"background-color:#c6cbe1;font-size:16px\"><strong>Please note that the <code>subsetDataUMAP()<\/code> function does not actually delete any column. 
Instead, it splits the dataset into a list of 2 elements: one named <code>dataSubset<\/code> which contains the actual columns of interest, and another named <code>dataRemoved<\/code> which contains the unwanted columns.<\/strong><\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"10.2.3\" style=\"font-size:20px\"><strong>10.2.3) UMAP dimensionality reduction of cell cluster abundances<\/strong><\/h3>\n\n\n\n<p class=\"wp-block-paragraph\" style=\"font-size:16px\">Then, we can perform a UMAP dimensionality reduction analysis to observe how samples cluster based on their cell cluster abundances, with either the group or the gender of the samples as overlay (see <strong>Figures 17-18<\/strong> below):<\/p>\n\n\n\n<pre class=\"wp-block-code\" style=\"font-size:16px\"><code><em># Define UMAP hyperparameters\n<\/em>\nn_neighbors_value_UMAP = 3\nmin_dist_value_UMAP = 0.25\n\n<em># Perform UMAP analysis on subset data and export a plot showing the group overlay\n<\/em>\n<strong>UMAP_clusters(<\/strong>\n    <strong>data<\/strong> = dataOut,\n    <strong>n_neighbors_UMAP<\/strong> = n_neighbors_value_UMAP,\n    <strong>min_dist_UMAP<\/strong> = min_dist_value_UMAP,\n    <strong>feature<\/strong> = \"General.Group\",\n    <strong>suffix<\/strong> = \"_allPatients\",\n    <strong>returnUMAPData<\/strong> = FALSE,\n    <strong>computeUMAP<\/strong> = TRUE,\n    <strong>seedValue<\/strong> = seed_value,\n    <strong>plotHighlightedFeatureItems<\/strong> = TRUE\n<strong>)<\/strong>\n\n<em># Perform UMAP analysis on subset data and export a plot showing the gender overlay\n<\/em>\n<strong>UMAP_clusters(<\/strong>\n    <strong>data<\/strong> = dataOut,\n    <strong>n_neighbors_UMAP<\/strong> = n_neighbors_value_UMAP,\n    <strong>min_dist_UMAP<\/strong> = min_dist_value_UMAP,\n    <strong>feature<\/strong> = \"General.Gender\",\n    <strong>suffix<\/strong> = \"_allPatients\",\n    <strong>returnUMAPData<\/strong> = FALSE,\n    <strong>computeUMAP<\/strong> = TRUE,\n    <strong>seedValue<\/strong> = 
seed_value,\n    <strong>plotHighlightedFeatureItems<\/strong> = TRUE\n<strong>)<\/strong><\/code><\/pre>\n\n\n\n<p class=\"has-background wp-block-paragraph\" style=\"background-color:#c6cbe1;font-size:16px\"><strong>Note: Users can provide any metadata-associated column name for the UMAP overlay, as long as it represents categorical values.<\/strong><\/p>\n\n\n\n<p class=\"has-background wp-block-paragraph\" style=\"background-color:#c6cbe1;font-size:16px\"><strong>Note 2: As previously mentioned, users can test several values for the <code>min_dist<\/code> and <code>n_neighbors<\/code> UMAP arguments, although this function only accepts one value per parameter at a time.<\/strong><\/p>\n\n\n<div class=\"wp-block-image\">\n<figure data-wp-context=\"{&quot;imageId&quot;:&quot;6a559a192688a&quot;}\" data-wp-interactive=\"core\/image\" data-wp-key=\"6a559a192688a\" class=\"aligncenter size-full is-resized wp-lightbox-container\"><img loading=\"lazy\" decoding=\"async\" width=\"1172\" height=\"1173\" data-wp-class--hide=\"state.isContentHidden\" data-wp-class--show=\"state.isContentVisible\" data-wp-init=\"callbacks.setButtonStyles\" data-wp-on--click=\"actions.showLightbox\" data-wp-on--load=\"callbacks.setButtonStyles\" data-wp-on--pointerdown=\"actions.preloadImage\" data-wp-on--pointerenter=\"actions.preloadImageWithDelay\" data-wp-on--pointerleave=\"actions.cancelPreload\" data-wp-on-window--resize=\"callbacks.setButtonStyles\" sizes=\"auto, (max-width: 1172px) 100vw, 1172px\" src=\"https:\/\/paul-regnier.fr\/wp-content\/uploads\/2023\/11\/PICAFlow_UMAPClusters_overlayGroups.png\" alt=\"\" class=\"wp-image-1398\" style=\"width:768px\" srcset=\"https:\/\/paul-regnier.fr\/wp-content\/uploads\/2023\/11\/PICAFlow_UMAPClusters_overlayGroups.png 1172w, https:\/\/paul-regnier.fr\/wp-content\/uploads\/2023\/11\/PICAFlow_UMAPClusters_overlayGroups-300x300.png 300w, https:\/\/paul-regnier.fr\/wp-content\/uploads\/2023\/11\/PICAFlow_UMAPClusters_overlayGroups-1024x1024.png 
1024w, https:\/\/paul-regnier.fr\/wp-content\/uploads\/2023\/11\/PICAFlow_UMAPClusters_overlayGroups-150x150.png 150w, https:\/\/paul-regnier.fr\/wp-content\/uploads\/2023\/11\/PICAFlow_UMAPClusters_overlayGroups-768x769.png 768w, https:\/\/paul-regnier.fr\/wp-content\/uploads\/2023\/11\/PICAFlow_UMAPClusters_overlayGroups-12x12.png 12w\" \/><button\n\t\t\tclass=\"lightbox-trigger\"\n\t\t\ttype=\"button\"\n\t\t\taria-haspopup=\"dialog\"\n\t\t\tdata-wp-bind--aria-label=\"state.thisImage.triggerButtonAriaLabel\"\n\t\t\tdata-wp-init=\"callbacks.initTriggerButton\"\n\t\t\tdata-wp-on--click=\"actions.showLightbox\"\n\t\t\tdata-wp-style--right=\"state.thisImage.buttonRight\"\n\t\t\tdata-wp-style--top=\"state.thisImage.buttonTop\"\n\t\t>\n\t\t\t<svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" width=\"12\" height=\"12\" fill=\"none\" viewbox=\"0 0 12 12\">\n\t\t\t\t<path fill=\"#fff\" d=\"M2 0a2 2 0 0 0-2 2v2h1.5V2a.5.5 0 0 1 .5-.5h2V0H2Zm2 10.5H2a.5.5 0 0 1-.5-.5V8H0v2a2 2 0 0 0 2 2h2v-1.5ZM8 12v-1.5h2a.5.5 0 0 0 .5-.5V8H12v2a2 2 0 0 1-2 2H8Zm2-12a2 2 0 0 1 2 2v2h-1.5V2a.5.5 0 0 0-.5-.5H8V0h2Z\" \/>\n\t\t\t<\/svg>\n\t\t<\/button><\/figure>\n<\/div>\n\n\n<p class=\"wp-block-paragraph\" style=\"font-size:16px\"><strong>Figure 17 &#8211; Plot showing UMAP parameters of clustered samples overlaid with their respective groups <strong>(click on the image to open in fullscreen)<\/strong>.<\/strong><\/p>\n\n\n<div class=\"wp-block-image\">\n<figure data-wp-context=\"{&quot;imageId&quot;:&quot;6a559a1927046&quot;}\" data-wp-interactive=\"core\/image\" data-wp-key=\"6a559a1927046\" class=\"aligncenter size-full is-resized wp-lightbox-container\"><img loading=\"lazy\" decoding=\"async\" width=\"1092\" height=\"1100\" data-wp-class--hide=\"state.isContentHidden\" data-wp-class--show=\"state.isContentVisible\" data-wp-init=\"callbacks.setButtonStyles\" data-wp-on--click=\"actions.showLightbox\" data-wp-on--load=\"callbacks.setButtonStyles\" 
data-wp-on--pointerdown=\"actions.preloadImage\" data-wp-on--pointerenter=\"actions.preloadImageWithDelay\" data-wp-on--pointerleave=\"actions.cancelPreload\" data-wp-on-window--resize=\"callbacks.setButtonStyles\" sizes=\"auto, (max-width: 1092px) 100vw, 1092px\" src=\"https:\/\/paul-regnier.fr\/wp-content\/uploads\/2023\/11\/PICAFlow_UMAPClusters_overlayGender.png\" alt=\"\" class=\"wp-image-1399\" style=\"width:768px\" srcset=\"https:\/\/paul-regnier.fr\/wp-content\/uploads\/2023\/11\/PICAFlow_UMAPClusters_overlayGender.png 1092w, https:\/\/paul-regnier.fr\/wp-content\/uploads\/2023\/11\/PICAFlow_UMAPClusters_overlayGender-298x300.png 298w, https:\/\/paul-regnier.fr\/wp-content\/uploads\/2023\/11\/PICAFlow_UMAPClusters_overlayGender-1017x1024.png 1017w, https:\/\/paul-regnier.fr\/wp-content\/uploads\/2023\/11\/PICAFlow_UMAPClusters_overlayGender-150x150.png 150w, https:\/\/paul-regnier.fr\/wp-content\/uploads\/2023\/11\/PICAFlow_UMAPClusters_overlayGender-768x774.png 768w, https:\/\/paul-regnier.fr\/wp-content\/uploads\/2023\/11\/PICAFlow_UMAPClusters_overlayGender-12x12.png 12w\" \/><button\n\t\t\tclass=\"lightbox-trigger\"\n\t\t\ttype=\"button\"\n\t\t\taria-haspopup=\"dialog\"\n\t\t\tdata-wp-bind--aria-label=\"state.thisImage.triggerButtonAriaLabel\"\n\t\t\tdata-wp-init=\"callbacks.initTriggerButton\"\n\t\t\tdata-wp-on--click=\"actions.showLightbox\"\n\t\t\tdata-wp-style--right=\"state.thisImage.buttonRight\"\n\t\t\tdata-wp-style--top=\"state.thisImage.buttonTop\"\n\t\t>\n\t\t\t<svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" width=\"12\" height=\"12\" fill=\"none\" viewbox=\"0 0 12 12\">\n\t\t\t\t<path fill=\"#fff\" d=\"M2 0a2 2 0 0 0-2 2v2h1.5V2a.5.5 0 0 1 .5-.5h2V0H2Zm2 10.5H2a.5.5 0 0 1-.5-.5V8H0v2a2 2 0 0 0 2 2h2v-1.5ZM8 12v-1.5h2a.5.5 0 0 0 .5-.5V8H12v2a2 2 0 0 1-2 2H8Zm2-12a2 2 0 0 1 2 2v2h-1.5V2a.5.5 0 0 0-.5-.5H8V0h2Z\" \/>\n\t\t\t<\/svg>\n\t\t<\/button><\/figure>\n<\/div>\n\n\n<p class=\"wp-block-paragraph\" style=\"font-size:16px\"><strong>Figure 18 
&#8211; Plot showing UMAP parameters of clustered samples overlaid with their respective gender <strong>(click on the image to open in fullscreen)<\/strong>.<\/strong><\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"10.2.4\" style=\"font-size:20px\"><strong>10.2.4) Remove outliers (optional)<\/strong><\/h3>\n\n\n\n<p class=\"has-background wp-block-paragraph\" style=\"background-color:#c6cbe1;font-size:16px\"><strong>Note: If your samples made it to this step, it means that they successfully passed standard quality control tests and that their staining does not seem to present any major issue compared to the other samples. However, if you really need to, you can still remove one or more samples from the dataset. Although entirely optional, this can become necessary when there is a very large number of batches in your dataset (typically the \u00ab\u00a0one sample, one batch\u00a0\u00bb experimental design), or if the associated metadata for some samples are incorrect or not precise enough. The example provided in the tutorial is purely illustrative and does not represent actual biological artifacts. 
In all cases, please use this feature knowingly.<\/strong><\/p>\n\n\n\n<p class=\"wp-block-paragraph\" style=\"font-size:16px\">To try reducing this previously identified heterogeneity, we can choose (or not) to eliminate one or more samples that we identify as outliers (in a purely illustrative way here):<\/p>\n\n\n\n<pre class=\"wp-block-code\" style=\"font-size:16px\"><code><em># Define outliers\n<\/em>\noutliersToRemove_value = <strong>c(<\/strong>\n    \"Group-HD_Sample-BARBIm\",\n    \"Group-RA_Sample-DOMm\",\n    \"Group-SLE_Sample-BARc\",\n    \"Group-Sjogren_Sample-BELk\",\n    \"Group-Cryo_Sample-CHAr\"\n<strong>)<\/strong>\n\n<em># Remove outliers from the dataset\n<\/em>\ndata = <strong>removeOutliers(<\/strong>\n    <strong>data<\/strong> = data,\n    <strong>outliers<\/strong> = outliersToRemove_value\n<strong>)<\/strong><\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\" style=\"font-size:16px\">Then we need to recompute the columns subsetting and the UMAP dimensionality reduction (see <strong>Figure 19<\/strong> below):<\/p>\n\n\n\n<pre class=\"wp-block-code\" style=\"font-size:16px\"><code><em># Recompute subset data (as previously)\n<\/em>\ndataOut = <strong>subsetDataUMAP(<\/strong>\n    <strong>data<\/strong> = data,\n    <strong>columnsToKeepData<\/strong> = columnsToKeepData_value,\n<strong>    columnsToKeepMetadata<\/strong> = columnsToKeepMetadata_value,\n    <strong>isColumnsToKeepDataRegex<\/strong> = TRUE\n<strong>)<\/strong>\n\n<em># Perform UMAP analysis on the remaining dataset (as previously)\n<\/em>\n<strong>UMAP_clusters(<\/strong>\n    <strong>data<\/strong> = dataOut,\n    <strong>n_neighbors_UMAP<\/strong> = n_neighbors_value_UMAP,\n    <strong>min_dist_UMAP<\/strong> = min_dist_value_UMAP,\n    <strong>feature<\/strong> = \"General.Group\",\n    <strong>suffix<\/strong> = \"_noOutliers\",\n    <strong>returnUMAPData<\/strong> = FALSE,\n    <strong>computeUMAP<\/strong> = TRUE,\n    <strong>seedValue<\/strong> = seed_value,\n  
  <strong>plotHighlightedFeatureItems<\/strong> = TRUE\n<strong>)<\/strong><\/code><\/pre>\n\n\n<div class=\"wp-block-image\">\n<figure data-wp-context=\"{&quot;imageId&quot;:&quot;6a559a192827e&quot;}\" data-wp-interactive=\"core\/image\" data-wp-key=\"6a559a192827e\" class=\"aligncenter size-full is-resized wp-lightbox-container\"><img loading=\"lazy\" decoding=\"async\" width=\"1090\" height=\"1102\" data-wp-class--hide=\"state.isContentHidden\" data-wp-class--show=\"state.isContentVisible\" data-wp-init=\"callbacks.setButtonStyles\" data-wp-on--click=\"actions.showLightbox\" data-wp-on--load=\"callbacks.setButtonStyles\" data-wp-on--pointerdown=\"actions.preloadImage\" data-wp-on--pointerenter=\"actions.preloadImageWithDelay\" data-wp-on--pointerleave=\"actions.cancelPreload\" data-wp-on-window--resize=\"callbacks.setButtonStyles\" sizes=\"auto, (max-width: 1090px) 100vw, 1090px\" src=\"https:\/\/paul-regnier.fr\/wp-content\/uploads\/2023\/11\/PICAFlow_UMAPClusters_noOutliers_overlayGroups.png\" alt=\"\" class=\"wp-image-1400\" style=\"width:768px\" srcset=\"https:\/\/paul-regnier.fr\/wp-content\/uploads\/2023\/11\/PICAFlow_UMAPClusters_noOutliers_overlayGroups.png 1090w, https:\/\/paul-regnier.fr\/wp-content\/uploads\/2023\/11\/PICAFlow_UMAPClusters_noOutliers_overlayGroups-297x300.png 297w, https:\/\/paul-regnier.fr\/wp-content\/uploads\/2023\/11\/PICAFlow_UMAPClusters_noOutliers_overlayGroups-1013x1024.png 1013w, https:\/\/paul-regnier.fr\/wp-content\/uploads\/2023\/11\/PICAFlow_UMAPClusters_noOutliers_overlayGroups-768x776.png 768w, https:\/\/paul-regnier.fr\/wp-content\/uploads\/2023\/11\/PICAFlow_UMAPClusters_noOutliers_overlayGroups-12x12.png 12w\" 
\/><button\n\t\t\tclass=\"lightbox-trigger\"\n\t\t\ttype=\"button\"\n\t\t\taria-haspopup=\"dialog\"\n\t\t\tdata-wp-bind--aria-label=\"state.thisImage.triggerButtonAriaLabel\"\n\t\t\tdata-wp-init=\"callbacks.initTriggerButton\"\n\t\t\tdata-wp-on--click=\"actions.showLightbox\"\n\t\t\tdata-wp-style--right=\"state.thisImage.buttonRight\"\n\t\t\tdata-wp-style--top=\"state.thisImage.buttonTop\"\n\t\t>\n\t\t\t<svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" width=\"12\" height=\"12\" fill=\"none\" viewbox=\"0 0 12 12\">\n\t\t\t\t<path fill=\"#fff\" d=\"M2 0a2 2 0 0 0-2 2v2h1.5V2a.5.5 0 0 1 .5-.5h2V0H2Zm2 10.5H2a.5.5 0 0 1-.5-.5V8H0v2a2 2 0 0 0 2 2h2v-1.5ZM8 12v-1.5h2a.5.5 0 0 0 .5-.5V8H12v2a2 2 0 0 1-2 2H8Zm2-12a2 2 0 0 1 2 2v2h-1.5V2a.5.5 0 0 0-.5-.5H8V0h2Z\" \/>\n\t\t\t<\/svg>\n\t\t<\/button><\/figure>\n<\/div>\n\n\n<p class=\"wp-block-paragraph\" style=\"font-size:16px\"><strong>Figure 19 &#8211; Plot showing UMAP parameters of non-outlier samples overlaid with their respective groups <strong>(click on the image to open in fullscreen)<\/strong>.<\/strong><\/p>\n\n\n\n<p class=\"has-background wp-block-paragraph\" style=\"background-color:#c6cbe1;font-size:16px\"><strong>If you need to remove other samples, feel free to redo the previous steps starting from the removal of outliers step. 
If you need to restore a removed sample, please start again from the beginning of chapter 10).<\/strong><\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"10.2.5\" style=\"font-size:20px\"><strong>10.2.5) Include final UMAP embeddings to the dataset<\/strong><\/h3>\n\n\n\n<p class=\"wp-block-paragraph\" style=\"font-size:16px\">Once we are fully satisfied with the remaining data, we add the most recently computed UMAP embeddings to the columns of interest of the dataset. The result is the final dataset to use in the last parts of the workflow:<\/p>\n\n\n\n<pre class=\"wp-block-code\" style=\"font-size:16px\"><code><em># Include final UMAP embeddings to the dataset\n<\/em>\ndataOut = <strong>UMAP_clusters(<\/strong>\n    <strong>data<\/strong> = dataOut,\n    <strong>n_neighbors_UMAP<\/strong> = n_neighbors_value_UMAP,\n    <strong>min_dist_UMAP<\/strong> = min_dist_value_UMAP,\n    <strong>feature<\/strong> = \"General.Group\",\n    <strong>suffix<\/strong> = \"_noOutliers\",\n    <strong>returnUMAPData<\/strong> = TRUE,\n    <strong>computeUMAP<\/strong> = TRUE,\n    <strong>seedValue<\/strong> = seed_value,\n    <strong>plotHighlightedFeatureItems<\/strong> = TRUE\n<strong>)<\/strong><\/code><\/pre>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"10.2.6\" style=\"font-size:20px\"><strong>10.2.6) Hierarchical clustering of UMAP projection<\/strong><\/h3>\n\n\n\n<p class=\"wp-block-paragraph\" style=\"font-size:16px\">Next, we can perform a hierarchical clustering analysis based on the previously generated UMAP embeddings and export the associated plots (see <strong>Figures 20-21<\/strong> below):<\/p>\n\n\n\n<pre class=\"wp-block-code\" style=\"font-size:16px\"><code><em># Define a number of clusters to export\n<\/em>\nclustersNb_value = 3\n\n<em># Perform a hierarchical clustering of the dataset UMAP projection\n<\/em>\ndataOut = <strong>hierarchicalClusteringData(<\/strong>\n    <strong>data<\/strong> = dataOut,\n    <strong>clustersNb<\/strong> = 
clustersNb_value\n<strong>)<\/strong>\n\n<em># Perform UMAP analysis on dataset and export a plot showing the clusters overlay\n<\/em>\n<strong>UMAP_clusters(<\/strong>\n    <strong>data<\/strong> = dataOut,\n    <strong>n_neighbors_UMAP<\/strong> = n_neighbors_value_UMAP,\n    <strong>min_dist_UMAP<\/strong> = min_dist_value_UMAP,\n    <strong>feature<\/strong> = \"cluster\",\n    <strong>suffix<\/strong> = \"_noOutliers\",\n    <strong>returnUMAPData<\/strong> = FALSE,\n    <strong>computeUMAP<\/strong> = FALSE,\n    <strong>seedValue<\/strong> = seed_value,\n    <strong>plotHighlightedFeatureItems<\/strong> = TRUE\n<strong>)<\/strong><\/code><\/pre>\n\n\n<div class=\"wp-block-image\">\n<figure data-wp-context=\"{&quot;imageId&quot;:&quot;6a559a192938b&quot;}\" data-wp-interactive=\"core\/image\" data-wp-key=\"6a559a192938b\" class=\"aligncenter size-large is-resized wp-lightbox-container\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"724\" data-wp-class--hide=\"state.isContentHidden\" data-wp-class--show=\"state.isContentVisible\" data-wp-init=\"callbacks.setButtonStyles\" data-wp-on--click=\"actions.showLightbox\" data-wp-on--load=\"callbacks.setButtonStyles\" data-wp-on--pointerdown=\"actions.preloadImage\" data-wp-on--pointerenter=\"actions.preloadImageWithDelay\" data-wp-on--pointerleave=\"actions.cancelPreload\" data-wp-on-window--resize=\"callbacks.setButtonStyles\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" src=\"https:\/\/paul-regnier.fr\/wp-content\/uploads\/2023\/11\/PICAFlow_clusteringSamplesDendrogram-1024x724.png\" alt=\"\" class=\"wp-image-1401\" style=\"width:768px\" srcset=\"https:\/\/paul-regnier.fr\/wp-content\/uploads\/2023\/11\/PICAFlow_clusteringSamplesDendrogram-1024x724.png 1024w, https:\/\/paul-regnier.fr\/wp-content\/uploads\/2023\/11\/PICAFlow_clusteringSamplesDendrogram-300x212.png 300w, https:\/\/paul-regnier.fr\/wp-content\/uploads\/2023\/11\/PICAFlow_clusteringSamplesDendrogram-768x543.png 768w, 
https:\/\/paul-regnier.fr\/wp-content\/uploads\/2023\/11\/PICAFlow_clusteringSamplesDendrogram-1536x1086.png 1536w, https:\/\/paul-regnier.fr\/wp-content\/uploads\/2023\/11\/PICAFlow_clusteringSamplesDendrogram-18x12.png 18w, https:\/\/paul-regnier.fr\/wp-content\/uploads\/2023\/11\/PICAFlow_clusteringSamplesDendrogram.png 1678w\" \/><button\n\t\t\tclass=\"lightbox-trigger\"\n\t\t\ttype=\"button\"\n\t\t\taria-haspopup=\"dialog\"\n\t\t\tdata-wp-bind--aria-label=\"state.thisImage.triggerButtonAriaLabel\"\n\t\t\tdata-wp-init=\"callbacks.initTriggerButton\"\n\t\t\tdata-wp-on--click=\"actions.showLightbox\"\n\t\t\tdata-wp-style--right=\"state.thisImage.buttonRight\"\n\t\t\tdata-wp-style--top=\"state.thisImage.buttonTop\"\n\t\t>\n\t\t\t<svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" width=\"12\" height=\"12\" fill=\"none\" viewbox=\"0 0 12 12\">\n\t\t\t\t<path fill=\"#fff\" d=\"M2 0a2 2 0 0 0-2 2v2h1.5V2a.5.5 0 0 1 .5-.5h2V0H2Zm2 10.5H2a.5.5 0 0 1-.5-.5V8H0v2a2 2 0 0 0 2 2h2v-1.5ZM8 12v-1.5h2a.5.5 0 0 0 .5-.5V8H12v2a2 2 0 0 1-2 2H8Zm2-12a2 2 0 0 1 2 2v2h-1.5V2a.5.5 0 0 0-.5-.5H8V0h2Z\" \/>\n\t\t\t<\/svg>\n\t\t<\/button><\/figure>\n<\/div>\n\n\n<p class=\"wp-block-paragraph\" style=\"font-size:16px\"><strong>Figure 20 &#8211; Dendrogram showing the number of desired sample clusters <strong>(click on the image to open in fullscreen)<\/strong>.<\/strong><\/p>\n\n\n<div class=\"wp-block-image\">\n<figure data-wp-context=\"{&quot;imageId&quot;:&quot;6a559a1929e47&quot;}\" data-wp-interactive=\"core\/image\" data-wp-key=\"6a559a1929e47\" class=\"aligncenter size-large is-resized wp-lightbox-container\"><img loading=\"lazy\" decoding=\"async\" width=\"1013\" height=\"1024\" data-wp-class--hide=\"state.isContentHidden\" data-wp-class--show=\"state.isContentVisible\" data-wp-init=\"callbacks.setButtonStyles\" data-wp-on--click=\"actions.showLightbox\" data-wp-on--load=\"callbacks.setButtonStyles\" data-wp-on--pointerdown=\"actions.preloadImage\" 
data-wp-on--pointerenter=\"actions.preloadImageWithDelay\" data-wp-on--pointerleave=\"actions.cancelPreload\" data-wp-on-window--resize=\"callbacks.setButtonStyles\" sizes=\"auto, (max-width: 1013px) 100vw, 1013px\" src=\"https:\/\/paul-regnier.fr\/wp-content\/uploads\/2023\/11\/PICAFlow_UMAPClusters_noOutliers_overlayClusters-1013x1024.png\" alt=\"\" class=\"wp-image-1402\" style=\"width:768px\" srcset=\"https:\/\/paul-regnier.fr\/wp-content\/uploads\/2023\/11\/PICAFlow_UMAPClusters_noOutliers_overlayClusters-1013x1024.png 1013w, https:\/\/paul-regnier.fr\/wp-content\/uploads\/2023\/11\/PICAFlow_UMAPClusters_noOutliers_overlayClusters-297x300.png 297w, https:\/\/paul-regnier.fr\/wp-content\/uploads\/2023\/11\/PICAFlow_UMAPClusters_noOutliers_overlayClusters-768x777.png 768w, https:\/\/paul-regnier.fr\/wp-content\/uploads\/2023\/11\/PICAFlow_UMAPClusters_noOutliers_overlayClusters-12x12.png 12w, https:\/\/paul-regnier.fr\/wp-content\/uploads\/2023\/11\/PICAFlow_UMAPClusters_noOutliers_overlayClusters.png 1163w\" \/><button\n\t\t\tclass=\"lightbox-trigger\"\n\t\t\ttype=\"button\"\n\t\t\taria-haspopup=\"dialog\"\n\t\t\tdata-wp-bind--aria-label=\"state.thisImage.triggerButtonAriaLabel\"\n\t\t\tdata-wp-init=\"callbacks.initTriggerButton\"\n\t\t\tdata-wp-on--click=\"actions.showLightbox\"\n\t\t\tdata-wp-style--right=\"state.thisImage.buttonRight\"\n\t\t\tdata-wp-style--top=\"state.thisImage.buttonTop\"\n\t\t>\n\t\t\t<svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" width=\"12\" height=\"12\" fill=\"none\" viewbox=\"0 0 12 12\">\n\t\t\t\t<path fill=\"#fff\" d=\"M2 0a2 2 0 0 0-2 2v2h1.5V2a.5.5 0 0 1 .5-.5h2V0H2Zm2 10.5H2a.5.5 0 0 1-.5-.5V8H0v2a2 2 0 0 0 2 2h2v-1.5ZM8 12v-1.5h2a.5.5 0 0 0 .5-.5V8H12v2a2 2 0 0 1-2 2H8Zm2-12a2 2 0 0 1 2 2v2h-1.5V2a.5.5 0 0 0-.5-.5H8V0h2Z\" \/>\n\t\t\t<\/svg>\n\t\t<\/button><\/figure>\n<\/div>\n\n\n<p class=\"wp-block-paragraph\" style=\"font-size:16px\"><strong>Figure 21 &#8211; Plot showing UMAP parameters of non-outlier samples overlaid with 
their respective clusters <strong>(click on the image to open in fullscreen)<\/strong>.<\/strong><\/p>\n\n\n\n<p class=\"wp-block-paragraph\" style=\"font-size:16px\"><strong>Figure 21<\/strong> above shows that patients separate into 3 well-defined clusters. The number of clusters chosen here is purely illustrative, and users can set it to any value. Hierarchical clustering always returns exactly the requested number of clusters, so users should test several values to find the one best suited to their dataset.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\" style=\"font-size:16px\">These clusters can, for instance, help users study differences between patient groups that had not been considered beforehand (cluster 1 vs. cluster 2, cluster 1 vs. cluster 3, or cluster 2 vs. cluster 3 in our example).<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"10.2.7\" style=\"font-size:20px\"><strong>10.2.7) Prepare data for export<\/strong><\/h3>\n\n\n\n<p class=\"wp-block-paragraph\" style=\"font-size:16px\">One of the last steps is to restructure the cell cluster abundance matrix into a single table containing all the associated metadata columns (not only the columns of interest used for the UMAP dimensionality reduction):<\/p>\n\n\n\n<pre class=\"wp-block-code\" style=\"font-size:16px\"><code><em># Restructure the data to merge all the columns of interest\n<\/em>\ndataBind = <strong>bindData(<\/strong>\n    <strong>data<\/strong> = dataOut\n<strong>)<\/strong><\/code><\/pre>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"10.2.8\" style=\"font-size:20px\"><strong>10.2.8) Export merged data, boxplots and feature tables<\/strong><\/h3>\n\n\n\n<p class=\"wp-block-paragraph\" style=\"font-size:16px\">Then, we choose the columns to plot as histograms\/boxplots and one or more feature(s) of interest to split the dataset by. 
Using this information, we construct the associated plots (see <strong>Figure 22<\/strong>), then export the final matrix as a table, along with additional tables (one per feature of interest) containing several statistics about cell clusters (notably their abundances according to the separating feature, statistical comparisons between every possible pair, etc.):<\/p>\n\n\n\n<pre class=\"wp-block-code\" style=\"font-size:16px\"><code><em># Define features and columns to use\n<\/em>\nfeaturesToUse_value = <strong>c(<\/strong>\n    \"General.Group\",\n    \"cluster\",\n    \"General.Gender\"\n<strong>)<\/strong>\n\ncolumnsToPlot_value = <strong>colnames(<\/strong>dataOut$dataSubset<strong>)<\/strong>\n\n<em># Construct boxplots and UMAP overlays for the previously defined values\n<\/em>\n<strong>constructPlots(<\/strong>\n    <strong>data<\/strong> = dataBind,\n    <strong>columnsToPlot<\/strong> = columnsToPlot_value,\n    <strong>features<\/strong> = featuresToUse_value,\n    <strong>plotUMAPOverlays<\/strong> = TRUE\n<strong>)<\/strong>\n\n<em># Export the full data table\n<\/em>\n<strong>exportDataBind(<\/strong>\n    <strong>data<\/strong> = dataBind\n<strong>)<\/strong>\n\n<em># Export feature tables\n<\/em>\n<strong>exportFeaturesTables(<\/strong>\n    <strong>data<\/strong> = dataBind,\n    <strong>columnsToUse<\/strong> = columnsToPlot_value,\n    <strong>featuresToUse<\/strong> = featuresToUse_value,\n    <strong>metricToUse<\/strong> = metricToUse_value\n<strong>)<\/strong><\/code><\/pre>\n\n\n<div class=\"wp-block-image\">\n<figure data-wp-context=\"{&quot;imageId&quot;:&quot;6a559a192af9d&quot;}\" data-wp-interactive=\"core\/image\" data-wp-key=\"6a559a192af9d\" class=\"aligncenter size-full is-resized wp-lightbox-container\"><img loading=\"lazy\" decoding=\"async\" width=\"937\" height=\"937\" data-wp-class--hide=\"state.isContentHidden\" data-wp-class--show=\"state.isContentVisible\" data-wp-init=\"callbacks.setButtonStyles\" 
data-wp-on--click=\"actions.showLightbox\" data-wp-on--load=\"callbacks.setButtonStyles\" data-wp-on--pointerdown=\"actions.preloadImage\" data-wp-on--pointerenter=\"actions.preloadImageWithDelay\" data-wp-on--pointerleave=\"actions.cancelPreload\" data-wp-on-window--resize=\"callbacks.setButtonStyles\" sizes=\"auto, (max-width: 937px) 100vw, 937px\" src=\"https:\/\/paul-regnier.fr\/wp-content\/uploads\/2023\/11\/PICAFlow_boxplotsClusterByGroup.png\" alt=\"\" class=\"wp-image-1403\" style=\"width:768px\" srcset=\"https:\/\/paul-regnier.fr\/wp-content\/uploads\/2023\/11\/PICAFlow_boxplotsClusterByGroup.png 937w, https:\/\/paul-regnier.fr\/wp-content\/uploads\/2023\/11\/PICAFlow_boxplotsClusterByGroup-300x300.png 300w, https:\/\/paul-regnier.fr\/wp-content\/uploads\/2023\/11\/PICAFlow_boxplotsClusterByGroup-150x150.png 150w, https:\/\/paul-regnier.fr\/wp-content\/uploads\/2023\/11\/PICAFlow_boxplotsClusterByGroup-768x768.png 768w, https:\/\/paul-regnier.fr\/wp-content\/uploads\/2023\/11\/PICAFlow_boxplotsClusterByGroup-12x12.png 12w\" \/><button\n\t\t\tclass=\"lightbox-trigger\"\n\t\t\ttype=\"button\"\n\t\t\taria-haspopup=\"dialog\"\n\t\t\tdata-wp-bind--aria-label=\"state.thisImage.triggerButtonAriaLabel\"\n\t\t\tdata-wp-init=\"callbacks.initTriggerButton\"\n\t\t\tdata-wp-on--click=\"actions.showLightbox\"\n\t\t\tdata-wp-style--right=\"state.thisImage.buttonRight\"\n\t\t\tdata-wp-style--top=\"state.thisImage.buttonTop\"\n\t\t>\n\t\t\t<svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" width=\"12\" height=\"12\" fill=\"none\" viewbox=\"0 0 12 12\">\n\t\t\t\t<path fill=\"#fff\" d=\"M2 0a2 2 0 0 0-2 2v2h1.5V2a.5.5 0 0 1 .5-.5h2V0H2Zm2 10.5H2a.5.5 0 0 1-.5-.5V8H0v2a2 2 0 0 0 2 2h2v-1.5ZM8 12v-1.5h2a.5.5 0 0 0 .5-.5V8H12v2a2 2 0 0 1-2 2H8Zm2-12a2 2 0 0 1 2 2v2h-1.5V2a.5.5 0 0 0-.5-.5H8V0h2Z\" \/>\n\t\t\t<\/svg>\n\t\t<\/button><\/figure>\n<\/div>\n\n\n<p class=\"wp-block-paragraph\" style=\"font-size:16px\"><strong>Figure 22 &#8211; Boxplots showing a given cluster abundance for 
the feature \u00ab\u00a0group\u00a0\u00bb <strong>(click on the image to open in fullscreen)<\/strong>.<\/strong><\/p>\n\n\n\n<p class=\"wp-block-paragraph\" style=\"font-size:16px\">Furthermore, we can export several heatmaps showing the clusters abundances according to the different features of interest (see <strong>Figure 23<\/strong> below):<\/p>\n\n\n\n<pre class=\"wp-block-code\" style=\"font-size:16px\"><code><em># Export heatmaps showing clusters abundances according to the feature of interest\n<\/em>\n<em># Feature = \"Group\"\n<\/em>\n<strong>heatmapAbundancesGroups(<\/strong>\n    <strong>feature<\/strong> = \"General.Group\",\n    <strong>clustersToKeepRegex<\/strong> = \"MixB\",\n    <strong>metricToUse<\/strong> = metricToUse_value\n<strong>)<\/strong>\n\n<em># Feature = \"Gender\"\n<\/em>\n<strong>heatmapAbundancesGroups(<\/strong>\n    <strong>feature<\/strong> = \"General.Gender\",\n    <strong>clustersToKeepRegex<\/strong> = \"MixB\",\n    <strong>metricToUse<\/strong> = metricToUse_value\n<strong>)<\/strong>\n\n<em># Feature = \"Cluster\"<\/em>\n\n<strong>heatmapAbundancesGroups(<\/strong>\n    <strong>feature<\/strong> = \"cluster\",\n    <strong>clustersToKeepRegex<\/strong> = \"MixB\",\n    <strong>metricToUse<\/strong> = metricToUse_value\n<strong>)<\/strong><\/code><\/pre>\n\n\n<div class=\"wp-block-image\">\n<figure data-wp-context=\"{&quot;imageId&quot;:&quot;6a559a192bae1&quot;}\" data-wp-interactive=\"core\/image\" data-wp-key=\"6a559a192bae1\" class=\"aligncenter size-large is-resized wp-lightbox-container\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"674\" data-wp-class--hide=\"state.isContentHidden\" data-wp-class--show=\"state.isContentVisible\" data-wp-init=\"callbacks.setButtonStyles\" data-wp-on--click=\"actions.showLightbox\" data-wp-on--load=\"callbacks.setButtonStyles\" data-wp-on--pointerdown=\"actions.preloadImage\" data-wp-on--pointerenter=\"actions.preloadImageWithDelay\" 
data-wp-on--pointerleave=\"actions.cancelPreload\" data-wp-on-window--resize=\"callbacks.setButtonStyles\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" src=\"https:\/\/paul-regnier.fr\/wp-content\/uploads\/2023\/11\/PICAFlow_heatmapClustersAbundances_noOutliers-1024x674.png\" alt=\"\" class=\"wp-image-1404\" style=\"width:768px\" srcset=\"https:\/\/paul-regnier.fr\/wp-content\/uploads\/2023\/11\/PICAFlow_heatmapClustersAbundances_noOutliers-1024x674.png 1024w, https:\/\/paul-regnier.fr\/wp-content\/uploads\/2023\/11\/PICAFlow_heatmapClustersAbundances_noOutliers-300x197.png 300w, https:\/\/paul-regnier.fr\/wp-content\/uploads\/2023\/11\/PICAFlow_heatmapClustersAbundances_noOutliers-768x505.png 768w, https:\/\/paul-regnier.fr\/wp-content\/uploads\/2023\/11\/PICAFlow_heatmapClustersAbundances_noOutliers-1536x1010.png 1536w, https:\/\/paul-regnier.fr\/wp-content\/uploads\/2023\/11\/PICAFlow_heatmapClustersAbundances_noOutliers-18x12.png 18w, https:\/\/paul-regnier.fr\/wp-content\/uploads\/2023\/11\/PICAFlow_heatmapClustersAbundances_noOutliers.png 1695w\" \/><button\n\t\t\tclass=\"lightbox-trigger\"\n\t\t\ttype=\"button\"\n\t\t\taria-haspopup=\"dialog\"\n\t\t\tdata-wp-bind--aria-label=\"state.thisImage.triggerButtonAriaLabel\"\n\t\t\tdata-wp-init=\"callbacks.initTriggerButton\"\n\t\t\tdata-wp-on--click=\"actions.showLightbox\"\n\t\t\tdata-wp-style--right=\"state.thisImage.buttonRight\"\n\t\t\tdata-wp-style--top=\"state.thisImage.buttonTop\"\n\t\t>\n\t\t\t<svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" width=\"12\" height=\"12\" fill=\"none\" viewbox=\"0 0 12 12\">\n\t\t\t\t<path fill=\"#fff\" d=\"M2 0a2 2 0 0 0-2 2v2h1.5V2a.5.5 0 0 1 .5-.5h2V0H2Zm2 10.5H2a.5.5 0 0 1-.5-.5V8H0v2a2 2 0 0 0 2 2h2v-1.5ZM8 12v-1.5h2a.5.5 0 0 0 .5-.5V8H12v2a2 2 0 0 1-2 2H8Zm2-12a2 2 0 0 1 2 2v2h-1.5V2a.5.5 0 0 0-.5-.5H8V0h2Z\" \/>\n\t\t\t<\/svg>\n\t\t<\/button><\/figure>\n<\/div>\n\n\n<p class=\"wp-block-paragraph\" style=\"font-size:16px\"><strong>Figure 23 &#8211; Heatmap showing the 
abundances of each determined cluster in each feature \u00ab\u00a0group\u00a0\u00bb, without the presence of outliers <strong>(click on the image to open in fullscreen)<\/strong>.<\/strong><\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"10.2.9\" style=\"font-size:20px\"><strong>10.2.9) ROC analysis<\/strong><\/h3>\n\n\n\n<p class=\"wp-block-paragraph\" style=\"font-size:16px\">Finally, we can perform ROC analyses on the dataset by defining a list of predictors to use and a list of pairs of groups to compare. Please note that the following function requires an Excel file, which can have any name (passed through the <code>dataFile<\/code> argument of the <code><strong>ROCanalysis()<\/strong><\/code> function), located in the <code>output &gt; 8_Analysis<\/code> directory.<\/p>\n\n\n\n<p class=\"has-background wp-block-paragraph\" style=\"background-color:#c6cbe1;font-size:16px\"><strong>Please note that this Excel file should be created from the <code>FullData.txt<\/code> file previously generated by the <code>exportDataBind()<\/code> function.<\/strong><\/p>\n\n\n\n<pre class=\"wp-block-code\" style=\"font-size:16px\"><code><em># If necessary, list the clusters present in the dataset with one of the following commands\n<\/em>\n<strong>colnames(<\/strong>dataBind<strong>)<\/strong>\n<strong>colnames(<\/strong>dataBind&#91;, <strong>grep(<\/strong>columnsToKeepData_value, <strong>colnames(<\/strong>dataBind<strong>))<\/strong>]<strong>)<\/strong>\n\n<em># Define parameters to use for ROC analysis\n<\/em>\npredictorsList = <strong>list(<\/strong>\n    <strong>c(<\/strong>\n        \"MixB_C328\",\n        \"MixB_C296\",\n        \"MixB_C139\"\n    <strong>)<\/strong>\n<strong>)<\/strong>\n\npairsToAnalyzeList = <strong>list(<\/strong>\n    <strong>list(<\/strong>\n        <strong>c(<\/strong>\"HD\", \"SLE\", \"RA\", \"Sjogren\"<strong>)<\/strong>,\n        <strong>c(<\/strong>\"Cryo\"<strong>)<\/strong>\n    <strong>)<\/strong>\n<strong>)<\/strong>\n\n<em># Perform ROC 
analysis\n<\/em>\n<strong>ROCanalysis(<\/strong>\n    <strong>dataFile<\/strong> = \"FullData_custom.xlsx\",\n    <strong>predictors<\/strong> = predictorsList,\n    <strong>pairsToAnalyze<\/strong> = pairsToAnalyzeList,\n    <strong>pairs_columnToCheck<\/strong> = \"General.Group\"\n<strong>)<\/strong><\/code><\/pre>\n\n\n\n<p class=\"has-background wp-block-paragraph\" style=\"background-color:#c6cbe1;font-size:16px\"><strong>Please note that each <code>predictorsList<\/code> element can be a vector containing several items, as in the script above. In that case, the predictor used for the ROC computation is automatically defined as the sum of the listed subpredictors. More information about this function is available in the associated documentation (<code>?ROCanalysis<\/code>).<\/strong><\/p>\n\n\n\n<p class=\"has-background wp-block-paragraph\" style=\"background-color:#c6cbe1;font-size:16px\"><strong>Please note that the cluster names used in the previous script (<code>MixB_C328<\/code>, <code>MixB_C296<\/code> and <code>MixB_C139<\/code> here) may not exist in your own clustering, because cluster generation is initially random.<\/strong><\/p>\n\n\n\n<p class=\"wp-block-paragraph\" style=\"font-size:16px\">We chose to rely on a manually edited Excel file because users may need to create new predictors for their dataset, for instance the sum of several cell clusters or an ad hoc score generated by any method of interest.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\" style=\"font-size:16px\">The associated ROC curves (see <strong>Figure 24<\/strong> below) and summary table are saved in the <code>output &gt; 8_Analysis &gt; ROC<\/code> directory.<\/p>\n\n\n<div class=\"wp-block-image\">\n<figure data-wp-context=\"{&quot;imageId&quot;:&quot;6a559a192c98a&quot;}\" data-wp-interactive=\"core\/image\" data-wp-key=\"6a559a192c98a\" class=\"aligncenter size-full is-resized wp-lightbox-container\"><img 
loading=\"lazy\" decoding=\"async\" width=\"962\" height=\"845\" data-wp-class--hide=\"state.isContentHidden\" data-wp-class--show=\"state.isContentVisible\" data-wp-init=\"callbacks.setButtonStyles\" data-wp-on--click=\"actions.showLightbox\" data-wp-on--load=\"callbacks.setButtonStyles\" data-wp-on--pointerdown=\"actions.preloadImage\" data-wp-on--pointerenter=\"actions.preloadImageWithDelay\" data-wp-on--pointerleave=\"actions.cancelPreload\" data-wp-on-window--resize=\"callbacks.setButtonStyles\" sizes=\"auto, (max-width: 962px) 100vw, 962px\" src=\"https:\/\/paul-regnier.fr\/wp-content\/uploads\/2023\/11\/PICAFlow_ROCAnalysis.png\" alt=\"\" class=\"wp-image-1405\" style=\"width:768px\" srcset=\"https:\/\/paul-regnier.fr\/wp-content\/uploads\/2023\/11\/PICAFlow_ROCAnalysis.png 962w, https:\/\/paul-regnier.fr\/wp-content\/uploads\/2023\/11\/PICAFlow_ROCAnalysis-300x264.png 300w, https:\/\/paul-regnier.fr\/wp-content\/uploads\/2023\/11\/PICAFlow_ROCAnalysis-768x675.png 768w, https:\/\/paul-regnier.fr\/wp-content\/uploads\/2023\/11\/PICAFlow_ROCAnalysis-14x12.png 14w\" \/><button\n\t\t\tclass=\"lightbox-trigger\"\n\t\t\ttype=\"button\"\n\t\t\taria-haspopup=\"dialog\"\n\t\t\tdata-wp-bind--aria-label=\"state.thisImage.triggerButtonAriaLabel\"\n\t\t\tdata-wp-init=\"callbacks.initTriggerButton\"\n\t\t\tdata-wp-on--click=\"actions.showLightbox\"\n\t\t\tdata-wp-style--right=\"state.thisImage.buttonRight\"\n\t\t\tdata-wp-style--top=\"state.thisImage.buttonTop\"\n\t\t>\n\t\t\t<svg xmlns=\"http:\/\/www.w3.org\/2000\/svg\" width=\"12\" height=\"12\" fill=\"none\" viewbox=\"0 0 12 12\">\n\t\t\t\t<path fill=\"#fff\" d=\"M2 0a2 2 0 0 0-2 2v2h1.5V2a.5.5 0 0 1 .5-.5h2V0H2Zm2 10.5H2a.5.5 0 0 1-.5-.5V8H0v2a2 2 0 0 0 2 2h2v-1.5ZM8 12v-1.5h2a.5.5 0 0 0 .5-.5V8H12v2a2 2 0 0 1-2 2H8Zm2-12a2 2 0 0 1 2 2v2h-1.5V2a.5.5 0 0 0-.5-.5H8V0h2Z\" \/>\n\t\t\t<\/svg>\n\t\t<\/button><\/figure>\n<\/div>\n\n\n<p class=\"wp-block-paragraph\" style=\"font-size:16px\"><strong>Figure 24 &#8211; ROC 
curve showing the impact of a given cluster abundance on selected groups classification (click on the image to open in fullscreen).<\/strong><\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"10.3\" style=\"font-size:22px\"><strong>10.3) I do not have metadata available for my dataset<\/strong><\/h3>\n\n\n\n<p class=\"wp-block-paragraph\" style=\"font-size:16px\">Even without any metadata for the dataset, <code>PICAFlow<\/code> can still perform a few basic operations, notably exporting boxplots showing the abundance of each cluster of interest, split by any feature of interest present in the dataset.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\" style=\"font-size:16px\">To do this, users can run the following commands:<\/p>\n\n\n\n<pre class=\"wp-block-code\" style=\"font-size:16px\"><code><em># Set the variables that will no longer be used to `NULL`\n<\/em>\nmaxCellsPerPlot_value = NULL\nsampleNamesColumn_value = NULL\nmergeColumn_value = NULL\ncolumnsToKeepData_value = NULL\ncolumnsToKeepMetadata_value = NULL\nn_neighbors_value_UMAP = NULL\nmin_dist_value_UMAP = NULL\noutliersToRemove_value = NULL\nclustersNb_value = NULL\n\n<em># Generate a new column with the actual group of each sample\n<\/em>\ndata$group = gsub(\"Group-(.+)_Sample-(.+)\", \"\\\\1\", rownames(data))\n\n<em># Define which columns to plot and which columns to use as features to split by\n<\/em>\ncolumnsToPlot_value = colnames(data)&#91;grep(prefix_value, colnames(data))]\nfeaturesToUse_value = c(\"group\")\n\n<em># Construct boxplots without UMAP overlays for the previously defined values\n<\/em>\n<strong>constructPlots(<\/strong>\n    <strong>data<\/strong> = data,\n    <strong>columnsToPlot<\/strong> = columnsToPlot_value,\n    <strong>features<\/strong> = 
featuresToUse_value,\n    <strong>plotUMAPOverlays<\/strong> = FALSE\n<strong>)<\/strong><\/code><\/pre>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"11\" style=\"font-size:28px\"><strong>11) Parameters export<\/strong><\/h2>\n\n\n\n<p class=\"wp-block-paragraph\" style=\"font-size:16px\">Finally, we export all the parameters used throughout the <code>PICAFlow<\/code> workflow to keep a record of the analysis:<\/p>\n\n\n\n<pre class=\"wp-block-code\" style=\"font-size:16px\"><code><em># Export all the parameters used throughout the analysis\n<\/em>\nparametersToExport = <strong>list(<\/strong>\n    <strong>datasetToUse_value<\/strong> = datasetToUse_value,\n    <strong>seed_value<\/strong> = seed_value,\n    <strong>parametersToKeep<\/strong> = parametersToKeep,\n    <strong>customNames<\/strong> = customNames,\n    <strong>warpSet_value<\/strong> = warpSet_value,\n    <strong>gaussNorm_value<\/strong> = gaussNorm_value,\n    <strong>max.lms.sequence<\/strong> = max.lms.sequence,\n    <strong>samplesToDelete<\/strong> = samplesToDelete,\n    <strong>parametersToKeepFinal<\/strong> = parametersToKeepFinal,\n    <strong>downsampleMinEvents_value<\/strong> = downsampleMinEvents_value,\n    <strong>maxCellsNb<\/strong> = maxCellsNb,\n    <strong>estimateThreshold_value<\/strong> = estimateThreshold_value,\n    <strong>trainingDatasetProportion_value<\/strong> = trainingDatasetProportion_value,\n    <strong>min_dist_value<\/strong> = min_dist_value,\n    <strong>n_neighbors_value<\/strong> = n_neighbors_value,\n    <strong>subsetDownsampled_value<\/strong> = subsetDownsampled_value,\n    <strong>clusterMinPercentage_value<\/strong> = clusterMinPercentage_value,\n    <strong>metricToUse_value<\/strong> = metricToUse_value,\n    <strong>cutoff_value<\/strong> = cutoff_value,\n    <strong>parametersToUseThresholds<\/strong> = parametersToUseThresholds,\n    <strong>folder_value<\/strong> = folder_value,\n    <strong>prefix_value<\/strong> = prefix_value,\n    
<strong>maxCellsPerPlot_value<\/strong> = maxCellsPerPlot_value,\n    <strong>sampleNamesColumn_value<\/strong> = sampleNamesColumn_value,\n    <strong>mergeColumn_value<\/strong> = mergeColumn_value,\n    <strong>columnsToKeepData_value<\/strong> = columnsToKeepData_value,\n    <strong>columnsToKeepMetadata_value<\/strong> = columnsToKeepMetadata_value,\n    <strong>n_neighbors_value_UMAP<\/strong> = n_neighbors_value_UMAP,\n    <strong>min_dist_value_UMAP<\/strong> = min_dist_value_UMAP,\n    <strong>outliersToRemove_value<\/strong> = outliersToRemove_value,\n    <strong>clustersNb_value<\/strong> = clustersNb_value,\n    <strong>columnsToPlot_value<\/strong> = columnsToPlot_value,\n    <strong>featuresToUse_value<\/strong> = featuresToUse_value\n<strong>)<\/strong>\n\n<strong>exportParametersUsed(<\/strong>\n    <strong>parametersToExport<\/strong> = parametersToExport\n<strong>)<\/strong><\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\" style=\"font-size:16px\">The output text file will be saved in the <code>output<\/code> directory.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"12\" style=\"font-size:28px\"><strong>12) Acknowledgements<\/strong><\/h2>\n\n\n\n<p class=\"wp-block-paragraph\" style=\"font-size:16px\">Please note that the following R packages are used by <code>PICAFlow<\/code> to perform its own operations:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li style=\"font-size:16px\">Parallelized operations and optimized loops: <code>parallel<\/code>, <code>doSNOW<\/code> and <code>foreach<\/code><\/li>\n\n\n\n<li style=\"font-size:16px\">Flow\/mass cytometry-related data handling: <code>flowCore<\/code>, <code>flowStats<\/code> and <code>flowWorkspace<\/code><\/li>\n\n\n\n<li style=\"font-size:16px\">Plots and heatmaps generation: <code>ggplot2<\/code>, <code>ggcyto<\/code>, <code>gplots<\/code>, <code>plotly<\/code> and <code>cowplot<\/code><\/li>\n\n\n\n<li style=\"font-size:16px\">UMAP computing: <code>uwot<\/code><\/li>\n\n\n\n<li 
style=\"font-size:16px\">ROC analyses: <code>ROCit<\/code><\/li>\n\n\n\n<li style=\"font-size:16px\">Text-based data import: <code>utils<\/code> and <code>readxl<\/code><\/li>\n\n\n\n<li style=\"font-size:16px\">Addition of interactivity to data transformation: <code>shiny<\/code><\/li>\n\n\n\n<li style=\"font-size:16px\">Exceptions capture: <code>attempt<\/code><\/li>\n\n\n\n<li style=\"font-size:16px\">Density peaks identification: <code>pracma<\/code><\/li>\n\n\n\n<li style=\"font-size:16px\">Cell clustering: <code>class<\/code>, <code>FlowSOM<\/code> and <code>FastPG<\/code><\/li>\n\n\n\n<li style=\"font-size:16px\">Progress bars: <code>tcltk<\/code><\/li>\n\n\n\n<li style=\"font-size:16px\">Miscellaneous operations: <code>Biobase<\/code>, <code>matrixStats<\/code>, <code>methods<\/code>, <code>rlang<\/code> and <code>stats<\/code><\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"13\" style=\"font-size:28px\"><strong>13) Citation<\/strong><\/h2>\n\n\n\n<p class=\"wp-block-paragraph\" style=\"font-size:16px\">If you used <code>PICAFlow<\/code> in your analyses, please cite our Application Note published in the <em>Bioinformatics Advances<\/em> journal:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li style=\"font-size:16px\"><a href=\"https:\/\/academic.oup.com\/bioinformaticsadvances\/article\/3\/1\/vbad177\/7458441\" target=\"_blank\" rel=\"noreferrer noopener\">Bioinformatics Advances<\/a><\/li>\n\n\n\n<li style=\"font-size:16px\">PubMed: <a href=\"https:\/\/pubmed.ncbi.nlm.nih.gov\/38089110\/\" target=\"_blank\" rel=\"noreferrer noopener\">38089110<\/a><\/li>\n\n\n\n<li style=\"font-size:16px\">DOI: <a href=\"https:\/\/doi.org\/10.1093\/bioadv\/vbad177\" target=\"_blank\" rel=\"noreferrer noopener\">10.1093\/bioadv\/vbad177<\/a><\/li>\n<\/ul>","protected":false},"excerpt":{"rendered":"<p>PICAFlow: Pipeline for Integrative and Comprehensive Analysis of flow\/mass cytometry data Warning: this tutorial is only available in English, even if you choose the 
French language at the bottom of the screen. Thank you for your understanding. PICAFlow is a R package allowing to process cytometry data from raw FCS [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"parent":0,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"page-templates\/template-fullwidth.php","meta":{"_themeisle_gutenberg_block_has_review":false,"footnotes":""},"class_list":["post-395","page","type-page","status-publish","hentry"],"_links":{"self":[{"href":"https:\/\/paul-regnier.fr\/en_gb\/wp-json\/wp\/v2\/pages\/395","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/paul-regnier.fr\/en_gb\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/paul-regnier.fr\/en_gb\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/paul-regnier.fr\/en_gb\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/paul-regnier.fr\/en_gb\/wp-json\/wp\/v2\/comments?post=395"}],"version-history":[{"count":641,"href":"https:\/\/paul-regnier.fr\/en_gb\/wp-json\/wp\/v2\/pages\/395\/revisions"}],"predecessor-version":[{"id":2023,"href":"https:\/\/paul-regnier.fr\/en_gb\/wp-json\/wp\/v2\/pages\/395\/revisions\/2023"}],"wp:attachment":[{"href":"https:\/\/paul-regnier.fr\/en_gb\/wp-json\/wp\/v2\/media?parent=395"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}