Tutorial 1: 10X Visium DLPFC dataset ==================================== .. raw:: html
In this tutorial, we show how to apply scSTADE to identify spatial domains on 10X Visium data. As a example, we analyse the 151672 sample of the dorsolateral prefrontal cortex (DLPFC) dataset.
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ .. raw:: html
The source code package is freely available at https://github.com/cuiyaxuan/scSTADE/tree/master. The datasets used in this study can be found at https://drive.google.com/drive/folders/1H-ymfCqlDR1wpMRX-bCewAjG5nOrIF51?usp=sharing.
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ .. raw:: html
First, cd /home/.../scSTADE/scSTADE_Cluster_Functions/
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ .. code:: ipython3 # Using python virtual environment with conda. Please create a Pytorch environment, install Pytorch and some other packages, such as "numpy","pandas", "scikit-learn" and "scanpy". See the requirements.txt file for an overview of the packages in the environment we used to produce our results. Alternatively, you can install the environment dependencies in the following sequence to minimize environment conflicts. conda create -n pipeline source activate pipeline conda search r-base conda install r-base=4.2.0 conda install python=3.8 conda install conda-forge::gmp conda install conda-forge::r-seurat==4.4.0 conda install conda-forge::r-hdf5r conda install bioconda::bioconductor-sc3 conda install conda-forge::pot conda install pytorch torchvision torchaudio pytorch-cuda=11.8 -c pytorch -c nvidia conda search -c conda-forge r-mclust conda install -n pipeline -c conda-forge r-mclust=5.4.10 pip install scanpy pip install anndata==0.8.0 pip install pandas==1.4.2 conda install -c conda-forge rpy2=3.5.1 pip install scikit-learn==1.1.1 pip install scipy==1.8.1 pip install tqdm==4.64.0 pip install scikit-misc .. raw:: html
to install some R packages.
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ .. code:: ipython3 import rpy2.robjects as robjects robjects.r(''' install.packages('foreach') install.packages('doParallel') ''') .. code:: ipython3 import sys sys.path.append("/.../scSTADE-master/scSTADE_Cluster_Functions") # Input the path. from scSTADE import scSTADE import os import torch import pandas as pd import scanpy as sc from sklearn import metrics import multiprocessing as mp import dropout file = '/home/cuiyaxuan/spatialLIBD/151672' # Input the data path for the nonlinear model. count='151672_filtered_feature_bc_matrix.h5' # Input the file name for the nonlinear model. adata = sc.read_visium(file, count_file=count, load_images=True) dropout.setup_seed(41) dropout_rate=dropout.dropout(adata) print(dropout_rate) # Data quality assessment. device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu') # cpu or gpu n_clusters = 5 # Users can input either the default number of clusters or the estimated number of clusters. import rpy2.robjects as robjects data_path = '/home/cuiyaxuan/spatialLIBD/151672/151672_filtered_feature_bc_matrix.h5' # Input the data path and file name for the nonlinear model. position_path = '/home/cuiyaxuan/spatialLIBD/151672/spatial/tissue_positions_list.csv' # Input the data path and position file name for the nonlinear model. ARI_compare='/home/cuiyaxuan/spatialLIBD/151672/cluster_labels_151672.csv' # Input the ground truth data path and file name for comparing with the clustering results robjects.globalenv['data_path'] = robjects.vectors.StrVector([data_path]) robjects.globalenv['position_path'] = robjects.vectors.StrVector([position_path]) robjects.globalenv['ARI_compare'] = robjects.vectors.StrVector([ARI_compare]) robjects.globalenv['n_clusters'] = robjects.IntVector([n_clusters]) #The ARI accuracy and clustering labels have been generated and saved as CSV files. if dropout_rate>0.85: for i in [4000, 4500, 5000]: file_fold = file adata = sc.read_visium(file_fold, count_file = count, load_images=True) adata.var_names_make_unique() model = scSTADE(adata,device=device,n_top_genes=i) adata = model.train() radius = 50 tool = 'mclust' # mclust, leiden, and louvain from utils import clustering if tool == 'mclust': clustering(adata, n_clusters, radius=radius, method=tool, refinement=True) # For DLPFC dataset, we use optional refinement step. elif tool in ['leiden', 'louvain']: clustering(adata, n_clusters, radius=radius, method=tool, start=0.1, end=2.0, increment=0.01, refinement=False) adata.obs['domain'] adata.obs['domain'].to_csv(f"label_{i}.csv") robjects.r(''' library(SingleCellExperiment) library(SC3) library("Seurat") library("dplyr") library("hdf5r") library(foreach) library(doParallel) print(data_path) print(position_path) print(ARI_compare) print(n_clusters) source('Cri4.R') hc1= Read10X_h5(data_path) #### to your path and project name feature<-select_feature(hc1,4000,500) detectCores() cl <- makeCluster(3) # call 3 cpu cores k=n_clusters # k represent the number of spatial domains. parLapply(cl,1:3,feature=feature,k=k,pearson_metric) stopCluster(cl) tissue_local=read.csv(position_path,row.names = 1,header = FALSE) adj_matrix=construct_adj_matrix(feature[[1]],tissue_local) write.table(adj_matrix,file="adj_matrix.txt",sep=" ",quote=TRUE) detectCores() cl <- makeCluster(3) # call 3 cpu cores parLapply(cl,1:3,K=k,spectral_nei) stopCluster(cl) source('GNN_Tradition_6.R') source('label_ARI.R') true_label=read.csv(ARI_compare,row.names = 1) conlabel(hc1,k,true_label,compare=T) #### compare=T is compare ARI with the ground truth, compare=F is no compare ARI with the ground truth. Writing ARI results to a CSV file. ''') else: file_fold = file adata = sc.read_visium(file_fold, count_file= count, load_images=True) adata.var_names_make_unique() model = scSTADE(adata,device=device,n_top_genes=5000) adata = model.train() radius = 50 tool = 'mclust' # mclust, leiden, and louvain from utils import clustering if tool == 'mclust': clustering(adata, n_clusters, radius=radius, method=tool, refinement=True) # For DLPFC dataset, we use optional refinement step. elif tool in ['leiden', 'louvain']: clustering(adata, n_clusters, radius=radius, method=tool, start=0.1, end=2.0, increment=0.01, refinement=False) adata.obs['domain'] adata.obs['domain'].to_csv(f"label.csv") .. parsed-literal:: 0.8968134495641344 0.8968134495641344 Begin to train ST data... .. parsed-literal:: 0%| | 0/500 [00:00