👽 Cell phenotyping using a Hierarchical Prior Knowledge Table

# import packages
import scimap as sm
import anndata as ad
import pandas as pd

Running SCIMAP  1.3.14

# Load the data that we saved in the last tutorial - (Prepare data for SCIMAP)
adata = ad.read_h5ad('/Users/aj/Dropbox (Partners HealthCare)/nirmal lab/resources/exemplarData/scimapExampleData/scimapExampleData.h5ad')

This method requires more work but is significantly more sensitive and scalable compared to clustering-based approaches.

To run the method, you'll need two main components:

To successfully execute the method, you will need:

A .csv file containing manual gates. Should manual gates be absent, the algorithm will attempt to adjust the data by fitting two Gaussian distributions. Nonetheless, employing manual gating is recommended due to its heightened sensitivity.
A .csv file outlining a gating workflow strategy.

The execution of the algorithm is structured into three primary steps:

Gate Identification: Utilize sm.pl.gate_finder to identify the gates.
Data Rescaling: Apply sm.pp.rescale to adjust the data based on the identified gates.
Phenotyping: Process the rescaled data using sm.tl.phenotype to run the phenotyping algorithm.

Step 1: Define manual gates

%gui qt

This method will launch a napari window displaying several layers, each representing a different gate. You can toggle these layers on and off with the goal of identifying the gate that most accurately captures the positive cells. It's crucial to zoom and pan through the image to ensure the selected gate encompasses all positive cells throughout the image. Bear in mind that achieving 100% accuracy might not be feasible, so a certain margin of error may need to be accepted. This process is iterative, meaning adjustments may be necessary even after initial phenotyping and analysis. Manual gating must be conducted independently for each marker and each image.

image_path = '/Users/aj/Dropbox (Partners HealthCare)/nirmal lab/resources/exemplarData/scimapExampleData/registration/exemplar-001.ome.tif'
marker_of_interest = 'CD45'

sm.pl.gate_finder (image_path, adata, marker_of_interest, from_gate = 5, to_gate = 9, increment = 0.1, point_size=6)

# Do the above step for each marker and then create a `.csv` file like below:
pd.read_csv('/Users/aj/Dropbox (Partners HealthCare)/nirmal lab/resources/exemplarData/scimapExampleData/manual_gates.csv')

	marker	exemplar-001--unmicst_cell
0	ELANE	7.80
1	CD57	8.90
2	CD45	6.40
3	CD11B	7.60
4	SMA	7.50
5	CD16	6.50
6	ECAD	7.35
7	FOXP3	7.40
8	NCAM	7.00

You'll observe that the first column lists the markers present in the dataset, while the second column specifies the gate, named after the specific image's ID found in adata.obs['imageid']. This is especially useful when dealing with datasets containing multiple images, as it allows for a distinct column for each image.

Although visual gating has proven to be the most sensitive method for us, you can also apply single and bi-marker gating approaches, similar to FACS, to assist in determining a threshold.

# single 
sm.pl.distPlot(adata, layer='log', markers=['CD45','ECAD','FOXP3'], ncols=3, fontsize=6, figsize=(5,2))

png

We also use bimarker gating to identify a gate

sm.pl.densityPlot2D(adata, markerA='SMA', markerB='CD45', layer='log')

/Users/aj/miniconda3/envs/scimap/lib/python3.10/site-packages/mpl_scatter_density/generic_density_artist.py:77: RuntimeWarning:

All-NaN slice encountered

/Users/aj/miniconda3/envs/scimap/lib/python3.10/site-packages/mpl_scatter_density/generic_density_artist.py:82: RuntimeWarning:

All-NaN slice encountered

png

Step 2: Rescale the data

Here, we input a manual_gates.csv file into the gate parameter. This file contains gates determined visually with the help of the sm.pl.gate_finder function. For the markers specified in the manual_gates.csv file, the function will adjust the data so that cells exhibiting expression levels above the gate threshold are classified as positive for that marker, while those with expression levels below the threshold are deemed negative.

For markers not listed in the manual_gates.csv file, the function will employ an automatic approach to identify suitable gates by applying a Gaussian mixture model algorithm to the data.

# Load the manual gates and rescale the data based on the gates
manual_gate = pd.read_csv('/Users/aj/Dropbox (Partners HealthCare)/nirmal lab/resources/exemplarData/scimapExampleData/manual_gates.csv')
adata = sm.pp.rescale (adata, gate=manual_gate)

Scaling Image exemplar-001--unmicst_cell
Scaling ELANE
Scaling CD57
Scaling CD45
Scaling CD11B
Scaling SMA
Scaling CD16
Scaling ECAD
Scaling FOXP3
Scaling NCAM


/Users/aj/miniconda3/envs/scimap/lib/python3.10/site-packages/scimap/preprocessing/rescale.py:105: FutureWarning:

Downcasting object dtype arrays on .fillna, .ffill, .bfill is deprecated and will change in a future version. Call result.infer_objects(copy=False) instead. To opt-in to the future behavior, set `pd.set_option('future.no_silent_downcasting', True)`

Step 3: Run the phenotyping algorithm

# load the phenotyping workflow
phenotype = pd.read_csv('/Users/aj/Dropbox (Partners HealthCare)/nirmal lab/resources/exemplarData/scimapExampleData/phenotype_workflow.csv')
# view the table:
phenotype.style.format(na_rep='')

	Unnamed: 0	Unnamed: 1	CD57	CD45	CD11B	SMA	CD16	ECAD	FOXP3
0	all	ECAD+						pos
1	all	Immune		pos
2	all	SMA+				pos
3	Immune	NK cells	allpos		neg		allpos
4	Immune	Other myeloid cells			pos
5	Immune	Treg							pos
6	Other myeloid cells	Dendritic cells	allneg				allneg

As it can be seen from the table above,

The first column has to contain the cell that are to be classified.
The second column indicates the phenotype a particular cell will be assigned if it satifies the conditions in the row.
Column three and onward represent protein markers. If the protein marker is known to be expressed for that cell type, then it is denoted by either pos, allpos. If the protein marker is known to not express for a cell type it can be denoted by neg, allneg. If the protein marker is irrelevant or uncertain to express for a cell type, then it is left empty. anypos and anyneg are options for using a set of markers and if any of the marker is positive or negative, the cell type is denoted accordingly.

To give users maximum flexibility in identifying desired cell types, we have implemented various classification arguments as described above for strategical classification. They include

allpos
allneg
anypos
anyneg
pos
neg

pos : "Pos" looks for cells positive for a given marker. If multiple markers are annotated as pos, all must be positive to denote the cell type. For example, a Regulatory T cell can be defined as CD3+CD4+FOXP3+ by passing pos to each marker. If one or more markers don't meet the criteria (e.g. CD4-), the program will classify it as Likely-Regulatory-T cell, pending user confirmation. This is useful in cases of technical artifacts or when cell types (such as cancer cells) are defined by marker loss (e.g. T-cell Lymphomas).

neg : Same as pos but looks for negativity of the defined markers.

allpos : "Allpos" requires all defined markers to be positive. Unlike pos, it doesn't classify cells as Likely-cellType, but strictly annotates cells positive for all defined markers.

allneg : Same as allpos but looks for negativity of the defined markers.

anypos : "Anypos" requires only one of the defined markers to be positive. For example, to define macrophages, a cell could be designated as such if any of CD68, CD163, or CD206 is positive.

anyneg : Same as anyneg but looks for negativity of the defined markers.

adata = sm.tl.phenotype_cells (adata, phenotype=phenotype, label="phenotype") 

Phenotyping ECAD+
Phenotyping Immune
Phenotyping SMA+
-- Subsetting Immune
Phenotyping NK cells
Phenotyping Other myeloid cells
Phenotyping Treg
-- Subsetting Other myeloid cells
Phenotyping Dendritic cells
Consolidating the phenotypes across all groups


/Users/aj/miniconda3/envs/scimap/lib/python3.10/site-packages/scimap/tools/phenotype_cells.py:174: SettingWithCopyWarning:


A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy

/Users/aj/miniconda3/envs/scimap/lib/python3.10/site-packages/scimap/tools/phenotype_cells.py:290: FutureWarning:

DataFrame.fillna with 'method' is deprecated and will raise in a future version. Use obj.ffill() or obj.bfill() instead.

/Users/aj/miniconda3/envs/scimap/lib/python3.10/site-packages/scimap/tools/phenotype_cells.py:290: FutureWarning:

Downcasting object dtype arrays on .fillna, .ffill, .bfill is deprecated and will change in a future version. Call result.infer_objects(copy=False) instead. To opt-in to the future behavior, set `pd.set_option('future.no_silent_downcasting', True)`

# Summary of the phenotyping
adata.obs['phenotype'].value_counts()

phenotype
Immune                 4746
ECAD+                  3015
Unknown                2278
SMA+                    602
Treg                    282
Other myeloid cells     193
NK cells                 66
Dendritic cells          19
Name: count, dtype: int64

Visualisation of the Results

# Heatmap (here values over 0.5 are considered positive as we have scaled the data)
sm.pl.heatmap(adata, groupBy='phenotype', standardScale=None, figsize=(5,3), showPrevalence=True, vmin=0, vmax=1)

/Users/aj/miniconda3/envs/scimap/lib/python3.10/site-packages/scimap/plotting/heatmap.py:312: UserWarning:

This figure includes Axes that are not compatible with tight_layout, so results might be incorrect.

png

# view the phenotype on a UMAP
adata = sm.tl.umap(adata)

/Users/aj/miniconda3/envs/scimap/lib/python3.10/site-packages/umap/umap_.py:1943: UserWarning:

n_jobs value -1 overridden to 1 by setting random_state. Use no seed for parallelism.

sm.pl.umap(adata, color=['phenotype'])

/Users/aj/miniconda3/envs/scimap/lib/python3.10/site-packages/scimap/plotting/umap.py:267: UserWarning:

No data for colormapping provided via 'c'. Parameters 'cmap' will be ignored

png

# spatial scatter plot to look at the distribution of cells
sm.pl.spatial_scatterPlot (adata, colorBy = ['phenotype'],figsize=(2.75,2), s=0.3, fontsize=5, catCmap='Set1')

png

# View the results by overlaying it on the raw image
image_path = '/Users/aj/Dropbox (Partners HealthCare)/nirmal lab/resources/exemplarData/scimapExampleData/registration/exemplar-001.ome.tif'

# view in napari
sm.pl.image_viewer(image_path=image_path, 
                   adata=adata, 
                   overlay='phenotype', 
                   point_size=10,
                   point_color='white')

# Note that if your AnnotatedData object (adata) includes multiple images, 
# you can use the `subset` parameter to specify the image name found in the `imageid` column, 
# enabling the loading of just that particular image.

Save Results

# Save the results
adata.write('/Users/aj/Dropbox (Partners HealthCare)/nirmal lab/resources/exemplarData/scimapExampleData/scimapExampleData.h5ad')