🥷 Getting started with SCIMAP

# import scimap
import scimap as sm

Running SCIMAP  1.3.12

The sample data provided was generated by the mcmicro pipeline. Let's begin by importing mcmicro output into Scimap with a single command.

In this section of the tutorial, we'll work with the single-cell data you've obtained, located in the directory "scimapExampleData/quantification". The feature table holds the expression of each marker in individual cells along with metadata such as XY coordinates and cell sizes.

# Provide the path to the single-cell feature table. Note that you can specify multiple paths as a list.
feature_table_path = ["/Users/aj/Dropbox (Partners HealthCare)/nirmal lab/resources/exemplarData/scimapExampleData/quantification/exemplar-001--unmicst_cell.csv"]

# Create the AnnData object
adata = sm.pp.mcmicro_to_scimap(feature_table_path)

Loading exemplar-001--unmicst_cell.csv
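
To make the layout of such a feature table concrete, the sketch below builds a tiny stand-in for a quantification CSV and separates marker columns from per-cell metadata with plain pandas. The toy values and the helper name `split_feature_table` are hypothetical, and the rule that every column from `X_centroid` onward is metadata is an assumption about the table layout, not a description of what `mcmicro_to_scimap` does internally.

```python
import pandas as pd

# Toy stand-in for a quantification table: an ID column, marker
# intensities, then position/morphology columns (values are made up).
table = pd.DataFrame({
    "CellID": [1, 2, 3],
    "DNA_1": [5200.0, 4800.0, 6100.0],
    "CD45": [710.0, 790.0, 791.0],
    "ECAD": [205.0, 389.0, 288.0],
    "X_centroid": [1767.7, 1107.2, 1116.4],
    "Y_centroid": [257.3, 665.9, 671.1],
    "Area": [117, 92, 58],
})


def split_feature_table(df, split_col="X_centroid", id_col="CellID"):
    """Return (expression, metadata), assuming metadata starts at split_col."""
    split_at = df.columns.get_loc(split_col)
    # Everything before split_col, minus the ID column, is treated as markers.
    expression = df.iloc[:, :split_at].drop(columns=[id_col])
    # The ID column plus everything from split_col onward is metadata.
    metadata = pd.concat([df[[id_col]], df.iloc[:, split_at:]], axis=1)
    return expression, metadata


expression, metadata = split_feature_table(table)
print(list(expression.columns))
print(list(metadata.columns))
```

The two resulting tables correspond roughly to what ends up in `adata.X` and `adata.obs` after import.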

Exploring the contents of the AnnData object

adata

# The dataset contains 11201 cells and 9 markers
# The obs section contains the metadata related to each cell

AnnData object with n_obs × n_vars = 11201 × 9
    obs: 'X_centroid', 'Y_centroid', 'Area', 'MajorAxisLength', 'MinorAxisLength', 'Eccentricity', 'Solidity', 'Extent', 'Orientation', 'CellID', 'imageid'
    uns: 'all_markers'
    layers: 'log'

# Print the contents of the expression matrix.
# By default, scimap applies a log transformation on import; set `log=False` if the data is already log-transformed.
adata.X

array([[7.0737187 , 5.09012558, 6.56401223, ..., 6.56296306, 5.32982432,
        6.75017136],
       [7.02618545, 5.1830041 , 6.67364145, ..., 6.93618427, 5.96229317,
        6.80626408],
       [7.13129875, 5.06030083, 6.67480083, ..., 7.19693192, 5.66577023,
        6.8434742 ],
       ...,
       [7.07756235, 5.31643791, 6.67671156, ..., 6.89449634, 5.64170442,
        6.80208784],
       [6.90256944, 5.31600484, 6.31401645, ..., 6.7339149 , 5.45318208,
        6.77120455],
       [7.06054074, 5.45066285, 6.32567977, ..., 6.34348493, 5.14151735,
        6.73978032]])
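
As a minimal illustration of why the `log=False` switch matters, the sketch below applies NumPy's `log1p` to made-up raw intensities, inverts it with `expm1`, and then applies it a second time. That scimap uses exactly this function on import is an assumption here; the point is only that log-transforming data that is already log-transformed compresses it a second time.

```python
import numpy as np

# Made-up raw intensities for three cells and two markers.
raw = np.array([[1180.0, 161.0],
                [1125.0, 177.0],
                [1250.0, 157.0]])

logged = np.log1p(raw)        # log(1 + x), defined even when x == 0
restored = np.expm1(logged)   # inverse transform recovers the raw values
twice = np.log1p(logged)      # what happens if logged data is logged again

print(logged.round(3))
print(np.allclose(restored, raw))
print(twice.round(3))
```

After the second pass the values are squeezed into a much narrower range, which is the situation `log=False` is meant to avoid.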

# print metadata
adata.obs

	X_centroid	Y_centroid	Area	MajorAxisLength	MinorAxisLength	Eccentricity	Solidity	Extent	Orientation	CellID	imageid
exemplar-001--unmicst_cell_1	1767.692308	257.290598	117	12.402944	12.006487	0.250814	0.959016	0.812500	-1.146733	1	exemplar-001--unmicst_cell
exemplar-001--unmicst_cell_2	1107.173913	665.869565	92	11.874070	9.982065	0.541562	0.948454	0.696970	-0.435290	2	exemplar-001--unmicst_cell
exemplar-001--unmicst_cell_3	1116.413793	671.068966	58	10.113305	7.629922	0.656364	0.878788	0.585859	1.221658	3	exemplar-001--unmicst_cell
exemplar-001--unmicst_cell_4	982.728625	677.029740	269	25.433196	15.183300	0.802251	0.835404	0.531621	-0.705293	4	exemplar-001--unmicst_cell
exemplar-001--unmicst_cell_5	1141.071078	680.125000	408	26.604670	19.759781	0.669604	0.937931	0.739130	-0.711002	5	exemplar-001--unmicst_cell
...	...	...	...	...	...	...	...	...	...	...	...
exemplar-001--unmicst_cell_11197	1270.593750	3131.731250	160	19.414487	11.039993	0.822582	0.893855	0.701754	-1.364872	11197	exemplar-001--unmicst_cell
exemplar-001--unmicst_cell_11198	1177.349057	3130.839623	106	14.080819	10.062622	0.699499	0.876033	0.706667	1.478579	11198	exemplar-001--unmicst_cell
exemplar-001--unmicst_cell_11199	1255.904762	3131.285714	105	15.623503	9.143181	0.810875	0.882353	0.596591	-1.065479	11199	exemplar-001--unmicst_cell
exemplar-001--unmicst_cell_11200	1354.448276	3131.810345	58	9.779089	7.836216	0.598231	0.878788	0.725000	-1.072712	11200	exemplar-001--unmicst_cell
exemplar-001--unmicst_cell_11201	1125.662500	3133.100000	80	14.311249	7.225347	0.863194	0.941176	0.714286	1.370610	11201	exemplar-001--unmicst_cell

11201 rows × 11 columns
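
Because `adata.obs` is a pandas DataFrame, the usual pandas operations apply to it. The sketch below recreates the first five rows of the table above (four columns only) as a stand-alone DataFrame so that it runs without the example data; with the real object you would use `adata.obs` in place of `obs`. The area threshold is arbitrary and chosen only for illustration.

```python
import pandas as pd

# First five rows of the metadata table shown above (subset of columns).
obs = pd.DataFrame(
    {
        "X_centroid": [1767.692308, 1107.173913, 1116.413793, 982.728625, 1141.071078],
        "Y_centroid": [257.290598, 665.869565, 671.068966, 677.029740, 680.125000],
        "Area": [117, 92, 58, 269, 408],
        "imageid": ["exemplar-001--unmicst_cell"] * 5,
    },
    index=[f"exemplar-001--unmicst_cell_{i}" for i in range(1, 6)],
)

# Number of cells per image
cells_per_image = obs.groupby("imageid").size()

# Cells above an arbitrary, illustrative area threshold
large_cells = obs[obs["Area"] > 100]

print(cells_per_image)
print(large_cells[["X_centroid", "Y_centroid", "Area"]])
```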

# Lastly, let's print the markers
adata.var


ELANE
CD57
CD45
CD11B
SMA
CD16
ECAD
FOXP3
NCAM
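
The columns of `adata.X` follow the order of the markers in `adata.var`, so a marker's values can be looked up by position. The sketch below uses the first three rows and first three columns of the matrix printed earlier, paired with the first three marker names from the list above, as a stand-alone example; with the real object the names would come from `adata.var.index`.

```python
import numpy as np
import pandas as pd

markers = ["ELANE", "CD57", "CD45"]  # first three markers listed above
X = np.array([
    [7.0737187, 5.09012558, 6.56401223],
    [7.02618545, 5.1830041, 6.67364145],
    [7.13129875, 5.06030083, 6.67480083],
])  # first three rows and columns of the printed expression matrix

# Look up a marker's column by its position in the marker list
cd45 = X[:, markers.index("CD45")]

# Or wrap the matrix in a DataFrame to address columns by name
expr = pd.DataFrame(X, columns=markers)

print(cd45)
print(expr["CD45"].tolist())
```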

Upon inspection, you will note that the data has been log-transformed and that DNA and background channels are absent. These are the defaults applied on import: log transformation, removal of DNA channels, and exclusion of background channels. Each of these preprocessing steps can be customized or overridden as needed.
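
To illustrate what excluding DNA and background channels amounts to, the sketch below filters marker names by pattern in plain pandas. The column names and the patterns (`DNA`, `background`) are assumptions chosen for illustration; the import function's actual matching rules may differ, so check its documentation before relying on them.

```python
import pandas as pd

# Hypothetical marker columns as they might appear before import
expression = pd.DataFrame({
    "DNA_1": [5200.0, 4800.0],
    "background_488": [12.0, 15.0],
    "CD45": [710.0, 790.0],
    "DNA_2": [5100.0, 4700.0],
    "ECAD": [205.0, 389.0],
})

patterns = ["DNA", "background"]  # assumed naming conventions
is_excluded = expression.columns.str.contains("|".join(patterns), case=False)
kept = expression.loc[:, ~is_excluded]

print(list(kept.columns))
```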

🧐 What if your data was not generated using the MCMICRO pipeline?

If you're working with data not produced by mcmicro, it's crucial to consult the documentation for each function used in this series of tutorials. Each function operates under specific assumptions about its parameters. For instance, all spatial functions assume that XY coordinates are located in the 'X_centroid' and 'Y_centroid' columns. If your dataset organizes this information differently, you'll need to specify your column names when running the function.
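
An alternative to passing column names to every spatial function is to rename the coordinate columns once, before building the AnnData object. A minimal sketch, assuming a metadata table whose coordinates are stored under the hypothetical names `x_um` and `y_um`:

```python
import pandas as pd

# Hypothetical metadata table with non-standard coordinate column names
meta = pd.DataFrame({
    "cell": [1, 2, 3],
    "x_um": [10.5, 22.1, 30.0],
    "y_um": [4.2, 8.8, 15.3],
})

# Rename to the names the spatial functions expect by default
meta = meta.rename(columns={"x_um": "X_centroid", "y_um": "Y_centroid"})

# Fail early if either coordinate column is still missing
missing = {"X_centroid", "Y_centroid"} - set(meta.columns)
if missing:
    raise KeyError(f"Missing coordinate columns: {sorted(missing)}")

print(meta.columns.tolist())
```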

# import packages
import anndata as ad
import pandas as pd

After importing the necessary packages, you can create an AnnData object as shown below.

# Load data
data = pd.read_csv('path/to/counts_table.csv')  # Counts matrix
meta = pd.read_csv('path/to/meta_data.csv')  # Metadata such as x and y coordinates

# Combine the data and metadata to generate the AnnData object
adata = ad.AnnData(data)
adata.obs = meta

When manually importing data without using the built-in function that automates the process, it is crucial to follow four essential steps to ensure compatibility and effective data management for further analysis:

1. Ensure Unique Image Identification: Incorporate a column named imageid within the metadata to assign a unique identifier to each image, especially when handling datasets comprising multiple images. This facilitates the organization and retrieval of specific image data within a larger dataset.
2. Preserve Raw Data: Store the unprocessed raw data in adata.raw. This practice retains the original state of the data for reference or baseline comparisons before any preprocessing steps are applied.
3. Log Transformation Layer: Generate a layer named log to hold log-transformed data. Log transformation is a critical step for normalizing data and mitigating the impact of large-scale differences across measurements, enhancing the analysis's robustness and interpretability.
4. Marker Annotation: Maintain a record of all markers present in the images, ensuring their order matches the layers within the image data. This annotation is instrumental when loading images to precisely identify which layer corresponds to each marker, thus streamlining the analysis process by clarifying the relationship between image layers and their respective biological markers.

Following these steps keeps a manually imported dataset organized and compatible with the functions used in the rest of these tutorials.

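# add an image identifier if the metadata does not already have one
# ('image_1' is a placeholder; use one unique ID per image)
if 'imageid' not in adata.obs.columns:
    adata.obs['imageid'] = 'image_1'

# preserve raw data
adata.raw = adata

# log transform data
adata = sm.pp.log1p(adata)

# add marker annotation (replace with your markers, in image channel order)
adata.uns['all_markers'] = ['list', 'of', 'markers']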
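
The four requirements listed above can also be checked programmatically before moving on. The helper below, `missing_requirements`, is a hypothetical convenience function and not part of scimap; it only relies on the attribute names `obs`, `raw`, `layers`, and `uns`. A small stand-in object is used here so the sketch runs without the example data.

```python
from types import SimpleNamespace

import pandas as pd


def missing_requirements(adata):
    """List which of the four manual-import requirements are not met."""
    problems = []
    if "imageid" not in adata.obs.columns:
        problems.append("obs has no 'imageid' column")
    if adata.raw is None:
        problems.append("adata.raw is empty")
    if "log" not in adata.layers:
        problems.append("no 'log' layer")
    if "all_markers" not in adata.uns:
        problems.append("uns has no 'all_markers' entry")
    return problems


# Stand-in with the same attribute names as AnnData, used only so the
# sketch is self-contained; pass a real AnnData object in practice.
demo = SimpleNamespace(
    obs=pd.DataFrame({"X_centroid": [1.0], "Y_centroid": [2.0]}),
    raw=None,
    layers={},
    uns={"all_markers": ["CD45", "ECAD"]},
)

for problem in missing_requirements(demo):
    print(problem)
```

An empty list means all four items are in place.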

Save the AnnData object

Once the AnnData object is created, it becomes the central data structure for all subsequent analyses. This is highly beneficial because it encapsulates all results within the object, eliminating the need to manage multiple related files. You can conveniently share this single file with collaborators, allowing them to continue the analysis seamlessly or resume from where you left off. Furthermore, numerous single-cell analysis tools, such as Scanpy, are built upon this framework. This integration allows for the straightforward application of functions from various packages without the necessity of data reformatting to suit each tool's specific requirements.

# Save the results
adata.write('/Users/aj/Dropbox (Partners HealthCare)/nirmal lab/resources/exemplarData/scimapExampleData/scimapExampleData.h5ad')
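
The absolute path above is specific to one machine. A more portable option is to build the output path with `pathlib` and create the destination folder up front, so the save step does not depend on how the writer handles a missing directory. In this sketch the project folder is hypothetical, and a temporary directory stands in for it so the code runs anywhere.

```python
from pathlib import Path
import tempfile

# Hypothetical project folder; a temporary directory stands in for it here.
project_dir = Path(tempfile.mkdtemp())

out_path = project_dir / "scimapExampleData" / "scimapExampleData.h5ad"
out_path.parent.mkdir(parents=True, exist_ok=True)  # create missing folders

print(out_path.name)
# With an AnnData object in hand, save it as in the tutorial:
# adata.write(out_path)
```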