
🥷 Getting started with SCIMAP

# import scimap
import scimap as sm
Running SCIMAP  1.3.12

The sample data was generated by the MCMICRO pipeline. Let's begin by importing MCMICRO output into SCIMAP with a single command.

This section of the tutorial uses the single-cell data located in the directory "scimapExampleData/quantification". The table holds the expression of each marker in individual cells, along with metadata such as XY coordinates and cell sizes.

# Provide the path to the single-cell feature table. Note that you can specify multiple paths as a list.
feature_table_path = ["/Users/aj/Dropbox (Partners HealthCare)/nirmal lab/resources/exemplarData/scimapExampleData/quantification/exemplar-001--unmicst_cell.csv"]

# Create the AnnData object
adata = sm.pp.mcmicro_to_scimap(feature_table_path)
Loading exemplar-001--unmicst_cell.csv
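
Since the import accepts a list of paths, one way to build that list is to collect every CSV in the quantification directory. The sketch below is illustrative only: the helper `find_feature_tables` and the file names are hypothetical, and it assumes each CSV in the directory is a feature table.

```python
from pathlib import Path
import tempfile

def find_feature_tables(quantification_dir):
    """Return the sorted paths (as strings) of every CSV in the directory."""
    return sorted(str(p) for p in Path(quantification_dir).glob("*.csv"))

# Demo on a throwaway directory holding two hypothetical, empty feature tables.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("image-A_cell.csv", "image-B_cell.csv"):
        (Path(tmp) / name).touch()
    feature_table_paths = find_feature_tables(tmp)
    table_names = [Path(p).name for p in feature_table_paths]

# With real data, the list could then be handed to the import function:
# adata = sm.pp.mcmicro_to_scimap(feature_table_paths)
```

Sorting keeps the order of images stable between runs, which makes results easier to compare.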

Exploring the contents of the AnnData object

adata

# The dataset contains 11201 cells and 9 markers
# The obs section contains the metadata for each cell
AnnData object with n_obs × n_vars = 11201 × 9
    obs: 'X_centroid', 'Y_centroid', 'Area', 'MajorAxisLength', 'MinorAxisLength', 'Eccentricity', 'Solidity', 'Extent', 'Orientation', 'CellID', 'imageid'
    uns: 'all_markers'
    layers: 'log'
# Print the contents of the expression matrix.
# By default, scimap applies a log transformation on import; set `log=False` if the data is already log-transformed.
adata.X
array([[7.0737187 , 5.09012558, 6.56401223, ..., 6.56296306, 5.32982432,
        6.75017136],
       [7.02618545, 5.1830041 , 6.67364145, ..., 6.93618427, 5.96229317,
        6.80626408],
       [7.13129875, 5.06030083, 6.67480083, ..., 7.19693192, 5.66577023,
        6.8434742 ],
       ...,
       [7.07756235, 5.31643791, 6.67671156, ..., 6.89449634, 5.64170442,
        6.80208784],
       [6.90256944, 5.31600484, 6.31401645, ..., 6.7339149 , 5.45318208,
        6.77120455],
       [7.06054074, 5.45066285, 6.32567977, ..., 6.34348493, 5.14151735,
        6.73978032]])
# print metadata
adata.obs
                                   X_centroid   Y_centroid  Area  MajorAxisLength  MinorAxisLength  Eccentricity  Solidity    Extent  Orientation  CellID  imageid
exemplar-001--unmicst_cell_1      1767.692308   257.290598   117        12.402944        12.006487      0.250814  0.959016  0.812500    -1.146733       1  exemplar-001--unmicst_cell
exemplar-001--unmicst_cell_2      1107.173913   665.869565    92        11.874070         9.982065      0.541562  0.948454  0.696970    -0.435290       2  exemplar-001--unmicst_cell
exemplar-001--unmicst_cell_3      1116.413793   671.068966    58        10.113305         7.629922      0.656364  0.878788  0.585859     1.221658       3  exemplar-001--unmicst_cell
exemplar-001--unmicst_cell_4       982.728625   677.029740   269        25.433196        15.183300      0.802251  0.835404  0.531621    -0.705293       4  exemplar-001--unmicst_cell
exemplar-001--unmicst_cell_5      1141.071078   680.125000   408        26.604670        19.759781      0.669604  0.937931  0.739130    -0.711002       5  exemplar-001--unmicst_cell
...                                       ...          ...   ...              ...              ...           ...       ...       ...          ...     ...  ...
exemplar-001--unmicst_cell_11197  1270.593750  3131.731250   160        19.414487        11.039993      0.822582  0.893855  0.701754    -1.364872   11197  exemplar-001--unmicst_cell
exemplar-001--unmicst_cell_11198  1177.349057  3130.839623   106        14.080819        10.062622      0.699499  0.876033  0.706667     1.478579   11198  exemplar-001--unmicst_cell
exemplar-001--unmicst_cell_11199  1255.904762  3131.285714   105        15.623503         9.143181      0.810875  0.882353  0.596591    -1.065479   11199  exemplar-001--unmicst_cell
exemplar-001--unmicst_cell_11200  1354.448276  3131.810345    58         9.779089         7.836216      0.598231  0.878788  0.725000    -1.072712   11200  exemplar-001--unmicst_cell
exemplar-001--unmicst_cell_11201  1125.662500  3133.100000    80        14.311249         7.225347      0.863194  0.941176  0.714286     1.370610   11201  exemplar-001--unmicst_cell

11201 rows × 11 columns

# Lastly, let's print the markers
adata.var
ELANE
CD57
CD45
CD11B
SMA
CD16
ECAD
FOXP3
NCAM

On inspection, you will notice that the data has been log-transformed and that the DNA and background channels have been removed. These are the defaults applied on import; each step can be customized or overridden as needed.
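
To make those defaults concrete, here is a rough pandas sketch of the same kind of processing on a tiny made-up table. It is not scimap's implementation: the table, the rule that marker columns sit to the left of `X_centroid`, and the rule that DNA channels are recognized by "dna" in the column name are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature feature table: marker columns first, then metadata.
table = pd.DataFrame({
    "DNA_1": [1200.0, 1500.0],
    "CD45": [99.0, 999.0],
    "ECAD": [9.0, 0.0],
    "X_centroid": [10.5, 20.5],
    "Y_centroid": [3.0, 4.0],
})

# Assumption: everything left of 'X_centroid' is a marker intensity column.
split_at = table.columns.get_loc("X_centroid")
markers = table.iloc[:, :split_at]
meta = table.iloc[:, split_at:]

# Assumption: DNA channels are identified by name. Drop them, then log-transform.
is_dna = markers.columns.str.contains("dna", case=False)
markers = markers.loc[:, ~is_dna]
log_markers = np.log1p(markers)  # log(1 + x), so zero intensities stay at zero
```

If your table is already log-transformed, the transformation step is the one to skip, which is what the `log=False` option mentioned earlier is for.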


🧐 What if your data was not generated using the MCMICRO pipeline?

If you're working with data not produced by mcmicro, it's crucial to consult the documentation for each function used in this series of tutorials. Each function operates under specific assumptions about its parameters. For instance, all spatial functions assume that XY coordinates are located in the 'X_centroid' and 'Y_centroid' columns. If your dataset organizes this information differently, you'll need to specify your column names when running the function.
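
An alternative to passing custom column names to every function is to rename the coordinate columns once, up front. The sketch below assumes hypothetical source columns `x_pos` and `y_pos`; substitute whatever your table actually uses.

```python
import pandas as pd

# Hypothetical metadata whose coordinates use non-default column names.
meta = pd.DataFrame({"x_pos": [10.0, 20.0], "y_pos": [5.0, 7.5], "Area": [117, 92]})

# Map the custom names onto the defaults that the spatial functions look for.
meta = meta.rename(columns={"x_pos": "X_centroid", "y_pos": "Y_centroid"})

# Fail early if either expected column is still missing.
missing = {"X_centroid", "Y_centroid"} - set(meta.columns)
if missing:
    raise KeyError(f"missing coordinate columns: {sorted(missing)}")
```

Renaming once keeps later function calls short, at the cost of diverging from the column names in your original files.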

# import packages
import anndata as ad
import pandas as pd

After importing the necessary packages, you can create an AnnData object as shown below.

# Load data
data = pd.read_csv('path/to/counts_table.csv')  # Counts matrix (cells x markers)
meta = pd.read_csv('path/to/meta_data.csv')  # Metadata such as X and Y coordinates

# Combine the counts and metadata to generate the AnnData object
adata = ad.AnnData(data)
adata.obs = meta

When importing data manually, without the built-in function that automates the process, follow four steps so that the object is compatible with the downstream analysis:

  1. Ensure Unique Image Identification: Incorporate a column named imageid within the metadata to assign a unique identifier to each image, especially when handling datasets comprising multiple images. This facilitates the organization and retrieval of specific image data within a larger dataset.

  2. Preserve Raw Data: Store the unprocessed raw data in adata.raw. This practice retains the original state of the data for reference or baseline comparisons before any preprocessing steps are applied.

  3. Log Transformation Layer: Generate a layer named log to hold log-transformed data. Log transformation is a critical step for normalizing data and mitigating the impact of large-scale differences across measurements, enhancing the analysis's robustness and interpretability.

  4. Marker Annotation: Maintain a record of all markers present in the images, ensuring their order matches the layers within the image data. This annotation is instrumental when loading images to precisely identify which layer corresponds to each marker, thus streamlining the analysis process by clarifying the relationship between image layers and their respective biological markers.

With these four pieces in place, a manually imported dataset is organized the way the functions in these tutorials expect.
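
Step 1 can be sketched with pandas before the AnnData object is built. The example below is illustrative: the image names and tables are made up, and the cell naming scheme (image name, underscore, CellID) simply mirrors the row names seen in the imported example data.

```python
import pandas as pd

# Hypothetical per-image metadata tables, keyed by image name.
per_image = {
    "image_A": pd.DataFrame({"CellID": [1, 2], "X_centroid": [1.0, 2.0], "Y_centroid": [3.0, 4.0]}),
    "image_B": pd.DataFrame({"CellID": [1, 2], "X_centroid": [5.0, 6.0], "Y_centroid": [7.0, 8.0]}),
}

tagged = []
for image_name, meta in per_image.items():
    meta = meta.copy()
    meta["imageid"] = image_name  # one identifier per image
    # CellID restarts at 1 in every image, so prefix it to keep row names unique.
    meta.index = image_name + "_" + meta["CellID"].astype(str)
    tagged.append(meta)

combined_meta = pd.concat(tagged)
```

The counts table would need the same row order and row names so that expression values and metadata stay aligned.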

# preserve raw data
adata.raw = adata

# log transform data
adata = sm.pp.log1p(adata)

# Add marker annotation: replace the placeholder with your marker names, in the order of the image channels
adata.uns['all_markers'] = ['list', 'of', 'markers']

Save the AnnData object

Once the AnnData object is created, it becomes the central data structure for all subsequent analyses. This is highly beneficial because it encapsulates all results within the object, eliminating the need to manage multiple related files. You can conveniently share this single file with collaborators, allowing them to continue the analysis seamlessly or resume from where you left off. Furthermore, numerous single-cell analysis tools, such as Scanpy, are built upon this framework. This integration allows for the straightforward application of functions from various packages without the necessity of data reformatting to suit each tool's specific requirements.

# Save the results
adata.write('/Users/aj/Dropbox (Partners HealthCare)/nirmal lab/resources/exemplarData/scimapExampleData/scimapExampleData.h5ad')