🥷 Getting started with SCIMAP
| # import scimap
import scimap as sm
|
The sample data was generated by the mcmicro pipeline. We begin by importing mcmicro output into Scimap with a single command.
This section of the tutorial uses the single-cell data in the directory "scimapExampleData/quantification". The feature table holds the expression of each marker in every cell, along with metadata such as XY coordinates and cell size.
| # Provide the path to the single-cell feature table. Note that you can specify multiple paths as a list.
feature_table_path = ["/Users/aj/Dropbox (Partners HealthCare)/nirmal lab/resources/exemplarData/scimapExampleData/quantification/exemplar-001--unmicst_cell.csv"]
# create the AnnData object
adata = sm.pp.mcmicro_to_scimap(feature_table_path)
|
Loading exemplar-001--unmicst_cell.csv
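Since the importer accepts a list of paths, one convenient way to build that list is to scan the quantification folder for CSV files. The sketch below is only an illustration: `collect_feature_tables` is a hypothetical helper (not part of scimap), and it assumes every CSV file in the folder is a feature table.

```python
from pathlib import Path


def collect_feature_tables(quantification_dir):
    """Return a sorted list of CSV paths (as strings) found in a folder.

    Hypothetical helper for illustration; returns an empty list when the
    folder does not exist.
    """
    folder = Path(quantification_dir)
    if not folder.is_dir():
        return []
    return sorted(str(p) for p in folder.glob("*.csv"))


# Replace the folder with the location of your own quantification output.
feature_table_path = collect_feature_tables("scimapExampleData/quantification")
print(feature_table_path)

# The resulting list could then be passed to the importer shown above:
# adata = sm.pp.mcmicro_to_scimap(feature_table_path)
```

Sorting the paths keeps the import order reproducible between runs.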
Exploring contents of the AnnData object
| adata
# The dataset contains 11201 cells and 9 markers
# The obs section contains the metadata for each cell
|
AnnData object with n_obs × n_vars = 11201 × 9
obs: 'X_centroid', 'Y_centroid', 'Area', 'MajorAxisLength', 'MinorAxisLength', 'Eccentricity', 'Solidity', 'Extent', 'Orientation', 'CellID', 'imageid'
uns: 'all_markers'
layers: 'log'
| # print the contents of the expression matrix.
# By default, scimap log-transforms the data on import; set `log=False` if your data is already log-transformed
adata.X
|
array([[7.0737187 , 5.09012558, 6.56401223, ..., 6.56296306, 5.32982432,
6.75017136],
[7.02618545, 5.1830041 , 6.67364145, ..., 6.93618427, 5.96229317,
6.80626408],
[7.13129875, 5.06030083, 6.67480083, ..., 7.19693192, 5.66577023,
6.8434742 ],
...,
[7.07756235, 5.31643791, 6.67671156, ..., 6.89449634, 5.64170442,
6.80208784],
[6.90256944, 5.31600484, 6.31401645, ..., 6.7339149 , 5.45318208,
6.77120455],
[7.06054074, 5.45066285, 6.32567977, ..., 6.34348493, 5.14151735,
6.73978032]])
| # print metadata
adata.obs
|
| | X_centroid | Y_centroid | Area | MajorAxisLength | MinorAxisLength | Eccentricity | Solidity | Extent | Orientation | CellID | imageid |
| exemplar-001--unmicst_cell_1 | 1767.692308 | 257.290598 | 117 | 12.402944 | 12.006487 | 0.250814 | 0.959016 | 0.812500 | -1.146733 | 1 | exemplar-001--unmicst_cell |
| exemplar-001--unmicst_cell_2 | 1107.173913 | 665.869565 | 92 | 11.874070 | 9.982065 | 0.541562 | 0.948454 | 0.696970 | -0.435290 | 2 | exemplar-001--unmicst_cell |
| exemplar-001--unmicst_cell_3 | 1116.413793 | 671.068966 | 58 | 10.113305 | 7.629922 | 0.656364 | 0.878788 | 0.585859 | 1.221658 | 3 | exemplar-001--unmicst_cell |
| exemplar-001--unmicst_cell_4 | 982.728625 | 677.029740 | 269 | 25.433196 | 15.183300 | 0.802251 | 0.835404 | 0.531621 | -0.705293 | 4 | exemplar-001--unmicst_cell |
| exemplar-001--unmicst_cell_5 | 1141.071078 | 680.125000 | 408 | 26.604670 | 19.759781 | 0.669604 | 0.937931 | 0.739130 | -0.711002 | 5 | exemplar-001--unmicst_cell |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| exemplar-001--unmicst_cell_11197 | 1270.593750 | 3131.731250 | 160 | 19.414487 | 11.039993 | 0.822582 | 0.893855 | 0.701754 | -1.364872 | 11197 | exemplar-001--unmicst_cell |
| exemplar-001--unmicst_cell_11198 | 1177.349057 | 3130.839623 | 106 | 14.080819 | 10.062622 | 0.699499 | 0.876033 | 0.706667 | 1.478579 | 11198 | exemplar-001--unmicst_cell |
| exemplar-001--unmicst_cell_11199 | 1255.904762 | 3131.285714 | 105 | 15.623503 | 9.143181 | 0.810875 | 0.882353 | 0.596591 | -1.065479 | 11199 | exemplar-001--unmicst_cell |
| exemplar-001--unmicst_cell_11200 | 1354.448276 | 3131.810345 | 58 | 9.779089 | 7.836216 | 0.598231 | 0.878788 | 0.725000 | -1.072712 | 11200 | exemplar-001--unmicst_cell |
| exemplar-001--unmicst_cell_11201 | 1125.662500 | 3133.100000 | 80 | 14.311249 | 7.225347 | 0.863194 | 0.941176 | 0.714286 | 1.370610 | 11201 | exemplar-001--unmicst_cell |
11201 rows × 11 columns
| # lastly, let's print the markers
adata.var
|
|
ELANE |
CD57 |
CD45 |
CD11B |
SMA |
CD16 |
ECAD |
FOXP3 |
NCAM |
Upon inspection, you will note three things about the data: it has been log-transformed, the DNA channels have been removed, and the background channels have been excluded. These are the defaults applied on import; each preprocessing step can be customized or overridden as needed.
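To make these defaults concrete, here is a minimal sketch of the same three steps applied by hand to a toy table with pandas and NumPy. It is not scimap's implementation: the column names are invented, matching unwanted channels by name prefix is an assumption, and the transform is assumed to be `log1p`-style, as the name of `sm.pp.log1p` used later in this tutorial suggests.

```python
import numpy as np
import pandas as pd

# Toy feature table; the column names and values are made up for illustration.
raw = pd.DataFrame(
    {
        "DNA_1": [1200.0, 1350.0],
        "background_488": [15.0, 12.0],
        "CD45": [1180.0, 160.0],
        "ECAD": [205.0, 2300.0],
    }
)

# Assumption: unwanted channels can be recognised by a name prefix.
unwanted_prefixes = ("dna", "background")
keep = [c for c in raw.columns if not c.lower().startswith(unwanted_prefixes)]
markers = raw[keep]

# log1p computes log(1 + x), so zero intensities remain zero.
log_markers = np.log1p(markers)
print(log_markers.round(3))
```

Because `np.expm1` inverts `np.log1p`, the original intensities can be recovered from the transformed values if needed.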
🧐 What if your data was not generated using the MCMICRO pipeline?
If you're working with data not produced by mcmicro, it's crucial to consult the documentation for each function used in this series of tutorials. Each function operates under specific assumptions about its parameters. For instance, all spatial functions assume that XY coordinates are located in the 'X_centroid' and 'Y_centroid' columns. If your dataset organizes this information differently, you'll need to specify your column names when running the function.
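An alternative to passing custom column names to each function is to rename the coordinate columns once, before building the AnnData object. The sketch below assumes a metadata table whose coordinates are stored under the hypothetical names `x_position` and `y_position`; substitute the names used in your own data.

```python
import pandas as pd

# Hypothetical metadata whose coordinate columns use different names.
meta = pd.DataFrame(
    {
        "x_position": [1767.7, 1107.2, 1116.4],
        "y_position": [257.3, 665.9, 671.1],
        "Area": [117, 92, 58],
    }
)

# Map your own column names onto the names the spatial functions assume.
meta = meta.rename(columns={"x_position": "X_centroid", "y_position": "Y_centroid"})
print(meta.columns.tolist())
```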
| # import packages
import anndata as ad
import pandas as pd
|
After importing the necessary packages, you can create an AnnData object as shown below.
| # Load data
data = pd.read_csv('path/to/counts_table.csv')  # Counts matrix
meta = pd.read_csv('path/to/meta_data.csv')  # Meta data like x and y coordinates
# combine the data and metadata file to generate the AnnData object
adata = ad.AnnData(data)
adata.obs = meta
|
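Combining the two tables this way only makes sense if both describe the same cells in the same order. The following sketch shows one way to check that before building the object; the toy tables stand in for the two CSV files, and the `cell_N` labels are an arbitrary naming choice, not a scimap requirement.

```python
import pandas as pd

# Toy stand-ins for the two CSV files; values and names are illustrative.
data = pd.DataFrame({"CD45": [1180.0, 160.0, 95.0], "ECAD": [205.0, 2300.0, 410.0]})
meta = pd.DataFrame(
    {"X_centroid": [1767.7, 1107.2, 1116.4], "Y_centroid": [257.3, 665.9, 671.1]}
)

# Both tables must contain one row per cell.
if len(data) != len(meta):
    raise ValueError(f"{len(data)} cells in counts but {len(meta)} in metadata")

# Give both tables identical string labels, one per cell.
cell_ids = [f"cell_{i}" for i in range(1, len(data) + 1)]
data.index = cell_ids
meta.index = cell_ids
print(data.index.equals(meta.index))
```

Using string labels also matches what AnnData expects for its observation names.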
When importing data manually rather than with the built-in function, follow these four steps to keep the object compatible with downstream analysis:
- Ensure Unique Image Identification: Incorporate a column named imageid within the metadata to assign a unique identifier to each image, especially when handling datasets comprising multiple images. This facilitates the organization and retrieval of specific image data within a larger dataset.
- Preserve Raw Data: Store the unprocessed raw data in adata.raw. This practice retains the original state of the data for reference or baseline comparisons before any preprocessing steps are applied.
- Log Transformation Layer: Generate a layer named log to hold log-transformed data. Log transformation is a critical step for normalizing data and mitigating the impact of large-scale differences across measurements, enhancing the analysis's robustness and interpretability.
- Marker Annotation: Maintain a record of all markers present in the images, ensuring their order matches the layers within the image data. This annotation is instrumental when loading images to precisely identify which layer corresponds to each marker, thus streamlining the analysis process by clarifying the relationship between image layers and their respective biological markers.
Following these steps keeps a manually imported dataset organized and compatible with the functions used in this series of tutorials.
| # preserve raw data
adata.raw = adata
# log transform data
adata = sm.pp.log1p(adata)
# Add marker annotation
adata.uns['all_markers'] = ['list', 'of', 'markers']
|
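The first and last of the four steps (the imageid column and the marker list) can be prepared in pandas before the AnnData object is built. The sketch below uses two toy metadata tables and the hypothetical identifiers `image_01` and `image_02`. Taking the marker list from the counts table columns assumes the table has one column per image channel, in channel order; if channels were dropped from the table, write the list by hand instead.

```python
import pandas as pd

# Toy counts and metadata for two images; names and values are illustrative.
data = pd.DataFrame({"CD45": [1180.0, 160.0, 95.0], "ECAD": [205.0, 2300.0, 410.0]})
meta_a = pd.DataFrame({"X_centroid": [1767.7, 1107.2], "Y_centroid": [257.3, 665.9]})
meta_b = pd.DataFrame({"X_centroid": [982.7], "Y_centroid": [677.0]})

# Tag every cell with the image it came from, then stack the tables.
meta_a["imageid"] = "image_01"
meta_b["imageid"] = "image_02"
meta = pd.concat([meta_a, meta_b], ignore_index=True)

# Assumption: counts columns are in the same order as the image channels.
all_markers = data.columns.tolist()

print(meta)
print(all_markers)
```

The resulting `meta` and `all_markers` could then be assigned to `adata.obs` and `adata.uns['all_markers']` as in the cells above.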
Save the AnnData object
Once the AnnData object is created, it becomes the central data structure for all subsequent analyses. This is highly beneficial because it encapsulates all results within the object, eliminating the need to manage multiple related files. You can conveniently share this single file with collaborators, allowing them to continue the analysis seamlessly or resume from where you left off. Furthermore, numerous single-cell analysis tools, such as Scanpy, are built upon this framework. This integration allows for the straightforward application of functions from various packages without the necessity of data reformatting to suit each tool's specific requirements.
| # Save the results
adata.write('/Users/aj/Dropbox (Partners HealthCare)/nirmal lab/resources/exemplarData/scimapExampleData/scimapExampleData.h5ad')
|
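Writing fails if the target folder does not exist, so it can help to create it first. The helper below is a small sketch using only the standard library; `prepare_output_path` and the folder name in the usage comment are hypothetical, not part of scimap or anndata.

```python
from pathlib import Path


def prepare_output_path(output_dir, filename="scimapExampleData.h5ad"):
    """Create the output folder if needed and return the full file path.

    Hypothetical helper for illustration only.
    """
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    return out / filename


# Possible usage with the object created above:
# adata.write(prepare_output_path("results"))
```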