spatial_distance

Short Description

sm.pl.spatial_distance: Visualizes the average shortest distances between selected phenotypes or cell types, giving a view of spatial relationships within a sample. The distances must first be computed with sm.tl.spatial_distance; that step stores the data from which this function draws heatmaps, numeric comparisons, and distribution plots.
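
To make "average shortest distance" concrete, here is a minimal pandas sketch of the summary behind the default heatmap. The toy table is an assumption for illustration: it supposes one row per cell and one column per phenotype, each value being that cell's distance to the nearest cell of the column's phenotype. The cell names and numbers are made up.

```python
import pandas as pd

# Toy per-cell distance table (assumed layout: rows = cells, columns = phenotypes).
distance_map = pd.DataFrame(
    {"Tumor": [0.0, 0.0, 12.0, 20.0], "Stroma": [10.0, 14.0, 0.0, 0.0]},
    index=["cell1", "cell2", "cell3", "cell4"],
)
phenotype = pd.Series(
    ["Tumor", "Tumor", "Stroma", "Stroma"],
    index=distance_map.index,
    name="phenotype",
)

# Attach each cell's phenotype, then average the distances within each phenotype.
merged = pd.merge(
    phenotype.to_frame(), distance_map, left_index=True, right_index=True
)
summary = merged.groupby("phenotype").mean()

# Keep only the columns whose phenotype also appears as a row (square matrix).
summary = summary[summary.index]
print(summary)
```

Each row of `summary` is a source phenotype and each column a target phenotype, so the matrix need not be symmetric: the mean distance from Tumor cells to Stroma can differ from the mean distance from Stroma cells to Tumor.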

Function

spatial_distance(adata, spatial_distance='spatial_distance', phenotype='phenotype', imageid='imageid', log=False, method='heatmap', heatmap_summarize=True, heatmap_na_color='grey', heatmap_cmap='vlag_r', heatmap_row_cluster=False, heatmap_col_cluster=False, heatmap_standard_scale=0, distance_from=None, distance_to=None, x_axis=None, y_axis=None, facet_by=None, plot_type=None, return_data=False, subset_col=None, subset_value=None, fileName='spatial_distance.pdf', saveDir=None, **kwargs)

Parameters:

Name Type Description Default
adata AnnData

The annotated data matrix with spatial distance calculations.

required
spatial_distance str

Key in adata.uns where spatial distance data is stored, typically the output of sm.tl.spatial_distance.

'spatial_distance'
phenotype str

Column in adata.obs containing phenotype or cell type annotations.

'phenotype'
imageid str

Column in adata.obs identifying different images or samples.

'imageid'
log bool

If True, applies a log1p transformation (log(1 + x)) to the distance data.

False
method str

Visualization method: 'heatmap', 'numeric', or 'distribution'.

'heatmap'
heatmap_summarize bool

If True, summarizes distances across all images or samples for the heatmap.

True
heatmap_na_color str

Color for NA values in the heatmap.

'grey'
heatmap_cmap str

Colormap for the heatmap.

'vlag_r'
heatmap_row_cluster, heatmap_col_cluster bool

If True, clusters rows or columns in the heatmap.

False
heatmap_standard_scale int

Standardizes rows (0) or columns (1) in the heatmap.

0
distance_from, distance_to str

Phenotypes of interest for 'numeric' or 'distribution' plots. distance_from is required for these methods; distance_to is optional and may be a single phenotype or a list.

None
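x_axis, y_axis str

Columns of the plotting data frame ('distance', 'group', 'imageid', 'phenotype', 'cellid') to map to the axes in 'numeric' or 'distribution' plots. For 'distribution' plots, y_axis is used as the hue.

None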
facet_by str

Categorizes plots into subplots based on this column.

None
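plot_type str

Passed as `kind` to the seaborn plotting function. For 'numeric' plots: options include 'box', 'violin', etc. For 'distribution' plots: 'hist', 'kde', etc.

None
return_data bool

If True, returns the data frame used for plotting.

False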
subset_col str

Column name for subsetting data before plotting.

None
subset_value str or list

Values in subset_col to include in the plot. A single string is treated as a one-element list.

None
fileName str

Name of the file to save the plot. Relevant only if saveDir is not None.

'spatial_distance.pdf'
saveDir str

Directory to save the generated plot. If None, the plot is not saved.

None
**kwargs

Additional keyword arguments for plotting functions.

{}
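
As a rough illustration of what `subset_col`, `subset_value`, and `log` do before any plotting happens, the sketch below repeats the same steps on a toy table. The cell names, image IDs, and distances are invented for the example.

```python
import numpy as np
import pandas as pd

# Toy inputs (hypothetical): a per-cell distance table and matching metadata.
distance_map = pd.DataFrame(
    {"Tumor": [0.0, 9.0, 99.0], "Stroma": [3.0, 0.0, 0.0]},
    index=["cell1", "cell2", "cell3"],
)
obs = pd.DataFrame({"imageid": ["img1", "img1", "img2"]}, index=distance_map.index)

subset_col, subset_value = "imageid", "img1"

# A single string is wrapped into a list so that isin() works either way.
if isinstance(subset_value, str):
    subset_value = [subset_value]

# Keep only the cells whose metadata value is in subset_value.
cells = obs[obs[subset_col].isin(subset_value)].index
subset = distance_map.loc[distance_map.index.intersection(cells)]

# log=True corresponds to log1p, i.e. log(1 + distance), so zeros stay at zero.
logged = np.log1p(subset)
print(logged)
```

Because the subset is taken on the rows of the distance table, it restricts the cells that distances are measured from; the phenotype columns are left untouched.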

Returns:

Type Description

Plot and dataFrame (matplotlib, pandasDF): The plot is displayed, or saved to saveDir when it is set. If return_data is True, the data frame used for plotting is also returned; otherwise nothing is returned.
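
For the 'numeric' and 'distribution' methods, the returned data frame is in long format. The sketch below reproduces that reshaping on toy data so the column names are easy to see. The input values are hypothetical, and the real function additionally filters to `distance_from` cells and converts several columns to categoricals.

```python
import pandas as pd

# Toy distances for two Tumor cells (hypothetical values).
distance_map = pd.DataFrame(
    {"Tumor": [0.0, 0.0], "Stroma": [10.0, 14.0], "Immune": [5.0, 7.0]},
    index=["cell1", "cell2"],
)
pheno_df = pd.DataFrame(
    {"imageid": ["img1", "img1"], "phenotype": ["Tumor", "Tumor"]},
    index=distance_map.index,
)
distance_to = ["Stroma", "Immune"]

# Keep the requested target phenotypes and collapse them into one column.
d = distance_map[distance_to].stack().reset_index()
d.columns = ["cellid", "group", "distance"]

# Bring back the image and phenotype of each source cell.
d = pd.merge(d, pheno_df, left_on="cellid", right_index=True)
print(d)
```

These column names ('cellid', 'group', 'distance', 'imageid', 'phenotype') are the values that `x_axis`, `y_axis`, and `facet_by` can refer to.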

Example
# Generate a heatmap of spatial distances
sm.pl.spatial_distance(adata, method='heatmap', phenotype='cell_type', imageid='sample_id')

# Numeric plot showing distance from one phenotype to all others
sm.pl.spatial_distance(adata, method='numeric', distance_from='Tumor', phenotype='cell_type',
                 plot_type='boxen', x_axis='distance', y_axis='group')

# Distribution plot comparing distances between two specific phenotypes
sm.pl.spatial_distance(adata, method='distribution', distance_from='Tumor', distance_to='Stroma',
                 plot_type='kde', x_axis='distance', y_axis='group')
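
In the source listed on this page, the built-in plot layout is used only when `x_axis`, `y_axis`, `facet_by`, and `plot_type` are all left as None; setting any one of them switches to the user-supplied values for all four. The helper below is hypothetical (it is not part of scimap) and only mirrors that branching for the 'numeric' method.

```python
def resolve_numeric_args(x_axis=None, y_axis=None, facet_by=None, plot_type=None):
    """Hypothetical helper: mirror how the 'numeric' branch picks seaborn arguments."""
    if x_axis is None and y_axis is None and facet_by is None and plot_type is None:
        # Nothing customized: fall back to the built-in layout.
        return {"x": "distance", "y": "group", "col": "imageid", "kind": "boxen"}
    # Anything customized: all four values are passed through as given,
    # so unset ones stay None rather than falling back to the defaults.
    return {"x": x_axis, "y": y_axis, "col": facet_by, "kind": plot_type}


print(resolve_numeric_args())
print(resolve_numeric_args(plot_type="violin"))
```

In practice this suggests supplying `x_axis` and `y_axis` whenever `plot_type` or `facet_by` is customized.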
Source code in scimap/plotting/spatial_distance.py
def spatial_distance(
    adata,
    spatial_distance='spatial_distance',
    phenotype='phenotype',
    imageid='imageid',
    log=False,
    method='heatmap',
    heatmap_summarize=True,
    heatmap_na_color='grey',
    heatmap_cmap='vlag_r',
    heatmap_row_cluster=False,
    heatmap_col_cluster=False,
    heatmap_standard_scale=0,
    distance_from=None,
    distance_to=None,
    x_axis=None,
    y_axis=None,
    facet_by=None,
    plot_type=None,
    return_data=False,
    subset_col=None,
    subset_value=None,
    fileName='spatial_distance.pdf',
    saveDir=None,
    **kwargs,
):
    """
    Parameters:
            adata (anndata.AnnData):
                The annotated data matrix with spatial distance calculations.

            spatial_distance (str, optional):
                Key in `adata.uns` where spatial distance data is stored, typically the output of `sm.tl.spatial_distance`.

            phenotype (str):
                Column in `adata.obs` containing phenotype or cell type annotations.

            imageid (str, optional):
                Column in `adata.obs` identifying different images or samples.

            log (bool, optional):
                If True, applies a log1p transformation (log(1 + x)) to the distance data.

            method (str, optional):
                Visualization method: 'heatmap', 'numeric', or 'distribution'.

            heatmap_summarize (bool, optional):
                If True, summarizes distances across all images or samples for the heatmap.

            heatmap_na_color (str, optional):
                Color for NA values in the heatmap.

            heatmap_cmap (str, optional):
                Colormap for the heatmap.

            heatmap_row_cluster, heatmap_col_cluster (bool, optional):
                If True, clusters rows or columns in the heatmap.

            heatmap_standard_scale (int, optional):
                Standardizes rows (0) or columns (1) in the heatmap.

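            distance_from, distance_to (str, optional):
                Phenotypes of interest for 'numeric' or 'distribution' plots. `distance_from` is required
                for these methods; `distance_to` may be a single phenotype or a list.

            x_axis, y_axis (str, optional):
                Columns of the plotting data frame ('distance', 'group', 'imageid', 'phenotype', 'cellid')
                to map to the axes. For 'distribution' plots, `y_axis` is used as the hue.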

            facet_by (str, optional):
                Categorizes plots into subplots based on this column.

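            plot_type (str, optional):
                Passed as `kind` to the seaborn plotting function. For 'numeric' plots: options include
                'box', 'violin', etc. For 'distribution' plots: 'hist', 'kde', etc.

            return_data (bool, optional):
                If True, returns the data frame used for plotting.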

            subset_col (str, optional):
                Column name for subsetting data before plotting.

            subset_value (list, optional):
                Values in `subset_col` to include in the plot.

            fileName (str, optional):
                Name of the file to save the plot. Relevant only if `saveDir` is not None.

            saveDir (str, optional):
                Directory to save the generated plot. If None, the plot is not saved.

            **kwargs:
                Additional keyword arguments for plotting functions.

    Returns:
        Plot and dataFrame (matplotlib, pandasDF):
            The plot is displayed, or saved to `saveDir` when it is set. If `return_data` is True,
            the data frame used for plotting is also returned; otherwise nothing is returned.

    Example:
        ```python

        # Generate a heatmap of spatial distances
        sm.pl.spatial_distance(adata, method='heatmap', phenotype='cell_type', imageid='sample_id')

        # Numeric plot showing distance from one phenotype to all others
        sm.pl.spatial_distance(adata, method='numeric', distance_from='Tumor', phenotype='cell_type',
                         plot_type='boxen', x_axis='distance', y_axis='group')

        # Distribution plot comparing distances between two specific phenotypes
        sm.pl.spatial_distance(adata, method='distribution', distance_from='Tumor', distance_to='Stroma',
                         plot_type='kde', x_axis='distance', y_axis='group')

        ```
    """

    # set color for heatmap
    # cmap_updated = matplotlib.cm.get_cmap(heatmap_cmap)
    cmap_updated = matplotlib.colormaps[heatmap_cmap]
    cmap_updated.set_bad(color=heatmap_na_color)

    # Copy the spatial_distance results from anndata object
    try:
        diatance_map = adata.uns[spatial_distance].copy()
    except KeyError:
        raise ValueError(
            'spatial_distance not found. Please run sm.tl.spatial_distance first.'
        )

    # subset the data if user requests
    if subset_col is not None:
        if isinstance(subset_value, str):
            subset_value = [subset_value]
        # find the cell names to be subsetted out
        obs = adata.obs[[subset_col]]
        cells_to_subset = obs[obs[subset_col].isin(subset_value)].index

        # subset the diatance_map
        diatance_map = diatance_map.loc[
            diatance_map.index.intersection(cells_to_subset)
        ]
        # diatance_map = diatance_map.loc[cells_to_subset]

    # Convert distance to log scale if user requests
    if log is True:
        diatance_map = np.log1p(diatance_map)

    # Method
    if method == 'heatmap':
        if heatmap_summarize is True:
            # create the necessary data
            data = pd.DataFrame({'phenotype': adata.obs[phenotype]})
            data = pd.merge(
                data, diatance_map, how='outer', left_index=True, right_index=True
            )  # merge with the distance map
            k = data.groupby(
                ['phenotype'], observed=False
            ).mean()  # collapse the whole dataset into the mean distance per phenotype
            d = k[k.index]
        else:
            # create new naming scheme for the phenotypes
            non_summary = pd.DataFrame(
                {'imageid': adata.obs[imageid], 'phenotype': adata.obs[phenotype]}
            )
            non_summary['imageid'] = non_summary['imageid'].astype(
                str
            )  # convert the column to string
            non_summary['phenotype'] = non_summary['phenotype'].astype(
                str
            )  # convert the column to string
            non_summary['image_phenotype'] = non_summary['imageid'].str.cat(
                non_summary['phenotype'], sep="_"
            )
            # Merge distance map with phenotype
            data = pd.DataFrame(non_summary[['image_phenotype']])
            data = pd.merge(
                data, diatance_map, how='outer', left_index=True, right_index=True
            )
            k = data.groupby(['image_phenotype'], observed=False).mean()
            d = k.sort_index(axis=1)
        # Generate the heatmap
        mask = d.isnull()  # identify the NAN's for masking
        d = d.fillna(0)  # replace nan's with 0 so that clustering will work
        # Heatmap
        plot = sns.clustermap(
            d,
            cmap=cmap_updated,  # colormap copy with heatmap_na_color set for masked (NaN) cells
            row_cluster=heatmap_row_cluster,
            col_cluster=heatmap_col_cluster,
            mask=mask,
            standard_scale=heatmap_standard_scale,
            **kwargs,
        )
    else:

        # condition-1
        if distance_from is None and distance_to is None:
            raise ValueError(
                'Please include distance_from and/or distance_to parameters to use this method'
            )

        # condition-2
        if distance_from is None and distance_to is not None:
            raise ValueError(
                'Please provide the `distance_from` parameter to use this method'
            )

        # condition-3
        if distance_to is not None:
            # convert input to list if needed
            if isinstance(distance_to, str):
                distance_to = [distance_to]

        # Start
        pheno_df = pd.DataFrame(
            {'imageid': adata.obs[imageid], 'phenotype': adata.obs[phenotype]}
        )  # image id and phenotype
        data = pd.merge(
            pheno_df, diatance_map, how='outer', left_index=True, right_index=True
        )  # merge with the distance map
        data = data[data['phenotype'] == distance_from]  # subset the pheno of interest

        if distance_to is not None:
            data = data[
                distance_to
            ]  # drop columns that are not requested in distance_to
        else:
            data = data.drop(
                ['phenotype', 'imageid'], axis=1
            )  # drop the phenotype column before stacking

        d = data.stack().reset_index()  # collapse everything to one column
        d.columns = ['cellid', 'group', 'distance']
        d = pd.merge(
            d, pheno_df, left_on='cellid', right_index=True
        )  # bring back the imageid and phenotype

        # Convert columns to str
        for col in ['imageid', 'group', 'phenotype']:
            d[col] = d[col].astype(str)

        # Convert columns to categorical so that it drops unused categories
        for col in ['imageid', 'group', 'phenotype']:
            d[col] = d[col].astype('category')

        # re arrange the order based on from and to list provided
        if distance_to is not None:
            d['group'] = d['group'].cat.reorder_categories(distance_to)
            d = d.sort_values('group')

        # Plotting
        if method == 'numeric':
            if (
                x_axis is None
                and y_axis is None
                and facet_by is None
                and plot_type is None
            ):
                plot = sns.catplot(
                    data=d,
                    x="distance",
                    y="group",
                    col="imageid",
                    kind="boxen",
                    **kwargs,
                )
            else:
                plot = sns.catplot(
                    data=d, x=x_axis, y=y_axis, col=facet_by, kind=plot_type, **kwargs
                )

        if method == 'distribution':
            if (
                x_axis is None
                and y_axis is None
                and facet_by is None
                and plot_type is None
            ):
                plot = sns.displot(
                    data=d,
                    x="distance",
                    hue="imageid",
                    col="group",
                    kind="kde",
                    **kwargs,
                )
            else:
                plot = sns.displot(
                    data=d, x=x_axis, hue=y_axis, col=facet_by, kind=plot_type, **kwargs
                )

    # Save the figure if saveDir is provided; otherwise display it
    if saveDir:
        if not os.path.exists(saveDir):
            os.makedirs(saveDir)
        full_path = os.path.join(saveDir, fileName)
        plot.savefig(full_path, dpi=300)
        plt.close()
        print(f"Saved plot to {full_path}")
    else:
        plt.show()

    # return
    if return_data is True:
        return d
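
With `heatmap_summarize=False`, the source above groups cells by a combined image and phenotype label instead of by phenotype alone. A small pandas sketch of that labeling, on invented data, looks like this:

```python
import pandas as pd

# Toy metadata and distances (hypothetical values).
obs = pd.DataFrame(
    {"imageid": [1, 1, 2], "phenotype": ["Tumor", "Stroma", "Tumor"]},
    index=["cell1", "cell2", "cell3"],
)
distance_map = pd.DataFrame(
    {"Tumor": [0.0, 4.0, 0.0], "Stroma": [6.0, 0.0, 8.0]},
    index=obs.index,
)

# Build labels such as "1_Tumor" by joining image ID and phenotype as strings.
labels = (
    obs["imageid"].astype(str)
    .str.cat(obs["phenotype"].astype(str), sep="_")
    .rename("image_phenotype")
)

# Average the distances within each image/phenotype pair and sort the columns.
data = pd.merge(labels.to_frame(), distance_map, left_index=True, right_index=True)
per_image = data.groupby("image_phenotype").mean().sort_index(axis=1)
print(per_image)
```

The resulting heatmap has one row per image and phenotype combination, which makes differences between samples visible that the summarized heatmap averages away.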