Skip to content

Sm.pl.spatial distance

Short Description

sm.pl.spatial_distance: The function allows users to visualize the average shortest distance between phenotypes of interest. Run sm.tl.spatial_distance before running this function.

Function

spatial_distance(adata, spatial_distance='spatial_distance', phenotype='phenotype', imageid='imageid', log=False, method='heatmap', heatmap_summarize=True, heatmap_na_color='grey', heatmap_cmap='vlag_r', heatmap_row_cluster=False, heatmap_col_cluster=False, heatmap_standard_scale=0, distance_from=None, distance_to=None, x_axis=None, y_axis=None, facet_by=None, plot_type=None, return_data=False, subset_col=None, subset_value=None, **kwargs)

Parameters:

Name Type Description Default
adata

AnnData object

required
spatial_distance

string, optional In order to locate the spatial_distance data within the AnnData object please provide the output label/columnname of sm.tl.spatial_distance function.

'spatial_distance'
phenotype

string, required Column name of the column containing the phenotype information. It could also be any categorical assignment given to single cells.

'phenotype'
imageid

string, optional Column name of the column containing the image id.

'imageid'
log

bool, optional Convert distance to log scale.

False
method

string, optional Three options are available. 1) heatmap - generates a heatmap of average shortest distance between all phenotypes. 2) numeric - can be used to generate boxplot, violin plot etc between a given set of phenotypes. 3) distribution - can be used to generate distribution plots between a given set of phenotypes.

'heatmap'
heatmap_summarize

bool, optional In the event multiple images are present in the dataset, True allows to calculate the average across all the images.

True
heatmap_na_color

string, optional Color for NA values within the heatmap.

'grey'
heatmap_cmap

string, optional Color map to use for continous variables. Can be a name or a Colormap instance (e.g. 'magma', 'viridis').

'vlag_r'
heatmap_row_cluster

bool, optional Cluster Rows.

False
heatmap_col_cluster

bool, optional Cluster Columns.

False
heatmap_standard_scale

int, optional Either 0 (rows) or 1 (columns). Whether or not to standardize that dimension, meaning for each row or column, subtract the minimum and divide each by its maximum.

0
distance_from

string, optional In the event of using method = 'numeric' or 'distribution', this argument is required. Pass a phenotype of interest. If distance_from is provided and distance_to is not provided, the function will plot the average distance from the phenotype of interest to all phenotypes present within the dataset.

None
distance_to

string, optional In the event of using method = 'numeric' or 'distribution', this argument is required. Pass a phenotype of interest. The function will plot the average shortest between two phenotypes of interest (distance_from and distance_to).

None
x_axis

string, optional In the event of using method = 'numeric' or 'distribution', this argument is required. This determines the elements present in the x-axis of the resultant plot. Allowed arguments are: 'group', 'distance', 'imageid'.

None
y_axis

string, optional In the event of using method = 'numeric' or 'distribution', this argument is required. This determines the elements present in the y-axis of the numeric plot and if the user uses the distribution plot this argument is used to overlaying multiple categories within the same distribution plot. Allowed arguments are: 'group', 'distance', 'imageid'.

None
facet_by

string, optional In the event of using method = 'numeric' or 'distribution', this argument can be used to generate sub-plots. Allowed arguments are: 'group', 'imageid'.

None
plot_type

string, optional In the event of using method = 'numeric' or 'distribution', this argument is required. For numeric plot, the following options are available: “strip”, “swarm”, “box”, “violin”, “boxen”, “point”, “bar”, or “count”. For distribution plot, the following options are available: “hist”, “kde”, “ecdf”. The default for numeric plot is 'boxen'. The default for distribution plot is 'kde`.

None
subset_col

string, optional If the users wants to consider only a subset of observations while plotting, this argument in conjuction to subset_value can be used. For example, in the event of a multi-image dataset, the sm.tl.spatial_distance was run on all images but the user is interested in plotting only a subset of images. Pass the name of the column which contains the categories to be subsetted.

None
subset_value

list, optional If the users wants to consider only a subset of observations while plotting, this argument in conjuction to subset_col can be used. Pass a list of the categories to be subsetted.

None
**kwargs

dict Are passed to sns.clustermap. Pass other parameters that works with sns.clustermap, sns.catplot or sns.displot e.g. linecolor='black'.

{}

Returns:

Type Description

Heatmap or Numeric Plot or Distribution Plot.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
    # summary heatmap
    sm.pl.spatial_distance (adata)

    # Heatmap without summarizing the individual images
    sm.pl.spatial_distance (adata, heatmap_summarize=False,
    imageid='ImageId')

    # Numeric plot of shortest distance of phenotypes
    # from tumor cells
    sm.pl.spatial_distance (adata, method='numeric',
    distance_from='Tumor CD30+',imageid='ImageId')

    # Distribution plot of shortest distance of phenotypes
    # from tumor cells
    sm.pl.spatial_distance (adata, method='distribution',
    distance_from='Tumor CD30+',imageid='ImageId',
    x_axis="distance", y_axis="imageid", plot_type="kde")

    # Numeric plot of shortest distance of phenotypes from
    # tumor cells to M2 Macrophages
    sm.pl.spatial_distance (adata, method='numeric',
    distance_from='Tumor CD30+',distance_to = 'M2 Macrophages',
    imageid='ImageId')

    # Distribution plot of shortest distance of phenotypes from
    # tumor cells to M2 Macrophages
    sm.pl.spatial_distance (adata, method='distribution',
    distance_from='Tumor CD30+',distance_to = 'M2 Macrophages',
    imageid='ImageId')
Source code in scimap/plotting/_spatial_distance.py
 21
 22
 23
 24
 25
 26
 27
 28
 29
 30
 31
 32
 33
 34
 35
 36
 37
 38
 39
 40
 41
 42
 43
 44
 45
 46
 47
 48
 49
 50
 51
 52
 53
 54
 55
 56
 57
 58
 59
 60
 61
 62
 63
 64
 65
 66
 67
 68
 69
 70
 71
 72
 73
 74
 75
 76
 77
 78
 79
 80
 81
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
def spatial_distance (adata, spatial_distance='spatial_distance',phenotype='phenotype',imageid='imageid',log=False,
                      method='heatmap',heatmap_summarize=True,heatmap_na_color='grey',heatmap_cmap='vlag_r',
                      heatmap_row_cluster=False,heatmap_col_cluster=False,heatmap_standard_scale=0,
                      distance_from=None,distance_to=None,x_axis = None,y_axis = None,facet_by = None,plot_type = None,
                      return_data = False, subset_col=None, subset_value=None,
                      **kwargs):
    """
Parameters:

    adata : AnnData object

    spatial_distance : string, optional
        In order to locate the spatial_distance data within the AnnData object please provide the output
        label/columnname of `sm.tl.spatial_distance` function.

    phenotype : string, required
        Column name of the column containing the phenotype information.
        It could also be any categorical assignment given to single cells.

    imageid : string, optional
        Column name of the column containing the image id.

    log : bool, optional
        Convert distance to log scale.

    method : string, optional
        Three options are available.
        1) heatmap - generates a heatmap of average shortest distance between all phenotypes.
        2) numeric - can be used to generate boxplot, violin plot etc between a given set of phenotypes.
        3) distribution - can be used to generate distribution plots between a given set of phenotypes.

    heatmap_summarize : bool, optional
        In the event multiple images are present in the dataset, True allows to calculate the
        average across all the images.

    heatmap_na_color : string, optional
        Color for NA values within the heatmap.

    heatmap_cmap : string, optional
        Color map to use for continous variables.
        Can be a name or a Colormap instance (e.g. 'magma', 'viridis').

    heatmap_row_cluster : bool, optional
        Cluster Rows.

    heatmap_col_cluster : bool, optional
        Cluster Columns.

    heatmap_standard_scale : int, optional
        Either 0 (rows) or 1 (columns). Whether or not to standardize that dimension,
        meaning for each row or column, subtract the minimum and divide each by its maximum.

    distance_from : string, optional
        In the event of using method = 'numeric' or 'distribution', this argument is required.
        Pass a phenotype of interest. If distance_from is provided and distance_to is not provided,
        the function will plot the average distance from the phenotype of interest to all
        phenotypes present within the dataset.

    distance_to : string, optional
        In the event of using method = 'numeric' or 'distribution', this argument is required.
        Pass a phenotype of interest. The function will plot the average shortest between two phenotypes of
        interest (distance_from and distance_to).

    x_axis : string, optional
        In the event of using method = 'numeric' or 'distribution', this argument is required.
        This determines the elements present in the x-axis of the resultant plot.
        Allowed arguments are: 'group', 'distance', 'imageid'.

    y_axis : string, optional
        In the event of using method = 'numeric' or 'distribution', this argument is required.
        This determines the elements present in the y-axis of the numeric plot and if the user uses the distribution
        plot this argument is used to overlaying multiple categories within the same distribution plot.
        Allowed arguments are: 'group', 'distance', 'imageid'.

    facet_by : string, optional
         In the event of using method = 'numeric' or 'distribution', this argument can be used to
         generate sub-plots. Allowed arguments are: 'group', 'imageid'.

    plot_type : string, optional
        In the event of using method = 'numeric' or 'distribution', this argument is required.
        For `numeric` plot, the following options are available: “strip”, “swarm”, “box”, “violin”, “boxen”, “point”, “bar”, or “count”.
        For `distribution` plot, the following options are available: “hist”, “kde”, “ecdf”.
        The default for `numeric` plot is 'boxen'.
        The default for `distribution` plot is 'kde`.

    subset_col : string, optional
        If the users wants to consider only a subset of observations while plotting, this argument in conjuction to
        `subset_value` can be used.
        For example, in the event of a multi-image dataset, the `sm.tl.spatial_distance` was run on all images
        but the user is interested in plotting only a subset of images. Pass the name of the column which contains
        the categories to be subsetted.

    subset_value : list, optional
        If the users wants to consider only a subset of observations while plotting, this argument in conjuction to
        `subset_col` can be used. Pass a list of the categories to be subsetted.

    **kwargs : dict
        Are passed to sns.clustermap. Pass other parameters that works with `sns.clustermap`, `sns.catplot` or `sns.displot`
        e.g. `linecolor='black'`.

Returns:
    Heatmap or Numeric Plot or Distribution Plot.

Example:
```python
    # summary heatmap
    sm.pl.spatial_distance (adata)

    # Heatmap without summarizing the individual images
    sm.pl.spatial_distance (adata, heatmap_summarize=False,
    imageid='ImageId')

    # Numeric plot of shortest distance of phenotypes
    # from tumor cells
    sm.pl.spatial_distance (adata, method='numeric',
    distance_from='Tumor CD30+',imageid='ImageId')

    # Distribution plot of shortest distance of phenotypes
    # from tumor cells
    sm.pl.spatial_distance (adata, method='distribution',
    distance_from='Tumor CD30+',imageid='ImageId',
    x_axis="distance", y_axis="imageid", plot_type="kde")

    # Numeric plot of shortest distance of phenotypes from
    # tumor cells to M2 Macrophages
    sm.pl.spatial_distance (adata, method='numeric',
    distance_from='Tumor CD30+',distance_to = 'M2 Macrophages',
    imageid='ImageId')

    # Distribution plot of shortest distance of phenotypes from
    # tumor cells to M2 Macrophages
    sm.pl.spatial_distance (adata, method='distribution',
    distance_from='Tumor CD30+',distance_to = 'M2 Macrophages',
    imageid='ImageId')
```
    """


    # set color for heatmap
    cmap_updated = matplotlib.cm.get_cmap(heatmap_cmap)
    cmap_updated.set_bad(color=heatmap_na_color)


    # Copy the spatial_distance results from anndata object
    try:
        diatance_map = adata.uns[spatial_distance].copy()
    except KeyError:
        raise ValueError('spatial_distance not found- Please run sm.tl.spatial_distance first')

    # subset the data if user requests
    if subset_col is not None:
        if isinstance(subset_value, str):
            subset_value = [subset_value]
        # find the cell names to be subsetted out
        obs = adata.obs[[subset_col]]
        cells_to_subset = obs[obs[subset_col].isin(subset_value)].index

        # subset the diatance_map
        diatance_map = diatance_map.loc[diatance_map.index.intersection(cells_to_subset)]
        #diatance_map = diatance_map.loc[cells_to_subset]


    # Convert distance to log scale if user requests
    if log is True:
        diatance_map = np.log1p(diatance_map)

    # Method
    if method=='heatmap':
        if heatmap_summarize is True:
            # create the necessary data
            data = pd.DataFrame({'phenotype': adata.obs[phenotype]})
            data = pd.merge(data, diatance_map, how='outer',left_index=True, right_index=True) # merge with the distance map
            k = data.groupby(['phenotype']).mean() # collapse the whole dataset into mean expression
            d = k[k.index]
        else:
            # create new naming scheme for the phenotypes
            non_summary = pd.DataFrame({'imageid': adata.obs[imageid], 'phenotype': adata.obs[phenotype]})
            non_summary['imageid'] = non_summary['imageid'].astype(str) # convert the column to string
            non_summary['phenotype'] = non_summary['phenotype'].astype(str) # convert the column to string
            non_summary['image_phenotype'] = non_summary['imageid'].str.cat(non_summary['phenotype'],sep="_")
            # Merge distance map with phenotype
            data = pd.DataFrame(non_summary[['image_phenotype']])
            data = pd.merge(data, diatance_map, how='outer',left_index=True, right_index=True)
            k = data.groupby(['image_phenotype']).mean()
            d = k.sort_index(axis=1)
        # Generate the heatmap
        mask = d.isnull() # identify the NAN's for masking 
        d = d.fillna(0) # replace nan's with 0 so that clustering will work
        # Heatmap
        sns.clustermap(d, cmap=heatmap_cmap, row_cluster=heatmap_row_cluster,
                       col_cluster=heatmap_col_cluster, mask=mask,
                       standard_scale=heatmap_standard_scale, **kwargs)
    else:

        # condition-1
        if distance_from is None and distance_to is None:
            raise ValueError('Please include distance_from and/or distance_to parameters to use this method')

        # condition-2
        if distance_from is None and distance_to is not None:
            raise ValueError('Please `distance_from` parameters to use this method')

        # condition-3
        if distance_to is not None:
            # convert input to list if needed
            if isinstance(distance_to, str):
                distance_to = [distance_to]

        # Start
        pheno_df = pd.DataFrame({'imageid': adata.obs[imageid], 'phenotype': adata.obs[phenotype]}) #image id and phenotype
        data = pd.merge(pheno_df, diatance_map, how='outer',left_index=True, right_index=True) # merge with the distance map
        data = data[data['phenotype'] == distance_from] # subset the pheno of interest

        if distance_to is not None:
            data = data[distance_to] # drop columns that are not requested in distance_to
        else:
            data = data.drop(['phenotype','imageid'], axis=1) # drop the phenotype column before stacking

        d = data.stack().reset_index() # collapse everything to one column
        d.columns = ['cellid', 'group', 'distance']
        d = pd.merge(d, pheno_df, left_on='cellid', right_index=True) # bring back the imageid and phenotype

        # Convert columns to str
        for col in ['imageid', 'group','phenotype']:
            d[col] = d[col].astype(str)

        # Convert columns to categorical so that it drops unused categories
        for col in ['imageid', 'group','phenotype']:
            d[col] = d[col].astype('category')

        # re arrange the order based on from and to list provided
        if distance_to is not None:
            d['group'] = d['group'].cat.reorder_categories(distance_to)
            d = d.sort_values('group')

        # Plotting
        if method=='numeric':
            if x_axis is None and y_axis is None and facet_by is None and plot_type is None:
                sns.catplot(data=d, x="distance", y="group", col="imageid", kind="boxen", **kwargs)
            else:
                sns.catplot(data=d, x=x_axis, y=y_axis, col=facet_by, kind=plot_type, **kwargs)

        if method=='distribution':
            if x_axis is None and y_axis is None and facet_by is None and plot_type is None:
                sns.displot(data=d, x="distance", hue="imageid",  col="group", kind="kde", **kwargs)
            else:
                sns.displot(data=d, x=x_axis, hue=y_axis, col=facet_by, kind=plot_type,**kwargs)

    # return
    if return_data is True:
        return d