Skip to content

umap

Short Description

sm.pl.umap: This function facilitates the creation of scatter plots based on UMAP (Uniform Manifold Approximation and Projection) embeddings stored in an AnnData object. It offers extensive customization options for visualizing high-dimensional data reduced to two dimensions, including the ability to color points by gene expression levels, metadata annotations, or other categorical or continuous variables. Users can leverage this function to explore and interpret complex datasets visually, enhancing the understanding of underlying biological variations and relationships.

Function

umap(adata, color=None, layer=None, use_raw=False, log=False, label='umap', cmap='vlag', palette=None, alpha=0.8, figsize=(5, 5), s=None, ncols=None, tight_layout=False, return_data=False, save_figure=None, **kwargs)

Parameters:

Name Type Description Default
adata AnnData

The annotated data matrix.

required
color list

List of keys from adata.obs.columns or adata.var.index to color the plot. Allows multiple keys for facetted plotting.

None
layer str

Specifies the AnnData layer to use for UMAP calculations. Defaults to using adata.X.

None
use_raw bool

If True, uses adata.raw.X for coloring the plot, useful for visualizing gene expression on UMAP.

False
log bool

Applies log transformation (np.log1p) to the data before plotting. Useful for gene expression data.

False
label str

Key in adata.obsm where UMAP coordinates are stored.

'umap'
cmap str

Colormap for continuous variables. Supports matplotlib colormap names and objects.

'vlag'
palette dict

Specific colors for different categories as a dictionary mapping from categories to colors.

None
alpha float

Transparency level of the points. Ranges from 0 (transparent) to 1 (opaque).

0.8
figsize tuple

Figure size specified as (width, height) in inches.

(5, 5)
s int

Size of the points in the plot.

None
ncols int

Number of columns for facetted plotting.

None
tight_layout bool

Adjusts subplot params for a tight layout.

False
return_data bool

If True, returns the DataFrame containing data used for plotting instead of displaying the plot.

False
save_figure str

Path and filename to save the figure. File extension determines the format (e.g., .pdf, .png).

None
**kwargs

Additional keyword arguments passed to matplotlib plot function.

{}

Returns:

Name Type Description
Plot matplotlib

Optionally returns the data used for plotting if return_data=True.

Example
1
2
3
4
5
6
7
8
# Basic UMAP visualization with default settings
sm.pl.umap(adata, color='cell_type')

# UMAP visualization with log transformation and custom colormap
sm.pl.umap(adata, color='gene_expression', log=True, cmap='coolwarm')

# Facetted UMAP plotting with custom point size and saved figure
sm.pl.umap(adata, color=['cell_type', 'condition'], s=100, figsize=(10, 5), save_figure='/path/to/umap_plot.png')
Source code in scimap/plotting/umap.py
 39
 40
 41
 42
 43
 44
 45
 46
 47
 48
 49
 50
 51
 52
 53
 54
 55
 56
 57
 58
 59
 60
 61
 62
 63
 64
 65
 66
 67
 68
 69
 70
 71
 72
 73
 74
 75
 76
 77
 78
 79
 80
 81
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
def umap (adata, 
          color=None, 
          layer=None, 
          use_raw=False, 
          log=False, 
          label='umap',
          cmap='vlag', 
          palette=None, 
          alpha=0.8, 
          figsize=(5, 5), 
          s=None, 
          ncols=None, 
          tight_layout=False, 
          return_data=False, 
          save_figure=None, **kwargs):
    """
Parameters:
        adata (anndata.AnnData):  
            The annotated data matrix.

        color (list, optional):  
            List of keys from `adata.obs.columns` or `adata.var.index` to color the plot. 
            Allows multiple keys for facetted plotting.

        layer (str, optional):  
            Specifies the AnnData layer to use for UMAP calculations. Defaults to using `adata.X`.

        use_raw (bool, optional):  
            If True, uses `adata.raw.X` for coloring the plot, useful for visualizing gene expression on UMAP.

        log (bool, optional):  
            Applies log transformation (`np.log1p`) to the data before plotting. Useful for gene expression data.

        label (str, optional):  
            Key in `adata.obsm` where UMAP coordinates are stored.

        cmap (str, optional):  
            Colormap for continuous variables. Supports matplotlib colormap names and objects.

        palette (dict, optional):  
            Specific colors for different categories as a dictionary mapping from categories to colors.

        alpha (float, optional):  
            Transparency level of the points. Ranges from 0 (transparent) to 1 (opaque).

        figsize (tuple, optional):  
            Figure size specified as (width, height) in inches.

        s (int, optional):  
            Size of the points in the plot.

        ncols (int, optional):  
            Number of columns for facetted plotting.

        tight_layout (bool, optional):  
            Adjusts subplot params for a tight layout.

        return_data (bool, optional):  
            If True, returns the DataFrame containing data used for plotting instead of displaying the plot.

        save_figure (str, optional):  
            Path and filename to save the figure. File extension determines the format (e.g., `.pdf`, `.png`).

        **kwargs:  
            Additional keyword arguments passed to matplotlib plot function.

Returns:
        Plot (matplotlib):
                Optionally returns the data used for plotting if `return_data=True`.

Example:
    ```python

    # Basic UMAP visualization with default settings
    sm.pl.umap(adata, color='cell_type')

    # UMAP visualization with log transformation and custom colormap
    sm.pl.umap(adata, color='gene_expression', log=True, cmap='coolwarm')

    # Facetted UMAP plotting with custom point size and saved figure
    sm.pl.umap(adata, color=['cell_type', 'condition'], s=100, figsize=(10, 5), save_figure='/path/to/umap_plot.png')

    ```
    """

    # check if umap tool has been run
    try:
        adata.obsm[label]
    except KeyError:
        raise KeyError("Please run `sm.tl.umap(adata)` first")

    # identify the coordinates
    umap_coordinates = pd.DataFrame(adata.obsm[label],index=adata.obs.index, columns=['umap-1','umap-2'])

    # other data that the user requests
    if color is not None:
        if isinstance(color, str):
            color = [color]
        # identify if all elemets of color are available        
        if set(color).issubset(list(adata.var.index) + list(adata.obs.columns)) is False:
            raise ValueError("Element passed to `color` is not found in adata, please check!")

        # organise the data
        if any(item in color for item in list(adata.obs.columns)):
            adataobs = adata.obs.loc[:, adata.obs.columns.isin(color)]
            adataobs = adataobs.apply(lambda x: x.astype('category'))

        else:
            adataobs = None

        if any(item in color for item in list(adata.var.index)):
            # find the index of the marker
            marker_index = np.where(np.isin(list(adata.var.index), color))[0]
            if layer is not None:
                adatavar = adata.layers[layer][:, np.r_[marker_index]]
            elif use_raw is True:
                adatavar = adata.raw.X[:, np.r_[marker_index]]
            else:
                adatavar = adata.X[:, np.r_[marker_index]]
            adatavar = pd.DataFrame(adatavar, index=adata.obs.index, columns = list(adata.var.index[marker_index]))
        else:
            adatavar = None

        # combine all color data
        if adataobs is not None and adatavar is not None:
            color_data = pd.concat ([adataobs, adatavar], axis=1)
        elif adataobs is not None and adatavar is None:
            color_data = adataobs
        elif adataobs is None and adatavar is not None:
            color_data = adatavar
    else:
        color_data = None

    # combine color data with umap coordinates
    if color_data is not None:
        final_data = pd.concat([umap_coordinates, color_data], axis=1)
    else:
        final_data = umap_coordinates

    # create some reasonable defaults
    # estimate number of columns in subpolt
    nplots = len(final_data.columns) - 2 # total number of plots
    if ncols is None:
        if nplots >= 4:
            subplot = [math.ceil(nplots / 4), 4]
        elif nplots == 0:
            subplot = [1, 1]
        else:
            subplot = [math.ceil(nplots / nplots), nplots]
    else:
        subplot = [math.ceil(nplots /ncols), ncols]

    if nplots == 0:
        n_plots_to_remove = 0
    else:
        n_plots_to_remove = np.prod(subplot) - nplots # figure if we have to remove any subplots

    # size of points
    if s is None:
        if nplots == 0:
            s = (100000 / adata.shape[0])
        else:
            s = (100000 / adata.shape[0]) / nplots


    # if there are categorical data then assign colors to them
    if final_data.select_dtypes(exclude=["number","bool_","object_"]).shape[1] > 0:
        # find all categories in the dataframe
        cat_data = final_data.select_dtypes(exclude=["number","bool_","object_"])
        # find all categories
        all_cat = []
        for i in cat_data.columns:
            all_cat.append(list(cat_data[i].cat.categories))

        # generate colormapping for all categories
        less_9 = [colors.rgb2hex(x) for x in sns.color_palette('Set1')]
        nineto20 = [colors.rgb2hex(x) for x in sns.color_palette('tab20')]
        greater20 = [colors.rgb2hex(x) for x in sns.color_palette('gist_ncar', max([len(i) for i in all_cat]))]

        all_cat_colormap = dict()
        for i in range(len(all_cat)):
            if len(all_cat[i]) <= 9:
                dict1 = dict(zip(all_cat[i] , less_9[ : len(all_cat[i]) ]   ))
            elif len(all_cat[i]) > 9 and len(all_cat[i]) <= 20:
                dict1 = dict(zip(all_cat[i] , nineto20[ : len(all_cat[i]) ]   ))
            else:
                dict1 = dict(zip(all_cat[i] , greater20[ : len(all_cat[i]) ]   ))
            all_cat_colormap.update(dict1)

        # if user has passed in custom colours update the colors
        if palette is not None:
            all_cat_colormap.update(palette)
    else:
        all_cat_colormap = None


    # plot
    fig, ax = plt.subplots(subplot[0],subplot[1], figsize=figsize)
    plt.rcdefaults()
    #plt.rcParams['axes.facecolor'] = 'white'

    # remove unwanted axes
    #fig.delaxes(ax[-1])
    if n_plots_to_remove > 0:
        for i in range(n_plots_to_remove):
            fig.delaxes(ax[-1][  (len(ax[-1])-1)-i :   (len(ax[-1]))-i   ][0])

    # to make sure the ax is always 2x2
    if any(i > 1 for i in subplot):
        if any(i == 1 for i in subplot):
            ax = ax.reshape(subplot[0],subplot[1])

    if nplots == 0:
        ax.scatter(x = final_data['umap-1'], y = final_data['umap-2'], s=s, cmap=cmap, alpha=alpha, **kwargs)
        plt.xlabel("UMAP-1"); plt.ylabel("UMAP-2") 
        plt.tick_params(right= False,top= False,left= False, bottom= False)
        ax.get_xaxis().set_ticks([]); ax.get_yaxis().set_ticks([])
        if tight_layout is True:
            plt.tight_layout()

    elif all(i == 1 for i in subplot):
        column_to_plot = [e for e in list(final_data.columns) if e not in ('umap-1', 'umap-2')][0]
        if all_cat_colormap is None:
            im = ax.scatter(x = final_data['umap-1'], y = final_data['umap-2'], s=s, 
                       c=final_data[column_to_plot],
                       cmap=cmap, alpha=alpha, **kwargs)
            plt.colorbar(im, ax=ax)
        else:
            ax.scatter(x = final_data['umap-1'], y = final_data['umap-2'], s=s, 
                       c=final_data[column_to_plot].map(all_cat_colormap),
                       cmap=cmap, alpha=alpha, **kwargs)
            # create legend
            patchList = []
            for key in list(final_data[column_to_plot].unique()):
                data_key = mpatches.Patch(color=all_cat_colormap[key], label=key)
                patchList.append(data_key)    
                ax.legend(handles=patchList,bbox_to_anchor=(1.05, 1), loc=2, borderaxespad=0.)

        plt.xlabel("UMAP-1"); plt.ylabel("UMAP-2") 
        plt.title(column_to_plot)
        plt.tick_params(right= False,top= False,left= False, bottom= False)
        ax.set(xticklabels = ([])); ax.set(yticklabels = ([]))
        if tight_layout is True:
            plt.tight_layout()

    else: 
        column_to_plot = [e for e in list(final_data.columns) if e not in ('umap-1', 'umap-2')]
        k=0
        for i,j in itertools.product(range(subplot[0]), range(subplot[1])):

            if final_data[column_to_plot[k]].dtype == 'category':
                ax[i,j].scatter(x = final_data['umap-1'], y = final_data['umap-2'], s=s,
                                c=final_data[column_to_plot[k]].map(all_cat_colormap),
                                cmap=cmap, alpha=alpha, **kwargs)
                # create legend
                patchList = []
                for key in list(final_data[column_to_plot[k]].unique()):
                    data_key = mpatches.Patch(color=all_cat_colormap[key], label=key)
                    patchList.append(data_key)    
                    ax[i,j].legend(handles=patchList,bbox_to_anchor=(1.05, 1), loc=2, borderaxespad=0.)
            else:               
                im = ax[i,j].scatter(x = final_data['umap-1'], y = final_data['umap-2'], s=s,
                                c=final_data[column_to_plot[k]],
                                cmap=cmap, alpha=alpha, **kwargs)
                plt.colorbar(im, ax=ax[i, j])

            ax[i,j].tick_params(right= False,top= False,left= False, bottom= False)
            ax[i,j].set_xticklabels([]); ax[i,j].set_yticklabels([])
            ax[i,j].set_xlabel("UMAP-1"); ax[i,j].set_ylabel("UMAP-2")
            ax[i,j].set_title(column_to_plot[k])
            if tight_layout is True:
                plt.tight_layout()
            k = k+1 # iterator

    # if save figure is requested
    if save_figure is not None:
        plt.savefig(save_figure)  

    # return data if needed
    if return_data is True:
        return final_data