umap

Short Description

sm.tl.umap: Reduces a high-dimensional dataset to a low-dimensional embedding with UMAP, for visualization or downstream analysis. The input matrix can be taken from adata.X, adata.raw.X, or a named layer in adata.layers, and can optionally be log-transformed (log1p). The neighborhood size, distance metric, minimum distance, and output dimensionality of the manifold approximation are configurable. The embedding is stored in adata.obsm under the key given by label.

Function

umap(adata, use_layer=None, use_raw=False, log=False, n_neighbors=15, n_components=2, metric='euclidean', min_dist=0.1, random_state=0, label='umap', **kwargs)

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| adata | AnnData | Annotated data matrix containing spatial gene expression data. Pass a loaded AnnData object; the function indexes it directly and does not read from a path. | required |
| use_layer | str | Name of a layer in adata.layers to use as input. Takes precedence over use_raw. If None, adata.raw.X (when use_raw=True) or adata.X is used. | None |
| use_raw | bool | Whether to use adata.raw.X as input. Ignored when use_layer is set. | False |
| log | bool | Applies a log1p transformation, log(1 + x), to the input data if True. | False |
| n_neighbors | int | Number of neighboring points used in manifold approximation. | 15 |
| n_components | int | Dimensionality of the target embedding space. | 2 |
| metric | str | Metric used to compute distances in high-dimensional space. | 'euclidean' |
| min_dist | float | Effective minimum distance between embedded points. | 0.1 |
| random_state | int | Seed used by the random number generator for reproducibility. | 0 |
| label | str | Key for storing UMAP results in adata.obsm. | 'umap' |

Returns:

| Name | Type | Description |
|---|---|---|
| adata | AnnData | The input adata object, updated with UMAP embedding results in adata.obsm[label]. |
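
The way `use_layer`, `use_raw`, and `log` combine is easy to misread, so here is a minimal sketch of the selection order as it appears in the source listing further down the page. The helper name `select_matrix` and the `toy` object are hypothetical and exist only for illustration; `toy` is a plain stand-in built with `SimpleNamespace`, not a real AnnData object.

```python
from types import SimpleNamespace

import numpy as np


def select_matrix(adata, use_layer=None, use_raw=False, log=False):
    """Hypothetical helper mirroring the input-selection order of sm.tl.umap."""
    if use_layer is not None:
        # A named layer wins, even if use_raw is also True
        data = adata.layers[use_layer]
    elif use_raw:
        data = adata.raw.X
    else:
        data = adata.X
    # np.log1p computes log(1 + x) and returns a new array
    return np.log1p(data) if log else data


# Stand-in for an AnnData object (assumption: only .X, .raw.X and .layers matter here)
toy = SimpleNamespace(
    X=np.array([[0.0, 1.0]]),
    raw=SimpleNamespace(X=np.array([[10.0, 20.0]])),
    layers={"counts": np.array([[100.0, 200.0]])},
)

layer_wins = select_matrix(toy, use_layer="counts", use_raw=True)
raw_logged = select_matrix(toy, use_raw=True, log=True)
```

In this sketch `layer_wins` holds the values of the `counts` layer, because `use_raw=True` is ignored once `use_layer` is set, and `raw_logged` holds `log(1 + x)` of the raw matrix while `toy.raw.X` itself is left unchanged.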

Example
# Basic UMAP reduction
adata = sm.tl.umap(adata, n_neighbors=15, min_dist=0.1, label='umap_basic')

# UMAP using a specific layer with log transformation (use_layer takes precedence, so use_raw is not needed)
adata = sm.tl.umap(adata, use_layer='counts', log=True, n_neighbors=30, min_dist=0.05, label='umap_layer_log')

# UMAP with a different metric and higher dimensionality
adata = sm.tl.umap(adata, metric='manhattan', n_components=3, n_neighbors=50, label='umap_manhattan_3d')

# plot results
sm.pl.umap(adata)
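
Each call in the example writes to a different key of `adata.obsm`, so the embeddings coexist and each one has shape (number of cells, `n_components`). The sketch below illustrates that storage contract without requiring scimap or umap-learn. Everything in it is an assumption made for illustration: `pca_reducer` is a linear PCA stand-in for `UMAP.fit_transform` (it is not UMAP), `store_embedding` is a hypothetical helper, and `toy` is a plain stand-in for an AnnData object.

```python
from types import SimpleNamespace

import numpy as np


def pca_reducer(data, n_components=2):
    """Stand-in for UMAP's fit_transform: linear PCA via SVD (not UMAP)."""
    centered = data - data.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    # Project onto the leading principal components
    return u[:, :n_components] * s[:n_components]


def store_embedding(adata, reducer, n_components=2, label="umap"):
    """Hypothetical helper: embed adata.X and store the result in adata.obsm[label]."""
    embedding = reducer(np.asarray(adata.X), n_components=n_components)
    adata.obsm[label] = embedding
    return adata


# Stand-in for an AnnData object with 50 cells and 8 features
rng = np.random.default_rng(0)
toy = SimpleNamespace(X=rng.normal(size=(50, 8)), obsm={})

toy = store_embedding(toy, pca_reducer, n_components=2, label="umap_basic")
toy = store_embedding(toy, pca_reducer, n_components=3, label="umap_3d")

for key, emb in toy.obsm.items():
    print(key, emb.shape)
```

Reusing an existing `label` overwrites the embedding stored under that key, so distinct labels are the simplest way to keep several parameter settings side by side.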
Source code in scimap/tools/umap.py
def umap(adata,
         use_layer=None,
         use_raw=False,
         log=False,
         n_neighbors=15,
         n_components=2,
         metric='euclidean',
         min_dist=0.1,
         random_state=0,
         label='umap',
         **kwargs):
    """
Parameters:
        adata (anndata.AnnData):  
            Annotated data matrix containing spatial gene expression data. Pass a loaded AnnData object; the function indexes it directly and does not read from a path.

        use_layer (str, optional):  
            Name of a layer in `adata.layers` to use as input. Takes precedence over `use_raw`. If `None`, `adata.raw.X` (when `use_raw=True`) or `adata.X` is used.

        use_raw (bool, optional):  
            Whether to use `adata.raw.X` as input. Ignored when `use_layer` is set.

        log (bool, optional):  
            Applies a `log1p` transformation, log(1 + x), to the input data if `True`.

        n_neighbors (int, optional):  
            Number of neighboring points used in manifold approximation.

        n_components (int, optional):  
            Dimensionality of the target embedding space.

        metric (str, optional):  
            Metric used to compute distances in high-dimensional space.

        min_dist (float, optional):  
            Effective minimum distance between embedded points.

        random_state (int, optional):  
            Seed used by the random number generator for reproducibility.

        label (str, optional):  
            Key for storing UMAP results in `adata.obsm`.

Returns:
        adata (anndata.AnnData):  
            The input `adata` object, updated with UMAP embedding results in `adata.obsm[label]`.

Example:
        ```python

        # Basic UMAP reduction
        adata = sm.tl.umap(adata, n_neighbors=15, min_dist=0.1, label='umap_basic')

        # UMAP using a specific layer with log transformation (use_layer takes precedence, so use_raw is not needed)
        adata = sm.tl.umap(adata, use_layer='counts', log=True, n_neighbors=30, min_dist=0.05, label='umap_layer_log')

        # UMAP with a different metric and higher dimensionality
        adata = sm.tl.umap(adata, metric='manhattan', n_components=3, n_neighbors=50, label='umap_manhattan_3d')

        # plot results
        sm.pl.umap(adata)

        ```
    """

    # adata_layer=None;use_raw=False;log=False;n_neighbors=15;n_components=2;metric='euclidean';min_dist=0.1;
    # random_state=0;
    # load data
    if use_layer is not None:
        data = adata.layers[use_layer]
    elif use_raw is True:
        data = adata.raw.X
    else:
        data = adata.X

    # log the data if user requests
    if log is True:
        data = np.log1p(data)


    # embedding (extra keyword arguments are forwarded to um.UMAP instead of being dropped)
    embedding = um.UMAP(n_neighbors=n_neighbors,
                        n_components=n_components,
                        metric=metric,
                        min_dist=min_dist,
                        random_state=random_state,
                        **kwargs).fit_transform(data)

    # plot
    #plt.scatter(embedding[:, 0], embedding[:, 1], s=5)

    # return data
    adata.obsm[label] = embedding
    return adata