sm.tl.spatial_cluster

Short Description

sm.tl.spatial_cluster: Clusters the spatial neighbourhood matrix generated by sm.tl.spatial_expression, sm.tl.spatial_count, sm.tl.spatial_lda, or a similar function.

Function

spatial_cluster(adata, df_name='spatial_count', method='kmeans', k=10, n_pcs=None, resolution=1, phenograph_clustering_metric='euclidean', nearest_neighbors=30, random_state=0, label=None, output_dir=None)

Parameters:

Name Type Description Default
adata

AnnData object loaded into memory or path to AnnData object.

required
df_name

string, optional
Key in adata.uns under which the spatial analysis results are stored. By default, sm.tl.spatial_count saves its results under spatial_count and sm.tl.spatial_expression saves its results under spatial_expression.

'spatial_count'
method

string, optional
Clustering method to use. Implemented methods: kmeans, phenograph, and leiden.

'kmeans'
k

int, optional
Number of clusters to return when using K-Means clustering.

10
n_pcs

int, optional
Number of principal components (PCs) to use in leiden clustering. By default, all PCs are used.

None
resolution

float, optional
A parameter value controlling the coarseness of the clustering. Higher values lead to more clusters.

1
phenograph_clustering_metric

string, optional
Distance metric used to define nearest neighbors. Note that performance is slower for correlation and cosine. Available metrics: 'cityblock', 'cosine', 'euclidean', 'manhattan', 'braycurtis', 'canberra', 'chebyshev', 'correlation', 'dice', 'hamming', 'jaccard', 'kulsinski', 'mahalanobis', 'minkowski', 'rogerstanimoto', 'russellrao', 'seuclidean', 'sokalmichener', 'sokalsneath', 'sqeuclidean', 'yule'.

'euclidean'
nearest_neighbors

int, optional
Number of nearest neighbors to use in the first step of graph construction. This parameter is used in both leiden and phenograph clustering.

30
random_state

int, optional
Seed for the random number generator. Change it to alter the initialization of the optimization.

0
label

string, optional
Column name under which the cluster assignments are stored in adata.obs. By default, the column is named 'spatial_' followed by the method used, e.g. adata.obs['spatial_kmeans'].

None
output_dir

string, optional
Path to output directory. If provided, the updated AnnData object is written to this directory instead of being returned.

None
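
Not every parameter applies to every method. The sketch below summarises, in plain Python, which of the parameters above each method reads according to their descriptions, and how the default output column name is built when label is None. The names `PARAMS_BY_METHOD` and `default_label` are hypothetical helpers and not part of scimap. Listing `resolution` under leiden is an assumption, since its description does not name a method.

```python
# Hypothetical helpers summarising the parameter table; not part of scimap.
PARAMS_BY_METHOD = {
    "kmeans": ["k"],
    "phenograph": ["phenograph_clustering_metric", "nearest_neighbors"],
    # `resolution` is listed under leiden as an assumption; its description
    # does not say which method reads it.
    "leiden": ["n_pcs", "nearest_neighbors", "resolution"],
}


def default_label(method, label=None):
    """Return the adata.obs column name used for the cluster assignments."""
    if method not in PARAMS_BY_METHOD:
        raise ValueError(
            f"Unknown method {method!r}; expected one of {sorted(PARAMS_BY_METHOD)}"
        )
    # Mirrors the default naming: 'spatial_' followed by the method name.
    return label if label is not None else "spatial_" + str(method)


print(default_label("kmeans"))                       # spatial_kmeans
print(default_label("leiden", label="my_clusters"))  # my_clusters
```

The unknown-method check is a convenience of this sketch only; it does not describe how scimap itself handles unsupported method names.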

Returns:

Type Description
adata

AnnData Object
Returns the updated AnnData object with a new column in adata.obs. By default the column is named 'spatial_' followed by the method used, e.g. adata.obs['spatial_kmeans'].

Examples:

    adata = sm.tl.spatial_cluster(adata, k=10, method='kmeans')  # results are saved under adata.obs['spatial_kmeans']
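
To make concrete what clustering a spatial neighbourhood matrix means, the following sketch runs a small pure-Python k-means on a toy matrix in which each row is a cell and each column is the count of one phenotype among that cell's neighbours. It is a stand-in for illustration only: scimap delegates the actual work to its own `cluster` function. The function name `toy_kmeans`, the farthest-point initialisation, and the data are all assumptions made for this example.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def toy_kmeans(rows, k, random_state=0, max_iter=100):
    """Cluster rows with Lloyd's algorithm; returns one integer label per row."""
    rng = random.Random(random_state)
    # Farthest-point initialisation: start from a random row, then repeatedly
    # add the row that lies farthest from the centroids chosen so far.
    centroids = [list(rows[rng.randrange(len(rows))])]
    while len(centroids) < k:
        farthest = max(
            rows, key=lambda r: min(squared_distance(r, c) for c in centroids)
        )
        centroids.append(list(farthest))

    labels = None
    for _ in range(max_iter):
        # Assignment step: each row joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(row, centroids[j]))
            for row in rows
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [row for row, lab in zip(rows, labels) if lab == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Toy neighbourhood matrix: rows are cells, columns are neighbour counts of
# three phenotypes (hypothetical data).
neighbourhood = [
    [9, 1, 0], [8, 2, 0], [10, 0, 1],   # cells surrounded mostly by phenotype 1
    [0, 1, 9], [1, 0, 8], [0, 2, 10],   # cells surrounded mostly by phenotype 3
]
labels = toy_kmeans(neighbourhood, k=2, random_state=0)
print(labels)
```

Cells with similar neighbourhood composition receive the same label. Which group gets label 0 depends on the random starting row, so only the grouping is meaningful, not the label values.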
Source code in scimap/tools/_spatial_cluster.py
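import pathlib

import anndata as ad

# Assumed import path for scimap's clustering helper; the listing does not show its imports
from scimap.tools._cluster import cluster


def spatial_cluster(adata, df_name='spatial_count', method='kmeans', k=10,
                    n_pcs=None, resolution=1, phenograph_clustering_metric='euclidean',
                    nearest_neighbors=30, random_state=0, label=None, output_dir=None):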
    """


Parameters:
    adata : AnnData object loaded into memory or path to AnnData object.

    df_name : string, optional  
        Key in `adata.uns` under which the spatial analysis results are stored.
        By default, `sm.tl.spatial_count` saves its results under `spatial_count` and
        `sm.tl.spatial_expression` saves its results under `spatial_expression`.

    method : string, optional  
        Clustering method to use. Implemented methods: kmeans, phenograph, and leiden.

    k : int, optional  
        Number of clusters to return when using K-Means clustering.

    n_pcs : int, optional  
        Number of principal components (PCs) to use in leiden clustering. By default, all PCs are used.

    resolution : float, optional  
        A parameter value controlling the coarseness of the clustering. 
        Higher values lead to more clusters.

    phenograph_clustering_metric : string, optional  
        Distance metric used to define nearest neighbors. Note that performance is slower for correlation and cosine. 
        Available metrics: 'cityblock', 'cosine', 'euclidean', 'manhattan', 'braycurtis', 'canberra', 'chebyshev', 
        'correlation', 'dice', 'hamming', 'jaccard', 'kulsinski', 'mahalanobis', 'minkowski', 'rogerstanimoto', 
        'russellrao', 'seuclidean', 'sokalmichener', 'sokalsneath', 'sqeuclidean', 'yule'.

    nearest_neighbors : int, optional  
        Number of nearest neighbors to use in the first step of graph construction. 
        This parameter is used in both leiden and phenograph clustering.

    random_state : int, optional  
        Seed for the random number generator. Change it to alter the initialization of the optimization.

    label : string, optional  
        Column name under which the cluster assignments are stored in `adata.obs`.
        By default, the column is named 'spatial_' followed by the method used, e.g. `adata.obs['spatial_kmeans']`.

    output_dir : string, optional  
        Path to output directory. If provided, the updated AnnData object is written
        to this directory instead of being returned.

Returns:
    adata : AnnData Object  
        Returns the updated AnnData object with a new column in `adata.obs`. By default the column
        is named 'spatial_' followed by the method used, e.g. `adata.obs['spatial_kmeans']`.

Example:
```python
    adata = sm.tl.spatial_cluster(adata, k=10, method='kmeans')  # results are saved under adata.obs['spatial_kmeans']
```
    """

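    # Load the AnnData object
    if isinstance(adata, str):
        # Keep the input file name so the result can be saved under the same name
        imid = pathlib.Path(adata).name
        adata = ad.read_h5ad(adata)
    else:
        # Assumed fallback file name, used only when `output_dir` is set for an in-memory object
        imid = 'spatial_cluster.h5ad'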

    # Make a copy of adata to modify
    adata_copy = adata.copy()

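    # Error check: stop early if the requested spatial results are missing from adata.uns
    if df_name not in adata_copy.uns:
        raise KeyError(f"'{df_name}' not found in adata.uns; please run `sm.tl.spatial_expression`, "
                       "`sm.tl.spatial_count`, `sm.tl.spatial_lda`, or a similar method first")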

    # Create a new AnnData object with the user-defined spatial information
    adata_new = ad.AnnData(adata_copy.uns[df_name].fillna(0))
    adata_new.obs = adata_copy.obs

    # Create a meaningful label name
    if label is None:
        label = 'spatial_' + str(method)

    # Run the clustering algorithm
    adata_new = cluster (adata = adata_new,
                         method = method,
                         k=k, 
                         n_pcs=n_pcs, 
                         resolution=resolution,
                         phenograph_clustering_metric=phenograph_clustering_metric,
                         nearest_neighbors=nearest_neighbors, 
                         use_raw=False, 
                         random_state=random_state,
                         label=label)

    # Get the clusters and append that to original adata object
    result = adata_new.obs[label]
    result = result.reindex(adata.obs.index)
    adata.obs[label] = result


    # Save data if requested
    if output_dir is not None:
        output_dir = pathlib.Path(output_dir)
        output_dir.mkdir(exist_ok=True, parents=True)
        adata.write(output_dir / imid)
    else:    
        # Return data
        return adata
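
The load and save steps in the listing can be sketched in isolation. The example below uses only the standard library to show how an output file name might be derived from the input path and placed in `output_dir`. The helper name `resolve_output_path` and the fallback file name are hypothetical, and nothing here reads or writes AnnData files.

```python
import pathlib
import tempfile


def resolve_output_path(adata_or_path, output_dir, fallback_name="spatial_cluster.h5ad"):
    """Return the file path the clustered object would be written to.

    The output reuses the input file name when the input is a path, and a
    hypothetical fallback name when it is an in-memory object.
    """
    if isinstance(adata_or_path, (str, pathlib.Path)):
        file_name = pathlib.Path(adata_or_path).name
    else:
        file_name = fallback_name
    output_dir = pathlib.Path(output_dir)
    # Create the directory (and any missing parents) before writing into it.
    output_dir.mkdir(exist_ok=True, parents=True)
    return output_dir / file_name


with tempfile.TemporaryDirectory() as tmp:
    target = resolve_output_path("data/sample_01.h5ad", pathlib.Path(tmp) / "results")
    print(target.name)             # sample_01.h5ad
    print(target.parent.is_dir())  # True
    in_memory = resolve_output_path(object(), pathlib.Path(tmp) / "results")
    print(in_memory.name)          # spatial_cluster.h5ad
```

Deriving the name with `pathlib.Path(...).name` avoids splitting the path on `/` by hand, which matters on platforms that use a different path separator.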