spatial_cluster

Short Description

sm.tl.spatial_cluster: This function clusters cells based on their spatial neighborhood matrices, which are produced by functions such as sm.tl.spatial_expression, sm.tl.spatial_count, or sm.tl.spatial_lda and stored in `adata.uns`. Using one of several clustering algorithms (k-means, Phenograph, or Leiden), it groups cells with similar neighborhoods, identifying spatially coherent cell groups or microenvironments within tissue sections.
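
Conceptually, each row of the neighborhood matrix describes what surrounds one cell, and cells with similar rows end up in the same cluster. The sketch below illustrates this with a plain Lloyd's k-means written from scratch on a made-up neighborhood-count matrix. It is not scimap's implementation (the source passes the matrix to its `cluster` function); the `kmeans` helper and the data are hypothetical.

```python
import math
import random

def kmeans(rows, k, n_iter=20, seed=0):
    """Plain Lloyd's k-means: returns one integer cluster label per row."""
    rng = random.Random(seed)
    # Start from k distinct rows chosen at random as the initial centroids
    centroids = [list(r) for r in rng.sample(rows, k)]
    labels = [0] * len(rows)
    for _ in range(n_iter):
        # Assignment step: each cell joins its nearest centroid (Euclidean distance)
        labels = [min(range(k), key=lambda c: math.dist(r, centroids[c])) for r in rows]
        # Update step: move each centroid to the mean of the cells assigned to it
        for c in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

# Hypothetical neighborhood counts: columns = tumor, T and B cells near each cell
neighborhoods = [
    [8, 1, 0],  # tumor-rich neighborhoods
    [7, 2, 0],
    [9, 0, 1],
    [0, 6, 5],  # lymphoid-rich neighborhoods
    [1, 5, 6],
    [0, 7, 4],
]
labels = kmeans(neighborhoods, k=2)
print(labels)
```

Cells 0 to 2 (tumor-dominated surroundings) and cells 3 to 5 (lymphoid-dominated surroundings) land in different clusters; which group is numbered 0 depends on the initial centroids.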

Function

`spatial_cluster(adata, df_name='spatial_count', method='kmeans', k=10, n_pcs=None, resolution=1, phenograph_clustering_metric='euclidean', nearest_neighbors=30, random_state=0, label=None, verbose=True, output_dir=None)`

Parameters:

Name	Type	Description	Default
`adata`	`AnnData`	AnnData object, or path to an `.h5ad` file, containing the spatial analysis results referenced by `df_name`.	required
`df_name`	`str`	Key in `adata.uns` holding the spatial analysis results to cluster, e.g. 'spatial_count', 'spatial_expression', or 'spatial_lda'.	`'spatial_count'`
`method`	`str`	The clustering method to apply. Supported methods include 'kmeans', 'phenograph', and 'leiden'.	`'kmeans'`
`k`	`int`	Number of clusters to form when using K-Means clustering. Applies only if method='kmeans'.	`10`
`n_pcs`	`int`	Number of principal components to use in 'leiden' clustering. If None, all components are used.	`None`
`resolution`	`float`	Controls the granularity of clustering. Higher values lead to more clusters. Applies to 'leiden' and 'phenograph'.	`1`
`phenograph_clustering_metric`	`str`	The metric for defining nearest neighbors in 'phenograph' clustering. Choices include 'euclidean', 'manhattan', 'cosine', etc.	`'euclidean'`
`nearest_neighbors`	`int`	Number of nearest neighbors to consider in the graph construction step, for 'leiden' and 'phenograph'.	`30`
`random_state`	`int`	Seed for random number generation, ensuring reproducible results.	`0`
`label`	`str`	Column name under which the cluster labels are stored in `adata.obs`. If None, defaults to `'spatial_' + method` (e.g., 'spatial_kmeans').	`None`
`verbose`	`bool`	If set to `True`, the function will print detailed messages about its progress and the steps being executed.	`True`
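`output_dir`	`str`	Directory in which to save the updated AnnData object. If provided, the object is written there and the function returns None; if None, nothing is saved to disk and the updated object is returned.	`None`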
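
For 'phenograph' and 'leiden', `nearest_neighbors` sets how many neighbors each cell is linked to when the similarity graph is built, and `phenograph_clustering_metric` sets the distance Phenograph uses to find them. The following is a brute-force sketch of that graph-building step, assuming a plain k-nearest-neighbor graph. The `knn_graph` helper and the toy matrix are hypothetical; the real libraries use faster neighbor search, edge weighting, optional PCA (`n_pcs`), and a community-detection step, none of which is shown.

```python
import math

def knn_graph(rows, n_neighbors, metric="euclidean"):
    """Brute-force k-nearest-neighbor graph: neighbor indices for every row."""
    metrics = {
        "euclidean": math.dist,
        "manhattan": lambda a, b: sum(abs(x - y) for x, y in zip(a, b)),
    }
    dist = metrics[metric]
    graph = []
    for i, a in enumerate(rows):
        # Rank all other cells by distance to cell i (stable sort keeps index order on ties)
        others = sorted(
            (j for j in range(len(rows)) if j != i),
            key=lambda j: dist(a, rows[j]),
        )
        graph.append(others[:n_neighbors])
    return graph

# Hypothetical neighborhood vectors (same layout as a spatial count matrix)
neighborhoods = [[8, 1, 0], [7, 2, 0], [9, 0, 1], [0, 6, 5], [1, 5, 6], [0, 7, 4]]
graph = knn_graph(neighborhoods, n_neighbors=2, metric="manhattan")
print(graph)
```

Each cell links only to cells with similar surroundings, so the graph falls into two groups that a community-detection step would report as two clusters. Larger `nearest_neighbors` values connect less similar cells and typically yield coarser clusters.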

Returns:

Name	Type	Description
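`adata`	`AnnData`	The input `adata` object with cluster labels added as `adata.obs[label]`. Returned only when `output_dir` is None; otherwise the object is written to disk and nothing is returned.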
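
The clusters are computed on the spatial matrix and then copied back into `adata.obs` under `label`, aligned by cell ID. The sketch below mirrors that step with plain Python stand-ins; the `attach_labels` helper, cell IDs, and cluster values are hypothetical. The default column name follows the `'spatial_' + method` rule from the source, and a cell with no row in the spatial matrix gets a missing value, just as `pandas.Series.reindex` would give NaN.

```python
def attach_labels(obs_index, cluster_labels, method, label=None):
    """Return the obs column name and the cluster labels in the original cell order."""
    if label is None:
        label = "spatial_" + str(method)  # default naming used by spatial_cluster
    # Align by cell ID; cells absent from cluster_labels get None (NaN in pandas)
    column = [cluster_labels.get(cell_id) for cell_id in obs_index]
    return label, column

# Hypothetical cell IDs in adata.obs order and clusters keyed by cell ID
label, column = attach_labels(["c1", "c2", "c3"], {"c3": "1", "c1": "0"}, method="kmeans")
print(label, column)  # spatial_kmeans ['0', None, '1']
```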

Example

import scimap as sm

# Apply K-Means clustering
adata = sm.tl.spatial_cluster(adata, df_name='spatial_count', method='kmeans', k=10, label='cluster_kmeans')

# Apply Leiden clustering with specific resolution and principal components
adata = sm.tl.spatial_cluster(adata, df_name='spatial_expression', method='leiden', resolution=0.5, n_pcs=20, label='cluster_leiden')

# Apply Phenograph clustering with a specific metric and nearest neighbors
adata = sm.tl.spatial_cluster(adata, df_name='spatial_lda', method='phenograph', phenograph_clustering_metric='manhattan', nearest_neighbors=15, label='cluster_phenograph')

Source code in scimap/tools/spatial_cluster.py

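# Module-level imports this function relies on (assumed; they are not part of the excerpt)
import pathlib

import anndata as ad

from .cluster import cluster  # sm.tl.cluster


def spatial_cluster(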
    adata,
    df_name='spatial_count',
    method='kmeans',
    k=10,
    n_pcs=None,
    resolution=1,
    phenograph_clustering_metric='euclidean',
    nearest_neighbors=30,
    random_state=0,
    label=None,
    verbose=True,
    output_dir=None,
):
    """


    Parameters:
            adata (anndata.AnnData):
                AnnData object, or path to an `.h5ad` file, containing the spatial analysis results referenced by `df_name`.

            df_name (str):
                Key in `adata.uns` holding the spatial analysis results to cluster, e.g. 'spatial_count', 'spatial_expression', or 'spatial_lda'.

            method (str):
                The clustering method to apply. Supported methods include 'kmeans', 'phenograph', and 'leiden'.

            k (int):
                Number of clusters to form when using K-Means clustering. Applies only if method='kmeans'.

            n_pcs (int, optional):
                Number of principal components to use in 'leiden' clustering. If None, all components are used.

            resolution (float):
                Controls the granularity of clustering. Higher values lead to more clusters. Applies to 'leiden' and 'phenograph'.

            phenograph_clustering_metric (str):
                The metric for defining nearest neighbors in 'phenograph' clustering. Choices include 'euclidean', 'manhattan', 'cosine', etc.

            nearest_neighbors (int):
                Number of nearest neighbors to consider in the graph construction step, for 'leiden' and 'phenograph'.

            random_state (int):
                Seed for random number generation, ensuring reproducible results.

            label (str, optional):
                Column name under which the cluster labels are stored in `adata.obs`. If None, defaults to `'spatial_' + method` (e.g., 'spatial_kmeans').

            verbose (bool):
                If set to `True`, the function will print detailed messages about its progress and the steps being executed.

            output_dir (str, optional):
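                Directory in which to save the updated AnnData object. If provided, the object is written there and the function returns None; if None, nothing is saved to disk and the updated object is returned.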

    Returns:
            adata (anndata.AnnData):
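                The input `adata` object with cluster labels added as `adata.obs[label]`. Returned only when `output_dir` is None; otherwise the object is written to disk and nothing is returned.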

    Example:
        ```python
        # Apply K-Means clustering
        adata = sm.tl.spatial_cluster(adata, df_name='spatial_count', method='kmeans', k=10, label='cluster_kmeans')

        # Apply Leiden clustering with specific resolution and principal components
        adata = sm.tl.spatial_cluster(adata, df_name='spatial_expression', method='leiden', resolution=0.5, n_pcs=20, label='cluster_leiden')

        # Apply Phenograph clustering with a specific metric and nearest neighbors
        adata = sm.tl.spatial_cluster(adata, df_name='spatial_lda', method='phenograph', phenograph_clustering_metric='manhattan', nearest_neighbors=15, label='cluster_phenograph')
        ```
    """

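    # Load the AnnData object if a file path was supplied
    if isinstance(adata, str):
        imid = str(adata.rsplit('/', 1)[-1])
        adata = ad.read_h5ad(adata)
    else:
        # No input file name to reuse when saving; fall back to a placeholder name
        imid = 'spatial_cluster.h5ad'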

    # Make a copy of adata to modify
    adata_copy = adata.copy()

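    # Error check: the requested spatial matrix must already be stored in adata.uns
    if df_name not in adata_copy.uns:
        raise KeyError(
            f"Supplied df_name '{df_name}' not found in adata.uns; please run "
            "`sm.tl.spatial_expression`, `sm.tl.spatial_count`, `sm.tl.spatial_lda` "
            "or a similar method first"
        )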

    # Create a new AnnData object from the user-defined spatial matrix (missing values set to 0)
    adata_new = ad.AnnData(adata_copy.uns[df_name].fillna(0))
    adata_new.obs = adata_copy.obs

    # Create a meaningful label name
    if label is None:
        label = 'spatial_' + str(method)

    # Run the clustering algorithm
    adata_new = cluster(
        adata=adata_new,
        method=method,
        k=k,
        n_pcs=n_pcs,
        resolution=resolution,
        phenograph_clustering_metric=phenograph_clustering_metric,
        nearest_neighbors=nearest_neighbors,
        use_raw=False,
        random_state=random_state,
        label=label,
    )

    # Get the cluster labels and add them to the original adata object, aligned by cell ID
    result = adata_new.obs[label]
    result = result.reindex(adata.obs.index)
    adata.obs[label] = result

    # Save data if requested
    if output_dir is not None:
        output_dir = pathlib.Path(output_dir)
        output_dir.mkdir(exist_ok=True, parents=True)
        adata.write(output_dir / imid)
    else:
        # Return data
        return adata
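
When `output_dir` is set, the function writes the updated object to that directory instead of returning it, reusing the file name of the input when `adata` was given as a path (the source takes the text after the last '/'). A minimal sketch of that path logic with `pathlib`, assuming POSIX-style paths; `output_path` is a hypothetical helper, and the real function also creates the directory with `mkdir(exist_ok=True, parents=True)` before calling `adata.write`.

```python
from pathlib import PurePosixPath

def output_path(input_path, output_dir):
    """Return output_dir / <file name of input_path>, as spatial_cluster does when saving."""
    # For ordinary file paths this matches input_path.rsplit('/', 1)[-1]
    imid = PurePosixPath(input_path).name
    return PurePosixPath(output_dir) / imid

print(output_path("/data/raw/sample.h5ad", "results/spatial"))  # results/spatial/sample.h5ad
```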