sm.tl.spatial_cluster
Short Description
`sm.tl.spatial_cluster`: This function allows users to cluster the spatial neighbourhood matrix generated by `sm.tl.spatial_expression`, `sm.tl.spatial_count`, `sm.tl.spatial_lda`, etc.
Function
spatial_cluster(adata, df_name='spatial_count', method='kmeans', k=10, n_pcs=None, resolution=1, phenograph_clustering_metric='euclidean', nearest_neighbors=30, random_state=0, label=None, output_dir=None)
Parameters:
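Name | Type | Description | Default |
---|---|---|---|
adata | AnnData or string | AnnData object loaded into memory or path to AnnData object. | required |
df_name | string, required | Label of the spatial analysis performed. By default, `sm.tl.spatial_count` saves its results under `spatial_count` and `sm.tl.spatial_expression` saves its results under `spatial_expression`. | 'spatial_count' |
method | string, optional | Clustering method to use. Implemented methods: kmeans, phenograph and leiden. | 'kmeans' |
k | int, optional | Number of clusters to return when using K-Means clustering. | 10 |
phenotype | string, optional | The column name that contains the cluster/phenotype information. Listed in the docstring but not present in the function signature. | |
n_pcs | int, optional | Number of PCs to use in leiden clustering. By default all PCs are used. | None |
resolution | float, optional | Controls the coarseness of the clustering. Higher values lead to more clusters. | 1 |
phenograph_clustering_metric | string, optional | Distance metric used to define nearest neighbors. Performance will be slower for correlation and cosine. Available metrics: 'cityblock', 'cosine', 'euclidean', 'manhattan', 'braycurtis', 'canberra', 'chebyshev', 'correlation', 'dice', 'hamming', 'jaccard', 'kulsinski', 'mahalanobis', 'minkowski', 'rogerstanimoto', 'russellrao', 'seuclidean', 'sokalmichener', 'sokalsneath', 'sqeuclidean', 'yule'. | 'euclidean' |
nearest_neighbors | int, optional | Number of nearest neighbors to use in the first step of graph construction. Used in both leiden and phenograph clustering. | 30 |
random_state | int, optional | Change the initialization of the optimization. | 0 |
label | string, optional | Key or column name for the returned data, stored in `adata.obs`. Defaults to 'spatial_' followed by the method name, for example 'spatial_kmeans'. | None |
output_dir | string, optional | Path to output directory. | None |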
Returns:
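Type | Description |
---|---|
AnnData | Updated AnnData object with a new `adata.obs` column holding the cluster labels (named 'spatial_' followed by the method name unless `label` is set). |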
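As a rough illustration of how the result column is named and aligned back to the original cells, the sketch below uses plain dictionaries and lists in place of `adata.obs`. `resolve_label` and `attach_labels` are hypothetical helpers written for this page, not scimap functions, and missing cells are filled with `None` here where pandas would use `NaN`.

```python
def resolve_label(method, label=None):
    """Column name for the result; mirrors the 'spatial_' + method default."""
    return label if label is not None else "spatial_" + str(method)


def attach_labels(obs_index, cluster_by_cell, column, obs=None):
    """Align cluster labels to the original cell order, similar to a reindex.

    Cells that are missing from `cluster_by_cell` receive None.
    """
    obs = {} if obs is None else dict(obs)
    obs[column] = [cluster_by_cell.get(cell) for cell in obs_index]
    return obs


obs_index = ["cell_1", "cell_2", "cell_3"]
clusters = {"cell_3": "0", "cell_1": "1"}  # unordered, one cell missing

column = resolve_label("kmeans")
obs = attach_labels(obs_index, clusters, column)
print(column, obs[column])
```

Passing an explicit `label` simply overrides the generated name, which is useful when several clustering runs should be kept side by side.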
Examples:
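adata = sm.tl.spatial_cluster(adata, k=10, method='kmeans')  # results are saved under adata.obs['spatial_kmeans']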
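To make the input concrete, here is a toy sketch of what clustering a neighbourhood matrix means: each row is a cell, each column counts one phenotype among that cell's neighbours, and cells with similar rows end up in the same cluster. This is a plain standard-library k-means written for illustration, not the routine scimap calls internally. The data, the cell names and `toy_kmeans` are all made up.

```python
import math
import random


def toy_kmeans(rows, k, random_state=0, n_iter=50):
    """Plain Lloyd's k-means; returns one cluster index per input row."""
    rng = random.Random(random_state)
    # Initialise the centres from k distinct rows; random_state fixes the choice.
    centers = [list(rows[i]) for i in rng.sample(range(len(rows)), k)]
    labels = None
    for _ in range(n_iter):
        # Assignment step: each row joins its nearest centre.
        new_labels = [
            min(range(k), key=lambda c: math.dist(row, centers[c]))
            for row in rows
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: move each centre to the mean of its members.
        for c in range(k):
            members = [row for row, lab in zip(rows, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical neighbourhood counts: columns are three phenotypes.
cells = ["cell_1", "cell_2", "cell_3", "cell_4", "cell_5", "cell_6"]
neighbourhood = [
    [9, 1, 0],
    [8, 2, 0],
    [0, 1, 9],
    [1, 0, 8],
    [9, 0, 1],
    [0, 2, 9],
]

labels = toy_kmeans(neighbourhood, k=2, random_state=0)
spatial_kmeans = dict(zip(cells, labels))
print(spatial_kmeans)
```

Changing `random_state` changes which rows seed the centres, which is the role the parameter of the same name plays in the documented function.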
Source code in scimap/tools/_spatial_cluster.py
def spatial_cluster(adata, df_name='spatial_count', method='kmeans', k=10,
                    n_pcs=None, resolution=1, phenograph_clustering_metric='euclidean',
                    nearest_neighbors=30, random_state=0, label=None, output_dir=None):
    """
    Parameters:
        adata : AnnData object loaded into memory or path to AnnData object.
        df_name : string, required
            Label of the spatial analysis performed.
            By default, if `sm.tl.spatial_count` was run the results will be saved under `spatial_count` and
            if `sm.tl.spatial_expression` was run, the results will be saved under `spatial_expression`.
        method : string, optional
            Clustering method to use. Implemented methods: kmeans, phenograph and leiden.
        k : int, optional
            Number of clusters to return when using K-Means clustering.
        phenotype : string, optional
            The column name that contains the cluster/phenotype information.
            (Documented here but not present in the function signature.)
        n_pcs : int, optional
            Number of PCs to use in leiden clustering. By default all PCs are used.
        resolution : float, optional
            A parameter value controlling the coarseness of the clustering.
            Higher values lead to more clusters.
        phenograph_clustering_metric : string, optional
            Distance metric to define nearest neighbors. Note that performance will be slower for correlation and cosine.
            Available metrics: 'cityblock', 'cosine', 'euclidean', 'manhattan', 'braycurtis', 'canberra', 'chebyshev',
            'correlation', 'dice', 'hamming', 'jaccard', 'kulsinski', 'mahalanobis', 'minkowski', 'rogerstanimoto',
            'russellrao', 'seuclidean', 'sokalmichener', 'sokalsneath', 'sqeuclidean', 'yule'
        nearest_neighbors : int, optional
            Number of nearest neighbors to use in the first step of graph construction.
            This parameter is used in both leiden and phenograph clustering.
        random_state : int, optional
            Change the initialization of the optimization.
        label : string, optional
            Key or column name for the returned data, stored in `adata.obs`.
            Defaults to `'spatial_' + method`, for example `spatial_kmeans`.
        output_dir : string, optional
            Path to output directory.
    Returns:
        adata : AnnData Object
            Returns an updated AnnData object with a new column; check `adata.obs['spatial_' + method]`.
    Example:
    ```python
    adata = sm.tl.spatial_cluster(adata, k=10, method='kmeans')  # results will be saved under adata.obs['spatial_kmeans']
    ```
    """
    # Load the AnnData object
    if isinstance(adata, str):
        # Keep the file name so the result can be saved under the same name.
        # Note: `imid` is only defined when `adata` is passed as a path.
        imid = str(adata.rsplit('/', 1)[-1])
        adata = ad.read(adata)
    else:
        adata = adata

    # Make a copy of adata to modify
    adata_copy = adata.copy()

    # Error check: report a missing df_name (the lookup below still raises KeyError)
    try:
        adata_copy.uns[df_name]
    except KeyError:
        print('Supplied df_name not found, please run `sm.tl.spatial_expression` or LDA, counts or other similar methods')

    # Create a new AnnData object with the user-defined spatial information
    adata_new = ad.AnnData(adata_copy.uns[df_name].fillna(0))
    adata_new.obs = adata_copy.obs

    # Create a meaningful label name
    if label is None:
        label = 'spatial_' + str(method)

    # Run the clustering algorithm
    adata_new = cluster(adata=adata_new,
                        method=method,
                        k=k,
                        n_pcs=n_pcs,
                        resolution=resolution,
                        phenograph_clustering_metric=phenograph_clustering_metric,
                        nearest_neighbors=nearest_neighbors,
                        use_raw=False,
                        random_state=random_state,
                        label=label)

    # Get the clusters and append them to the original adata object
    result = adata_new.obs[label]
    result = result.reindex(adata.obs.index)
    adata.obs[label] = result

    # Save data if requested
    if output_dir is not None:
        output_dir = pathlib.Path(output_dir)
        output_dir.mkdir(exist_ok=True, parents=True)
        adata.write(output_dir / imid)
    else:
        # Return data
        return adata
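Two details of the listing may deserve a guard in calling code. A missing `df_name` is only reported with `print` before `adata_copy.uns[df_name]` is accessed again, and `imid` is assigned only when `adata` is passed as a path, so saving an in-memory object has no file name to use. The sketch below shows one way to handle both, using only the standard library. `require_key`, `output_path` and the `fallback_name` default are hypothetical and not part of scimap.

```python
import pathlib


def require_key(uns, df_name):
    """Fail fast with a clear message when the spatial result is missing."""
    if df_name not in uns:
        raise KeyError(
            f"'{df_name}' not found in adata.uns; run sm.tl.spatial_count, "
            "sm.tl.spatial_expression or a similar method first"
        )
    return uns[df_name]


def output_path(adata_or_path, output_dir, fallback_name="spatial_cluster.h5ad"):
    """Build the output file path, with a fallback name for in-memory objects."""
    if isinstance(adata_or_path, (str, pathlib.Path)):
        name = pathlib.Path(adata_or_path).name
    else:
        name = fallback_name  # assumption: any sensible default name would do
    return pathlib.Path(output_dir) / name


# A path keeps its own file name; an in-memory object gets the fallback.
print(output_path("data/sample_1.h5ad", "results"))
print(output_path(object(), "results"))
```

Raising instead of printing stops the run at the point where the problem is clearest, and resolving the file name up front avoids a late `NameError` after clustering has already finished.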