
dropFeatures

Short Description

sm.hl.dropFeatures: Removes selected markers, cells, metadata columns, or whole cell groups from an AnnData object, so that downstream analyses run only on the subset of interest.

Function

dropFeatures(adata, drop_markers=None, drop_cells=None, drop_meta_columns=None, drop_groups=None, groups_column=None, subset_raw=True, verbose=True)

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| adata | AnnData | Annotated data matrix or path to an AnnData object, containing spatial gene expression data. | required |
| drop_markers | list | A list of gene or marker names to be removed from adata.var. | None |
| drop_cells | list | A list of cell identifiers (index names) to be removed from adata.obs. | None |
| drop_meta_columns | list | A list of metadata column names to be removed from adata.obs. | None |
| drop_groups | list | A list of category names to be removed based on the column specified by groups_column. | None |
| groups_column | str | The name of the column in adata.obs that contains the categorical data for drop_groups. | None |
| subset_raw | bool | If True, the same dropping operations are applied to adata.raw. | True |
| verbose | bool | If True, print messages about the dropping process. | True |

Returns:

| Name | Type | Description |
| --- | --- | --- |
| adata | AnnData | The AnnData object after the specified features have been removed. |
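
To make the `drop_groups` / `groups_column` pairing concrete, here is a minimal sketch of the boolean-mask selection that the source applies to `adata.obs`. It uses a plain pandas DataFrame as a stand-in for `adata.obs`; the cell names and `Cell_Type` values are made up for illustration, and only the `isin` test and its negation mirror the function.

```python
import pandas as pd

# Hypothetical stand-in for adata.obs: one row per cell.
obs = pd.DataFrame(
    {"Cell_Type": ["T cell", "B cell", "NK cell", "T cell"]},
    index=["cell_001", "cell_002", "cell_003", "cell_004"],
)

drop_groups = ["B cell", "NK cell"]
groups_column = "Cell_Type"

# True for every cell whose category is listed in drop_groups.
to_drop = obs[groups_column].isin(drop_groups)

# Keep the complement; the remaining rows stay in their original order.
kept = obs[~to_drop]
kept_cells = list(kept.index)
print(kept_cells)
```

Because the mask is built from a single column, `groups_column` has to name a column that exists in `adata.obs`, and the entries of `drop_groups` have to match its category labels exactly.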

Example
import scimap as sm

# Example 1: Drop specific markers from the dataset
adata = sm.hl.dropFeatures(adata, drop_markers=['CD3D', 'CD19'])

# Example 2: Remove cells based on their identifiers
adata = sm.hl.dropFeatures(adata, drop_cells=['cell_001', 'cell_002'])

# Example 3: Remove metadata columns from adata.obs
adata = sm.hl.dropFeatures(adata, drop_meta_columns=['Batch', 'Condition'])

# Example 4: Exclude specific groups from a categorical column in adata.obs
adata = sm.hl.dropFeatures(adata, drop_groups=['B cell', 'NK cell'], groups_column='Cell_Type')
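
Each `drop_*` argument is documented as a list, but the source listing wraps a bare string in a list before use, so `drop_markers='CD3D'` behaves like `drop_markers=['CD3D']`. A minimal sketch of that normalisation, using a hypothetical helper `as_list` that is not part of scimap:

```python
def as_list(value):
    """Wrap a single string in a list; pass None and lists through."""
    # as_list is a hypothetical helper for illustration only.
    if value is None:
        return None
    if isinstance(value, str):
        return [value]
    return list(value)


single = as_list("CD3D")
several = as_list(["CD3D", "CD19"])
print(single, several)
```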
Source code in scimap/helpers/dropFeatures.py
import anndata as ad
import numpy as np


def dropFeatures(adata,
                 drop_markers=None,
                 drop_cells=None,
                 drop_meta_columns=None,
                 drop_groups=None,
                 groups_column=None,
                 subset_raw=True,
                 verbose=True):
    """
Parameters:
        adata (anndata.AnnData):  
            Annotated data matrix or path to an AnnData object, containing spatial gene expression data.

        drop_markers (list, optional):  
            A list of gene or marker names to be removed from `adata.var`. 

        drop_cells (list, optional):  
            A list of cell identifiers (index names) to be removed from `adata.obs`. 

        drop_meta_columns (list, optional):  
            A list of metadata column names to be removed from `adata.obs`. 

        drop_groups (list, optional):  
            A list of category names to be removed based on the column specified by `groups_column`. 

        groups_column (str, optional):  
            The name of the column in `adata.obs` that contains the categorical data for `drop_groups`. 

        subset_raw (bool, optional):  
            If True, the same dropping operations are applied to `adata.raw`.

        verbose (bool, optional):  
            If True, print messages about the dropping process. 

Returns:
        adata (anndata.AnnData):  
            The AnnData object after the specified features have been removed.

Example:
        ```python
        # Example 1: Drop specific markers from the dataset
        adata = sm.hl.dropFeatures(adata, drop_markers=['CD3D', 'CD19'])

        # Example 2: Remove cells based on their identifiers
        adata = sm.hl.dropFeatures(adata, drop_cells=['cell_001', 'cell_002'])

        # Example 3: Remove metadata columns from adata.obs
        adata = sm.hl.dropFeatures(adata, drop_meta_columns=['Batch', 'Condition'])

        # Example 4: Exclude specific groups from a categorical column in adata.obs
        adata = sm.hl.dropFeatures(adata, drop_groups=['B cell', 'NK cell'], groups_column='Cell_Type')

        ```

    """

    # Drop Markers
    if drop_markers is not None:
        if isinstance(drop_markers, str):
            drop_markers = [drop_markers]
        # find the positional index of the given markers
        idx_markers = [adata.var.index.get_loc(x) for x in drop_markers]
        # remove from adata, keeping the remaining markers in their original order
        adata = adata[:, ~adata.var.index.isin(drop_markers)]
        # remove from raw (columns are markers, so delete along axis 1);
        # this assumes adata.raw has the same marker order as adata.var
        if subset_raw is True:
            raw = np.delete(adata.raw.X, idx_markers, axis=1)
            del adata.raw
            adata.raw = ad.AnnData(raw)

    # Drop cells
    if drop_cells is not None:
        if isinstance(drop_cells, str):
            drop_cells = [drop_cells]
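        # find the positional index of the given cells
        idx_cells = [adata.obs.index.get_loc(x) for x in drop_cells]
        # remove from raw (rows are cells, so delete along axis 0);
        # computed before subsetting so the positions still match
        if subset_raw is True:
            raw = np.delete(adata.raw.X, idx_cells, axis=0)
        # remove from adata, keeping the remaining cells in their original order
        adata = adata[~adata.obs.index.isin(drop_cells), :]
        # replace raw with the subsetted matrix
        if subset_raw is True:
            del adata.raw
            adata.raw = ad.AnnData(raw)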

    # Drop meta columns
    if drop_meta_columns is not None:
        if isinstance(drop_meta_columns, str):
            drop_meta_columns = [drop_meta_columns]
        # remove from adata
        adata.obs = adata.obs.drop(drop_meta_columns, axis=1)

    # Drop specific categories of cells
    if drop_groups is not None:
        if isinstance(drop_groups, str):
            drop_groups = [drop_groups]
        if isinstance(groups_column, list):
            groups_column = groups_column[0]
        # find the positional index of the cells in the given groups
        idx_cells = np.flatnonzero(adata.obs[groups_column].isin(drop_groups).to_numpy())
        # remove from raw (rows are cells, so delete along axis 0)
        if subset_raw is True:
            raw = np.delete(adata.raw.X, idx_cells, axis=0)
        # remove from adata
        adata = adata[~adata.obs[groups_column].isin(drop_groups)]
        # replace raw with the subsetted matrix
        if subset_raw is True:
            del adata.raw
            adata.raw = ad.AnnData(raw)


    # return
    return adata
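
The raw-matrix handling in the listing relies on positional deletion with `np.delete`: markers are columns (axis 1) and cells are rows (axis 0). The sketch below illustrates that convention on a toy dense matrix standing in for `adata.raw.X`. The matrix, marker names, and cell names are invented for the example, and it assumes the raw matrix shares the row and column order of the main object.

```python
import numpy as np

# Toy dense matrix standing in for adata.raw.X: 3 cells x 4 markers.
X = np.arange(12).reshape(3, 4)
markers = ["CD3D", "CD19", "CD45", "CD8A"]
cells = ["cell_001", "cell_002", "cell_003"]

# Dropping markers removes columns (axis=1).
drop_markers = ["CD3D", "CD19"]
col_idx = [markers.index(m) for m in drop_markers]
X_markers_dropped = np.delete(X, col_idx, axis=1)

# Dropping cells removes rows (axis=0).
drop_cells = ["cell_002"]
row_idx = [cells.index(c) for c in drop_cells]
X_cells_dropped = np.delete(X, row_idx, axis=0)

# A list comprehension keeps the surviving names in their original order,
# which a set difference does not guarantee.
keep_markers = [m for m in markers if m not in drop_markers]

print(X_markers_dropped.shape, X_cells_dropped.shape, keep_markers)
```

Using the wrong axis does not necessarily raise an error when the indices happen to be in range, so it is worth checking the shape of the result after each drop.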