classify

Short Description

sm.hl.classify: Annotates cells according to the presence or absence of specific markers. The classification can be applied to the entire dataset or restricted to previously defined subsets, such as phenotyped or clustered cell groups, which allows targeted analyses based on marker expression.

Function

classify(adata, pos=None, neg=None, classify_label='passed_classify', failed_label='failed_classify', phenotype=None, subclassify_phenotype=None, threshold=0.5, collapse_failed=True, label='classify', showPhenotypeLabel=False, verbose=True)

Parameters:

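| Name | Type | Description | Default |
|---|---|---|---|
| adata | AnnData | The annotated data matrix for classification. | required |
| pos | list | Markers that should be expressed in the cells of interest. A single marker name may also be passed as a string. | None |
| neg | list | Markers that should not be expressed in the cells of interest. A single marker name may also be passed as a string. | None |
| classify_label | str | Label for cells that meet the classification criteria. | 'passed_classify' |
| failed_label | str | Label for cells that do not meet the classification criteria. | 'failed_classify' |
| phenotype | str | Column in `adata.obs` containing the phenotype information. Required if `subclassify_phenotype` or `collapse_failed` is used. Note that `collapse_failed` defaults to True, and the source listing below reads `adata.obs[phenotype]` whenever at least one cell passes, so this column should normally be supplied. | None |
| subclassify_phenotype | list | Phenotypes within which classification should be performed. Cells of other phenotypes are never marked as passed. | None |
| threshold | float | Cut-off applied to the values in `adata.X`: a marker counts as expressed when its value is `>= threshold` and as not expressed when its value is `< threshold`. | 0.5 |
| collapse_failed | bool | If True, all cells that fail the criteria are grouped under `failed_label`. If False, they keep their original label from the `phenotype` column. | True |
| label | str | Key under which classification results are stored in `adata.obs`. | 'classify' |
| showPhenotypeLabel | bool | If True, the classification status is appended to each cell's existing phenotype label (for example `<phenotype>_<classify_label>`), and the results are stored under `phenotype + "_" + label` instead of `label`. | False |
| verbose | bool | If True, prints progress and informational messages during the classification process. | True |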
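
The rule described by `pos`, `neg` and `threshold` can be sketched with plain pandas. This is a minimal illustration on an invented expression table, assuming values are scaled so that 0.5 separates positive from negative. The helper `passes`, the cell names and the marker values are all made up for the example, and the code is not the library's implementation.

```python
import pandas as pd

# Hypothetical scaled expression values for four cells (invented for illustration).
expr = pd.DataFrame(
    {
        "CD3D": [0.9, 0.8, 0.2, 0.7],
        "CD8A": [0.7, 0.3, 0.9, 0.6],
        "PDGFRB": [0.1, 0.1, 0.1, 0.8],
    },
    index=["cell_1", "cell_2", "cell_3", "cell_4"],
)


def passes(expr, pos=(), neg=(), threshold=0.5):
    """Return a boolean Series that is True where every `pos` marker is
    >= threshold and every `neg` marker is < threshold."""
    keep = pd.Series(True, index=expr.index)
    for marker in pos:
        keep &= expr[marker] >= threshold
    for marker in neg:
        keep &= expr[marker] < threshold
    return keep


mask = passes(expr, pos=["CD3D", "CD8A"], neg=["PDGFRB"])
passed_cells = list(expr.index[mask])
print(passed_cells)  # ['cell_1']
```

Only `cell_1` is positive for both CD3D and CD8A while staying below the threshold for PDGFRB; each of the other cells fails exactly one of the three conditions.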

Returns:

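| Name | Type | Description |
|---|---|---|
| adata | AnnData | The input AnnData object, updated with classification results in `adata.obs[label]`, or in `adata.obs[phenotype + "_" + label]` when `showPhenotypeLabel` is True. |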
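
Which `adata.obs` column to read afterwards depends on `showPhenotypeLabel`. The helper below is hypothetical (`result_column` is not part of scimap). It only mirrors the naming visible in the source listing on this page, where the column is `phenotype + "_" + label` when `showPhenotypeLabel` is True and `label` otherwise.

```python
def result_column(label="classify", phenotype=None, showPhenotypeLabel=False):
    """Hypothetical helper: name of the adata.obs column that holds the result."""
    if showPhenotypeLabel:
        if phenotype is None:
            # The combined column name is built from the phenotype column name.
            raise ValueError("phenotype is needed when showPhenotypeLabel=True")
        return f"{phenotype}_{label}"
    return label


print(result_column(label="T_cell_classification"))
# T_cell_classification
print(result_column(label="stem_cell_classification",
                    phenotype="cell_type",
                    showPhenotypeLabel=True))
# cell_type_stem_cell_classification
```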

Example
import scimap as sm

# `adata` is assumed to be an AnnData object that has already been loaded.

# Basic classification with positive and negative markers
adata = sm.hl.classify(adata, pos=['CD3D', 'CD8A'], neg=['PDGFRB'], label='T_cell_classification')

# Classify specific phenotypes, preserving original phenotype labels for unclassified cells
adata = sm.hl.classify(adata, pos=['CD19'], neg=['CD3D'], subclassify_phenotype=['B cells'],
                       phenotype='cell_type', collapse_failed=False, label='B_cell_subclassification')

# Use showPhenotypeLabel to append classification status to existing phenotype labels
adata = sm.hl.classify(adata, pos=['CD34'], neg=['CD45'], phenotype='cell_type',
                       showPhenotypeLabel=True, label='stem_cell_classification', verbose=True)
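
To make the effect of `collapse_failed` concrete, the sketch below labels a small, invented phenotype column by hand. It assumes the set of passing cells is already known and only imitates the documented behaviour. The names `label_cells` and `passed` are illustrative and not part of scimap.

```python
import pandas as pd

# Hypothetical phenotype annotations for three cells (invented for illustration).
phenotype = pd.Series(
    ["B cells", "B cells", "T cells"],
    index=["cell_1", "cell_2", "cell_3"],
    name="cell_type",
)
passed = ["cell_1"]  # cells assumed to have met the pos/neg criteria


def label_cells(phenotype, passed, collapse_failed=True,
                classify_label="passed_classify", failed_label="failed_classify"):
    """Label passing cells; failed cells are either collapsed into one
    label or keep their original phenotype."""
    if collapse_failed:
        out = pd.Series(failed_label, index=phenotype.index)
    else:
        out = phenotype.astype(str).copy()
    out.loc[passed] = classify_label
    return out


collapsed = label_cells(phenotype, passed, collapse_failed=True)
preserved = label_cells(phenotype, passed, collapse_failed=False)
print(collapsed.tolist())  # ['passed_classify', 'failed_classify', 'failed_classify']
print(preserved.tolist())  # ['passed_classify', 'B cells', 'T cells']
```

With `collapse_failed=False` the result can be read as a refined copy of the phenotype column, which is convenient when only one population is being subclassified.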
Source code in scimap/helpers/classify.py
def classify(
    adata,
    pos=None,
    neg=None,
    classify_label='passed_classify',
    failed_label='failed_classify',
    phenotype=None,
    subclassify_phenotype=None,
    threshold=0.5,
    collapse_failed=True,
    label="classify",
    showPhenotypeLabel=False,
    verbose=True,
):

    """
Parameters:
        adata (anndata.AnnData):  
            The annotated data matrix for classification.

        pos (list, optional):  
            Markers that should be expressed in the cells of interest.

        neg (list, optional):  
            Markers that should not be expressed in the cells of interest.

        classify_label (str, optional):  
            Label for cells that meet the classification criteria.

        failed_label (str, optional):  
            Label for cells that do not meet the classification criteria.

        phenotype (str, required if subclassify_phenotype or collapse_failed is used):  
            Column in `adata.obs` containing the phenotype information.

        subclassify_phenotype (list, optional):  
            Phenotypes within which classification should be performed.

        threshold (float, optional):  
            Threshold for determining positive or negative expression.

        collapse_failed (bool, optional):  
            If True, unclassified cells are grouped under a single failed label.

        label (str, optional):  
            Key under which classification results are stored in `adata.obs`.

        showPhenotypeLabel (bool, optional):  
            If True, appends classification status to existing phenotype labels in the results.

        verbose (bool, optional):  
            If True, prints progress and informational messages during the classification process.

Returns:
        adata (anndata.AnnData):  
            The input AnnData object, updated with classification results in `adata.obs[label]`.

Example:
    ```python

    # Basic classification with positive and negative markers
    adata = sm.hl.classify(adata, pos=['CD3D', 'CD8A'], neg=['PDGFRB'], label='T_cell_classification')

    # Classify specific phenotypes, preserving original phenotype labels for unclassified cells
    adata = sm.hl.classify(adata, pos=['CD19'], neg=['CD3D'], subclassify_phenotype=['B cells'],
                     phenotype='cell_type', collapse_failed=False, label='B_cell_subclassification')

    # Use showPhenotypeLabel to append classification status to existing phenotype labels
    adata = sm.hl.classify(adata, pos=['CD34'], neg=['CD45'], phenotype='cell_type',
                     showPhenotypeLabel=True, label='stem_cell_classification', verbose=True)

    ```
    """

    # clean the input
    if isinstance(pos, str):
        pos = [pos]
    if isinstance(neg, str):
        neg = [neg]
    if isinstance(subclassify_phenotype, str):
        subclassify_phenotype = [subclassify_phenotype]
    if (showPhenotypeLabel):
        phenotype_label=phenotype+"_"+label


    # Create a DataFrame with the necessary information
    data = pd.DataFrame(adata.X, index= adata.obs.index, columns = adata.var.index)

    # if user requests to subset a specific phenotype   
    if subclassify_phenotype is not None:
        meta = pd.DataFrame(adata.obs[phenotype])
        subset_index = meta[meta[phenotype].isin(subclassify_phenotype)].index
        data = data.loc[subset_index]

    # Subset cells that pass the pos criteria
    if pos is not None:
        for i in pos:
            data = data[data[i] >= threshold]

    # Subset cells that pass the neg criteria 
    if neg is not None and not data.empty:
        for j in neg:
            data = data[data[j] < threshold]

    # Cells that passed the classify criteria
    if data.empty:
        raise TypeError("No cells were found to satisfy your `classify` criteria")
    else:
        # create new naming scheme for label and phenotype_label cols in classified
        non_summary = pd.DataFrame({phenotype: adata.obs[phenotype]}) # gets the index and phenotype
        non_summary[phenotype] = non_summary[phenotype].astype(str)

        classify_idx=data.index
        classified = pd.DataFrame(non_summary.loc[data.index]) #subsets phenotype rows to only classified cells
        if showPhenotypeLabel:
            classified[phenotype_label] = classified[phenotype]+"_"+classify_label # add phenotype_label col
        classified[label]=pd.DataFrame(np.repeat(classify_label, len(classify_idx)), index = classify_idx) # add label col
        classified.drop([phenotype], axis='columns', inplace=True) # drop phenotype col, for merge        



    if collapse_failed is True: 
        meta = non_summary # has index and phenotype col
        meta = meta.merge(classified, how='outer', left_index=True, right_index=True) # gain classified col(s) and NaNs for non-matches
        if showPhenotypeLabel is True:
            meta[phenotype_label]= meta[phenotype_label].fillna(meta[phenotype].astype(str)+"_"+failed_label)
            meta=meta[phenotype_label]
        else: 
            meta[label]=meta[label].fillna(failed_label)
            meta=meta[label]


    else:
        if phenotype is None:
            raise ValueError("Please pass a column name to the PHENOTYPE argument")

        if showPhenotypeLabel is True: 
            meta=non_summary # phenotype col
            classified=pd.DataFrame({phenotype: classified[phenotype_label]}) # takes phenotype_label col and renames to phenotype, ensures it's a df
            meta.update(classified) # updates with phenotype_label for only the classified cells
        else:
            meta= pd.DataFrame(adata.obs[phenotype])
            classified = pd.DataFrame(np.repeat(classify_label, len(classify_idx)), index = classify_idx, columns = [phenotype])
            meta.update(classified) # updates with label for only the classified cells


    # Add to Anndata 
    meta = meta.reindex(adata.obs.index)
    if showPhenotypeLabel is True:
        adata.obs[phenotype_label]=meta
    else:
        adata.obs[label]=meta 

    # return
    return adata