Skip to content

foldchange

Short Description

The sm.tl.foldchange function computes the fold change in cell-type abundance between samples or ROIs, using the from_group parameter to specify the reference group by its column name in imageid. It normalizes cell abundance to the total cell count within each sample/ROI to adjust for size differences, a feature that can be disabled. The function uses a Fisher exact test to calculate p-values, assessing the statistical significance of the observed changes.

Results are stored in the .uns section of the Anndata object for easy access and further analysis.

Function

foldchange(adata, from_group, to_group=None, imageid='imageid', phenotype='phenotype', normalize=True, subset_phenotype=None, label='foldchange', verbose=True)

Parameters:

Name Type Description Default
adata AnnData

The input AnnData object containing single-cell data for fold change analysis.

required
from_group list of str

Specifies the reference sample(s) or Region of Interest (ROI) for calculating fold change. If multiple samples or ROIs are provided (e.g., ['ROI1', 'ROI2']), they will be aggregated to serve as a singular reference point for comparison.

required
to_group list of str

Defines a specific set of samples/ROIs to compare against the reference group specified in from_group. If not provided, the reference will be compared to all other groups within the imageid column. For example, ['ROI3', 'ROI4'].

None
imageid str

The column in adata.obs that holds the sample/ROI identifiers.

'imageid'
phenotype str

The column in adata.obs that contains cell type or phenotype information.

'phenotype'
normalize bool

If True, adjusts cell counts based on the total number of cells within each sample/ROI to account for differences in sample/ROI area. If subset_phenotype is provided, normalization considers only the total cells of the specified cell types.

True
subset_phenotype list of str

Limits the analysis to a particular subset of cell types. Only cell types listed here will be included in the fold change computation.

None
label str

Designates the key under which the fold change results (both fold change values and p-values) are stored in adata.uns. The results will be accessible as <label>_fc for fold changes and <label>_pval for p-values.

'foldchange'
verbose bool

Enables the display of detailed progress updates and information about the execution steps when set to True.

True

Returns:

Name Type Description
adata AnnData

The input adata object, updated with fold change analysis results. The fold change values and p-values can be found in adata.uns['<label>_fc'] and adata.uns['<label>_pval'], respectively.

Example
1
2
3
4
5
6
7
8
# Basic usage with automatic comparison to all other groups
adata = sm.tl.foldchange(adata, from_group=['ROI1'], imageid='imageid', phenotype='phenotype', normalize=True, label='roi_comparison')

# Specifying a subset of groups for comparison
adata = sm.tl.foldchange(adata, from_group=['image_1'], to_group=['image_2', 'image_3'], imageid='imageid', phenotype='phenotype', normalize=True, label='specific_roi_comparison')

# Focusing on specific cell types for fold change analysis
adata = sm.tl.foldchange(adata, from_group=['ROI1'], to_group=['ROI3', 'ROI4'], subset_phenotype=['T cells', 'B cells', 'Macrophages'], label='subset_phenotype_comparison')
Source code in scimap/tools/foldchange.py
 25
 26
 27
 28
 29
 30
 31
 32
 33
 34
 35
 36
 37
 38
 39
 40
 41
 42
 43
 44
 45
 46
 47
 48
 49
 50
 51
 52
 53
 54
 55
 56
 57
 58
 59
 60
 61
 62
 63
 64
 65
 66
 67
 68
 69
 70
 71
 72
 73
 74
 75
 76
 77
 78
 79
 80
 81
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
def foldchange (adata, 
                from_group, 
                to_group=None, 
                imageid='imageid', 
                phenotype='phenotype',
                normalize=True, 
                subset_phenotype=None, 
                label='foldchange',
                verbose=True):
    """


Parameters:
    adata (anndata.AnnData):
        The input AnnData object containing single-cell data for fold change analysis.

    from_group (list of str):  
        Specifies the reference sample(s) or Region of Interest (ROI) for calculating fold change. If multiple samples or ROIs are provided (e.g., ['ROI1', 'ROI2']), they will be aggregated to serve as a singular reference point for comparison.

    to_group (list of str, optional):  
        Defines a specific set of samples/ROIs to compare against the reference group specified in `from_group`. If not provided, the reference will be compared to all other groups within the `imageid` column. For example, ['ROI3', 'ROI4'].

    imageid (str):  
        The column in `adata.obs` that holds the sample/ROI identifiers.

    phenotype (str):  
        The column in `adata.obs` that contains cell type or phenotype information.

    normalize (bool):  
        If True, adjusts cell counts based on the total number of cells within each sample/ROI to account for differences in sample/ROI area. If `subset_phenotype` is provided, normalization considers only the total cells of the specified cell types.

    subset_phenotype (list of str, optional):  
        Limits the analysis to a particular subset of cell types. Only cell types listed here will be included in the fold change computation.

    label (str):   
        Designates the key under which the fold change results (both fold change values and p-values) are stored in `adata.uns`. The results will be accessible as `<label>_fc` for fold changes and `<label>_pval` for p-values.

    verbose (bool):  
        Enables the display of detailed progress updates and information about the execution steps when set to True.

Returns:
    adata (anndata.AnnData):
        The input `adata` object, updated with fold change analysis results. The fold change values and p-values can be found in `adata.uns['<label>_fc']` and `adata.uns['<label>_pval']`, respectively.


Example:
    ```python

    # Basic usage with automatic comparison to all other groups
    adata = sm.tl.foldchange(adata, from_group=['ROI1'], imageid='imageid', phenotype='phenotype', normalize=True, label='roi_comparison')

    # Specifying a subset of groups for comparison
    adata = sm.tl.foldchange(adata, from_group=['image_1'], to_group=['image_2', 'image_3'], imageid='imageid', phenotype='phenotype', normalize=True, label='specific_roi_comparison')

    # Focusing on specific cell types for fold change analysis
    adata = sm.tl.foldchange(adata, from_group=['ROI1'], to_group=['ROI3', 'ROI4'], subset_phenotype=['T cells', 'B cells', 'Macrophages'], label='subset_phenotype_comparison')

    ```

    """

    # prepare data
    data = adata.obs[[imageid,phenotype]]

    # convert from and to groups to list
    if isinstance(from_group, str):
        from_group = [from_group]
    if isinstance(to_group, str):
        to_group = [to_group]

    # subset phenotype of interest
    if subset_phenotype is not None:
        if isinstance (subset_phenotype, str):
            subset_phenotype = [subset_phenotype]
        data = data[data[phenotype].isin(subset_phenotype)]

    # subset data    
    from_data = data[data[imageid].isin(from_group)]
    if len(from_group) > 1:
        combined_name = '_'.join(from_group)
        #from_data[imageid] = combined_name
        from_data.loc[:, imageid] = combined_name


    from_data.loc[:, imageid] = from_data[imageid].astype('str').astype('category')
    from_data.loc[:, phenotype] = from_data[phenotype].astype('str').astype('category')
    if to_group is None:
        to_data = data[~data[imageid].isin(from_group)]
    else:
        to_data = data[data[imageid].isin(to_group)]
    to_data.loc[:, imageid] = to_data[imageid].astype('str').astype('category')
    to_data.loc[:, phenotype] = to_data[phenotype].astype('str').astype('category')


    if verbose:
        print('calculating foldchange')

    # consolidated counts dataframe
    from_data_consolidated = pd.DataFrame(from_data.groupby([imageid,phenotype],observed=False).size()).unstack().fillna(0)
    from_data_consolidated.columns = np.unique(from_data_consolidated.columns.get_level_values(1))

    to_data_consolidated = pd.DataFrame(to_data.groupby([imageid,phenotype],observed=False).size()).unstack().fillna(0)
    to_data_consolidated.columns = np.unique(to_data_consolidated.columns.get_level_values(1))

    # make backup of the sample names
    from_b = list(from_data_consolidated.index)
    to_b = list(to_data_consolidated.index)

    # make sure from_data_consolidated and to_data_consolidated has the same columns
    x = from_data_consolidated.T
    x.columns = x.columns.astype(str)
    y = to_data_consolidated.T
    y.columns = y.columns.astype(str)
    consolidated = x.merge(y, how='outer', left_index=True, right_index=True).fillna(0)

    # split it back into from and to
    from_data_consolidated = consolidated[from_b].T
    to_data_consolidated = consolidated[to_b].T

    # create the total minus to and from tables
    from_data_total = abs(from_data_consolidated.sub( from_data_consolidated.sum(axis=1), axis=0))
    to_data_total = abs(to_data_consolidated.sub( to_data_consolidated.sum(axis=1), axis=0))

    # 
    if verbose:
        print('calculating P values')
    p_vals = []
    for i in from_data_consolidated.columns:
        #a = from_data_consolidated[i][0]
        a = from_data_consolidated[i].iloc[0]
        #c = from_data_total[i][0]
        c = from_data_total[i].iloc[0]
        for j in to_data_consolidated.index:
            #b = to_data_consolidated[i][j]
            b = to_data_consolidated[i].loc[j]
            d = to_data_total[i][j]
            oddsratio, pvalue = stats.fisher_exact([[a, b], [c, d]])
            p_vals.append(pvalue)

    # replace 0 with a small number (1 cell) to avoind inf
    from_data_consolidated_zero = from_data_consolidated.replace(0, 1, inplace=False)
    to_data_consolidated_zero = to_data_consolidated.replace(0, 1, inplace=False)

    # normalize based on area i.e total cells if user requests
    if normalize is True:
        # Normalize for total cells
        from_data_ratio = from_data_consolidated_zero.div(from_data_consolidated_zero.sum(axis=1), axis=0)
        to_data_ratio = to_data_consolidated_zero.div(to_data_consolidated_zero.sum(axis=1), axis=0)   
    else:
        from_data_ratio = from_data_consolidated_zero
        to_data_ratio = to_data_consolidated_zero

    # foldchange
    fold_change = to_data_ratio.div(from_data_ratio.values,  axis=1)
    fold_change.index.name = '-'.join(from_group)

    # reshape the pvalues to the todata df
    p_values = np.reshape(p_vals, to_data_consolidated.shape)
    p_values = pd.DataFrame(p_values, columns = to_data_consolidated.columns, index= to_data_consolidated.index)

    # return data
    adata.uns[str(label)+'_pval'] = p_values
    adata.uns[str(label)+'_fc'] = fold_change

    return adata