
combat

Short Description

ComBat is a well-established method for correcting batch effects in high-dimensional data such as single-cell RNA-seq. This function wraps pycombat to correct batch effects across multiple slides, or across any other batches defined by a column of adata.obs.
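In outline, ComBat standardizes each feature, estimates an additive (location) and a multiplicative (scale) effect for every batch, and removes both. The full method also shrinks the per-batch estimates toward a common prior using empirical Bayes. A minimal sketch of only the location/scale step is shown below, assuming a cells x features pandas DataFrame and one batch label per cell. location_scale_adjust is a hypothetical helper, not part of scimap or pycombat, and it deliberately omits the shrinkage and any covariates.

```python
import numpy as np
import pandas as pd


def location_scale_adjust(data: pd.DataFrame, batch: pd.Series) -> pd.DataFrame:
    """Hypothetical helper: remove per-batch shifts and scalings (no EB shrinkage).

    data:  cells x features, numeric
    batch: batch label per cell, indexed like `data`
    """
    grouped = data.groupby(batch, observed=True)
    batch_mean = grouped.transform("mean")  # each cell's batch mean, per feature
    batch_sd = grouped.transform("std")     # each cell's batch SD (ddof=1), per feature
    grand_mean = data.mean(axis=0)
    # pooled within-batch SD: spread around the batch means, over all cells
    pooled_sd = np.sqrt(((data - batch_mean) ** 2).mean(axis=0))
    # standardize within each batch, then map onto the common location and scale
    return (data - batch_mean) / batch_sd * pooled_sd + grand_mean
```

After this adjustment every batch has the same per-feature mean (the overall mean) and the same per-feature standard deviation (the pooled within-batch value).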

Function

combat(adata, batch='imageid', layer=None, log=False, replaceOriginal=False, label='combat')

Parameters:

adata (AnnData object, required)
    Annotated data matrix.

batch (str, default 'imageid')
    The column in adata.obs that assigns each cell to a batch. At least two distinct batches are required.

layer (str or None, default None)
    The layer in adata.layers that contains the expression data to correct. If None, adata.X is used; pass 'raw' to use the data stored in adata.raw.X.

log (bool, default False)
    Whether to apply a log1p transform (np.log1p) to the data before running ComBat. This is generally used together with layer='raw'.

replaceOriginal (bool, default False)
    Whether to overwrite the input matrix (adata.X, adata.raw, or the selected layer) with the corrected data.

label (str, default 'combat')
    The name of the layer in adata.layers that receives the corrected data. This layer is written regardless of replaceOriginal.
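The function itself only verifies that the batch column contains at least two distinct values. Because ComBat also estimates a variance for every batch, and a single cell has no sample variance, batches with only one cell are worth catching early. A minimal pre-flight check on a pandas obs table (such as adata.obs) might look like this; check_batches is a hypothetical helper, not part of scimap.

```python
import pandas as pd


def check_batches(obs: pd.DataFrame, batch: str = "imageid") -> pd.Series:
    """Hypothetical pre-flight check for the column passed as `batch`.

    Returns the number of cells per batch, or raises if ComBat cannot run.
    """
    if batch not in obs.columns:
        raise KeyError(f"'{batch}' is not a column of obs")
    counts = obs[batch].value_counts()
    counts = counts[counts > 0]  # drop unused categories of a categorical column
    if len(counts) < 2:
        raise ValueError(f"A minimum of 2 batches is required. Please check the '{batch}' column")
    singletons = counts[counts < 2]
    if not singletons.empty:
        # a per-batch variance cannot be estimated from a single cell
        raise ValueError(f"Batches with a single cell: {list(singletons.index)}")
    return counts
```

In practice this would be called as check_batches(adata.obs, batch='imageid') before running the correction.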

Returns:

adata (anndata)
    The input AnnData object, modified in place and returned, with the corrected expression data stored in adata.layers[label] (adata.layers['combat'] by default).
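When log=True, the input is transformed with np.log1p before correction and nothing reverses that transform, so the matrix stored in adata.layers[label] is on the log1p scale. If values on the original intensity scale are needed afterwards, np.expm1 is the exact inverse. The round trip is sketched below on a small toy pandas table; the marker names and values are placeholders.

```python
import numpy as np
import pandas as pd

# toy intensities (cells x markers); names and values are placeholders
raw = pd.DataFrame({"CD3": [0.0, 10.0, 250.0], "CD20": [3.0, 0.0, 1200.0]})

logged = np.log1p(raw)       # the transform combat() applies when log=True
restored = np.expm1(logged)  # exact inverse: back to the original scale
```

Corrected log-scale values can fall below zero; np.expm1 then returns values between -1 and 0 rather than non-negative intensities.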

Example
```python
import scimap as sm

# apply batch correction to the raw data
adata = sm.pp.combat(adata,
                     batch='imageid',
                     layer='raw',
                     log=True,
                     replaceOriginal=False,
                     label='combat')

# the results will be available in adata.layers['combat']
```
Source code in scimap/preprocessing/combat.py
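# Imports are not shown in this excerpt; the following are assumed from the names used below.
import anndata as ad
import numpy as np
import pandas as pd
from combat.pycombat import pycombat


def combat(
    adata,
    batch='imageid',
    layer=None,
    log=False,
    replaceOriginal=False,
    label='combat'):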

    """
Parameters:
    adata (AnnData object):  
        Annotated data matrix.

    batch (str, optional):  
        The batch key or column in `adata.obs` that indicates the batches for each cell.

    layer (str or None, optional):
        The layer in `adata.layers` that contains the expression data to correct. If None,
        `adata.X` is used; pass `'raw'` to use the data stored in `adata.raw.X`.

    log (bool, optional):
        Whether to apply `np.log1p` to the data before running ComBat. Generally used with `layer='raw'`.

    replaceOriginal (bool, optional):
        Whether to overwrite the input matrix (`adata.X`, `adata.raw`, or the selected layer) with the corrected data.

    label (str, optional):
        The name of the layer in `adata.layers` that receives the corrected data. This layer is
        written regardless of `replaceOriginal`.

Returns:
    adata (anndata):
        The input AnnData object, modified in place and returned, with the corrected expression
        data stored in `adata.layers[label]` (`adata.layers['combat']` by default).

Example:
    ```python

    # apply batch correction to the raw data
    adata = sm.pp.combat(adata,
                         batch='imageid',
                         layer='raw',
                         log=True,
                         replaceOriginal=False,
                         label='combat')

    # results will be available in adata.layers['combat']

    ```
    """

    # isolate the data
    if layer is None:
        data = pd.DataFrame(adata.X, index=adata.obs.index, columns=adata.var.index)
    elif layer == 'raw':
        data = pd.DataFrame(adata.raw.X, index=adata.obs.index, columns=adata.var.index)
    else:
        data = pd.DataFrame(adata.layers[layer], index=adata.obs.index, columns=adata.var.index)

    # log the data if requested
    if log is True:
        data = np.log1p(data)

    # isolate batch
    batchData = adata.obs[batch]

    # convert to category
    batchData = batchData.astype('category')

    # make sure there are at least two batches
    if len(batchData.unique()) < 2:
        raise ValueError(
            f"A minimum of 2 batches is required. Please check the '{batch}' column."
        )

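    # perform batch correction; pycombat expects features as rows and samples
    # (cells) as columns, hence the transpose in and out
    batchCorrected = pycombat(data.T, batchData).T

    # always store the corrected data under adata.layers[label]
    adata.layers[label] = batchCorrected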

    # replace original
    if replaceOriginal is True:
        if layer is None:
            adata.X = batchCorrected
        elif layer == 'raw':
            del adata.raw
            adata.raw = ad.AnnData(batchCorrected, obs=adata.obs)
        else:
            adata.layers[layer] = batchCorrected

    # return adata
    return adata
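The correction itself is delegated to pycombat. To show what ComBat's standard parametric empirical Bayes adjustment computes, here is a compact NumPy sketch without covariates: each feature is standardized, per-batch location (gamma) and scale (delta) effects are estimated, both are shrunk toward priors estimated across all features, and the adjusted values are mapped back. combat_eb is a hypothetical name written for illustration; it is not pycombat's code, it uses a simple absolute convergence threshold, and it assumes at least two features and at least two cells per batch. Its input is oriented like data inside combat() (cells x features), so no transpose is needed.

```python
import numpy as np


def combat_eb(data, batch, tol=1e-4, max_iter=1000):
    """Hypothetical sketch: parametric empirical Bayes ComBat without covariates.

    data:  (n_cells, n_features) array; needs >= 2 features
    batch: (n_cells,) batch labels; every batch needs >= 2 cells
    """
    data = np.asarray(data, dtype=float)
    batch = np.asarray(batch)
    masks = [batch == b for b in np.unique(batch)]

    # standardize with the overall mean and the pooled within-batch SD
    grand_mean = data.mean(axis=0)
    resid = data.copy()
    for m in masks:
        resid[m] -= data[m].mean(axis=0)
    pooled_sd = np.sqrt((resid ** 2).mean(axis=0))
    s = (data - grand_mean) / pooled_sd

    adjusted = np.empty_like(s)
    for m in masks:
        sb = s[m]
        n = sb.shape[0]
        g_hat = sb.mean(axis=0)         # per-feature location effect of this batch
        d_hat = sb.var(axis=0, ddof=1)  # per-feature scale (variance) effect
        # priors estimated across features by the method of moments
        g_bar, t2 = g_hat.mean(), g_hat.var(ddof=1)
        mean_d, var_d = d_hat.mean(), d_hat.var(ddof=1)
        a = (2 * var_d + mean_d ** 2) / var_d
        b = (mean_d * var_d + mean_d ** 3) / var_d
        # alternate posterior updates of gamma and delta until they settle
        g_star, d_star = g_hat, d_hat
        for _ in range(max_iter):
            g_new = (n * t2 * g_hat + d_star * g_bar) / (n * t2 + d_star)
            sum2 = ((sb - g_new) ** 2).sum(axis=0)
            d_new = (0.5 * sum2 + b) / (n / 2 + a - 1)
            change = max(np.abs(g_new - g_star).max(), np.abs(d_new - d_star).max())
            g_star, d_star = g_new, d_new
            if change < tol:
                break
        adjusted[m] = (sb - g_star) / np.sqrt(d_star)

    # undo the standardization
    return adjusted * pooled_sd + grand_mean
```

Because of the shrinkage, batch means are pulled close together but, unlike the plain location/scale step, not forced to be exactly equal.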