scimap_to_csv

Short Description

sm.hl.scimap_to_csv: This utility function exports the contents of a scimap object (AnnData) to a CSV file, combining the expression matrix (adata.raw.X, adata.X, or an entry in adata.layers) with the cell metadata in adata.obs. Each row is one cell, and the Cell ID is placed in the first column. Only the expression matrix and the associated cell annotations are exported; information saved in adata.uns is not.
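
To make the layout of the exported table concrete, here is a minimal sketch in plain pandas. It is not the scimap implementation: the two small DataFrames are made-up stand-ins for the expression matrix and for adata.obs, and the marker and column names are hypothetical.

```python
import pandas as pd

# Made-up stand-ins: two cells, two markers, one annotation column
cells = ["cell_1", "cell_2"]
expression = pd.DataFrame(
    [[1.0, 5.0], [2.0, 0.5]], index=cells, columns=["CD45", "KERATIN"]
)
obs = pd.DataFrame({"phenotype": ["immune", "tumor"]}, index=cells)

# Join expression and metadata column-wise on the shared cell index
merged = pd.concat([expression, obs], axis=1, sort=False)

# Expose the index as a leading CellID column, then drop the index
merged.insert(0, "CellID", merged.index)
merged = merged.reset_index(drop=True)

print(merged)
```

The result has one row per cell, with the ID column first, the marker columns next, and the annotation columns last.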

Function

scimap_to_csv(adata, layer='raw', output_dir=None, file_name=None, CellID='CellID', verbose=True)

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| adata | AnnData | The annotated data matrix to export. A path (str) to a saved AnnData file is also accepted and is loaded first. | required |
| layer | str | Specifies the data to export: 'raw' exports adata.raw.X; None exports the data in adata.X; any other value is used as a key into adata.layers (for example, 'log' exports adata.layers['log'], the np.log1p-transformed data, if that layer exists). | 'raw' |
| output_dir | str | The directory where the CSV file will be saved; it is created if it does not exist. If None, no file is written and the merged DataFrame is returned instead. | None |
| file_name | str | The name of the output CSV file. If not provided, a default name scimap_to_csv_file.csv is used. | None |
| CellID | str | The column name in the output CSV file that will contain the Cell IDs. | 'CellID' |
| verbose | bool | If True, prints messages about the export process. | True |
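
How the layer argument selects a matrix can be sketched as follows. This mirrors the branching in the source listing on this page, but select_matrix is a hypothetical helper, and the SimpleNamespace object is a toy stand-in for an AnnData object with made-up values.

```python
from types import SimpleNamespace


def select_matrix(adata_like, layer="raw"):
    """Hypothetical helper: pick the matrix that a given `layer` refers to."""
    if layer == "raw":
        return adata_like.raw.X
    if layer is None:
        return adata_like.X
    # Any other value is treated as a key into the layers mapping
    if layer not in adata_like.layers:
        raise KeyError(
            f"layer {layer!r} not found; available: {list(adata_like.layers)}"
        )
    return adata_like.layers[layer]


# Toy stand-in for an AnnData object (values are made up)
toy = SimpleNamespace(
    raw=SimpleNamespace(X=[[10, 20]]),
    X=[[0.1, 0.2]],
    layers={"log": [[2.4, 3.0]]},
)

raw_matrix = select_matrix(toy, "raw")
default_matrix = select_matrix(toy, None)
log_matrix = select_matrix(toy, "log")
```

Note that None here is the Python value, not the string 'None'; a string is looked up in the layers mapping like any other key.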

Returns:

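| Name | Type | Description |
|---|---|---|
| DataFrame | csv | If output_dir is given, the merged table is saved as a CSV file in that directory and nothing is returned. If output_dir is None, no file is written and the merged DataFrame is returned. |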
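
Based on the source listing on this page, the function writes a file when output_dir is given and returns the merged DataFrame otherwise. A minimal sketch of that pattern, assuming a hypothetical helper export_table and a made-up table, might look like this:

```python
import pathlib
import tempfile

import pandas as pd


def export_table(table, output_dir=None, file_name="table.csv"):
    """Hypothetical helper: write `table` to CSV, or return it if no directory is given."""
    if output_dir is None:
        return table
    out_dir = pathlib.Path(output_dir)
    out_dir.mkdir(parents=True, exist_ok=True)  # create missing folders
    table.to_csv(out_dir / file_name, index=False)
    return None


table = pd.DataFrame({"CellID": ["cell_1"], "CD45": [1.0]})

# No directory: the DataFrame comes back and nothing is written
returned = export_table(table)

# With a directory: the file is written and nothing is returned
with tempfile.TemporaryDirectory() as tmp:
    result = export_table(table, output_dir=pathlib.Path(tmp) / "exports")
    written = (pathlib.Path(tmp) / "exports" / "table.csv").exists()
```

Callers that need both the file and the in-memory table can call the function once without output_dir and save the returned DataFrame themselves.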

Example
    # Export raw data to CSV
    sm.hl.scimap_to_csv(adata, layer='raw', output_dir='/path/to/save', file_name='raw_data.csv')

    # Export log-transformed data to CSV, with a custom CellID column name
    sm.hl.scimap_to_csv(adata, layer='log', output_dir='/path/to/save', file_name='log_data.csv', CellID='UniqueCellID')

    # Return scaled data as a DataFrame instead of writing a CSV
    # (no output_dir is given; assumes adata.layers['scaled'] exists)
    data = sm.hl.scimap_to_csv(adata, layer='scaled')
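
After an export, a quick round trip through pandas is one way to check the file. The sketch below reads a small made-up table from an in-memory buffer that stands in for the CSV on disk; the marker names and the marker list are hypothetical.

```python
import io

import pandas as pd

# Stand-in for a file written by the export; in practice pass the CSV path
csv_text = (
    "CellID,CD45,KERATIN,phenotype\n"
    "cell_1,1.0,5.0,immune\n"
    "cell_2,2.0,0.5,tumor\n"
)
df = pd.read_csv(io.StringIO(csv_text))

# Marker columns are assumed known; everything else is treated as metadata
markers = ["CD45", "KERATIN"]
expression = df.set_index("CellID")[markers]
metadata = df.set_index("CellID").drop(columns=markers)
```

Because expression and metadata share one flat table, the marker list is what separates them again on the reading side.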
Source code in scimap/helpers/scimap_to_csv.py
import pathlib

import anndata as ad
import pandas as pd


def scimap_to_csv(adata,
                  layer='raw',
                  output_dir=None,
                  file_name=None,
                  CellID='CellID',
                  verbose=True):
    """
Parameters:
        adata (anndata.AnnData):  
            The annotated data matrix to export.

        layer (str, optional):  
            Specifies the data to export:
            - 'raw': Exports the raw data in `adata.raw.X`.
            - None: Exports the data in `adata.X`.
            - any other value: Exports `adata.layers[layer]`, e.g. 'log' for the `np.log1p`-transformed data, if that layer exists.

        output_dir (str, optional):  
            The directory where the CSV file will be saved; it is created if it does not exist. If None, no file is written and the merged DataFrame is returned instead.

        file_name (str, optional):  
            The name of the output CSV file. If not provided, a default name `scimap_to_csv_file.csv` is used.

        CellID (str, optional):  
            The column name in the output CSV file that will contain the Cell IDs. 

        verbose (bool, optional):  
            If True, prints messages about the export process.

Returns:
        DataFrame (csv):  
            If `output_dir` is given, the merged table is saved as a CSV file in that directory and nothing is returned. If `output_dir` is None, no file is written and the merged DataFrame is returned.

Example:
    ```python

        # Export raw data to CSV
        sm.hl.scimap_to_csv(adata, layer='raw', output_dir='/path/to/save', file_name='raw_data.csv')

        # Export log-transformed data to CSV, with a custom CellID column name
        sm.hl.scimap_to_csv(adata, layer='log', output_dir='/path/to/save', file_name='log_data.csv', CellID='UniqueCellID')

        # Return scaled data as a DataFrame instead of writing a CSV
        # (no output_dir is given; assumes adata.layers['scaled'] exists)
        data = sm.hl.scimap_to_csv(adata, layer='scaled')

    ```
    """

    # Load the AnnData object (adata may also be a path to a saved file)
    if isinstance(adata, str):
        if file_name is None:
            imid = str(adata.rsplit('/', 1)[-1])
        else:
            imid = str(file_name)
        adata = ad.read_h5ad(adata)
    else:
        if file_name is None:
            imid = "scimap_to_csv_file.csv"
        else:
            imid = str(file_name)


    # Expression matrix
    if layer == 'raw':
        data = pd.DataFrame(adata.raw.X, index=adata.obs.index, columns=adata.var.index)
    elif layer is None:
        data = pd.DataFrame(adata.X, index=adata.obs.index, columns=adata.var.index)
    else:
        data = pd.DataFrame(adata.layers[layer], index=adata.obs.index, columns=adata.var.index)


# =============================================================================
#     # Expression matrix
#     if data_type == 'raw':
#         data = pd.DataFrame(adata.raw.X, index=adata.obs.index, columns=adata.var.index)
#     if data_type == 'log':
#         data = pd.DataFrame(np.log1p(adata.raw.X), index=adata.obs.index, columns=adata.var.index)
#     if data_type == 'scaled':
#         data = pd.DataFrame(adata.X, index=adata.obs.index, columns=adata.var.index)
# =============================================================================

    # Metadata
    meta = pd.DataFrame(adata.obs)

    # Merge the two dataframes
    merged = pd.concat([data, meta], axis=1, sort=False)

    # Make the Cell ID the first column; if adata.obs has no such column,
    # build it from the index under the requested name
    if CellID not in merged.columns:
        merged[CellID] = merged.index
    first_column = merged.pop(CellID)
    merged.insert(0, CellID, first_column)

    # reset index
    merged = merged.reset_index(drop=True)

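    # Save to CSV if an output directory is given; otherwise return the data
    if output_dir is not None:
        output_dir = pathlib.Path(output_dir)
        output_dir.mkdir(exist_ok=True, parents=True)
        # Avoid a doubled extension when the name already ends in '.csv'
        out_name = imid if imid.endswith('.csv') else f'{imid}.csv'
        merged.to_csv(output_dir / out_name, index=False)
    else:
        # Return data
        return merged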
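
The Cell ID handling in the listing above can be tried in isolation. The following is a small sketch under stated assumptions: cell_id_first is a hypothetical helper, and the table, marker name, and ID values are made up.

```python
import pandas as pd


def cell_id_first(table, cell_id="CellID"):
    """Hypothetical helper: return a copy of `table` with `cell_id` as the first column."""
    out = table.copy()
    if cell_id not in out.columns:
        out[cell_id] = out.index  # fall back to the row index
    column = out.pop(cell_id)
    out.insert(0, cell_id, column)
    return out.reset_index(drop=True)


table = pd.DataFrame(
    {"CD45": [1.0, 2.0], "UniqueCellID": ["a_1", "a_2"]},
    index=["cell_1", "cell_2"],
)

# An existing column is moved to the front
with_existing = cell_id_first(table, cell_id="UniqueCellID")

# A missing column is created from the index
with_fallback = cell_id_first(table)
```

Working on a copy keeps the caller's table unchanged, which is a design choice of this sketch rather than something the original listing does.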