scimap_to_csv

Short Description

sm.hl.scimap_to_csv: This utility function exports the contents of a scimap object (AnnData) to a CSV file. It combines expression values from adata.raw.X, adata.X, or a named entry in adata.layers with the cell metadata in adata.obs, which makes it easy to save and share data for further analysis or documentation. The export covers only the expression matrix and the associated cell annotations; information stored in adata.uns is not exported.
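
The core of the export (expression values and metadata joined side by side on the cell index, with the cell IDs moved into a leading column) can be sketched with plain pandas. This is a minimal illustration rather than scimap's own code: the marker names `CD3D` and `CD20`, the cell IDs, and the `phenotype` column are made-up toy data standing in for the matrix and adata.obs.

```python
import numpy as np
import pandas as pd

# Toy stand-ins for the expression matrix and adata.obs (hypothetical values)
cells = pd.Index(['cell_1', 'cell_2', 'cell_3'])
expression = pd.DataFrame(
    np.array([[1.0, 0.0], [5.0, 2.0], [0.5, 3.0]]),
    index=cells,
    columns=['CD3D', 'CD20'],
)
obs = pd.DataFrame({'phenotype': ['T cell', 'B cell', 'B cell']}, index=cells)

# Join expression and metadata column-wise, aligned on the cell index
merged = pd.concat([expression, obs], axis=1, sort=False)

# Copy the cell IDs into a 'CellID' column at the front, then drop the index
merged.insert(0, 'CellID', merged.index)
merged = merged.reset_index(drop=True)

print(merged.columns.tolist())  # ['CellID', 'CD3D', 'CD20', 'phenotype']
```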

Function

`scimap_to_csv(adata, layer='raw', output_dir=None, file_name=None, CellID='CellID', verbose=True)`

Parameters:

Name	Type	Description	Default
`adata`	`AnnData` or `str`	The annotated data matrix to export, or a path to an `.h5ad` file to load.	required
`layer`	`str` or `None`	Specifies which matrix to export: - 'raw': Exports `adata.raw.X`. - `None`: Exports the data in `adata.X`. - Any other string (e.g. 'log'): Exports `adata.layers[layer]`, which must already exist; no transformation is applied.	`'raw'`
`output_dir`	`str`	The directory where the CSV file will be saved; it is created if it does not exist. If not specified, no file is written and the merged DataFrame is returned instead.	`None`
`file_name`	`str`	The name of the output CSV file. If not provided, `scimap_to_csv_file.csv` is used (or, when `adata` is a path, a name derived from that file).	`None`
`CellID`	`str`	The name of the column that holds the cell IDs. If `adata.obs` already has a column with this name, it is moved to the front; otherwise it is created from `adata.obs.index`.	`'CellID'`
`verbose`	`bool`	If True, prints messages about the export process.	`True`
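
The `layer` argument works as a small dispatch over three sources. The sketch below mirrors that selection with NumPy arrays; `select_matrix` and its arguments are hypothetical names, and the densifying step assumes sparse inputs expose a `toarray()` method, as SciPy sparse matrices do.

```python
import numpy as np

def select_matrix(layer, raw_X, X, layers):
    """Hypothetical helper mirroring how `layer` picks the exported matrix."""
    if layer == 'raw':
        matrix = raw_X
    elif layer is None:
        matrix = X
    else:
        # Any other string is a key into adata.layers; a missing key
        # raises KeyError instead of being computed on the fly
        matrix = layers[layer]
    # Densify sparse inputs (e.g. SciPy sparse matrices) before building a DataFrame
    if hasattr(matrix, 'toarray'):
        matrix = matrix.toarray()
    return np.asarray(matrix)

raw_X = np.array([[10.0, 0.0], [3.0, 7.0]])
X = np.array([[0.9, 0.1], [0.4, 0.6]])
layers = {'log': np.log1p(raw_X)}

print(select_matrix('raw', raw_X, X, layers)[0, 0])  # 10.0
print(select_matrix(None, raw_X, X, layers)[1, 1])   # 0.6
```

Treating unknown strings as layer keys keeps the function free of hidden transformations: whatever is stored in the layer is exactly what lands in the CSV.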

Returns:

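Name	Type	Description
`merged`	`pandas.DataFrame` or `None`	If `output_dir` is `None`, the merged table (CellID first, then the expression values and the `adata.obs` metadata) is returned and nothing is saved. Otherwise the table is written to a CSV file in `output_dir` and `None` is returned.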
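
Because the CSV is written without the pandas index and with the cell IDs in their own column, it can be loaded back with pandas using that column as the index. A minimal round-trip sketch, assuming the default `CellID` column name; the toy table and the temporary directory are illustrative only.

```python
import pathlib
import tempfile

import pandas as pd

# Toy table shaped like the export: CellID first, then markers, then metadata
table = pd.DataFrame({
    'CellID': ['cell_1', 'cell_2'],
    'CD3D': [1.0, 5.0],
    'phenotype': ['T cell', 'B cell'],
})

with tempfile.TemporaryDirectory() as tmp:
    path = pathlib.Path(tmp) / 'scimap_to_csv_file.csv'
    table.to_csv(path, index=False)  # same to_csv call pattern as the export
    loaded = pd.read_csv(path, index_col='CellID')

print(loaded.loc['cell_2', 'phenotype'])  # B cell
```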

Example

    # Export raw data to /path/to/save/raw_data.csv
    sm.hl.scimap_to_csv(adata, layer='raw', output_dir='/path/to/save', file_name='raw_data')

    # Export the layer stored in adata.layers['log'], with a custom CellID column name
    sm.hl.scimap_to_csv(adata, layer='log', output_dir='/path/to/save', file_name='log_data', CellID='UniqueCellID')

    # Return the data in adata.X as a DataFrame instead of writing a file
    data = sm.hl.scimap_to_csv(adata, layer=None)
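
With `layer='log'`, the code reads `adata.layers['log']` as stored and does not apply a log transform itself. One way to prepare such a layer is sketched below with NumPy on a toy matrix; the values are made up, and assigning the result to `adata.layers['log']` assumes `adata.raw` has the same variables as `adata`, because layers must match the shape of `adata.X`.

```python
import numpy as np

# Toy raw intensities (cells x markers); stands in for adata.raw.X
raw_counts = np.array([[0.0, 9.0], [99.0, 1.0]])

# log1p computes log(1 + x), so zero intensities stay at zero
log_layer = np.log1p(raw_counts)

# With a real AnnData object one would then store it, e.g.:
#   adata.layers['log'] = np.log1p(adata.raw.X)
# (for SciPy sparse input, use adata.raw.X.log1p() instead)
print(np.round(log_layer, 3))
```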

Source code in scimap/helpers/scimap_to_csv.py

import pathlib

import anndata as ad
import pandas as pd

def scimap_to_csv(
    adata, layer='raw', output_dir=None, file_name=None, CellID='CellID', verbose=True
):
    """
    Parameters:
            adata (anndata.AnnData or str):
                The annotated data matrix to export, or a path to an `.h5ad` file to load.

            layer (str or None, optional):
                Specifies which matrix to export:
                - 'raw': Exports `adata.raw.X`.
                - None: Exports the data in `adata.X`.
                - Any other string (e.g. 'log'): Exports `adata.layers[layer]`, which must already exist. No transformation is applied.

            output_dir (str, optional):
                The directory where the CSV file will be saved; it is created if it does not exist. If not specified, no file is written and the merged DataFrame is returned instead.

            file_name (str, optional):
                The name of the output CSV file. If not provided, `scimap_to_csv_file.csv` is used (or, when `adata` is a path, a name derived from that file).

            CellID (str, optional):
                The name of the column that holds the cell IDs. If `adata.obs` already has a column with this name, it is moved to the front; otherwise it is created from `adata.obs.index`.

            verbose (bool, optional):
                If True, prints messages about the export process.

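    Returns:
            merged (pandas.DataFrame or None):
                If `output_dir` is None, the merged table (CellID first, then the expression values and the `adata.obs` metadata) is returned and nothing is saved. Otherwise the table is written to a CSV file in `output_dir` and None is returned.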

    Example:
        ```python

            # Export raw data to /path/to/save/raw_data.csv
            sm.hl.scimap_to_csv(adata, layer='raw', output_dir='/path/to/save', file_name='raw_data')

            # Export the layer stored in adata.layers['log'], with a custom CellID column name
            sm.hl.scimap_to_csv(adata, layer='log', output_dir='/path/to/save', file_name='log_data', CellID='UniqueCellID')

            # Return the data in adata.X as a DataFrame instead of writing a file
            data = sm.hl.scimap_to_csv(adata, layer=None)

        ```
    """

    # Load the AnnData object if a path to an .h5ad file was given
    if isinstance(adata, str):
        # Default to the input file's name without its extension (e.g. 'image.h5ad' -> 'image')
        imid = pathlib.Path(adata).stem if file_name is None else str(file_name)
        adata = ad.read_h5ad(adata)
    else:
        imid = 'scimap_to_csv_file' if file_name is None else str(file_name)

    # Expression matrix: adata.raw.X, adata.X, or a named entry in adata.layers
    if layer == 'raw':
        if adata.raw is None:
            raise ValueError("layer='raw' was requested, but adata.raw is not set")
        # adata.raw can hold more variables than adata.var, so use its own names
        matrix, columns = adata.raw.X, adata.raw.var_names
    elif layer is None:
        matrix, columns = adata.X, adata.var.index
    else:
        matrix, columns = adata.layers[layer], adata.var.index

    # Densify sparse matrices so pandas receives a plain 2-D array
    if hasattr(matrix, 'toarray'):
        matrix = matrix.toarray()
    data = pd.DataFrame(matrix, index=adata.obs.index, columns=columns)

    # Metadata
    meta = pd.DataFrame(adata.obs)

    # Merge the two dataframes
    merged = pd.concat([data, meta], axis=1, sort=False)

    # Make the CellID column the first column; if adata.obs has no such
    # column, create it from the cell index
    if CellID not in merged.columns:
        merged[CellID] = merged.index
    first_column = merged.pop(CellID)
    merged.insert(0, CellID, first_column)

    # reset index
    merged = merged.reset_index(drop=True)

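    # Save data if an output directory was given; otherwise return the table
    if output_dir is not None:
        output_dir = pathlib.Path(output_dir)
        output_dir.mkdir(exist_ok=True, parents=True)
        # Append '.csv' only when missing so 'raw_data.csv' does not become 'raw_data.csv.csv'
        out_name = imid if imid.endswith('.csv') else f'{imid}.csv'
        merged.to_csv(output_dir / out_name, index=False)
        if verbose:
            print(f'Saved {output_dir / out_name}')
    else:
        # Return data
        return merged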
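
The output path is built from `output_dir` and a file name. The sketch below shows one way to resolve that name so a `file_name` that already ends in `.csv` does not pick up a second extension; `resolve_output_path` is a hypothetical helper, not part of scimap.

```python
import pathlib

def resolve_output_path(output_dir, file_name=None, default='scimap_to_csv_file'):
    """Hypothetical helper: build the CSV path, appending '.csv' only if missing."""
    name = default if file_name is None else str(file_name)
    if not name.endswith('.csv'):
        name = f'{name}.csv'
    return pathlib.Path(output_dir) / name

print(resolve_output_path('/tmp/out', 'raw_data.csv').name)  # raw_data.csv
print(resolve_output_path('/tmp/out', 'raw_data').name)      # raw_data.csv
print(resolve_output_path('/tmp/out').name)                  # scimap_to_csv_file.csv
```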