
scimap_to_csv

Short Description

sm.hl.scimap_to_csv: Helper function that saves the contents of a scimap (AnnData) object as a CSV file. Note that it only saves the elements stored in adata.X or adata.raw.X (depending on data_type) and in adata.obs.
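The core of the export can be sketched with plain pandas. The expression matrix becomes a DataFrame with one row per cell (the adata.obs index) and one column per marker (adata.var.index), and adata.obs is then concatenated column-wise. The arrays, marker names, and obs columns below are made-up stand-ins; a real call reads them from the AnnData object.

```python
import io

import numpy as np
import pandas as pd

# Toy stand-ins for adata.raw.X, adata.obs.index, adata.var.index and adata.obs (hypothetical values)
expression = np.array([[10.0, 0.0], [3.0, 7.0], [0.0, 1.0]])
cell_index = pd.Index(["cell_1", "cell_2", "cell_3"])
markers = pd.Index(["CD3D", "CD20"])
obs = pd.DataFrame({"imageid": ["img1", "img1", "img2"]}, index=cell_index)

# Expression matrix: one row per cell, one column per marker
data = pd.DataFrame(expression, index=cell_index, columns=markers)

# Column-wise concatenation aligns the rows on the shared cell index
merged = pd.concat([data, obs], axis=1, sort=False)

# Write to an in-memory buffer instead of a file, for illustration only
buffer = io.StringIO()
merged.to_csv(buffer)
header = buffer.getvalue().splitlines()[0]
```

Because both frames share the same index, no rows are added or dropped by the concatenation.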

Function

scimap_to_csv(adata, data_type='raw', output_dir=None, file_name=None, CellID='CellID')

Parameters:

Name | Type | Description | Default
--- | --- | --- | ---
adata | AnnData or string | AnnData object loaded into memory, or path to an AnnData file. | required
data_type | string | Three options are available: 1) 'raw' returns the raw data (adata.raw.X); 2) 'log' returns the raw data converted to log scale with np.log1p; 3) 'scaled' returns adata.X, which holds the data scaled with sm.pp.rescale if you ran it. If you have not scaled the data, whatever is in adata.X is returned. | 'raw'
output_dir | string | Path to the output directory. If given, the merged table is written there as a CSV file instead of being returned. | None
file_name | string | Name of the output CSV file; use together with output_dir. If no file name is provided, the output is named after the input file when adata is a path, and scimap_to_csv_file.csv otherwise. | None
CellID | string | Name of the column that contains the cell IDs. If adata.obs has no such column, it is created from the obs index. In the output, this column comes first. | 'CellID'
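To make the three data_type options concrete, here is a small numpy sketch of the selection step. select_matrix is a hypothetical helper, and raw_X and X are toy arrays standing in for adata.raw.X and adata.X. The explicit ValueError for unknown options is an addition for illustration.

```python
import numpy as np


def select_matrix(raw_X, X, data_type="raw"):
    """Return the matrix exported for a given data_type (hypothetical helper).

    raw_X plays the role of adata.raw.X and X the role of adata.X.
    """
    if data_type == "raw":
        return raw_X
    if data_type == "log":
        # Element-wise log(1 + x), so zeros stay zero
        return np.log1p(raw_X)
    if data_type == "scaled":
        # Whatever is currently stored in adata.X, rescaled or not
        return X
    raise ValueError("data_type must be 'raw', 'log' or 'scaled'")


raw_X = np.array([[0.0, np.e - 1.0], [3.0, 1.0]])
X = np.array([[0.0, 0.9], [0.6, 0.5]])

log_values = select_matrix(raw_X, X, "log")
```

Note that 'log' is always computed from the raw values, never from adata.X.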

Returns:

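Name | Type | Description
--- | --- | ---
merged | DataFrame | A single DataFrame containing the expression data and the metadata (adata.obs). It is returned only when output_dir is None; otherwise it is saved as a CSV file and the function returns None.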
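The returned DataFrame starts with the cell-ID column. A pandas sketch of that step, assuming merged already holds the concatenated expression and metadata; move_cell_id_first and the toy columns are hypothetical.

```python
import pandas as pd


def move_cell_id_first(merged, cell_id="CellID"):
    """Ensure a cell-ID column exists and place it first (hypothetical helper)."""
    merged = merged.copy()
    if cell_id not in merged.columns:
        # No ID column in the metadata: take the IDs from the index
        merged[cell_id] = merged.index
    first_column = merged.pop(cell_id)
    merged.insert(0, cell_id, first_column)
    # The IDs now live in a column, so the old index can be discarded
    return merged.reset_index(drop=True)


df = pd.DataFrame({"CD3D": [1.0, 2.0], "imageid": ["a", "b"]}, index=["c1", "c2"])
out = move_cell_id_first(df)
```

Creating the column under the name passed in cell_id (rather than a fixed "CellID") keeps the pop and insert calls consistent for any column name.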

Example:

    data = sm.hl.scimap_to_csv(adata, data_type='raw')
Source code in scimap/helpers/_scimap_to_csv.py
import pathlib

import anndata as ad
import numpy as np
import pandas as pd


def scimap_to_csv(adata, data_type='raw', output_dir=None, file_name=None, CellID='CellID'):
    """
Parameters:
    adata : AnnData object loaded into memory or path to AnnData object.

    data_type (string):  
        Three options are available:  
        1) 'raw' - The raw data will be returned.  
        2) 'log' - The raw data converted to log scale using `np.log1p` will be returned.  
        3) 'scaled' - If you have scaled the data using `sm.pp.rescale`, that will be
        returned. Please note, if you have not scaled the data, whatever is within
        `adata.X` will be returned.

    output_dir (string):  
        Path to output directory.

    file_name (string):  
        Name of the output csv file. Use in combination with the `output_dir` parameter.
        If no file name is provided, the output is named after the input file when `adata`
        is a path, or `scimap_to_csv_file.csv` otherwise.

    CellID (string):  
        Name of the column which contains the CellID. Default is `CellID`.  

Returns:
    merged (DataFrame):  
        A single dataframe containing the expression and metadata. It is returned only
        when `output_dir` is None; otherwise it is saved as a csv file.

Example:
```python
    data = sm.hl.scimap_to_csv(adata, data_type='raw')
```
    """

    # Load the AnnData object if a path was given
    if isinstance(adata, str):
        # Default output name: the input file name without its extension
        imid = pathlib.Path(adata).stem if file_name is None else str(file_name)
        adata = ad.read_h5ad(adata)
    else:
        imid = 'scimap_to_csv_file' if file_name is None else str(file_name)

    # Append the .csv extension only if it is not already present
    if not imid.endswith('.csv'):
        imid = f'{imid}.csv'

    # Expression matrix
    if data_type == 'raw':
        data = pd.DataFrame(adata.raw.X, index=adata.obs.index, columns=adata.var.index)
    elif data_type == 'log':
        data = pd.DataFrame(np.log1p(adata.raw.X), index=adata.obs.index, columns=adata.var.index)
    elif data_type == 'scaled':
        data = pd.DataFrame(adata.X, index=adata.obs.index, columns=adata.var.index)
    else:
        raise ValueError("data_type must be one of 'raw', 'log' or 'scaled'")

    # Metadata
    meta = pd.DataFrame(adata.obs)

    # Merge expression and metadata column-wise (both share the obs index)
    merged = pd.concat([data, meta], axis=1, sort=False)

    # Create the CellID column from the index if it does not exist yet,
    # then make it the first column
    if CellID not in merged.columns:
        merged[CellID] = merged.index
    first_column = merged.pop(CellID)
    merged.insert(0, CellID, first_column)

    # Reset the index; the cell IDs are kept in the CellID column
    merged = merged.reset_index(drop=True)

    # Save the data if an output directory is given, otherwise return it
    if output_dir is not None:
        output_dir = pathlib.Path(output_dir)
        output_dir.mkdir(exist_ok=True, parents=True)
        merged.to_csv(output_dir / imid, index=False)
    else:
        return merged
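When output_dir is set, the CSV name comes from file_name, from the input file name, or from a default. Below is a stdlib sketch of one way to resolve that path. resolve_output_path is a hypothetical helper, and dropping the input's extension and appending .csv only when it is missing are assumptions made to avoid names such as scimap_to_csv_file.csv.csv.

```python
from pathlib import Path


def resolve_output_path(output_dir, file_name=None, input_path=None):
    """Build the CSV path from output_dir and file_name (hypothetical helper)."""
    if file_name is not None:
        name = str(file_name)
    elif input_path is not None:
        # Name the CSV after the input file, dropping its extension
        name = Path(input_path).stem
    else:
        name = "scimap_to_csv_file"
    # Append .csv only when it is missing, so "out.csv" does not become "out.csv.csv"
    if not name.endswith(".csv"):
        name += ".csv"
    return Path(output_dir) / name
```

For example, resolve_output_path("results", input_path="data/sample.h5ad") yields results/sample.csv.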