cellpin.models.CellPin.impute

CellPin.impute(dataloader, obs_adata=None, mc_samples=50, mask_fraction=0.2, return_norm=False, norm_target_sum=1000.0, area_key=None, nb_count_samples=100, return_int=False, return_sparse=True, table_key='table')

Impute expression with Monte Carlo (MC) averaging over stochastic forward passes, with optional count-space normalisation.

Return type:

AnnData, or spatialdata.SpatialData when obs_adata is a SpatialData object

Args:

dataloader: DataLoader to run inference on.

obs_adata: Optional AnnData (or spatialdata.SpatialData) whose .obs is copied to the output. If a SpatialData object is passed, the AnnData is read from obs_adata.tables[table_key] and the result is returned as an updated SpatialData object. Must have the same number of observations (cells) as the data yielded by dataloader.

mc_samples: Number of stochastic forward passes for MC averaging (default 50; more passes give a smoother estimate but are slower).

mask_fraction: Fraction of panel genes randomly zeroed per MC pass to simulate missing measurements (default 0.2).

return_norm: If True, add a log-normalised layer layers['imputed_norm'] (total-count or area normalised, then log1p-transformed).

norm_target_sum: Target total counts for normalisation (default 1e3; only used when return_norm=True).

area_key: obs column holding cell area, used for area-based normalisation. Auto-detected as 'cell_area' when present; pass None for total-count normalisation (only used when return_norm=True).

nb_count_samples: Number of negative binomial (NB) draws used to compute the MC estimate of E[log1p(norm(X))] when return_norm=True (default 100). Because log1p is concave, Jensen’s inequality gives log1p(norm(E[X])) ≥ E[log1p(norm(X))], so transforming the mean counts overestimates the expected log-normalised value; sampling inside the transform corrects this bias. More samples → lower variance.

return_int: If True, round X to integer counts (int32).

return_sparse: If True (default), store X, layers['imputed'], and layers['imputed_norm'] as scipy.sparse.csr_matrix. Set to False to keep dense numpy arrays.

table_key: Table name to read/write when obs_adata is a SpatialData object (default "table").
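The mc_samples and mask_fraction arguments describe averaging over repeated stochastic passes, each with part of the panel hidden. The sketch below is a minimal NumPy illustration of that idea, not CellPin's implementation: predict_fn is a hypothetical stand-in for the model forward pass, and the sketch assumes the same randomly chosen subset of panel genes is zeroed for every cell within a pass (the text does not say whether masking is per cell or per batch).

```python
import numpy as np


def mc_impute(x_panel, predict_fn, mc_samples=50, mask_fraction=0.2, seed=0):
    """Average predict_fn over mc_samples passes with random gene masking.

    x_panel: (n_cells, n_panel_genes) array of measured counts.
    predict_fn: hypothetical model stand-in mapping a masked panel matrix
        to an (n_cells, n_output_genes) array of predicted expression.
    """
    rng = np.random.default_rng(seed)
    n_genes = x_panel.shape[1]
    n_masked = int(round(mask_fraction * n_genes))
    total = None
    for _ in range(mc_samples):
        # Zero the same randomly chosen panel genes in every cell for this pass.
        masked = x_panel.copy()
        idx = rng.choice(n_genes, size=n_masked, replace=False)
        masked[:, idx] = 0
        pred = predict_fn(masked)
        total = pred if total is None else total + pred
    # MC average over all passes.
    return total / mc_samples
```

Runtime grows linearly with mc_samples while the variance of the average shrinks, which is the "smoother but slower" trade-off described for that argument.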
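With return_norm=True and total-count normalisation, layers['imputed_norm'] is described as counts scaled per cell and then log1p-transformed. A minimal NumPy sketch of that deterministic transform on a dense matrix, assuming each cell is rescaled so its counts sum to norm_target_sum; mapping all-zero cells to 0 is an assumption, and the NB sampling described under nb_count_samples is not included.

```python
import numpy as np


def total_count_log1p(x, target_sum=1e3):
    """Scale each row (cell) to target_sum total counts, then apply log1p."""
    x = np.asarray(x, dtype=float)
    totals = x.sum(axis=1, keepdims=True)
    # Per-cell scale factor; cells with zero total keep a factor of 0 (assumed).
    scale = np.divide(
        target_sum, totals, out=np.zeros_like(totals), where=totals > 0
    )
    return np.log1p(x * scale)
```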
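The Jensen's-inequality argument under nb_count_samples can be checked numerically. This sketch assumes a negative binomial parameterised by mean mu and inverse dispersion theta (illustrative names, not CellPin attributes) and models norm(·) as multiplication by a fixed per-cell scale, as with a known cell area; total-count normalisation, which divides by a random total, is not modelled.

```python
import numpy as np


def nb_sample(mu, theta, size, rng):
    # NumPy's negative_binomial(n, p) has mean n * (1 - p) / p, so n = theta and
    # p = theta / (theta + mu) give mean mu and variance mu + mu**2 / theta.
    return rng.negative_binomial(theta, theta / (theta + mu), size=size)


def log1p_norm_estimates(mu, theta, scale=1.0, n_samples=100, seed=0):
    """Return (plug-in log1p(norm(E[X])), MC estimate of E[log1p(norm(X))])."""
    rng = np.random.default_rng(seed)
    draws = nb_sample(mu, theta, n_samples, rng)
    plug_in = np.log1p(scale * mu)           # transform applied to the mean
    mc = np.log1p(scale * draws).mean()      # mean of the transformed draws
    return plug_in, mc
```

For mu=5 and theta=2 the plug-in value log1p(5) ≈ 1.79 sits noticeably above the sampled mean. Raising n_samples reduces the Monte Carlo error of the estimate but not the gap itself, which is the bias the sampling is meant to remove.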
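The return_int and return_sparse arguments control the storage of X and its layers. A minimal sketch of that post-processing, assuming round-half-to-even rounding via np.rint (the rounding rule is not specified; a bare astype(np.int32) would truncate toward zero instead). finalize_counts is a hypothetical helper name.

```python
import numpy as np
from scipy import sparse


def finalize_counts(x, return_int=False, return_sparse=True):
    """Optionally round to int32 counts and convert to CSR storage."""
    if return_int:
        # Round to the nearest integer before casting to int32.
        x = np.rint(x).astype(np.int32)
    if return_sparse:
        # CSR stores only non-zero entries, which suits mostly-zero count matrices.
        x = sparse.csr_matrix(x)
    return x
```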

Returns:

anndata.AnnData with X = imputed counts (float, or int32 when return_int=True), obsm['X_cellpin'] = embeddings, layers['imputed'] = copy of X, and, when return_norm=True, layers['imputed_norm']. var['is_measured'] marks genes present in obs_adata (all True when obs_adata is None). If obs_adata was a SpatialData object, the updated SpatialData is returned instead, with the result stored in its tables[table_key].
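The var['is_measured'] flag described above amounts to a membership test of output gene names against the measured genes. A minimal NumPy sketch; is_measured_mask is a hypothetical helper, and passing obs_adata.var_names as measured_genes is an assumption about where the measured gene list comes from.

```python
import numpy as np


def is_measured_mask(output_genes, measured_genes=None):
    """Boolean mask: True where an output gene is among the measured genes."""
    output_genes = np.asarray(output_genes)
    if measured_genes is None:
        # No reference table: every output gene is treated as measured.
        return np.ones(len(output_genes), dtype=bool)
    return np.isin(output_genes, np.asarray(measured_genes))
```

With AnnData objects this would be used along the lines of out.var["is_measured"] = is_measured_mask(out.var_names, obs_adata.var_names).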

Raises:

ValueError: If obs_adata has the wrong number of cells, if area_key is specified but not found in the cell metadata (.obs), or if any cell area is ≤ 0.
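The three ValueError conditions can be expressed as up-front checks. A minimal sketch, not CellPin's code: validate_inputs and n_imputed (the number of cells produced by the dataloader) are hypothetical names, and obs_columns can be any mapping of column name to per-cell values, such as a dict or a pandas DataFrame.

```python
import numpy as np


def validate_inputs(n_imputed, n_obs, obs_columns, area_key=None):
    """Raise ValueError under the documented conditions."""
    if n_obs != n_imputed:
        raise ValueError(
            f"obs_adata has {n_obs} cells but {n_imputed} cells were imputed"
        )
    if area_key is not None:
        if area_key not in obs_columns:
            raise ValueError(f"area_key {area_key!r} not found in obs")
        area = np.asarray(obs_columns[area_key], dtype=float)
        # Areas are divisors in area-based normalisation, so they must be positive.
        if np.any(area <= 0):
            raise ValueError(f"all values in obs[{area_key!r}] must be > 0")
```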