cellpin.models.CellPin.impute
- CellPin.impute(dataloader, obs_adata=None, mc_samples=50, mask_fraction=0.2, return_norm=False, norm_target_sum=1000.0, area_key=None, nb_count_samples=100, return_int=False, return_sparse=True, table_key='table')
Impute with MC averaging and optional count-space normalisation.
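As a rough illustration of what MC averaging with random gene masking can look like, here is a minimal NumPy sketch. It is not CellPin's implementation: `toy_forward` is a hypothetical stand-in for the model's stochastic forward pass, `mc_impute` is a made-up helper, and masking the same genes for every cell within a pass is an assumption. Only the roles of `mc_samples` and `mask_fraction` are taken from the signature above.

```python
import numpy as np


def toy_forward(x, rng):
    """Hypothetical stochastic forward pass (NOT the CellPin model)."""
    # Predict each gene as the cell's mean count plus a little noise.
    noise = rng.normal(0.0, 0.1, size=x.shape)
    return np.clip(x.mean(axis=1, keepdims=True) + noise, 0.0, None)


def mc_impute(x, mc_samples=50, mask_fraction=0.2, seed=0):
    """Average `mc_samples` forward passes, zeroing random genes each pass."""
    rng = np.random.default_rng(seed)
    n_genes = x.shape[1]
    n_masked = int(round(mask_fraction * n_genes))
    total = np.zeros(x.shape, dtype=float)
    for _ in range(mc_samples):
        masked = x.astype(float)  # astype returns a fresh copy
        # Assumption: the same random subset of genes is zeroed for all cells.
        cols = rng.choice(n_genes, size=n_masked, replace=False)
        masked[:, cols] = 0.0
        total += toy_forward(masked, rng)
    return total / mc_samples


# Two cells, five panel genes (made-up counts).
counts = np.array([[3, 0, 5, 2, 1], [0, 4, 1, 0, 6]])
imputed = mc_impute(counts)
print(imputed.shape)
```

Increasing `mc_samples` averages over more random masks and noise draws, which is why the estimate becomes smoother at the cost of more forward passes.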
- Return type:
- Args:
- dataloader: DataLoader to run inference on.
- obs_adata: Optional AnnData (or spatialdata.SpatialData) whose .obs is copied to the output. If SpatialData, the AnnData is read from obs_adata.tables[table_key] and the result is returned as an updated SpatialData object. Must have the same number of observations.
- mc_samples: Number of stochastic forward passes for MC averaging (default 50; more passes give a smoother estimate but take longer).
- mask_fraction: Fraction of panel genes randomly zeroed per MC pass
to simulate missing measurements (default 0.2).
- return_norm: If True, add a log-normalised layer layers['imputed_norm'] (total-count or area normalised, then log1p-transformed).
- norm_target_sum: Target total counts for normalisation (default 1e3; only used when return_norm=True).
- area_key: obs column with cell area for area-based normalisation. Auto-detected as 'cell_area' when present; pass None for total-count normalisation (only used when return_norm=True).
- nb_count_samples: Number of NB draws used to compute the MC estimate of E[log1p(norm(X))] when return_norm=True (default 100). Because log1p is concave, Jensen’s inequality means log1p(norm(E[X])) ≥ E[log1p(norm(X))], so transforming the mean overestimates the expected log-normalised value; sampling inside the transform corrects this bias. More samples give lower variance.
- return_int: If True, round X to integer counts (int32).
- return_sparse: If True (default), store X, layers['imputed'], and layers['imputed_norm'] as scipy.sparse.csr_matrix. Set to False to keep dense numpy arrays.
- table_key: Table name to read/write when obs_adata is a SpatialData object (default "table").
Returns:
- anndata.AnnData with X = imputed (float or int) counts, obsm['X_cellpin'] = embeddings, layers['imputed'] = copy of X, and optionally layers['imputed_norm']. var['is_measured'] marks genes present in obs_adata (all True when obs_adata is None). If obs_adata was a SpatialData object, returns the updated SpatialData with the result stored in sdata.tables[table_key].
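The optional layers['imputed_norm'] is described as an MC estimate of E[log1p(norm(X))] rather than the plug-in value log1p(norm(E[X])). The gap between the two follows from Jensen's inequality and can be seen in a small NumPy sketch. The NB mean, the dispersion, and the size factor below are made-up values, norm is assumed to be multiplication by a fixed size factor, and the code is not CellPin's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical NB parameters for one gene in one cell (assumed values).
mean, dispersion = 5.0, 2.0
# NumPy parameterises NB by (n, p) with mean n * (1 - p) / p.
p = dispersion / (dispersion + mean)
draws = rng.negative_binomial(dispersion, p, size=100_000)

# Assumed normalisation: scale counts by a fixed size factor,
# here a target sum of 1000 divided by a made-up total of 250 counts.
size_factor = 1000.0 / 250.0

# Transform of the mean: log1p(norm(E[X])).
plug_in = np.log1p(size_factor * draws.mean())
# Mean of the transform: E[log1p(norm(X))], estimated from the draws.
mc_estimate = np.log1p(size_factor * draws).mean()

print(f"log1p(norm(E[X])) = {plug_in:.3f}")
print(f"E[log1p(norm(X))] = {mc_estimate:.3f}")
```

The plug-in value comes out larger than the sampled estimate, which is the bias the nb_count_samples draws are meant to remove; using more draws reduces the variance of the sampled estimate.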
Raises:
- ValueError: If obs_adata has the wrong number of cells, if area_key is specified but not found in adata.obs, or if any cell area is ≤ 0.
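The three documented failure conditions can be mirrored in a small standalone check, for example to validate inputs before starting a long inference run. This is a sketch under assumptions: `check_impute_inputs` and its arguments are hypothetical and not part of cellpin, and the error messages are invented.

```python
def check_impute_inputs(n_pred_cells, n_obs_cells, obs_columns,
                        area_key=None, areas=None):
    """Hypothetical pre-flight check mirroring the documented ValueErrors."""
    if n_obs_cells != n_pred_cells:
        raise ValueError(
            f"obs_adata has {n_obs_cells} cells, expected {n_pred_cells}"
        )
    if area_key is not None:
        if area_key not in obs_columns:
            raise ValueError(f"area_key {area_key!r} not found in obs")
        # Areas are used as divisors, so they must be strictly positive.
        if any(a <= 0 for a in areas):
            raise ValueError("all cell areas must be > 0")


# Valid inputs pass silently.
check_impute_inputs(3, 3, ["cell_area"], "cell_area", [1.0, 2.5, 4.0])

# A zero area is rejected.
try:
    check_impute_inputs(3, 3, ["cell_area"], "cell_area", [1.0, 0.0, 4.0])
except ValueError as err:
    print(f"rejected: {err}")
```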