Any interest in getting SCTransform up on Scanpy?
- Additional function parameters / changed functionality / changed defaults?
- New analysis tool: A simple analysis tool you have been using and are missing in `sc.tools`?
- New plotting function: A kind of plot you would like to see in `sc.pl`?
- External tools: Do you know an existing package that should go into `sc.external.*`?
- Other?
I recently ported SCTransform from R into Python. Any interest in getting it onto Scanpy?
The original paper is here. It’s a variance-stabilizing transformation that overcomes some key drawbacks of previous, similar methods (e.g. overfitting caused by fitting regression models to individual genes rather than pooling information across groups of similar genes). It also eliminates the need for pseudocounts, log transformations, or library size normalization.
My code is here.
Implementation notes (from the SCTransformPy README):
- Poisson regression is done using the `statsmodels` package and parallelized with `multiprocessing` (a rough sketch of the overall computation follows this list).
- Improved Sheather & Jones bandwidth calculation is implemented by the `KDEpy` package.
- Estimating the negative binomial dispersion factor, `theta`, using MLE was translated from the `theta.ml` function in R.
- Pearson residuals are automatically clipped to be non-negative. This ensures that the sparsity structure of the data can be preserved. Practically, the results do not change much when allowing for dense, negative values.
Anecdotally, it produces very similar results to the R implementation, though the code itself is still a little rough around the edges. I also still need to do more formal quantitative benchmarking to confirm that the results match those of the original package.
I thought I’d gauge interest here prior to working on making it scanpy-ready.
Hi guys, I’ve been following this thread and it’s been quiet recently 😃 Wondering if there are any updates on incorporating scTransform into Scanpy. Thanks!! 🙏🏼
Hi @atarashansky and everyone following this interesting discussion!
I just found this issue after posting a closely related PR yesterday (#1715) that came out of a discussion from the end of last year (berenslab/umi-normalization#1), and wanted to leave a note here about that relation:
In my PR, I implement normalization by analytic Pearson residuals based on an NB offset model, which is an improved/simplified version of the scTransform model that no longer needs regularization by smoothing. This brings some theoretical advantages, and we found it works well in practice (details in this preprint with @dkobak).
One of the remaining differences between the two is how the overdispersion `theta` is treated (scTransform: fitted per gene; analytic residuals: fixed to one `theta` for all genes based on negative controls). I think fixing `theta` like that makes a lot of sense, but I have also thought about adding a function that learns a global `theta` from the data. With some modifications, that could be another fruitful use case for your `theta.ml` Python implementation, @atarashansky.

Also, I’m curious about the clipping of the Pearson residuals to `[0, sqrt(n/30)]` in your method. We also find that clipping is an important step for obtaining sensible analytic residuals, and I recently thought a bit about motivating different cutoffs, so I’d be interested to learn what is behind your choice of `sqrt(n/30)`!

Looking forward to your thoughts on this 😃 Jan.
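(For anyone following along, here is a minimal sketch of the analytic Pearson residuals being discussed, based on the NB offset model with a single fixed `theta`. It is illustrative only, not the code from #1715, and it clips symmetrically to ±sqrt(n) rather than to `[0, sqrt(n/30)]`.)

```python
# Illustrative sketch of analytic Pearson residuals (NB offset model).
import numpy as np

def analytic_pearson_residuals_sketch(X, theta=100.0):
    """X: dense (cells, genes) count matrix; theta: fixed overdispersion."""
    n_cells = X.shape[0]
    total = X.sum()
    mu = np.outer(X.sum(axis=1), X.sum(axis=0)) / total  # offset-model expectation
    residuals = (X - mu) / np.sqrt(mu + mu**2 / theta)   # NB Pearson residuals
    clip = np.sqrt(n_cells)                              # one possible cutoff
    return np.clip(residuals, -clip, clip)
```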