Any interest in getting SCTransform up on Scanpy?
- Additional function parameters / changed functionality / changed defaults?
- New analysis tool: A simple analysis tool you have been using and are missing in `sc.tools`?
- New plotting function: A kind of plot you would like to see in `sc.pl`?
- External tools: Do you know an existing package that should go into `sc.external.*`?
- Other?
I recently ported SCTransform from R into Python. Any interest in getting it onto Scanpy?
The original paper is here. It’s a variance-stabilizing transformation that overcomes some key drawbacks of previous, similar methods (e.g. overfitting caused by fitting regression models to individual genes rather than pooling information across groups of similar genes). It also eliminates the need for pseudocounts, log transformations, or library size normalization.
My code is here.
Implementation notes (from the SCTransformPy README):
- Poisson regression is done using the `statsmodels` package and parallelized with `multiprocessing` (a rough sketch of the overall computation follows this list).
- Improved Sheather & Jones bandwidth calculation is implemented by the `KDEpy` package.
- Estimating the negative binomial dispersion factor, `theta`, using MLE was translated from the `theta.ml` function in R.
- Pearson residuals are automatically clipped to be non-negative. This ensures that the sparsity structure of the data can be preserved. Practically, the results do not change much when allowing for dense, negative values.
Anecdotally, it produces very similar results to the R implementation, though the code itself is still a little rough around the edges. I also still need to do more formal quantitative benchmarking to confirm that the results match those of the original package.
I thought I’d gauge interest here prior to working on making it scanpy-ready.
Hi guys, I’ve been following this thread and it’s been quiet recently 😃 Wondering if there are any updates on incorporating scTransform into Scanpy. Thanks!! 🙏🏼
Hi @atarashansky and everyone following this interesting discussion!
I just found this issue after posting a closely related PR yesterday (#1715) that came out of a discussion from the end of last year (berenslab/umi-normalization#1), and wanted to leave a note here about that relation:
In my PR, I implement normalization by analytic Pearson residuals based on an NB offset model, which is an improved/simplified version of the scTransform model that no longer needs regularization by smoothing. This brings some theoretical advantages, and we found it works well in practice (details in this preprint with @dkobak).
One of the remaining differences between the two is how the overdispersion `theta` is treated (scTransform: fitted per gene; analytic residuals: fixed to one `theta` for all genes based on negative controls). I think fixing `theta` like that makes a lot of sense, but I have also thought about adding a function that learns a global `theta` from the data. With some modifications, that could be another fruitful use case for your `theta.ml` Python implementation, @atarashansky.

Also, I’m curious about the clipping of the Pearson residuals to `[0, sqrt(n/30)]` in your method. We also find that clipping is an important step for obtaining sensible analytic residuals, and I recently thought a bit about motivating different cutoffs, so I’d be interested to learn what is behind your choice of `sqrt(n/30)`!

Looking forward to your thoughts on this 😃 Jan.
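(For anyone following along, here is a minimal sketch of the analytic Pearson residuals being discussed, based on the NB offset model with a single fixed `theta`. It is illustrative only, not the code from #1715, and it clips symmetrically to ±sqrt(n) rather than to `[0, sqrt(n/30)]`.)

```python
# Illustrative sketch of analytic Pearson residuals (NB offset model).
import numpy as np

def analytic_pearson_residuals_sketch(X, theta=100.0):
    """X: dense (cells, genes) count matrix; theta: fixed overdispersion."""
    n_cells = X.shape[0]
    total = X.sum()
    mu = np.outer(X.sum(axis=1), X.sum(axis=0)) / total  # offset-model expectation
    residuals = (X - mu) / np.sqrt(mu + mu**2 / theta)   # NB Pearson residuals
    clip = np.sqrt(n_cells)                              # one possible cutoff
    return np.clip(residuals, -clip, clip)
```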