How to use sparse.mm in float16 training pipeline

What is your question?

How can we force a certain operation (e.g. torch.sparse.mm) to run in float32 in a float16 training setting?

Details and what I have tried

I am trying to train a model using

pl.Trainer(distributed_backend='ddp', precision=16, amp_level='01', gpus=2)

and I need to use sparse tensor multiplication in the forward pass. I get RuntimeError: "addmm_sparse_cuda" not implemented for 'Half', as reported in PyTorch issue #41069. However, the error persists even after I cast the variables to float32.

I suspect Apex or PyTorch Lightning is still calling torch.sparse.mm in the float16 setting. Is it possible to force a certain operation in the float16 training pipeline to run in float32? Or is there an alternative way to use torch.sparse.mm within a float16 training process?
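
For reference, with PyTorch's native AMP backend (available since torch 1.6, i.e. precision=16 without Apex), a common workaround is to locally disable autocasting and run the sparse matmul in float32. A minimal sketch, assuming b_sparse is a sparse COO tensor and a_dense is a dense activation from inside the autocast region; the helper name sparse_mm_fp32 is hypothetical:

import torch

def sparse_mm_fp32(b_sparse, a_dense):
    # addmm_sparse_cuda has no float16 kernel, so keep this op in float32
    with torch.cuda.amp.autocast(enabled=False):
        c = torch.sparse.mm(b_sparse.float(), a_dense.float())
    return c  # call c.half() afterwards if the next op expects float16 inputs

Note that under the Apex O1 backend the op patching works differently, so this sketch only applies when Lightning uses native AMP.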

Reproduce

Initialize any model (e.g. the official MNIST demo), set

trainer = pl.Trainer(distributed_backend='ddp', precision=16, amp_level='01')

add the following code in the forward function:

a = torch.randn(3, 2).float().cuda()
i = torch.LongTensor([[0, 1, 1], [2, 0, 2]])
v = torch.FloatTensor([3, 4, 5])
b = torch.sparse.FloatTensor(i, v, torch.Size([2, 3])).float().cuda()
c = torch.sparse.mm(b, a)

In practice I cannot afford to do c = b.to_dense() @ a because of limited GPU memory.
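
Applied to the snippet above, the same idea would keep b sparse (no to_dense) and only force the matmul to float32; this is a sketch assuming the native AMP backend is active:

with torch.cuda.amp.autocast(enabled=False):
    c = torch.sparse.mm(b.float(), a.float())  # float32 kernel, b stays sparse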

What’s your environment?

  • OS: Ubuntu 16.04
  • Packaging: conda
  • PyTorch: v1.6.0
  • PyTorch Lightning: v0.9.0
  • CUDA: 10.2

Issue Analytics

  • State: closed
  • Created: 3 years ago
  • Comments: 8 (5 by maintainers)

Top GitHub Comments

1 reaction
fuy34 commented, Dec 29, 2020

Oh, I see. Please excuse me; I do not know why I kept typing "01". It is "O1" for sure. I think I will temporarily do the "dirty" workaround I mentioned above while waiting for the new feature from torch. Thank you!

0 reactions
awaelchli commented, Dec 29, 2020

Okay, let me know if you run into more questions.
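
For completeness, given the "01" vs "O1" typo acknowledged above, the corrected Trainer call (a sketch with the same arguments as the original, assuming Apex is installed) would read:

trainer = pl.Trainer(distributed_backend='ddp', precision=16, amp_level='O1', gpus=2)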
