How to use sparse.mm in float16 training pipeline

What is your question?

How can we force a certain operation (e.g. torch.sparse.mm) to run in float32 in a float16 training setting?

Details and what I have tried

I am trying to train a model using

pl.Trainer(distributed_backend='ddp', precision=16, amp_level='01', gpus=2)

and I need to use sparse tensor multiplication in the forward pass. I get RuntimeError: "addmm_sparse_cuda" not implemented for 'Half', as reported in PyTorch issue #41069. However, the error persists even after I cast the variables to float32.

I suspect Apex or PyTorch Lightning is still calling torch.sparse.mm in the float16 setting. Is it possible to force a certain operation in the float16 training pipeline to run in float32? Or is there an alternative way to use torch.sparse.mm within a float16 training process?
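
For reference, with PyTorch's native AMP backend (available since torch 1.6, i.e. precision=16 without Apex), a common workaround is to locally disable autocasting and run the sparse matmul in float32. A minimal sketch, assuming b_sparse is a sparse COO tensor and a_dense is a dense activation from inside the autocast region; the helper name sparse_mm_fp32 is hypothetical:

import torch

def sparse_mm_fp32(b_sparse, a_dense):
    # addmm_sparse_cuda has no float16 kernel, so keep this op in float32
    with torch.cuda.amp.autocast(enabled=False):
        c = torch.sparse.mm(b_sparse.float(), a_dense.float())
    return c  # call c.half() afterwards if the next op expects float16 inputs

Note that under the Apex O1 backend the op patching works differently, so this sketch only applies when Lightning uses native AMP.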

Reproduce

Initialize any model (e.g. the official MNIST demo), set

trainer = pl.Trainer(distributed_backend='ddp', precision=16, amp_level='01')

add the following code in the forward function:

a = torch.randn(3, 2).float().cuda()
i = torch.LongTensor([[0, 1, 1], [2, 0, 2]])
v = torch.FloatTensor([3, 4, 5])
b = torch.sparse.FloatTensor(i, v, torch.Size([2, 3])).float().cuda()
c = torch.sparse.mm(b, a)

In practice I cannot afford to do c = b.to_dense() @ a because of limited GPU memory.
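
Applied to the snippet above, the same idea would keep b sparse (no to_dense) and only force the matmul to float32; this is a sketch assuming the native AMP backend is active:

with torch.cuda.amp.autocast(enabled=False):
    c = torch.sparse.mm(b.float(), a.float())  # float32 kernel, b stays sparse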

What’s your environment?

  • OS: Ubuntu 16.04
  • Packaging: conda
  • PyTorch: v1.6.0
  • PyTorch Lightning: v0.9.0
  • CUDA: 10.2

Issue Analytics

  • State: closed
  • Created: 3 years ago
  • Comments: 8 (5 by maintainers)

Top GitHub Comments

1 reaction
fuy34 commented, Dec 29, 2020

Oh, I see. Please excuse me; I do not know why I kept typing "01". It is "O1" for sure. I think I will temporarily do the "dirty" workaround I mentioned above while waiting for the new feature from torch. Thank you!

0 reactions
awaelchli commented, Dec 29, 2020

Okay, let me know if you run into more questions.
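
For completeness, given the "01" vs "O1" typo acknowledged above, the corrected Trainer call (a sketch with the same arguments as the original, assuming Apex is installed) would read:

trainer = pl.Trainer(distributed_backend='ddp', precision=16, amp_level='O1', gpus=2)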
