How to use sparse.mm in float16 training pipeline
What is your question?
How can we force a certain operation (e.g. torch.sparse.mm) to run in float32 in a float16 training setting?
Details and what I have tried
I am trying to train a model using
pl.Trainer(distributed_backend='ddp', precision=16, amp_level='01', gpus=2)
and I need to use sparse tensor multiplication in the forward pass. I got RuntimeError: "addmm_sparse_cuda" not implemented for 'Half', as reported in PyTorch issue #41069. However, the error remains even after I changed the variable type to float32.
I guess apex or pytorch-lightning is still calling sparse.mm in float16. Is it possible to mark a certain operation in the float16 training pipeline as a float32 operation? Or is there an alternative way to use torch.sparse.mm within a float16 training process?
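As a sketch of one possible direction (not taken from this issue, and untested with the apex "O1" backend used here): PyTorch's native AMP (torch.cuda.amp, available since 1.6 and used by later Lightning versions when precision=16 without apex) lets you switch autocast off locally and cast the inputs yourself. The helper name sparse_mm_fp32 below is made up for illustration.
import torch

def sparse_mm_fp32(sparse_mat, dense_mat):
    # Hypothetical helper: run the sparse matmul in float32 even when the
    # surrounding forward pass runs under native AMP.
    with torch.cuda.amp.autocast(enabled=False):
        out = torch.sparse.mm(sparse_mat.float(), dense_mat.float())
    # Cast the result back so downstream float16 ops keep working.
    return out.to(dense_mat.dtype)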
Reproduce
Initialize any model (e.g. the official MNIST demo), set
trainer = pl.Trainer(distributed_backend='ddp', precision=16, amp_level='01')
and add the following code to the forward function:
a = torch.randn(3, 2).float().cuda()                            # dense 3x2 matrix
i = torch.LongTensor([[0, 1, 1], [2, 0, 2]])                    # COO indices
v = torch.FloatTensor([3, 4, 5])                                # COO values
b = torch.sparse.FloatTensor(i, v, torch.Size([2, 3])).float().cuda()  # sparse 2x3 matrix
c = torch.sparse.mm(b, a)  # triggers the "addmm_sparse_cuda" not implemented for 'Half' error under precision=16
In practice I cannot afford to do c = b.to_dense() @ a, because of limited GPU memory.
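To make the memory argument concrete (the sizes below are made up for illustration, not from the issue): a COO sparse tensor stores two int64 indices and one value per non-zero entry, whereas to_dense() materializes every entry.
# Illustrative arithmetic only; rows, cols, nnz are hypothetical.
rows, cols, nnz = 100_000, 100_000, 1_000_000
dense_bytes = rows * cols * 4        # fp32 dense matrix: ~40 GB
sparse_bytes = nnz * (2 * 8 + 4)     # COO (2 int64 indices + 1 fp32 value per nnz): ~20 MB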
What’s your environment?
- OS: Ubuntu 16.04
- Packaging: conda
- PyTorch: v1.6.0
- PyTorch Lightning: v0.9.0
- CUDA: 10.2
Top GitHub Comments
Oho, I see. Please excuse me, I do not know why I kept typing "01"; it is "O1" for sure. I think I will temporarily go with the "dirty" way I mentioned above, while waiting for the new feature from torch. Thank you!
Okay, let me know if you run into more questions.