Unable to use GPU-accelerated Optimum ONNX transformer model for inference

See original GitHub issue

System Info

Optimum version: 1.5.0
Platform: Ubuntu 20.04 (Linux)
Python version: 3.8

Who can help?

@JingyaHuang @echarlaix When following the documentation at https://huggingface.co/docs/optimum/main/en/onnxruntime/usage_guides/gpu with Optimum 1.5.0, we get the following error:


RuntimeError                              Traceback (most recent call last)
<ipython-input-7-8429fcab1e09> in <module>
     19     "education",
     20     "music"]
---> 21 pred = onnx_z0(sequence_to_classify, candidate_labels, multi_class=False)

8 frames
/usr/local/lib/python3.8/dist-packages/onnxruntime/capi/onnxruntime_inference_collection.py in bind_input(self, name, device_type, device_id, element_type, shape, buffer_ptr)
    454         :param buffer_ptr: memory pointer to input data
    455         """
--> 456         self._iobinding.bind_input(
    457             name,
    458             C.OrtDevice(

RuntimeError: Error when binding input: There's no data transfer registered for copying tensors from Device:[DeviceType:1 MemoryType:0 DeviceId:0] to Device:[DeviceType:0 MemoryType:0 DeviceId:0]

This is reproducible on a Google Colab GPU instance as well. The error appears starting with version 1.5.0; 1.4.1 works as expected.
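Until the bug is fixed, two workarounds follow from the report: pin optimum to 1.4.1, or disable the IO binding path that raises the error. A minimal sketch, assuming the use_io_binding keyword argument is exposed by from_pretrained in the 1.5.x API (an assumption, not confirmed in this issue):

!pip install optimum[onnxruntime-gpu]==1.4.1  # workaround A: pin the last known-good version

from optimum.onnxruntime import ORTModelForSequenceClassification

# Workaround B: keep 1.5.x but skip IO binding, which is the code path
# that raises the bind_input error above (use_io_binding is assumed here).
ort_model = ORTModelForSequenceClassification.from_pretrained(
    "philschmid/tiny-bert-sst2-distilled",
    from_transformers=True,
    provider="CUDAExecutionProvider",
    use_io_binding=False,
)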

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, …)
  • My own task or dataset (give details below)

Reproduction

!pip install optimum[onnxruntime-gpu]==1.5.1
!pip install transformers onnx

from optimum.onnxruntime import ORTModelForSequenceClassification

ort_model = ORTModelForSequenceClassification.from_pretrained(
    "philschmid/tiny-bert-sst2-distilled",
    from_transformers=True,
    provider="CUDAExecutionProvider",
)

from optimum.pipelines import pipeline
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("philschmid/tiny-bert-sst2-distilled")

pipe = pipeline(task="text-classification", model=ort_model, tokenizer=tokenizer)
result = pipe("Both the music and visual were astounding, not to mention the actors performance.")
print(result)
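As a quick sanity check (not part of the original report), it can help to confirm that onnxruntime-gpu is installed and that the session actually loaded the CUDA provider; that ort_model.model holds the underlying InferenceSession is an assumption about Optimum's internals:

import onnxruntime

print(onnxruntime.get_available_providers())  # should include "CUDAExecutionProvider" on a GPU install
print(ort_model.model.get_providers())        # providers the loaded session is actually using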

Expected behavior

Inference should run on the GPU without error; instead it fails with the device-binding error shown above.

Issue Analytics

  • State: closed
  • Created: 9 months ago
  • Comments: 11 (7 by maintainers)

Top GitHub Comments

1 reaction
fxmarty commented, Dec 13, 2022

For sure, thanks a lot! Don’t hesitate if you need any guidance!

0 reactions
fxmarty commented, Dec 20, 2022

@smiraldr As I understand it, this was in fact a device indexing issue, which @JingyaHuang fixed in https://github.com/huggingface/optimum/pull/613. So your PR looks good as is; moving the discussion there!
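For context on what a device indexing issue means here: ONNX Runtime's IOBinding expects the device_id passed to bind_input to match the device the input tensor actually lives on; a mismatch makes ORT attempt a copy between OrtDevices with no registered transfer, which is the RuntimeError above. A minimal standalone sketch (the model path and the input/output names input_ids/logits are illustrative, not taken from this issue):

import numpy as np
import torch
import onnxruntime as ort

session = ort.InferenceSession("model.onnx", providers=["CUDAExecutionProvider"])
binding = session.io_binding()

input_ids = torch.ones((1, 16), dtype=torch.int64, device="cuda:0")
binding.bind_input(
    name="input_ids",                  # illustrative input name
    device_type="cuda",
    device_id=input_ids.device.index,  # must match the tensor's real device index
    element_type=np.int64,
    shape=tuple(input_ids.shape),
    buffer_ptr=input_ids.data_ptr(),
)
binding.bind_output("logits", device_type="cuda", device_id=0)  # illustrative output name
session.run_with_iobinding(binding)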


