CUDA driver and runtime could not be initialized with python celery environment


Running the following command directly:

python3 tools/infer/predict_system.py   --image_dir="./doc/imgs/11.jpeg"   --det_model_dir="./inference/ch_ppocr_server_v1.1_det_infer/"    --rec_model_dir="./inference/ch_ppocr_server_v1.1_rec_infer/"   --cls_model_dir="./inference/ch_ppocr_mobile_v1.1_cls_infer/"   --use_angle_cls=True   --use_space_char=True

The command works fine when run directly, but running the same code inside a Celery worker fails with the following error:

[2020-10-30 18:48:03,199: ERROR/ForkPoolWorker-2] Traceback (most recent call last):
  File "/home/paddle/paddle-ocr/PaddleOCR-develop/celery_app/ocr_task.py", line 60, in ocr
    dt_boxes, rec_res = ocr.text_system(img)
  File "/home/paddle/venv/lib/python3.7/site-packages/celery/local.py", line 143, in __getattr__
    return getattr(self._get_current_object(), name)
  File "/home/paddle/paddle-ocr/PaddleOCR-develop/celery_app/ocr_task.py", line 42, in text_system
    self._text_system = TextSystem(args)
  File "/home/paddle/paddle-ocr/PaddleOCR-develop/tools/infer/predict_system.py", line 41, in __init__
    self.text_detector = predict_det.TextDetector(args)
  File "/home/paddle/paddle-ocr/PaddleOCR-develop/tools/infer/predict_det.py", line 80, in __init__
    utility.create_predictor(args, mode="det")
  File "/home/paddle/paddle-ocr/PaddleOCR-develop/tools/infer/utility.py", line 144, in create_predictor
    predictor = create_paddle_predictor(config)
paddle.fluid.core_avx.EnforceNotMet:

--------------------------------------
C++ Traceback (most recent call last):
--------------------------------------
0   std::unique_ptr<paddle::PaddlePredictor, std::default_delete<paddle::PaddlePredictor> > paddle::CreatePaddlePredictor<paddle::AnalysisConfig>(paddle::AnalysisConfig const&)
1   std::unique_ptr<paddle::PaddlePredictor, std::default_delete<paddle::PaddlePredictor> > paddle::CreatePaddlePredictor<paddle::AnalysisConfig, (paddle::PaddleEngineKind)2>(paddle::AnalysisConfig const&)
2   paddle::AnalysisConfig::fraction_of_gpu_memory_for_pool() const
3   paddle::platform::EnforceNotMet::EnforceNotMet(std::__exception_ptr::exception_ptr, char const*, int)
4   std::string paddle::platform::GetTraceBackString<char const*>(char const*&&, char const*, int)
5   paddle::platform::GetCurrentTraceBackString[abi:cxx11]()

----------------------
Error Message Summary:
----------------------
ExternalError:  Cuda error(3), initialization error.
  [Advise: The API call failed because the CUDA driver and runtime could not be initialized. ] (at /paddle/paddle/fluid/platform/gpu_info.cc:219)

Environment:

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Sun_Jul_28_19:07:16_PDT_2019
Cuda compilation tools, release 10.1, V10.1.243

libcudnn.so.7.6.5

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 5

Top GitHub Comments

2 reactions
admin0528 commented, Apr 25, 2021

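Is the Celery worker running in the same CUDA environment as your command-line run? When running the command, try setting --use_gpu=True.

See PaddlePaddle/PaddleOCR/blob/develop/tools/infer/utility. Judging from the deploy/ module, does this mean GPU mode does not support multiple processes or multiple threads on a single card?

Hello, has this problem been solved?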
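One way to act on the environment question above is to log the CUDA-related environment from inside the Celery task and compare it with the same output from the shell where the command works. The sketch below uses only the Python standard library; the function name `cuda_env_snapshot` and the choice of variables are assumptions made for illustration, not something taken from PaddleOCR or Celery.

```python
import os
import shutil


def cuda_env_snapshot():
    """Collect the settings that most often differ between a shell and a worker."""
    # Variable names chosen for illustration; extend the tuple as needed.
    keys = ("CUDA_VISIBLE_DEVICES", "CUDA_HOME", "LD_LIBRARY_PATH", "PATH")
    snapshot = {key: os.environ.get(key) for key in keys}
    # Location of nvcc as seen by this process (None if it is not on PATH).
    snapshot["nvcc"] = shutil.which("nvcc")
    # The PID shows which process (parent or pool child) produced the snapshot.
    snapshot["pid"] = os.getpid()
    return snapshot


if __name__ == "__main__":
    for name, value in cuda_env_snapshot().items():
        print(f"{name}={value}")
```

Calling `cuda_env_snapshot()` at the start of the task and diffing the result against the shell output would show whether the worker sees a different `LD_LIBRARY_PATH` or `CUDA_VISIBLE_DEVICES`. If the two match, the environment is probably not the cause.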

0 reactions
f4z3k4s commented, Nov 30, 2021

I am experiencing exactly the same with celery. Works perfectly without celery, but whenever I run my script with celery, I get:

text-recognizer_1  | [2021-11-30 16:18:26,579: WARNING/ForkPoolWorker-9] text_recognition: started
text-recognizer_1  | [2021-11-30 16:18:26,594: WARNING/ForkPoolWorker-9] OCR started
text-recognizer_1  | [2021-11-30 16:18:26,607: ERROR/ForkPoolWorker-9] Task tasks.text_recognition[c8e455ab-f39d-4e97-9ba2-29eb65128045] raised unexpected: OSError("(External) CUDA error(3), initialization error. \n  [Hint: 'cudaErrorInitializationError'. The API call failed because the CUDA driver and runtime could not be initialized. ] (at /paddle/paddle/fluid/platform/gpu_info.cc:283)\n")
text-recognizer_1  | Traceback (most recent call last):
text-recognizer_1  |   File "/root/.cache/pypoetry/virtualenvs/text-recognizer-9TtSrW0h-py3.7/lib/python3.7/site-packages/celery/app/trace.py", line 451, in trace_task
text-recognizer_1  |     R = retval = fun(*args, **kwargs)
text-recognizer_1  |   File "/root/.cache/pypoetry/virtualenvs/text-recognizer-9TtSrW0h-py3.7/lib/python3.7/site-packages/celery/app/trace.py", line 734, in __protected_call__
text-recognizer_1  |     return self.run(*args, **kwargs)
text-recognizer_1  |   File "/app/src/tasks.py", line 21, in text_recognition
text-recognizer_1  |     result = ocr_engine.infer(image, source_language)
text-recognizer_1  |   File "/app/src/engines/engine_paddleocr.py", line 78, in infer
text-recognizer_1  |     recognitions = [self.run_inference(base64str, source_language=key) for key in self.engines.keys()]
text-recognizer_1  |   File "/app/src/engines/engine_paddleocr.py", line 78, in <listcomp>
text-recognizer_1  |     recognitions = [self.run_inference(base64str, source_language=key) for key in self.engines.keys()]
text-recognizer_1  |   File "/app/src/engines/engine_paddleocr.py", line 62, in run_inference
text-recognizer_1  |     result = engine.ocr(image, det=False, cls=False)
text-recognizer_1  |   File "/root/.cache/pypoetry/virtualenvs/text-recognizer-9TtSrW0h-py3.7/lib/python3.7/site-packages/paddleocr/paddleocr.py", line 283, in ocr
text-recognizer_1  |     rec_res, elapse = self.text_recognizer(img)
text-recognizer_1  |   File "/root/.cache/pypoetry/virtualenvs/text-recognizer-9TtSrW0h-py3.7/lib/python3.7/site-packages/paddleocr/tools/infer/predict_rec.py", line 251, in __call__
text-recognizer_1  |     self.input_tensor.copy_from_cpu(norm_img_batch)
text-recognizer_1  |   File "/root/.cache/pypoetry/virtualenvs/text-recognizer-9TtSrW0h-py3.7/lib/python3.7/site-packages/paddle/fluid/inference/wrapper.py", line 35, in tensor_copy_from_cpu
text-recognizer_1  |     self.copy_from_cpu_bind(data)
text-recognizer_1  | OSError: (External) CUDA error(3), initialization error. 
text-recognizer_1  |   [Hint: 'cudaErrorInitializationError'. The API call failed because the CUDA driver and runtime could not be initialized. ] (at /paddle/paddle/fluid/platform/gpu_info.cc:283)
text-recognizer_1  | 

Any updates on this possibly?
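Neither report in this thread reached a confirmed fix. Both logs come from a ForkPoolWorker, and a commonly reported cause of `cudaErrorInitializationError` in that setting is that the parent process has already initialized CUDA before Celery forks its pool children; a CUDA context generally cannot be reused across fork(). Under that assumption, one pattern is to build the GPU-backed object lazily, once per process, so each child creates its own. The sketch below is a standard-library illustration of that pattern only: `PerProcessLazy` and `build_engine` are hypothetical names, and `build_engine` is a stand-in for whatever constructs the real OCR engine.

```python
import os


class PerProcessLazy:
    """Create a resource on first use and rebuild it if the PID changes (after fork)."""

    def __init__(self, factory):
        self._factory = factory
        self._owner_pid = None
        self._resource = None

    def get(self):
        pid = os.getpid()
        # Rebuild when nothing exists yet or when the object was created
        # in a different process (for example, the pre-fork parent).
        if self._resource is None or self._owner_pid != pid:
            self._resource = self._factory()
            self._owner_pid = pid
        return self._resource


def build_engine():
    # Hypothetical stand-in: the real code would construct the OCR engine here.
    return {"created_in_pid": os.getpid()}


engine = PerProcessLazy(build_engine)


def ocr_task(image):
    # The engine is only touched inside the task body, never at import time.
    return engine.get()["created_in_pid"], image
```

This only helps if nothing in the parent process initializes CUDA at import time. If that cannot be guaranteed, running the worker with a non-forking pool (Celery's `--pool=solo` or `--pool=threads`) is another option worth testing, though it is not verified against this issue either.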
