parallel_apply never starts processing

ISSUE: Progress on the parallel_apply never starts going up.

I am trying to use parallel_apply to populate new columns on a data frame. This takes about 50 minutes with normal apply, but every column is independent so it should be easily parallelizable.

I am using the following to initialize:

pandarallel.initialize(nb_workers=8, progress_bar=True, use_memory_fs=False)

OUTPUT:

INFO: Pandarallel will run on 8 workers.
INFO: Pandarallel will use standard multiprocessing data transfer (pipe) to transfer data between the main process and workers.

and this is my parallel_apply call:

allowed_types_list = ['...', '...', ..., '...']
data["allowed"] = data["type"].parallel_apply(lambda x: 1 if x in allowed_types_list else 0)
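
For this particular membership test, a vectorized call may be fast enough that parallelism is not needed at all. The following is only a sketch: the list contents and the five-row frame are hypothetical stand-ins for the elided values and the real dataframe, and it assumes the goal is a 0/1 flag per row. It does not explain why the progress bar stalls.

```python
import pandas as pd

# Hypothetical stand-ins: the real list is elided in the post,
# and the real frame has about 4.7 million rows.
allowed_types_list = ["a", "b"]
data = pd.DataFrame({"type": ["a", "c", "b", "d", "a"]})

# Row-wise version, equivalent to the lambda in the post (plain apply).
via_apply = data["type"].apply(lambda x: 1 if x in allowed_types_list else 0)

# Vectorized version: isin() tests membership for the whole column at once,
# and astype(int) turns the boolean result into 0/1.
data["allowed"] = data["type"].isin(allowed_types_list).astype(int)

# Both approaches should produce the same flags.
assert via_apply.tolist() == data["allowed"].tolist()
```

If the real function cannot be vectorized like this, converting the list to a set before the lambda runs is a smaller change that makes each lookup cheaper.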

The shape of my dataframe is: (4717892, 8)

I tried the same thing with a different function that takes around 5 seconds with apply, and the same thing happens. I tried it on my local computer (running macOS with an i9, using pipe for data transfer) and on Google Colab (where I had 4 cores, using the memory file system for data transfer). The behavior is the same on both.

Am I missing something?

As a side note, is it possible to get the progress bars working on Google Colab?

Issue Analytics

  • State: open
  • Created: 3 years ago
  • Comments: 10

Top GitHub Comments

0 reactions
yangyxt commented, Dec 8, 2022

Same issue here using pandarallel==1.6.1, Python 3.9.5, and pandas 1.4.2. However, I noticed it because the CPU time on the compute node stopped increasing. I set progress_bar=True and use_memory_fs=False.
