improving the speed of to_csv
Hello,
I don't know if this is possible, but it would be great to find a way to speed up the to_csv method in Pandas.
In my admittedly large dataframe with 20 million observations and 50 variables, it takes literally hours to export the data to a csv file.
Reading the CSV in Pandas is much faster, though. I wonder what the bottleneck is here and what can be done to speed up the export.
CSV files are ubiquitous and a great way to share data (without being too nerdy with hdf5 and other subtleties). What do you think?
Issue Analytics
- State:
- Created 7 years ago
- Comments:17 (7 by maintainers)
So how about an answer to the initial question rather than getting off track?
I need CSV format.
Can pandas improve the speed of writing that file format?
duplicate of #3186
Well, using 1/10 of the rows, that is about 800MB in memory,
and about 8.5MB/sec in raw throughput, way below IO speeds, so there is obviously quite some room to improve. You might be interested in this blog here.
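For anyone who wants to reproduce this kind of number, here is a minimal sketch of how a throughput figure like the one above could be measured. The frame is a small synthetic stand-in (20,000 rows by 50 float columns), not the original data, and the helper name `csv_write_throughput` is made up for illustration. It assumes "raw throughput" means in-memory size divided by wall-clock write time, which may not be exactly how the figure above was computed.

```python
import os
import tempfile
import time

import numpy as np
import pandas as pd


def csv_write_throughput(df, path):
    """Write df with to_csv; return (seconds, MB of in-memory data per second).

    Hypothetical helper for illustration, not part of pandas.
    """
    mem_mb = df.memory_usage(index=True, deep=True).sum() / 1e6
    start = time.perf_counter()
    df.to_csv(path, index=False)
    elapsed = time.perf_counter() - start
    return elapsed, mem_mb / elapsed


# Synthetic stand-in: 20,000 rows x 50 float64 columns (about 8 MB in memory).
rng = np.random.default_rng(0)
df = pd.DataFrame(
    rng.standard_normal((20_000, 50)),
    columns=[f"v{i}" for i in range(50)],
)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "out.csv")
    elapsed, mb_per_sec = csv_write_throughput(df, path)
    file_mb = os.path.getsize(path) / 1e6

print(f"{elapsed:.2f} s, {mb_per_sec:.1f} MB/s (in-memory basis), {file_mb:.1f} MB on disk")
```

The numbers will vary a lot with dtypes, pandas version and hardware, so treat the output as a rough local baseline rather than something comparable to the figure quoted above.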
Of course there IS really no reason at all to use CSV unless you are forced.
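If you are not forced to use CSV, a quick way to see the difference is to time `to_csv` against a binary writer on the same frame. The sketch below uses `to_pickle` purely as an example of a binary format that ships with pandas; that choice is my assumption, not something recommended in this thread, and the frame is synthetic. The helper `timed` is hypothetical.

```python
import os
import tempfile
import time

import numpy as np
import pandas as pd


def timed(write, path):
    """Call write(path) and return the elapsed wall-clock seconds."""
    start = time.perf_counter()
    write(path)
    return time.perf_counter() - start


# Synthetic stand-in data, not the frame from the original question.
rng = np.random.default_rng(0)
df = pd.DataFrame(
    rng.standard_normal((20_000, 50)),
    columns=[f"v{i}" for i in range(50)],
)

with tempfile.TemporaryDirectory() as tmp:
    csv_path = os.path.join(tmp, "out.csv")
    pkl_path = os.path.join(tmp, "out.pkl")

    csv_seconds = timed(lambda p: df.to_csv(p, index=False), csv_path)
    pkl_seconds = timed(df.to_pickle, pkl_path)

    # Read the pickle back to confirm the round trip preserves the data.
    restored = pd.read_pickle(pkl_path)

print(f"to_csv: {csv_seconds:.2f} s, to_pickle: {pkl_seconds:.2f} s")
```

Keep in mind that pickle files are Python-specific and should not be loaded from untrusted sources, so this only helps when both ends of the exchange are under your control.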