Slow Handling of executemany()
There was a long-standing, somewhat significant issue open at Google Code having to do with executemany() and MS SQL Server. It does not yet appear to be fixed. I’d like to call attention to it again. (insert unhelpful chide for leaving open issues at Google Code and not bringing them over)
Basically, executemany() takes forever. The problem shows up when there are a lot of records (tens or hundreds of thousands of rows) and the target is MS SQL Server.
Watching in SQL Profiler, you can see, for every statement, an exec sp_prepexec followed by an exec sp_unprepare.
Here’s a bit of sample code that makes it happen.
import pyodbc
dest_conn = pyodbc.connect('DSN=MyDSNToASQLServer;UID=someuser;PWD=password;')
dest_cursor = dest_conn.cursor()
sql = 'INSERT INTO MyTable (Name, Address, Phone) VALUES (?, ?, ?)'
data = []
## OMITTED
# Fill data[] with a lot of data 10k's of rows or more
##
# this takes forever when there are a lot of records
dest_cursor.executemany(sql, data)
dest_conn.commit()
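For reference, the parameter-array work discussed in the comments below later shipped in pyodbc behind the cursor.fast_executemany flag (in 4.0.19 and later, if memory serves); this was not part of the original report. A minimal sketch of the same insert with that flag enabled, reusing the hypothetical DSN and table from above and hypothetical generated test data:
import pyodbc
conn = pyodbc.connect('DSN=MyDSNToASQLServer;UID=someuser;PWD=password;')
cursor = conn.cursor()
# Opt in to the bulk parameter-array binding path; the default remains the
# row-at-a-time behaviour described in this issue.
cursor.fast_executemany = True
sql = 'INSERT INTO MyTable (Name, Address, Phone) VALUES (?, ?, ?)'
# Hypothetical generated test data (the original fill is omitted above).
data = [('Name %d' % i, 'Address %d' % i, '555-%07d' % i) for i in range(100000)]
cursor.executemany(sql, data)
conn.commit()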
I have some preliminary code at https://github.com/v-chojas/pyodbc/tree/paramarray if you’d like to give it a try. @mkleehammer let me know if you’d like a PR. It has only been tested with SQL Server 2016 and the msodbcsql driver on Linux. Implementing the data insertion efficiently required quite significant changes, so I opted to leave most of the existing code alone and put the new implementation alongside it. Notable differences/improvements are:
SQLDescribeParam is used to obtain the correct SQL type and characteristics (e.g. size) of the column, which are then supplied to SQLBindParameter so that the driver sends the correct datatype to the server. This possibly eliminates an additional server-side conversion. In other words, instead of setting both the SQL type and the C type from the Python type, I set the SQL type according to the server’s recommendation and the C type from the Python type (using the SQL type as a hint, in some cases), and let the ODBC driver do the conversion.
The entire parameter array is allocated at once and then each row is filled in, converted from the Python object to its ODBC C representation. If the “schema” changes (e.g. the cell in a previous row was a PyInt and the current one a PyBytes), I execute with the rows accumulated thus far and continue with the current row, detecting and assuming the new schema; a pure-Python sketch of this batching logic follows.
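The sketch below is a minimal pure-Python illustration of that accumulate-then-flush batching logic, not the actual C implementation in the branch; the function names and the fallback to plain executemany() are illustrative assumptions.
def flush_batch(cursor, sql, batch):
    # Illustrative stand-in for executing one uniformly-typed parameter array.
    if batch:
        cursor.executemany(sql, batch)

def insert_with_schema_batching(cursor, sql, rows):
    batch = []
    batch_schema = None  # tuple of Python types describing the current batch
    for row in rows:
        row_schema = tuple(type(v) for v in row)
        if batch_schema is not None and row_schema != batch_schema:
            # The "schema" changed (e.g. int in one row, bytes in the next):
            # execute the rows accumulated so far, then start a new batch
            # bound with the new types.
            flush_batch(cursor, sql, batch)
            batch = []
        batch_schema = row_schema
        batch.append(row)
    flush_batch(cursor, sql, batch)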
As a side-effect, a few bugs I discovered in the existing implementation (execute() in a loop) while testing and comparing were coincidentally fixed:
Conversion from long integers would fail even if the value was representable in the destination column: e.g. decimal(38,0) can hold values near 2**127 but they failed to convert in the existing implementation. With the optimised implementation, the full precision is usable.
Used SQL_NUMERIC_STRUCT correctly to avoid a trip through (slow) string conversion. Decimal types now insert with full precision.
Date/time objects inserted into time(n) columns were missing their fractional seconds.
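A small repro sketch for the first of these fixes, reusing dest_cursor and dest_conn from the sample code at the top of this issue; the table name is hypothetical.
from decimal import Decimal
# decimal(38,0) can hold values just below 2**127.
dest_cursor.execute('CREATE TABLE BigDecimals (val decimal(38,0))')
# These values fit in the column but failed to convert in the old
# implementation; the parameter-array implementation inserts them with
# full precision.
rows = [(Decimal(2**126),), (Decimal(-(2**126)),)]
dest_cursor.executemany('INSERT INTO BigDecimals (val) VALUES (?)', rows)
dest_conn.commit()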
Performance test results:
SQL Server 2016
Microsoft ODBC Driver 13.1 for SQL Server (Linux - CentOS 6.8)
Pythons 2.6.6 and 3.6.1
Times are for inserting 20K rows into a single-column table of the specified type, in seconds. There is at least an order of magnitude difference between the old and new executemany().
Type                new executemany()   old executemany()   Python execute() loop
bigint              0.4858              14.5929             13.4796
bit                 0.4195              17.5971             13.3187
int                 0.4584              14.3637             13.1119
money               0.4811              15.3528             14.9873
smallmoney          0.4576              14.1919             13.4547
smallint            0.395               14.4953             13.3151
tinyint             0.5302              15.5995             14.185
float               0.503               13.8713             13.8131
binary(27)          0.4403              15.0619             17.13
varbinary(20)       0.4516              14.0041             13.5609
date                0.4554              16.3541             13.4959
time(7)             0.3818              13.9678             14.1381
time(5)             0.4966              14.5185             13.2597
time(3)             0.4969              18.4535             15.0969
time(0)             0.4757              14.1939             14.8959
datetime            0.6387              15.4826             14.0731
datetime2(7)        0.3721              13.4521             14.8893
datetime2(5)        0.452               15.3164             15.4111
datetime2(3)        0.4881              14.23               14.3967
datetime2(0)        0.454               15.7699             15.502
uniqueidentifier    0.5077              13.9185             13.6002
decimal(38,0)       0.5451              error               error
decimal(13,5)       0.6565              14.1069             20.5524
decimal(19,8)       0.9849              15.2114             13.081
decimal(38,22)      1.0341              13.4947             16.9196
char(15)            0.5015              14.4498             12.8164
char(75)            0.5132              13.9494             13.3094
char(221)           1.1595              16.4242             14.9281
char(750)           1.0887              25.9653             15.1296
varchar(15)         0.4943              16.8314             13.322
varchar(75)         0.5822              18.0023             13.3582
varchar(221)        0.6063              16.1918             13.029
varchar(750)        0.8678              17.2184             13.947
nchar(15)           0.5257              13.7747             16.8729
nchar(75)           0.5039              13.975              13.5015
nchar(221)          0.7486              14.7194             15.1842
nchar(750)          2.7277              (timed out)         (timed out)
I am currently working on an implementation using the parameter array feature of ODBC, which is showing some promising results in preliminary benchmarks:
Time in seconds taken to insert 20K rows into a table with a single int column, SQL Server 2016 using Microsoft ODBC Driver 13:
original executemany()        : 10.8121
parameter array executemany() : 0.3118
execute() in a Python loop    : 12.7683
I still need to do some more testing with the other datatypes; this is just a notification that I am looking into it.