String right truncation pyodbc.ProgrammingError when using `fast_executemany`

See original GitHub issue

Environment

To diagnose, we usually need to know the following, including version numbers. On Windows, be sure to specify 32-bit Python or 64-bit:

  • Python: 3.6.3
  • pyodbc: 4.0.23
  • OS: Debian Stretch
  • DB: SQL Server
  • driver: Microsoft ODBC Driver 17 for SQL Server libmsodbcsql-17.0.so.1.1

Issue

When executing a parameterized “UPDATE” statement with fast_executemany, the following exception is thrown.

ERROR - encoding with 'utf-16le' codec failed (SystemError: <built-in function utf_16_le_encode> returned a result with an error set)              
pyodbc.ProgrammingError: ('String data, right truncation: length 666 buffer 510', 'HY000')

The column widths in the target database have more than enough size to hold the data. This seems equivalent to Issue #337 except that in this case, I’m using 4.0.23 of pyodbc and MS ODBC Driver 17, which fixed the issue in that post but continue to be a problem for me.

I’ve trimmed down the data to just two rows and can consistently reproduce the problem with just those two rows.

I’m using SQL Alchemy to construct the statements and am using the connection.execute(stmt, [row1, row2]) method to execute.

Turning fast_executemany off fixes the problem, but unacceptably impacts performance.

Issue Analytics

  • State:closed
  • Created 5 years ago
  • Comments:9 (2 by maintainers)

github_iconTop GitHub Comments

4reactions
seanfdnncommented, Jun 8, 2018

I’ve narrowed this down to be a variation of #280

When the SQL statement contains more than one statement (i.e. more complex transactions, try catch blocks, etc…) and fast_executemany=True, the SQLDescribeParam call doesn’t work and thus VARCHARs are allocated a default buffer size of 510 bytes. This seems to be a known behaviour of SQLDescribeParam:

When processing a batch of Transact-SQL statements, the driver also does not support calling SQLDescribeParam for parameter markers in statements after the first statement in the batch.

The following code will reproduce the error (note the “print” statement in the SQL, but this could be TRY/CATCH BLOCKS, etc…

import pyodbc
import sys

print('Python version ' + sys.version)
print('pyodbc version ' + pyodbc.version)

driver = 'ODBC Driver 17 for SQL Server'
server = 'localhost'
database = 'db'
user = 'sa'
password = 'password'

conn_string = f'DRIVER={{{driver}}};' \
                   f'PORT=1433;SERVER={server};' \
                   f'DATABASE={database};' \
                   f'UID={user};' \
                   f'PWD={password};' \
                   f'ColumnEncryption=Enabled;' \
                   f'TrustServerCertificate=yes;'

cnxn = pyodbc.connect(conn_string, autocommit=True)

crsr = cnxn.cursor()

crsr.fast_executemany = True

crsr.execute('DROP TABLE IF EXISTS tmp')
crsr.execute('CREATE TABLE tmp (txt VARCHAR(1024))')

value_to_insert = 'a' * 256

params = [(value_to_insert,), (value_to_insert,)]

sql = """
print 'do nothing';
INSERT INTO tmp (txt) VALUES (?);
"""

crsr.setinputsizes((512,))
crsr.executemany(sql, params)

print(crsr.execute('SELECT txt from tmp').fetchall())

(note that in the actual application, the SQL is significantly more complicated; this is the simplest case that will reproduce the error)

This produces the output with errors:

Python version 3.6.3 (default, Nov  6 2017, 10:55:10)
[GCC 6.3.0 20170516]
pyodbc version 4.0.23
pyodbc.ProgrammingError: ('String data, right truncation: length 512 buffer 510', 'HY000')

The above exception was the direct cause of the following exception:

SystemError: <built-in function utf_16_le_encode> returned a result with an error set

The above exception was the direct cause of the following exception:

SystemError: encoding with 'utf-16le' codec failed (SystemError: <built-in function utf_16_le_encode> returned a result with an error set)

The above exception was the direct cause of the following exception:

SystemError: <built-in function utf_16_le_encode> returned a result with an error set

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "pypirls/memerrortest.py", line 40, in <module>
    crsr.executemany(sql, params)
SystemError: encoding with 'utf-16le' codec failed (SystemError: <built-in function utf_16_le_encode> returned a result with an error set)

However, changing any one of the below will avoid the exception:

  • Removing the “print” statement from the SQL
  • Setting the number of ‘a’ characters in the inserted varchar to 255 (fits in buffer size)
  • Setting fast_executemany to False

Note that unlike the solution in the wiki, a temporary table is not being used, and ColumnEncryption=Enabled

It’s unclear to me why this problem happens when fast_executemany=True but not when False as I would think that SQLDescribeParameter would have the same behaviour in either case, but I’m going to guess it’s because in the latter case, a memory array doesn’t need to be pre-allocated.

Looking at @v-chojas 's setinputsizes PR – but as far as I can tell, it hasn’t been merged into master, and it doesn’t look like it is impacting the array allocation in params.cpp::ExecuteMulti? Any ideas for a workaround?

2reactions
seanfdnncommented, Jun 7, 2018

A common thread seems to be the buffer is capped at 510, which is 2 x 255 – which suggests fast_executemany sometimes assumes a maximum string length of 255 characters (510 bytes)

Read more comments on GitHub >

github_iconTop Results From Across the Web

('String data, right truncation: length 8 buffer 4', 'HY000')" error ...
I am using pyodbc 4.0.26 version. Inserting result-set (containing a decimal result) with "fastexecutemany" set to TRUE results in "pyodbc.
Read more >
How can I solve the issue: pyodbc.ProgrammingError: ('String ...
Is there a way to increase buffer? The issue: pyodbc.ProgrammingError: ('String data, right truncation: length 162 buffer 160', 'HY000'). My ...
Read more >
String right truncation pyodbc.ProgrammingError when using ...
The column widths in the target database have more than enough size to hold the data. This seems equivalent to Issue #337 except...
Read more >
How to Make Inserts Into SQL Server 100x faster with Pyodbc
I've been recently trying to load large datasets to a SQL Server database with Python. Usually, to speed up the inserts with pyodbc, ......
Read more >
Re: String data, right truncation - Microsoft Power BI Community
While Uploading dataset to SQL Server using R getting following error: "nanodbc/nanodbc.cpp:1655: 22001: [Microsoft][ODBC SQL Server Driver]String data, ...
Read more >

github_iconTop Related Medium Post

No results found

github_iconTop Related StackOverflow Question

No results found

github_iconTroubleshoot Live Code

Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start Free

github_iconTop Related Reddit Thread

No results found

github_iconTop Related Hackernoon Post

No results found

github_iconTop Related Tweet

No results found

github_iconTop Related Dev.to Post

No results found

github_iconTop Related Hashnode Post

No results found