Bug: Connection terminated unexpectedly (postgres)
See original GitHub issueEnvironment
Knex version: 0.20.1 Database + version: postgres:11.5 OS: Alpine Linux
Additional information:
- Database: I’ve confirmed the bug to happen in postgres versions at least up from
9.5. The database is running on a container, currentlypostgres:11.5. - Knex environment: Knex is currently running on node v12, though I’ve confirmed the bug to happen on v8 and v10 too. It’s running on a container, currently
node:12.13.0-alpine. The currentpgversion is 7.12.1; though I’ve seen this happen with previous versions, I haven’t documented this bit. - Knex configuration object:
({
client: 'pg',
connection: 'postgres://user:password@server:5432/db',
pool: { min: 2, max: 10 }
})
Bug
I am getting a Connection terminated unexpectedly error in production, after long periods of inactivity (think +1h) during which no queries are made, causing the first query after that period to fail. Any query made after that first failing query will succeed.
This was first reported on issue #3258. In that issue, it was reported happening on knex 0.16.5, so this has been going on at least since then. It was also stated that a downgrade to 0.13.0 fixes the issue, which met the suggestion that maybe newer knex versions logged the error while previous ones didn’t. I’ve confirmed this not to be the case. Downgrading to knex 0.13.0 does in fact fix the issue, with queries being successful after long periods of inactivity.
A typical error message will look like this:
- message:
select \"user\".* from \"user\" where \"email\" ilike $1 - Connection terminated unexpectedly - stack:
Error: Connection terminated unexpectedly\n at Connection.<anonymous> (/usr/src/app/node_modules/pg/lib/client.js:252:9)\n at Object.onceWrapper (events.js:299:28)\n at Connection.emit (events.js:210:5)\n at Socket.<anonymous> (/usr/src/app/node_modules/pg/lib/connection.js:77:10)\n at Socket.emit (events.js:210:5)\n at TCP.<anonymous> (net.js:658:12)
node-postgres #1324 might help providing additional insight on the matter. #3046 might also be related.
Issue Analytics
- State:
- Created 4 years ago
- Reactions:32
- Comments:54 (12 by maintainers)
Top Related StackOverflow Question
I have good news! After further investigating the issue, I believe I can point towards a solution -I’ll leave that for last, as there’s a bit to unpack.
Unfortunately, I haven’t been able to reproduce the error locally, even with the same setup and compose configuration as the remote server. It is quite a strange thing, as the only difference I can appreciate between both environments is I’m running
docker-composelocally, while the server is running a docker swarm. To reiterate, I’ve set up a repo that reproduces this error on a remote production server, but (somehow) not locally. I will post the logs below (last), so for the purpose of understanding them, I’ve made the repo available here. As you can see there, I’m making a first successful query right after the process starts, making a 2nd query after 20 minutes (which fails), and a 3rd query immediately after (which succeeds).Some additional info, just for the sake of future reference:
And now, for the useful bit: it turns out
pgis not emitting the connection error until after the query has been made, but before it has resolved. This is the sequence of events:dialects/postgres/index.js_queryis called with a pgconnection, which calls pg’sconnection.query.connectionemits an'error'event, which is received byconnection.on('error', cb)at knex’sdialects/postgres/index.js.connection.querycalls its callback with the same error that had just been emitted, hence knex’sdialects/postgres/index.js_queryrejects with said error.connectionemits an'end'event, which is received byconnection.on('end', cb)at knex’sdialects/postgres/index.js.acquireRawConnection) and succeed.So essentially, it could be argued that maybe there’s an issue with the containerized production environment/setup, and/or with pg, which is not reporting a connection termination until the time at which a query is made -though I don’t know whether reporting it earlier would be feasible. But regardless, there’s a (very much fixable) issue with knex: if a connection errors out in the time that goes between a query is made and its resolution, the query will fail. I would imagine this would be the case not just with postgres, but with other dialects too.
Hence I would suggest that, in this event, a new connection is acquired and the query retried. It might make sense to pass that error (make the query fail with it), though, in the event of the connection having being acquired just before it was made. So, to recap, I’d propose that if a query errors out and the connection was just acquired for the purpose of the query, it rejects with the error, but otherwise it checks whether the connection has failed, and if so, acquires a new connection and retries the query.
Last, the logs. These are for v0.20.1, but I have logs for every minor version starting on 0.13.0.
@lynxtaa I changed my pool settings to the following and the error has not occurred in a couple days, so I guess this fixes the problem. I’ll comment again if it resurfaces. Thanks.