Bug: Connection terminated unexpectedly (postgres)

Environment

Knex version: 0.20.1 Database + version: postgres:11.5 OS: Alpine Linux

Additional information:

Database: I’ve confirmed the bug to happen in postgres versions at least up from 9.5. The database is running on a container, currently postgres:11.5.
Knex environment: Knex is currently running on node v12, though I’ve confirmed the bug to happen on v8 and v10 too. It’s running on a container, currently node:12.13.0-alpine. The current pg version is 7.12.1; though I’ve seen this happen with previous versions, I haven’t documented this bit.
Knex configuration object:

({
  client: 'pg',
  connection: 'postgres://user:password@server:5432/db',
  pool: { min: 2, max: 10 }
})

Bug

I am getting a Connection terminated unexpectedly error in production, after long periods of inactivity (think +1h) during which no queries are made, causing the first query after that period to fail. Any query made after that first failing query will succeed.

This was first reported on issue #3258. In that issue, it was reported happening on knex 0.16.5, so this has been going on at least since then. It was also stated that a downgrade to 0.13.0 fixes the issue, which met the suggestion that maybe newer knex versions logged the error while previous ones didn’t. I’ve confirmed this not to be the case. Downgrading to knex 0.13.0 does in fact fix the issue, with queries being successful after long periods of inactivity.

A typical error message will look like this:

message: select \"user\".* from \"user\" where \"email\" ilike $1 - Connection terminated unexpectedly
stack: Error: Connection terminated unexpectedly\n at Connection.<anonymous> (/usr/src/app/node_modules/pg/lib/client.js:252:9)\n at Object.onceWrapper (events.js:299:28)\n at Connection.emit (events.js:210:5)\n at Socket.<anonymous> (/usr/src/app/node_modules/pg/lib/connection.js:77:10)\n at Socket.emit (events.js:210:5)\n at TCP.<anonymous> (net.js:658:12)

node-postgres #1324 might help providing additional insight on the matter. #3046 might also be related.

Issue Analytics

State:
Created 4 years ago
Reactions:32
Comments:54 (12 by maintainers)

Top GitHub Comments

20reactions

rafamelcommented, Nov 10, 2019

I have good news! After further investigating the issue, I believe I can point towards a solution -I’ll leave that for last, as there’s a bit to unpack.

Unfortunately, I haven’t been able to reproduce the error locally, even with the same setup and compose configuration as the remote server. It is quite a strange thing, as the only difference I can appreciate between both environments is I’m running docker-compose locally, while the server is running a docker swarm. To reiterate, I’ve set up a repo that reproduces this error on a remote production server, but (somehow) not locally. I will post the logs below (last), so for the purpose of understanding them, I’ve made the repo available here. As you can see there, I’m making a first successful query right after the process starts, making a 2nd query after 20 minutes (which fails), and a 3rd query immediately after (which succeeds).

Some additional info, just for the sake of future reference:

It seems the idle time needed for the connection error to happen after querying is somewhere between 15 and 20 minutes. Queries after 15 minutes of inactivity will succeed; while after 20, they will fail.
knex 0.13.0 doesn’t show this issue, while every version after, including 0.14.0, does. However, on 0.14.0 a new connection won’t be acquired on the next query, so they will fail indefinitely, which is solved on 0.15.0.

And now, for the useful bit: it turns out pg is not emitting the connection error until after the query has been made, but before it has resolved. This is the sequence of events:

knex public api query is made.
knex’s dialects/postgres/index.js _query is called with a pg connection, which calls pg’s connection.query.
pg’s connection emits an 'error' event, which is received by connection.on('error', cb) at knex’s dialects/postgres/index.js.
pg’s connection.query calls its callback with the same error that had just been emitted, hence knex’s dialects/postgres/index.js _query rejects with said error.
pg’s connection emits an 'end' event, which is received by connection.on('end', cb) at knex’s dialects/postgres/index.js.
knex releases connection to pool.
The next time knex receives a query to be made, it will acquire a new pg connection (acquireRawConnection) and succeed.

So essentially, it could be argued that maybe there’s an issue with the containerized production environment/setup, and/or with pg, which is not reporting a connection termination until the time at which a query is made -though I don’t know whether reporting it earlier would be feasible. But regardless, there’s a (very much fixable) issue with knex: if a connection errors out in the time that goes between a query is made and its resolution, the query will fail. I would imagine this would be the case not just with postgres, but with other dialects too.

Hence I would suggest that, in this event, a new connection is acquired and the query retried. It might make sense to pass that error (make the query fail with it), though, in the event of the connection having being acquired just before it was made. So, to recap, I’d propose that if a query errors out and the connection was just acquired for the purpose of the query, it rejects with the error, but otherwise it checks whether the connection has failed, and if so, acquires a new connection and retries the query.

Last, the logs. These are for v0.20.1, but I have logs for every minor version starting on 0.13.0.

First query, right after the process starts:

2019-11-09T09:22:02.057558944Z Elapsed: 0 ms
2019-11-09T09:22:02.058104661Z 2019-11-09T09:22:02.057Z knex:client acquired connection from pool: __knexUid1
2019-11-09T09:22:02.059072595Z {
2019-11-09T09:22:02.059140593Z method: 'select',
2019-11-09T09:22:02.059150803Z options: {},
2019-11-09T09:22:02.059156884Z timeout: false,
2019-11-09T09:22:02.059162835Z cancelOnTimeout: false,
2019-11-09T09:22:02.059169678Z bindings: [ 'example@example.com' ],
2019-11-09T09:22:02.059176932Z __knexQueryUid: '0ab6fbe6-f4d0-497a-87b0-057033964950',
2019-11-09T09:22:02.059183233Z sql: 'select * from "user" where "email" = ?'
2019-11-09T09:22:02.059191860Z }
2019-11-09T09:22:02.059483679Z 2019-11-09T09:22:02.059Z knex:query select * from "user" where "email" = ? trx2
2019-11-09T09:22:02.059543883Z 2019-11-09T09:22:02.059Z knex:bindings [ 'example@example.com' ] trx2
2019-11-09T09:22:02.063050947Z 2019-11-09T09:22:02.062Z knex:client releasing connection to pool: __knexUid1
2019-11-09T09:22:02.063380849Z Result []

Second query, after 20 minutes:

2019-11-09T09:42:02.067012525Z Elapsed: 1200000 ms
2019-11-09T09:42:02.071066942Z 2019-11-09T09:42:02.069Z knex:client acquired connection from pool: __knexUid1
2019-11-09T09:42:02.071878431Z {
2019-11-09T09:42:02.071909690Z method: 'select',
2019-11-09T09:42:02.071916663Z options: {},
2019-11-09T09:42:02.071921983Z timeout: false,
2019-11-09T09:42:02.071926913Z cancelOnTimeout: false,
2019-11-09T09:42:02.071931722Z bindings: [ 'example@example.com' ],
2019-11-09T09:42:02.071936431Z __knexQueryUid: '229e5c4a-9ed3-4d82-a25e-b048e7cb9c0d',
2019-11-09T09:42:02.071941269Z sql: 'select * from "user" where "email" = ?'
2019-11-09T09:42:02.071946579Z }
2019-11-09T09:42:02.072055515Z 2019-11-09T09:42:02.071Z knex:query select * from "user" where "email" = ? trx2
2019-11-09T09:42:02.072125507Z 2019-11-09T09:42:02.072Z knex:bindings [ 'example@example.com' ] trx2
2019-11-09T09:42:02.081042312Z 2019-11-09T09:42:02.080Z knex:client releasing connection to pool: __knexUid1
2019-11-09T09:42:02.082578957Z Error: Connection terminated unexpectedly
2019-11-09T09:42:02.082619874Z at Connection.<anonymous> (/usr/src/app/node_modules/pg/lib/client.js:252:9)
2019-11-09T09:42:02.082627337Z at Object.onceWrapper (events.js:299:28)
2019-11-09T09:42:02.082633188Z at Connection.emit (events.js:210:5)
2019-11-09T09:42:02.082638899Z at Socket.<anonymous> (/usr/src/app/node_modules/pg/lib/connection.js:77:10)
2019-11-09T09:42:02.082645932Z at Socket.emit (events.js:210:5)
2019-11-09T09:42:02.082650612Z at TCP.<anonymous> (net.js:658:12)

Third query, right after the 2nd:

2019-11-09T09:42:02.084190783Z Elapsed: 0 ms
2019-11-09T09:42:02.084868650Z Connection Error: Connection ended unexpectedly
2019-11-09T09:42:02.097410334Z 2019-11-09T09:42:02.097Z knex:client acquired connection from pool: __knexUid3
2019-11-09T09:42:02.098051381Z {
2019-11-09T09:42:02.098071179Z method: 'select',
2019-11-09T09:42:02.098076118Z options: {},
2019-11-09T09:42:02.098080576Z timeout: false,
2019-11-09T09:42:02.098085165Z cancelOnTimeout: false,
2019-11-09T09:42:02.098089503Z bindings: [ 'example@example.com' ],
2019-11-09T09:42:02.098094572Z __knexQueryUid: '6e702fd9-b0d5-4c8e-9ee7-0b6404fec6d6',
2019-11-09T09:42:02.098099522Z sql: 'select * from "user" where "email" = ?'
2019-11-09T09:42:02.098104000Z }
2019-11-09T09:42:02.098367957Z 2019-11-09T09:42:02.098Z knex:query select * from "user" where "email" = ? undefined
2019-11-09T09:42:02.098583893Z 2019-11-09T09:42:02.098Z knex:bindings [ 'example@example.com' ] undefined
2019-11-09T09:42:02.103675485Z 2019-11-09T09:42:02.103Z knex:client releasing connection to pool: __knexUid3
2019-11-09T09:42:02.104059078Z Result []

8reactions

david-wbcommented, Nov 5, 2020

@lynxtaa I changed my pool settings to the following and the error has not occurred in a couple days, so I guess this fixes the problem. I’ll comment again if it resurfaces. Thanks.

pool: {
    min: 0,
    max: 5,
    acquireTimeoutMillis: 60000,
    idleTimeoutMillis: 600000,
}

Top Results From Across the Web

node-postgres, Connection terminated unexpectedly

I can connect using the psql client, but I get the error Connection terminated unexpectedly while trying to run this (with same connection...

7.18.2: "Connection terminated unexpectedly" when using ...

7.18.2: "Connection terminated unexpectedly" when using client.query with a pool when pool has been idle for 10 minutes (running in AWS Lambda) #2112....

PostgreSQL "connection terminated" - quite frustrated

When trying to connect, I'm using the url “postgresql://username@127.0.0.1/{database}”. I then launch on the terminal “observable-database-proxy ...

Persistent connection to Postgres database - Fly.io

It works fine but every 30 minutes Connection terminated unexpectedly error occurs. My question is, are there any platform time limits for ...

1623183 - Postgres Connection termination kills services

The script got a little more complicated! The error occurs when the connection dies while the client is checked out of the pool,...