failed to start postgres (reaped unknown pid: postmaster is not running)

Hey guys, during server upgrades, when Patroni instances are restarted, this happens on secondary nodes:

2017-12-06 00:40:29,638 INFO: Lock owner: postgres-patroni-2; I am postgres-patroni-0
2017-12-06 00:40:29,665 INFO: starting as a secondary
2017-12-06 00:40:29,705 INFO: postmaster pid=428
2017-12-06 00:40:29 UTC [428]: [1-1] 5a273c7d.1ac 0     LOG:  redirecting log output to logging collector process
2017-12-06 00:40:29 UTC [428]: [2-1] 5a273c7d.1ac 0     HINT:  Future log output will appear in directory "../pg_log".
2017-12-06 00:40:29,924 INFO reaped unknown pid 428
2017-12-06 00:40:29,924 INFO reaped unknown pid 429
2017-12-06 00:40:30,706 ERROR: postmaster is not running
2017-12-06 00:40:30,711 INFO: Lock owner: postgres-patroni-2; I am postgres-patroni-0
2017-12-06 00:40:30,725 INFO: failed to start postgres

It starts fine if I delete all postgres data on these nodes. However, I want replication to kick in and resume from where it left off (before the server upgrade).

Any ideas why this happens and what kills the postmaster process? Supervisord? Why?

Issue Analytics

  • State: closed
  • Created 6 years ago
  • Comments: 14

Top GitHub Comments

2 reactions
CyberDem0n commented, Sep 27, 2019

Hi @bappr,

that’s quite an old image; it was built more than a year ago…

You can try to figure out why postgres is failing by exec'ing into the pod and looking at the logs, which are located in /home/postgres/pgdata/pgroot/pg_log. Since it is Friday now, the current log file is postgres-5.csv.
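The file name for any given day can be worked out with a small helper. This is only a sketch: it assumes the log files follow the pattern postgres-<N>.csv, where N is the ISO weekday (Monday = 1 … Sunday = 7), which is what the Friday → postgres-5.csv example suggests. The function name `current_pg_log` and the `PG_LOG_DIR` variable are made up for illustration and are not part of Patroni or the image.

```shell
#!/usr/bin/env bash
# Assumption: logs are named postgres-<N>.csv, N = ISO weekday (Mon=1 .. Sun=7).
# PG_LOG_DIR and current_pg_log are hypothetical names used only in this sketch.
PG_LOG_DIR="${PG_LOG_DIR:-/home/postgres/pgdata/pgroot/pg_log}"

# Print the path of the log file for a given weekday (default: today).
current_pg_log() {
    local dir="${1:-$PG_LOG_DIR}"
    local weekday="${2:-$(date +%u)}"   # date +%u prints 1..7, Monday = 1
    printf '%s/postgres-%s.csv\n' "$dir" "$weekday"
}

# Usage inside the pod: show the last lines of today's log.
# tail -n 50 "$(current_pg_log)"
```

If the image uses a different log_filename setting, adjust the pattern in the `printf` line accordingly.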

Since your master is alive you can rebuild replicas with the help of patronictl:

$ kubectl exec -it cluster-name-X -- bash
root@cluster-name-X:/home/postgres# su postgres
postgres@cluster-name-X:~$ patronictl list # check cluster status
postgres@cluster-name-X:~$ patronictl reinit cluster-name cluster-name-X

The last command will wipe PGDATA and take a fresh pg_basebackup from the master.

1 reaction
CyberDem0n commented, Dec 9, 2019

Any reason why the pg_control got fucked?

You’ll have to figure it out from the logs.
