1 of 3 nodes failing with "no recovery target specified"
Hi,
I have been struggling for a couple of days to get a 3-node patroni/postgres setup running properly. I've been following this guide: https://www.linode.com/docs/databases/postgresql/create-a-highly-available-postgresql-cluster-using-patroni-and-haproxy/.
The master and one secondary node start fine every time, but the third node always fails to start postgres with the following:
Jul 16 14:30:45 devpostgres03 patroni[10951]: 2018-07-16 14:30:45,213 INFO: Lock owner: devpostgres01; I am devpostgres03
Jul 16 14:30:45 devpostgres03 patroni[10951]: 2018-07-16 14:30:45,228 INFO: Lock owner: devpostgres01; I am devpostgres03
Jul 16 14:30:45 devpostgres03 patroni[10951]: 2018-07-16 14:30:45,255 INFO: Local timeline=14 lsn=0/60005F8
Jul 16 14:30:45 devpostgres03 patroni[10951]: 2018-07-16 14:30:45,269 INFO: master_timeline=22
Jul 16 14:30:45 devpostgres03 patroni[10951]: 2018-07-16 14:30:45,271 INFO: master: history=1 0/19C4218 no recovery target specified
Jul 16 14:30:45 devpostgres03 patroni[10951]: 2 0/19C4638 no recovery target specified
Jul 16 14:30:45 devpostgres03 patroni[10951]: 3 0/40000D0 no recovery target specified
Jul 16 14:30:45 devpostgres03 patroni[10951]: 4 0/4000410 no recovery target specified
Jul 16 14:30:45 devpostgres03 patroni[10951]: 5 0/4000830 no recovery target specified
Jul 16 14:30:45 devpostgres03 patroni[10951]: 6 0/4000C50 no recovery target specified
Jul 16 14:30:45 devpostgres03 patroni[10951]: 7 0/4000F90 no recovery target specified
Jul 16 14:30:45 devpostgres03 patroni[10951]: 8 0/40011F0 no recovery target specified
Jul 16 14:30:45 devpostgres03 patroni[10951]: 9 0/4001370 no recovery target specified
Jul 16 14:30:45 devpostgres03 patroni[10951]: 10 0/40015D0 no recovery target specified
Jul 16 14:30:45 devpostgres03 patroni[10951]: 11 0/6000178 no recovery target specified
Jul 16 14:30:45 devpostgres03 patroni[10951]: 12 0/60002F8 no recovery target specified
Jul 16 14:30:45 devpostgres03 patroni[10951]: 13 0/6000478 no recovery target specified
Jul 16 14:30:45 devpostgres03 patroni[10951]: 14 0/60007B8 no recovery target specified
Jul 16 14:30:45 devpostgres03 patroni[10951]: 15 0/6000938 no recovery target specified
Jul 16 14:30:45 devpostgres03 patroni[10951]: 16 0/80000D0 no recovery target specified
Jul 16 14:30:45 devpostgres03 patroni[10951]: 17 0/8000330 no recovery target specified
Jul 16 14:30:45 devpostgres03 patroni[10951]: 18 0/80004B0 no recovery target specified
Jul 16 14:30:45 devpostgres03 patroni[10951]: 19 0/8000630 no recovery target specified
Jul 16 14:30:45 devpostgres03 patroni[10951]: 20 0/8000970 no recovery target specified
Jul 16 14:30:45 devpostgres03 patroni[10951]: 21 0/8000AF0 no recovery target specified
Jul 16 14:30:45 devpostgres03 patroni[10951]: 2018-07-16 14:30:45,280 INFO: starting as a secondary
Jul 16 14:30:46 devpostgres03 patroni[10951]: 2018-07-16 14:30:46,004 INFO: postmaster pid=11404
Jul 16 14:30:46 devpostgres03 patroni[10951]: < 2018-07-16 14:30:46.008 UTC >LOG: redirecting log output to logging collector process
Jul 16 14:30:46 devpostgres03 patroni[10951]: < 2018-07-16 14:30:46.008 UTC >HINT: Future log output will appear in directory "pg_log".
Jul 16 14:30:46 devpostgres03 patroni[10951]: 10.10.10.213:5432 - no response
Jul 16 14:30:47 devpostgres03 patroni[10951]: 2018-07-16 14:30:47,020 ERROR: postmaster is not running
Jul 16 14:30:47 devpostgres03 patroni[10951]: 2018-07-16 14:30:47,056 INFO: Lock owner: devpostgres01; I am devpostgres03
Jul 16 14:30:47 devpostgres03 patroni[10951]: 2018-07-16 14:30:47,061 INFO: failed to start postgres
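For context, the numbered lines in that log are the leader's timeline history (the contents of its latest `.history` file): each entry is a timeline number, the LSN at which the timeline switch happened, and the reason PostgreSQL recorded ("no recovery target specified" is what an ordinary promotion writes, so that text by itself is not the error). A minimal sketch of how one such entry breaks down (`parse_history_line` is a hypothetical helper, not Patroni code):

```python
# Hypothetical sketch, not Patroni source: split a PostgreSQL timeline
# history entry of the form "<timeline> <switch LSN> <reason>".
def parse_history_line(line):
    tl, lsn, reason = line.split(None, 2)
    return int(tl), lsn, reason

tl, lsn, reason = parse_history_line("1 0/19C4218 no recovery target specified")
print(tl, lsn, reason)  # 1 0/19C4218 no recovery target specified
```

The log's real problem is the line above the history dump: the replica is on timeline 14 while the master is already on timeline 22, so the replica cannot simply rejoin by streaming.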
Here is my configuration:
```yaml
scope: postgres
namespace: /db/
name: devpostgres03

restapi:
  listen: 10.10.10.213:8008
  connect_address: 10.10.10.213:8008

etcd:
  host: 10.10.10.214:2379

bootstrap:
  dcs:
    ttl: 30
    loop_wait: 10
    retry_timeout: 10
    maximum_lag_on_failover: 1048576
    postgresql:
      use_pg_rewind: true
      recovery_conf:
        recovery_target: immediate
  initdb:
    - encoding: UTF8
    - data-checksums
  pg_hba:
    - host replication replicator 127.0.0.1/32 md5
    - host replication replicator 10.10.10.211/0 md5
    - host replication replicator 10.10.10.213/0 md5
    - host replication replicator 10.10.10.212/0 md5
    - host all all 0.0.0.0/0 md5
  users:
    admin:
      password: admin
      options:
        - createrole
        - createdb

postgresql:
  listen: 10.10.10.213:5432
  connect_address: 10.10.10.213:5432
  data_dir: /data/patroni
  pgpass: /tmp/pgpass
  authentication:
    replication:
      username: replicator
      password: rep-pass
    superuser:
      username: postgres
      password: secretpassword
  parameters:
    unix_socket_directories: '.'

tags:
  nofailover: false
  noloadbalance: false
  clonefrom: false
  nosync: false
```
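As an aside, one copy/paste hazard with configs like this is "smart" quotes: editors and web pages often turn `'.'` into `‘.’`, which YAML treats as literal characters and which silently breaks settings such as `unix_socket_directories`. A minimal sanity-check sketch (hypothetical helper, not part of Patroni; you would feed it the contents of your patroni.yml):

```python
# Hypothetical check: report any curly quote characters, which usually
# indicate a config value was pasted from a rich-text source.
SMART_QUOTES = "\u2018\u2019\u201c\u201d"  # left/right single and double quotes

def find_smart_quotes(text):
    # Returns (line number, offending character) pairs, 1-indexed.
    return [(n, ch) for n, line in enumerate(text.splitlines(), 1)
            for ch in line if ch in SMART_QUOTES]

sample = "parameters:\n  unix_socket_directories: \u2018.\u2019\n"
print(find_smart_quotes(sample))  # [(2, '‘'), (2, '’')]
```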
And finally, I am starting via systemd with the following service file:
```ini
[Unit]
Description=Patroni runners to orchestrate a high-availability PostgreSQL
After=syslog.target network.target

[Service]
Type=simple
User=postgres
Group=postgres
ExecStart=/usr/bin/patroni /etc/patroni/patroni.yml
KillMode=process
TimeoutSec=30
Restart=no

[Install]
WantedBy=multi-user.target
```
I tried specifying the recovery_target key in my patroni config, but I still hit this issue constantly on one of the slaves. Any help is hugely appreciated, as I've been struggling with this for a while.
Thanks,
Carey
Issue analytics: created 5 years ago; 10 comments (3 by maintainers).
Looks like your replica has not been initialized properly (perhaps something happened during the basebackup that made it abort prematurely). You can run

`patronictl -c patroni.yaml reinit postgres devpostgres03`

to reinitialize this node and see whether that fixes the issue.

I also noticed there is no archive_command and restore_command in your configuration files. This means that if the replica falls out of date while streaming (for instance, because it cannot keep up with the rate of changes, or because a network issue meant the master had already recycled the WAL segments the replica still needed), it won't be able to continue without being reinitialized. However, it looks like your issue is not caused by this: with missing WAL segments the error message would be different.
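For reference, a sketch of what WAL archiving could look like in the Patroni config. The archive location `/data/wal_archive` is a hypothetical shared path (archiving tools such as WAL-E or pgBackRest would replace the `cp`-based commands); the `cp`/`test` commands themselves follow the standard PostgreSQL documentation examples:

```yaml
postgresql:
  parameters:
    archive_mode: "on"
    # Only archive a segment if it is not already present at the target.
    archive_command: "test ! -f /data/wal_archive/%f && cp %p /data/wal_archive/%f"
  recovery_conf:
    # Lets a lagging replica fetch recycled WAL segments from the archive.
    restore_command: "cp /data/wal_archive/%f %p"
```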
Yes, I made a mistake: I thought `postgres` in the reinit command was the user required to reinitialise the cluster, when it is actually the cluster name ("postgres-cluster" in my case). The following command fixed the "2020-04-14 05:43:51,739 INFO: master: history=1 0/678BA08 no recovery target specified" issue:
docker exec -it UAT_postgres2.1.z7i0uuwifizmap7rhgq900fvx patronictl -c patroni.yml reinit postgres-cluster postgres2
+------------------+-----------+-----------+--------+---------+----+-----------+-----------------+
| Cluster          | Member    | Host      | Role   | State   | TL | Lag in MB | Pending restart |
+------------------+-----------+-----------+--------+---------+----+-----------+-----------------+
| postgres-cluster | postgres1 | postgres1 |        | running | 10 |         0 | *               |
| postgres-cluster | postgres2 | postgres2 |        | running | 10 |       659 | *               |
| postgres-cluster | postgres3 | postgres3 | Leader | running | 10 |         0 | *               |
+------------------+-----------+-----------+--------+---------+----+-----------+-----------------+
Are you sure you want to reinitialize members postgres2? [y/N]: y
Success: reinitialize for member postgres2
Current status:

Tue Apr 14 07:21:01 UTC 2020
root@UAT:~# docker exec -it UAT_postgres2.1.z7i0uuwifizmap7rhgq900fvx patronictl -c patroni.yml list
+------------------+-----------+-----------+--------+---------+----+-----------+-----------------+
| Cluster          | Member    | Host      | Role   | State   | TL | Lag in MB | Pending restart |
+------------------+-----------+-----------+--------+---------+----+-----------+-----------------+
| postgres-cluster | postgres1 | postgres1 |        | running | 10 |         0 | *               |
| postgres-cluster | postgres2 | postgres2 |        | running | 10 |         0 | *               |
| postgres-cluster | postgres3 | postgres3 | Leader | running | 10 |         0 | *               |
+------------------+-----------+-----------+--------+---------+----+-----------+-----------------+
I will work on your suggestions for the second query.
One last thing: I didn't understand why this table marks every postgres service as "Pending restart". Even after I recreated all services (containers) of this cluster (I am running it in Docker Swarm mode), they are still marked as pending restart.
Is this something to worry about? It didn't happen when I was testing on a brand-new setup.
As far as I remember, it started happening only once I joined my new postgres-cluster to an existing postgres database.
Thanks, CyberDem0n