How to properly restore cluster using WAL-G (WAL-E) to a specific time (PITR) on Postgres 12?

See original GitHub issue

I’ve gone through manual pages, resolved issues, and many tutorials, but unfortunately, I didn’t find complex and working examples, how to restore Patroni cluster to a specific time (PITR). So with little help I’ll prepare some tutorial to cover this topic.

There is no recovery.conf in Postgres 12, so Patroni should handle configuration and paste it into postgresql.conf.

In pure Postgres you can restore database to a specific time by this row: recovery_target_time: '2020-05-25 11:20:00'

  1. Setup environment I’m using custom boostrap method pitr_recovery.
scope: postgres
namespace: /db/
name: postgresql2

restapi:
    listen: 10.0.1.8:8008
    connect_address: 10.0.1.8:8008

etcd:
    host: 10.0.1.7:2379

bootstrap:
    dcs:
        ttl: 30
        loop_wait: 10
        retry_timeout: 10
        maximum_lag_on_failover: 1048576
        postgresql:
            use_pg_rewind: true

    method: pitr_recovery
    pitr_recovery:
        keep_existing_recovery_conf: true
        recovery_conf:
            restore_command: 'envdir /etc/wal-g.d/env wal-g wal-fetch "%f" "%p"'
            recovery_target_timeline: latest
            recovery_target_action: promote
            recovery_target_time: '2020-05-25 11:20:00'

    initdb:
    - encoding: UTF8
    - locale: en_US.UTF-8
    - data-checksums

    pg_hba:
    - host replication replicator 127.0.0.1/32 md5
    - host replication replicator 10.0.1.4/0 md5
    - host replication replicator 10.0.1.8/0 md5
    - host replication replicator 10.0.1.6/0 md5
    - host all all 0.0.0.0/0 md5

postgresql:
    listen: 10.0.1.8:5432
    connect_address: 10.0.1.8:5432
    data_dir: /data/patroni
    pgpass: /tmp/pgpass
    authentication:
        replication:
            username: replicator
            password: rep-pass
        superuser:
            username: postgres
            password: secretpassword
    parameters:
        unix_socket_directories: '/var/run/postgresql'
        shared_preload_libraries: 'pg_stat_statements'
        archive_mode: 'on'
        archive_timeout: 300s
        archive_command: 'envdir /etc/wal-g.d/env wal-g wal-push %p'
    recovery_conf:
        restore_command: 'envdir /etc/wal-g.d/env wal-g wal-fetch "%f" "%p"'

tags:
    nofailover: false
    noloadbalance: false
    clonefrom: false
    nosync: false
  1. Steps
  • stop all Patroni nodes
  • deleted Patroni cluster (patroni remove)
  • deleted content of data directory (/data/patroni)
  • checkout backup with wal-g (wal-e) envdir /etc/wal-g.d/env wal-g backup-fetch /data/patroni LATEST
  • start Patroni node (master)
  • db is recovered to the last backup, recovery_target_time is ignored (in postgresql.conf)

Could someone help me with that?

Secondary mission 😃

I’ve also tried put checkout to command section, but it doesn’t work at all and thows error.

    method: pitr_recovery
    pitr_recovery:
        command: 'envdir /etc/wal-g.d/env wal-g backup-fetch /data/patroni LATEST'
        keep_existing_recovery_conf: true
        recovery_conf:
            restore_command: 'envdir /etc/wal-g.d/env wal-g wal-fetch "%f" "%p"'
            recovery_target_timeline: latest
            recovery_target_action: promote
            recovery_target_time: '2020-05-25 11:20:00'
patroni.service - Runners to orchestrate a high-availability PostgreSQL
   Loaded: loaded (/etc/systemd/system/patroni.service; disabled; vendor preset: enabled)
   Active: failed (Result: exit-code) since Mon 2020-06-01 08:26:45 UTC; 1min 36s ago
  Process: 3564 ExecStart=/usr/local/bin/patroni /etc/patroni.yml (code=exited, status=1/FAILURE)
 Main PID: 3564 (code=exited, status=1/FAILURE)

Jun 01 08:26:45 pg-2 patroni[3564]:     info = self._run_cycle()
Jun 01 08:26:45 pg-2 patroni[3564]:   File "/usr/local/lib/python2.7/dist-packages/patroni/ha.py", line 1277, in _run_cycle
Jun 01 08:26:45 pg-2 patroni[3564]:     return self.post_bootstrap()
Jun 01 08:26:45 pg-2 patroni[3564]:   File "/usr/local/lib/python2.7/dist-packages/patroni/ha.py", line 1173, in post_bootstrap
Jun 01 08:26:45 pg-2 patroni[3564]:     self.cancel_initialization()
Jun 01 08:26:45 pg-2 patroni[3564]:   File "/usr/local/lib/python2.7/dist-packages/patroni/ha.py", line 1168, in cancel_initialization
Jun 01 08:26:45 pg-2 patroni[3564]:     raise PatroniException('Failed to bootstrap cluster')
Jun 01 08:26:45 pg-2 patroni[3564]: patroni.exceptions.PatroniException: 'Failed to bootstrap cluster'
Jun 01 08:26:45 pg-2 systemd[1]: patroni.service: Main process exited, code=exited, status=1/FAILURE
Jun 01 08:26:45 pg-2 systemd[1]: patroni.service: Failed with result 'exit-code'.

Issue Analytics

  • State:closed
  • Created 3 years ago
  • Comments:5

github_iconTop GitHub Comments

9reactions
jiripsotacommented, Jun 8, 2020

Thank you for help! Now it’s working, so here is my configuration file and scripts. It’s working combination for Postgresql 12 + Patroni + WAL-G (maintained successor of WAL-E) + Azure / AWS

scope: postgres
namespace: /db/
name: postgresql1

restapi:
    listen: 10.0.1.4:8008
    connect_address: 10.0.1.4:8008

etcd:
    host: 10.0.1.7:2379

bootstrap:
    dcs:
        ttl: 30
        loop_wait: 10
        retry_timeout: 10
        maximum_lag_on_failover: 1048576
        postgresql:
            use_pg_rewind: true

    method: clone_with_walg
    clone_with_walg:
        command: /home/postgres/clone_with_walg.sh
        recovery_conf:
            restore_command: envdir /etc/wal-g.d/env wal-g wal-fetch "%f" "%p"
            recovery_target_timeline: latest
            recovery_target_action: promote
            recovery_target_time: ''

    initdb:
    - encoding: UTF8
    - data-checksums

    pg_hba:
    - host replication replicator 127.0.0.1/32 md5
    - host replication replicator 10.0.1.4/0 md5
    - host replication replicator 10.0.1.8/0 md5
    - host replication replicator 10.0.1.6/0 md5
    - host all all 0.0.0.0/0 md5

postgresql:
    listen: 10.0.1.4:5432
    connect_address: 10.0.1.4:5432
    data_dir: /data/patroni
    pgpass: /tmp/pgpass
    authentication:
        replication:
            username: replicator
            password: rep-pass
        superuser:
            username: postgres
            password: secretpassword
    parameters:
        unix_socket_directories: '/var/run/postgresql'
        shared_preload_libraries: 'pg_stat_statements'
        archive_mode: 'on'
        archive_timeout: 300s
        archive_command: 'envdir /etc/wal-g.d/env wal-g wal-push %p'
    recovery_conf:
        restore_command: 'envdir /etc/wal-g.d/env wal-g wal-fetch "%f" "%p"'

tags:
    nofailover: false
    noloadbalance: false
    clonefrom: false
    nosync: false

And easy script clone_with_walg.sh:

#!/bin/bash

mkdir -p /data/patroni
envdir /etc/wal-g.d/env wal-g backup-fetch /data/patroni LATEST

Warning: your /data directory must be writable for postgres user

Steps to PITR:

  1. Stop Patroni on all replica nodes and finally on master sudo systemctl stop patroni

  2. Update configuration file /etc/patroni.yml recovery_target_time: ‘2020-06-08 08:52:00’

  3. Remove cluster from etcd patronictl -c /etc/patroni.yml remove postgres

  4. Backup and remove data directory on master /data/patroni

  5. Run Patroni on master - it automatically call script clone_with_walg.sh sudo systemctl start patroni

0reactions
nevroscommented, Jun 18, 2020

@jiripsota Thank you for providing the additional details in your last comment, I spent many hours trying to figure out how to do this. Was not able to easily find this information.

Read more comments on GitHub >

github_iconTop Results From Across the Web

docs/patroni/postgresql-backups-wale-walg.md · master - GitLab
To restore to a given point of time or to the latest available point, ... by WAL-G takes ~9 hours with Postgres cluster...
Read more >
26.3. Continuous Archiving and Point-in-Time Recovery (PITR)
To recover successfully using continuous archiving (also called “online backup” by many database vendors), you need a continuous sequence of archived WAL ...
Read more >
Continuous PostgreSQL Backups using WAL-G - Supabase
1. Configure environment variables · 2. Enable WAL archiving · 3. Restart the database · 4. Create your first physical backup · 5....
Read more >
docs/patroni/postgresql-backups-wale-walg.md - GitLab
To restore from WAL-E backups, either WAL-G or WAL-E can be used. In "gitlab-restore", the default is WAL-G, as it gives 3-4 times...
Read more >
How To Set Up Continuous Archiving and Perform Point-In ...
In this first step, you need to configure your PostgreSQL 12 cluster to archive the cluster's WAL files in a directory different from...
Read more >

github_iconTop Related Medium Post

No results found

github_iconTop Related StackOverflow Question

No results found

github_iconTroubleshoot Live Code

Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start Free

github_iconTop Related Reddit Thread

No results found

github_iconTop Related Hackernoon Post

No results found

github_iconTop Related Tweet

No results found

github_iconTop Related Dev.to Post

No results found

github_iconTop Related Hashnode Post

No results found