ERROR: Error when fetching backup: pg_basebackup exited with code=1

Hi, can someone please guide me on this? I am configuring Postgres 12.1 with Patroni, and I have a cluster with 7 nodes. Every time I scale up, I end up in this situation: the master/leader starts and works, but each slave/replica either stops or gets stuck in the ‘creating replica’ state.

root@103d123f2d5c:/bp2/src# patronictl -c pg_patroni.yml list
+---------+--------+----------------+--------+------------------+----+-----------+
| Cluster | Member |      Host      |  Role  |      State       | TL | Lag in MB |
+---------+--------+----------------+--------+------------------+----+-----------+
|  blue0  |  pg_0  | 127.0.0.1:5432 | Leader |     running      |  1 |           |
|  blue0  |  pg_1  | 127.0.0.1:5432 |        |     stopped      |    |   unknown |
|  blue0  |  pg_2  | 127.0.0.1:5432 |        |     stopped      |    |   unknown |
|  blue0  |  pg_3  | 127.0.0.1:5432 |        |     stopped      |    |   unknown |
|  blue0  |  pg_4  | 127.0.0.1:5432 |        |     stopped      |    |   unknown |
|  blue0  |  pg_5  | 127.0.0.1:5432 |        |     stopped      |    |   unknown |
|  blue0  |  pg_6  | 127.0.0.1:5432 |        | creating replica |    |   unknown |
+---------+--------+----------------+--------+------------------+----+-----------+

The logs constantly show errors such as these:

2020-01-02 23:04:27,006 DEBUG: Sending request(xid=717): SetData(path='/bp/blue0/members/pg_1', data=b'{"conn_url":"postgres://127.0.0.1:5432/postgres","api_url":"http://127.0.0.1:8008/patroni","state":"stopped","role":"uninitialized","version":"1.6.3"}', version=-1)
2020-01-02 23:04:27,011 DEBUG: Received response(xid=717): ZnodeStat(czxid=12885154300, mzxid=12885161237, ctime=1577999766021, mtime=1578006267006, version=652, cversion=0, aversion=0, ephemeralOwner=31334655534829047, dataLength=150, numChildren=0, pzxid=12885154300)
2020-01-02 23:04:27,011 INFO: trying to bootstrap from leader 'pg_0'
2020-01-02 23:04:27,025 ERROR: Error when fetching backup: pg_basebackup exited with code=1
2020-01-02 23:04:27,025 WARNING: Trying again in 5 seconds
2020-01-02 23:04:32,037 ERROR: Error when fetching backup: pg_basebackup exited with code=1
2020-01-02 23:04:32,037 ERROR: failed to bootstrap from leader 'pg_0'
2020-01-02 23:04:32,037 INFO: Removing data directory: /bp2/data/psql
2020-01-02 23:04:37,004 INFO: Lock owner: pg_0; I am pg_1
2020-01-02 23:04:37,006 DEBUG: Sending request(xid=718): SetData(path='/bp/blue0/members/pg_1', data=b'{"conn_url":"postgres://127.0.0.1:5432/postgres","api_url":"http://127.0.0.1:8008/patroni","state":"stopped","role":"uninitialized","version":"1.6.3"}', version=-1)
2020-01-02 23:04:37,011 DEBUG: Received response(xid=718): ZnodeStat(czxid=12885154300, mzxid=12885161248, ctime=1577999766021, mtime=1578006277007, version=653, cversion=0, aversion=0, ephemeralOwner=31334655534829047, dataLength=150, numChildren=0, pzxid=12885154300)
2020-01-02 23:04:37,012 INFO: trying to bootstrap from leader 'pg_0'
2020-01-02 23:04:37,022 ERROR: Error when fetching backup: pg_basebackup exited with code=1
2020-01-02 23:04:37,023 WARNING: Trying again in 5 seconds
2020-01-02 23:04:42,036 ERROR: Error when fetching backup: pg_basebackup exited with code=1
2020-01-02 23:04:42,036 ERROR: failed to bootstrap from leader 'pg_0'
2020-01-02 23:04:42,036 INFO: Removing data directory: /bp2/data/psql

And here is my YAML file:

scope: blue0
namespace: /bp/
name: pg_1

log:
  level: DEBUG
  traceback_level: debug
  dir: /bp2/log/
   
restapi:
  listen: 127.0.0.1:8008
  connect_address: 127.0.0.1:8008

zookeeper:
  hosts: zookeeper:2181

bootstrap:
  dcs:
    ttl: 30
    loop_wait: 10
    retry_timeout: 10
    maximum_lag_on_failover: 1048576
    postgresql:
      use_pg_rewind: true
      parameters:
  initdb: 
    - encoding: UTF8
    - data-checksums
  pg_hba: 
    - host replication replicator 127.0.0.1/32 md5
    - host all all 0.0.0.0/0 md5
  users:
    sbpadmin:
        password: sbpadminpw
        options:
            - createrole
            - createdb
    bpadmin:
        password: bpadminpw
        options:
            - replication
    wbpadmin:
        password: wbpadminpw
        options:
            - rewind

postgresql:
  listen: 127.0.0.1:5432
  connect_address: 127.0.0.1:5432
  config_dir: /bp2/data/psql
  data_dir: /bp2/data/psql
  bin_dir: /usr/lib/postgresql/12/bin/

  pgpass: /bp2/log/pgpass
  authentication:
    replication:
      username: bpadmin
      password: bpadminpw
    superuser:
      username: sbpadmin
      password: sbpadminpw
    rewind:
      username: wbpadmin
      password: wbpadminpw
  parameters:
    unix_socket_directories: '.'
tags:
    nofailover: false
    noloadbalance: false
    clonefrom: false
    nosync: false

Issue Analytics

Created 4 years ago
Comments: 10

Top GitHub Comments

sandeepkalra commented, Jan 10, 2020 (2 reactions)

Thanks. I think I found the issue, but I do not know whether my fix is correct. There were 2 problems:

1. There was no user called ‘replicator’.
2. The pg_hba entry wasn’t correct (permissions-wise).

I changed the pg_hba block to the following:

pg_hba:
  - host replication $rep_username $localhost_ip/32 md5
  - host all   $rep_username $localhost_ip/32 md5
  - host replication $rep_username 0.0.0.0/0 md5
  - host all   $rep_username 0.0.0.0/0 md5
  - host all all 0.0.0.0/0 md5

The $rep_username and $localhost_ip placeholders were replaced with the actual username and host IP.

After this, replication completes for the replicas.
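
As a sketch of what that fix could look like with the names from the configuration in the question: the replication user configured there under postgresql.authentication.replication is bpadmin, so that is the name substituted for $rep_username here. The 127.0.0.1 address is only an assumption carried over from the posted config; a real multi-node setup would use the replicas' actual addresses or subnet.

```yaml
bootstrap:
  pg_hba:
    # Replication connections: the user must match
    # postgresql.authentication.replication.username (bpadmin in the posted config),
    # not the non-existent 'replicator' user.
    - host replication bpadmin 127.0.0.1/32 md5
    - host replication bpadmin 0.0.0.0/0 md5
    # Ordinary client connections.
    - host all all 0.0.0.0/0 md5
```

As far as I understand the Patroni documentation, bootstrap.pg_hba is only applied when the cluster is first initialized; on an already initialized cluster the same lines would go under postgresql.pg_hba instead.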

Thanks,

CyberDem0n commented, Jan 4, 2020 (1 reaction)

Why do you set listen and connect_address to 127.0.0.1? How will the nodes discover each other?
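
To illustrate the point: every member in the patronictl output above advertises 127.0.0.1:5432, so a replica that follows the leader's advertised address presumably ends up connecting to itself rather than to pg_0. A minimal sketch of per-node addresses follows; 10.0.0.11 is a hypothetical placeholder for the routable address of node pg_1, and each node would use its own.

```yaml
# Hypothetical values for node pg_1; 10.0.0.11 is a placeholder address.
restapi:
  listen: 0.0.0.0:8008              # accept REST API calls on all interfaces
  connect_address: 10.0.0.11:8008   # address other members use to reach this node

postgresql:
  listen: 0.0.0.0:5432              # accept PostgreSQL connections on all interfaces
  connect_address: 10.0.0.11:5432   # address advertised in the DCS for replication
```

With addresses like these, the pg_hba entries would also need to allow replication connections from the other nodes' addresses rather than only from 127.0.0.1/32.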
