Thread hang causes "waiting for leader to bootstrap"


Hi,

We hit an issue during “helm install”. After some analysis, I think there is a chance that a Python thread hangs and causes “waiting for leader to bootstrap”. I would like to report this issue. Is there anything you could do to improve this part?

Patroni v1.6.3 Python 2.7

— LOG —

2020-03-31T19:07:05.066891092Z Skip service level restore action.
2020-03-31T19:07:05.093194223Z /entrypoint.sh: dir data changed for postgresql
2020-03-31T19:07:05.096706161Z /entrypoint.sh: dir /var/lib/postgresql/data/pgdata changed owner for postgresql
2020-03-31T19:07:05.117268451Z ls: cannot access '/var/lib/postgresql/data/pgdata/pg_replslot/': No such file or directory
2020-03-31T19:07:05.119851734Z /entrypoint.sh: create dir done, uid=26(postgres) gid=26(postgres) groups=26(postgres),0(root)
2020-03-31T19:07:05.685056852Z 2020-03-31 19:07:05,684 INFO: postgres connection_string is postgres://192.168.21.199:5432/postgres
2020-03-31T19:07:05.68508475Z 2020-03-31 19:07:05,684 INFO: No PostgreSQL configuration items changed, nothing to reload.
2020-03-31T19:07:05.686310355Z 2020-03-31 19:07:05,686 INFO: Selected address family is 2
2020-03-31T19:07:05.68791133Z 2020-03-31 19:07:05,687 INFO: Postgres stop: success: True, signaled: False, block_callbacks: False
2020-03-31T19:07:05.688267346Z 2020-03-31 19:07:05,687 INFO: Lock owner: None; I am testapp-db-pg-0
2020-03-31T19:07:05.688286198Z 2020-03-31 19:07:05,688 INFO: waiting for leader to bootstrap
2020-03-31T19:07:15.688363227Z 2020-03-31 19:07:15,687 INFO: Postgres stop: success: True, signaled: False, block_callbacks: False
2020-03-31T19:07:15.688460201Z 2020-03-31 19:07:15,688 INFO: Lock owner: None; I am testapp-db-pg-0

Looking at the source code, I found this in ha.py:

                else:
                    ret = self._async_executor.try_run_async('bootstrap', self.state_handler.bootstrap.bootstrap,
                                                             args=(self.patroni.config['bootstrap'],))
                    return ret or 'trying to bootstrap a new cluster'

async_executor.py

    def run_async(self, func, args=()):
        Thread(target=self.run, args=(func, args)).start()

    def try_run_async(self, action, func, args=()):
        prev = self.schedule(action)
        if prev is None:
            return self.run_async(func, args)
        return 'Failed to run {0}, {1} is already in progress'.format(action, prev)

Since we did not see the “trying to bootstrap a new cluster” message in the log, I think the Python thread ran into some kind of run-time problem.
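
One way to sanity-check this reasoning is to replay the quoted logic in isolation. The sketch below is a simplified stand-in, not Patroni's actual class: `MiniAsyncExecutor`, its `schedule` and `run` bodies, and the use of an `Event` as a fake bootstrap job are all assumptions made for illustration. What it suggests is that, in the quoted snippet, `run_async` has no return statement, so the ha.py branch produces either “trying to bootstrap a new cluster” or the “Failed to run ...” text even if the worker thread blocks. If neither appears in the log, that branch was probably never reached.

```python
import threading


class MiniAsyncExecutor(object):
    """Simplified stand-in for the quoted executor (hypothetical, for illustration only)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._scheduled_action = None

    def schedule(self, action):
        # Record the action if nothing is scheduled; return what was scheduled before.
        with self._lock:
            prev = self._scheduled_action
            if prev is None:
                self._scheduled_action = action
            return prev

    def run(self, func, args=()):
        try:
            func(*args)
        finally:
            # Clear the slot once the job finishes, even if it raised.
            with self._lock:
                self._scheduled_action = None

    def run_async(self, func, args=()):
        # No return statement, so the caller receives None.
        threading.Thread(target=self.run, args=(func, args)).start()

    def try_run_async(self, action, func, args=()):
        prev = self.schedule(action)
        if prev is None:
            return self.run_async(func, args)
        return 'Failed to run {0}, {1} is already in progress'.format(action, prev)


release = threading.Event()          # the fake job blocks until this is set
executor = MiniAsyncExecutor()

first = executor.try_run_async('bootstrap', release.wait)
second = executor.try_run_async('bootstrap', release.wait)
release.set()                        # let the blocked worker thread finish

first_message = first or 'trying to bootstrap a new cluster'
second_message = second or 'trying to bootstrap a new cluster'
```

Here the first call yields the “trying to bootstrap” text while the worker is still blocked, and the second call reports that a bootstrap is already in progress.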

Do you have any suggestions?

BRs, Fan Liu

Issue Analytics

  • State:closed
  • Created 3 years ago
  • Comments:8

Top GitHub Comments

1 reaction
CyberDem0n commented, Apr 2, 2020

The only place that keeps information about the initialized cluster is the configmap or endpoint on K8s, and the /config key for any other DCS. If you still get the “waiting for leader to bootstrap” message, that means the <cluster-name>-config configmap or endpoint is still there. Nothing else is possible.
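
A quick way to check for a leftover object is to ask kubectl for it by name. The helper below is only a sketch: `leftover_config_objects`, the injectable `runner` parameter, and the example cluster name and namespace are hypothetical, and it assumes kubectl is on the PATH and pointed at the right cluster. It relies on `kubectl get` exiting non-zero when the named object is not found. Note that other failures (for example, no connectivity) also give a non-zero exit code.

```python
import subprocess


def leftover_config_objects(cluster_name, namespace, runner=subprocess.call):
    """Return the kinds for which <cluster_name>-config still exists.

    `runner` takes an argv list and returns an exit code; it defaults to
    subprocess.call and can be replaced by a fake in tests.
    """
    name = '{0}-config'.format(cluster_name)
    found = []
    for kind in ('configmap', 'endpoints'):
        # Exit code 0 means kubectl found the object.
        exit_code = runner(['kubectl', 'get', kind, name, '-n', namespace])
        if exit_code == 0:
            found.append(kind)
    return found
```

Calling `leftover_config_objects('my-cluster', 'default')` (both values are placeholders) returns an empty list when neither object exists. A non-empty list suggests that state from a previous install is still present.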

0 reactions
qingguee commented, Apr 17, 2020

Thanks for the info, @CyberDem0n. You are right, especially on K8s. Restarts can happen for many reasons.

BRs, Fan Liu
