Multiple returns from minions when multi-master is used.

See original GitHub issue

Greetings,

I have looked at a number of issues like this, here and on other sites, and have not seen a true solution; instead, most seem to focus on the different symptoms and not the underlying cause. Unfortunately, that means none of the existing solutions has really fixed the problem. We first saw this issue in Salt 2014.1.10 and continue to see it in 2015.8.x.

So the basic problem: when you have multi-master configured per the Salt documentation, you can get into a situation where each Salt command generates multiple returns for random minions. A simple example of this:

jwells@saltsyndic:~$ salt \* test.ping
random-minion-001: true
random-minion-005: true
random-minion-005: true
random-minion-003: true
random-minion-001: true
random-minion-002: true

When we run the same command from our Salt M-O-M (Master-Of-Masters) cluster, this effect is consistent for the Salt Syndic / Master boxes and more random for the minions.

Our original structure was:

  • 1 x Salt M-O-M (Running Ubuntu, Salt Master, Salt Minion) – CNAME saltmaster
  • 2 x Salt Syndic / Master (Running Ubuntu, Salt Master, Salt Minion, and Salt Syndic) – CNAME salt
  • ~750 Salt Minion (Running Ubuntu and Salt Minion)

The Salt Minion nodes were configured to connect to both of the Salt Syndic / Master nodes using a single DNS CNAME, and when that didn’t work we converted to a VIP (same result, and fail-over was unstable):

master: salt

We found a discussion somewhere that indicated that the minion config needed to use two different names (same result, though it failed over gracefully):

master:
  - salt-001
  - salt-002

In another discussion we saw that it was caused by multiple instances of Salt Syndic running, so we created a script to periodically kill all salt-syndic instances and restart the service. Same result.
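
For reference, that script amounted to something like the following sketch (not our exact script; the process pattern and service command here are just assumptions for illustration):

#!/usr/bin/env python
# Rough sketch of the syndic kill-and-restart watchdog (illustrative only):
# terminate every running salt-syndic process, then start a single fresh
# instance via the init system.
import subprocess
import time

def restart_syndic():
    # Terminate all salt-syndic processes; pkill returns non-zero if none match
    subprocess.call(['pkill', '-f', 'salt-syndic'])
    time.sleep(2)  # give the old processes a moment to exit
    # Bring exactly one instance back up
    subprocess.call(['service', 'salt-syndic', 'start'])

if __name__ == '__main__':
    restart_syndic()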

Another discussion stated that it was because we didn’t have master_id set, or because we had order_masters set, or because we didn’t… At that point, we went to a single Salt M-O-M, and a single Salt Syndic / Master at each site.

We recently started the process of upgrading to Salt 2015.8.x and, due to other issues, we decided to go back to the multi-master configuration. And we see that this issue is still present. 😦

Our setup:

  • 2 x Salt M-O-M (Running Ubuntu, Salt Master, Salt Minion, and Gluster to share /etc/salt/pki/master)
  • 2 x Salt Syndic / Master (Running Ubuntu, Salt Master, Salt Minion, Salt Syndic, and Gluster to share /etc/salt/pki/master)
  • N x Salt Minion (Running Ubuntu, Salt Minion)

The Salt M-O-M nodes are both running the same versions / configurations, as verified by Salt and md5sum. The relevant portions of the Salt Master configurations are (minion_id == saltmom-001 or saltmom-002, respectively):

order_masters: true
master_id: <minion_id value>

Just for completeness, here is the relevant portion of the Salt Minion configuration:

master:
  - saltmom-001
  - saltmom-002

The Salt Syndic / Master configurations are nearly identical to the Salt M-O-M configuration, but they have an extra entry for the Syndic (minion_id == salt-001 or salt-002, respectively):

order_masters: true
master_id: <minion_id value>

syndic_master:
  - saltmom-001
  - saltmom-002

And the relevant portion of the Salt Minion configuration:

master:
  - salt-001
  - salt-002

With just this configuration in place, if I run a ‘salt * test.ping’ from either of the Salt M-O-M boxes, I would expect to get back something like the following:

jwells@saltmom-001:~$ salt \* test.ping --output=yaml
saltmom-001: true
saltmom-002: true
salt-001: true
salt-002: true

Instead, what I get back is something like:

jwells@saltmom-001:~$ salt \* test.ping --output=yaml
saltmom-001: true
salt-001: true
saltmom-002: true
salt-001: true
salt-002: true
salt-002: true

And the Salt Master logs on the Salt M-O-M show something like:

2015-11-11 11:25:12,535 [salt.utils.job   ][INFO    ][12496] Got return from salt-002 for job 20151111112511812434
2015-11-11 11:25:12,577 [salt.utils.job   ][INFO    ][12513] Got return from salt-002 for job 20151111112511812434
2015-11-11 11:25:12,579 [salt.loaded.int.returner.local_cache][ERROR   ][12513] An extra return was detected from minion salt-002, please verify the minion, this could be a replay attack

Now, if we add minions to the Salt Syndic / Master boxes, we get the same duplicate responses on the Salt M-O-M, but not when we run the command from the Salt Syndic / Master boxes themselves. For completeness, their relevant configuration is:

master:
  - salt-001
  - salt-002

When we look at the Salt Master logs on the Salt Syndic / Master, we see the following:

2015-11-11 11:25:12,535 [salt.minion      ][INFO    ][27256] Returning information for job: 20151111112511812434
2015-11-11 11:25:12,571 [salt.minion      ][INFO    ][27256] Returning information for job: 20151111112511812434

So the Salt Syndic is returning the correct information, but if we look at the other Salt Syndic / Master’s syndic log file, we find the same jobs being returned there as well. So the Salt M-O-M gets two copies of the same return, marks one of them as “an extra return”, and displays both in the output.
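
Conceptually, the check the M-O-M appears to be doing is just “have I already recorded a return for this jid/minion pair?”. A minimal sketch of that idea (this is not Salt’s actual local_cache code; the on-disk layout below is invented purely for illustration):

import os

def is_extra_return(cache_dir, jid, minion_id):
    # Hypothetical cache layout: <cache_dir>/<jid>/<minion_id>/return.p
    return os.path.exists(os.path.join(cache_dir, jid, minion_id, 'return.p'))

def store_return(cache_dir, jid, minion_id, payload):
    if is_extra_return(cache_dir, jid, minion_id):
        # A second copy of the same return arrived (e.g. via the other syndic)
        print('An extra return was detected from minion %s' % minion_id)
        return
    target = os.path.join(cache_dir, jid, minion_id)
    if not os.path.isdir(target):
        os.makedirs(target)
    with open(os.path.join(target, 'return.p'), 'wb') as fh:
        fh.write(payload)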

Finally, I should point out that this is not completely consistent. We do get the occasional result like this:

jwells@saltmom-001:~$ salt \* test.ping --output=yaml
saltmom-001: true
salt-001: Minion did not return. [Not connected]
saltmom-002: true
salt-001: true
salt-002: true
salt-002: true

All of the nodes in the new environment have the same versions of Ubuntu packages, Python, and Salt:

jwells@saltmom:~$ sudo salt-run --versions-report
Salt Version:
             Salt: 2015.8.0

Dependency Versions:
         Jinja2: 2.7.2
       M2Crypto: Not Installed
           Mako: 0.9.1
         PyYAML: 3.11
          PyZMQ: 14.4.0
         Python: 2.7.6 (default, Jun 22 2015, 17:58:13)
           RAET: Not Installed
        Tornado: 4.2.1
            ZMQ: 4.0.4
           cffi: Not Installed
       cherrypy: Not Installed
       dateutil: 1.5
          gitdb: Not Installed
      gitpython: Not Installed
          ioflo: Not Installed
        libnacl: Not Installed
   msgpack-pure: Not Installed
 msgpack-python: 0.3.0
   mysql-python: Not Installed
      pycparser: Not Installed
       pycrypto: 2.6.1
         pygit2: Not Installed
   python-gnupg: Not Installed
          smmap: Not Installed
        timelib: Not Installed

System Versions:
           dist: Ubuntu 14.04 trusty
        machine: x86_64
        release: 3.13.0-66-generic
         system: Ubuntu 14.04 trusty

Top GitHub Comments

basepi commented, Nov 18, 2015

Yes. The issue is that the minion doesn’t keep track of jids it has already executed; it just executes a job every time it receives one. So when you pub a job from the master of masters through two syndic masters, with a minion underneath connected to both, the minion receives two jobs, and so it runs and returns two jobs.

This is a fairly common use case, so we need to at some point add this filtering mechanism, I think. I’m pretty sure we already have an issue open for it somewhere, but I can’t find it off the top of my head.
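
To make the idea concrete, the missing piece is essentially a “have I already seen this jid?” check on the minion. A minimal sketch of that kind of filter (the class name and cache size below are assumptions for illustration, not Salt’s implementation):

from collections import deque

class SeenJids(object):
    """Remember the most recently seen job ids so a duplicate publish
    (the same jid arriving via a second master) can be skipped."""

    def __init__(self, maxlen=1000):
        self._order = deque(maxlen=maxlen)
        self._seen = set()

    def already_seen(self, jid):
        # True if this jid was handled before; otherwise record it
        if jid in self._seen:
            return True
        if len(self._order) == self._order.maxlen:
            # The oldest jid is about to fall off the deque; forget it too
            self._seen.discard(self._order[0])
        self._order.append(jid)
        self._seen.add(jid)
        return False

# Hypothetical use in a job handler:
#   if seen_jids.already_seen(job['jid']):
#       return  # duplicate publish from the other master; skip it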

Ch3LL commented, Nov 9, 2017

@0x6c, since this issue is closed and fixed, can you please open a new issue with additional details, such as the version affecting you and your configuration setup?
