When systemd-python is not installed, systemd notification falls back to
the systemd-notify binary for service notification. That fallback only
works if the unit has NotifyAccess=all set.
The particular use case for this is when Salt is installed using pip. We
don't put systemd-python into the requirements.txt because we can't be
sure that the minion supports systemd, so pip installs won't necessarily
have systemd-python available.
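A minimal sketch of the fallback described above, not the actual Salt code. The helper names `notify_via_binary` and `notify_ready` are hypothetical. It assumes `systemd.daemon.notify` from systemd-python when that package is importable, and the `systemd-notify --ready` command otherwise. Because the binary runs as a separate child process, the notification does not come from the service's main PID, which is presumably why the unit needs NotifyAccess=all.

```python
import shutil
import subprocess


def notify_via_binary(which=shutil.which, run=subprocess.call):
    """Fallback path: signal readiness with the systemd-notify binary.

    `which` and `run` are injectable only to keep the sketch testable.
    Returns True if the binary was found and exited with status 0.
    """
    path = which("systemd-notify")
    if path is None:
        return False
    return run([path, "--ready"]) == 0


def notify_ready():
    """Prefer systemd-python; fall back to the binary if it is missing."""
    try:
        # Only available when the systemd-python package is installed.
        from systemd import daemon
    except ImportError:
        return notify_via_binary()
    return bool(daemon.notify("READY=1"))
```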
When running the tests with the TCP transport, the minion connection
process is less forgiving than it is with ZMQ. With ZMQ, we attempt to
connect to the master and, if it isn't up yet, we wait and try again.
With TCP, we try to connect to the master once, find that it's not up
(because the master process takes longer to spin up than the minions),
and bail out.
This change gives the master a little more time to come up by having the
minions try to connect a couple more times.
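The retry idea can be sketched as follows. This is an illustration only: `connect_with_retries`, its parameters, and the default attempt count and delay are assumptions, not the values or names used in the actual change.

```python
import time


def connect_with_retries(connect, attempts=3, delay=1.0, sleep=time.sleep):
    """Call connect() up to `attempts` times, sleeping between failures.

    `connect` is any callable that raises ConnectionError while the
    master is not yet accepting connections.
    """
    for attempt in range(1, attempts + 1):
        try:
            return connect()
        except ConnectionError:
            if attempt == attempts:
                raise  # out of retries: surface the original error
            sleep(delay)
```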
This fixes bug #36866, where the minion gets __master_disconnected right
after connecting because '::1' isn't in the list of connected masters,
which is ['127.0.0.1'].
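The mismatch itself is easy to reproduce in isolation. The snippet below only illustrates why a plain membership test fails for two spellings of the loopback address. The variable names are hypothetical, and the loopback-aware comparison is just one possible way to treat the addresses as equivalent; the actual fix in this commit may differ.

```python
import ipaddress

connected_masters = ["127.0.0.1"]  # what the connection check reports
master_ip = "::1"                  # what the minion resolved for the master

# A plain string comparison sees two unrelated values.
plain_match = master_ip in connected_masters


def is_loopback(addr):
    """True for any IPv4 or IPv6 loopback address."""
    return ipaddress.ip_address(addr).is_loopback


# Both sides refer to the local host, just in different address families.
both_loopback = is_loopback(master_ip) and all(
    is_loopback(addr) for addr in connected_masters
)
```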
Remove the entry from the instance map so that a closed entry cannot be
reused. This forces the removal even if the reference count of the entry
has not yet gone to zero.
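A minimal sketch of the idea, assuming a class-level weak-value map of cached instances. The `Channel` class and its attributes are hypothetical stand-ins, not the real Salt classes.

```python
import weakref


class Channel:
    """Hypothetical cached object: one live instance per key."""

    # Entries disappear on their own once nothing references the instance.
    instance_map = weakref.WeakValueDictionary()

    def __new__(cls, key):
        obj = cls.instance_map.get(key)
        if obj is None:
            obj = super().__new__(cls)
            obj.key = key
            obj.closed = False
            cls.instance_map[key] = obj
        return obj

    def close(self):
        self.closed = True
        # Do not wait for the reference count to reach zero: drop the
        # entry now so a later lookup cannot hand out a closed object.
        if self.instance_map.get(self.key) is self:
            del self.instance_map[self.key]
```

Without the explicit removal in `close()`, a caller still holding a reference to the closed object would keep it alive in the map, and the next `Channel(key)` would return it.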
Signed-off-by: Sergey Kizunov <sergey.kizunov@ni.com>
It has been observed that when running this command:
```
salt "*" test.ping
```
sometimes the command would return `Minion did not return. [No response]`
for some of the minions even though the minions did indeed respond
(reproduced running Windows salt-master on Python 3 using the TCP
transport).
Further investigation points to a race condition: if the response event
fires before anything is listening for events, the response is lost. For
instance, `salt.client.LocalClient.cmd_cli`, which is what the command
above invokes, doesn't start listening for events until
`get_cli_event_returns` invokes `get_iter_returns`, which invokes
`get_returns_no_block`, which invokes `self.event.get_event`. That call
connects to the event bus if it hasn't connected yet (which is the case
the first time it hits this code). But events may be fired any time after
`self.pub()` is executed, which occurs before this code.
We need to ensure that events are being listened for before any of them
can possibly be returned. We also want to avoid issue #31454, which
PR #36024 fixed, although that fix in turn caused this issue. This is the
approach I have taken to try to tackle this issue:
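The race can be illustrated with a toy event bus that, like the behavior described above, only connects lazily on the first `get_event` call. Everything here (`ToyBus`, the two helper functions, the event string) is hypothetical and only models the ordering problem.

```python
class ToyBus:
    """Toy event bus: events fired while nobody listens are dropped."""

    def __init__(self):
        self.listening = False
        self.queue = []

    def connect(self):
        self.listening = True

    def fire(self, event):
        if self.listening:
            self.queue.append(event)

    def get_event(self):
        self.connect()  # lazy connect on first use
        return self.queue.pop(0) if self.queue else None


def publish_then_listen(bus):
    # The minion answers right after the publish, before any listener exists.
    bus.fire("minion-1 return")
    return bus.get_event()  # connects too late; the return is already gone


def listen_then_publish(bus):
    bus.connect()  # subscribe first
    bus.fire("minion-1 return")
    return bus.get_event()
```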
It doesn't seem possible to generically discern if events can be
returned by a given function that invokes `run_job` and contains an
event searching function such as `get_cli_event_returns`. So for all
such functions that could possibly need to search the event bus, we
do the following:
- Record if the event bus is currently being listened to.
- When invoking `run_job`, ensure that `listen=True` so that `self.pub()`
  will ensure that the event bus is listened to before sending the payload.
- When all possible event bus activities are concluded, if the event
  bus was not originally being listened to, stop listening to it. This is
  designed so that issue #31454 does not reappear. We do this via a
  try/finally block in all instances of such code.
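The three steps above can be sketched as follows. This is a simplified model, not the real `LocalClient`: `ToyEvent`, `ToyClient`, `collect_returns`, and the returned values are hypothetical; only `run_job` and the `listen=True` argument come from the description.

```python
class ToyEvent:
    """Hypothetical stand-in for the event bus connection."""

    def __init__(self):
        self.listening = False

    def connect(self):
        self.listening = True

    def close(self):
        self.listening = False


class ToyClient:
    def __init__(self):
        self.event = ToyEvent()

    def run_job(self, listen=False):
        if listen:
            self.event.connect()  # subscribe before the payload is sent
        return {"jid": "hypothetical-jid"}

    def collect_returns(self, job):
        # Stand-in for an event searching function.
        return ["minion-1"] if self.event.listening else []

    def cmd(self):
        # Step 1: record whether the bus was already being listened to.
        was_listening = self.event.listening
        try:
            # Step 2: make the publish path subscribe first.
            job = self.run_job(listen=True)
            return self.collect_returns(job)
        finally:
            # Step 3: restore the original state so a connection is not
            # left open when the caller never asked for one.
            if not was_listening:
                self.event.close()
```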
Signed-off-by: Sergey Kizunov <sergey.kizunov@ni.com>