It would appear that if an AttributeError is raised when trying to detect
a class attr, the test suite does not run the class teardown method but
continues regardless. This fixes the class attr error, which then allows
the teardown to run. Prior to this, if the teardown did not run, the
entire suite would hang at shutdown because it was blocked waiting for an
ioloop to terminate.
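As a hedged illustration of the failure mode (hypothetical test class and
attribute name, not the actual suite code): in `unittest`, an exception
raised in `setUpClass` skips `tearDownClass` entirely, so probing the
class attribute with a default keeps the teardown path reachable.
```python
import unittest

class ExampleTestCase(unittest.TestCase):
    @classmethod
    def setUpClass(cls):
        # An AttributeError here (e.g. from cls.transport on a class
        # that never set it) would skip tearDownClass entirely, so
        # probe with a default instead.
        cls.transport = getattr(cls, 'transport', None)
        cls.ioloop_running = True  # stand-in for starting an ioloop

    @classmethod
    def tearDownClass(cls):
        # Without the safe probe above, this cleanup never runs and
        # the suite hangs at shutdown waiting on the ioloop.
        cls.ioloop_running = False
```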
When running the tests with the TCP transport, we are not as forgiving
with the minion connection process as we are in ZMQ. In ZMQ, we attempt
to connect to the master; if it isn't up yet, we wait and try again. In
TCP, we try to connect to the master once, realize it's not up (because
the master process takes longer to spin up than the minions), and bail
out.
This just gives the master a little more time to come up by having the
minions try to connect a couple more times.
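A minimal sketch of the retry idea, with hypothetical names and limits
(the real logic lives in the TCP transport's connect path):
```python
import socket
import time

MAX_ATTEMPTS = 3   # hypothetical retry budget
RETRY_DELAY = 1.0  # seconds to wait before the next attempt

def connect_to_master(host, port):
    # Retry the TCP connect a few times instead of bailing out on the
    # first failure, giving the master time to finish starting up.
    last_err = None
    for _ in range(MAX_ATTEMPTS):
        try:
            return socket.create_connection((host, port), timeout=5)
        except OSError as err:
            last_err = err
            time.sleep(RETRY_DELAY)
    raise last_err
```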
This fixes bug #36866, where the minion gets __master_disconnected right
after connecting because '::1' isn't in the list of connected masters,
which is ['127.0.0.1'].
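A toy illustration of the mismatch (not the actual fix): a naive
membership test treats the IPv6 and IPv4 loopback addresses as different
hosts, while a loopback-aware check does not.
```python
import ipaddress

connected_masters = ['127.0.0.1']

def is_connected(addr):
    # Naive check that triggers the bug: '::1' not in ['127.0.0.1'].
    if addr in connected_masters:
        return True
    # Treat any loopback address as matching a loopback entry.
    ip = ipaddress.ip_address(addr)
    return ip.is_loopback and any(
        ipaddress.ip_address(m).is_loopback for m in connected_masters
    )

print(is_connected('::1'))  # True with the loopback-aware check
```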
Remove the entry from the instance map so that a closed entry may not be
reused. This forces the removal even if the reference count of the entry
has not yet gone to zero.
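A minimal sketch of the idea (hypothetical map and key; Salt's instance
map is structured differently):
```python
# Module-level map of live channel instances, keyed by connection key.
_instance_map = {}

class Channel:
    def __init__(self, key):
        self.key = key
        _instance_map[key] = self

    def close(self):
        # Remove unconditionally, even if other references to this
        # entry still exist, so a closed entry is never reused.
        _instance_map.pop(self.key, None)
```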
Signed-off-by: Sergey Kizunov <sergey.kizunov@ni.com>
It has been observed that when running this command:
```
salt "*" test.ping
```
sometimes the command would return `Minion did not return. [No response]`
for some of the minions even though the minions did indeed respond
(reproduced running Windows salt-master on Python 3 using the TCP
transport).
After investigating this further, it seems that there is a race condition
where, if the response event arrives before the event bus is being
listened to, the response is lost. For instance,
`salt.client.LocalClient.cmd_cli`, which is what is invoked in the command
above, won't start listening for events until `get_cli_event_returns`,
which invokes `get_iter_returns`, which invokes `get_returns_no_block`,
which invokes `self.event.get_event`, which will connect to the event bus
if it hasn't connected yet (which is the case the first time it hits this
code). But events may be fired any time after `self.pub()` is executed,
which occurs before this code.
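The race is easy to reproduce in miniature. The toy bus below (not Salt
code) delivers events only to subscribers connected at fire time, just as
the real event bus does, so connecting after the publish loses the
response:
```python
import queue

class EventBus:
    def __init__(self):
        self.subscribers = []

    def connect(self):
        # A subscriber only sees events fired after this point.
        q = queue.Queue()
        self.subscribers.append(q)
        return q

    def fire(self, event):
        for q in self.subscribers:
            q.put(event)

bus = EventBus()
bus.fire('minion response')  # minion replies right after self.pub()
sub = bus.connect()          # get_event connects too late
print(sub.empty())           # True: the response was lost
```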
We need to ensure that events are being listened for before it is
possible for them to arrive. We also want to avoid issue #31454, which is
what PR #36024 fixed but which in turn caused this issue. This is the
approach I have taken to tackle this issue:
It doesn't seem possible to generically discern whether events can be
returned by a given function that invokes `run_job` and contains an
event-searching function such as `get_cli_event_returns`. So for all such
functions that could possibly need to search the event bus, we do the
following:
- Record if the event bus is currently being listened to.
- When invoking `run_job`, ensure that `listen=True` so that `self.pub()`
will ensure that the event bus is listened to before sending the payload.
- When all possible event bus activities are concluded, if the event
bus was not originally being listened to, stop listening to it. This is
designed so that issue #31454 does not reappear. We do this via a
try/finally block in all instances of such code (see the sketch below).
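A sketch of the pattern, assuming the event object exposes a `cpub` flag
for the listener state and a `close_pub()` method as Salt's event
interface does; the wrapper name and arguments here are illustrative:
```python
def cmd_with_events(client, tgt, fun, arg=()):
    # Remember whether the event bus was already being listened to.
    was_listening = client.event.cpub
    try:
        # listen=True makes self.pub() attach to the event bus before
        # the payload goes out, closing the race window.
        pub_data = client.run_job(tgt, fun, arg=arg, listen=True)
        return list(client.get_cli_event_returns(
            pub_data['jid'], pub_data['minions']))
    finally:
        # Only stop listening if we started it, so an existing
        # listener is left undisturbed (avoiding issue #31454).
        if not was_listening:
            client.event.close_pub()
```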
Signed-off-by: Sergey Kizunov <sergey.kizunov@ni.com>