fleet/changes/13527-applying-policies-at-scale
Lucas Manuel Rodriguez 9142c5de79
Prevent thundering herd when applying large number of policies on large number of hosts (#13552)
#13527

(Adding @mna to double check the changes in the async implementation of
policy result storage)

This PR also adds the osquery-perf changes needed to define the count of
macOS and Windows hosts.

- [X] Changes file added for user-visible changes in `changes/` or
`orbit/changes/`.
See [Changes
files](https://fleetdm.com/docs/contributing/committing-changes#changes-files)
for more information.
- ~[ ] Documented any API changes (docs/Using-Fleet/REST-API.md or
docs/Contributing/API-for-contributors.md)~
- ~[ ] Documented any permissions changes (docs/Using
Fleet/manage-access.md)~
- [X] Input data is properly validated, `SELECT *` is avoided, SQL
injection is prevented (using placeholders for values in statements)
- [X] Added support on fleet's osquery simulator `cmd/osquery-perf` for
new osquery data ingestion features.
- [X] Added/updated tests
- [X] Manual QA for all new/changed functionality
  - ~For Orbit and Fleet Desktop changes:~
- ~[ ] Manual QA must be performed in the three main OSs, macOS, Windows
and Linux.~
- ~[ ] Auto-update manual QA, from released version of component to new
version (see [tools/tuf/test](../tools/tuf/test/README.md)).~

Test with 80k hosts: 70k simulated macOS, 10k simulated Windows.
Apply Windows policies first, then apply macOS policies:
```
fleetctl apply -f ee/cis/win-10/cis-policy-queries.yml

# Leave running for some time

fleetctl apply -f ee/cis/macos-13/cis-policy-queries.yml
```

After applying CIS policies previous to these changes:
![Screenshot 2023-08-23 at 11 36
18](https://github.com/fleetdm/fleet/assets/2073526/72c1dc7d-e601-4248-be35-93c85b749f5d)

After applying these changes and applying the same policies:
![Screenshot 2023-08-28 at 15 42
57](https://github.com/fleetdm/fleet/assets/2073526/6b6d76b8-6acb-4893-a913-bf603a68f1a4)
2023-08-31 10:58:50 -03:00

5 lines
455 B
Plaintext

* Improved performance at scale when applying hundreds of policies to thousands of hosts via `fleetctl apply`.
IMPORTANT: In previous versions of Fleet there's a performance issue (thundering herd) when applying hundreds of
policies on a large number of hosts. To avoid this, make sure to deploy this version of Fleet, and make sure Fleet
is running for at least 1h (or the configured `FLEET_OSQUERY_POLICY_UPDATE_INTERVAL`) before applying the policies.