# Load testing

This document outlines the most recent results of the semi-annual load test of the Fleet server.

These tests are conducted by the Fleet team using [osquery-perf](https://github.com/fleetdm/fleet/tree/main/cmd/osquery-perf), a free and open source tool, to generate realistic traffic to the Fleet server.

This document reports the minimum resources for successfully running Fleet with 1,000 hosts and 150,000 hosts.

## Test parameters

The Fleet load tests are conducted with a Fleet server that contains 2 packs, with ~6 queries each, and 6 labels.

A test is deemed successful when the Fleet server is able to receive requests from, and make requests to, the specified number of hosts without overutilizing the specified resources. In addition, the Fleet server must be able to run a live query against the specified number of hosts.

## Results

### 1,000 hosts

With the following infrastructure, 1,000 hosts successfully communicate with Fleet. The Fleet server is able to run live queries against all hosts.

| Fleet instances | CPU units     | RAM              |
|-----------------|---------------|------------------|
| 1 Fargate task  | 256 CPU units | 512 MB of memory |

|&#8203;| Version                 |Instance type |
|-------|-------------------------|--------------|
| Redis | 5.0.6                   |cache.m5.large|
| MySQL | 5.7.mysql_aurora.2.10.0 | db.t4g.medium|

### 150,000 hosts

With the infrastructure listed below, 150,000 hosts successfully communicate with Fleet. The Fleet server is able to run live queries against all hosts.

| Fleet instances  | CPU units      | RAM               |
|------------------|----------------|-------------------|
| 25 Fargate tasks | 1024 CPU units | 2048 MB of memory |

|&#8203;| Version                 |Instance type |
|-------|-------------------------|--------------|
| Redis | 5.0.6                   |cache.m5.large|
| MySQL | 5.7.mysql_aurora.2.10.0 | db.t4g.medium|

The above setup autoscaled based on CPU usage. After a while, the task count settled at 25 and stayed there, even while running live queries or adding a new label.
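
To put the 150,000-host result in perspective, the arithmetic below totals the Fargate capacity at 25 tasks and the average number of hosts per task. It is a rough sketch that assumes the CPU and memory figures in the table are per-task sizes, that 1,024 CPU units equal 1 vCPU (the Fargate convention), and that hosts are spread evenly across tasks. The variable names are illustrative.

```bash
# Figures taken from the 150,000 hosts table above.
tasks=25
cpu_units_per_task=1024   # assumed to be per task; 1024 CPU units = 1 vCPU on Fargate
memory_mb_per_task=2048   # assumed to be per task
hosts=150000

# Total capacity across all tasks.
total_vcpu=$(( tasks * cpu_units_per_task / 1024 ))
total_memory_gb=$(( tasks * memory_mb_per_task / 1024 ))

# Average load per task, assuming an even spread of hosts.
hosts_per_task=$(( hosts / tasks ))

echo "Total vCPU: ${total_vcpu}"
echo "Total memory (GB): ${total_memory_gb}"
echo "Hosts per task: ${hosts_per_task}"
```

Under these assumptions, the test ran on 25 vCPU and 50 GB of memory in total, with roughly 6,000 simulated hosts per task.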

## How we are simulating osquery

The simulation is run by using [osquery-perf](https://github.com/fleetdm/fleet/tree/main/cmd/osquery-perf), a free and open source tool, to generate realistic traffic to the Fleet server.

The following command enrolls and simulates 150,000 hosts on Fleet:

```bash
go run cmd/osquery-perf/agent.go -enroll_secret <secret here> -host_count 150000 -server_url <server URL here> -node_key_file nodekeys
```

After the hosts have been enrolled, you can add `-only_already_enrolled` to make sure the node keys from the file are used and no enrollment happens. This resumes the execution of all the simulated hosts.
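
As a sketch of the resume step, the enrollment command can be wrapped in a small shell function that appends `-only_already_enrolled` on later runs. The function name `perf_cmd` and the `ENROLL_SECRET` and `SERVER_URL` variables are hypothetical and not part of osquery-perf. The function only prints the command, so you can review it before running it from the root of the Fleet repository.

```bash
# Hypothetical helper: prints the osquery-perf command for a first or resumed run.
# ENROLL_SECRET and SERVER_URL are placeholders that you set yourself.
perf_cmd() {
  perf_mode="${1:-enroll}"
  perf_line="go run cmd/osquery-perf/agent.go -enroll_secret ${ENROLL_SECRET} -host_count 150000 -server_url ${SERVER_URL} -node_key_file nodekeys"
  if [ "$perf_mode" = "resume" ]; then
    # Reuse the node keys saved in the file and skip enrollment.
    perf_line="${perf_line} -only_already_enrolled"
  fi
  printf '%s\n' "$perf_line"
}
```

Calling `perf_cmd` prints the command for the first run, and `perf_cmd resume` prints the same command with `-only_already_enrolled` appended.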

## Infrastructure setup

Fleet was deployed using the example [Terraform configuration provided in the repo](https://github.com/fleetdm/fleet/tree/main/tools/terraform) with the following command:

```bash
terraform apply \
  -var domain_fleetctl=<your domain here> \
  -var domain_fleetdm=<alternative domain here> \
  -var s3_bucket=<log bucket name> \
  -var fleet_image="fleetdm/fleet:<tag targeted>" \
  -var vulnerabilities_path="" \
  -var fleet_max_capacity=100 \
  -var fleet_min_capacity=5
```

## Limitations

The [osquery-perf](https://github.com/fleetdm/fleet/tree/main/cmd/osquery-perf) tool doesn't simulate all of the data that a real device sends when it communicates with a Fleet instance. For example, system users and software inventory data are not yet simulated by osquery-perf.