Add documentation on Fleet performance (#86)

- Document scaling.
- Document debugging steps/tools.
- Update issue template to request debug archive.
This commit is contained in:
Zach Wasserman 2020-12-02 09:46:02 -08:00 committed by GitHub
parent 7d299ca6f7
commit 47b4f07afb
GPG Key ID: 4AEE18F83AFDEB23
3 changed files with 56 additions and 0 deletions


@ -19,6 +19,11 @@ Please provide as much context as you can about your use case.
### If this is a UI issue: What browser are you using?
### If this is a performance issue: Please attach the debug archive.
<!--
Follow the steps documented in https://github.com/fleetdm/fleet/blob/master/docs/infrastructure/performance.md#debugging-performance-issues to generate a debug archive
-->
### What did you do?
### What did you expect to see?


@ -33,6 +33,8 @@ Learn how to work with the osquery file carving functionality to extract file co
Check out the [Frequently Asked Questions](./faq.md), which include troubleshooting steps for the most common issues experienced by Fleet users.
For performance concerns, see the [performance guide](./performance.md).
## Security
Fleet developers have documented how Fleet handles the [OWASP Top 10](./owasp-top-10.md).


@ -0,0 +1,49 @@
# Fleet Server Performance
Fleet is designed to scale to hundreds of thousands of online hosts. The Fleet server scales horizontally to support higher load.
## Horizontal Scaling
Scaling Fleet horizontally is as simple as running more Fleet server processes connected to the same MySQL and Redis backing stores. Typically, operators place a load balancer in front of the Fleet server nodes to distribute requests among them. All Fleet APIs are designed to work in this arrangement; clients only need to be configured to connect to the load balancer.
## Availability
The Fleet/osquery system is resilient to temporary loss of server availability. During downtime, whether caused by lost network connectivity, server maintenance, or any other reason, osquery agents continue executing their existing configuration and buffer result logs locally. The maximum number of buffered logs can be configured with osquery's `--buffered_log_max` flag.
Note that short downtimes are expected during [Fleet server upgrades](./updating-fleet.md) that require database migrations.
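A minimal sketch, assuming agents read their options from an osquery flagfile with one `--flag=value` per line: the hypothetical shell function below (not part of osquery or Fleet) sets `--buffered_log_max` in that file and drops any earlier value. The flagfile path and the buffer size used here are illustrative assumptions.

```sh
# Hypothetical helper (not part of osquery or Fleet).
# Usage: set_buffered_log_max FLAGFILE COUNT
# Rewrites FLAGFILE so it contains exactly one --buffered_log_max line set to COUNT.
set_buffered_log_max() {
  flagfile="$1"
  count="$2"
  flags_tmp="$flagfile.tmp"
  # Keep every other flag, drop any existing --buffered_log_max line,
  # then append the new value.
  {
    grep -v '^--buffered_log_max=' "$flagfile" 2>/dev/null
    printf '%s\n' "--buffered_log_max=$count"
  } > "$flags_tmp" && mv "$flags_tmp" "$flagfile"
}
```

For example, `set_buffered_log_max /etc/osquery/osquery.flags 2000000` (path assumed), followed by a restart of `osqueryd`, since flags are read at startup.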
## Monitoring
More information on monitoring Fleet servers with Prometheus and other tools is available in the [Monitoring Fleet](./monitoring-alerting.md) documentation.
## Debugging Performance Issues
### MySQL & Redis
If the MySQL or Redis servers show performance problems, consult the extensive documentation available online for diagnosing and tuning these systems. Please also [file an issue](https://github.com/fleetdm/fleet/issues/new/choose) with details about the problem so that Fleet developers can work on a fix.
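As a starting point, a snapshot of what each backing store is doing can be captured and attached to the issue. The hypothetical function below is a sketch that assumes the standard `mysql` and `redis-cli` clients are installed and can reach both servers; the host names and output file names are placeholders.

```sh
# Hypothetical helper (not part of Fleet).
# Usage: capture_backend_state MYSQL_HOST REDIS_HOST OUTDIR
# Saves a snapshot of MySQL and Redis activity into OUTDIR.
capture_backend_state() {
  mysql_host="$1"
  redis_host="$2"
  outdir="$3"
  mkdir -p "$outdir" || return 1
  # Statements currently executing in MySQL. Credentials are read from the
  # mysql client's option files (for example ~/.my.cnf).
  mysql -h "$mysql_host" -e 'SHOW FULL PROCESSLIST' > "$outdir/mysql-processlist.txt"
  # InnoDB internals, including transactions, lock waits, and buffer pool usage.
  mysql -h "$mysql_host" -e 'SHOW ENGINE INNODB STATUS\G' > "$outdir/mysql-innodb-status.txt"
  # Up to ten of the most recent entries in the Redis slow log.
  redis-cli -h "$redis_host" SLOWLOG GET 10 > "$outdir/redis-slowlog.txt"
  # Server statistics such as memory use and connected clients.
  redis-cli -h "$redis_host" INFO > "$outdir/redis-info.txt"
}
```

For example, `capture_backend_state mysql.internal redis.internal ./backend-state` (host names assumed). Review the MySQL process list before sharing it, because it contains the text of running queries.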
### Fleet Server
For performance issues in the Fleet server process, please [file an issue](https://github.com/fleetdm/fleet/issues/new/choose) with details about the scenario, and attach a debug archive. Debug archives can also be submitted confidentially through other support channels.
#### Generate Debug Archive (Fleet 3.4.0+)
Use the `fleetctl debug archive` command to generate an archive of Fleet's full suite of debug profiles. See the [fleetctl setup guide](../cli/setup-guide.md) for details on configuring `fleetctl`.
The generated `.tar.gz` archive will be available in the current directory.
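To review what an archive contains before attaching it to an issue, the newest archive in the current directory can be located and listed. The helper below is hypothetical and assumes no other `.tar.gz` file was written to that directory more recently.

```sh
# Hypothetical helper (not part of fleetctl).
# Usage: latest_debug_archive [DIR]
# Prints the most recently modified .tar.gz file in DIR (default: the current directory).
latest_debug_archive() {
  dir="${1:-.}"
  # ls -t sorts newest first; this assumes file names without newlines.
  ls -t "$dir"/*.tar.gz 2>/dev/null | head -n 1
}
```

For example, run `fleetctl debug archive` and then `tar -tzf "$(latest_debug_archive)"` to list the files in the new archive.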
##### Targeting Individual Servers
In most configurations, the `fleetctl` client is configured to make requests to a load balancer that distributes them across the server instances. This makes it difficult to debug a performance issue on a specific server, because any given request may be served by a different instance. To target an individual server, create a new `fleetctl` context that uses the direct address of that server.
For example:
```sh
fleetctl config set --context server-a --address https://server-a:8080
fleetctl login --context server-a
fleetctl debug archive --context server-a
```
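When several instances need to be profiled, the three commands above can be repeated per server. This sketch wraps them in a hypothetical function and assumes every instance is reachable directly at `https://HOST:8080`, as in the example; the host names passed to it are placeholders.

```sh
# Hypothetical helper (not part of fleetctl).
# Usage: collect_debug_archives HOST...
# For each HOST, creates a fleetctl context named after it that targets
# https://HOST:8080 directly, logs in, and writes a debug archive.
collect_debug_archives() {
  for server in "$@"; do
    # A dedicated context sends requests straight to this instance,
    # bypassing the load balancer.
    fleetctl config set --context "$server" --address "https://$server:8080" || return 1
    # Prompts for credentials for this context.
    fleetctl login --context "$server" || return 1
    # Writes a .tar.gz archive into the current directory.
    fleetctl debug archive --context "$server" || return 1
  done
}
```

For example, `collect_debug_archives server-a server-b` leaves one archive per server in the current directory; `fleetctl login` prompts for credentials once per context.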
##### Confidential Information
The `fleetctl debug archive` command retrieves information generated by Go's [`net/http/pprof`](https://golang.org/pkg/net/http/pprof/) package. In most scenarios this should not include sensitive information; however, it does include the command line arguments passed to the Fleet server. If the Fleet server receives sensitive credentials via command line arguments (rather than environment variables or a config file), scrub this information from the `cmdline` file in the archive before sharing it.
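One way to scrub the `cmdline` file is to unpack the archive, overwrite that file, and repack it. The function below is a hypothetical sketch: it does not assume where `cmdline` sits inside the archive, replaces every file with that name by the placeholder `REDACTED`, and leaves the profiles untouched.

```sh
# Hypothetical helper (not part of fleetctl).
# Usage: scrub_cmdline SRC_ARCHIVE DST_ARCHIVE
# Writes a copy of SRC_ARCHIVE in which every file named "cmdline" is
# replaced by a placeholder. All other files are copied unchanged.
scrub_cmdline() {
  src_archive="$1"
  dst_archive="$2"
  workdir=$(mktemp -d) || return 1
  # Unpack into a scratch directory.
  if ! tar -xzf "$src_archive" -C "$workdir"; then
    rm -rf "$workdir"
    return 1
  fi
  # The location of cmdline inside the archive is not assumed; find it anywhere.
  find "$workdir" -type f -name cmdline -exec sh -c 'printf "REDACTED\n" > "$1"' sh {} \;
  # Repack the scrubbed tree. The redirection is opened in the caller's
  # directory, so a relative DST_ARCHIVE path works.
  (cd "$workdir" && tar -czf - .) > "$dst_archive"
  rc=$?
  rm -rf "$workdir"
  return "$rc"
}
```

For example, `scrub_cmdline fleet-debug.tar.gz fleet-debug-scrubbed.tar.gz` (file names assumed), then attach the scrubbed copy.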