Shorten "simple" query API period to 25s (#3775)

This helps the period stay under the default request timeouts for most
load balancers.

Some default timeouts:
* AWS ALB - 60s
* Nginx - 60s
* GCP LB - 30s
Zach Wasserman authored 2022-01-19 17:48:57 -08:00, committed by GitHub
parent fc44970f49
commit 4a70cd69fa
3 changed files with 17 additions and 7 deletions

changes/query-timeout (new file)

@@ -0,0 +1 @@
+* Reduce the default period of the "simple" live query API (`/api/v1/queries/run`) to 25 seconds to remain below load balancer timeouts.


@@ -2475,10 +2475,17 @@ Deletes the queries specified by ID. Returns the count of queries successfully d
 ### Run live query
 
-Runs one or more live queries against the specified hosts and responds with the results
-over a fixed period of 90 seconds.
+Run one or more live queries against the specified hosts and responds with the results
+collected after 25 seconds.
 
-If you are using this API to run multiple queries at the same time, they are started simultaneously. Response time is capped at 90 seconds from when the API request was received, regardless of how many queries you are running, and regardless whether all results have been gathered or not.
+If multiple queries are provided, they run concurrently. Response time is capped at 25 seconds from
+when the API request was received, regardless of how many queries you are running, and regardless
+whether all results have been gathered or not. This API does not return any results until the fixed
+time period elapses, at which point all of the collected results are returned.
+
+The fixed time period is configurable via environment variable on the Fleet server (eg.
+`FLEET_LIVE_QUERY_REST_PERIOD=90s`). If setting a higher value, be sure that you do not exceed your
+load balancer timeout.
 
 > WARNING: This API endpoint collects responses in-memory (RAM) on the Fleet compute instance handling this request, which can overflow if the result set is large enough. This has the potential to crash the process and/or cause an autoscaling event in your cloud provider, depending on how Fleet is deployed.
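Per the documentation change above, operators who control their load balancer timeouts can raise the period through the environment variable. An illustrative invocation (the 55s value is an example, chosen to stay under a 60s ALB/Nginx timeout; adjust to your own infrastructure):

```shell
# Illustrative: allow a longer collection window than the 25s default.
# Only safe if every proxy/load balancer in front of Fleet times out later than this.
FLEET_LIVE_QUERY_REST_PERIOD=55s fleet serve
```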
@@ -2489,8 +2496,8 @@ If you are using this API to run multiple queries at the same time, they are sta
 
 | Name      | Type   | In   | Description                                   |
 | --------- | ------ | ---- | --------------------------------------------- |
-| query_ids | array  | body | **Required**. The IDs of the queries to run as live queries. |
-| host_ids  | array  | body | **Required**. The IDs of the hosts to run the live queries against. |
+| query_ids | array  | body | **Required**. The IDs of the saved queries to run. |
+| host_ids  | array  | body | **Required**. The IDs of the hosts to target. |
 
 #### Example


@@ -36,13 +36,15 @@ func (r runLiveQueryResponse) error() error { return r.Err }
 func runLiveQueryEndpoint(ctx context.Context, request interface{}, svc fleet.Service) (interface{}, error) {
 	req := request.(*runLiveQueryRequest)
 
+	// The period used here should always be less than the request timeout for any load
+	// balancer/proxy between Fleet and the API client.
 	period := os.Getenv("FLEET_LIVE_QUERY_REST_PERIOD")
 	if period == "" {
-		period = "90s"
+		period = "25s"
 	}
 	duration, err := time.ParseDuration(period)
 	if err != nil {
-		duration = 90 * time.Second
+		duration = 25 * time.Second
 		logging.WithExtras(ctx, "live_query_rest_period_err", err)
 	}