diff --git a/changes/query-timeout b/changes/query-timeout
new file mode 100644
index 000000000..498204776
--- /dev/null
+++ b/changes/query-timeout
@@ -0,0 +1 @@
+* Reduce the default period of the "simple" live query API (`/api/v1/queries/run`) from 90 to 25 seconds to remain below load balancer timeouts.
diff --git a/docs/01-Using-Fleet/03-REST-API.md b/docs/01-Using-Fleet/03-REST-API.md
index 6c852127a..67010e785 100644
--- a/docs/01-Using-Fleet/03-REST-API.md
+++ b/docs/01-Using-Fleet/03-REST-API.md
@@ -2475,10 +2475,17 @@ Deletes the queries specified by ID. Returns the count of queries successfully d
 
 ### Run live query
 
 Runs one or more live queries against the specified hosts and responds with the results
-over a fixed period of 90 seconds.
+collected over a fixed period (25 seconds by default).
 
-If you are using this API to run multiple queries at the same time, they are started simultaneously. Response time is capped at 90 seconds from when the API request was received, regardless of how many queries you are running, and regardless whether all results have been gathered or not.
+If multiple queries are provided, they run concurrently. By default, response time is capped at 25
+seconds from when the API request was received, regardless of how many queries you are running,
+and regardless of whether all results have been gathered. This API does not return any results
+until the fixed time period elapses, at which point all of the collected results are returned.
+
+The fixed time period is configurable via an environment variable on the Fleet server (e.g.
+`FLEET_LIVE_QUERY_REST_PERIOD=90s`). If you set a higher value, make sure it stays below your load
+balancer's request timeout.
 
 > WARNING: This API endpoint collects responses in-memory (RAM) on the Fleet compute instance handling this request, which can overflow if the result set is large enough.
 This has the potential to crash the process and/or cause an autoscaling event in your cloud provider, depending on how Fleet is deployed.
@@ -2489,8 +2496,8 @@ If you are using this API to run multiple queries at the same time, they are sta
 
 | Name      | Type   | In   | Description                                   |
 | --------- | ------ | ---- | --------------------------------------------- |
-| query_ids | array  | body | **Required**. The IDs of the queries to run as live queries. |
-| host_ids  | array  | body | **Required**. The IDs of the hosts to run the live queries against. |
+| query_ids | array  | body | **Required**. The IDs of the saved queries to run. |
+| host_ids  | array  | body | **Required**. The IDs of the hosts to target. |
 
 #### Example
 
diff --git a/server/service/live_queries.go b/server/service/live_queries.go
index c99c11bfa..91953ccaa 100644
--- a/server/service/live_queries.go
+++ b/server/service/live_queries.go
@@ -36,13 +36,15 @@ func (r runLiveQueryResponse) error() error { return r.Err }
 
 func runLiveQueryEndpoint(ctx context.Context, request interface{}, svc fleet.Service) (interface{}, error) {
 	req := request.(*runLiveQueryRequest)
+	// The period used here should always be less than the request timeout for any load
+	// balancer/proxy between Fleet and the API client.
 	period := os.Getenv("FLEET_LIVE_QUERY_REST_PERIOD")
 	if period == "" {
-		period = "90s"
+		period = "25s"
 	}
 	duration, err := time.ParseDuration(period)
 	if err != nil {
-		duration = 90 * time.Second
+		duration = 25 * time.Second
 		logging.WithExtras(ctx, "live_query_rest_period_err", err)
 	}
 