Shorten "simple" query API period to 25s (#3775)

This helps the period stay under the default request timeouts for most
load balancers.

Some default timeouts:
* AWS ALB - 60s
* Nginx - 60s
* GCP LB - 30s
Zach Wasserman authored 2022-01-19 17:48:57 -08:00, committed by GitHub
parent fc44970f49
commit 4a70cd69fa
3 changed files with 17 additions and 7 deletions

changes/query-timeout (new file)

@@ -0,0 +1 @@
+* Reduce the default period of the "simple" live query API (`/api/v1/queries/run`) to 25 seconds to remain below load balancer timeouts.


@@ -2475,10 +2475,17 @@ Deletes the queries specified by ID. Returns the count of queries successfully d
 ### Run live query
 
-Runs one or more live queries against the specified hosts and responds with the results
-over a fixed period of 90 seconds.
+Run one or more live queries against the specified hosts and responds with the results
+collected after 25 seconds.
 
-If you are using this API to run multiple queries at the same time, they are started simultaneously. Response time is capped at 90 seconds from when the API request was received, regardless of how many queries you are running, and regardless whether all results have been gathered or not.
+If multiple queries are provided, they run concurrently. Response time is capped at 25 seconds from
+when the API request was received, regardless of how many queries you are running, and regardless
+whether all results have been gathered or not. This API does not return any results until the fixed
+time period elapses, at which point all of the collected results are returned.
+
+The fixed time period is configurable via environment variable on the Fleet server (eg.
+`FLEET_LIVE_QUERY_REST_PERIOD=90s`). If setting a higher value, be sure that you do not exceed your
+load balancer timeout.
 
 > WARNING: This API endpoint collects responses in-memory (RAM) on the Fleet compute instance handling this request, which can overflow if the result set is large enough. This has the potential to crash the process and/or cause an autoscaling event in your cloud provider, depending on how Fleet is deployed.
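Per the documentation change above, operators who control their load balancer timeouts can raise the period through the environment variable. An illustrative invocation (the 55s value is an example, chosen to stay under a 60s ALB/Nginx timeout; adjust to your own infrastructure):

```shell
# Illustrative: allow a longer collection window than the 25s default.
# Only safe if every proxy/load balancer in front of Fleet times out later than this.
FLEET_LIVE_QUERY_REST_PERIOD=55s fleet serve
```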
@@ -2489,8 +2496,8 @@ If you are using this API to run multiple queries at the same time, they are sta
 
 | Name      | Type   | In   | Description                                   |
 | --------- | ------ | ---- | --------------------------------------------- |
-| query_ids | array  | body | **Required**. The IDs of the queries to run as live queries. |
-| host_ids  | array  | body | **Required**. The IDs of the hosts to run the live queries against. |
+| query_ids | array  | body | **Required**. The IDs of the saved queries to run. |
+| host_ids  | array  | body | **Required**. The IDs of the hosts to target. |
 
 #### Example


@@ -36,13 +36,15 @@ func (r runLiveQueryResponse) error() error { return r.Err }
 func runLiveQueryEndpoint(ctx context.Context, request interface{}, svc fleet.Service) (interface{}, error) {
 	req := request.(*runLiveQueryRequest)
 
+	// The period used here should always be less than the request timeout for any load
+	// balancer/proxy between Fleet and the API client.
 	period := os.Getenv("FLEET_LIVE_QUERY_REST_PERIOD")
 	if period == "" {
-		period = "90s"
+		period = "25s"
 	}
 	duration, err := time.ParseDuration(period)
 	if err != nil {
-		duration = 90 * time.Second
+		duration = 25 * time.Second
 		logging.WithExtras(ctx, "live_query_rest_period_err", err)
 	}