There are many ways to deploy Fleet. Last time, we looked at deploying [Fleet on Render](https://fleetdm.com/deploy/deploying-fleet-on-render). This time, we’re going to deploy Fleet on AWS with Terraform IaC (infrastructure as code).
- Clone [Fleet](https://github.com/fleetdm/fleet) or copy the [Terraform files](https://github.com/fleetdm/fleet/tree/main/infrastructure/dogfood/terraform/aws)
To bootstrap our [remote state](https://www.terraform.io/docs/language/state/remote.html) resources, we’ll create a S3 bucket and DynamoDB table using the values defined in `remote-state/main.tf`. We'll override the `prefix` terraform variable to get unique resources and the `region` variable to use the same region set in the AWS CLI (you can run `aws configure get region` to see what region is set).
After typing `yes` you should have a new S3 bucket named `<your_org>-fleet-terraform-remote-state` And the table `<your_org>-fleet-terraform-state-lock`. Keep these handy because we’ll need them in the following steps.
You may see a warning during this process. It is safe to ignore.
By default, both RDS & Elasticache are encrypted at rest and encrypted in transit. The S3 buckets are also server-side encrypted using AWS managed KMS keys.
### Networking
For more details on the networking configuration take a look at https://github.com/terraform-aws-modules/terraform-aws-vpc. In the configuration Fleet provides
we are creating public and private subnets in addition to separate data layer for RDS and Elasticache. The configuration also defaults
to using a single NAT Gateway.
### Backups
RDS daily snapshots are enabled by default and retention is set to 30 days. If there is ever a need a snapshot identifier can be supplied via terraform variable (`rds_initial_snapshot`)
in order to create the database from a previous snapshot.
Next, we’ll update the terraform setup in the `/aws` directory's [main.tf](https://github.com/fleetdm/fleet/tree/main/infrastructure/dogfood/terraform/aws/main.tf) to use the S3 Bucket and DynamoDB referenced above:
We’ll also need a `tfvars` file to make some environment-specific variable overrides. Create a file in the `/aws` directory named `prod.tfvars`, and copy/paste the variables below:
Feel free to use whatever values you would like for the `osquery_results_s3_bucket` and `osquery_status_s3_bucket`. Just keep in mind that they need to be unique across AWS. We're setting the initial capacity for `fleet` to `0` to prevent the fleet service from attempting to start until setup is complete. Note that your AWS CLI region should be set to the same region you intend to provision the resources. All regions are compatible.
You should see the planned output, and you will need to confirm the creation. Review this output, and type `yes` when you are ready. Note this will take up to 30 minutes to apply.
During this process, terraform will create a `hosted zone` with an `NS` record for your domain and request a certificate from [AWS Certificate Manager (ACM)](https://aws.amazon.com/certificate-manager/). While the process is running, you'll need to add the `NS` records to your domain as well.
Let’s say we own `queryops.com` and have an ACM certificate issued to it. We want to host Fleet at `fleet.queryops.com` so in this case, we’ll need to hand nameserver authority over to `fleet.queryops.com` before ACM will verify via DNS and issue the certificate. To make this work, we need to create an `NS` record on `queryops.com`, and put the same `NS` records that get created after terraform creates the `fleet.queryops.com` hosted zone.
Replace `<backend_security_group_id>`, `<ecs_cluster_name>`, and `<fleet-migration-task-revision>` with the corresponding values from the previous output. Replace `<private_subnet_id>` with one of the private subnets from the previous output.
At this point, you can go to your Fleet domain and start [using Fleet](https://fleetdm.com/docs/using-fleet).
## Start the Fleet service
Now that Fleet has everything it needs, we're ready to start the service.
First, we'll need to edit our production variables to increase Fleet's capacity and allow the service to start. In the `prod.tvars` file, update `fleet_min_capacity` and `fleet_max_capacity`:
Once the process completes, your Fleet instance is ready to use! Check out the documentation for more details on [using Fleet](https://fleetdm.com/docs/using-fleet).
Setting up all the required infrastructure to run a dedicated web service in AWS can be a daunting task. The Fleet team’s goal is to provide a solid base to build from. As most AWS environments have their own specific needs and requirements, this base is intended to be modified and tailored to your specific needs.
- if the migration task has not run successfully before the Fleet backend attempts to start it will cause the container to repeatedly fail and this can exceed docker pull request rate limits
- scale down the fleet backend to zero tasks and let the pull request limit reset, this can take from 15 minutes to an hour
- attempt to run migrations and then scale the Fleet backend back up
It is possible to run into multiple AWS scaling limitations depending on the size of the Fleet deployment, frequency of queries, and amount of data returned.
The Fleet backend is designed to scale horizontally (this is also enabled by default using target-tracking autoscaling policies out-of-the-box).
However, it is still possible to run into AWS scaling limitations such as:
This particular issue would only be encountered for the largest of Fleet deployments and can occur because of high volume of data and/or number of hosts, if you notice these errors in the application logs or from the AWS Firehose console try the following:
1. Check the service limits https://docs.aws.amazon.com/firehose/latest/dev/limits.html
2. evaluate the amount of data returned using Fleet's live query feature
3. reduce the frequency of scheduled queries
4. reduce the amount of data returned for scheduled queries (Snapshot vs Differential queries https://osquery.readthedocs.io/en/stable/deployment/logging/)
More troubleshooting tips can be found here https://fleetdm.com/docs/deploying/faq