mirror of
https://github.com/empayre/fleet.git
synced 2024-11-06 08:55:24 +00:00
Reference Architectures (#3712)
Added reference architectures using https://docs.gitlab.com/ee/administration/reference_architectures/ as inspiration.
- updated terraform based on usage feedback
- pinned fleet docker version in terraform so as not to get unexpected upgrades when applying
- updated some documentation around applying migration tasks
parent
9dd6968c5d
commit
d650423be0
302
docs/02-Deploying/06-Reference-Architectures.md
Normal file
@@ -0,0 +1,302 @@
# Reference Architectures

A single VPS running Fleet can support hundreds, if not thousands, of hosts. This page details an
[opinionated view](https://github.com/fleetdm/fleet/tree/main/tools/terraform) of running Fleet in a production environment,
along with configuration strategies that enable High Availability (HA).

## Availability Components

There are a few strategies that can be used to ensure high availability:
- Database HA
- Traffic load balancing

### Database HA

Fleet recommends RDS Aurora MySQL when running on AWS. More details about backups/snapshots can be found
[here](https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/Aurora.Managing.Backups.html). It is also
possible to dynamically scale read replicas to increase performance and [enable database fail-over](https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/Concepts.AuroraHighAvailability.html).
For more advanced configurations, [Aurora Global](https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/aurora-global-database.html) can
span multiple regions (_not included in the [reference terraform](https://github.com/fleetdm/fleet/tree/main/tools/terraform)_).

In some cases, adding a read replica can improve database performance for specific access patterns. For example,
read performance can benefit when automating against the API or with `fleetctl`.

### Traffic load balancing
Load balancing distributes request traffic over many instances of the backend application. An AWS Application
Load Balancer can also [offload SSL termination](https://docs.aws.amazon.com/elasticloadbalancing/latest/application/create-https-listener.html), freeing Fleet to spend the majority of its allocated compute
on its core functionality. More details about ALB can be found [here](https://docs.aws.amazon.com/elasticloadbalancing/latest/application/introduction.html).

_**Note: if using the [terraform reference architecture](https://github.com/fleetdm/fleet/tree/main/tools/terraform#terraform), all configurations can dynamically scale based on load (CPU/memory). All configurations
assume On-Demand pricing (savings are available through Reserved Instances). Calculations do not take into account NAT gateway charges or other networking-related ingress/egress costs.**_

### Example Configuration breakpoints
#### [Up to 1000 hosts](https://calculator.aws/#/estimate?id=ae7d7ddec64bb979f3f6611d23616b1dff0e8dbd)

| Fleet instances | CPU Units     | RAM |
|-----------------|---------------|-----|
| 1 Fargate task  | 512 CPU Units | 4GB |

| Dependencies | Version                 | Instance type |
|--------------|-------------------------|---------------|
| Redis        | 6                       | t4g.small     |
| MySQL        | 5.7.mysql_aurora.2.10.0 | db.t3.small   |

#### [Up to 25000 hosts](https://calculator.aws/#/estimate?id=4a3e3168275967d1e79a3d1fcfedc5b17d67a271)

| Fleet instances  | CPU Units      | RAM |
|------------------|----------------|-----|
| 10 Fargate tasks | 1024 CPU Units | 4GB |

| Dependencies | Version                 | Instance type |
|--------------|-------------------------|---------------|
| Redis        | 6                       | m6g.large     |
| MySQL        | 5.7.mysql_aurora.2.10.0 | db.r6g.large  |


#### [Up to 150000 hosts](https://calculator.aws/#/estimate?id=6a852ef873c0902f0c953045dec3e29fcd32aef8)

| Fleet instances  | CPU Units      | RAM |
|------------------|----------------|-----|
| 30 Fargate tasks | 1024 CPU Units | 4GB |

| Dependencies | Version                 | Instance type  | Nodes |
|--------------|-------------------------|----------------|-------|
| Redis        | 6                       | m6g.large      | 3     |
| MySQL        | 5.7.mysql_aurora.2.10.0 | db.m6g.8xlarge | 1     |


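The breakpoints above can be read as a simple lookup from host count to configuration. The following is a minimal sketch of that lookup, not part of Fleet or the reference terraform: the `Tier` class and `pick_tier` function are hypothetical names, and the Redis node count of 1 for the two smaller tiers is an assumption, since those tables do not list a node count.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    """One row of the example configuration breakpoints."""
    max_hosts: int
    fargate_tasks: int
    cpu_units: int
    ram_gb: int
    redis_instance: str
    redis_nodes: int
    mysql_instance: str


# Values copied from the tables above. The first two tables list no node
# count, so redis_nodes=1 for those tiers is an assumption.
TIERS = (
    Tier(1_000, 1, 512, 4, "t4g.small", 1, "db.t3.small"),
    Tier(25_000, 10, 1024, 4, "m6g.large", 1, "db.r6g.large"),
    Tier(150_000, 30, 1024, 4, "m6g.large", 3, "db.m6g.8xlarge"),
)


def pick_tier(host_count: int) -> Tier:
    """Return the smallest documented tier that covers host_count."""
    if host_count < 0:
        raise ValueError("host_count must be non-negative")
    for tier in TIERS:
        if host_count <= tier.max_hosts:
            return tier
    raise ValueError(f"no documented breakpoint covers {host_count} hosts")


if __name__ == "__main__":
    tier = pick_tier(12_000)
    print(f"{tier.fargate_tasks} Fargate tasks at {tier.cpu_units} CPU units, "
          f"Redis {tier.redis_instance}, MySQL {tier.mysql_instance}")
```

Host counts beyond the largest documented breakpoint raise an error rather than extrapolating, since the tables give no guidance past 150000 hosts.
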
## Cloud Providers

### AWS

The AWS reference architecture can be found [here](https://github.com/fleetdm/fleet/tree/main/tools/terraform). This configuration includes:

- VPC
  - Subnets
    - Public & Private
  - ACLs
  - Security Groups
- ECS as the container orchestrator
  - Fargate for underlying compute
  - Task roles via IAM
- RDS Aurora MySQL 5.7
- ElastiCache Redis Engine
- Firehose osquery log destination
  - S3 bucket sync to allow further ingestion/processing
- [Monitoring via CloudWatch alarms](https://github.com/fleetdm/fleet/tree/main/tools/terraform/monitoring)

Some AWS services used in the provider reference architecture, such as Firehose, are billed as pay-per-use. This means that osquery scheduled query frequency
directly affects how much these services cost, which is worth keeping in mind when configuring Fleet in AWS.

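To make the relationship between query frequency and pay-per-use cost concrete, the sketch below estimates the daily log volume sent to a log destination. It is an illustrative upper bound under stated assumptions: every scheduled query is assumed to emit a result of `avg_result_bytes` on every run, and all parameter names are hypothetical. No AWS price is included; multiply the result by the current per-GB rate for your region.

```python
SECONDS_PER_DAY = 86_400


def daily_log_volume_gb(hosts: int, queries_per_host: int,
                        interval_seconds: int, avg_result_bytes: int) -> float:
    """Estimate GB of query results logged per day.

    Assumes each scheduled query emits one result of avg_result_bytes per
    run on every host, which overestimates volume for queries that only
    log changes.
    """
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    runs_per_day = SECONDS_PER_DAY / interval_seconds
    total_bytes = hosts * queries_per_host * runs_per_day * avg_result_bytes
    return total_bytes / 1e9


if __name__ == "__main__":
    hourly = daily_log_volume_gb(1000, 10, 3600, 1000)
    every_minute = daily_log_volume_gb(1000, 10, 60, 1000)
    print(f"hourly schedule: {hourly:.2f} GB/day")
    print(f"one-minute schedule: {every_minute:.2f} GB/day")
```

Volume scales inversely with the interval, so moving a query from an hourly to a one-minute schedule multiplies its contribution by 60.
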
#### AWS Terraform CI/CD IAM Permissions
The following permissions are the minimum required to apply AWS terraform resources:
```json
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "ec2:*",
                "cloudwatch:*",
                "s3:*",
                "lambda:*",
                "ecs:*",
                "rds:*",
                "rds-data:*",
                "secretsmanager:*",
                "pi:*",
                "ecr:*",
                "iam:*",
                "aps:*",
                "vpc:*",
                "kms:*",
                "elasticloadbalancing:*",
                "ce:*",
                "cur:*",
                "logs:*",
                "cloudformation:*",
                "ssm:*",
                "sns:*",
                "elasticache:*",
                "application-autoscaling:*",
                "acm:*",
                "route53:*",
                "dynamodb:*",
                "kinesis:*",
                "firehose:*"
            ],
            "Resource": "*"
        }
    ]
}
```

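Because the policy above grants every action on each listed service, it can be useful to audit which services a given policy document opens up completely before attaching it to a CI/CD role. The following is a small sketch of such a check; `wildcard_services` is a hypothetical helper, not an AWS or Fleet API, and it only inspects `Allow` statements with plain `service:*` actions.

```python
import json


def wildcard_services(policy_json: str) -> list[str]:
    """Return the sorted service prefixes granted as "service:*"."""
    policy = json.loads(policy_json)
    statements = policy["Statement"]
    # IAM allows a single statement object as well as a list of them.
    if isinstance(statements, dict):
        statements = [statements]
    found = set()
    for statement in statements:
        if statement.get("Effect") != "Allow":
            continue
        actions = statement.get("Action", [])
        # "Action" may be a single string or a list of strings.
        if isinstance(actions, str):
            actions = [actions]
        for action in actions:
            service, _, name = action.partition(":")
            if name == "*":
                found.add(service)
    return sorted(found)


if __name__ == "__main__":
    example = json.dumps({
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Action": ["ecs:*", "rds:*", "s3:GetObject"],
            "Resource": "*",
        }],
    })
    print(wildcard_services(example))
```
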
### GCP

Coming soon

### Azure

Coming soon

### Render

Fleet can be deployed using [Render's IaC](https://render.com/docs/infrastructure-as-code); see [the repository](https://github.com/edwardsb/fleet-on-render) for full details.
```yaml
services:
  - name: fleet
    plan: standard
    type: web
    env: docker
    healthCheckPath: /healthz
    envVars:
      - key: FLEET_MYSQL_ADDRESS
        fromService:
          name: fleet-mysql
          type: pserv
          property: hostport
      - key: FLEET_MYSQL_DATABASE
        fromService:
          name: fleet-mysql
          type: pserv
          envVarKey: MYSQL_DATABASE
      - key: FLEET_MYSQL_PASSWORD
        fromService:
          name: fleet-mysql
          type: pserv
          envVarKey: MYSQL_PASSWORD
      - key: FLEET_MYSQL_USERNAME
        fromService:
          name: fleet-mysql
          type: pserv
          envVarKey: MYSQL_USER
      - key: FLEET_REDIS_ADDRESS
        fromService:
          name: fleet-redis
          type: pserv
          property: hostport
      - key: FLEET_SERVER_TLS
        value: false
      - key: PORT
        value: 8080

  - name: fleet-mysql
    type: pserv
    env: docker
    repo: https://github.com/render-examples/mysql
    branch: mysql-5
    disk:
      name: mysql
      mountPath: /var/lib/mysql
      sizeGB: 10
    envVars:
      - key: MYSQL_DATABASE
        value: fleet
      - key: MYSQL_PASSWORD
        generateValue: true
      - key: MYSQL_ROOT_PASSWORD
        generateValue: true
      - key: MYSQL_USER
        value: fleet

  - name: fleet-redis
    type: pserv
    env: docker
    repo: https://github.com/render-examples/redis
    disk:
      name: redis
      mountPath: /var/lib/redis
      sizeGB: 10
```

### Digital Ocean

Use Digital Ocean's [App Spec](https://docs.digitalocean.com/products/app-platform/concepts/app-spec/) to deploy Fleet on the [App Platform](https://docs.digitalocean.com/products/app-platform/):
```yaml
alerts:
- rule: DEPLOYMENT_FAILED
- rule: DOMAIN_FAILED
databases:
- cluster_name: fleet-redis
  engine: REDIS
  name: fleet-redis
  production: true
  version: "6"
- cluster_name: fleet-mysql
  db_name: fleet
  db_user: fleet
  engine: MYSQL
  name: fleet-mysql
  production: true
  version: "8"
domains:
- domain: demo.fleetdm.com
  type: PRIMARY
envs:
- key: FLEET_MYSQL_ADDRESS
  scope: RUN_TIME
  value: ${fleet-mysql.HOSTNAME}:${fleet-mysql.PORT}
- key: FLEET_MYSQL_PASSWORD
  scope: RUN_TIME
  value: ${fleet-mysql.PASSWORD}
- key: FLEET_MYSQL_USERNAME
  scope: RUN_TIME
  value: ${fleet-mysql.USERNAME}
- key: FLEET_MYSQL_DATABASE
  scope: RUN_TIME
  value: ${fleet-mysql.DATABASE}
- key: FLEET_REDIS_ADDRESS
  scope: RUN_TIME
  value: ${fleet-redis.HOSTNAME}:${fleet-redis.PORT}
- key: FLEET_SERVER_TLS
  scope: RUN_AND_BUILD_TIME
  value: "false"
- key: FLEET_REDIS_PASSWORD
  scope: RUN_AND_BUILD_TIME
  value: ${fleet-redis.PASSWORD}
- key: FLEET_REDIS_USE_TLS
  scope: RUN_AND_BUILD_TIME
  value: "true"
jobs:
- envs:
  - key: DATABASE_URL
    scope: RUN_TIME
    value: ${fleet-redis.DATABASE_URL}
  image:
    registry: fleetdm
    registry_type: DOCKER_HUB
    repository: fleet
    tag: latest
  instance_count: 1
  instance_size_slug: basic-xs
  kind: PRE_DEPLOY
  name: fleet-migrate
  run_command: fleet prepare --no-prompt=true db
  source_dir: /
name: fleet
region: nyc
services:
- envs:
  - key: FLEET_VULNERABILITIES_DATABASES_PATH
    scope: RUN_TIME
    value: /home/fleet
  - key: FLEET_BETA_SOFTWARE_INVENTORY
    scope: RUN_TIME
    value: "1"
  health_check:
    http_path: /healthz
  http_port: 8080
  image:
    registry: fleetdm
    registry_type: DOCKER_HUB
    repository: fleet
    tag: latest
  instance_count: 1
  instance_size_slug: basic-xs
  name: fleet
  routes:
  - path: /
  run_command: fleet serve
  source_dir: /
```
@@ -13,9 +13,11 @@ terraform {
    }
  }
}

provider "aws" {
  region = "us-east-2"
}

data "aws_caller_identity" "current" {}
data "aws_region" "current" {}

@@ -278,7 +280,7 @@ resource "aws_cloudwatch_metric_alarm" "redis-current-connections" {
   alarm_name = "redis-current-connections-${each.key}-${terraform.workspace}"
   alarm_description = "Redis current connections for node ${each.key}"
   comparison_operator = "LessThanLowerOrGreaterThanUpperThreshold"
-  evaluation_periods = "3"
+  evaluation_periods = "5"
   threshold_metric_id = "e1"
   alarm_actions = [aws_sns_topic.cloudwatch_alarm_topic.arn]
   ok_actions = [aws_sns_topic.cloudwatch_alarm_topic.arn]
@@ -286,7 +288,7 @@ resource "aws_cloudwatch_metric_alarm" "redis-current-connections" {
 
   metric_query {
     id = "e1"
-    expression = "ANOMALY_DETECTION_BAND(m1)"
+    expression = "ANOMALY_DETECTION_BAND(m1,20)"
     label = "Current Connections (Expected)"
     return_data = "true"
   }
@@ -297,7 +299,7 @@ resource "aws_cloudwatch_metric_alarm" "redis-current-connections" {
     metric {
       metric_name = "CurrConnections"
       namespace = "AWS/ElastiCache"
-      period = "300"
+      period = "600"
       stat = "Average"
       unit = "Count"
 
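The alarm change above raises both `evaluation_periods` and `period`, which lengthens how long the metric must stay outside the anomaly band before the alarm fires. A minimal sketch of that arithmetic follows. It assumes every evaluation period must breach before the alarm changes state (that is, no smaller `datapoints_to_alarm` is set elsewhere in the resource, which this hunk does not show); the function name is hypothetical.

```python
def minimum_seconds_to_alarm(evaluation_periods: int, period_seconds: int) -> int:
    """Shortest sustained breach, in seconds, that can trigger the alarm.

    Assumes all evaluation periods must be breaching, i.e. no separate
    datapoints-to-alarm setting lowers the requirement.
    """
    if evaluation_periods <= 0 or period_seconds <= 0:
        raise ValueError("both arguments must be positive")
    return evaluation_periods * period_seconds


if __name__ == "__main__":
    before = minimum_seconds_to_alarm(3, 300)  # values removed by the diff
    after = minimum_seconds_to_alarm(5, 600)   # values added by the diff
    print(f"before: {before // 60} minutes, after: {after // 60} minutes")
```

Under that assumption, the change moves the shortest alerting window from 15 minutes to 50 minutes, trading slower detection for fewer alarms on brief connection spikes.
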
@@ -1,23 +1,7 @@
-resource "aws_route53_zone" "dogfood_fleetctl_com" {
-  name = var.domain_fleetctl
-}
-
 resource "aws_route53_zone" "dogfood_fleetdm_com" {
   name = var.domain_fleetdm
 }
 
-resource "aws_route53_record" "dogfood_fleetctl_com" {
-  zone_id = aws_route53_zone.dogfood_fleetctl_com.zone_id
-  name = var.domain_fleetctl
-  type = "A"
-
-  alias {
-    name = aws_alb.main.dns_name
-    zone_id = aws_alb.main.zone_id
-    evaluate_target_health = false
-  }
-}
-
 resource "aws_route53_record" "dogfood_fleetdm_com" {
   zone_id = aws_route53_zone.dogfood_fleetdm_com.zone_id
   name = var.domain_fleetdm
@@ -30,15 +14,6 @@ resource "aws_route53_record" "dogfood_fleetdm_com" {
   }
 }
 
-resource "aws_acm_certificate" "dogfood_fleetctl_com" {
-  domain_name = var.domain_fleetctl
-  validation_method = "DNS"
-
-  lifecycle {
-    create_before_destroy = true
-  }
-}
-
 resource "aws_acm_certificate" "dogfood_fleetdm_com" {
   domain_name = var.domain_fleetdm
   validation_method = "DNS"
@@ -48,23 +23,6 @@ resource "aws_acm_certificate" "dogfood_fleetdm_com" {
   }
 }
 
-resource "aws_route53_record" "dogfood_fleetctl_com_validation" {
-  for_each = {
-    for dvo in aws_acm_certificate.dogfood_fleetctl_com.domain_validation_options : dvo.domain_name => {
-      name = dvo.resource_record_name
-      record = dvo.resource_record_value
-      type = dvo.resource_record_type
-    }
-  }
-
-  allow_overwrite = true
-  name = each.value.name
-  records = [each.value.record]
-  ttl = 60
-  type = each.value.type
-  zone_id = aws_route53_zone.dogfood_fleetctl_com.zone_id
-}
-
 resource "aws_route53_record" "dogfood_fleetdm_com_validation" {
   for_each = {
     for dvo in aws_acm_certificate.dogfood_fleetdm_com.domain_validation_options : dvo.domain_name => {
@@ -82,11 +40,6 @@ resource "aws_route53_record" "dogfood_fleetdm_com_validation" {
   zone_id = aws_route53_zone.dogfood_fleetdm_com.zone_id
 }
 
-resource "aws_acm_certificate_validation" "dogfood_fleetctl_com" {
-  certificate_arn = aws_acm_certificate.dogfood_fleetctl_com.arn
-  validation_record_fqdns = [for record in aws_route53_record.dogfood_fleetctl_com_validation : record.fqdn]
-}
-
 resource "aws_acm_certificate_validation" "dogfood_fleetdm_com" {
   certificate_arn = aws_acm_certificate.dogfood_fleetdm_com.arn
   validation_record_fqdns = [for record in aws_route53_record.dogfood_fleetdm_com_validation : record.fqdn]
@@ -66,21 +66,11 @@ Replace `cert_arn` with the **certificate ARN** that applies to your environment
 
 ### Migrating the DB
 
-After applying terraform run the following to migrate the database:
+After applying terraform, run the following to migrate the database (`<private_subnet_id>` and `<desired_security_group>` can be obtained from the terraform output after applying; any value will suffice):
 ```
 aws ecs run-task --cluster fleet-backend --task-definition fleet-migrate:<latest_version> --launch-type FARGATE --network-configuration "awsvpcConfiguration={subnets=[<private_subnet_id>],securityGroups=[<desired_security_group>]}"
 ```
 
-### Connecting a Host
+### Connecting a host
 
-Build orbit:
-
-```
-fleetctl package --type=msi --fleet-url=<alb_dns> --enroll-secret=<secret>
-```
-
-Run orbit:
-
-```
-"C:\Program Files\Orbit\bin\orbit\orbit.exe" --root-dir "C:\Program Files\Orbit\." --log-file "C:\Program Files\Orbit\orbit-log.txt" --fleet-url "http://<alb_dns>" --enroll-secret-path "C:\Program Files\Orbit\secret.txt" --update-url "https://tuf.fleetctl.com" --orbit-channel "stable" --osqueryd-channel "stable"
-```
+Use your Route53 entry as your `fleet-url` [following these details.](https://fleetdm.com/docs/using-fleet/adding-hosts)
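The migration command in the README hunk above takes its subnet and security group from the terraform output, so in CI it may be convenient to assemble the argument list programmatically. The sketch below only builds and prints the command; it does not call AWS. The function name and the example IDs are hypothetical, while the cluster name, task family, and flags are taken from the command shown in the diff.

```python
import shlex


def migrate_command(subnet_id: str, security_group: str, task_revision: int,
                    cluster: str = "fleet-backend") -> list[str]:
    """Build the `aws ecs run-task` argument list for the migration task."""
    network = (
        "awsvpcConfiguration="
        f"{{subnets=[{subnet_id}],securityGroups=[{security_group}]}}"
    )
    return [
        "aws", "ecs", "run-task",
        "--cluster", cluster,
        "--task-definition", f"fleet-migrate:{task_revision}",
        "--launch-type", "FARGATE",
        "--network-configuration", network,
    ]


if __name__ == "__main__":
    # Placeholder IDs; substitute the values from your terraform output.
    cmd = migrate_command("subnet-0abc", "sg-0def", 3)
    # shlex.join quotes the network configuration for safe pasting into a shell.
    print(shlex.join(cmd))
```

Passing the list to `subprocess.run(cmd, check=True)` would execute it without going through a shell, which avoids quoting issues around the braces and brackets.
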
@@ -60,7 +60,7 @@ variable "database_name" {
 
 variable "fleet_image" {
   description = "the name of the container image to run"
-  default = "fleetdm/fleet"
+  default = "fleetdm/fleet:v4.8.0"
 }
 
 variable "software_inventory" {
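The `fleet_image` change above pins the container image to a specific tag so that applying terraform does not pull an unexpected upgrade. A small sketch of a check that could guard against unpinned image references in CI follows; `is_pinned` is a hypothetical helper, and treating `latest` as unpinned is a policy assumption rather than anything enforced by Docker or terraform.

```python
def is_pinned(image: str) -> bool:
    """Return True if the image reference names a digest or a non-latest tag."""
    # A digest reference (name@sha256:...) always identifies one image.
    if "@" in image:
        return True
    # Only look for a tag after the last "/", so a registry port such as
    # "registry:5000/fleetdm/fleet" is not mistaken for a tag.
    last_segment = image.rsplit("/", 1)[-1]
    if ":" not in last_segment:
        return False
    tag = last_segment.rsplit(":", 1)[1]
    return tag not in ("", "latest")


if __name__ == "__main__":
    for ref in ("fleetdm/fleet", "fleetdm/fleet:latest", "fleetdm/fleet:v4.8.0"):
        print(f"{ref}: {'pinned' if is_pinned(ref) else 'not pinned'}")
```
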