fleet/infrastructure/sandbox
Zachary Winnerman 82ba1a00a2
Demo packaging (#7020)
2022-08-05 11:41:41 -04:00

Terraform for the Fleet Demo Environment

This folder holds the infrastructure code for Fleet's demo environment.

This readme itself is intended for infrastructure developers. If you aren't an infrastructure developer, please see https://sandbox.fleetdm.com/openapi.json for documentation.

Instance state machine

provisioned -> unclaimed -> claimed -> [destroyed]

provisioned means the instance was created with "terraform apply", but its installers have not been generated yet, so it is not ready for customers. unclaimed means it's ready for a customer. claimed means it's already in use by a customer. [destroyed] isn't a state you'll see in DynamoDB; it means that everything has been torn down.
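
The transitions above can be sketched as a small lookup. This is an illustration only: the function name next_state is hypothetical and is not part of this repository, and it does not reflect how the provisioners implement the lifecycle.

```shell
# Print the state that follows the given one, per the diagram above.
# Returns non-zero for unknown states and for "destroyed" (nothing follows it).
# NOTE: next_state is a hypothetical helper, not something this repo provides.
next_state() {
  case "$1" in
    provisioned) echo unclaimed ;;
    unclaimed)   echo claimed ;;
    claimed)     echo destroyed ;;
    *)           return 1 ;;
  esac
}
```

For example, `next_state unclaimed` prints the state an instance moves to once a customer takes it.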

Bugs

  1. module.shared-infrastructure.kubernetes_manifest.targetgroupbinding sometimes fails to apply. If it causes problems, comment it out.
  2. On a fresh apply, module.shared-infrastructure.aws_acm_certificate.main has to be targeted first. A normal apply can follow.
  3. If errors happen, check whether applying again fixes them.
  4. There is a secret for Apple signing whose values are not provided by this code. If you destroy and re-apply this secret, it has to be filled in manually.
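
The fresh-apply workaround in item 2 might look like the sketch below. The wrapper function fresh_apply and the TERRAFORM override variable are assumptions added for illustration. Only the resource address comes from this readme, and -target is the standard Terraform CLI option for limiting an apply to one resource.

```shell
# TERRAFORM is a hypothetical override; set TERRAFORM=echo to preview the commands.
TERRAFORM="${TERRAFORM:-terraform}"

# Hypothetical wrapper for a first-time apply.
fresh_apply() {
  # Create the ACM certificate on its own first (bug 2 above).
  "$TERRAFORM" apply -target=module.shared-infrastructure.aws_acm_certificate.main || return 1
  # Then run a normal apply for everything else.
  "$TERRAFORM" apply
}
```

Running `TERRAFORM=echo fresh_apply` prints the two commands without touching any infrastructure.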

Maintenance commands

Refresh Fleet instances

for i in $(aws dynamodb scan --table-name sandbox-prod-lifecycle | jq -r '.Items[] | select(.State.S == "unclaimed") | .ID.S'); do helm uninstall "$i"; aws dynamodb delete-item --table-name sandbox-prod-lifecycle --key "{\"ID\": {\"S\": \"${i}\"}}"; done

Clean up instances that are running but not tracked

for i in $( (aws dynamodb scan --table-name sandbox-prod-lifecycle | jq -r '.Items[] | .ID.S'; aws dynamodb scan --table-name sandbox-prod-lifecycle | jq -r '.Items[] | .ID.S'; helm list | tail -n +2 | cut -f 1) | sort | uniq -u); do helm uninstall "$i"; done
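
The command above scans the table twice on purpose. Every tracked ID then appears at least twice in the combined list, so uniq -u keeps only the names that come from helm alone. Here is a minimal sketch of the same trick on plain files. The function name untracked_ids and the two input files are hypothetical.

```shell
# Print names that are running but not tracked.
# $1: file of tracked IDs, one per line (for example, from the DynamoDB scan)
# $2: file of running release names, one per line (for example, from helm list)
# NOTE: untracked_ids is a hypothetical helper, not part of this repo.
untracked_ids() {
  # Listing the tracked IDs twice means none of them can be unique,
  # so "uniq -u" only emits names that appear in the running list alone.
  { cat "$1"; cat "$1"; cat "$2"; } | sort | uniq -u
}
```

IDs that are tracked but no longer running are also dropped, because they appear twice. Only the untracked releases are printed.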

Clean up instances that failed to provision

for i in $(aws dynamodb scan --table-name sandbox-prod-lifecycle | jq -r '.Items[] | select(.State.S == "provisioned") | .ID.S'); do helm uninstall "$i"; aws dynamodb delete-item --table-name sandbox-prod-lifecycle --key "{\"ID\": {\"S\": \"${i}\"}}"; done
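
The refresh and failed-provision loops differ only in the state they select, so they could share one function. The sketch below assumes the table name and key shape shown in the commands above. The names TABLE, dynamo_key, and cleanup_state are hypothetical and not part of this repository.

```shell
# Table name taken from the commands above.
TABLE="sandbox-prod-lifecycle"

# Build the DynamoDB key JSON for one instance ID (hypothetical helper).
dynamo_key() {
  printf '{"ID": {"S": "%s"}}' "$1"
}

# Uninstall and untrack every instance in the given state (hypothetical helper).
# $1: state to clean up, for example "unclaimed" or "provisioned"
cleanup_state() {
  state="$1"
  aws dynamodb scan --table-name "$TABLE" \
    | jq -r --arg s "$state" '.Items[] | select(.State.S == $s) | .ID.S' \
    | while read -r id; do
        helm uninstall "$id"
        aws dynamodb delete-item --table-name "$TABLE" --key "$(dynamo_key "$id")"
      done
}
```

With this in place, `cleanup_state unclaimed` would correspond to the refresh command and `cleanup_state provisioned` to the failed-provision cleanup.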

TODOs

  1. JITProvisioner needs to return proper errors
  2. Create and use a different KMS key for installers
  3. Set sane scale levels for prod
  4. Allow for parallel spin-up of sandbox instances (PreProvisioner)
  5. Run FLUSHDB (https://redis.io/commands/flushdb/) during the teardown process
  6. Name state machines something random and track the new name in DynamoDB