fleet/orbit/pkg
Roberto Dip fa89dff66f
fix race in orbit test (#16589)
The scheduled test run
https://github.com/fleetdm/fleet/actions/runs/7764392848 failed with a
panic because `TestWindowsMDMEnrollmentPrevented` timed out:

```
2024-02-03T05:05:26.3041218Z === RUN   TestWindowsMDMEnrollmentPrevented
2024-02-03T05:05:26.3044251Z === RUN   TestWindowsMDMEnrollmentPrevented/{RenewEnrollmentProfile:false_RotateDiskEncryptionKey:false_NeedsMDMMigration:false_NeedsProgrammaticWindowsMDMEnrollment:true_WindowsMDMDiscoveryEndpoint:http://example.com/_NeedsProgrammaticWindowsMDMUnenrollment:false_PendingScriptExecutionIDs:[]_EnforceBitLockerEncryption:false}
2024-02-03T05:05:26.3047208Z coverage: 2.5% of statements in github.com/fleetdm/fleet/v4/...
2024-02-03T05:05:26.3047963Z panic: test timed out after 1h0m0s
2024-02-03T05:05:26.3048482Z running tests:
2024-02-03T05:05:26.3049005Z 	TestWindowsMDMEnrollmentPrevented (59m52s)
2024-02-03T05:05:26.3052172Z 	TestWindowsMDMEnrollmentPrevented/{RenewEnrollmentProfile:false_RotateDiskEncryptionKey:false_NeedsMDMMigration:false_NeedsProgrammaticWindowsMDMEnrollment:true_WindowsMDMDiscoveryEndpoint:http://example.com/_NeedsProgrammaticWindowsMDMUnenrollment:false_PendingScriptExecutionIDs:[]_EnforceBitLockerEncryption:false} (59m52s)
[...]
2024-02-03T05:05:26.3068624Z goroutine 69 [chan receive]:
2024-02-03T05:05:26.3069997Z github.com/fleetdm/fleet/v4/orbit/pkg/update.TestWindowsMDMEnrollmentPrevented.func2.1({{0xe3ada3, 0x12}, {0x0, 0x0}, {0xe37311, 0xc}})
2024-02-03T05:05:26.3072376Z 	/home/runner/work/fleet/fleet/orbit/pkg/update/notifications_test.go:295 +0x65
2024-02-03T05:05:26.3074514Z github.com/fleetdm/fleet/v4/orbit/pkg/update.(*windowsMDMEnrollmentConfigFetcher).attemptEnrollment(0xc0000f8cf0, {0x0, 0x0, 0x0, 0x1, {0xe3ada3, 0x12}, 0x0, {0x0, 0x0, ...}, ...})
```

I was able to reproduce locally 1/4th of the times, after putting the
following print statements:

```diff
                        if cfg.NeedsProgrammaticWindowsMDMEnrollment {
                                fetcher.execEnrollFn = func(args WindowsMDMEnrollmentArgs) error {
-                                       <-chProceed    // will be unblocked only when allowed
+                                       fmt.Println("fetcher.execEnrollFn A: ", apiCallCount)
+                                       <-chProceed // will be unblocked only when allowed
+                                       fmt.Println("fetcher.execEnrollFn B: ", apiCallCount)
                                        apiCallCount++ // no need for sync, single-threaded call of this func is guaranteed by the fetcher's mutex
                                        return apiErr
                                }
@@ -301,7 +303,9 @@ func TestWindowsMDMEnrollmentPrevented(t *testing.T) {
                                }
                        } else {
                                fetcher.execUnenrollFn = func(args WindowsMDMEnrollmentArgs) error {
-                                       <-chProceed    // will be unblocked only when allowed
+                                       fmt.Println("fetcher.execUnenrollFn A: ", apiCallCount)
+                                       <-chProceed // will be unblocked only when allowed
+                                       fmt.Println("fetcher.execUnenrollFn B: ", apiCallCount)
                                        apiCallCount++ // no need for sync, single-threaded call of this func is guaranteed by the fetcher's mutex
                                        return apiErr
                                }
@@ -317,23 +321,33 @@ func TestWindowsMDMEnrollmentPrevented(t *testing.T) {

                        started := make(chan struct{})
                        go func() {
+                               fmt.Println("before close started")
                                close(started)
+                               fmt.Println("aftre close started")

                                // the first call will block in enroll/unenroll func
+                               fmt.Println("before inner fetchergetconfig")
                                cfg, err := fetcher.GetConfig()
+                               fmt.Println("after inner fetchergetconfig")
                                assertResult(cfg, err)
                        }()

+                       fmt.Println("before started")
                        <-started
+                       fmt.Println("after started")
                        // this call will happen while the first call is blocked in
                        // enroll/unenrollfn, so it won't call the API (won't be able to lock the
                        // mutex). However it will still complete successfully without being
                        // blocked by the other call in progress.
+                       fmt.Println("before first fetchergetconfig")
                        cfg, err := fetcher.GetConfig()
+                       fmt.Println("before first fetchergetconfig")
                        assertResult(cfg, err)

                        // unblock the first call and wait for it to complete
+                       fmt.Println("before close chProceed 1")
                        close(chProceed)
+                       fmt.Println("after close chProceed 2")
                        time.Sleep(100 * time.Millisecond)
```

This is the output I've got every time the test hung:

```
before started
before close started
aftre close started
after started
before first fetchergetconfig
before inner fetchergetconfig
after inner fetchergetconfig
fetcher.execEnrollFn A:  0
```

And this is the output when the tests passed

```
before started
before close started
aftre close started
before inner fetchergetconfig
fetcher.execUnenrollFn A:  0
after started
before first fetchergetconfig
before first fetchergetconfig
before close chProceed 1
after close chProceed 2
fetcher.execUnenrollFn B:  0
after inner fetchergetconfig
fetcher.execUnenrollFn A:  1
fetcher.execUnenrollFn B:  1
```

Note how the deadlock occurs when `GetConfig` is called first outside of
the goroutine. I added some logic to prevent this, but I'm confident
there must be a better way to accomplish the same. cc: @mna you're the
king of concurrency, do you have any ideas?
2024-02-05 12:06:25 -03:00
..
augeas update augeas lense "simplevars.aug" (#12922) 2023-07-25 09:22:41 -07:00
bitlocker make sure we report the correct error during BitLocker encryption (#16096) 2024-01-15 12:31:15 -03:00
build Add user agent to Orbit HTTP client (#5429) 2022-05-02 11:03:49 -07:00
constant Remotely configure fleetd update channels (#15848) 2024-01-02 17:59:40 -03:00
cryptoinfo Add Kolide osquery tables 2023-11-01 20:11:35 -06:00
dataflatten Add Kolide osquery tables 2023-11-01 20:11:35 -06:00
execuser launch Nudge using /usr/bin/open (#10051) 2023-02-23 14:48:40 -03:00
go-paniclog Fix Fleet Desktop bugs on Windows (#16402) 2024-01-29 18:52:55 -03:00
insecure Make creation of http.Client uniform across the codebase (#3097) 2021-11-24 15:56:54 -05:00
keystore Enroll secret in macOS keychain and Windows Credential Manager (#16068) 2024-01-16 06:51:37 -06:00
logging environment variable to disable orbit enroll logs (#13519) 2023-08-25 15:25:07 -06:00
osquery Allow custom osquery database on fleetd (#16554) 2024-02-05 09:41:06 -03:00
osservice 8009 fleet desktop icon duplication (#8017) 2022-10-13 10:58:37 -03:00
packaging Allow custom osquery database on fleetd (#16554) 2024-02-05 09:41:06 -03:00
platform Fix Fleet Desktop bugs on Windows (#16402) 2024-01-29 18:52:55 -03:00
process Fleetctl to package .app bundles for osquery (and changes for orbit to support them) (#4393) 2022-03-15 16:04:12 -03:00
profiles Update fleetd for macOS hosts to look for custom end user email field in Fleet MDM enrollment profile (#15761) 2024-01-02 17:45:11 -03:00
scripts Queued scripts feature (#16300) 2024-01-29 11:37:54 -03:00
table Remotely configure fleetd update channels (#15848) 2024-01-02 17:59:40 -03:00
token Fixing fleetd to NOT make unnecessary duplicate call to orbit/device_token endpoint. (#15543) 2023-12-10 17:00:24 -06:00
update fix race in orbit test (#16589) 2024-02-05 12:06:25 -03:00
user fix: don't attempt to launch fleet desktop until the user is logged into GUI (#16090) 2024-01-17 10:00:28 -05:00
useraction Update end user migration copy (#14158) 2023-09-27 12:10:25 -04:00
windows Add Kolide osquery tables 2023-11-01 20:11:35 -06:00