fleet/orbit/pkg/update
Roberto Dip fa89dff66f
fix race in orbit test (#16589)
The scheduled test run
https://github.com/fleetdm/fleet/actions/runs/7764392848 failed with a
panic because `TestWindowsMDMEnrollmentPrevented` timed out:

```
2024-02-03T05:05:26.3041218Z === RUN   TestWindowsMDMEnrollmentPrevented
2024-02-03T05:05:26.3044251Z === RUN   TestWindowsMDMEnrollmentPrevented/{RenewEnrollmentProfile:false_RotateDiskEncryptionKey:false_NeedsMDMMigration:false_NeedsProgrammaticWindowsMDMEnrollment:true_WindowsMDMDiscoveryEndpoint:http://example.com/_NeedsProgrammaticWindowsMDMUnenrollment:false_PendingScriptExecutionIDs:[]_EnforceBitLockerEncryption:false}
2024-02-03T05:05:26.3047208Z coverage: 2.5% of statements in github.com/fleetdm/fleet/v4/...
2024-02-03T05:05:26.3047963Z panic: test timed out after 1h0m0s
2024-02-03T05:05:26.3048482Z running tests:
2024-02-03T05:05:26.3049005Z 	TestWindowsMDMEnrollmentPrevented (59m52s)
2024-02-03T05:05:26.3052172Z 	TestWindowsMDMEnrollmentPrevented/{RenewEnrollmentProfile:false_RotateDiskEncryptionKey:false_NeedsMDMMigration:false_NeedsProgrammaticWindowsMDMEnrollment:true_WindowsMDMDiscoveryEndpoint:http://example.com/_NeedsProgrammaticWindowsMDMUnenrollment:false_PendingScriptExecutionIDs:[]_EnforceBitLockerEncryption:false} (59m52s)
[...]
2024-02-03T05:05:26.3068624Z goroutine 69 [chan receive]:
2024-02-03T05:05:26.3069997Z github.com/fleetdm/fleet/v4/orbit/pkg/update.TestWindowsMDMEnrollmentPrevented.func2.1({{0xe3ada3, 0x12}, {0x0, 0x0}, {0xe37311, 0xc}})
2024-02-03T05:05:26.3072376Z 	/home/runner/work/fleet/fleet/orbit/pkg/update/notifications_test.go:295 +0x65
2024-02-03T05:05:26.3074514Z github.com/fleetdm/fleet/v4/orbit/pkg/update.(*windowsMDMEnrollmentConfigFetcher).attemptEnrollment(0xc0000f8cf0, {0x0, 0x0, 0x0, 0x1, {0xe3ada3, 0x12}, 0x0, {0x0, 0x0, ...}, ...})
```

I was able to reproduce it locally in roughly one out of four runs after
adding the following print statements:

```diff
                        if cfg.NeedsProgrammaticWindowsMDMEnrollment {
                                fetcher.execEnrollFn = func(args WindowsMDMEnrollmentArgs) error {
-                                       <-chProceed    // will be unblocked only when allowed
+                                       fmt.Println("fetcher.execEnrollFn A: ", apiCallCount)
+                                       <-chProceed // will be unblocked only when allowed
+                                       fmt.Println("fetcher.execEnrollFn B: ", apiCallCount)
                                        apiCallCount++ // no need for sync, single-threaded call of this func is guaranteed by the fetcher's mutex
                                        return apiErr
                                }
@@ -301,7 +303,9 @@ func TestWindowsMDMEnrollmentPrevented(t *testing.T) {
                                }
                        } else {
                                fetcher.execUnenrollFn = func(args WindowsMDMEnrollmentArgs) error {
-                                       <-chProceed    // will be unblocked only when allowed
+                                       fmt.Println("fetcher.execUnenrollFn A: ", apiCallCount)
+                                       <-chProceed // will be unblocked only when allowed
+                                       fmt.Println("fetcher.execUnenrollFn B: ", apiCallCount)
                                        apiCallCount++ // no need for sync, single-threaded call of this func is guaranteed by the fetcher's mutex
                                        return apiErr
                                }
@@ -317,23 +321,33 @@ func TestWindowsMDMEnrollmentPrevented(t *testing.T) {

                        started := make(chan struct{})
                        go func() {
+                               fmt.Println("before close started")
                                close(started)
+                               fmt.Println("aftre close started")

                                // the first call will block in enroll/unenroll func
+                               fmt.Println("before inner fetchergetconfig")
                                cfg, err := fetcher.GetConfig()
+                               fmt.Println("after inner fetchergetconfig")
                                assertResult(cfg, err)
                        }()

+                       fmt.Println("before started")
                        <-started
+                       fmt.Println("after started")
                        // this call will happen while the first call is blocked in
                        // enroll/unenrollfn, so it won't call the API (won't be able to lock the
                        // mutex). However it will still complete successfully without being
                        // blocked by the other call in progress.
+                       fmt.Println("before first fetchergetconfig")
                        cfg, err := fetcher.GetConfig()
+                       fmt.Println("before first fetchergetconfig")
                        assertResult(cfg, err)

                        // unblock the first call and wait for it to complete
+                       fmt.Println("before close chProceed 1")
                        close(chProceed)
+                       fmt.Println("after close chProceed 2")
                        time.Sleep(100 * time.Millisecond)
```

This is the output I got every time the test hung:

```
before started
before close started
aftre close started
after started
before first fetchergetconfig
before inner fetchergetconfig
after inner fetchergetconfig
fetcher.execEnrollFn A:  0
```

And this is the output when the test passed:

```
before started
before close started
aftre close started
before inner fetchergetconfig
fetcher.execUnenrollFn A:  0
after started
before first fetchergetconfig
before first fetchergetconfig
before close chProceed 1
after close chProceed 2
fetcher.execUnenrollFn B:  0
after inner fetchergetconfig
fetcher.execUnenrollFn A:  1
fetcher.execUnenrollFn B:  1
```

Note how the deadlock occurs when the `GetConfig` call outside of the
goroutine runs first: it takes the fetcher's mutex and blocks on
`chProceed`, which is only closed after that same call returns. I added
some logic to prevent this, but I'm confident there must be a better way
to accomplish the same thing. cc: @mna you're the king of concurrency, do
you have any ideas?
2024-02-05 12:06:25 -03:00
| Name | Last commit | Date |
| --- | --- | --- |
| badgerstore | Update go-tuf dependency (#3837) | 2022-02-10 08:16:36 -08:00 |
| filestore | test: use T.TempDir to create temporary test directory (#6080) | 2022-06-13 10:20:38 -03:00 |
| config_fetcher.go | Enable installation and auto-updates of Nudge via Orbit (#9605) | 2023-02-10 17:03:43 -03:00 |
| disk_encryption.go | allow to rotate disk encryption key from My Device (#10592) | 2023-03-20 16:14:07 -03:00 |
| execcmd_darwin.go | don't automatically kickstart softwareupdated in Orbit (#12072) | 2023-06-02 12:33:40 -03:00 |
| execcmd_stub.go | don't automatically kickstart softwareupdated in Orbit (#12072) | 2023-06-02 12:33:40 -03:00 |
| execcmd.go | Kickstart sofwareupdated periodically from fleetd/orbit to work around a macOS bug (#9465) | 2023-01-24 10:14:17 -05:00 |
| execwinapi_stub.go | Merging Bitlocker feature branch (#14350) | 2023-10-06 19:04:33 -03:00 |
| execwinapi_windows.go | Merging Bitlocker feature branch (#14350) | 2023-10-06 19:04:33 -03:00 |
| execwinapi.go | use OrbitNodeKey for windows mdm enrollment authentication instead of HostUUID (#13503) | 2023-08-29 14:50:13 +01:00 |
| file.go | Add 'orbit/' from commit 'ab3047bb39f1e2be331d1ff18b4eb768619033c4' | 2021-08-04 16:58:25 -03:00 |
| flag_runner_test.go | create and send Nudge configuration to hosts (#9491) | 2023-01-25 17:03:40 -03:00 |
| flag_runner.go | Downgrade osquery-go due to panics in Shutdown and add more logging (#15017) | 2023-11-13 18:29:45 -03:00 |
| hash_test.go | chore: remove refs to deprecated io/ioutil (#14485) | 2023-10-27 15:28:54 -03:00 |
| hash.go | Fix update checks for orbit at startup (#3835) | 2022-02-23 14:58:07 -03:00 |
| notifications_test.go | fix race in orbit test (#16589) | 2024-02-05 12:06:25 -03:00 |
| notifications.go | attempt to decrypt the disk before performing a BitLocker encryption (#16097) | 2024-01-16 12:45:23 -03:00 |
| nudge_test.go | Add backoff functionality for fleetd updates (#15489) | 2023-12-08 19:43:56 -03:00 |
| nudge.go | device_token endpoint improvements (#15849) | 2023-12-28 14:20:36 -06:00 |
| options_darwin.go | fix SELinux issue (#5335) | 2022-05-02 12:18:59 -06:00 |
| options_linux.go | fix SELinux issue (#5335) | 2022-05-02 12:18:59 -06:00 |
| options_windows.go | Fleetctl to package .app bundles for osquery (and changes for orbit to support them) (#4393) | 2022-03-15 16:04:12 -03:00 |
| options.go | add migration support to FD and orbit (#11741) | 2023-05-18 14:21:54 -03:00 |
| runner_test.go | Fixing tests due to known exec after write Linux issue. (#16243) | 2024-01-21 10:40:41 -06:00 |
| runner.go | Fixing tests due to known exec after write Linux issue. (#16243) | 2024-01-21 10:40:41 -06:00 |
| swift_dialog_test.go | prevent panic when orbit is run with updates disabled (#12654) | 2023-07-06 14:43:10 -03:00 |
| swift_dialog.go | device_token endpoint improvements (#15849) | 2023-12-28 14:20:36 -06:00 |
| testing_utils.go | Enable installation and auto-updates of Nudge via Orbit (#9605) | 2023-02-10 17:03:43 -03:00 |
| update_test.go | add migration support to FD and orbit (#11741) | 2023-05-18 14:21:54 -03:00 |
| update.go | Add backoff functionality for fleetd updates (#15489) | 2023-12-08 19:43:56 -03:00 |