fix(ci): Fix consistent iOS E2E flakiness on Cirrus Labs runners #5752
After the react-native-test job was moved from GitHub-hosted macos-26 to Cirrus Labs Tart VMs (macos-tahoe-xcode:26.2.0), iOS simulators take longer to fully boot in the new virtualised environment. With `wait_for_boot` defaulting to false, Maestro was racing to connect before the simulator was ready, causing different failures on each run.

- Add `wait_for_boot: true` to `futureware-tech/simulator-action` so the job blocks until the simulator has fully completed booting before Maestro connects.
- Bump `MAESTRO_DRIVER_STARTUP_TIMEOUT` from 120s to 180s to give additional headroom for the Cirrus Labs runner environment.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Semver Impact of This PR: ⚪ None (no version bump detected)
📋 Changelog Preview: This PR will not appear in the changelog.
Android (legacy) Performance metrics 🚀
| Revision | Plain | With Sentry | Diff |
|---|---|---|---|
| 86584b7+dirty | 463.83 ms | 500.31 ms | 36.48 ms |
| 9a81842+dirty | 412.23 ms | 416.56 ms | 4.33 ms |
| c637fc7+dirty | 433.70 ms | 467.76 ms | 34.06 ms |
| d73150f+dirty | 411.21 ms | 465.86 ms | 54.65 ms |
| fa7bb7e+dirty | 350.37 ms | 377.02 ms | 26.65 ms |
| 3bd3f0d+dirty | 447.21 ms | 472.31 ms | 25.10 ms |
| 88890fe+dirty | 350.94 ms | 365.74 ms | 14.80 ms |
| 95aaf8a | 437.89 ms | 419.45 ms | -18.44 ms |
| c0842e7+dirty | 527.76 ms | 566.69 ms | 38.93 ms |
| 1e7a472+dirty | 348.80 ms | 362.55 ms | 13.75 ms |
App size
| Revision | Plain | With Sentry | Diff |
|---|---|---|---|
| 86584b7+dirty | 43.75 MiB | 48.08 MiB | 4.33 MiB |
| 9a81842+dirty | 43.75 MiB | 48.08 MiB | 4.33 MiB |
| c637fc7+dirty | 43.75 MiB | 48.40 MiB | 4.64 MiB |
| d73150f+dirty | 43.75 MiB | 48.55 MiB | 4.80 MiB |
| fa7bb7e+dirty | 17.75 MiB | 19.75 MiB | 2.00 MiB |
| 3bd3f0d+dirty | 17.75 MiB | 19.70 MiB | 1.95 MiB |
| 88890fe+dirty | 17.75 MiB | 19.71 MiB | 1.96 MiB |
| 95aaf8a | 17.75 MiB | 19.68 MiB | 1.93 MiB |
| c0842e7+dirty | 43.75 MiB | 48.41 MiB | 4.66 MiB |
| 1e7a472+dirty | 17.75 MiB | 19.70 MiB | 1.96 MiB |
iOS (legacy) Performance metrics 🚀
| Revision | Plain | With Sentry | Diff |
|---|---|---|---|
| ea3e26e+dirty | 1229.13 ms | 1228.46 ms | -0.67 ms |
| 80e4616+dirty | 1221.32 ms | 1225.64 ms | 4.32 ms |
| 818a608+dirty | 1205.76 ms | 1208.00 ms | 2.24 ms |
| 77061ed+dirty | 1233.16 ms | 1234.88 ms | 1.71 ms |
| bef3709+dirty | 1222.07 ms | 1220.24 ms | -1.83 ms |
| a206511+dirty | 1185.00 ms | 1186.35 ms | 1.35 ms |
| 74979ac+dirty | 1210.49 ms | 1213.31 ms | 2.82 ms |
| a2bb688+dirty | 1223.53 ms | 1232.90 ms | 9.37 ms |
| 8a868fe+dirty | 1221.50 ms | 1230.78 ms | 9.28 ms |
| d590428+dirty | 1211.77 ms | 1220.51 ms | 8.75 ms |
App size
| Revision | Plain | With Sentry | Diff |
|---|---|---|---|
| ea3e26e+dirty | 3.41 MiB | 4.58 MiB | 1.17 MiB |
| 80e4616+dirty | 3.38 MiB | 4.60 MiB | 1.22 MiB |
| 818a608+dirty | 2.63 MiB | 3.91 MiB | 1.28 MiB |
| 77061ed+dirty | 2.63 MiB | 3.98 MiB | 1.34 MiB |
| bef3709+dirty | 3.38 MiB | 4.78 MiB | 1.40 MiB |
| a206511+dirty | 3.41 MiB | 4.67 MiB | 1.25 MiB |
| 74979ac+dirty | 3.38 MiB | 4.60 MiB | 1.22 MiB |
| a2bb688+dirty | 2.63 MiB | 3.99 MiB | 1.36 MiB |
| 8a868fe+dirty | 3.38 MiB | 4.60 MiB | 1.22 MiB |
| d590428+dirty | 3.38 MiB | 4.78 MiB | 1.39 MiB |
After crash.yml taps "Crash" (Sentry.nativeCrash()), the plain `launchApp` (without clearState) causes the app to crash immediately on relaunch (~82ms) because the Sentry SDK reads the pending crash report during initialisation and hits a failure path. This writes a second crash report on top of the first, triggering iOS's simulator crash-loop guard for the bundle ID.

The cascade:
1. nativeCrash → crash report #1 written
2. launchApp (no clearState) → app crashes on startup → crash report #2
3. Next test (captureMessage) gets the crash-loop ban → instant exit on launch

Fix: add `clearState: true` to the post-crash launchApp so Maestro reinstalls the app, clearing both the crash report and the crash-loop state before assertTestReady runs.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
… VMs

The iOS E2E tests have been consistently failing since the migration to Cirrus Labs Tart VMs (c1cade4). The nested virtualisation makes the simulator slower to stabilise, causing Maestro's XCTest driver to lose communication with the app on first launches.

Two fixes:
1. Set `erase_before_boot: false`. Each Maestro flow already reinstalls the app via clearState, so erasing the entire simulator is redundant and adds overhead that destabilises the simulator on Tart VMs.
2. Add a warm-up step that launches and terminates Settings.app so that SpringBoard and other system services finish post-boot initialisation before Maestro connects.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Cirrus Labs Tart VMs intermittently fail individual app launches: the app process exits before the JS bundle finishes loading, causing Maestro to report "App crashed or stopped". A single retry of the full suite is the most reliable way to absorb this flakiness.

Also increased the warm-up sleep from 3s to 5s to give SpringBoard more time to settle on the slow nested-virtualisation runners.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Android (new) Performance metrics 🚀
| Revision | Plain | With Sentry | Diff |
|---|---|---|---|
| 7480abe+dirty | 363.80 ms | 431.34 ms | 67.54 ms |
| 2b89ce9+dirty | 372.22 ms | 417.06 ms | 44.84 ms |
| 170d5ea+dirty | 348.79 ms | 406.94 ms | 58.15 ms |
| b1579bc+dirty | 391.87 ms | 456.26 ms | 64.39 ms |
| 73f2455+dirty | 369.33 ms | 398.90 ms | 29.57 ms |
| 0b64753+dirty | 358.55 ms | 429.16 ms | 70.61 ms |
| 6a70a7e+dirty | 382.45 ms | 424.54 ms | 42.09 ms |
| 2adbd1e+dirty | 366.13 ms | 419.49 ms | 53.36 ms |
| f8d19f8+dirty | 374.17 ms | 383.40 ms | 9.23 ms |
| 7be1f99+dirty | 369.02 ms | 399.60 ms | 30.58 ms |
App size
| Revision | Plain | With Sentry | Diff |
|---|---|---|---|
| 7480abe+dirty | 7.15 MiB | 8.41 MiB | 1.26 MiB |
| 2b89ce9+dirty | 7.15 MiB | 8.41 MiB | 1.26 MiB |
| 170d5ea+dirty | 7.15 MiB | 8.42 MiB | 1.27 MiB |
| b1579bc+dirty | 43.94 MiB | 49.27 MiB | 5.33 MiB |
| 73f2455+dirty | 43.94 MiB | 48.82 MiB | 4.88 MiB |
| 0b64753+dirty | 7.15 MiB | 8.42 MiB | 1.27 MiB |
| 6a70a7e+dirty | 7.15 MiB | 8.42 MiB | 1.26 MiB |
| 2adbd1e+dirty | 7.15 MiB | 8.43 MiB | 1.28 MiB |
| f8d19f8+dirty | 43.94 MiB | 48.91 MiB | 4.97 MiB |
| 7be1f99+dirty | 7.15 MiB | 8.42 MiB | 1.27 MiB |
iOS (new) Performance metrics 🚀
| Revision | Plain | With Sentry | Diff |
|---|---|---|---|
| ea3e26e+dirty | 1216.61 ms | 1214.15 ms | -2.47 ms |
| 80e4616+dirty | 1206.90 ms | 1205.94 ms | -0.96 ms |
| 818a608+dirty | 1218.84 ms | 1223.18 ms | 4.34 ms |
| 77061ed+dirty | 1210.77 ms | 1218.45 ms | 7.68 ms |
| bef3709+dirty | 1217.79 ms | 1225.33 ms | 7.54 ms |
| a206511+dirty | 1225.02 ms | 1223.74 ms | -1.28 ms |
| 74979ac+dirty | 1212.33 ms | 1212.54 ms | 0.21 ms |
| a2bb688+dirty | 1244.82 ms | 1238.60 ms | -6.22 ms |
| 8a868fe+dirty | 1206.85 ms | 1215.04 ms | 8.19 ms |
| d590428+dirty | 1221.23 ms | 1225.27 ms | 4.03 ms |
App size
| Revision | Plain | With Sentry | Diff |
|---|---|---|---|
| ea3e26e+dirty | 3.41 MiB | 4.58 MiB | 1.17 MiB |
| 80e4616+dirty | 3.38 MiB | 4.60 MiB | 1.22 MiB |
| 818a608+dirty | 3.19 MiB | 4.48 MiB | 1.29 MiB |
| 77061ed+dirty | 3.19 MiB | 4.54 MiB | 1.36 MiB |
| bef3709+dirty | 3.38 MiB | 4.78 MiB | 1.40 MiB |
| a206511+dirty | 3.41 MiB | 4.67 MiB | 1.25 MiB |
| 74979ac+dirty | 3.38 MiB | 4.60 MiB | 1.22 MiB |
| a2bb688+dirty | 3.19 MiB | 4.56 MiB | 1.37 MiB |
| 8a868fe+dirty | 3.38 MiB | 4.60 MiB | 1.22 MiB |
| d590428+dirty | 3.38 MiB | 4.78 MiB | 1.39 MiB |
Instead of retrying the entire test suite, run each flow file individually with up to 3 attempts. This is more effective because different flows fail randomly on Tart VMs: retrying only the failed flow is faster and avoids re-running flows that already passed.

The CLI now:
1. Lists all .yml files in the maestro/ directory
2. Runs each flow with `maestro test <flow.yml>`
3. On failure, retries the same flow up to 2 more times
4. Prints a summary of all results at the end

Removes the suite-level retry wrapper from the workflow since per-flow retries in the CLI are more targeted.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
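The per-flow retry described in this commit message could look roughly like the sketch below. It is not the PR's actual CLI code: the names `listFlows`, `runMaestroFlow`, and `runFlowsWithRetries` are hypothetical, and the runner is injectable so the retry logic can be exercised without Maestro installed.

```javascript
const { execFileSync } = require('node:child_process');
const fs = require('node:fs');
const path = require('node:path');

// Hypothetical helper: collect the .yml flow files in a directory, in a stable order.
function listFlows(dir) {
  return fs
    .readdirSync(dir)
    .filter((name) => name.endsWith('.yml'))
    .sort()
    .map((name) => path.join(dir, name));
}

// Default runner (assumed shape): execFileSync throws when maestro exits non-zero.
function runMaestroFlow(flowPath) {
  execFileSync('maestro', ['test', flowPath], { stdio: 'inherit' });
}

// Run every flow, retrying only the flow that failed, up to maxAttempts times each.
function runFlowsWithRetries(flows, runFlow = runMaestroFlow, maxAttempts = 3) {
  const results = [];
  for (const flow of flows) {
    let passed = false;
    let attempts = 0;
    while (!passed && attempts < maxAttempts) {
      attempts += 1;
      try {
        runFlow(flow);
        passed = true;
      } catch {
        console.log(`Flow ${flow} failed (attempt ${attempts}/${maxAttempts})`);
      }
    }
    results.push({ flow, passed, attempts });
  }
  return results;
}
```

A caller would pass `listFlows('maestro')` as the first argument, print one line per entry of the returned array as the summary, and exit non-zero if any entry has `passed: false`.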
Address CodeQL finding by using execFileSync with an argument array instead of execSync with a template string. This avoids shell interpolation of filesystem-sourced flow file names. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
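To illustrate why the argument-array form avoids shell interpolation, here is a small sketch. It uses the current Node binary (`process.execPath`) as a stand-in for the `maestro` CLI, and the file name and the `echoArg` helper are made up for the example.

```javascript
const { execFileSync } = require('node:child_process');

// Hypothetical flow file name containing shell metacharacters.
const flowName = 'crash; echo injected.yml';

// execFileSync spawns the program directly, without a shell, so each array
// element reaches the child as exactly one argv entry. The child here simply
// writes back the first argument it received.
function echoArg(arg) {
  return execFileSync(
    process.execPath,
    ['-e', 'process.stdout.write(process.argv[1])', arg],
    { encoding: 'utf8' },
  );
}

console.log(echoArg(flowName) === flowName);
```

With `execSync` and a template string, the same name would be parsed by the shell, and the `;` would start a second command.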
.github/workflows/e2e-v2.yml
Outdated
# their post-boot initialisation before Maestro tries to connect.
xcrun simctl launch booted com.apple.Preferences
sleep 5
xcrun simctl terminate booted com.apple.Preferences
The implementation on http://github.com/getsentry/sentry-react-native/pull/5755/changes#diff-c5263171d4b9aca7af53d70a3ee9f8423c7f4bebd6941e590c5610208f95c8bcR310 is different.
The sleep time is different, and there we ignore failures.
Which pattern should we follow?
- PR fix(ci): Fix consistent iOS E2E flakiness on Cirrus Labs runners #5752 (e2e-v2.yml): Added `|| true` to both `xcrun simctl` commands so the warm-up is best-effort
- PR fix(ci): Fix Sample Application E2E test flakiness #5755 (sample-application.yml): Aligned comment and increased `sleep 3` to `sleep 5` to match e2e-v2.yml
The warm-up step is best-effort and should not fail the build if the Preferences app fails to launch or terminate. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
No issues so far, LGTM once CI passes.
if (!sentryAuthToken) {
  console.log('Skipping maestro test due to unavailable or empty SENTRY_AUTH_TOKEN');
} else {
  const maxAttempts = 3;
I have mixed feelings about it because it is basically an attempt to fix the flakiness by running the same test up to three times. I also wonder whether it works, considering that the flakiness is consistent, and it doesn't seem guaranteed to work.
That's a good point @alwx 👍
I don't like this much either because it's a workaround. I don't have other ideas on this at this point though. It seems that the tests have started failing randomly (it is not always the same test that fails), so the retries attempt to overcome this by automatically retrying individual tests rather than manually rerunning the whole test run.
Summary
Fixes consistent iOS E2E test failures that started after the Cirrus Labs Tart VM migration (c1cade43, Feb 27). Every single iOS E2E run on `main` has failed since that commit.

Root Cause
The iOS E2E `react-native-test` job was moved from GitHub-hosted `macos-26` runners to Cirrus Labs Tart VMs (`ghcr.io/cirruslabs/macos-tahoe-xcode:26.2.0`). Tart VMs use nested virtualisation, which makes iOS simulator operations significantly slower and less stable:

- `xcrun simctl launch` takes ~23 seconds (vs <1s on bare metal)

This manifests as two distinct failure modes:

- XCTest driver instability: After the driver connects, the first few app launches often fail. Screenshots show the app stuck on the default React Native welcome screen before crashing.
- crash.yml cascade: After `nativeCrash()`, the bare `launchApp` caused Sentry to read the pending crash report and crash again within ~82ms, triggering iOS crash-loop protection that broke subsequent tests.

Note: PR #5735 (simulator-action v4→v5) is not the root cause. `erase_before_boot: true` was already the default in v4, and the failures predate that PR by 5 days.

Fixes
- `wait_for_boot: true` on simulator-action so the simulator is fully booted before proceeding
- `erase_before_boot: false`: each Maestro flow already reinstalls the app via `clearState`, so erasing the entire simulator is redundant overhead
- `MAESTRO_DRIVER_STARTUP_TIMEOUT: 180000` (3 min) to accommodate slower XCTest driver startup
- `clearState: true` in crash.yml after the intentional `nativeCrash()` to prevent crash-loop cascade

Test plan
- … `Test RN 0.84.0 legacy hermes ios production no` and `Test RN 0.84.0 new hermes ios production no`)
- … `main` and verify subsequent runs are stable

#skip-changelog
🤖 Generated with Claude Code