fix(ci): Fix consistent iOS E2E flakiness on Cirrus Labs runners #5752

Draft
antonis wants to merge 9 commits into main from antonis/e2e-ios-flakiness-fix

Conversation

@antonis
Contributor

@antonis antonis commented Mar 3, 2026

Summary

Fixes consistent iOS E2E test failures that started after the Cirrus Labs Tart VM migration (c1cade43, Feb 27). Every single iOS E2E run on main has failed since that commit.

Root Cause

The iOS E2E react-native-test job was moved from GitHub-hosted macos-26 runners to Cirrus Labs Tart VMs (ghcr.io/cirruslabs/macos-tahoe-xcode:26.2.0). Tart VMs use nested virtualisation, which makes iOS simulator operations significantly slower and less stable:

  • xcrun simctl launch takes ~23 seconds (vs <1s on bare metal)
  • Maestro's XCTest driver takes ~10-20 seconds to connect to port 7001
  • App launches intermittently fail — the app process exits before the JS bundle loads, causing Maestro to report "App crashed or stopped" despite the app briefly appearing on screen

This manifests as two distinct failure modes:

  1. XCTest driver instability — After the driver connects, the first few app launches often fail. Screenshots show the app stuck on the default React Native welcome screen before crashing.

  2. crash.yml cascade — After nativeCrash(), the bare launchApp caused Sentry to read the pending crash report and crash again within ~82ms, triggering iOS crash-loop protection that broke subsequent tests.

Note: PR #5735 (simulator-action v4→v5) is not the root cause — erase_before_boot: true was already the default in v4, and the failures predate that PR by 5 days.

Fixes

  1. wait_for_boot: true on simulator-action so the simulator is fully booted before proceeding
  2. erase_before_boot: false — each Maestro flow already reinstalls the app via clearState, so erasing the entire simulator is redundant overhead
  3. Simulator warm-up step — launches Settings.app so system services finish post-boot initialisation
  4. Retry the full Maestro suite up to 3 times — the most reliable mitigation for intermittent app-launch failures on nested-virtualisation VMs
  5. MAESTRO_DRIVER_STARTUP_TIMEOUT: 180000 (3 min) to accommodate slower XCTest driver startup
  6. clearState: true in crash.yml after the intentional nativeCrash() to prevent crash-loop cascade

Test plan

  • Verify iOS E2E tests pass on this PR (Test RN 0.84.0 legacy hermes ios production no and Test RN 0.84.0 new hermes ios production no)
  • If tests pass, merge to main and verify subsequent runs are stable

#skip-changelog

🤖 Generated with Claude Code

After the react-native-test job was moved from GitHub-hosted macos-26 to
Cirrus Labs Tart VMs (macos-tahoe-xcode:26.2.0), iOS simulators take longer
to fully boot in the new virtualised environment. With `wait_for_boot` defaulting
to false, Maestro was racing to connect before the simulator was ready, causing
different failures on each run.

- Add `wait_for_boot: true` to `futureware-tech/simulator-action` so the job
  blocks until the simulator has fully completed booting before Maestro connects.
- Bump `MAESTRO_DRIVER_STARTUP_TIMEOUT` from 120s to 180s to give additional
  headroom for the Cirrus Labs runner environment.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@github-actions
Contributor

github-actions bot commented Mar 3, 2026

Semver Impact of This PR

None (no version bump detected)

📋 Changelog Preview

This is how your changes will appear in the changelog.
Entries from this PR are highlighted with a left border (blockquote style).


This PR will not appear in the changelog.


🤖 This preview updates automatically when you update the PR.

@antonis antonis marked this pull request as draft March 3, 2026 14:27
@antonis antonis added the ready-to-merge Triggers the full CI test suite label Mar 3, 2026
@github-actions
Contributor

github-actions bot commented Mar 3, 2026

Android (legacy) Performance metrics 🚀

| | Plain | With Sentry | Diff |
|---|---|---|---|
| Startup time | 514.00 ms | 522.84 ms | 8.84 ms |
| Size | 43.75 MiB | 48.48 MiB | 4.73 MiB |

Baseline results on branch: main

Startup times

| Revision | Plain | With Sentry | Diff |
|---|---|---|---|
| 86584b7+dirty | 463.83 ms | 500.31 ms | 36.48 ms |
| 9a81842+dirty | 412.23 ms | 416.56 ms | 4.33 ms |
| c637fc7+dirty | 433.70 ms | 467.76 ms | 34.06 ms |
| d73150f+dirty | 411.21 ms | 465.86 ms | 54.65 ms |
| fa7bb7e+dirty | 350.37 ms | 377.02 ms | 26.65 ms |
| 3bd3f0d+dirty | 447.21 ms | 472.31 ms | 25.10 ms |
| 88890fe+dirty | 350.94 ms | 365.74 ms | 14.80 ms |
| 95aaf8a | 437.89 ms | 419.45 ms | -18.44 ms |
| c0842e7+dirty | 527.76 ms | 566.69 ms | 38.93 ms |
| 1e7a472+dirty | 348.80 ms | 362.55 ms | 13.75 ms |

App size

| Revision | Plain | With Sentry | Diff |
|---|---|---|---|
| 86584b7+dirty | 43.75 MiB | 48.08 MiB | 4.33 MiB |
| 9a81842+dirty | 43.75 MiB | 48.08 MiB | 4.33 MiB |
| c637fc7+dirty | 43.75 MiB | 48.40 MiB | 4.64 MiB |
| d73150f+dirty | 43.75 MiB | 48.55 MiB | 4.80 MiB |
| fa7bb7e+dirty | 17.75 MiB | 19.75 MiB | 2.00 MiB |
| 3bd3f0d+dirty | 17.75 MiB | 19.70 MiB | 1.95 MiB |
| 88890fe+dirty | 17.75 MiB | 19.71 MiB | 1.96 MiB |
| 95aaf8a | 17.75 MiB | 19.68 MiB | 1.93 MiB |
| c0842e7+dirty | 43.75 MiB | 48.41 MiB | 4.66 MiB |
| 1e7a472+dirty | 17.75 MiB | 19.70 MiB | 1.96 MiB |

Previous results on branch: antonis/e2e-ios-flakiness-fix

Startup times

| Revision | Plain | With Sentry | Diff |
|---|---|---|---|
| e8c63cb+dirty | 518.77 ms | 549.79 ms | 31.02 ms |

App size

| Revision | Plain | With Sentry | Diff |
|---|---|---|---|
| e8c63cb+dirty | 43.75 MiB | 48.48 MiB | 4.73 MiB |

@github-actions
Contributor

github-actions bot commented Mar 3, 2026

iOS (legacy) Performance metrics 🚀

| | Plain | With Sentry | Diff |
|---|---|---|---|
| Startup time | 1227.49 ms | 1225.93 ms | -1.56 ms |
| Size | 3.38 MiB | 4.79 MiB | 1.41 MiB |

Baseline results on branch: main

Startup times

| Revision | Plain | With Sentry | Diff |
|---|---|---|---|
| ea3e26e+dirty | 1229.13 ms | 1228.46 ms | -0.67 ms |
| 80e4616+dirty | 1221.32 ms | 1225.64 ms | 4.32 ms |
| 818a608+dirty | 1205.76 ms | 1208.00 ms | 2.24 ms |
| 77061ed+dirty | 1233.16 ms | 1234.88 ms | 1.71 ms |
| bef3709+dirty | 1222.07 ms | 1220.24 ms | -1.83 ms |
| a206511+dirty | 1185.00 ms | 1186.35 ms | 1.35 ms |
| 74979ac+dirty | 1210.49 ms | 1213.31 ms | 2.82 ms |
| a2bb688+dirty | 1223.53 ms | 1232.90 ms | 9.37 ms |
| 8a868fe+dirty | 1221.50 ms | 1230.78 ms | 9.28 ms |
| d590428+dirty | 1211.77 ms | 1220.51 ms | 8.75 ms |

App size

| Revision | Plain | With Sentry | Diff |
|---|---|---|---|
| ea3e26e+dirty | 3.41 MiB | 4.58 MiB | 1.17 MiB |
| 80e4616+dirty | 3.38 MiB | 4.60 MiB | 1.22 MiB |
| 818a608+dirty | 2.63 MiB | 3.91 MiB | 1.28 MiB |
| 77061ed+dirty | 2.63 MiB | 3.98 MiB | 1.34 MiB |
| bef3709+dirty | 3.38 MiB | 4.78 MiB | 1.40 MiB |
| a206511+dirty | 3.41 MiB | 4.67 MiB | 1.25 MiB |
| 74979ac+dirty | 3.38 MiB | 4.60 MiB | 1.22 MiB |
| a2bb688+dirty | 2.63 MiB | 3.99 MiB | 1.36 MiB |
| 8a868fe+dirty | 3.38 MiB | 4.60 MiB | 1.22 MiB |
| d590428+dirty | 3.38 MiB | 4.78 MiB | 1.39 MiB |

Previous results on branch: antonis/e2e-ios-flakiness-fix

Startup times

| Revision | Plain | With Sentry | Diff |
|---|---|---|---|
| e8c63cb+dirty | 1228.57 ms | 1225.39 ms | -3.19 ms |

App size

| Revision | Plain | With Sentry | Diff |
|---|---|---|---|
| e8c63cb+dirty | 3.38 MiB | 4.79 MiB | 1.41 MiB |

antonis and others added 3 commits March 3, 2026 16:10
After crash.yml taps "Crash" (Sentry.nativeCrash()), the plain `launchApp`
(without clearState) causes the app to crash immediately on relaunch (~82ms)
because the Sentry SDK reads the pending crash report during initialisation
and hits a failure path. This writes a second crash report on top of the
first, triggering iOS's simulator crash-loop guard for the bundle ID.

The cascade:
1. nativeCrash → crash report #1 written
2. launchApp (no clearState) → app crashes on startup → crash report #2
3. Next test (captureMessage) gets the crash-loop ban → instant exit on launch

Fix: add `clearState: true` to the post-crash launchApp so Maestro
reinstalls the app, clearing both the crash report and the crash-loop state
before assertTestReady runs.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
… VMs

The iOS E2E tests have been consistently failing since the migration to
Cirrus Labs Tart VMs (c1cade4). The nested virtualisation makes the
simulator slower to stabilise, causing Maestro's XCTest driver to lose
communication with the app on first launches.

Two fixes:
1. Set erase_before_boot: false — each Maestro flow already reinstalls
   the app via clearState, so erasing the entire simulator is redundant
   and adds overhead that destabilises the simulator on Tart VMs.
2. Add a warm-up step that launches and terminates Settings.app so that
   SpringBoard and other system services finish post-boot initialisation
   before Maestro connects.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Cirrus Labs Tart VMs intermittently fail individual app launches —
the app process exits before the JS bundle finishes loading, causing
Maestro to report "App crashed or stopped". A single retry of the
full suite is the most reliable way to absorb this flakiness.

Also increased the warmup sleep from 3s to 5s to give SpringBoard
more time to settle on the slow nested-virtualisation runners.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@github-actions
Contributor

github-actions bot commented Mar 3, 2026

Android (new) Performance metrics 🚀

| | Plain | With Sentry | Diff |
|---|---|---|---|
| Startup time | 363.98 ms | 406.06 ms | 42.08 ms |
| Size | 43.94 MiB | 49.36 MiB | 5.42 MiB |

Baseline results on branch: main

Startup times

| Revision | Plain | With Sentry | Diff |
|---|---|---|---|
| 7480abe+dirty | 363.80 ms | 431.34 ms | 67.54 ms |
| 2b89ce9+dirty | 372.22 ms | 417.06 ms | 44.84 ms |
| 170d5ea+dirty | 348.79 ms | 406.94 ms | 58.15 ms |
| b1579bc+dirty | 391.87 ms | 456.26 ms | 64.39 ms |
| 73f2455+dirty | 369.33 ms | 398.90 ms | 29.57 ms |
| 0b64753+dirty | 358.55 ms | 429.16 ms | 70.61 ms |
| 6a70a7e+dirty | 382.45 ms | 424.54 ms | 42.09 ms |
| 2adbd1e+dirty | 366.13 ms | 419.49 ms | 53.36 ms |
| f8d19f8+dirty | 374.17 ms | 383.40 ms | 9.23 ms |
| 7be1f99+dirty | 369.02 ms | 399.60 ms | 30.58 ms |

App size

| Revision | Plain | With Sentry | Diff |
|---|---|---|---|
| 7480abe+dirty | 7.15 MiB | 8.41 MiB | 1.26 MiB |
| 2b89ce9+dirty | 7.15 MiB | 8.41 MiB | 1.26 MiB |
| 170d5ea+dirty | 7.15 MiB | 8.42 MiB | 1.27 MiB |
| b1579bc+dirty | 43.94 MiB | 49.27 MiB | 5.33 MiB |
| 73f2455+dirty | 43.94 MiB | 48.82 MiB | 4.88 MiB |
| 0b64753+dirty | 7.15 MiB | 8.42 MiB | 1.27 MiB |
| 6a70a7e+dirty | 7.15 MiB | 8.42 MiB | 1.26 MiB |
| 2adbd1e+dirty | 7.15 MiB | 8.43 MiB | 1.28 MiB |
| f8d19f8+dirty | 43.94 MiB | 48.91 MiB | 4.97 MiB |
| 7be1f99+dirty | 7.15 MiB | 8.42 MiB | 1.27 MiB |

Previous results on branch: antonis/e2e-ios-flakiness-fix

Startup times

| Revision | Plain | With Sentry | Diff |
|---|---|---|---|
| e8c63cb+dirty | 358.04 ms | 400.36 ms | 42.32 ms |

App size

| Revision | Plain | With Sentry | Diff |
|---|---|---|---|
| e8c63cb+dirty | 43.94 MiB | 49.35 MiB | 5.41 MiB |

@github-actions
Contributor

github-actions bot commented Mar 3, 2026

iOS (new) Performance metrics 🚀

| | Plain | With Sentry | Diff |
|---|---|---|---|
| Startup time | 1211.54 ms | 1214.67 ms | 3.12 ms |
| Size | 3.38 MiB | 4.79 MiB | 1.41 MiB |

Baseline results on branch: main

Startup times

| Revision | Plain | With Sentry | Diff |
|---|---|---|---|
| ea3e26e+dirty | 1216.61 ms | 1214.15 ms | -2.47 ms |
| 80e4616+dirty | 1206.90 ms | 1205.94 ms | -0.96 ms |
| 818a608+dirty | 1218.84 ms | 1223.18 ms | 4.34 ms |
| 77061ed+dirty | 1210.77 ms | 1218.45 ms | 7.68 ms |
| bef3709+dirty | 1217.79 ms | 1225.33 ms | 7.54 ms |
| a206511+dirty | 1225.02 ms | 1223.74 ms | -1.28 ms |
| 74979ac+dirty | 1212.33 ms | 1212.54 ms | 0.21 ms |
| a2bb688+dirty | 1244.82 ms | 1238.60 ms | -6.22 ms |
| 8a868fe+dirty | 1206.85 ms | 1215.04 ms | 8.19 ms |
| d590428+dirty | 1221.23 ms | 1225.27 ms | 4.03 ms |

App size

| Revision | Plain | With Sentry | Diff |
|---|---|---|---|
| ea3e26e+dirty | 3.41 MiB | 4.58 MiB | 1.17 MiB |
| 80e4616+dirty | 3.38 MiB | 4.60 MiB | 1.22 MiB |
| 818a608+dirty | 3.19 MiB | 4.48 MiB | 1.29 MiB |
| 77061ed+dirty | 3.19 MiB | 4.54 MiB | 1.36 MiB |
| bef3709+dirty | 3.38 MiB | 4.78 MiB | 1.40 MiB |
| a206511+dirty | 3.41 MiB | 4.67 MiB | 1.25 MiB |
| 74979ac+dirty | 3.38 MiB | 4.60 MiB | 1.22 MiB |
| a2bb688+dirty | 3.19 MiB | 4.56 MiB | 1.37 MiB |
| 8a868fe+dirty | 3.38 MiB | 4.60 MiB | 1.22 MiB |
| d590428+dirty | 3.38 MiB | 4.78 MiB | 1.39 MiB |

Previous results on branch: antonis/e2e-ios-flakiness-fix

Startup times

| Revision | Plain | With Sentry | Diff |
|---|---|---|---|
| e8c63cb+dirty | 1218.35 ms | 1222.40 ms | 4.05 ms |

App size

| Revision | Plain | With Sentry | Diff |
|---|---|---|---|
| e8c63cb+dirty | 3.38 MiB | 4.79 MiB | 1.41 MiB |

Instead of retrying the entire test suite, run each flow file
individually with up to 3 attempts. This is more effective because
different flows fail randomly on Tart VMs — retrying only the failed
flow is faster and avoids re-running flows that already passed.

The CLI now:
1. Lists all .yml files in the maestro/ directory
2. Runs each flow with `maestro test <flow.yml>`
3. On failure, retries the same flow up to 2 more times
4. Prints a summary of all results at the end

Removes the suite-level retry wrapper from the workflow since
per-flow retries in the CLI are more targeted.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
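
For readers following along, the per-flow retry described in this commit could be sketched roughly as below. This is a minimal illustration under stated assumptions, not the code from the PR: the names `listFlows`, `runMaestroFlow`, `runFlowsWithRetry`, and `printSummary` are hypothetical, and the runner is injectable so the retry logic can be exercised without Maestro installed.

```javascript
const fs = require('node:fs');
const path = require('node:path');
const { execFileSync } = require('node:child_process');

// Hypothetical helper: list the .yml flow files in a directory, sorted for a stable order.
function listFlows(dir) {
  return fs
    .readdirSync(dir)
    .filter((name) => name.endsWith('.yml'))
    .sort()
    .map((name) => path.join(dir, name));
}

// Default runner: invokes `maestro test <flow.yml>`; execFileSync throws on a non-zero exit.
function runMaestroFlow(flowPath) {
  execFileSync('maestro', ['test', flowPath], { stdio: 'inherit' });
}

// Runs each flow with up to `maxAttempts` tries and returns one result per flow.
// A flow is retried only when it fails; flows that passed are never re-run.
function runFlowsWithRetry(flows, runFlow = runMaestroFlow, maxAttempts = 3) {
  const results = [];
  for (const flow of flows) {
    let passed = false;
    let attempts = 0;
    while (!passed && attempts < maxAttempts) {
      attempts += 1;
      try {
        runFlow(flow);
        passed = true;
      } catch (error) {
        console.warn(`Attempt ${attempts}/${maxAttempts} failed for ${flow}`);
      }
    }
    results.push({ flow, passed, attempts });
  }
  return results;
}

// Prints one summary line per flow.
function printSummary(results) {
  for (const { flow, passed, attempts } of results) {
    console.log(`${passed ? 'PASS' : 'FAIL'} ${flow} (attempts: ${attempts})`);
  }
}
```

A caller might run `printSummary(runFlowsWithRetry(listFlows('maestro')))` and exit with a non-zero code if any result has `passed: false`.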
antonis and others added 3 commits March 3, 2026 19:32
Address CodeQL finding by using execFileSync with an argument array
instead of execSync with a template string. This avoids shell
interpolation of filesystem-sourced flow file names.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
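
To illustrate why the argument-array form matters, here is a small sketch. It uses the current Node binary as a stand-in for `maestro` (an assumption, so the example runs anywhere), and the flow name is a hypothetical one chosen to contain shell metacharacters. The helper `receivedArgs` is likewise only for illustration.

```javascript
const { execFileSync } = require('node:child_process');

// Hypothetical flow file name containing shell metacharacters.
const flowName = 'crash.yml; echo injected';

// Child script that prints the arguments it received as a JSON array.
const printArgs = 'console.log(JSON.stringify(process.argv.slice(1)))';

// Spawns the stand-in binary with an argument array. No shell is involved,
// so every element reaches the child process exactly as written.
function receivedArgs(args) {
  const output = execFileSync(process.execPath, ['-e', printArgs, ...args], {
    encoding: 'utf8',
  });
  return JSON.parse(output);
}

// The whole flow name arrives as a single argument. Had it been interpolated
// into an execSync command string, the shell would have treated the text
// after the semicolon as a second command.
console.log(receivedArgs(['test', flowName]));
```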
Contributor Author

@antonis antonis left a comment

Note that I've opened a separate PR for the Sample e2e flakiness #5755

@antonis antonis marked this pull request as ready for review March 4, 2026 08:35
# their post-boot initialisation before Maestro tries to connect.
xcrun simctl launch booted com.apple.Preferences
sleep 5
xcrun simctl terminate booted com.apple.Preferences
Collaborator

The implementation on http://github.com/getsentry/sentry-react-native/pull/5755/changes#diff-c5263171d4b9aca7af53d70a3ee9f8423c7f4bebd6941e590c5610208f95c8bcR310 is different.
The sleep time is different, and there we ignore failures.
Which pattern should we follow?

Contributor Author

The warm-up step is best-effort and should not fail the build if
the Preferences app fails to launch or terminate.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
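
The workflow step itself is a shell snippet, but the best-effort idea can be sketched in Node as well. The following is an illustration only, not code from this PR: `bestEffort` and `warmUpSimulator` are hypothetical names, the command runner is injectable so the sketch can run without Xcode, and the 5-second default mirrors the `sleep 5` shown in the diff above.

```javascript
const { execFileSync } = require('node:child_process');

// Runs a command and reports whether it succeeded. Failures are logged and
// swallowed so that a best-effort step can never fail the build.
function bestEffort(command, args, run = execFileSync) {
  try {
    run(command, args, { stdio: 'ignore' });
    return true;
  } catch (error) {
    console.warn(`Ignoring failure of: ${command} ${args.join(' ')}`);
    return false;
  }
}

// Blocks the current thread for the given number of milliseconds.
function sleepSync(ms) {
  Atomics.wait(new Int32Array(new SharedArrayBuffer(4)), 0, 0, ms);
}

// Launches Settings (com.apple.Preferences) on the booted simulator, waits,
// then terminates it. Each step is attempted regardless of earlier failures.
function warmUpSimulator(run = execFileSync, sleepMs = 5000) {
  const app = 'com.apple.Preferences';
  const launched = bestEffort('xcrun', ['simctl', 'launch', 'booted', app], run);
  sleepSync(sleepMs);
  const terminated = bestEffort('xcrun', ['simctl', 'terminate', 'booted', app], run);
  return { launched, terminated };
}
```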
@lucas-zimerman
Collaborator

No issues so far. LGTM once CI passes.

if (!sentryAuthToken) {
  console.log('Skipping maestro test due to unavailable or empty SENTRY_AUTH_TOKEN');
} else {
  const maxAttempts = 3;
Contributor

I have mixed feelings about this, because it basically tries to fix the flakiness by running the same test up to three times. I also wonder whether it works, given that the flakiness is consistent, and it doesn't seem guaranteed to work.

Contributor Author

That's a good point @alwx 👍
I don't like this much either because it's a workaround. I don't have other ideas at this point, though. It seems that the tests have started failing randomly (it is not always the same test that fails), so the retries try to overcome this by automatically retrying individual tests rather than manually rerunning the whole test run.
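
One way to reason about the trade-off raised in this thread is a back-of-the-envelope estimate. The sketch below assumes that each flow fails independently with the same probability on every attempt, which is an assumption for illustration and not something measured in this PR. The function name and all numbers are hypothetical.

```javascript
// Probability that every flow in the suite eventually passes, assuming each
// attempt of each flow fails independently with probability `flowFailureRate`.
function suitePassProbability(flowFailureRate, flowCount, attemptsPerFlow) {
  // A flow fails overall only if all of its attempts fail.
  const flowPasses = 1 - Math.pow(flowFailureRate, attemptsPerFlow);
  return Math.pow(flowPasses, flowCount);
}

// Hypothetical inputs: 10 flows, 10% failure rate per attempt.
console.log(suitePassProbability(0.1, 10, 1)); // single attempt per flow
console.log(suitePassProbability(0.1, 10, 3)); // up to three attempts per flow
```

With these made-up inputs, a single attempt per flow gives roughly a 35% chance of a green suite, while three attempts per flow gives roughly 99%. If the failures are systematic rather than independent, the model does not apply and retries would help far less, which is the concern raised above.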

@antonis antonis added the Blocked label Mar 4, 2026
@antonis antonis marked this pull request as draft March 4, 2026 11:24
@antonis antonis removed the ready-to-merge Triggers the full CI test suite label Mar 4, 2026
@github-actions
Copy link
Contributor

github-actions bot commented Mar 4, 2026

Fails
🚫 Pull request is not ready for merge, please add the "ready-to-merge" label to the pull request

Generated by 🚫 dangerJS against 4d9b775


3 participants