fix: TOCTOU race in and_compute_with under multi-threaded tokio #1
Open
Squadrick wants to merge 1 commit into anthropics:anthropic-0.12.13 from
In try_compute and try_compute_if_nobody_else, the waiter was removed from the waiter map (via set_waiter_value(ReadyNone)) BEFORE the actual cache mutation (insert_with_hash/invalidate_with_hash). This created a window where a concurrent and_compute_with caller could:

1. Successfully insert its own waiter (the first was already removed)
2. Read the cache and see the OLD value (first caller hasn't written yet)
3. Both callers execute Op::Put based on the same stale data

This caused silent data loss under concurrent access with a multi-threaded tokio runtime (e.g. 10-15% of increments lost in a concurrent counter test with 8 writers).

The fix defers set_waiter_value(ReadyNone) to after the cache mutation in each match arm, ensuring the waiter remains in the map (blocking concurrent callers) until the cache is updated.

Contrast with try_init_or_read (used by get_with), which already does this correctly: it calls insert_with_hash BEFORE set_waiter_value.
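The ordering described above can be illustrated with a small toy model. This is only a sketch: ToyCache, acquire_waiter, release_waiter, increment, and run_counter are hypothetical names invented for illustration, it uses plain std threads rather than tokio, and it does not reproduce the library's real internals. It shows the fixed ordering, where the waiter is released only after the cache write.

```rust
use std::collections::{HashMap, HashSet};
use std::sync::{Arc, Mutex};
use std::thread;

/// Toy stand-in for a cache plus a per-key waiter map.
/// All names here are hypothetical, not the library's actual types.
struct ToyCache {
    values: Mutex<HashMap<&'static str, u64>>,
    waiters: Mutex<HashSet<&'static str>>,
}

impl ToyCache {
    fn new() -> Self {
        ToyCache {
            values: Mutex::new(HashMap::new()),
            waiters: Mutex::new(HashSet::new()),
        }
    }

    /// Spin until this caller manages to register a waiter for `key`.
    fn acquire_waiter(&self, key: &'static str) {
        loop {
            // `insert` returns true only if no waiter was present.
            if self.waiters.lock().unwrap().insert(key) {
                return;
            }
            thread::yield_now();
        }
    }

    /// Remove the waiter, letting the next caller proceed.
    fn release_waiter(&self, key: &'static str) {
        self.waiters.lock().unwrap().remove(key);
    }

    /// Read-modify-write increment using the fixed ordering:
    /// mutate the cache first, release the waiter second.
    /// Swapping the last two calls would reopen the window in which
    /// another caller registers a waiter and reads the stale value.
    fn increment(&self, key: &'static str) {
        self.acquire_waiter(key);
        let old = self.values.lock().unwrap().get(key).copied().unwrap_or(0);
        let new = old + 1; // the "compute" step
        self.values.lock().unwrap().insert(key, new); // cache mutation
        self.release_waiter(key); // waiter removed only after the write
    }
}

/// Run `writers` threads, each incrementing the same key `per_writer` times,
/// and return the final counter value.
fn run_counter(writers: usize, per_writer: usize) -> u64 {
    let cache = Arc::new(ToyCache::new());
    let handles: Vec<_> = (0..writers)
        .map(|_| {
            let cache = Arc::clone(&cache);
            thread::spawn(move || {
                for _ in 0..per_writer {
                    cache.increment("counter");
                }
            })
        })
        .collect();
    for handle in handles {
        handle.join().unwrap();
    }
    let total = cache.values.lock().unwrap().get("counter").copied().unwrap_or(0);
    total
}

fn main() {
    let total = run_counter(8, 1_000);
    println!("final count: {total}");
}
```

In this toy model the waiter acts as a per-key exclusion token, so every read-modify-write sees the previous write and no increments are lost. The spin loop is a simplification chosen to keep the sketch short; a real implementation would park or await rather than busy-wait.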