refactor the filter code in the garnet provider #808
refactor the filter code in the garnet provider #808hailangx merged 2 commits intometajack/diskann-garnetfrom
Conversation
There was a problem hiding this comment.
Pull request overview
This PR refactors Garnet’s filtered-search label handling by moving GarnetQueryLabelProvider from a raw-pointer-based implementation to a slice-borrowing design, and simplifies the filtered search path in the dynamic index wrapper.
Changes:
- Refactored
GarnetQueryLabelProviderto store a borrowed&[u8]and updatedis_set()to use safe bounds-checked access. - Simplified
DynIndex::search_vectorby removing theFilteredSearchResultswrapper and passing the output buffer through directly. - Updated filter types to use
GarnetQueryLabelProvider<'static>in DynIndex APIs.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
diskann-garnet/src/labels.rs |
Refactors bitmap label provider to use slice-based storage and safer lookup logic; adds 'static raw-pointer constructor. |
diskann-garnet/src/dyn_index.rs |
Removes the filtered output wrapper and simplifies filtered search call path; updates filter type signature. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
You can also share your feedback on Copilot code review. Take the survey.
diskann-garnet/src/labels.rs
Outdated
| @@ -73,49 +59,42 @@ | |||
| /// of this struct. | |||
| pub unsafe fn from_raw(data: *const u8, len: usize) -> Self { | |||
| if data.is_null() || len == 0 { | |||
| return Self { | |||
| data: std::ptr::null(), | |||
| len: 0, | |||
| }; | |||
| return Self { data: &[] }; | |||
| } | |||
| Self { | |||
| data: unsafe { std::slice::from_raw_parts(data, len) }, | |||
| } | |||
| Self { data, len } | |||
| } | |||
There was a problem hiding this comment.
from_raw constructs a &'static [u8] via slice::from_raw_parts, but the pointed-to memory is owned by the FFI caller and is only guaranteed to live for the duration of the search call. Creating an 'static reference to non-'static memory is unsound (can enable UB via lifetime-based optimizations), even if the value is dropped shortly after.
Safer options here are (a) store *const u8 + len (as the old implementation did) and do checked reads without ever materializing a Rust reference with an inflated lifetime, or (b) copy into owned storage (e.g., Arc<[u8]>) so the data is truly 'static for the Arc<dyn QueryLabelProvider<_>> used by BetaFilter.
| InplaceDeleteMethod, SearchOutputBuffer, glue::SearchStrategy, index::SearchStats, search, | ||
| }, | ||
| graph::{InplaceDeleteMethod, glue::SearchStrategy, index::SearchStats, search}, | ||
| provider::{Accessor, DataProvider}, |
There was a problem hiding this comment.
DataProvider is no longer used in this file after removing FilteredSearchResults, but it’s still imported. CI runs with -Dwarnings, so this unused import will fail the build; please remove it (and keep Accessor which is still needed for get_element).
| provider::{Accessor, DataProvider}, | |
| provider::Accessor, |
| #[derive(Clone)] | ||
| pub struct GarnetQueryLabelProvider<'a> { | ||
| data: &'a [u8], | ||
| } |
There was a problem hiding this comment.
Apologies, I don't think I was entirely clear on the feedback in #800, I meant that it's reasonable to have a lifetime in the backend use of Strategies, not that GarnetQueryLabelProvder needs a lifetime. Without that change, adding a lifetime bound to GarnetQueryLabelProvider does not really solve anything, especially since we need it to be 'static to use it anyways. And this actually feels worse since it forces the materialization of a &'static [u8]. At least when there is a raw pointer involved, programmers are primed to think "something is going on".
Without making changes to the core DiskANN to accept non-'static types, I suggest we keep this as-is.
There was a problem hiding this comment.
yeah, to remove the static we need to remove it from the query computer trait.
I don't see much difference regarding the raw pointers when initially it we also have the unsafe convert from raw pointers in this version but just have to mark it under static trick. And when remove the static from the core DiskANN would it be a nature change here to remove the static
There was a problem hiding this comment.
@copilot can you help revert this GarnetQueryLabelProvider to original version
### Removal of Obsolete Wrapper * Removed the `FilteredSearchResults` wrapper, which previously handled conversion between internal and external IDs for search results. This is no longer needed with previous change to expose generic external id in the beta search strategy. --------- Co-authored-by: Haiyang Xu <haixu@microsoft.com>
Label Provider Refactor
GarnetQueryLabelProviderfrom a raw pointer-based struct to a lifetime-parameterized struct that borrows a slice.Removal of Obsolete Wrapper
FilteredSearchResultswrapper, which previously handled conversion between internal and external IDs for search results. This is no longer needed with previous change to expose generic external id in the beta search strategy.