Conversation
There was a problem hiding this comment.
Pull request overview
This PR imports two new crates into the diskann workspace:
- diskann-garnet: A DataProvider implementation for the Garnet cache service with FFI bindings
- vectorset: A benchmarking CLI tool for Redis/Garnet vector set workloads
Changes:
- Adds FFI-based integration between DiskANN and Garnet cache service supporting u8 and f32 full-precision vector sets with cosine distance
- Implements free space map for ID management, label filtering via bitmaps, and comprehensive test coverage
- Provides vectorset benchmarking utility for ingesting vectors, running similarity queries, and measuring recall
Reviewed changes
Copilot reviewed 18 out of 19 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| Cargo.toml | Adds diskann-garnet and vectorset as workspace members |
| diskann-garnet/Cargo.toml | Defines diskann-garnet crate as cdylib with dependencies |
| diskann-garnet/src/lib.rs | Main FFI interface with unsafe C exports for index creation, insertion, search, and deletion |
| diskann-garnet/src/provider.rs | GarnetProvider implementing DataProvider trait with accessor patterns |
| diskann-garnet/src/garnet.rs | Garnet callback wrappers and types for read/write/delete/RMW operations |
| diskann-garnet/src/fsm.rs | Free space map for ID allocation and reuse tracking |
| diskann-garnet/src/labels.rs | Bitmap-based label filtering for filtered vector search |
| diskann-garnet/src/dyn_index.rs | Type-erased DynIndex trait for runtime polymorphism |
| diskann-garnet/src/alloc.rs | Custom 8-byte aligned allocator for Garnet data |
| diskann-garnet/src/test_utils.rs | Test utilities providing in-memory storage callbacks |
| diskann-garnet/src/ffi_tests.rs | Comprehensive FFI tests for all operations |
| diskann-garnet/diskann-garnet.nuspec | NuGet package specification |
| vectorset/Cargo.toml | Defines vectorset CLI tool with Redis and Azure dependencies |
| vectorset/src/main.rs | CLI application with ping, ingest, delete, and query commands |
| vectorset/src/loader.rs | Dataset loading utilities supporting batched iteration and multiple files |
| vectorset/config.toml.example | Example configuration file |
| Cargo.lock | Dependency resolution with many new transitive dependencies |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Codecov Report❌ Patch coverage is ❌ Your patch status has failed because the patch coverage (65.09%) is below the target coverage (90.00%). You can increase the patch coverage or adjust the target coverage. Additional details and impacted files@@ Coverage Diff @@
## main #800 +/- ##
==========================================
- Coverage 89.47% 88.85% -0.62%
==========================================
Files 432 442 +10
Lines 79367 81802 +2435
==========================================
+ Hits 71012 72687 +1675
- Misses 8355 9115 +760
Flags with carried forward coverage won't be shown. Click here to find out more.
🚀 New features to boost your workflow:
|
|
Hey @metajack - one quick question for you. Are we publishing |
That is what |
23342f1 to
66c9ebc
Compare
66c9ebc to
4b23e22
Compare
diskann-garnet/src/dyn_index.rs
Outdated
| /// to a `GarnetId` via `to_external_id` and written into the underlying buffer. | ||
| /// Start point (internal ID 0) is skipped since it has no external ID mapping. | ||
| /// | ||
| /// TODO: Once diskann-providers >= 0.45.0 exports BetaAccessor/BetaComputer/Unwrap, |
There was a problem hiding this comment.
Can we track this TODO? I think it's possible to move the ID translation out of the output buffer now.
There was a problem hiding this comment.
yes, this can be removed since we are now in the diskann repo , I can merge it into this branch or after this merged #808
There was a problem hiding this comment.
Whatever you think is best. If there is a desire to keep this more of a code import rather than fusing changes, that makes sense to me.
|
|
||
| impl std::fmt::Debug for GarnetQueryLabelProvider { | ||
| fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result { | ||
| f.debug_struct("GarnetQueryLabelProvider") |
There was a problem hiding this comment.
Why not derive(Debug) and print the pointer value?
|
@hailangx Can you answer the questions from mark and add a commit to address the issues to this PR? |
|
I have another PR to resolve some of the comments #808 |
### Removal of Obsolete Wrapper * Removed the `FilteredSearchResults` wrapper, which previously handled conversion between internal and external IDs for search results. This is no longer needed with previous change to expose generic external id in the beta search strategy. --------- Co-authored-by: Haiyang Xu <haixu@microsoft.com>
9a18fb2 to
a5262b0
Compare
|
I think this is ready for final review. We've addressed all the comments I think and I have added bunch of documentation. |
This imports the
diskann-garnetprovider and a benchmarking utility for Garnet vector set workloads,vectorset, to the repo.diskann-garnet: This is aDataProviderimplementation for the Garnet cache service. It currently supports u8 and f32 full precision vector set using cosine distance. Quantization and other features will be added soon.vectorset: is a benchmarking tool for Redis vector sets. Since Garnet speaks the Redis protocol, this allows for testing both Garnet and Redis vector workloads.Both of these were previously developed in their own repos in their early stages.