[REFACTOR][TIR] Remove body from AllocBuffer and DeclBuffer#18876
[REFACTOR][TIR] Remove body from AllocBuffer and DeclBuffer#18876tqchen wants to merge 4 commits intoapache:mainfrom
Conversation
Summary of ChangesHello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request introduces a significant refactoring of the TIR (Tensor IR) by removing the explicit 'body' field from Highlights
🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console. Changelog
Activity
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Code Review
This pull request is a significant refactoring that removes the body from AllocBuffer and DeclBuffer nodes in TIR, making them flat statements. This change simplifies the IR structure and has been applied consistently across a large number of files, including core IR definitions, visitors, mutators, transform passes, and codegen backends. The updates to tests and IR builder utilities also reflect these new semantics. The overall change is well-executed. However, I've found a potential issue in the AllocBufferCalculator which seems to no longer correctly calculate peak memory usage due to the change in allocation scoping.
Note: Security Review did not run due to the size of the PR.
Make AllocBuffer and DeclBuffer flat (no body field), like Bind. Buffer scope extends to end of enclosing scope instead of being defined by a nested body. - Remove body field from AllocBufferNode/DeclBufferNode - Update all visitors, mutators, codegen backends - Add scope-stack tracking in calculate_allocated_memory and lower_tvm_builtin for correct peak memory and LIFO free ordering - Remove obsolete AllocateFrame/DeclBufferFrame/AllocBufferFrame - Remove T.allocate() (replaced by T.alloc_buffer()) - Two-pass approach in lower_warp_memory (pre-scan function body) - Deferred remapping in lower_thread_allreduce for flat IR
d22b208 to
2b9541f
Compare
|
/gemini review |
There was a problem hiding this comment.
Code Review
This PR is a major refactoring to remove the body from AllocBuffer and DeclBuffer nodes, making them flat statements. This simplifies the IR structure and makes it more consistent with Bind. The changes are extensive, touching many files across the codebase, including IR definitions, IR builders, printers, and numerous analysis and transformation passes.
The overall refactoring is well-executed. Many passes have been correctly updated to handle the new flat semantics, often by adopting patterns like deferred processing or analyzing SeqStmt to understand allocation scopes. This is a significant improvement to TIR's design.
I have found one critical issue in the lower_thread_allreduce pass where the volatile_scope attribute for shared memory buffers is not correctly applied after this refactoring. This could lead to incorrect code generation for GPU targets.
Additionally, as noted in the test changes, the RemoveNoOp pass currently doesn't handle dead allocation elimination for the new flat AllocBuffer nodes. This seems to be a known limitation of this PR.
Note: Security Review is unavailable for this PR.
|
@gemini-code-assist review |
There was a problem hiding this comment.
Code Review
This pull request performs an extensive refactoring to flatten AllocBuffer and DeclBuffer nodes, making them simple statements by removing their bodies. However, a critical vulnerability has been introduced where attributes like volatile_scope and storage_alignment are now applied to empty or truncated scopes. This breaks synchronization on GPUs and loses alignment metadata, which can lead to data races and incorrect computation results. Specifically, the handling of volatile_scope for shared memory in src/s_tir/transform/lower_thread_allreduce.cc is broken, as the new implementation emits an AttrStmt with an empty body, rendering the volatile attribute ineffective. A pattern similar to the correct implementation in lower_warp_memory.cc, which gathers siblings into a new body, should be applied to address this.
With flat IR, T.decl_buffer() returns a plain Buffer instead of a frame (context manager). Move buffer declarations out of frame_scope lists in scan, sort, searchsorted, and nms_util.
… codegen For AllocBuffer, codegen now reads alignment directly from the buffer's data_alignment field instead of relying on a separate AttrStmt. Remove the AttrStmt emission from lower_tvm_builtin. AttrStmt storage_alignment is kept for function parameter buffers (from tvm_ffi_binder).
|
/gemini review |
There was a problem hiding this comment.
Code Review
This is a large and impactful refactoring to flatten AllocBuffer and DeclBuffer statements in TIR, removing their body field. The changes are extensive, touching core IR definitions, TVMScript, numerous TIR passes, and codegen backends. The overall implementation looks solid and consistent. The adaptation of visitors and mutators to the new flat scoping rules, particularly using deferred processing in passes like InjectDoubleBuffer and LowerThreadAllreduce, is well-handled. The replacement of the volatile_scope attribute with a tir.volatile annotation is also a good cleanup. I have one suggestion for improvement.
| ffi::Array<Stmt> seq; | ||
| seq.push_back(stmt); | ||
| seq.push_back(Bind(op->buffer->data, | ||
| Call(op->buffer->data.dtype(), builtin::nd_mem_alloc_with_scope(), args))); | ||
| stmt = SeqStmt(seq); |
There was a problem hiding this comment.
This pass appears to be lowering AllocBuffer to a Bind statement that performs the actual memory allocation. However, the current implementation replaces the AllocBuffer node with a SeqStmt containing both the original AllocBuffer and the new Bind statement. This seems redundant, as the AllocBuffer is a high-level construct that this lowering pass should replace.
Other lowering passes in this PR (e.g., LowerVtcmAlloc) replace AllocBuffer with just a Bind statement. To be consistent and avoid redundancy, this pass should probably do the same.
| ffi::Array<Stmt> seq; | |
| seq.push_back(stmt); | |
| seq.push_back(Bind(op->buffer->data, | |
| Call(op->buffer->data.dtype(), builtin::nd_mem_alloc_with_scope(), args))); | |
| stmt = SeqStmt(seq); | |
| stmt = Bind(op->buffer->data, | |
| Call(op->buffer->data.dtype(), builtin::nd_mem_alloc_with_scope(), args)); |
Summary
bodyfield fromAllocBufferNodeandDeclBufferNode, making them flat statements consistent withBindSeqStmtsemanticsTest plan