Research/traft #17272

Draft
LittleHealth wants to merge 1 commit into apache:master from LittleHealth:research/traft

Description

This PR introduces TRaft, a new consensus implementation for Apache IoTDB. It adds partition-aware log replication for time-series ingestion and a backward-compatible extension that lets consensus requests expose their timestamp.

Content 1: TRaft protocol implementation (leader/follower replication + election comparator)

  • Added TRaftConsensus and TRaftServerImpl as the core runtime for TRaft.
  • Added TRaft-specific data structures:
    • TRaftLogEntry (includes timestamp, partitionIndex, intra-partition metadata, Raft index/term)
    • TRaftFollowerInfo (tracks per-follower replication partition progress and in-flight indices)
    • TRaftLogStore (shared persistent log for all followers)
    • TRaftVoteRequest / TRaftVoteResult (term + TRaft freshness dimensions)
  • Implemented TRaft write path:
    • Leader parses request time and builds partitioned log entry.
    • Hot path: directly replicates in-memory entries to followers in the same active partition.
    • Cold path: lagging followers catch up from the shared on-disk log and move to the next partition only once the current one is complete.
  • Implemented ACK handling:
    • Per-follower in-flight index cleanup on ACK.
    • Partition transition only after current partition in-flight set is drained.
  • Implemented election comparison rule:
    • Term first, then partitionIndex, then currentPartitionIndexCount.
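
The election comparison rule above can be sketched as a chained comparator. This is a minimal illustration, not the PR's code: `VoteFreshness` and `candidateIsUpToDate` are hypothetical names, and the rule that a voter grants its vote only when the candidate is at least as fresh is an assumption borrowed from standard Raft.

```java
import java.util.Comparator;

// Hypothetical value type. Field names follow the PR description, but this is
// not the actual TRaftVoteRequest class.
final class VoteFreshness {
  final long term;
  final long partitionIndex;
  final long currentPartitionIndexCount;

  VoteFreshness(long term, long partitionIndex, long currentPartitionIndexCount) {
    this.term = term;
    this.partitionIndex = partitionIndex;
    this.currentPartitionIndexCount = currentPartitionIndexCount;
  }

  // Term first, then partitionIndex, then currentPartitionIndexCount.
  static final Comparator<VoteFreshness> ORDER =
      Comparator.comparingLong((VoteFreshness f) -> f.term)
          .thenComparingLong(f -> f.partitionIndex)
          .thenComparingLong(f -> f.currentPartitionIndexCount);

  // Assumption: the vote is granted only if the candidate is at least as
  // fresh as the voter under ORDER.
  static boolean candidateIsUpToDate(VoteFreshness candidate, VoteFreshness voter) {
    return ORDER.compare(candidate, voter) >= 0;
  }
}
```

With this ordering, a candidate on a higher term wins regardless of partition progress, and the two partition fields only break ties within the same term.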

Design choice note:

  • Chosen design: single shared persisted log + per-follower progress state.
  • Alternative considered: per-follower log queues.
    Shared log was selected to reduce memory overhead and avoid duplicate persistence.
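
As a rough sketch of the per-follower progress state in the chosen design, the class below tracks only log indices, on the assumption that the entries themselves live once in the shared log. `FollowerProgressSketch`, `trySend`, and `onAck` are hypothetical names rather than the PR's TRaftFollowerInfo API, and treating "partition transition" as "accepting an entry with a different partitionIndex" is an assumption.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical per-follower state; not the actual TRaftFollowerInfo class.
final class FollowerProgressSketch {
  private long currentPartition;
  // Indices sent to this follower but not yet acknowledged.
  private final Set<Long> inFlight = new HashSet<>();

  FollowerProgressSketch(long startPartition) {
    this.currentPartition = startPartition;
  }

  // Returns true if the entry may be sent now, false if it has to wait.
  boolean trySend(long logIndex, long partitionIndex) {
    if (partitionIndex != currentPartition) {
      if (!inFlight.isEmpty()) {
        return false; // current partition still has unacknowledged entries
      }
      currentPartition = partitionIndex; // drained, so the transition is allowed
    }
    inFlight.add(logIndex);
    return true;
  }

  // Per-follower in-flight cleanup on ACK.
  void onAck(long logIndex) {
    inFlight.remove(logIndex);
  }

  long currentPartition() {
    return currentPartition;
  }

  int inFlightCount() {
    return inFlight.size();
  }
}
```

Because each follower holds only a partition number and a set of indices, adding followers costs little memory, which is the trade-off the note above describes.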

Content 2: IConsensusRequest time capability extension and write-plan compatibility

  • Extended IConsensusRequest with:
    • hasTime() (default false)
    • getTime() (default throws UnsupportedOperationException)
  • This keeps backward compatibility for requests that carry no time, while allowing TRaft to read timestamps through the top-level interface.
  • Added/updated time behavior on time-carrying write nodes (including recursive subclasses):
    • InsertRowNode, InsertRowsNode, InsertTabletNode
    • InsertRowsOfOneDeviceNode, InsertMultiTabletsNode
    • RelationalInsertRowNode, RelationalInsertRowsNode, RelationalInsertTabletNode
    • PipeEnrichedInsertNode delegates hasTime() / getTime()
  • Fixed previously unimplemented min-time behavior:
    • InsertRowsOfOneDeviceNode#getMinTime()
    • InsertMultiTabletsNode#getMinTime()
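
The capability pattern described in this list can be illustrated with a reduced interface. Everything below is a hypothetical stand-in: `TimedRequestSketch` models only the two new methods of IConsensusRequest, and `PlainRequest`, `RowRequest`, `WrappedRequest`, and `RequestTimes.timeOrDefault` are invented for illustration.

```java
// Hypothetical stand-in for IConsensusRequest, reduced to the two new methods.
interface TimedRequestSketch {
  default boolean hasTime() {
    return false;
  }

  default long getTime() {
    throw new UnsupportedOperationException("request carries no time");
  }
}

// A request without a timestamp inherits both defaults unchanged.
final class PlainRequest implements TimedRequestSketch {}

// A time-carrying request overrides both methods.
final class RowRequest implements TimedRequestSketch {
  private final long time;

  RowRequest(long time) {
    this.time = time;
  }

  @Override
  public boolean hasTime() {
    return true;
  }

  @Override
  public long getTime() {
    return time;
  }
}

// A wrapper that delegates, in the spirit of PipeEnrichedInsertNode.
final class WrappedRequest implements TimedRequestSketch {
  private final TimedRequestSketch inner;

  WrappedRequest(TimedRequestSketch inner) {
    this.inner = inner;
  }

  @Override
  public boolean hasTime() {
    return inner.hasTime();
  }

  @Override
  public long getTime() {
    return inner.getTime();
  }
}

final class RequestTimes {
  private RequestTimes() {}

  // Callers check the capability first and never hit the throwing default.
  static long timeOrDefault(TimedRequestSketch request, long fallback) {
    return request.hasTime() ? request.getTime() : fallback;
  }
}
```

A consumer such as a request parser would call `hasTime()` before `getTime()`, so existing request types keep working without any change.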

Design choice note:

  • Chosen API: capability check + default fallback (hasTime/getTime).
  • Alternative considered: changing getTime() return type to OptionalLong.
    Rejected because it conflicts with the existing InsertRowNode#getTime() signature, which returns long, and would have a broad compatibility impact.

This PR has:

  • been self-reviewed.
    • concurrent read
    • concurrent write
    • concurrent read and write
  • added documentation for new or modified features or behaviors.
  • added Javadocs for most classes and all non-trivial methods.
  • added or updated version, license, or notice information
  • added comments explaining the "why" and the intent of the code wherever it would not be obvious to an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage.
  • added integration tests.
  • been tested in a test IoTDB cluster.

Key changed/added classes (or packages if there are too many classes) in this PR
  • iotdb-core/consensus/src/main/java/org/apache/iotdb/consensus/traft/
    • TRaftConsensus
    • TRaftServerImpl
    • TRaftLogEntry
    • TRaftFollowerInfo
    • TRaftLogStore
    • TRaftNodeRegistry
    • TRaftRequestParser
    • TRaftVoteRequest
    • TRaftVoteResult
    • TRaftRole
  • iotdb-core/consensus/src/main/java/org/apache/iotdb/consensus/common/request/
    • IConsensusRequest
    • IndexedConsensusRequest
    • BatchIndexedConsensusRequest
    • DeserializedBatchIndexedConsensusRequest
  • iotdb-core/datanode/src/main/java/org/apache/iotdb/db/queryengine/plan/planner/plan/node/write/
    • InsertNode
    • InsertRowNode
    • InsertRowsNode
    • InsertRowsOfOneDeviceNode
    • InsertTabletNode
    • InsertMultiTabletsNode
    • RelationalInsertRowNode
    • RelationalInsertRowsNode
    • RelationalInsertTabletNode

LittleHealth reopened this Mar 8, 2026
LittleHealth marked this pull request as draft March 8, 2026 14:24
LittleHealth force-pushed the research/traft branch 2 times, most recently from 316c777 to e9ac801 March 8, 2026 15:04