ADR 0006: Git-Inspired Storage Model for SydraDB
Status
Proposed
Implementation reference (current, related subsystems):
Context
- SydraDB currently relies on WAL segments and columnar time-series segments without a first-class versioned metadata model.
- We plan to support advanced retention, branching, and replay scenarios reminiscent of DVCS workflows (branching, commits, diffs).
- The allocator roadmap (#61) will introduce shard-local arenas that can benefit from an object-store style layout.
Goals
- Provide a content-addressable object graph for series metadata, schemas, and compaction manifests.
- Enable lightweight branching/checkpoint semantics for WAL replay and experimentation.
- Integrate with forthcoming custom allocator features (per-shard arenas, append-only segments).
Proposed Architecture
Object Types
- Blobs: immutable payloads for segment manifests, tag dictionaries, and WAL bundle summaries. Stored as `[type-prefix | hash | payload]`.
- Trees: directory-like objects mapping logical paths (`series/<series-id>/segment/<segment-id>`) to blob or tree hashes.
- Commits: point-in-time snapshots referencing a root tree, parent commit hashes, and metadata (`timestamp`, `author`, `message`, optional `branch`).
- Refs: named pointers (`main`, `snapshots/<date>`) resolved lazily; stored as plain text files within the ref namespace for quick updates.
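As a sketch only, the object kinds above could map onto Zig shapes like the following; the field names, the tagged union, and the fixed 32-byte `Hash` are illustrative assumptions, not a committed serialization format.

```zig
// Hypothetical in-memory shapes for the object kinds described above.
// A 32-byte digest is assumed (the hash algorithm is still an open question).
pub const Hash = [32]u8;

pub const TreeEntry = struct {
    path: []const u8, // logical path, e.g. "series/<series-id>/segment/<segment-id>"
    child: Hash,      // hash of a blob or a nested tree
};

pub const Commit = struct {
    root_tree: Hash,
    parents: []const Hash,
    timestamp: i64,
    author: []const u8,
    message: []const u8,
    branch: ?[]const u8 = null, // optional, per the metadata list above
};

pub const Object = union(enum) {
    blob: []const u8,        // segment manifest, tag dictionary, WAL bundle summary
    tree: []const TreeEntry,
    commit: Commit,
};

// Refs are not objects: they are plain-text files holding a hex hash,
// resolved lazily by name (e.g. "main", "snapshots/<date>").
```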
Storage Layout
- Content-addressable store under `<data_dir>/objects/<prefix>/<hex>` (aligns with the current `object_store` layout); a path-construction sketch follows this list.
- Separate ref namespace `data/refs/*` mimicking Git's `refs/heads` and `refs/tags`.
- Object serialization uses Zig-friendly framing (length-prefixed sections) to avoid zlib; compression is delegated to the codec layer.
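A minimal sketch of the fan-out path scheme, assuming BLAKE3 (still an open question below) and a two-character prefix directory as in Git's loose-object layout; `objectPath` is a hypothetical helper, not an existing SydraDB function.

```zig
const std = @import("std");

/// Hypothetical helper: hash a payload and build its
/// <data_dir>/objects/<prefix>/<hex> path.
fn objectPath(allocator: std.mem.Allocator, data_dir: []const u8, payload: []const u8) ![]u8 {
    var digest: [32]u8 = undefined;
    std.crypto.hash.Blake3.hash(payload, &digest, .{});
    const hex = std.fmt.bytesToHex(digest, .lower); // 64 hex characters
    // The first two hex characters become the fan-out prefix directory.
    return std.fmt.allocPrint(allocator, "{s}/objects/{s}/{s}", .{
        data_dir, hex[0..2], hex[2..],
    });
}
```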
WAL Integration
- WAL append produces objects: a `wal/chunk/<sequence>` blob describing offsets and checksums, and a `wal/index/<sequence>` tree linking to the chunk blobs.
- Periodic checkpoints create commits referencing the latest segment manifests plus the WAL index.
- Replay uses refs to locate the appropriate commit, then streams WAL chunks in hash order.
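A minimal sketch of replay's first step under this layout: resolve a ref (a plain-text file holding a hex commit hash) to the commit to replay from. The path shape and the 64-character hex length are assumptions carried over from the sections above.

```zig
const std = @import("std");

/// Hypothetical ref resolution for replay: read <data_dir>/refs/<ref> and
/// return the hex hash of the commit it points at.
fn resolveRef(allocator: std.mem.Allocator, data_dir: []const u8, ref: []const u8) ![64]u8 {
    const path = try std.fmt.allocPrint(allocator, "{s}/refs/{s}", .{ data_dir, ref });
    defer allocator.free(path);

    var file = try std.fs.cwd().openFile(path, .{});
    defer file.close();

    var hex: [64]u8 = undefined;
    const n = try file.readAll(&hex);
    if (n != hex.len) return error.MalformedRef; // too short to be a full hash
    return hex;
}
```

Replay would then load that commit, walk its root tree to the `wal/index/<sequence>` entries, and stream the referenced chunk blobs.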
Compaction & Segments
- Compaction outputs a blob per segment (metadata + column stats).
- Tree entries track per-rollup segments; a commit update becomes an atomic rename (`old-hash` replaced with `new-hash`).
- Segment GC is implemented via reachability from current refs (mark-sweep).
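A sketch of the mark phase only, building on the hypothetical `Hash`/`Object` shapes above; `store.loadObject` stands in for an object-store lookup that does not exist yet.

```zig
const std = @import("std");

/// Hypothetical mark phase of the reachability GC: starting from a commit
/// hash found via a ref, record every object reachable through commits and
/// trees. A sweep phase (not shown) deletes objects absent from `reachable`.
fn markReachable(
    store: anytype, // assumed to expose `loadObject(Hash) !Object`
    reachable: *std.AutoHashMap(Hash, void),
    hash: Hash,
) anyerror!void {
    if (reachable.contains(hash)) return; // already visited
    try reachable.put(hash, {});

    switch (try store.loadObject(hash)) {
        .commit => |c| {
            try markReachable(store, reachable, c.root_tree);
            for (c.parents) |parent| try markReachable(store, reachable, parent);
        },
        .tree => |entries| for (entries) |entry| {
            try markReachable(store, reachable, entry.child);
        },
        .blob => {}, // leaf object: nothing further to traverse
    }
}
```

Roots would be every commit hash reachable from the files under `data/refs/*`.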
Allocator Tie-in
- Shard-local arenas allocate temporary nodes when building trees before flushing to object store.
- Append-only blob creation leverages new bump allocators; cross-shard references remain content-addressed.
- Deferred reclamation aligns with commit rollbacks by discarding unreferenced arenas.
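A sketch of the arena pattern under these assumptions; `SegmentInfo`, `store.writeTree`, and the shard wiring are placeholders for APIs that do not exist yet.

```zig
const std = @import("std");

// Hypothetical inputs for one shard's flush; `Hash` and `TreeEntry` are the
// shapes sketched under Object Types.
const SegmentInfo = struct {
    series_id: []const u8,
    segment_id: []const u8,
    manifest_hash: Hash,
};

/// Hypothetical per-shard flush: temporary tree nodes live in a shard-local
/// arena and are reclaimed in one shot after the tree object is written.
fn flushShardTree(backing: std.mem.Allocator, store: anytype, segments: []const SegmentInfo) !Hash {
    var arena = std.heap.ArenaAllocator.init(backing);
    defer arena.deinit(); // deferred reclamation: drop all temporary nodes at once
    const alloc = arena.allocator();

    const entries = try alloc.alloc(TreeEntry, segments.len);
    for (segments, entries) |seg, *entry| {
        entry.* = .{
            .path = try std.fmt.allocPrint(alloc, "series/{s}/segment/{s}", .{ seg.series_id, seg.segment_id }),
            .child = seg.manifest_hash,
        };
    }
    // Cross-shard references stay content-addressed: only the returned hash escapes the arena.
    return store.writeTree(entries);
}
```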
Migration Plan
- Implement object store primitives (hashing, storage paths, blob encoding).
- Introduce a commit writer in the compaction pipeline; maintain `refs/heads/main` (see the ref-update sketch after this list).
- Update WAL replay to resolve the commit graph before reading segments.
- Add reachability-based GC CLI command.
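The commit writer's ref flip should be crash-safe; a minimal sketch of the usual write-temp-then-rename pattern follows, with the `.tmp` suffix and helper name as assumptions.

```zig
const std = @import("std");

/// Hypothetical atomic ref update: write the new commit hash to a temp file,
/// sync it, then rename over the ref so readers never observe a partial write.
fn updateRef(allocator: std.mem.Allocator, data_dir: []const u8, ref: []const u8, hex_hash: []const u8) !void {
    const ref_path = try std.fmt.allocPrint(allocator, "{s}/refs/{s}", .{ data_dir, ref });
    defer allocator.free(ref_path);
    const tmp_path = try std.fmt.allocPrint(allocator, "{s}.tmp", .{ref_path});
    defer allocator.free(tmp_path);

    {
        var file = try std.fs.cwd().createFile(tmp_path, .{});
        defer file.close();
        try file.writeAll(hex_hash);
        try file.sync();
    }

    // rename(2) is atomic within a filesystem, so the ref flips from the old
    // hash to the new hash in a single step.
    try std.fs.cwd().rename(tmp_path, ref_path);
}
```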
Open Questions
- Hash algorithm: BLAKE3 (fast, 32-byte digests) or SHA-256 (broader interoperability); leaning toward BLAKE3.
- Security considerations for user-provided blobs; need validation.
- Multi-node coordination: eventual replication strategy for refs and objects.
- Exposure via API? Possibly `GET /debug/objects/<hash>` for diagnostics.
Options Considered
- Stick with current manifest files (status quo): simpler but no branching, more manual GC.
- Leverage an existing Git repository: operationally heavy, requires external tooling, not Zig-native.
- Adopt content-addressable object store (chosen): fits allocator roadmap, extensible for branching.
References
- Issue #61 (allocator shards), upcoming data-model issue (to be opened).
- Git internal docs as inspiration: commits, trees, blobs, refs.