ADR 0006: Git-Inspired Storage Model for SydraDB
Status
Proposed
Implementation reference (current, related subsystems):
Context
- SydraDB currently relies on WAL segments and columnar time-series segments without a first-class versioned metadata model.
- We plan to support advanced retention, branching, and replay scenarios reminiscent of DVCS workflows (branching, commits, diffs).
- The allocator roadmap (#61) will introduce shard-local arenas that can benefit from an object-store style layout.
Goals
- Provide a content-addressable object graph for series metadata, schemas, and compaction manifests.
- Enable lightweight branching/checkpoint semantics for WAL replay and experimentation.
- Integrate with forthcoming custom allocator features (per-shard arenas, append-only segments).
Proposed Architecture
Object Types
- Blobs: immutable payloads for segment manifests, tag dictionaries, and WAL bundle summaries. Stored as `[type-prefix | hash | payload]`.
- Trees: directory-like objects mapping logical paths (`series/<series-id>/segment/<segment-id>`) to blob or tree hashes.
- Commits: point-in-time snapshots referencing a root tree, parent commit hashes, and metadata (`timestamp`, `author`, `message`, optional `branch`).
- Refs: named pointers (`main`, `snapshots/<date>`) resolved lazily; stored as plain text files within the ref namespace for quick updates.
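As a sketch only, the object kinds above could map onto Zig shapes like the following; the field names, the tagged union, and the fixed 32-byte `Hash` are illustrative assumptions, not a committed serialization format.

```zig
// Hypothetical in-memory shapes for the object kinds described above.
// A 32-byte digest is assumed (the hash algorithm is still an open question).
pub const Hash = [32]u8;

pub const TreeEntry = struct {
    path: []const u8, // logical path, e.g. "series/<series-id>/segment/<segment-id>"
    child: Hash,      // hash of a blob or a nested tree
};

pub const Commit = struct {
    root_tree: Hash,
    parents: []const Hash,
    timestamp: i64,
    author: []const u8,
    message: []const u8,
    branch: ?[]const u8 = null, // optional, per the metadata list above
};

pub const Object = union(enum) {
    blob: []const u8,        // segment manifest, tag dictionary, WAL bundle summary
    tree: []const TreeEntry,
    commit: Commit,
};

// Refs are not objects: they are plain-text files holding a hex hash,
// resolved lazily by name (e.g. "main", "snapshots/<date>").
```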
Storage Layout
- Content-addressable store under `<data_dir>/objects/<prefix>/<hex>` (aligns with the current `object_store` layout); a path-construction sketch follows this list.
- Separate ref namespace `data/refs/*` mimicking Git's `refs/heads` and `refs/tags`.
- Object serialization uses Zig-friendly framing (length-prefixed sections) to avoid zlib; compression is delegated to the codec layer.
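A minimal sketch of the fan-out path scheme, assuming BLAKE3 (still an open question below) and a two-character prefix directory as in Git's loose-object layout; `objectPath` is a hypothetical helper, not an existing SydraDB function.

```zig
const std = @import("std");

/// Hypothetical helper: hash a payload and build its
/// <data_dir>/objects/<prefix>/<hex> path.
fn objectPath(allocator: std.mem.Allocator, data_dir: []const u8, payload: []const u8) ![]u8 {
    var digest: [32]u8 = undefined;
    std.crypto.hash.Blake3.hash(payload, &digest, .{});
    const hex = std.fmt.bytesToHex(digest, .lower); // 64 hex characters
    // The first two hex characters become the fan-out prefix directory.
    return std.fmt.allocPrint(allocator, "{s}/objects/{s}/{s}", .{
        data_dir, hex[0..2], hex[2..],
    });
}
```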
WAL Integration
- WAL append produces objects: a `wal/chunk/<sequence>` blob describing offsets and checksums, and a `wal/index/<sequence>` tree linking to the chunk blobs.
- Periodic checkpoints create commits referencing the latest segment manifests plus the WAL index.
- Replay uses refs to locate the appropriate commit, then streams WAL chunks in hash order.
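A minimal sketch of replay's first step under this layout: resolve a ref (a plain-text file holding a hex commit hash) to the commit to replay from. The path shape and the 64-character hex length are assumptions carried over from the sections above.

```zig
const std = @import("std");

/// Hypothetical ref resolution for replay: read <data_dir>/refs/<ref> and
/// return the hex hash of the commit it points at.
fn resolveRef(allocator: std.mem.Allocator, data_dir: []const u8, ref: []const u8) ![64]u8 {
    const path = try std.fmt.allocPrint(allocator, "{s}/refs/{s}", .{ data_dir, ref });
    defer allocator.free(path);

    var file = try std.fs.cwd().openFile(path, .{});
    defer file.close();

    var hex: [64]u8 = undefined;
    const n = try file.readAll(&hex);
    if (n != hex.len) return error.MalformedRef; // too short to be a full hash
    return hex;
}
```

Replay would then load that commit, walk its root tree to the `wal/index/<sequence>` entries, and stream the referenced chunk blobs.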
Compaction & Segments
- Compaction outputs a blob per segment (metadata + column stats).
- Tree entries track per-rollup segments; a commit update becomes an atomic rename (`old-hash` replaced with `new-hash`).
- Segment GC is implemented via reachability from current refs (mark-sweep).
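A sketch of the mark phase only, building on the hypothetical `Hash`/`Object` shapes above; `store.loadObject` stands in for an object-store lookup that does not exist yet.

```zig
const std = @import("std");

/// Hypothetical mark phase of the reachability GC: starting from a commit
/// hash found via a ref, record every object reachable through commits and
/// trees. A sweep phase (not shown) deletes objects absent from `reachable`.
fn markReachable(
    store: anytype, // assumed to expose `loadObject(Hash) !Object`
    reachable: *std.AutoHashMap(Hash, void),
    hash: Hash,
) anyerror!void {
    if (reachable.contains(hash)) return; // already visited
    try reachable.put(hash, {});

    switch (try store.loadObject(hash)) {
        .commit => |c| {
            try markReachable(store, reachable, c.root_tree);
            for (c.parents) |parent| try markReachable(store, reachable, parent);
        },
        .tree => |entries| for (entries) |entry| {
            try markReachable(store, reachable, entry.child);
        },
        .blob => {}, // leaf object: nothing further to traverse
    }
}
```

Roots would be every commit hash reachable from the files under `data/refs/*`.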
Allocator Tie-in
- Shard-local arenas allocate temporary nodes when building trees before flushing to object store.
- Append-only blob creation leverages new bump allocators; cross-shard references remain content-addressed.
- Deferred reclamation aligns with commit rollbacks by discarding unreferenced arenas.
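A sketch of the arena pattern under these assumptions; `SegmentInfo`, `store.writeTree`, and the shard wiring are placeholders for APIs that do not exist yet.

```zig
const std = @import("std");

// Hypothetical inputs for one shard's flush; `Hash` and `TreeEntry` are the
// shapes sketched under Object Types.
const SegmentInfo = struct {
    series_id: []const u8,
    segment_id: []const u8,
    manifest_hash: Hash,
};

/// Hypothetical per-shard flush: temporary tree nodes live in a shard-local
/// arena and are reclaimed in one shot after the tree object is written.
fn flushShardTree(backing: std.mem.Allocator, store: anytype, segments: []const SegmentInfo) !Hash {
    var arena = std.heap.ArenaAllocator.init(backing);
    defer arena.deinit(); // deferred reclamation: drop all temporary nodes at once
    const alloc = arena.allocator();

    const entries = try alloc.alloc(TreeEntry, segments.len);
    for (segments, entries) |seg, *entry| {
        entry.* = .{
            .path = try std.fmt.allocPrint(alloc, "series/{s}/segment/{s}", .{ seg.series_id, seg.segment_id }),
            .child = seg.manifest_hash,
        };
    }
    // Cross-shard references stay content-addressed: only the returned hash escapes the arena.
    return store.writeTree(entries);
}
```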
Migration Plan
- Implement object store primitives (hashing, storage paths, blob encoding).
- Introduce a commit writer in the compaction pipeline; maintain `refs/heads/main` (see the ref-update sketch after this list).
- Update WAL replay to resolve the commit graph before reading segments.
- Add reachability-based GC CLI command.
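The commit writer's ref flip should be crash-safe; a minimal sketch of the usual write-temp-then-rename pattern follows, with the `.tmp` suffix and helper name as assumptions.

```zig
const std = @import("std");

/// Hypothetical atomic ref update: write the new commit hash to a temp file,
/// sync it, then rename over the ref so readers never observe a partial write.
fn updateRef(allocator: std.mem.Allocator, data_dir: []const u8, ref: []const u8, hex_hash: []const u8) !void {
    const ref_path = try std.fmt.allocPrint(allocator, "{s}/refs/{s}", .{ data_dir, ref });
    defer allocator.free(ref_path);
    const tmp_path = try std.fmt.allocPrint(allocator, "{s}.tmp", .{ref_path});
    defer allocator.free(tmp_path);

    {
        var file = try std.fs.cwd().createFile(tmp_path, .{});
        defer file.close();
        try file.writeAll(hex_hash);
        try file.sync();
    }

    // rename(2) is atomic within a filesystem, so the ref flips from the old
    // hash to the new hash in a single step.
    try std.fs.cwd().rename(tmp_path, ref_path);
}
```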
Open Questions
- Hash algorithm: BLAKE3 (fast, 32-byte digests) or SHA-256 (broader interoperability); leaning toward BLAKE3.
- Security considerations for user-provided blobs; need validation.
- Multi-node coordination: eventual replication strategy for refs and objects.
- Exposure via API? Possibly `GET /debug/objects/<hash>` for diagnostics.
Options Considered
- Stick with current manifest files (status quo): simpler but no branching, more manual GC.
- Leverage an existing Git repository: operationally heavy, requires external tooling, not Zig-native.
- Adopt content-addressable object store (chosen): fits allocator roadmap, extensible for branching.
References
- Issue #61 (allocator shards), upcoming data-model issue (to be opened).
- Git internal docs as inspiration: commits, trees, blobs, refs.