On-disk format (as implemented)
SydraDB persists state under data_dir (default ./data), using a small set of files/directories.
For module-level details, see:
- src/sydra/storage/wal.zig
- src/sydra/storage/segment.zig
- src/sydra/storage/manifest.zig
- src/sydra/storage/tags.zig
- src/sydra/storage/object_store.zig
- src/sydra/snapshot.zig
Directory layout
Under data_dir, the engine uses:
- MANIFEST – manifest of segment entries (per series + hour bucket)
- wal/ – write-ahead log files
  - current.wal
  - rotated *.wal files named by epoch millis
- segments/<hour_bucket>/*.seg – per-series, per-hour segment files
- tags.json – tag index snapshot
- objects/<prefix>/<hex> – loose content-addressed objects
- objects/packs/*.pack – immutable packed object containers
- objects/packs/*.idx – fanout-based pack indexes for packed objects
- objects/packs/*.rev – reverse indexes that preserve physical pack object order
- objects/packs/*.manifest – per-pack manifests with checksums and per-type object counts
- objects/info/store-format – repository-wide storage format marker and feature defaults
- objects/info/repository-id – stable repository identity used by local bundle/fetch/push workflows
- objects/info/alternates – optional list of borrowed local repositories searched after local object lookup
- objects/info/pack-inventory – active pack inventory with pack paths, BLAKE3 digests, and object counts used by thin bundle/apply planning
- objects/info/multi-pack-index – optional pack-set fanout index across all active packs
- objects/info/commit-graph – optional commit ancestry side index with generation numbers and logical changed-path Bloom filters
- objects/info/reachability-bitmap – optional ref-keyed reachable-object side index for CAS maintenance fast paths
- objects/info/object-refs – optional explicit child-edge index keyed by object id for GC/fsck/bitmap refresh
- objects/cruft/<timestamp>/... – quarantined unreachable CAS content retained until the GC grace window expires
- refs/ – loose compatibility refs and reflogs for pre-migration repositories
- reftable/ – append-only reftable stack for migrated/new repository refs and reflogs
  - tables.list – ordered active reftable stack
  - state – monotonic next-update counter for update-indexed table naming
  - info/summary – rebuildable table summary index for key-range and reflog-range pruning
  - <min_update>-<max_update>.table – update-indexed reftable files, including tombstones when refs are deleted
- symrefs/ – symbolic ref targets such as local HEAD and mirrored remote HEAD targets
- lost-found/ – optional fsck output for dangling commit/blob/tree ids
WAL format (v0)
WAL files are append-only streams of records.
Each record is encoded as:
[len:u32][type:u8][series_id:u64][ts:i64][value:f64bits][crc32:u32]
Notes:
- len is the payload byte length (type..value) and is little-endian.
- type currently uses:
  - 1 = Put
- series_id, ts, and value_bits are little-endian.
- crc32 is computed over the payload (type..value) and stored little-endian.
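The record layout above can be sketched in Python as a quick way to check byte counts and checksums. This is a minimal sketch, not the engine's code: the function names are hypothetical, and it assumes the checksum is the standard CRC-32 (as computed by zlib), since this document only says "crc32".

```python
import struct
import zlib

PUT = 1  # the only record type listed above

# Payload (type..value), little-endian, no padding: u8, u64, i64, u64.
_PAYLOAD = struct.Struct("<BQqQ")


def encode_put(series_id: int, ts: int, value: float) -> bytes:
    """Build one [len][payload][crc32] record for a Put."""
    # Reinterpret the f64 as its raw 64-bit pattern.
    value_bits = struct.unpack("<Q", struct.pack("<d", value))[0]
    payload = _PAYLOAD.pack(PUT, series_id, ts, value_bits)
    # Assumption: standard CRC-32 (zlib); the exact variant is not stated here.
    crc = zlib.crc32(payload) & 0xFFFFFFFF
    return struct.pack("<I", len(payload)) + payload + struct.pack("<I", crc)


def decode_record(buf: bytes, offset: int = 0):
    """Return ((type, series_id, ts, value), next_offset) or raise ValueError."""
    if offset + 4 > len(buf):
        raise ValueError("truncated length prefix")
    (length,) = struct.unpack_from("<I", buf, offset)
    start = offset + 4
    end = start + length
    if length != _PAYLOAD.size or end + 4 > len(buf):
        raise ValueError("truncated or malformed record")
    payload = buf[start:end]
    (crc,) = struct.unpack_from("<I", buf, end)
    if zlib.crc32(payload) & 0xFFFFFFFF != crc:
        raise ValueError("crc mismatch")
    rec_type, series_id, ts, value_bits = _PAYLOAD.unpack(payload)
    value = struct.unpack("<d", struct.pack("<Q", value_bits))[0]
    return (rec_type, series_id, ts, value), end + 4
```

Under these assumptions a Put payload is 25 bytes, so a full record occupies 33 bytes on disk (4-byte length, 25-byte payload, 4-byte checksum).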
Replay order:
- All *.wal files under wal/ are replayed in filename sort order, with current.wal forced to replay last.
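The replay ordering rule can be expressed as a single sort key. The helper below is a hypothetical illustration of that rule, not the engine's implementation.

```python
def replay_order(names):
    """Sort *.wal file names for replay, forcing current.wal to the end."""
    wal_files = [n for n in names if n.endswith(".wal")]
    # False sorts before True, so current.wal lands after every rotated file;
    # rotated files keep plain filename order.
    return sorted(wal_files, key=lambda n: (n == "current.wal", n))
```

Filename order is lexicographic, so it matches chronological order only while the epoch-millis names have the same number of digits.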
Segment format
Segment files store points for a single (series_id, hour_bucket) group.
v1: SYSEG2
Header:
[magic:6 "SYSEG2"]
[series_id:u64][hour:i64][count:u32]
[start_ts:i64][end_ts:i64]
[ts_codec:u8][val_codec:u8]
Default codecs written by the engine:
- ts_codec = 1 – delta-of-delta + ZigZag varint (src/sydra/codec/gorilla.zig.encodeTsDoD)
- val_codec = 1 – Gorilla-style XOR encoding (src/sydra/codec/gorilla.zig.encodeF64)
See also: src/sydra/codec/gorilla.zig.
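To make the timestamp codec concrete, here is a generic sketch of delta-of-delta encoding with ZigZag varints. It illustrates the technique only and is not a byte-compatible reimplementation of encodeTsDoD: how the engine seeds the first timestamp and first delta is not described here, so this sketch assumes both start from zero. All function names are hypothetical.

```python
def zigzag(n: int) -> int:
    """Map a signed 64-bit integer to unsigned so small magnitudes stay small."""
    return ((n << 1) ^ (n >> 63)) & 0xFFFFFFFFFFFFFFFF


def unzigzag(z: int) -> int:
    return (z >> 1) ^ -(z & 1)


def put_varint(out: bytearray, z: int) -> None:
    """Append z as a base-128 varint, low 7 bits first."""
    while z >= 0x80:
        out.append((z & 0x7F) | 0x80)
        z >>= 7
    out.append(z)


def get_varint(buf: bytes, pos: int):
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if byte < 0x80:
            return result, pos
        shift += 7


def encode_timestamps(timestamps) -> bytes:
    out = bytearray()
    prev_ts = 0      # assumption: the first timestamp is a delta from zero
    prev_delta = 0
    for ts in timestamps:
        delta = ts - prev_ts
        put_varint(out, zigzag(delta - prev_delta))  # delta-of-delta
        prev_ts, prev_delta = ts, delta
    return bytes(out)


def decode_timestamps(buf: bytes, count: int):
    out = []
    pos = 0
    prev_ts = 0
    prev_delta = 0
    for _ in range(count):
        z, pos = get_varint(buf, pos)
        prev_delta += unzigzag(z)
        prev_ts += prev_delta
        out.append(prev_ts)
    return out
```

For evenly spaced samples the delta-of-delta is zero after the second point, so each further timestamp costs a single byte in this scheme.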
v0: SYSEG1 (back-compat)
- Timestamp deltas encoded as ZigZag varints
- Values encoded as raw f64 bits
Manifest
The manifest tracks segment entries and is used to:
- find candidate segments during range queries
- build per-series “highwater marks” during WAL recovery (so old WAL points aren’t duplicated)
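The highwater idea can be sketched as follows. This is an illustration under stated assumptions, not the engine's recovery code: it assumes a series' highwater mark is the largest end_ts among its manifest entries, and that a WAL point is skipped when its timestamp is at or below that mark. The function names and tuple shapes are hypothetical.

```python
def highwater_marks(manifest_entries):
    """manifest_entries: iterable of (series_id, end_ts). Returns max end_ts per series."""
    marks = {}
    for series_id, end_ts in manifest_entries:
        prev = marks.get(series_id)
        if prev is None or end_ts > prev:
            marks[series_id] = end_ts
    return marks


def points_to_replay(wal_points, marks):
    """Keep WAL points (series_id, ts, value) that no segment is assumed to cover yet."""
    return [
        (series_id, ts, value)
        for series_id, ts, value in wal_points
        # Assumption: ts <= highwater means the point was already flushed.
        if series_id not in marks or ts > marks[series_id]
    ]
```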
When metadata_read_mode = "primary" and a CAS head exists, the runtime can rebuild its in-memory manifest, tag index, and series catalog directly from the CAS snapshot without recreating these mirror files on startup. In cas_mode = "dual_write", MANIFEST, tags.json, and series_catalog.jsonl remain compatibility mirrors written by normal flush/maintenance flows and by explicit CAS export commands.
CAS objects
The CAS layer stores immutable objects addressed by a BLAKE3 hash of (type, payload).
- Loose objects live under objects/<prefix>/<hex>.
- Packed objects live in objects/packs/*.pack and are indexed by objects/packs/*.idx.
- objects/info/store-format version 3 marks canonical repositories that default to the reftable ref backend, CAS-primary startup, and canonical segment_root/journal_root metadata for active reachable commits. Versions 1 and 2 remain readable compatibility formats for pre-normalization repositories.
- objects/info/multi-pack-index provides an optional cross-pack fanout table so lookups can resolve mixed pack sets without scanning every individual .idx file first. Version 2 also records whether each active pack has a reverse index sidecar.
- objects/info/object-refs records typed object-to-child edges explicitly, so reachability, fsck, and bitmap refresh no longer need to infer every edge by reparsing arbitrary blob payloads.
- objects/info/reachability-bitmap caches the exact reachable object-id set for the current sorted ref snapshot, so cas pack, bundle selection, and non-reflog reachability checks can fall back to a side index instead of walking the full DAG every time.
- The current implementation stores whole objects in packs; it does not use delta compression.
- cas pack writes an additional pack/index/manifest set for the currently reachable loose object set, refreshes objects/info/multi-pack-index, and removes redundant loose copies for the newly packed objects without pruning older active packs.
- cas gc --apply preserves unreachable content by first copying active pack files and sealing unreachable loose objects into objects/cruft/<timestamp>/packs/*.pack, then pruning older cruft directories after the configured grace window.
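Content addressing of loose objects, as described at the start of this section, can be illustrated with a short sketch. Several details are assumptions and are marked in the code: the Python standard library has no BLAKE3, so BLAKE2b-256 is used purely as a stand-in; the way (type, payload) is framed before hashing is hypothetical; and the two-character directory prefix is a git-style guess, since the real prefix width is not stated here.

```python
import hashlib


def object_id(obj_type: str, payload: bytes) -> str:
    """Derive a hex object id from (type, payload)."""
    # Stand-in hash: BLAKE2b-256 from the stdlib, NOT the BLAKE3 the store uses.
    h = hashlib.blake2b(digest_size=32)
    # Hypothetical framing: "<type>\0<payload>". The real encoding may differ.
    h.update(obj_type.encode("utf-8") + b"\0")
    h.update(payload)
    return h.hexdigest()


def loose_path(oid: str, prefix_len: int = 2) -> str:
    """Map an object id onto the objects/<prefix>/<hex> layout."""
    # Assumption: the prefix is the first prefix_len hex characters and the
    # file name is the remainder of the id.
    return f"objects/{oid[:prefix_len]}/{oid[prefix_len:]}"
```

Because the type participates in the hash, two objects with identical payload bytes but different types get different ids, which is the property the (type, payload) addressing provides.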
Current typed metadata payloads include:
- segment descriptors with a canonical segment_root tree id, compatibility ContentRef, and optional mirror paths for exported .seg files
- tag snapshots
- series catalog snapshots
- WAL indexes with a canonical journal_root tree id, compatibility ContentRef, optional mirror names, and captured byte counts for mutable current.wal
- checkpoint-state blobs with per-series replay high-water and the ordered WAL capture set for the commit
- tree objects and commit objects that link the metadata DAG together
Native segment roots now store:
- a meta blob with series/hour/count/range/codec metadata plus optional selector strings
- a blocks/ tree keyed by logical block number
- per-block trees containing stats, ts, and values entries
- ts and values payloads chunked into extent trees with 64 KiB leaf blobs by default
Native journal roots now store:
- a meta blob with file size and frame count
- a frame_index blob that records WAL frame offsets and lengths
- a frames/ tree of immutable blob-backed WAL frames in replay order
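Frame offsets and lengths of the kind recorded in frame_index can be derived from the WAL record framing alone. The sketch below assumes one frame corresponds to one WAL record including its length prefix and checksum; the serialized form of the frame_index blob is not described here, so the function (a hypothetical name) simply returns (offset, length) pairs.

```python
import struct


def build_frame_index(wal_bytes: bytes):
    """Return (offset, length) for each complete frame, in file order."""
    index = []
    pos = 0
    while pos + 4 <= len(wal_bytes):
        (payload_len,) = struct.unpack_from("<I", wal_bytes, pos)
        frame_len = 4 + payload_len + 4  # len prefix + payload + crc32
        if pos + frame_len > len(wal_bytes):
            break  # trailing partial frame, e.g. a torn final write
        index.append((pos, frame_len))
        pos += frame_len
    return index
```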
ContentRef currently supports:
- blob(<object id>) for legacy compatibility payloads
- extent_tree { root_id, size_bytes, chunk_bytes } for chunked segment and WAL content stored as Merkle trees of chunk blobs
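The chunking behind an extent tree can be sketched with two small helpers: one splits a payload into fixed-size leaves (64 KiB by default, as noted for native segment roots), the other finds which leaves cover a byte range. How leaves are grouped into interior tree nodes is not described here, so this sketch stops at the leaf level; the function names are hypothetical.

```python
DEFAULT_CHUNK_BYTES = 64 * 1024


def split_chunks(payload: bytes, chunk_bytes: int = DEFAULT_CHUNK_BYTES):
    """Split payload into leaf chunks; only the last chunk may be short."""
    if chunk_bytes <= 0:
        raise ValueError("chunk_bytes must be positive")
    return [payload[i:i + chunk_bytes] for i in range(0, len(payload), chunk_bytes)]


def leaves_for_range(offset: int, length: int, chunk_bytes: int = DEFAULT_CHUNK_BYTES):
    """Return the leaf indices that hold bytes [offset, offset + length)."""
    if length <= 0:
        return []
    first = offset // chunk_bytes
    last = (offset + length - 1) // chunk_bytes
    return list(range(first, last + 1))
```

Recording size_bytes and chunk_bytes alongside root_id is what lets a reader compute leaf indices like this without first walking the tree.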
See src/sydra/storage/manifest.zig for the in-memory model and load/save behavior.
Snapshot/restore
snapshot/restore now operate on CAS bundles instead of directory-copying the live data directory.
A bundle directory currently contains:
- bundle.manifest – versioned bundle manifest with exported refs, prerequisite commits, repository-format metadata, preserved pack paths, and copied ref metadata files
  - bundle manifests version 3 carry the source repository id and any borrowed local repositories referenced by the source bundle
  - bundle manifests version 4 additionally record pack digests/object counts and prerequisite refs so apply/fetch can skip packs already present in the destination
- objects/ – bundle-local reachable loose objects plus preserved active pack/index/manifest files
- objects/packs/*.pack, *.idx, *.manifest – immutable active-pack payloads plus indexes and manifests copied directly from the source repository
- objects/info/* – copied store-format and side-index files when present
- reftable/ – copied reftable stack snapshot for migrated repositories
- refs/ and logs/refs/ – copied loose refs/reflogs for compatibility repositories
Operational notes:
- snapshot is a thin wrapper over cas bundle create <dst_dir>.
- restore is a thin wrapper over cas bundle apply <src_dir>.
- Applying a bundle now preserves pack files and ref metadata directly instead of re-inserting every object through the loose-object path.
- Restoring a bundle reapplies objects/, objects/info/*, and whichever ref backend snapshot (reftable/ or loose refs/) the bundle was created from; it still does not recreate MANIFEST, tags.json, or series_catalog.jsonl unless an explicit CAS export command is run afterward.
- Borrowed repositories are configured through objects/info/alternates; object lookup stays local-first and falls back to borrowed repositories only when a local object is missing.
- Incremental bundles list prerequisite commits in bundle.manifest; cas bundle apply rejects them unless the destination store already contains those prerequisite objects.
- Bundle apply now skips pack copies when the destination already has the same pack path and digest, then rebuilds local side indexes after the merge instead of trusting copied commit-graph or bitmap state blindly.
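The two apply-time checks in the notes above (prerequisites present, identical packs skipped) can be sketched as plain set and dictionary logic. The data shapes are hypothetical: each pack inventory is modelled as a mapping from pack path to digest, and the destination's known objects as a set of ids. This does not reflect the actual bundle.manifest or pack-inventory encoding.

```python
def missing_prerequisites(prerequisites, dest_object_ids):
    """Return prerequisite commit ids the destination does not have."""
    return sorted(set(prerequisites) - set(dest_object_ids))


def packs_to_copy(bundle_packs: dict, dest_packs: dict):
    """Return bundle pack paths whose path+digest pair is absent in the destination."""
    return sorted(
        path
        for path, digest in bundle_packs.items()
        if dest_packs.get(path) != digest
    )


def plan_apply(prerequisites, dest_object_ids, bundle_packs, dest_packs):
    """Reject when prerequisites are missing, otherwise list packs to copy."""
    missing = missing_prerequisites(prerequisites, dest_object_ids)
    if missing:
        raise ValueError(f"missing prerequisite objects: {missing}")
    return packs_to_copy(bundle_packs, dest_packs)
```

A pack with a matching path but a different digest is still copied in this sketch, since only an identical path and digest pair counts as already present.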
Integrity and cleanup
- cas fsck is reflog-aware by default, so commits only referenced by reflogs are still considered reachable.
- objects/info/commit-graph version 2 stores fixed-width Bloom filters for logical metadata paths such as metadata/segments, metadata/tags, metadata/series_catalog, and wal/*.
- Active packs now carry adjacent .manifest files with per-type object counts and pack checksums; cas fsck validates those manifests before trusting mixed-pack reachability.
- Reftable writes now use update-indexed table names, a persisted reftable/state counter, and block-indexed v3 tables with separate ref and reflog block indexes plus a footer checksum. Readers remain compatible with older flat v1/v2 tables and rewrite them into the current format during compaction or upgrade.
- Runtime reftable lookups now use reftable/info/summary plus block-level cursor reads so readRef, listRefs, and ref-scoped reflog reads no longer decode entire v3 tables by default.
- cas upgrade normalizes the active reachable commit graph before finalizing a v3 repository. After normalization, active refs emit canonical segment descriptors with segment_root, canonical WAL descriptors with journal_root, and a reftable-backed HEAD symref; loose refs and older descriptor payloads remain readable for compatibility and export paths but are no longer emitted for active history.
- cas fsck reports compatibility debt separately from corruption: reachable legacy segment descriptors, reachable legacy WAL descriptors, and loose refs that still exist in migrated v3 repositories.
- cas fsck --repair only rebuilds derivable metadata: active-pack reverse indexes and manifests, objects/info/* side indexes, and reftable/state plus reftable/tables.list. It never rewrites commit, tree, or blob payloads.
- cas fsck --connectivity-only limits validation to refs, reflogs, reachable objects, commit-graph consistency, and dangling detection.
- cas fsck --lost-found writes dangling commit/blob/tree ids into lost-found/.
- cas gc --no-reflogs ignores reflog protection when deciding what is unreachable.
- cas expire applies the policy-only maintenance phase: reflog trimming, checkpoint-ref expiry, and optional borrowed-object materialization from objects/info/alternates into local storage.
- cas prune only deletes previously quarantined cruft directories after the grace window and removes stale mirror files; it does not create new cruft packs.
- cas vacuum runs fsck, optional repair, policy expiry/materialization, reachable-object repack, and then cas gc with the configured prune grace period.
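The grace-window rule for quarantined cruft can be sketched as a filter over the directory names under objects/cruft/. The unit and format of <timestamp> are not stated in this document; the sketch assumes decimal epoch seconds, and the function name is hypothetical.

```python
def expired_cruft_dirs(dir_names, now: int, grace_seconds: int):
    """Return cruft directory names older than the grace window, oldest first."""
    expired = []
    for name in dir_names:
        if not name.isdigit():
            continue  # ignore anything that is not a timestamp-named directory
        # Assumption: the directory name is the quarantine time in epoch seconds.
        if now - int(name) > grace_seconds:
            expired.append(name)
    return sorted(expired, key=int)
```

Only directories strictly older than the window are selected here, so content quarantined exactly grace_seconds ago is kept for one more pass.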
See also:
- Configuration (data_dir, retention)
- Source: engine orchestration (flush, compaction, retention triggers)