ADR 0007: Sharded Small-Pool Allocator Implementation Plan
Status
Accepted
Context
We need a custom allocator that meets the performance and telemetry expectations outlined in ADR 0006 and the supplementary architecture design. The allocator must deliver predictable tail latency for tiny allocations, isolate shard contention, and provide strong instrumentation hooks. This document captures the detailed implementation plan before we start modifying the allocator core.
Implementation reference (current code):
- Allocator entrypoints and stats: `src/sydra/alloc.zig`
- Slab shard implementation: `src/sydra/alloc/slab_shard.zig`
- Bench harness: `tools/bench_alloc.zig`
- Telemetry surfaces: CLI `stats` and HTTP `/debug/alloc/stats`
Workload & Constraints
- Hot objects: 16–256 B, bursty, multi-writer with many concurrent readers.
- Requirements:
- Tail latency improvements (≥30% p99, ≥20% p999).
- Predictable reclamation via epoch/QSBR.
- Stable RSS during churn (±10%).
- Rich telemetry (shard occupancy, contention, deferred queues).
High-Level Architecture
- Per-core slab shards with fixed-size classes matched to hot object sizes.
- Thread-local shard selection (TLS) for constant-time lookup.
- Epoch-based deferred reclamation for cross-shard frees.
- Instrumentation across shards and fallback paths.
- Bench harness extensions to validate improvements.
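As a structural reference, the layout can be pictured roughly as in the sketch below. The type and field names (`Shard`, `ShardManager`, the size-class table) are illustrative stand-ins; the real definitions live in `src/sydra/alloc/slab_shard.zig` and `src/sydra/alloc.zig`.

```zig
const std = @import("std");

// Size classes matched to the hot 16-256 B objects (illustrative values).
const size_classes = [_]usize{ 16, 32, 64, 128, 256 };

const FreeNode = struct {
    next: ?*FreeNode = null,
};

// One shard per core: per-class freelists plus a queue for cross-shard
// frees awaiting epoch-based reclamation.
const Shard = struct {
    mutex: std.Thread.Mutex = .{},
    freelists: [size_classes.len]?*FreeNode = [_]?*FreeNode{null} ** size_classes.len,
    deferred: ?*FreeNode = null,
};

// The manager owns the shard array and a general-purpose fallback allocator
// for oversize or overflow requests.
const ShardManager = struct {
    shards: []Shard,
    fallback: std.mem.Allocator,
};

test "shard defaults to empty freelists" {
    const shard = Shard{};
    try std.testing.expect(shard.freelists[0] == null);
    try std.testing.expect(shard.deferred == null);
}
```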
Implementation Phases
Phase 1 – ShardManager & TLS Wiring
Status: completed (`SmallPoolAllocator.ShardManager`, unit test “shard manager assigns per-thread shards”)
Decisions
- `ShardManager` owns an array of `Shard` instances plus a fallback allocator.
- Threads obtain a shard ID via TLS; initial assignment can use round-robin on creation.
- `SmallPoolAllocator.init` optionally creates the `ShardManager` when sharding is enabled.
Tasks
- Implement `ShardManager` (`init`, `deinit`, `currentShard`, `fallback`).
- Add a TLS helper (`threadlocal var thread_shard_id`) and an atomic counter for round-robin assignment.
- Extend the `SmallPoolAllocator` struct with an optional `ShardManager`.
- Expose configuration via build options (`-Dallocator-shards`, fallback for the disabled state).
Validation
- Unit test confirming two threads map to different shard IDs.
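The TLS wiring can be sketched as below: a global atomic counter hands out shard IDs round-robin and each thread caches its assignment. The names (`shard_count`, `currentShardId`) and the constant shard count are illustrative rather than the actual sydra API, and the atomic builtin assumes a recent Zig (0.13-era lowercase memory-order names).

```zig
const std = @import("std");

// Illustrative shard count; the real value comes from -Dallocator-shards.
const shard_count: usize = 4;

// Monotonic counter used to hand out shard IDs round-robin at first use.
var next_shard_id: usize = 0;

// Each thread caches its shard assignment after the first lookup.
threadlocal var thread_shard_id: ?usize = null;

/// Return the calling thread's shard index, assigning one on first use.
fn currentShardId() usize {
    if (thread_shard_id) |id| return id;
    const id = @atomicRmw(usize, &next_shard_id, .Add, 1, .monotonic) % shard_count;
    thread_shard_id = id;
    return id;
}

test "shard id is stable for a given thread" {
    const first = currentShardId();
    const second = currentShardId();
    try std.testing.expectEqual(first, second);
}
```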
Phase 2 – Integrate Shard Alloc/Free into Fast Path
Status: completed (`SmallPoolAllocator.allocInternal`/`freeFn`, stats surfaced via `snapshotSmallPoolStats()` and tested)
Decisions
- Allocation order: shard → legacy bucket → GPA fallback.
- Free order mirrors allocation.
- Track counters for shard hits/misses and legacy-bucket usage.
Tasks
- Update `slab_shard.Shard.allocate`/`free` to consume the shared GPA (`ret_addr` for debug).
- Modify `SmallPoolAllocator.allocInternal`/`freeFn` to try the shard manager first.
- Record metrics (`shard_allocs`, `shard_frees`, `fallback_allocs`, etc.).
Validation
- Unit tests for shard allocation success, fallback on oversize requests, cross-shard free returning true.
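The decision order itself is simple; the toy sketch below models it so the counter bookkeeping is visible. `ToyAllocator` and its fields are stand-ins for illustration, not the real types in `src/sydra/alloc.zig`.

```zig
const std = @import("std");

// Which path served the request: shard fast path, legacy bucket, or GPA fallback.
const Order = enum { shard, legacy, fallback };

const ToyAllocator = struct {
    shard_has_space: bool,
    legacy_has_space: bool,

    shard_allocs: usize = 0,
    legacy_allocs: usize = 0,
    fallback_allocs: usize = 0,

    // Allocation order: shard -> legacy bucket -> GPA fallback, with a
    // counter bumped on whichever path serves the request.
    fn allocSmall(self: *ToyAllocator) Order {
        if (self.shard_has_space) {
            self.shard_allocs += 1;
            return .shard;
        }
        if (self.legacy_has_space) {
            self.legacy_allocs += 1;
            return .legacy;
        }
        self.fallback_allocs += 1;
        return .fallback;
    }
};

test "fallback order: shard, then legacy bucket, then GPA" {
    var a = ToyAllocator{ .shard_has_space = false, .legacy_has_space = true };
    try std.testing.expectEqual(Order.legacy, a.allocSmall());
    try std.testing.expectEqual(@as(usize, 1), a.legacy_allocs);
}
```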
Phase 3 – Epoch/QSBR Reclamation
Status: largely complete; debug assertions plus deferred snapshot tests added; remaining instrumentation is tracked in Phases 4/5.
Decisions
- Each shard keeps a `current_epoch`, a `deferred` queue, and a per-thread observation map.
- Writers push cross-shard frees into the deferred list, tagged with the current epoch.
- `collectGarbage` moves nodes back to the freelist once the minimum observed epoch surpasses the node’s epoch.
- Provide `enterEpoch`/`leaveEpoch` for readers, called around long-lived operations.
Status
- Implemented `Shard.freeDeferred`, aggregated epoch tracking (`global_epoch`, `thread_epoch`), and manager wrappers (`enterEpoch`/`leaveEpoch`/`advanceEpoch`).
- Cross-shard frees now enqueue into deferred lists and are recycled via `collectGarbage()`.
Tasks
- Extend `FreeNode` with `class_state` (already present) and new `epoch` metadata.
- Implement `Shard.deferFree` and `Shard.collectGarbage`.
- Add manager-wide APIs to advance epochs and record thread observations (TLS map).
- Debug assertions: ensure `FreeNode.class_state` matches the target shard and catch double-frees.
Validation
- Unit test: thread A allocates, thread B frees, the deferred queue increments, and `collectGarbage` returns the node after the epoch advances.
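A minimal single-shard sketch of the defer/collect flow is shown below; the node layout and simplified signatures are illustrative, not the actual `slab_shard.zig` implementation, but the reclamation rule is the one described above: a node returns to the freelist only once the minimum observed epoch has advanced past the epoch it was freed in.

```zig
const std = @import("std");

const DeferredNode = struct {
    next: ?*DeferredNode = null,
    epoch: u64 = 0,
};

const Shard = struct {
    freelist: ?*DeferredNode = null,
    deferred: ?*DeferredNode = null,
    deferred_count: usize = 0,

    // Cross-shard free: park the node, tagged with the epoch current at free time.
    fn deferFree(self: *Shard, node: *DeferredNode, current_epoch: u64) void {
        node.epoch = current_epoch;
        node.next = self.deferred;
        self.deferred = node;
        self.deferred_count += 1;
    }

    // Recycle deferred nodes whose epoch is older than the minimum epoch
    // observed across all threads; keep the rest parked.
    fn collectGarbage(self: *Shard, min_observed_epoch: u64) usize {
        var reclaimed: usize = 0;
        var still_deferred: ?*DeferredNode = null;
        var cur = self.deferred;
        while (cur) |node| {
            const next = node.next;
            if (node.epoch < min_observed_epoch) {
                node.next = self.freelist;
                self.freelist = node;
                reclaimed += 1;
                self.deferred_count -= 1;
            } else {
                node.next = still_deferred;
                still_deferred = node;
            }
            cur = next;
        }
        self.deferred = still_deferred;
        return reclaimed;
    }
};

test "deferred node is reclaimed only after the epoch advances" {
    var shard = Shard{};
    var node = DeferredNode{};
    shard.deferFree(&node, 1);
    try std.testing.expectEqual(@as(usize, 0), shard.collectGarbage(1));
    try std.testing.expectEqual(@as(usize, 1), shard.collectGarbage(2));
    try std.testing.expect(shard.freelist == &node);
}
```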
Phase 4 – Instrumentation & Stats
Decisions
- Extend `SlabStats` to report `deferred_count`, `current_epoch`, and `min_observed_epoch`.
- Contention metrics (wait/hold time, attempted cross-shard frees).
- `snapshotSmallPoolStats` merges legacy bucket stats with shard stats.
- Expose the new stats via `AllocatorHandle`.
Status
- Stats struct now aggregates shard hits/misses, deferred totals, and epoch bounds; bench driver emits the new metrics.
Tasks
- Add atomic counters in `slab_shard`.
- Update the `alloc.zig` stats structs and the HTTP/CLI telemetry surfaces.
- Document the metrics in the README and supplementary doc.
Validation
- Tests verifying stats reflect usage after simulated workloads.
- Manual check via `zig build run -- stats`.
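The per-shard merge behind the snapshot can be sketched as follows. The field set mirrors the plan above, but the struct and `merge` helper are illustrative rather than the actual `SlabStats` in `alloc.zig`: counters sum across shards, while epochs are reported as bounds.

```zig
const std = @import("std");

const SlabStats = struct {
    shard_allocs: u64 = 0,
    shard_frees: u64 = 0,
    fallback_allocs: u64 = 0,
    deferred_count: u64 = 0,
    current_epoch: u64 = 0,
    min_observed_epoch: u64 = std.math.maxInt(u64),

    fn merge(self: *SlabStats, other: SlabStats) void {
        self.shard_allocs += other.shard_allocs;
        self.shard_frees += other.shard_frees;
        self.fallback_allocs += other.fallback_allocs;
        self.deferred_count += other.deferred_count;
        // Epochs are bounds across shards, not sums.
        self.current_epoch = @max(self.current_epoch, other.current_epoch);
        self.min_observed_epoch = @min(self.min_observed_epoch, other.min_observed_epoch);
    }
};

test "merging keeps epoch bounds and sums counters" {
    var total = SlabStats{};
    total.merge(.{ .shard_allocs = 3, .current_epoch = 7, .min_observed_epoch = 5 });
    total.merge(.{ .shard_allocs = 2, .current_epoch = 9, .min_observed_epoch = 4 });
    try std.testing.expectEqual(@as(u64, 5), total.shard_allocs);
    try std.testing.expectEqual(@as(u64, 9), total.current_epoch);
    try std.testing.expectEqual(@as(u64, 4), total.min_observed_epoch);
}
```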
Phase 5 – Benchmarks & Stress Tests
Decisions
- Extend `tools/bench_alloc` with options: `--allocator=sharded` to drive the new path, plus shard count selection.
- Output p50/p95/p99/p999, deferred counts, fallback counts.
- Provide stress test scenario for cross-thread churn to validate epoch logic.
Tasks
- Instrument bench to record new metrics.
- Add multi-threaded Zig tests (guarded by `std.testing` concurrency allowances).
- Optionally add debug-only slab poisoning to catch use-after-free.
Validation
- Compare metrics against acceptance criteria.
- Ensure regression checks fail loudly if deferred queue spikes or contention climbs.
Status
`tools/bench_alloc` now emits latency percentiles, shard/fallback counters, and an optional `--stress-seconds` churn harness that records deferred backlog and latency drift. Outputs feed into regression comparisons.
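For the percentile output, a nearest-rank lookup over sorted samples is sufficient; the helper below is a sketch under that assumption and is not a copy of `tools/bench_alloc.zig`.

```zig
const std = @import("std");

// Nearest-rank percentile over pre-sorted latency samples (nanoseconds).
fn percentile(sorted_ns: []const u64, p: f64) u64 {
    std.debug.assert(sorted_ns.len > 0);
    const rank = p / 100.0 * @as(f64, @floatFromInt(sorted_ns.len - 1));
    const idx: usize = @intFromFloat(@round(rank));
    return sorted_ns[@min(idx, sorted_ns.len - 1)];
}

test "latency percentiles from sorted samples" {
    var samples = [_]u64{ 120, 90, 100, 400, 110, 95, 105, 101, 99, 2500 };
    std.mem.sort(u64, &samples, {}, std.sort.asc(u64));
    const p50 = percentile(&samples, 50.0);
    const p99 = percentile(&samples, 99.0);
    try std.testing.expect(p50 <= p99);
}
```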
Phase 6 – Documentation & Cleanup
- Update README and design doc with new allocator options, metrics, expected behavior.
- Add diagrams or tables summarizing shard architecture.
- Ensure code comments describe tricky bits (TLS, epoch reclamation).
- Surfaced shard/fallback metrics through the CLI (`sydradb stats`) and HTTP (`/debug/alloc/stats`) for live inspection.
Risks & Mitigations
- Cross-shard misuse: rely on debug assertions and unit tests; document API expectations.
- Epoch overhead: keep TLS data lightweight; only long-lived operations call `enterEpoch`.
- Fallback pressure: monitor fallback counters; adjust slab classes once telemetry shows the distribution.
- Concurrency bugs: use atomics and per-shard locks carefully; keep critical sections short.
References
- ADR 0006 (allocator section)
- Supplementary architecture design
- Current implementation: `src/sydra/alloc.zig`, `src/sydra/alloc/slab_shard.zig`, `tools/bench_alloc.zig`