Chapter 07

Performance

Around 70 microseconds warm and 73,000 req/s on a laptop. The brief targeted under 5 ms for typical 10-20-node graphs; the engine clears that by roughly 70× at warm steady state.

Methodology

Headline numbers

| Mode | Source | p50 | p95 | p99 | req/s |
|---|---|---|---|---|---|
| Warm, 1 worker | local file | 0.07 ms | 0.09 ms | 0.14 ms | 13,333 |
| Warm, 1 worker | local dfdb | 0.07 ms | 0.08 ms | 0.12 ms | 13,889 |
| Warm, 1 worker | prod DF (Render) | 0.06 ms | 0.08 ms | 0.10 ms | 14,706 |
| Warm, 16 workers | local dfdb | 0.13 ms | 0.23 ms | 1.45 ms | 73,529 |
| Cold (fresh source per req) | local dfdb | 2.51 ms | 3.68 ms | 6.21 ms | 375 |
| Cold (fresh source per req) | prod DF (Render) | 1474 ms | 1664 ms | 1838 ms | 1 |

What the numbers mean

Warm steady-state is source-agnostic

The first three rows are statistically identical because the engine's per-(ruleId, version) and per-referenceSet caches mean DocumentForge is hit once, ever, per pinned snapshot. Whether DF is a local file, a loopback HTTP node, or a Render-hosted instance is irrelevant once caches are warm. The 70µs cost is dominated by JSONPath traversal and filter evaluation — pure CPU work.
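The version-pinned cache pattern behind that claim can be sketched like this — a minimal illustration, not RuleForge's actual types; `SnapshotCache` and the fetch delegate are hypothetical names. Because a (ruleId, version) snapshot is immutable, it can be cached forever, so DocumentForge is only contacted on the first request for each key:

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

// Hypothetical sketch of a version-pinned snapshot cache.
sealed class SnapshotCache<TSnapshot>
{
    private readonly ConcurrentDictionary<(string RuleId, int Version), Lazy<Task<TSnapshot>>> _cache = new();
    private readonly Func<string, int, Task<TSnapshot>> _fetch;

    public SnapshotCache(Func<string, int, Task<TSnapshot>> fetchFromDocumentForge)
        => _fetch = fetchFromDocumentForge;

    public Task<TSnapshot> GetAsync(string ruleId, int version)
        // Wrapping the fetch in Lazy<Task<...>> guarantees at most one
        // upstream call per key, even when many workers miss the cache
        // at the same instant (GetOrAdd alone may run the factory twice).
        => _cache.GetOrAdd((ruleId, version),
            key => new Lazy<Task<TSnapshot>>(() => _fetch(key.RuleId, key.Version))).Value;
}
```

Since versions are immutable, there is no eviction to reason about on this path; invalidation only matters for the mutable environment bindings, which use a short TTL instead.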

Cold-path matters only at boot or cache miss

The last two rows show what happens when the engine has to refetch from DF on every request. Local dfdb on loopback completes in 2.5 ms — three SQL queries plus a reference-set fetch. Cross-region DF is 600× slower again because of the TLS handshake plus the Render instance cold-starting. In production this path fires at boot, on publish (cache invalidation), and on pod restart — never on every request.

Concurrency scales linearly

16 workers reach 73K req/s — 5× the single-worker number, with p99 still under 1.5ms. The runner is lock-free on the hot path: rule snapshots and reference sets live in ConcurrentDictionary caches; the DAG walker holds zero shared mutable state.

What RuleForge does to keep the hot path fast

  1. Source caching. DocumentForgeRuleSource caches every (ruleId, version) tuple indefinitely (versions are immutable). Environment bindings cache for 30 seconds.
  2. JSONPath subset, hand-rolled. No regex, no AST allocation. Five tokens, walked iteratively.
  3. Tight evaluator types. Each filter evaluator is a static method with no allocations on the happy path beyond the resolved-values list.
  4. Single-pass DAG walk. Topological-ish queue with a "wait for deps" gate. No re-evaluation. Cycle check happens once, at validate time.
  5. Shared JsonSerializerOptions. One global instance with camelCase + null-omit + case-insensitive reads. STJ caches metadata against it; no per-call cost.
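Item 2 above — the hand-rolled JSONPath subset — can be illustrated with a minimal iterative walker over System.Text.Json. This is a sketch, not RuleForge's actual five-token implementation; the segment syntax handled here (dot segments plus a numeric index, e.g. `$.a.b[0].c`) and the `TinyPath` name are assumptions:

```csharp
using System.Text.Json;

// Hypothetical sketch: walk a dotted path with optional [n] indices
// iteratively over JsonElement. No regex, no AST — just one Split and
// a loop, which is what keeps this kind of resolver allocation-light.
static class TinyPath
{
    public static JsonElement? Resolve(JsonElement root, string path)
    {
        var segments = path.TrimStart('$', '.').Split('.');
        var current = root;
        foreach (var seg in segments)
        {
            var name = seg;
            int? index = null;
            var bracket = seg.IndexOf('[');
            if (bracket >= 0)                      // e.g. "items[2]"
            {
                name = seg[..bracket];
                index = int.Parse(seg[(bracket + 1)..^1]);
            }
            if (name.Length > 0)
            {
                if (current.ValueKind != JsonValueKind.Object ||
                    !current.TryGetProperty(name, out current))
                    return null;                   // missing property
            }
            if (index is int i)
            {
                if (current.ValueKind != JsonValueKind.Array ||
                    i >= current.GetArrayLength())
                    return null;                   // out-of-range index
                current = current[i];
            }
        }
        return current;
    }
}
```

A production walker would tokenize without `Split` to avoid the intermediate array, but the shape — iterate, narrow, bail early on a miss — is the same.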

Co-location matters more than embedding

Going further — embedding DocumentForge as an in-process library — would shave the 2.5 ms cold path to ~50 µs. That's a 50× win on a code path that almost never fires, and not worth the deploy-story coupling. Run dfdb as a sidecar on the same host (Docker Compose, Render private service, k8s pod) and you get 99% of the benefit for 5% of the integration cost.
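The sidecar arrangement looks something like this in Compose form — a hypothetical fragment: the image name, port, and `DF_BASE_URL` variable are illustrative, not the project's actual configuration:

```yaml
# Hypothetical docker-compose sketch: dfdb as a same-host sidecar, so
# cold-path fetches pay loopback-class latency instead of WAN + TLS.
services:
  ruleforge:
    build: .
    environment:
      DF_BASE_URL: http://dfdb:5000   # same-host hop (assumed env var name)
    depends_on: [dfdb]
  dfdb:
    image: documentforge/dfdb:latest  # assumed image name
    expose: ["5000"]
```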

Reproducing

git clone https://github.com/tailwind-retailing/ruleforge.git
cd ruleforge
dotnet build

# Warm bench, local file source
dotnet run --project src/RuleForge.Cli -- bench \
  --endpoint /v1/ancillary/bag-policy \
  --request  '@fixtures/scenarios/s-bag-3pc-markup15.json' \
  --warmup 100 --iterations 2000

# Concurrent bench against local dfdb
dotnet run --project src/RuleForge.Cli -- bench \
  --endpoint /v1/ancillary/bag-policy \
  --request  '@fixtures/scenarios/s-bag-3pc-markup15.json' \
  --df --df-base http://localhost:5000 \
  --warmup 100 --iterations 10000 --concurrency 16

Where you'd profile next

If your rule graphs balloon past 50 nodes or your reference sets exceed ~1k rows, the next bottlenecks (in order):

  1. Reference-set linear scan inside lookup-replace — currently O(rows). Index by matchOn columns to make it O(log n).
  2. Calc-node expression parsing — NCalc parses fresh per call. Cache compiled LogicalExpression by expression string.
  3. Trace allocation in debug mode — per-node TraceEntry + ctx snapshots add allocations. Production mode (no --debug / no ?debug=true) skips this entirely.
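For item 1, one way to replace the linear scan — a sketch under stated assumptions: the row/column shapes and all names here are illustrative, and it uses a hash index (O(1) average probe) rather than the sorted O(log n) index the list suggests:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sketch: pre-index a reference set by its matchOn columns
// once per cached set, so each lookup-replace is a dictionary probe
// instead of an O(rows) scan.
sealed class ReferenceSetIndex
{
    private readonly Dictionary<string, IReadOnlyList<IReadOnlyDictionary<string, string>>> _byKey;
    private readonly IReadOnlyList<string> _matchOn;

    public ReferenceSetIndex(
        IEnumerable<IReadOnlyDictionary<string, string>> rows,
        IReadOnlyList<string> matchOnColumns)
    {
        _matchOn = matchOnColumns;
        // Built once when the reference set enters the cache.
        _byKey = rows
            .GroupBy(r => CompositeKey(r, matchOnColumns))
            .ToDictionary(
                g => g.Key,
                g => (IReadOnlyList<IReadOnlyDictionary<string, string>>)g.ToList());
    }

    public IReadOnlyList<IReadOnlyDictionary<string, string>> Find(
        IReadOnlyDictionary<string, string> probe)
        => _byKey.TryGetValue(CompositeKey(probe, _matchOn), out var rows)
            ? rows
            : Array.Empty<IReadOnlyDictionary<string, string>>();

    // Join matchOn values with a separator unlikely to appear in data.
    private static string CompositeKey(
        IReadOnlyDictionary<string, string> row, IReadOnlyList<string> cols)
        => string.Join('\u001f', cols.Select(c => row.TryGetValue(c, out var v) ? v : ""));
}
```

The build cost is paid once per cached reference set, which amortizes to nothing on the warm path; only sets past the ~1k-row mark would make it worth the extra code.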

None of these matter at the current scale; they're flagged here so you know where to dig if you cross those thresholds.