Cost, Latency, and Throughput Engineering for Agents
The live demo repo for this series is 67ailab/harness-engineering, and for this post I did change the repo before publishing. The new repo commit is b9a60e8, which adds per-step timing metadata, lightweight workload and token estimates, and performance/cost rollups to the harness traces and summaries.

That change lives mainly in:

- src/harness_engineering/models.py
- src/harness_engineering/runner.py
- src/harness_engineering/tracing.py
- src/harness_engineering/store.py
- tests/test_harness.py
- README.md

The core additions are:

- new timing and metrics fields on StepResult in models.py
- wall-clock measurement inside RetryPolicy.call() in runner.py
- step-level workload estimation in HarnessRunner._estimate_step_metrics()
- aggregated performance and cost rollups in build_trace_summary() in tracing.py
- operator-facing rollups in RunStore.build_summary() in store.py

This is the right place for Post 12 to land, because cost and latency problems in agent systems almost never come from one bad prompt. They come from system shape: ...