Posts

Technical illustration of planner, executor, and reviewer components connected by explicit handoffs and an approval gate before a final file write

Multi-Agent Systems Without the Theater

The live demo repo for this series is 67ailab/harness-engineering, and for this post I did change the repo before publishing. The new capability shipped in commit dadf203, which adds a small but real multi-agent mode to the demo: the harness can now run with explicit planner, executor, and reviewer roles, persist role activity, record handoffs, and expose those artifacts through the CLI and saved run files. The core changes are in: ...
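To make "explicit roles with recorded handoffs" concrete, here is a minimal sketch of the idea. It is not the code from commit dadf203. The `Handoff` and `RunRecord` schemas and the stub `planner`/`executor`/`reviewer` functions are hypothetical, and no model is called. The point is that every transfer of work between roles becomes a persisted record rather than an implicit chat turn.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class Handoff:
    """One explicit transfer of work from one role to another (hypothetical schema)."""
    source: str
    target: str
    payload: dict


@dataclass
class RunRecord:
    """Role activity and handoffs for a single run (hypothetical schema)."""
    activity: list = field(default_factory=list)
    handoffs: list = field(default_factory=list)

    def log(self, role: str, event: str) -> None:
        self.activity.append({"role": role, "event": event})

    def hand_off(self, source: str, target: str, payload: dict) -> dict:
        # Shallow-copy the payload so reassigning its keys later does not alter the record.
        self.handoffs.append(Handoff(source, target, dict(payload)))
        return payload


def planner(task: str, record: RunRecord) -> dict:
    record.log("planner", f"planning: {task}")
    plan = {"steps": [f"draft {task}", f"save {task}"]}
    return record.hand_off("planner", "executor", plan)


def executor(plan: dict, record: RunRecord) -> dict:
    outputs = []
    for step in plan["steps"]:
        # A real executor would call tools here; this stub only records the step.
        record.log("executor", f"ran: {step}")
        outputs.append(f"done: {step}")
    return record.hand_off("executor", "reviewer", {"outputs": outputs})


def reviewer(result: dict, record: RunRecord) -> bool:
    approved = all(item.startswith("done:") for item in result["outputs"])
    record.log("reviewer", "approved" if approved else "rejected")
    return approved


record = RunRecord()
approved = reviewer(executor(planner("summary", record), record), record)
print(json.dumps(asdict(record), indent=2))  # what a saved run file might contain
```

Because the handoffs are data, a CLI can list them after the fact, which is what separates a role split you can audit from one that only exists in the prompt.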

Technical illustration of an agent workflow passing through a policy gate before a filesystem write inside an allowed directory boundary

Sandboxing, Isolation, and Safe Execution

The live demo repo for this series is 67ailab/harness-engineering, and for this post I did change the repo before publishing. The new capability shipped in commit 98c6302, which adds an explicit policy layer to the harness: tools now carry action categories, risky writes are checked against allowed output roots before execution, and policy decisions are persisted in traces and summaries. The key code changes are in src/harness_engineering/policy.py, src/harness_engineering/tools.py, src/harness_engineering/runner.py, src/harness_engineering/cli.py, src/harness_engineering/mcp.py, src/harness_engineering/tracing.py, src/harness_engineering/store.py, src/harness_engineering/workflow.py, and sample_data/policy/restrictive.json. That matters because “sandboxing” gets used too loosely in agent conversations. Sometimes people mean a real OS sandbox. Sometimes they mean a container. Sometimes they mean “the model only has a few tools.” Those are not the same thing. ...
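A minimal sketch of a pre-execution write check against allowed output roots might look like the following. It is not the repo's `policy.py`. The `PolicyDecision` record, the `check_action` helper, and the `READ`/`WRITE` category names are hypothetical. It resolves paths before comparing them so that `..` segments cannot escape the allowed root.

```python
from dataclasses import dataclass
from pathlib import Path

# Hypothetical action categories attached to tools.
READ = "read"
WRITE = "write"


@dataclass(frozen=True)
class PolicyDecision:
    """A record of one policy check, suitable for persisting in a trace (hypothetical schema)."""
    tool: str
    category: str
    target: str
    allowed: bool
    reason: str


def check_action(tool: str, category: str, target: str, allowed_roots: list) -> PolicyDecision:
    """Allow non-write actions; allow writes only if the resolved target is under an allowed root."""
    if category != WRITE:
        return PolicyDecision(tool, category, target, True, "non-write action")
    # resolve() normalizes ".." and follows existing symlinks before the comparison.
    resolved = Path(target).resolve()
    for root in allowed_roots:
        root_path = Path(root).resolve()
        if resolved.is_relative_to(root_path):  # Python 3.9+
            return PolicyDecision(tool, category, str(resolved), True, f"inside {root_path}")
    return PolicyDecision(tool, category, str(resolved), False, "outside allowed output roots")


trace = []
for tool, category, target in [
    ("read_file", READ, "notes.txt"),
    ("write_file", WRITE, "out/report.md"),
    ("write_file", WRITE, "out/../etc/passwd"),
]:
    decision = check_action(tool, category, target, allowed_roots=["out"])
    trace.append(decision)  # a real harness would persist these with the run
    print(decision)
```

This is a policy gate, not isolation: it decides whether the harness should attempt a write, but it does nothing to stop a process that bypasses the harness. That is exactly the distinction between "the model only has a few tools" and a real OS sandbox.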

Technical illustration of an agent workflow feeding event traces into a compact observability panel and evaluation checklist

Tracing, Observability, and Evals for Agent Systems

The live demo repo for this series is 67ailab/harness-engineering, and for this post I did change the repo before publishing. The new capability shipped in commit 85c762c, which adds two concrete things the repo was missing: a persisted trace-summary surface for every run, and a lightweight eval runner with trace-aware fixtures. The key changes are in src/harness_engineering/tracing.py, src/harness_engineering/store.py, src/harness_engineering/cli.py, and the new src/harness_engineering/evals.py module, plus starter fixtures in sample_data/evals/basic.json. That matters because a lot of agent writing still treats observability as an afterthought and evals as a benchmark spreadsheet. In practice, most production pain shows up somewhere else: ...
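The link between a trace summary and a trace-aware eval can be sketched in a few lines. This is not `evals.py`, and the event shape and fixture keys (`final_status`, `must_call`, `max_errors`) are hypothetical rather than the format of `sample_data/evals/basic.json`. The idea: collapse the raw event trace into a small summary, then assert on behavior (which tools ran, how the run ended) instead of only on the final text.

```python
from collections import Counter


def summarize_trace(events: list) -> dict:
    """Collapse a raw event trace into a small, persistable summary (hypothetical shape)."""
    counts = Counter(event["type"] for event in events)
    return {
        "event_counts": dict(counts),
        "tools_called": [e["tool"] for e in events if e["type"] == "tool_call"],
        "errors": counts["error"],  # Counter returns 0 for a missing key
        "final_status": events[-1].get("status") if events else None,
    }


def run_eval(summary: dict, fixture: dict) -> list:
    """Return failure messages for one fixture; an empty list means the case passed."""
    failures = []
    expected = fixture.get("final_status")
    if expected is not None and summary["final_status"] != expected:
        failures.append(f"final_status {summary['final_status']!r} != {expected!r}")
    for tool in fixture.get("must_call", []):
        if tool not in summary["tools_called"]:
            failures.append(f"expected a call to {tool}")
    if summary["errors"] > fixture.get("max_errors", 0):
        failures.append(f"{summary['errors']} errors exceeds max_errors")
    return failures


events = [
    {"type": "model_call"},
    {"type": "tool_call", "tool": "search"},
    {"type": "tool_call", "tool": "write_file"},
    {"type": "run_end", "status": "completed"},
]
summary = summarize_trace(events)
passing = run_eval(summary, {"final_status": "completed", "must_call": ["write_file"]})
failing = run_eval(summary, {"final_status": "completed", "must_call": ["approve"]})
print(summary, passing, failing)
```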

Technical illustration of an agent workflow paused at an approval gate while a human reviewer decides whether to continue

Human-in-the-Loop Done Properly

The live demo repo for this series is 67ailab/harness-engineering, and for this post I did change the repo before publishing. The new capability shipped in commit 352fba2, which adds a first-class pending-approval inspection surface to the existing approval-gated harness. The key changes are in src/harness_engineering/runner.py, src/harness_engineering/cli.py, and src/harness_engineering/store.py. That matters because most writing about “human in the loop” in agent systems is still weirdly sloppy. A model says “should I proceed?”, a human types “yes”, and the demo declares the governance problem solved. It is not solved. In production, approval is not a vibe, not a chat convention, and not a magical hidden boolean inside the runtime. It is a workflow boundary with state, context, inspection, and recovery semantics. ...
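Treating approval as "a workflow boundary with state, context, inspection, and recovery semantics" can be sketched with nothing more than a JSON file per run. This is not the repo's runner or store. The helpers `pause_for_approval`, `inspect_pending`, and `decide`, and the state fields, are hypothetical. The pending request carries its own context, can be listed by an operator, and accepts exactly one decision.

```python
import json
import tempfile
from pathlib import Path


def pause_for_approval(run_dir: Path, run_id: str, action: dict, context: dict) -> dict:
    """Persist the run as explicitly waiting on a human, with what is being approved and why."""
    state = {
        "run_id": run_id,
        "status": "awaiting_approval",
        "pending": {"action": action, "context": context},
        "decision": None,
    }
    (run_dir / f"{run_id}.json").write_text(json.dumps(state, indent=2))
    return state


def inspect_pending(run_dir: Path) -> list:
    """List every run currently blocked on approval, read straight from disk."""
    states = [json.loads(p.read_text()) for p in sorted(run_dir.glob("*.json"))]
    return [s for s in states if s["status"] == "awaiting_approval"]


def decide(run_dir: Path, run_id: str, approve: bool, reviewer: str, note: str = "") -> dict:
    """Record a decision exactly once; a second decision on the same run is rejected."""
    path = run_dir / f"{run_id}.json"
    state = json.loads(path.read_text())
    if state["status"] != "awaiting_approval":
        raise ValueError(f"{run_id} is not awaiting approval (status={state['status']})")
    state["decision"] = {"approved": approve, "reviewer": reviewer, "note": note}
    state["status"] = "approved" if approve else "rejected"
    path.write_text(json.dumps(state, indent=2))
    return state


with tempfile.TemporaryDirectory() as tmp:
    runs = Path(tmp)
    pause_for_approval(runs, "run-001",
                       action={"tool": "write_file", "path": "out/report.md"},
                       context={"reason": "write outside scratch space"})
    pending_before = inspect_pending(runs)
    final = decide(runs, "run-001", approve=True, reviewer="alice")
    pending_after = inspect_pending(runs)
    try:
        decide(runs, "run-001", approve=False, reviewer="bob")
        double_decision_blocked = False
    except ValueError:
        double_decision_blocked = True
```

Because the pending state lives on disk rather than in a live chat turn, a reviewer can inspect it after the original process has exited, and the decision itself becomes part of the run record.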

Multimodal radiotherapy contouring with CT, PET, clinical text, and AI fusion

LLM and VLM for Radiotherapy Contouring: State of the Art, Gaps, and Opportunities

Radiotherapy contouring is entering a new phase. For years, progress was driven mainly by image segmentation: better backbones, larger datasets, and stronger 3D architectures improved the automatic outlining of visible anatomy. That approach remains highly effective for organs-at-risk (OARs), where the task is largely to identify and delineate structures that can be seen directly on imaging. Target contouring is different. Gross tumor volume (GTV), clinical target volume (CTV), nodal target volumes, and postoperative beds are not defined by pixels alone. They are shaped by disease extent, stage, pathology, surgical status, laterality, risk patterns of spread, institutional practice, and protocol logic. In real clinical workflow, radiation oncologists do not contour from images alone; they contour from images interpreted in context. ...

Layered agent memory diagram showing working context, session state, and retrieval memory around a checkpointed workflow

Memory Architecture for Agents: Context, Sessions, and State

The live demo repo for this series is 67ailab/harness-engineering, and for this post I did change the repo before publishing. The new capability shipped in commit d20e352, which adds an explicit memory-layer model to the demo instead of treating every stored value as one blurry thing called “memory.” The core addition is src/harness_engineering/memory.py, plus wiring in src/harness_engineering/store.py and src/harness_engineering/cli.py so every run now emits a memory.json snapshot and the CLI exposes a memory command. ...

Abstract architecture of embeddings, ANN indexes, storage layers, and AI agents

Vector Databases Explained: History, Internals, and Why Agentic AI Depends on Them

A lot of the recent attention on vector databases makes them sound like a brand-new invention created by the generative AI boom. That is not really true. What changed is not the underlying math. What changed is the workload. For more than a decade, industry and academia had already been working on large-scale nearest-neighbor search for recommendation systems, image retrieval, search, ads, and ranking. The generative AI wave did something different: it turned vector retrieval from a specialized backend capability into a mainstream application primitive. Once teams started building retrieval-augmented generation (RAG), long-term AI memory, semantic search, and tool-using agents, vector databases stopped being niche infrastructure and became part of the standard stack. ...
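The "underlying math" that has not changed is nearest-neighbor search over embeddings. The exact version is simple enough to write out; the sketch below uses cosine similarity and toy three-dimensional vectors, which are illustrative stand-ins for real embeddings. It scores every stored vector for every query, which costs O(N · d) work per query; ANN index families such as HNSW and IVF exist to avoid that full scan at scale by giving up a little recall for much lower latency.

```python
import heapq
import math


def cosine(a: list, b: list) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def exact_knn(query: list, corpus: dict, k: int = 2) -> list:
    """Exact top-k search: score every stored vector and keep the k best."""
    scored = ((cosine(query, vec), key) for key, vec in corpus.items())
    return [key for _, key in heapq.nlargest(k, scored)]


# Toy vectors standing in for real embeddings.
corpus = {
    "cat": [1.0, 0.0, 0.0],
    "kitten": [0.9, 0.1, 0.0],
    "car": [0.0, 0.0, 1.0],
}
print(exact_knn([1.0, 0.0, 0.0], corpus, k=2))  # ['cat', 'kitten']
```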

Engineering workflow diagram with checkpoints, event history, approval gate, and pause-resume arrows

Durable Execution Is the Difference Between a Demo and a System

The live demo repo for this series is 67ailab/harness-engineering, and for this post I did change the repo before publishing. The new capability shipped in commit 9612b58, which adds persisted run summaries plus replay-oriented history inspection to the existing approval-gated harness. The key changes are in src/harness_engineering/store.py and src/harness_engineering/cli.py. That addition matters because durable execution is where most agent demos quietly stop being honest. It is easy to show a model calling tools in one uninterrupted run. It is much harder to explain what happens when execution pauses for approval, the process dies, the machine reboots, the reviewer returns malformed output, or an operator needs to understand what state the run is actually in. ...
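The core durable-execution move, checkpointing after each step and resuming from the checkpoint instead of starting over, fits in a short sketch. This is a generic pattern, not the repo's `store.py`. The step names, the `run` function, and the checkpoint layout are hypothetical, and the "crash" is simulated with an exception. The history list doubles as a replayable record of what happened, including the resume.

```python
import json
import tempfile
from pathlib import Path
from typing import Optional

STEPS = ["fetch", "draft", "review", "write"]  # hypothetical workflow steps


def load_state(path: Path) -> dict:
    if path.exists():
        return json.loads(path.read_text())
    return {"completed": [], "history": []}


def save_state(path: Path, state: dict) -> None:
    # Write a temp file, then rename over the checkpoint; on POSIX the rename is atomic,
    # so a crash mid-write does not leave a half-written checkpoint behind.
    tmp = path.with_suffix(".tmp")
    tmp.write_text(json.dumps(state))
    tmp.replace(path)


def run(path: Path, crash_after: Optional[str] = None) -> dict:
    state = load_state(path)
    if state["completed"]:
        state["history"].append({"event": "resumed", "skipping": list(state["completed"])})
    for step in STEPS:
        if step in state["completed"]:
            continue  # already checkpointed; do not redo its side effects
        # Real work for `step` would happen here.
        state["completed"].append(step)
        state["history"].append({"event": "step_completed", "step": step})
        save_state(path, state)
        if step == crash_after:
            raise RuntimeError(f"simulated crash after {step}")
    state["history"].append({"event": "run_completed"})
    save_state(path, state)
    return state


with tempfile.TemporaryDirectory() as tmp:
    checkpoint = Path(tmp) / "run.json"
    try:
        run(checkpoint, crash_after="draft")
    except RuntimeError:
        pass  # the process "died", but the checkpoint survives on disk
    final = run(checkpoint)
    events = [e["event"] for e in final["history"]]
```

One caveat the sketch makes visible: if the process dies after a step's side effect but before its checkpoint is saved, that step will run again on resume, which is why steps with external effects need to be idempotent.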

Systems diagram showing an agent harness with workflow nodes, approval gates, manager-worker branches, and handoff arrows

Orchestration Patterns: Loops, Graphs, Managers, and Handoffs

The live demo repo for this series is 67ailab/harness-engineering, and for this post I did add a real repo capability before publishing. The repo now includes a workflow export layer in src/harness_engineering/workflow.py, plus a workflow CLI command in src/harness_engineering/cli.py that renders the current harness orchestration as structured JSON or Mermaid. That change shipped in commit a007c08. That may sound like a documentation flourish. It is not. The point of an orchestration post is not to wave vaguely at boxes and arrows. It is to make the runtime’s control structure explicit enough that you can inspect it, reason about it, and argue about whether it is the right one. ...
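Rendering orchestration as both structured JSON and Mermaid can be sketched from a single node/edge description. This is not `workflow.py`, and the `WORKFLOW` graph below (plan, execute, approval gate, write, with a rejection loop) is a hypothetical stand-in for the harness's real structure. Because both exports come from the same data, the diagram cannot drift away from what the runtime actually does.

```python
import json

# Hypothetical orchestration graph; the real harness graph may differ.
WORKFLOW = {
    "nodes": [
        {"id": "plan", "label": "Plan"},
        {"id": "execute", "label": "Execute tools"},
        {"id": "approval", "label": "Approval gate"},
        {"id": "write", "label": "Write output"},
    ],
    "edges": [
        {"from": "plan", "to": "execute"},
        {"from": "execute", "to": "approval"},
        {"from": "approval", "to": "write", "label": "approved"},
        {"from": "approval", "to": "plan", "label": "rejected"},
    ],
}


def to_mermaid(workflow: dict, direction: str = "TD") -> str:
    """Render a node/edge description as Mermaid flowchart text."""
    known = {node["id"] for node in workflow["nodes"]}
    lines = [f"flowchart {direction}"]
    for node in workflow["nodes"]:
        lines.append(f'    {node["id"]}["{node["label"]}"]')
    for edge in workflow["edges"]:
        # Fail loudly instead of silently drawing an edge to a node that does not exist.
        if edge["from"] not in known or edge["to"] not in known:
            raise ValueError(f"edge references unknown node: {edge}")
        arrow = f'-->|{edge["label"]}|' if edge.get("label") else "-->"
        lines.append(f'    {edge["from"]} {arrow} {edge["to"]}')
    return "\n".join(lines)


print(json.dumps(WORKFLOW, indent=2))  # structured export
print(to_mermaid(WORKFLOW))            # diagram export
```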

Linux kernel privilege escalation concept with memory pages, containers, and security signals

Copy Fail (CVE-2026-31431): Why a Small Linux Kernel Bug Became a Serious Root Escalation Risk

Date: May 2, 2026
Author: 67 AI Lab
Classification: Public Technical Insight

Executive Summary

CVE-2026-31431, also known as Copy Fail, is a high-severity local privilege escalation flaw in the Linux kernel’s crypto subsystem. The bug lives in algif_aead, part of the AF_ALG userspace crypto interface, and traces back to an in-place optimization introduced in 2017. What makes this vulnerability unusually important is not just that it yields root, but that public analysis describes the exploit path as deterministic, compact, and cross-distribution. By chaining AF_ALG with splice(), an unprivileged local user can achieve a controlled 4-byte overwrite in the page cache for a readable file. In practice, that is enough to corrupt the in-memory image of a setuid binary such as /usr/bin/su and obtain a root shell. ...