Sandboxing, Isolation, and Safe Execution

The live demo repo for this series is 67ailab/harness-engineering, and for this post I changed the repo before publishing. The new capability shipped in commit 98c6302, which adds an explicit policy layer to the harness: tools now carry action categories, risky writes are checked against allowed output roots before execution, and policy decisions are persisted in traces and summaries.

The key code changes are in:

src/harness_engineering/policy.py
src/harness_engineering/tools.py
src/harness_engineering/runner.py
src/harness_engineering/cli.py
src/harness_engineering/mcp.py
src/harness_engineering/tracing.py
src/harness_engineering/store.py
src/harness_engineering/workflow.py
sample_data/policy/restrictive.json

That matters because “sandboxing” gets used too loosely in agent conversations. Sometimes people mean a real OS sandbox. Sometimes they mean a container. Sometimes they mean “the model only has a few tools.” Those are not the same thing.

My claim in this post is narrower and more practical: before you can talk seriously about agent isolation, you need explicit runtime boundaries in the harness itself. If the harness cannot classify actions, gate risky side effects, record denials, and preserve those rules across pause/resume, then the rest of your safety story is hand-waving.

This repo still does not implement OS-level isolation. It does now implement a real harness-level policy boundary. That is worth showing clearly because it is the layer many demos skip.

What changed in the repo since the previous post

Post 8 added better observability through build_trace_summary(state) in src/harness_engineering/tracing.py and a lightweight eval runner in src/harness_engineering/evals.py. That made the runtime more legible.

For Post 9, the repo needed a stronger answer to a basic question: what stops a tool from performing a risky side effect in the wrong place?

Previously, the demo mostly treated safety as a boolean property of a tool. finalize_report was marked risky, and the run paused for approval before writing the final markdown file. That was already useful. But it was not enough to support a serious post about isolation.

The missing pieces were:

a way to classify tool actions beyond risky=True
an explicit policy object that could evaluate proposed actions
enforcement of allowed write destinations
persisted policy decisions in run artifacts and traces
a CLI surface to inspect the active policy
correct policy reloading on approve and resume

Those are now present.

The new policy surface in the live repo

The center of the change is PolicyEngine in src/harness_engineering/policy.py.

That module introduces:

DEFAULT_ACTION_CATEGORIES
PolicyDecision
PolicyEngine
default_policy_config()
load_policy_file()

The important design choice is that the policy does not start life as a separate, abstract document. It is derived from the live tool registry and then optionally overridden by a JSON file.

In src/harness_engineering/tools.py, the Tool dataclass now includes:

action_category: str = "utility"

The default registry assigns concrete categories:

search_mock → read_only
extract_facts → transform
draft_report → model_generation
finalize_report → filesystem_write
flaky_echo → utility

That is a small taxonomy, but it is already much more useful than “safe tool” versus “unsafe tool.” It lets the harness talk explicitly about the kind of action being attempted.
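A minimal sketch of what a Tool dataclass and default registry carrying these categories might look like. The action_category field, its "utility" default, the risky flag on finalize_report, and the name-to-category mapping come from the repo; the name and fn fields, the stub implementations, and build_default_registry() are assumptions for illustration only.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict


@dataclass
class Tool:
    # Only `risky` and `action_category` are named in the post; `name` and `fn`
    # are assumed fields for this sketch.
    name: str
    fn: Callable[..., Any]
    risky: bool = False
    action_category: str = "utility"


def _stub(**kwargs: Any) -> Dict[str, Any]:
    # Placeholder behavior; the real tools search, extract, draft, and write.
    return dict(kwargs)


def build_default_registry() -> Dict[str, Tool]:
    """Hypothetical registry mirroring the tool names and categories listed above."""
    tools = [
        Tool("search_mock", _stub, action_category="read_only"),
        Tool("extract_facts", _stub, action_category="transform"),
        Tool("draft_report", _stub, action_category="model_generation"),
        Tool("finalize_report", _stub, risky=True, action_category="filesystem_write"),
        Tool("flaky_echo", _stub),  # no category given, so it keeps the "utility" default
    ]
    return {tool.name: tool for tool in tools}


registry = build_default_registry()
print({name: tool.action_category for name, tool in registry.items()})
```

The point of the default is that an uncategorized tool still lands in a named bucket the policy can reason about, rather than silently counting as "safe".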

Then default_policy_config() builds a policy from the live registry. For filesystem_write tools, it automatically constrains writes to the current runs directory. In other words, the default policy is not “writes are risky but probably fine.” It is “writes are risky, require approval, and must stay inside the harness-owned output root.”
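Here is one way that derivation could be written, as a sketch. The output shape mirrors sample_data/policy/restrictive.json (version, default_allowed_write_roots, tool_policies with enabled, action_category, allowed_output_roots). Passing a plain name-to-category mapping instead of Tool objects is a simplification, and the repo's actual signature may differ; the approval requirement stays with each tool's risky flag and is not modeled here.

```python
from pathlib import Path
from typing import Any, Dict, Mapping


def default_policy_config(categories: Mapping[str, str], runs_dir: Path) -> Dict[str, Any]:
    """Derive a policy from the registry (simplified here to name -> action_category)."""
    root = str(Path(runs_dir).resolve())
    tool_policies: Dict[str, Dict[str, Any]] = {}
    for name, category in categories.items():
        entry: Dict[str, Any] = {"enabled": True, "action_category": category}
        if category == "filesystem_write":
            # Writes are confined to the harness-owned runs directory by default.
            entry["allowed_output_roots"] = [root]
        tool_policies[name] = entry
    return {
        "version": 1,
        "default_allowed_write_roots": [root],
        "tool_policies": tool_policies,
    }


config = default_policy_config(
    {"search_mock": "read_only", "finalize_report": "filesystem_write"},
    Path(".runs"),
)
```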

That is the first concrete isolation boundary in this repo.

Where enforcement actually happens

The most important implementation detail is in HarnessRunner._execute() inside src/harness_engineering/runner.py.

Before a tool runs, the runner now does this conceptually:

look up the tool
call self.policy.evaluate(tool_name, kwargs)
record the resulting PolicyDecision
emit a policy_checked trace event
deny execution immediately if the action is not allowed
otherwise continue to tool_start and actual tool execution

That means policy is not merely metadata. It is in the execution path.
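The self.policy.evaluate(tool_name, kwargs) call in the steps above could look roughly like this sketch. PolicyDecision and PolicyEngine are repo names, but these fields and this logic are assumptions: the output_path keyword, the fail-closed handling of unknown tools, and the exact reason strings are illustrative. Path.is_relative_to needs Python 3.9 or later.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Any, Dict


@dataclass(frozen=True)
class PolicyDecision:
    tool_name: str
    action_category: str
    allowed: bool
    reason: str


class PolicyEngine:
    def __init__(self, config: Dict[str, Any]) -> None:
        self.config = config

    def evaluate(self, tool_name: str, kwargs: Dict[str, Any]) -> PolicyDecision:
        policy = self.config.get("tool_policies", {}).get(tool_name)
        if policy is None:
            # Fail closed: a tool with no policy entry is not allowed to run.
            return PolicyDecision(tool_name, "unknown", False, "No policy entry for this tool.")
        category = policy.get("action_category", "utility")
        if not policy.get("enabled", True):
            return PolicyDecision(tool_name, category, False, "Tool is disabled by policy.")
        if category != "filesystem_write":
            return PolicyDecision(tool_name, category, True, "No write-path constraint applies.")

        roots = policy.get("allowed_output_roots") or self.config.get("default_allowed_write_roots", [])
        target = kwargs.get("output_path")  # assumed keyword name for the write target
        if not target:
            return PolicyDecision(tool_name, category, False, "Write action has no output_path.")
        # resolve() normalizes ".." segments and follows existing symlinks,
        # so "runs/../elsewhere" is compared as "elsewhere".
        resolved = Path(target).resolve()
        for root in roots:
            if resolved.is_relative_to(Path(root).resolve()):
                return PolicyDecision(tool_name, category, True, f"Write targets are allowed under: {root}.")
        return PolicyDecision(
            tool_name,
            category,
            False,
            f"Write target {resolved} is outside the allowed roots: {', '.join(roots) or '(none)'}.",
        )
```

Comparing resolved paths rather than string prefixes matters: a naive startswith check would accept both "/tmp/runs-evil" and "/tmp/runs/../elsewhere".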

When policy denies an action, _execute():

appends a failed StepResult
marks the run failed
clears pending_action
emits policy_denied
persists the updated state

That is exactly what I want from a harness policy boundary. A denial should be explicit, durable, inspectable, and boring.
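Both lists above (the pre-execution check and the denial path) can be sketched as a single gate. This is not the repo's HarnessRunner; SketchRunner, the injected evaluate and save callables, and the state and event shapes are assumptions. The order of operations and the event names policy_checked, policy_denied, and tool_start follow the post.

```python
from dataclasses import asdict, dataclass, field
from typing import Any, Callable, Dict, List, Optional


@dataclass
class Decision:
    allowed: bool
    action_category: str
    reason: str


@dataclass
class StepResult:
    tool_name: str
    ok: bool
    output: Any = None
    error: Optional[str] = None


@dataclass
class RunState:
    status: str = "running"
    steps: List[StepResult] = field(default_factory=list)
    pending_action: Optional[Dict[str, Any]] = None
    events: List[Dict[str, Any]] = field(default_factory=list)
    artifacts: Dict[str, Any] = field(default_factory=dict)


class SketchRunner:
    def __init__(
        self,
        tools: Dict[str, Callable[..., Any]],
        evaluate: Callable[[str, Dict[str, Any]], Decision],
        save: Callable[[RunState], None],
    ) -> None:
        self.tools = tools
        self.evaluate = evaluate
        self.save = save

    def _emit(self, state: RunState, event_type: str, **data: Any) -> None:
        state.events.append({"type": event_type, **data})

    def _execute(self, state: RunState, tool_name: str, kwargs: Dict[str, Any]) -> Any:
        tool = self.tools[tool_name]  # look up the tool (KeyError if unknown)
        decision = self.evaluate(tool_name, kwargs)  # ask policy before anything runs
        state.artifacts.setdefault("policy_decisions", []).append({"tool_name": tool_name, **asdict(decision)})
        self._emit(state, "policy_checked", tool_name=tool_name, allowed=decision.allowed)
        if not decision.allowed:
            # Denial is explicit and durable: failed step, failed run, no pending action.
            state.steps.append(StepResult(tool_name, ok=False, error=decision.reason))
            state.status = "failed"
            state.pending_action = None
            self._emit(state, "policy_denied", tool_name=tool_name, reason=decision.reason)
            self.save(state)
            return None
        self._emit(state, "tool_start", tool_name=tool_name)  # only now does the tool run
        output = tool(**kwargs)
        state.steps.append(StepResult(tool_name, ok=True, output=output))
        self.save(state)
        return output
```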

There is also a second, very deliberate enforcement point in the draft_report branch of run_until_pause_or_complete(). Before the harness even pauses for approval, it computes the future output path for finalize_report and asks policy whether that write would be allowed.

So the repo now distinguishes two ideas that often get blurred together:

approval: a human must authorize the risky action
policy: the action must be structurally allowed at all

If policy says the write target is outside the allowed roots, the run fails before approval. That is correct. Human approval should not be a magic override for a disallowed target unless you explicitly design it that way.
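The separation between policy and approval can be sketched as a small gate that runs before the pause. The helper name gate_finalize_before_approval, the dict-based state, the output_path keyword, and the error key are hypothetical; the status values and the pending_action fields (action, action_category, proposed_output_path) match what the repo persists in the examples below.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Any, Callable, Dict


@dataclass
class Decision:
    allowed: bool
    reason: str


def gate_finalize_before_approval(
    state: Dict[str, Any],
    runs_dir: Path,
    run_id: str,
    evaluate: Callable[[str, Dict[str, Any]], Decision],
) -> str:
    """Hypothetical helper: check policy for the future write before asking a human."""
    proposed = Path(runs_dir) / run_id / "final_report.md"
    decision = evaluate("finalize_report", {"output_path": str(proposed)})
    if not decision.allowed:
        # Policy says no: fail now, so nobody is ever asked to approve this write.
        state["status"] = "failed"
        state["pending_action"] = None
        state["error"] = decision.reason
        return state["status"]
    # Policy says yes: the write is structurally allowed but still needs a human.
    state["status"] = "waiting_approval"
    state["pending_action"] = {
        "action": "finalize_report",
        "action_category": "filesystem_write",
        "proposed_output_path": str(proposed),
    }
    return state["status"]
```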

Real examples from the live repo

Before writing this post, I verified the repo state in /home/james/.openclaw/workspace/harness-engineering.

I ran the required checks:

make check
PYTHONPATH=src python3 -m harness_engineering.cli doctor

Those passed.

make check completed successfully, including the secret scan.
doctor returned status: ok against the repo’s configured local OpenAI-compatible endpoint using model gemma4 at http://192.168.0.16:8080/v1.

I then exercised the live policy logic in two concrete ways using the repo code.

Example 1: allowed write under the harness run root

Using the live HarnessRunner, default registry, and default PolicyEngine, I created a run that paused for approval and then completed successfully.

That run was:

run ID: 394c54fa-12e1-4516-8ee6-43cedb3cccd6
initial status after start: waiting_approval
final status after approve/resume: completed

The saved pending action included:

action: finalize_report
action_category: filesystem_write
proposed_output_path: /home/james/.openclaw/workspace/harness-engineering/.runs-article-success/394c54fa-12e1-4516-8ee6-43cedb3cccd6/final_report.md

The resulting trace summary showed:

policy.checks: 5
policy.denials: 0
latest decision for finalize_report with allowed: true
reason: Write targets are allowed under: /home/james/.openclaw/workspace/harness-engineering/.runs-article-success.
action-category counts including filesystem_write: 4
approval events present
run_completed present

That is the happy path the post needs to describe honestly. The write is risky, it is approval-gated, and it is also path-constrained.

Example 2: denied write outside allowed roots

The repo also includes a restrictive sample policy at sample_data/policy/restrictive.json:

{
  "version": 1,
  "default_allowed_write_roots": ["/tmp/definitely-not-the-harness-runs-dir"],
  "tool_policies": {
    "finalize_report": {
      "enabled": true,
      "action_category": "filesystem_write",
      "allowed_output_roots": ["/tmp/definitely-not-the-harness-runs-dir"]
    }
  }
}
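Since the policy is derived from the registry and then optionally overridden by a file like this one, a per-tool shallow merge is one plausible way to apply the override. This is a sketch under that assumption; merge_policy and apply_policy_file are hypothetical names, and the repo's load_policy_file() may read and merge differently.

```python
import json
from pathlib import Path
from typing import Any, Dict


def merge_policy(base: Dict[str, Any], override: Dict[str, Any]) -> Dict[str, Any]:
    """Overlay a policy file onto registry-derived defaults, one tool entry at a time."""
    if override.get("version", 1) != 1:
        raise ValueError(f"Unsupported policy version: {override.get('version')!r}")
    merged: Dict[str, Any] = {
        "version": 1,
        "default_allowed_write_roots": list(
            override.get("default_allowed_write_roots", base.get("default_allowed_write_roots", []))
        ),
        # Copy each base entry so overrides never mutate the registry-derived defaults.
        "tool_policies": {name: dict(entry) for name, entry in base.get("tool_policies", {}).items()},
    }
    for name, entry in override.get("tool_policies", {}).items():
        merged["tool_policies"].setdefault(name, {}).update(entry)
    return merged


def apply_policy_file(path: Path, base: Dict[str, Any]) -> Dict[str, Any]:
    return merge_policy(base, json.loads(Path(path).read_text(encoding="utf-8")))
```

Tools the file does not mention (search_mock, draft_report, and so on) keep their defaults, so an override file only has to state what it changes.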

Using that policy with the live runner, I got a real denial:

run ID: 197ecaec-37eb-4efc-bffe-1d70f48095da
final status: failed
pending action: None

The latest persisted PolicyDecision for finalize_report was:

allowed: false
action_category: filesystem_write
reason: the write target under .runs-article-denied/.../final_report.md was outside the allowed root /tmp/definitely-not-the-harness-runs-dir

The trace summary for that run reported:

policy.checks: 4
policy.denials: 1
latest event: policy_denied
no final report path

That is the more important example, frankly. It proves the repo is not just decorating tools with labels. It is actually refusing an out-of-policy side effect.

Why this matters for “sandboxing” discussions

The MCP specification is quite explicit that tool use introduces real security and trust concerns, that tool descriptions should be treated cautiously, and that hosts should provide clear authorization flows and user control. But MCP is a protocol surface, not a full runtime safety system. It standardizes how capabilities are exposed; it does not by itself enforce durable approvals, write boundaries, or resumable policy state.

That is exactly the gap this repo is trying to illustrate.

Similarly, frameworks like LangGraph emphasize durable execution, human-in-the-loop control, and persistence, while workflow systems like Temporal frame durable execution around explicit state progression and replay. Those ecosystems are much broader than this demo, but they point in the same direction: reliable systems depend on runtime boundaries and durable control state, not just on convenient tool APIs.

In this repo, the policy layer is still small. Good. It still demonstrates three important engineering habits:

classify actions explicitly
evaluate policy before execution
persist the decision as part of run history

That is already more useful than vague claims about “safe agents.”

The CLI and inspection surfaces matter too

A policy boundary is much more credible when operators can inspect it directly.

The repo now adds cmd_policy() in src/harness_engineering/cli.py, exposed as:

PYTHONPATH=src python3 -m harness_engineering.cli policy --pretty

That prints the effective policy, including:

policy version
active policy file if one is used
store root
default allowed write roots
tool policies with action categories
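A sketch of how a policy subcommand could assemble and print that report. Only the policy subcommand and its --pretty flag come from the post; effective_policy_report(), the --policy-file and --runs-dir flags, and the JSON key names are hypothetical, so check the repo's cli.py for the real surface.

```python
import argparse
import json
from typing import Any, Dict, Optional


def effective_policy_report(config: Dict[str, Any], policy_file: Optional[str], store_root: str) -> Dict[str, Any]:
    return {
        "version": config.get("version"),
        "policy_file": policy_file,  # None when running on registry-derived defaults
        "store_root": store_root,
        "default_allowed_write_roots": config.get("default_allowed_write_roots", []),
        "tool_policies": {
            # Always show a category, even if the entry omits one.
            name: {"action_category": entry.get("action_category", "utility"), **entry}
            for name, entry in config.get("tool_policies", {}).items()
        },
    }


def cmd_policy(args: argparse.Namespace, config: Dict[str, Any]) -> str:
    report = effective_policy_report(config, args.policy_file, args.runs_dir)
    text = json.dumps(report, indent=2 if args.pretty else None, sort_keys=True)
    print(text)
    return text


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="harness_engineering.cli")
    sub = parser.add_subparsers(dest="command", required=True)
    policy = sub.add_parser("policy", help="Print the effective policy.")
    policy.add_argument("--pretty", action="store_true")
    # Hypothetical flags for this sketch.
    policy.add_argument("--policy-file", default=None)
    policy.add_argument("--runs-dir", default=".runs")
    return parser
```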

This is complemented by two other surfaces.

First, tool_to_mcp_descriptor() in src/harness_engineering/mcp.py now exports meta.actionCategory alongside meta.risky. That means the harness’s internal safety vocabulary also appears in the MCP-style tool descriptor surface.
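As a sketch, such a descriptor might be built like this. name, description, and inputSchema are the standard MCP tool-definition fields; the meta block with risky and actionCategory follows the post's description; the Tool fields here are assumptions. A host should read these hints as information, not as enforcement.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class Tool:
    name: str
    description: str
    input_schema: Dict[str, Any] = field(default_factory=lambda: {"type": "object", "properties": {}})
    risky: bool = False
    action_category: str = "utility"


def tool_to_mcp_descriptor(tool: Tool) -> Dict[str, Any]:
    return {
        "name": tool.name,
        "description": tool.description,
        "inputSchema": tool.input_schema,
        # Harness-specific safety vocabulary; informational for the host, not enforcement.
        "meta": {"risky": tool.risky, "actionCategory": tool.action_category},
    }
```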

Second, build_trace_summary(state) in src/harness_engineering/tracing.py now reports policy counts and the latest policy decision. RunStore.save() in src/harness_engineering/store.py persists those summaries to trace_summary.json on every save.
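The policy portion of such a summary can be derived from events and decisions alone. This sketch assumes a plain-dict state with events and artifacts["policy_decisions"]; the counting rules, summary keys, save_run(), and the state.json filename are hypothetical, and the repo's build_trace_summary() and RunStore.save() may compute or lay these out differently.

```python
import json
from collections import Counter
from pathlib import Path
from typing import Any, Dict


def build_trace_summary(state: Dict[str, Any]) -> Dict[str, Any]:
    events = state.get("events", [])
    decisions = state.get("artifacts", {}).get("policy_decisions", [])
    event_counts = Counter(e.get("type") for e in events)
    return {
        "policy": {
            "checks": event_counts["policy_checked"],
            "denials": event_counts["policy_denied"],
            "latest_decision": decisions[-1] if decisions else None,
            "action_categories": dict(Counter(d.get("action_category") for d in decisions)),
        },
        "latest_event": events[-1].get("type") if events else None,
    }


def save_run(run_dir: Path, state: Dict[str, Any]) -> None:
    # Rewrite the compact summary on every save so it never lags the full state.
    run_dir.mkdir(parents=True, exist_ok=True)
    (run_dir / "state.json").write_text(json.dumps(state, indent=2), encoding="utf-8")
    (run_dir / "trace_summary.json").write_text(json.dumps(build_trace_summary(state), indent=2), encoding="utf-8")
```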

That combination matters. Policy is no longer hidden inside runner logic. It is visible in descriptors, traces, summaries, and CLI output.

A subtle bug this repo change had to fix

One of the more interesting real bugs here was not about denial logic. It was about durability.

When I first added custom policy loading, the start command could create a run with a policy file, but approve and resume rebuilt the runner with defaults. That meant a run could begin under one policy and resume under another. For a post about isolation, that would have been unacceptable.

The fix is in src/harness_engineering/cli.py:

_build_runner() constructs a runner from a runs directory plus either a policy file or saved policy config
_build_runner_for_existing_run() reloads the saved run state and reconstructs the runner using the policy stored in state.artifacts["policy"]

Then cmd_approve() and cmd_resume() use _build_runner_for_existing_run().
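The shape of that fix can be sketched as follows. The repo stores the effective policy in state.artifacts["policy"]; everything else here is an assumption: Runner, default_policy(), the state.json filename, the precedence of a saved policy over a policy file, and the choice to fail closed when no saved policy exists.

```python
import json
from dataclasses import dataclass
from pathlib import Path
from typing import Any, Dict, Optional


@dataclass
class Runner:
    runs_dir: Path
    policy_config: Dict[str, Any]


def default_policy(runs_dir: Path) -> Dict[str, Any]:
    # Stand-in for the registry-derived defaults.
    return {"version": 1, "default_allowed_write_roots": [str(runs_dir.resolve())], "tool_policies": {}}


def build_runner(
    runs_dir: Path,
    policy_file: Optional[Path] = None,
    saved_policy: Optional[Dict[str, Any]] = None,
) -> Runner:
    if saved_policy is not None:
        config = saved_policy  # an existing run keeps the policy it started under
    elif policy_file is not None:
        config = json.loads(Path(policy_file).read_text(encoding="utf-8"))
    else:
        config = default_policy(runs_dir)
    return Runner(runs_dir, config)


def build_runner_for_existing_run(runs_dir: Path, run_id: str) -> Runner:
    state = json.loads((runs_dir / run_id / "state.json").read_text(encoding="utf-8"))
    saved = state.get("artifacts", {}).get("policy")
    if saved is None:
        # Fail closed instead of silently resuming under defaults.
        raise RuntimeError(f"Run {run_id} has no saved policy; refusing to resume.")
    return build_runner(runs_dir, saved_policy=saved)
```

Raising when the saved policy is missing is a deliberate choice in this sketch: falling back to defaults would reintroduce exactly the bug described above.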

This is exactly the kind of bug harness engineers should care about. Safety rules are part of execution state. If they are not preserved across resume, they are not real runtime constraints.

What the demo proves

1. Harness-level policy is a meaningful isolation layer

This repo still runs on the local machine and writes local files. But it now proves that a harness can impose explicit action categories and hard path constraints before side effects happen.

2. Approval and policy are different controls

The new flow shows that a risky action can require human approval and still be denied by policy. Those are separate mechanisms and should stay separate.

3. Policy decisions should be durable artifacts

The repo records policy decisions in state.artifacts["policy_decisions"], emits policy_checked and policy_denied trace events, and includes policy data in compact trace summaries. That is the right pattern for auditability.

4. Resume correctness is part of the safety story

The _build_runner_for_existing_run() fix proves an important operational point: if policy is configurable, the harness must reload the same effective policy when continuing a run.

What it still does not solve

This section matters more than the happy path.

1. This is not an OS sandbox

The repo does not provide container isolation, seccomp, namespace isolation, subprocess jails, VM boundaries, or network egress controls. If you need those, you need lower-level system controls in addition to harness policy.

2. The policy model is still narrow

Today the meaningful hard check is write-path enforcement for filesystem_write. There is no first-class policy for network destinations, subprocess execution, credential scope, per-user permissions, or time/resource quotas.

3. Action categories are hand-authored

The taxonomy is explicit, which is good, but still small and manually assigned. Real systems often need richer capability modeling and stronger review around tool registration.

4. The local-model reviewer path still has a known weakness

During verification, connectivity to the repo’s local OpenAI-compatible model endpoint was healthy, but a live local-model run still exposed the reviewer’s current fragility around structured JSON output. That is an honest limitation of the demo. For the policy examples above, I used the live repo code with deterministic mock behavior to get reproducible approval and denial traces.

5. There is no secret or credential isolation per tool

The harness can say where a file may be written. It cannot yet say which credential a tool may use, or prevent a broadly privileged tool from seeing too much ambient authority.

Practical takeaway

When people say “safe agent execution,” I think the useful first question is not “which model?” It is:

what are the action categories?
where are the risky boundaries?
what is structurally forbidden?
where are denials recorded?
do those rules survive pause and resume?

This repo now has a concrete answer for one important class of side effect: filesystem writes.

That is not the end state. It is the start of a real one.

A lot of agent demos jump directly from tool calling to grand claims about autonomy. I trust systems more when they first show me something simpler and stricter:

classify the action
check the rule
deny the bad path
persist the evidence

That is not glamorous. It is harness engineering.

References

67 AI Lab, harness-engineering repository: https://github.com/67ailab/harness-engineering
Model Context Protocol specification (2025-03-26): https://modelcontextprotocol.io/specification/2025-03-26
LangGraph overview: https://docs.langchain.com/oss/python/langgraph/overview
OpenAI Agents SDK documentation: https://openai.github.io/openai-agents-python/
OpenTelemetry documentation, “Traces”: https://opentelemetry.io/docs/concepts/signals/traces/
Temporal documentation, “Workflow Execution”: https://docs.temporal.io/workflow-execution
