The live demo repo for this series is 67ailab/harness-engineering, and I changed the repo before publishing this post. The new repo commit is 3f2ec5d, which adds a checked-in baseline policy file at policy/default.json and tightens PolicyEngine so relative policy paths resolve from the policy file location rather than from the caller's current working directory.
That sounds like a small change. It is small in lines of code. It is not small in meaning.
Security failures in agent systems usually do not come from one dramatic exploit. They come from vague boundaries:
- a model can write somewhere you did not mean it to write
- a provider key leaks from local config into a public repo
- a policy file behaves differently depending on where the operator runs the command
- a “safe” tool becomes unsafe because nobody defined its action category clearly
- a local demo quietly depends on a model endpoint that was never actually verified
Those are harness problems.
That is why Post 11 is not about benchmark red-team theater or generic zero-trust slogans. It is about the boring but decisive mechanics around an agent loop: policy config, write boundaries, provider auth handling, secret hygiene, and verification.
In this repo, the relevant code now lives in:
- `src/harness_engineering/policy.py`
- `src/harness_engineering/tools.py`
- `src/harness_engineering/runner.py`
- `src/harness_engineering/provider.py`
- `src/harness_engineering/cli.py`
- `scripts/secret_scan.py`
- `policy/default.json`
- `tests/test_harness.py`
And I verified the repo state before writing this post with:
```
cd /home/james/.openclaw/workspace/harness-engineering
make check
PYTHONPATH=src python3 -m harness_engineering.cli doctor
```
For this run, make check passed, the secret scan passed, and doctor successfully reached the repo-local OpenAI-compatible endpoint and returned MODEL_OK for gemma4.
What changed in the repo since the previous post
Post 10 added explicit planner, executor, and reviewer handoffs. That made the harness more inspectable. Post 11 needed a different kind of inspectability: not who handed work to whom, but what the runtime is allowed to do at all.
The repo already had a meaningful security foundation before this run:
- `Tool.action_category` in `src/harness_engineering/tools.py`
- `PolicyEngine.evaluate()` in `src/harness_engineering/policy.py`
- pre-execution policy checks in `HarnessRunner._execute()` in `src/harness_engineering/runner.py`
- repo-local provider loading via `load_dotenv()` and `load_model_config()` in `src/harness_engineering/provider.py`
- `doctor_check()` in `src/harness_engineering/provider.py`
- `.env` ignore rules in `.gitignore`
- secret scanning in `scripts/secret_scan.py`
But there was still an honest gap between “security logic exists” and “this repo has a policy configuration story that an operator could actually use.”
The missing piece was a first-class policy artifact.
So I added policy/default.json as a checked-in baseline policy, documented it in README.md, and fixed PolicyEngine so relative roots in policy files are resolved against the policy file directory. That last detail matters more than it sounds. If the meaning of a policy file changes depending on the shell cwd, then the policy is not really stable configuration. It is ambient behavior.
Security in an agent harness is mostly boundary work
I think people often compress “agent security” into one bucket, but there are really at least four different concerns in this demo:
- Policy — what actions are allowed at all?
- Approval — what actions require a human before execution?
- Auth and provider config — how does the harness obtain model access safely?
- Secret hygiene — how do we avoid leaking credentials into git and artifacts?
The repo addresses each one separately, and that separation is healthy.
Approval is not the same as policy. A risky action can be policy-allowed and still require explicit approval. Auth is not the same as policy either. You can have a perfectly configured API key and still have a badly bounded runtime. And secret hygiene is not the same as auth. A key can work fine technically while still being handled recklessly.
This is exactly the sort of separation that harness engineering is supposed to force.
The policy layer: simple, explicit, and local
The core policy abstraction is PolicyEngine in src/harness_engineering/policy.py.
The critical methods and types are:
- `PolicyEngine.from_file()`
- `PolicyEngine.describe()`
- `PolicyEngine.evaluate()`
- `default_policy_config()`
- `load_policy_file()`
- `PolicyDecision`
The model here is intentionally small.
The harness does not attempt OS sandboxing. It does not manage syscall profiles. It does not isolate network egress. Instead, it does something narrower and more demonstrable:
- each tool has an explicit `action_category`
- risky write behavior is classified as `filesystem_write`
- write targets are checked against allowed roots
- denials are persisted in trace and summary artifacts
That categorization starts in src/harness_engineering/tools.py, where default_registry() registers the tools and marks finalize_report as both risky and filesystem_write:
- `search_mock` → `read_only`
- `extract_facts` → `transform`
- `draft_report` → `model_generation`
- `finalize_report` → `filesystem_write`
I like this design because the category is attached to the tool definition itself, not inferred from vague runtime guesses.
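To make that concrete, here is a minimal sketch of what a registry with declared action categories can look like. The names mirror the repo's tools, but the field layout and the stub implementations are my assumption, not the repo's actual code:

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class Tool:
    name: str
    # e.g. "read_only", "transform", "model_generation", "filesystem_write"
    action_category: str
    risky: bool
    fn: Callable[..., object]


def default_registry() -> Dict[str, Tool]:
    # Categories are declared with the tool definition itself,
    # never inferred from runtime behavior.
    return {
        "search_mock": Tool("search_mock", "read_only", False, lambda query: []),
        "extract_facts": Tool("extract_facts", "transform", False, lambda text: []),
        "draft_report": Tool("draft_report", "model_generation", False, lambda facts: ""),
        "finalize_report": Tool("finalize_report", "filesystem_write", True,
                                lambda path, body: None),
    }
```

The payoff is that a policy engine can make decisions by looking at declared data, without ever executing the tool.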
Then HarnessRunner._execute() in src/harness_engineering/runner.py calls self.policy.evaluate(tool_name, kwargs) before tool execution. If the action is denied, the run fails cleanly, the denial is traced, and the decision is persisted. That means policy is not a README promise. It is part of execution state.
Why policy/default.json matters
Before this run, the repo had sample_data/policy/restrictive.json, which was useful as a denial example. But a denial example is not the same thing as a real baseline configuration.
Now the repo also has:
policy/default.json
That file makes the policy surface concrete and operator-visible. It says, in data rather than just code:
- which tools are enabled
- which action category each tool belongs to
- which output roots are allowed for writes
That sounds mundane, but mundane is exactly what you want from a policy artifact. Policy should be readable, boring, reviewable, and diffable.
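For illustration, a baseline policy file in this style might look something like the following. The schema here is my sketch of the idea, not a verbatim copy of `policy/default.json`:

```json
{
  "tools": {
    "search_mock": {"enabled": true, "action_category": "read_only"},
    "extract_facts": {"enabled": true, "action_category": "transform"},
    "draft_report": {"enabled": true, "action_category": "model_generation"},
    "finalize_report": {"enabled": true, "action_category": "filesystem_write"}
  },
  "allowed_write_roots": [".runs"]
}
```

Everything in it is plain data, which is what makes it reviewable in a pull request.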
The more interesting improvement is the relative-path fix in PolicyEngine._resolve_policy_path().
Previously, a relative path such as .runs would resolve from the current working directory. Now it resolves from self.config_base_dir, which is derived from the policy file path. PolicyEngine.describe() also exposes both configured roots and resolved roots, making it easier to inspect what the runtime actually thinks the policy means.
That is a subtle but important hardening move. Configuration files should be portable artifacts, not traps.
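The resolution rule itself is tiny. Here is a minimal sketch of the idea, assuming a helper that approximates what `_resolve_policy_path()` does (the function name and signature are mine):

```python
from pathlib import Path


def resolve_policy_root(root: str, config_base_dir: Path) -> Path:
    """Resolve a policy root relative to the policy file, not the shell cwd."""
    p = Path(root)
    if p.is_absolute():
        return p
    # Anchor relative roots to the directory containing the policy file,
    # so the policy means the same thing no matter where it is invoked from.
    return (config_base_dir / p).resolve()
```

With this rule, a root of `.runs` in `/repo/policy/default.json` always means `/repo/policy/.runs`, regardless of where the operator happens to run the CLI.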
tests/test_harness.py now includes test_policy_file_relative_roots_resolve_from_file_location, which verifies exactly that behavior.
Approval is adjacent to policy, not a substitute for it
This repo continues to model approval separately from policy, and I think that is correct.
In HarnessRunner.run_until_pause_or_complete(), the harness evaluates the would-be finalize_report write before the approval gate. If the write target falls outside allowed roots, the run fails immediately. If the write is policy-allowed, the harness creates a structured pending action and pauses in waiting_approval.
So the sequence is:
- policy check says whether the action is allowed at all
- approval gate says whether the human authorizes this particular execution
That is a better model than treating approval as the only guardrail. If something is disallowed by policy, there should not even be an approval UI for it.
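The two-stage ordering can be sketched in a few lines. This is a simplified illustration of the control flow, not the repo's actual runner code; `gate_finalize`, its callbacks, and the status strings are my invention:

```python
from dataclasses import dataclass


@dataclass
class PolicyDecision:
    allowed: bool
    reason: str


def gate_finalize(policy_evaluate, request_approval, tool_name: str, kwargs: dict) -> str:
    # Stage 1: policy decides whether the action is allowed at all.
    decision: PolicyDecision = policy_evaluate(tool_name, kwargs)
    if not decision.allowed:
        return f"failed: policy denied ({decision.reason})"
    # Stage 2: only policy-allowed actions ever reach the human approval gate.
    if not request_approval(tool_name, kwargs):
        return "waiting_approval"
    return "approved"
```

A policy-denied write never reaches the approval stage, which is exactly the invariant the post is arguing for.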
This separation also matches MCP’s security guidance reasonably well. The MCP tools specification explicitly says tools are model-controlled, but also says there should be a human in the loop with the ability to deny invocations and clear UI around tool exposure and confirmations. Protocols can standardize tool shape. They do not replace harness policy.
Provider auth handling: local, explicit, and testable
The provider/auth side of the repo lives in src/harness_engineering/provider.py.
The important functions are:
- `load_dotenv()`
- `load_model_config()`
- `create_client_from_env()`
- `doctor_check()`
- `OpenAICompatibleClient.list_models()`
- `OpenAICompatibleClient.chat()`
What I like here is not sophistication. It is legibility.
load_model_config() prefers repo-local HARNESS_* variables such as:
- `HARNESS_MODEL_PROVIDER`
- `HARNESS_MODEL_NAME`
- `HARNESS_OPENAI_BASE_URL`
- `HARNESS_OPENAI_API_KEY`
and falls back to broader environment names like MODEL_PROVIDER and OPENAI_API_KEY.
That preference order is a practical choice. It reduces accidental collisions with other shell-level credentials on a workstation or CI environment.
Then doctor_check() validates two things before you trust local-model behavior:
- can the endpoint answer `GET /models`?
- can it answer a minimal `/chat/completions` request correctly?
That is exactly what the repo’s article-writing workflow asked for, and it is the right level of proof for this demo. In this run, doctor returned:
- provider: `openai_compatible`
- model: `gemma4`
- status: `ok`
- message: `MODEL_OK`
That matters because too much agent writing quietly assumes a model backend exists and works. If a post depends on local-model behavior, the harness should show a real connectivity check.
For external reference, this shape is aligned with the common OpenAI-style pattern of a model listing endpoint plus chat completions over a list of messages. The repo is not claiming full vendor compatibility across every edge case. It is implementing the small slice it actually uses.
Secret hygiene: simple controls beat wishful thinking
The repo’s secret posture is intentionally basic, but real:
- `.gitignore` excludes `.env` and `.env.*` while allowing `.env.example`
- `.env.example` contains placeholders only
- `scripts/secret_scan.py` scans tracked files for obvious API key patterns
- `make check` runs both tests and secret scanning before a push
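A pattern-based scan of that kind is only a few lines. Here is a minimal sketch; the patterns are illustrative, not the repo's exact list:

```python
import re

# Obvious credential shapes. These are tripwires, not guarantees.
SECRET_PATTERNS = [
    re.compile(r"sk-[A-Za-z0-9]{20,}"),                       # OpenAI-style keys
    re.compile(r"AKIA[0-9A-Z]{16}"),                          # AWS access key IDs
    re.compile(r"(?i)api[_-]?key\s*=\s*['\"][^'\"]{16,}['\"]"),  # hard-coded assignments
]


def scan_text(path: str, text: str) -> list[str]:
    """Return a finding line for every pattern hit, with file and line number."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for pattern in SECRET_PATTERNS:
            if pattern.search(line):
                findings.append(f"{path}:{lineno}: matches {pattern.pattern}")
    return findings
```

In a pre-push hook, a non-empty findings list fails the check, which is exactly the cheap tripwire behavior the repo wants.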
This is not a vault. It is not a centralized secrets platform. But it does reflect sound practice.
OWASP’s Secrets Management Cheat Sheet stresses centralization, standardization, access control, and automation of secret handling. This repo does not solve all of that, but it does at least avoid the most embarrassing failure mode for a public demo project: committing real credentials by accident.
That is the right standard for a teaching repo. Not complete enterprise secret governance, but evidence that the maintainers are not sleepwalking into leakage.
What the demo proves
This demo now proves a few things that are worth taking seriously.
1. Policy can be a real runtime primitive in a small harness
You do not need a giant platform to get value from policy. A tiny harness can still:
- classify actions explicitly
- deny unsafe writes before execution
- persist policy decisions for later inspection
- distinguish policy denial from approval withholding
2. Configuration details are part of security, not documentation garnish
The policy/default.json addition and relative-path fix are a good example. Security often depends on boring interpretation rules. If configuration is ambiguous, operators will eventually deploy the wrong thing.
3. Local-model auth should be verifiable, not assumed
The combination of repo-local HARNESS_* variables and doctor_check() gives the demo a credible local-provider story. That is much better than claiming “supports local models” without a health check.
4. Public demo repos can practice decent secret hygiene without overengineering
Ignoring .env, keeping .env.example clean, and running a secret scan before push is not glamorous. It is still real engineering discipline.
What it still does not solve
This is the important section.
The repo is better now, but it is still a small harness demo. It does not solve:
- network egress control
- subprocess sandboxing
- container or VM isolation
- per-user identity and authorization policy
- scoped credential minting
- secret rotation or revocation workflows
- audit-grade policy administration
- supply-chain controls on tool implementations
- SSRF protection for arbitrary network tools
- host-level enforcement outside the Python process
It also does not model the nastier parts of real agent security, such as prompt injection across untrusted tools, exfiltration attempts through secondary channels, or capability partitioning across multiple trust zones.
So no, this repo is not a secure agent platform. It is a practical demonstration that harness-level policy and auth hygiene are concrete engineering work, not vibes.
Honest limitations
I have three main reservations about the current design.
First, the policy model is still tied mostly to filesystem writes. That is useful, but incomplete. As soon as the harness gains networked tools, subprocess tools, or external side effects, the category model needs to widen.
Second, load_dotenv() is intentionally tiny. That keeps the repo dependency-light, but it also means the parser is not trying to be a full-featured dotenv implementation. For a demo, fine. For a broader system, I would want stricter config handling and clearer validation errors.
Third, the secret scan is pattern-based. Pattern scans are good tripwires, not guarantees. They reduce obvious mistakes; they do not prove the absence of secrets.
Still, I prefer this repo with these modest controls over a more impressive-looking repo with none of them.
The broader lesson
The broader lesson of Post 11 is that security in agent systems is mostly about making boundaries explicit enough that operators can reason about them.
That means:
- explicit tool categories
- explicit risky actions
- explicit policy files
- explicit provider config precedence
- explicit health checks
- explicit secret hygiene steps
The industry loves to talk about agent autonomy. I think most teams would be better served by talking about agent controllability.
If your harness cannot answer simple questions like these, it is not ready:
- What can this tool write, exactly?
- Which config file defines that boundary?
- How is that config resolved?
- Which environment variables actually win?
- Did we verify the model endpoint before relying on it?
- What prevents a credential from ending up in git?
This repo can now answer those questions more cleanly than it could yesterday. That is the kind of progress I trust.
References
- Live repo: https://github.com/67ailab/harness-engineering
- MCP tools specification: https://modelcontextprotocol.io/specification/2025-06-18/server/tools
- OWASP Secrets Management Cheat Sheet: https://cheatsheetseries.owasp.org/cheatsheets/Secrets_Management_Cheat_Sheet.html
- OpenAI API docs, models overview: https://developers.openai.com/api/docs/models
- OpenAI API docs, chat completions overview: https://developers.openai.com/api/reference/chat-completions/overview