Automation · PRJ-005

Agentic Orchestration & Operations Platform

Two agents, two machines, one operator — with a firewall off-the-shelf assistants can’t enforce.

~$35/mo

All-in AI cost for both agents + adjacent automation

100%

Eval pass rate — firewall, entity & coercion tests

p95 2.1s

Conversational latency (p99 4.8s)

The Problem

The operator runs across several legal entities at once: a senior salaried role at a sensitive employer, a personal consulting company with side ventures, interests in two partnerships, and an early-stage stealth product. Each has its own legal structure, tax treatment, partners, and operational posture.

Holding all of that in one head, a notes app, or a single ChatGPT thread does not work. Information leaks across boundaries that legally should not cross, decisions get missed, and personal life, business operations, and sensitive employer content collide in ways that create real professional and legal exposure.

The pain point is not lack of AI access — it is that consumer assistants treat one human as one undifferentiated context. What is needed is structural separation between contexts, persistent state across sessions, proactive monitoring of the business, and an enforceable firewall preventing any discussion of sensitive employer content through privately controlled tools. Off-the-shelf assistants cannot do any of that.

The Architecture

Rather than one assistant with modes, the system is two named agents on two machines, each with its own enforced role, identity, and security boundary. A personal agent runs locally on the operator’s laptop; a business agent runs 24/7 on a cloud VPS as a meta-agent sitting above Paperclip, a multi-agent orchestration layer of subordinate agents. Both boot from version-controlled markdown workspaces injected into context every turn so the rules cannot drift, communicate only over a tamper-evident Dispatch Layer, and run entirely on a private Tailscale mesh — with four isolated domains enforced at three independent layers.

Two agents, two machines

The personal agent lives on a laptop because personal context belongs on a device the operator physically controls; the business agent lives in the cloud because 24/7 availability and root on a Linux box are the real requirements of an operations agent. Splitting them removes a single point of failure and prevents commingling data that legally and operationally must stay separate.

Workspace-as-identity

Each agent boots from eleven plain-markdown files (~470 lines) encoding identity, personality, the user profile, the entities under management, hard security rules, an org chart, tools, bootstrap behavior, heartbeat cadence, and append-only curated memory. The files are injected into context every turn so rules cannot drift, and because they are markdown they are diffable, git-versioned, and portable if the framework is ever replaced.

Four-domain isolation model

The system is partitioned into four domains — Personal, Business, Commission, and Employer — isolated at three independent layers: instructional (hardcoded rules auto-injected every turn, with a 20+ trigger employer firewall and enumerated coercion patterns refused), database schema (per-domain namespaces under Postgres row-level security), and the dispatch protocol (typed messages that carry and check domain tags). Because the three layers are independent, a breach requires all three to fail simultaneously — defense in depth, not a single guard.

Semantic memory with row-level security

pgvector inside the existing Postgres provides per-agent namespaced memory. Every document is tagged with an entity classifier (personal, consulting, partnership_a, partnership_b, stealth_product, or employer_blocked), and Postgres RLS enforces access at query time — the business agent reaches the personal namespace only via explicit grants, and employer-tagged content is readable by neither agent under any condition. Retrieval is top-k cosine similarity over 800-token chunks, injected at query time.

Paperclip — multi-agent orchestration

Beneath the business agent, Paperclip provides multi-agent orchestration — org charts of subordinate agents, scheduled heartbeats, per-agent budgets, and governance rules. The meta-agent sits above Paperclip rather than inside it, so it can supervise, approve, and override the org without being bound by the orchestration loop. The Commission is deliberately kept outside Paperclip entirely: it is multi-user, must stay portable, and runs on real-time interaction rather than Paperclip’s heartbeat cadence.

The Dispatch Layer

Cross-domain and inter-agent communication flows through a Dispatch Layer of typed JSON messages, each carrying a SHA-256 checksum chained to the previous message so the history is tamper-evident. Every dispatch is persisted to a PostgreSQL audit store and visualized in a Next.js audit dashboard — which, like every other service, is bound exclusively to the Tailscale interface.

Private mesh networking — Tailscale

All machines join a private Tailscale mesh — a zero-trust WireGuard network — and every service binds exclusively to the Tailscale interface rather than a public one. There is no public attack surface to harden: a device must be an authenticated member of the mesh before it can reach any service at all.

Observability & evaluation

Every turn writes a structured event to an agent_traces table (model, tokens, tool calls, cost, latency, finish reason); Grafana renders per-agent cost, token consumption, p50/p95/p99 latency, and tool-call frequency, with webhook alerts at 80% of each agent’s monthly cap. A golden set of 40 prompts per agent — refusals, entity fidelity, coercion resistance — runs on every change to the security and identity files and weekly, and regressions block deployment.

Cost-engineered model tiering

A smart model handles conversational reasoning and tool use, a cheap model runs the scheduled heartbeats (where ~90% of replies are a no-op and the agent stays silent), and a premium model sits behind a slash command for explicit deep-reasoning sessions but is never the default. With prompt caching, each agent runs $10–15/month.

Key Decisions & Tradeoffs

The reasoning behind the build — and what each choice cost.

Two agents on two machines, not one agent with two modes

Why

Personal context belongs on a device the operator controls; the business agent needs 24/7 uptime and root on a Linux box. One process would be a single point of failure and would commingle data that must stay legally separated.

Tradeoff

Two deployments, two daemons, and no shared state — the human chooses which agent to address.

Reject the intelligent router we prototyped

Why

A front-door classifier that auto-dispatches “personal vs business” is convenient, but the employer firewall is the highest-priority boundary and a router becomes a single point of failure for it — one misclassification routes blocked content to an agent not designed to refuse it. It also costs a model call in latency and adds a failure mode, all to save one tap.

Tradeoff

The operator picks the right chat surface manually — one tap of friction for zero risk of cross-context contamination. The general lesson: when a wrong route is expensive, human intent beats a model classifier.

API-key auth, not OAuth

Why

The provider’s April 2026 policy permits OAuth tokens only for its native clients, so a third-party agent harness on OAuth risks suspension; API keys are fully sanctioned for agent workloads, and separate keys per agent give per-agent billing visibility.

Tradeoff

Manual key management instead of a managed OAuth flow.

An existing agent framework over a hand-built loop

Why

The framework provides the gateway, daemon, TUI, workspace injection, heartbeat scheduler, model-fallback chain, and slash-command model switching out of the box — weeks saved versus building those primitives.

Tradeoff

Documented and accepted: a bug in the framework’s custom-provider response path forced a mid-build migration from a self-built LLM proxy to the framework’s native first-party provider integration.

pgvector inside the existing Postgres, not a new vector DB

Why

Reusing the running Postgres for semantic memory avoids introducing another database while still getting per-agent namespaces, RLS-enforced entity boundaries, and fast retrieval.

Tradeoff

Couples memory to the primary database and leans on RLS policy discipline as the boundary — verified with unit tests and a red-team prompt set.

Workspaces as plain markdown, not database rows

Why

Markdown is readable, diffable, git-version-controllable, editable from any machine, and survives framework changes — making the workspace directory the single source of truth for each agent’s identity.

Tradeoff

Identity lives in files that must be kept in sync and protected, rather than in a managed store.

A private Tailscale mesh, not public exposure with firewall rules

Why

Every service binds exclusively to a zero-trust WireGuard mesh, so there is no public endpoint to attack — a device must be an authenticated mesh member before it can reach anything, which is far stronger than hardening public ports with firewall rules.

Tradeoff

Tailscale must be installed and authenticated on every device that needs access, and the mesh becomes a dependency for all connectivity.

The meta-agent sits above Paperclip, and The Commission stays outside it

Why

Keeping the meta-agent above the orchestration layer lets it supervise, approve, and override the subordinate org without being bound by the heartbeat loop. The Commission is excluded from Paperclip entirely because it is multi-user, must stay portable, and runs on real-time rather than heartbeat interaction.

Tradeoff

Two coordination models to reason about — the meta-agent’s supervisory layer and Paperclip’s orchestration loop — rather than one uniform hierarchy.

Tamper-evident dispatch with chained checksums, not plain message passing

Why

Typed JSON messages each carry a SHA-256 checksum chained to the previous one and are persisted to a Postgres audit store, so any alteration to inter-agent history is detectable after the fact — essential when agents act across legally separated domains.

Tradeoff

Extra serialization, hashing, and storage on every inter-agent message versus fire-and-forget passing.

Enforce domain isolation at three independent layers, not one

Why

Isolation is enforced instructionally, in the database schema (RLS), and in the dispatch protocol at once, so a breach requires all three to fail simultaneously — the highest-priority boundary should never rest on a single mechanism.

Tradeoff

More moving parts to keep consistent across three layers, and changes to the domain model must be made in all three places.

Outcome & Performance

Both agents are live in production. The personal agent boots at 19K tokens of a 128K window; the business agent at 17K of 200K — and both correctly identify their role, the principal, the entity boundaries, and their sister-architecture on the first turn of every session. The cloud agent runs 24/7 under systemd auto-restart with zero unplanned downtime.

Cost is engineered, not hoped for. Each agent runs $10–15/month with prompt caching, under $35/month all-in including adjacent automation, against a hard $50/agent ceiling with an 80% webhook alert and zero overruns to date. Splitting smart-model conversations from cheap-model heartbeats alone saves roughly $30/month per agent.

The boundaries hold under test. The eval harness runs 40 prompts per agent weekly and on every workspace change at a current 100% pass rate across firewall, entity-separation, and coercion-resistance categories; RLS overhead is under 5ms per query and has survived a red-team prompt set. Operationally, p95 conversational latency is 2.1s (p99 4.8s), top-k retrieval over a 12,000-document corpus runs ~80ms, and the heartbeat signal-to-noise ratio (1:14 business, 1:22 personal) confirms the agents stay quiet unless something material needs the principal.

Capabilities

Multi-agent architectureNode.js 22 / 24Agent framework + daemonPostgreSQL · pgvectorRow-level securityAnthropic models (tiered)Grafana · Pino observabilityEval harness (golden set)Paperclip orchestrationTailscale (WireGuard mesh)Tamper-evident dispatchsystemd · LaunchAgent · PM2Telegram surfaces