Dominic Plouffe (CTO)

Big data + agents. Less hype, more systems.

Blog

  • Agent design patterns for production

    Agent design patterns for production

    TL;DR:

    Build agents by composing small, well-instrumented capabilities, enforcing runtime controls, and designing for predictable state transitions. These patterns reduce surprises in production and make agents auditable, debuggable, and cost-effective.

    Intro

    CTOs evaluating agent deployments quickly run into two recurring realities: agents are powerful automation that can dramatically reduce manual toil, and they are a new class of system with emergent failure modes. Successful production agents aren’t just “smart” — they are engineered with patterns that make them observable, bounded, and interoperable with existing operational practices. This article distills pragmatic design patterns I’ve seen work across cloud orchestration, customer service automation, and payment platforms, and shows how those patterns translate into measurable operational improvements.

    Design for composable, bounded capabilities

    Start by decomposing an agent into narrowly-scoped capabilities (skills, workers, or modules). Each capability exposes a small, well-documented interface: inputs, outputs, preconditions, expected side effects, and idempotency properties. This sounds obvious, but teams often hand agents unrestricted access to resources and then wonder why debugging is impossible.

    Key design choices here are explicit schemas and idempotent actions. Use typed APIs or JSON schemas for every action the agent can request. Require an explicit “dry-run” flag for potentially destructive operations so you can validate plans before execution. Enforce idempotency by design: if an agent retries “create-user,” the back-end responds with a stable result rather than creating duplicates.
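    As a minimal sketch of these choices (the capability name, schema shape, and in-memory store are illustrative, not from any particular framework), a "create-user" action with an explicit schema, a dry-run flag, and idempotency-by-key might look like:

```python
# Illustrative capability wrapper: explicit schema, dry-run support,
# and idempotency keyed on a caller-supplied token.
CREATE_USER_SCHEMA = {
    "type": "object",
    "required": ["email", "dry_run", "idempotency_key"],
}

_created = {}  # idempotency_key -> user_id (stand-in for a backing store)

def create_user(request: dict) -> dict:
    # Validate the request against the declared schema before acting.
    for field in CREATE_USER_SCHEMA["required"]:
        if field not in request:
            raise ValueError(f"missing field: {field}")
    # Dry-run: report what would happen, change nothing.
    if request["dry_run"]:
        return {"status": "validated", "would_create": request["email"]}
    # Idempotent execution: retrying with the same key returns the
    # original result instead of creating a duplicate user.
    key = request["idempotency_key"]
    if key in _created:
        return {"status": "exists", "user_id": _created[key]}
    user_id = f"user-{len(_created) + 1}"
    _created[key] = user_id
    return {"status": "created", "user_id": user_id}
```

    A production version would use a real JSON Schema validator and a durable store; the loop above only illustrates the validate-before-acting shape.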

    Benefits: smaller blast radius, simpler testing, and easier role-based permissions. In practice, teams replacing monolithic agents with composable capabilities saw mean time to resolution for agent-induced incidents fall by 30–60% because root causes were isolated to a single module.

    Operational controls: policies, limits, and circuit breakers

    Production agents must operate under operational constraints. Explicitly encode runtime policies: quota limits, cost budgets, concurrency caps, and real-time safety checks. These controls should be enforced outside the core reasoning loop so they cannot be bypassed by a clever plan. In other words, separate “what to do” (the plan) from “whether to do it” (the policy enforcement layer).

    Circuit breakers and throttles are simple but effective. A circuit breaker can open when an upstream service has elevated error rates, forcing the agent to shift to a degraded mode (notify humans, stash the plan, or run a read-only analysis). Throttling helps control cost and latency: agents that invoke downstream ML models or third-party APIs should be able to back off based on budget signals.
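    A sketch of the breaker side, assuming consecutive-failure counting and a fixed cooldown (both are policy choices you would tune per dependency):

```python
import time

class CircuitBreaker:
    """Opens after `threshold` consecutive failures; while open, the agent
    must fall back to a degraded mode (notify humans, stash the plan,
    run read-only analysis)."""

    def __init__(self, threshold=3, cooldown_s=30.0):
        self.threshold = threshold
        self.cooldown_s = cooldown_s
        self.failures = 0
        self.opened_at = None

    def allow(self) -> bool:
        if self.opened_at is None:
            return True
        if time.monotonic() - self.opened_at >= self.cooldown_s:
            # Half-open: permit a trial call after the cooldown.
            self.opened_at = None
            self.failures = 0
            return True
        return False

    def record(self, success: bool) -> None:
        if success:
            self.failures = 0
        else:
            self.failures += 1
            if self.failures >= self.threshold:
                self.opened_at = time.monotonic()
```

    The same shape extends to cost-aware throttling: replace the failure counter with a running spend total and open when the budget signal is exceeded.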

    Evidence: one cloud-ops team added a cost-aware throttle and saw API spend from automated remediation drop 22% without losing remediation coverage. Another team reduced the frequency of runaway reconcilers by instituting a simple retry budget per incident, turning unpredictable spikes into bounded queues.

    Observe, record, and enable deterministic replay

    Agents make decisions; production teams must be able to replay those decisions to understand what happened and why. Instrument every decision point: inputs, intermediate reasoning states, deterministic plan fragments, and final execution calls. Store those artifacts in a compact, structured format so you can reconstruct a run without storing raw model outputs or full logs.

    Deterministic replay is not about believing the agent is always right; it’s about making debugging tractable. Keep a canonical “run record” that contains the request, the validated plan, policy evaluations, and the execution log with timestamps. Add a replay mode that replays the plan in a sandboxed environment (dry-run) and verifies that the same sequence of validations and guardrails triggers the same outcomes.
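    One way to shape such a run record (field names are illustrative; the point is that the request, validated plan, policy evaluations, and timestamped execution log travel together as one artifact):

```python
import json
import time
from dataclasses import dataclass, field, asdict

@dataclass
class RunRecord:
    """Canonical run record: request, validated plan, policy evaluations,
    and the execution log, each entry timestamped."""
    request: dict
    plan: list
    policy_evaluations: list = field(default_factory=list)
    execution_log: list = field(default_factory=list)

    def log_policy(self, name: str, allowed: bool) -> None:
        self.policy_evaluations.append(
            {"policy": name, "allowed": allowed, "ts": time.time()})

    def log_step(self, action: str, result: str) -> None:
        self.execution_log.append(
            {"action": action, "result": result, "ts": time.time()})

    def to_json(self) -> str:
        # Compact, structured artifact for storage and regression tests.
        return json.dumps(asdict(self))

def replay(record: RunRecord, dry_run_executor) -> list:
    """Re-run the validated plan through a sandboxed (dry-run) executor and
    return the outcomes so they can be diffed against the original log."""
    return [dry_run_executor(step) for step in record.plan]
```

    The replay helper is where regression tests hook in: reproduce a prior failure by replaying its record against a sandboxed executor and asserting the guardrails fire the same way.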

    Practical wins from this pattern include faster incident postmortems and safer rollbacks. Teams that implemented structured run records cut postmortem analysis time in half and were able to add unit-like regression tests that reproduce prior failures.

    Key takeaways

    • Make capabilities small and well-typed; enforce idempotency and dry-run support.
    • Separate planning from policy enforcement; implement quota, cost, and safety checks externally.
    • Add circuit breakers and retry budgets to bound failure modes.
    • Record structured run artifacts that enable deterministic replay and automated tests.
    • Design interaction models that prefer human escalation over irreversible actions in ambiguous situations.
    • Use canaries and progressive rollouts for new capabilities; monitor business SLOs as well as technical metrics.
  • Daily digest: Agents, Systems, and Mental Models (auto)

    Daily digest: Agents, Systems, and Mental Models (auto)

    Intro

    As a CTO, you are being asked to make fast, high-stakes decisions about agentic AI, orchestration platforms, and how to operationalize mental models across engineering teams. The technology is moving from research demos into production pipelines: models that can plan, delegate, and act autonomously are now tools for customer support, SRE automation, sales assistance, and data pipelines. This shift is not incremental. It changes the unit of composition from “models” to “systems of agents” and therefore demands a different set of mental models, metrics, and controls.

    This digest lays out a concise, evidence-based framework you can use this week: (1) distinguish agents from systems, (2) harden your engineering mental models, and (3) map concrete operational controls. I’ll provide two short examples of concrete trade-offs we see in production and a compact checklist you can use in design reviews.

    Agents vs. Systems: change the primitives you reason about

    An “agent” in contemporary usage is a model plus behavior: planning, state, and the ability to take actions (API calls, SQL, code changes, messages). A “system” is an arrangement of agents, data stores, procedural controls, and human feedback loops. Treating agents as isolated components and plugging them into existing microservices is a mistake; engineers must instead design interactions, incentives, failure modes, and observability at the system level.

    Evidence from early deployments shows that failures are rarely caused by a single agent model making a bad prediction. Instead, failures arise from emergent interactions: misaligned prompts, cascading retries, racing updates to shared state, or feedback loops that reinforce incorrect behavior. In other words, the right mental model is not “improve the model” but “control the interaction graph.”

    Mental models that matter for CTOs

    Adopt three engineering mental models immediately: control loops, cost-of-error classification, and epistemic boundaries. These models are lightweight, actionable, and map directly to product and risk decisions.

    Control loops: Think in terms of closed loops with explicit sensors, actuators, and latency budgets. Where is your agent observing the world (logs, telemetry, user inputs)? What actions can it take, and how quickly? If your agent can write database rows, trigger transactions, or deprovision servers, put rate limits, timeouts, and human-in-the-loop gates in those loops. Empirical deployments demonstrate that short, well-instrumented loops reduce the mean time to detect and remediate agent-induced outages.

    Cost-of-error classification: Not all errors are equal. Classify actions by their potential for irreversible harm (financial loss, privacy exposure, regulatory breach). For low-cost actions, aggressive automation and learning-from-feedback are acceptable. For high-cost actions, require explicit authorization, multi-party confirmation, or conservative rollbacks. This is where SRE practices (blast-radius control, canaries, circuit breakers) translate directly to agentic systems.
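    A sketch of cost-of-error gating, with a hypothetical action-to-tier mapping and a fail-closed default for unclassified actions:

```python
from enum import Enum

class RiskTier(Enum):
    READ_ONLY = 0
    WRITE_SAFE = 1
    WRITE_PERSISTENT = 2
    IRREVERSIBLE = 3

# Hypothetical mapping; each org maintains its own classification.
ACTION_TIERS = {
    "query_metrics": RiskTier.READ_ONLY,
    "update_dashboard": RiskTier.WRITE_SAFE,
    "issue_refund": RiskTier.IRREVERSIBLE,
}

def authorize(action: str, human_approved: bool = False) -> bool:
    """Low-cost actions proceed automatically; high-cost actions require
    explicit human authorization. Unknown actions fail closed."""
    tier = ACTION_TIERS.get(action, RiskTier.IRREVERSIBLE)
    if tier in (RiskTier.READ_ONLY, RiskTier.WRITE_SAFE):
        return True
    return human_approved
```

    A fuller version would add multi-party confirmation for the irreversible tier and conservative rollback hooks, as described above.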

    Epistemic boundaries: Understand where your models are competent and where they extrapolate. Maintain a runtime “competence map” that records which data domains, intents, and user segments a model is validated for. When agents operate outside their competence, they should either degrade gracefully (e.g., defer to a human) or expose uncertainty in structured ways that downstream systems can act on.
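    A minimal competence-map sketch, assuming requests are tagged with a domain and intent at runtime (the entries and threshold are illustrative):

```python
# Validated competence scores per (domain, intent); maintained from
# offline evaluation results. Entries here are illustrative.
COMPETENCE_MAP = {
    ("billing", "refund_status"): 0.95,
    ("billing", "invoice_lookup"): 0.90,
    ("legal", "contract_review"): 0.40,
}

def route(domain: str, intent: str, threshold: float = 0.8) -> str:
    """Outside validated competence, degrade gracefully: defer to a human
    rather than letting the agent extrapolate."""
    score = COMPETENCE_MAP.get((domain, intent), 0.0)  # unknown -> incompetent
    return "agent" if score >= threshold else "human"
```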

    Practical controls and architecture checklist

    Below is a concise checklist you can use in design reviews to decide whether an agent, a system redesign, or policy work is the primary mitigation. These are distilled from real production incidents and controlled experiments in enterprise deployments.

    • Define action classes and attach a risk tier to each (read-only, write-safe, write-persistent, irreversible).
    • Instrument every action with a request ID and provenance metadata (model version, prompt, agent plan steps).
    • Enforce rate limits, quotas, and canary windows per tenant and per agent.
    • Design explicit human-in-the-loop gates for tiered-risk actions with auto-escalation paths.
    • Maintain a competence map and require fallback strategies when confidence < threshold.
    • Implement observability for interaction graphs: call graphs, state stores, and external side-effects.
    • Use replayable logs and deterministic replays for post-mortem analysis.

    These controls are not optional add-ons; they are the minimal scaffolding that transforms brittle proofs-of-concept into reliable services.

    Two short examples help ground the trade-offs.

    Example 1 — the autonomous customer support agent: A team deployed an agent that could read a support ticket, query account data, and issue refunds up to $50. Initially, it reduced resolution time by 60%. After a change in the billing ledger schema, the agent misread a promotional credit as a refundable balance for a subset of users, issuing thousands of low-value refunds before detection. Root cause analysis showed three failures: schema assumptions in prompts, missing provenance on transactions, and lack of a write quota. Fixes were straightforward and immediate: add schema-aware adapters, attach provenance metadata to every refund, and move refunds into a canary mode requiring human approval above $10 until automated tests passed on the new schema.

    Example 2 — the SRE automation agent: An on-call automation agent was allowed to restart failed services autonomously. It correctly restarted many instances and reduced morning wake-ups. During a latency spike, the agent retried restarts concurrently across a cluster, causing a traffic storm and a cascading failure. The mitigations were to add a global circuit breaker, introduce jittered retry backoff, and cap concurrent restarts per cluster.

  • Tiny Experiments to Reclaim Your Focus

    Tiny Experiments to Reclaim Your Focus

    Design tiny experiments to reclaim focus

    Most productivity advice asks for dramatic overhauls: new tools, new schedules, or a complete life reboot. That’s seductive because big changes feel decisive. But they’re also fragile and hard to sustain. There’s a quieter option that’s both easier to start and easier to keep: tiny experiments—small, time-boxed adjustments you run long enough to learn from, not long enough to make you miserable.

    Why tiny experiments work

    Tiny experiments sidestep two fatal flaws of typical productivity fixes. First, they reduce friction: when the ask is small, you actually do it. Second, they convert hope into data: instead of a vague promise to “be more focused,” you have a measured change and a clear signal about whether it helped.

    Think of them like short A/B tests for your life. You don’t commit to a permanent change; you try something for a week or two, measure a simple outcome, and decide. Over time, the compounding wins are enormous—because small wins stack and because you get better at designing meaningful experiments.

    Pick the right levers

    Not all experiments are created equal. To keep results readable and actionable, constrain your test along three axes:

    • Time: How long is the experiment? For micro-experiments, pick 3–14 days.
    • Scope: What exactly changes? Be specific—“no notifications on desktop until 10:00” beats “fewer distractions.”
    • Measurement: What’s your outcome metric? It can be subjective (daily focus rating) or objective (number of deep-work sessions completed).

    Simple experiments you can run this week

    Here are five low-friction tests that reliably surface useful signals.

    • Notification quarantine: Silence email/mobile notifications until after a 90-minute morning block. Measure the number of uninterrupted work sessions and a one-line end-of-day note about task progress.
    • Micro-break cadence: Take a 30–60 second physical break every 25 minutes. Track perceived energy and how many hours you felt “in the zone.”
    • One-decision morning: Reduce morning choices—preset breakfast, clothes, and the first two tasks. Track decision fatigue and morning completion rate.
    • Commit-and-cut: Promise to try a single focused task for 50 minutes, then immediately switch to 10 minutes of a different activity—no multitasking. Count how many intervals you finish.
    • Email triage block: Batch email to two 20-minute slots per day. Record total emails processed and whether urgent items felt delayed.

    How to measure without making it a second job

    If measurement feels onerous, make the outcome simple and fast. Two approaches work well:

    • One-line end-of-day: Each evening jot a single line: what went well, what didn’t. It takes ten seconds and is gold for pattern recognition.
    • Binary signal: Did you complete the intended focus block? Yes/No. Tally across days to see trends.

    Turn findings into durable improvements

    After your test window, ask three questions: Did the change improve the chosen metric? Was it sustainable? What costs did it introduce? If a test passes on metrics and cost, scale it—extend the window or embed the habit by pairing the action with a stable cue (a calendar event, a specific time, the end of a meeting).

    If it fails, treat the result as information. A failure might mean the change was the wrong lever, the measurement was off, or context (meetings, team rhythms) made it impractical. Learn and design a follow-up experiment that addresses the failure mode.

    A retail analogy: trial sizes, not product swaps

    In retail, a smart buyer tests product quantities in small batches—trial sizes—to learn demand before committing shelf space. Tiny experiments are the same: test small, learn quickly, and only when the signal is clear do you expand the commitment. This approach reduces regret and keeps your personal operating system nimble.

    When experiments collide with team norms

    Single-person hacks are easy; team changes are trickier. When your experiment affects others, start by making the ask explicit: explain the short trial, what you’ll measure, and when you’ll revert or commit. Most teammates are more tolerant of temporary changes if they understand the hypothesis and timeline.

    Two quick templates to steal

    Use these when you don’t have time to design an experiment from scratch.

    • 90-minute morning focus: Mute notifications, do two prioritized tasks in a 90-minute block, one-line summary at noon. Run 7 days.
    • Micro-break cadence: 25/5 work/break rhythm for 10 sessions per day, record energy at day’s end. Run 5 days.

    What real progress looks like

    It’s not a sudden doubling of output. It’s fewer days where you feel scattered, more afternoons where you can actually finish the hard thing, and a growing library of experiments that reliably tilt your weeks. Over months, you’ll notice the compounding effect: the practices that survive are ones you barely notice doing anymore.

    Final prompt to run an experiment

    Pick one test above. Set a 7-day window. Pick a single metric—completion yes/no or a one-line daily log. Run the experiment, then review what the data tells you. Design your next experiment from the evidence, not from good intentions.

    Small experiments aren’t glamorous, but they’re how meaningful, lasting change happens. If you want one companion experiment to try this week, pick the notification quarantine: the payoff for a tiny ask is often surprisingly large.

  • Packaging Beats Peak Performance

    Packaging Beats Peak Performance

    Packaging beats peak performance: why open-source models stall at the doorstep

    There’s a pattern I keep seeing: an engineering team furiously trains a model, they throw the weights up on a public registry, and then they wait. Silence. Not because the model is bad, but because the work that actually unlocks usage—distribution, inference packaging, and developer ergonomics—was left for later. Two recent stories brought the pattern into sharp relief: one about a large company putting together an opinionated, enterprise-focused agent platform, and another where a promising open model hit the wall because nobody could easily run it locally. Same problem, different faces: packaging, not pure performance, decides who gets adopted.

    Adoption is a packaging problem: weights → packaging → runtime → developer UX → adoption.

    Why packaging matters more than you think

    Models are dense technical achievements, but they’re not products on their own. A model is a component. For a developer or product manager to treat it like a component they can actually use, three things need to happen:

    • It must be accessible in the formats and runtimes the ecosystem already uses.
    • It must be easy to evaluate cheaply and quickly.
    • It must compose with toolchains—tokenizers, inference engines, tool calling frameworks—without a day-one deep dive.

    The hidden checklist: why “best model” often loses to “best packaged”.

    If any of those are missing, adoption stalls. People will swap to the slightly worse model that just works out of the box because velocity beats marginal quality improvement every time.

    Two contrasting signals from the field

    Big vendors packaging agents as a product

    When incumbents decide to ship an agent platform, they don’t just release models. They bundle SDKs, observability, security integrations, deployment templates, and partner connectors. That’s not accidental. Enterprises care about operational risk: isolation, audits, rollout control, and predictable infra cost. What looks like a marketing move is often a rational answer to procurement and SRE requirements. The platform sells because it reduces the integration bill—and it can be opinionated about formats and runtimes, which makes it simpler for internal teams to adopt.

    Open-source weights without the lift

    Contrast that with a model release that drops weights in a single format and asks the community to figure out the rest. If common runtimes don’t support the format, if there’s no GGUF conversion, and if chat templates or tool-calling glue are incomplete, developers run straight past it. The outcome is predictable: people pick a slightly older or smaller model that plugs into vLLM, llama.cpp, or whatever their pipeline already uses. The model itself becomes a research artifact rather than a usable building block.

    Why this is a product problem, not just engineering

    Engineers build capabilities. Products remove friction. For open models to succeed, someone has to own the “last mile”: the conversions, the hosted inference endpoint, the reference SDKs, the promotion to popular inference marketplaces. That’s product work—prioritization, docs, SDK releases, and marketing to developer communities. It’s rarely glamorous, and it rarely wins research awards, but it determines whether a model gets embedded into apps or piles up in a downloads folder.

    Three practical moves for teams that want their model adopted

    If you’ve produced a model and want people to actually use it, these are the pragmatic steps that matter more than another benchmark result.

    • Ship the formats people use: provide GGUF, safetensors, ONNX where it makes sense. If you can’t be in every runtime on day one, be in the top three for your target audience.
    • Publish a minimal inference endpoint and a tiny “playground” that runs on cheap infra. Developers will try a hosted demo before spinning up hardware.
    • Bundle a conversion and starter kit: tokenizer, chat template, and a one-click example to hook tool-calling or RAG. Make the first working app under 15 minutes.

    These are small, high-leverage bets. They don’t need perfect engineering—just enough to let people instrument, test, and prototype.

    How incumbents turn packaging into a moat

    When a large vendor builds an opinionated agent platform, a subtle lock-in happens. Not because the models are proprietary (they might not be), but because the platform owns the integration surface: observability, authentication, billing, and deployment patterns. Teams adopt the platform because it removes work and risk. Over time, the cost of switching rises—not from model accuracy but from migration overhead.

    That’s why you’ll often see vendors emphasize partner integrations and enterprise controls early: these are the levers that turn a technical capability into a repeatable operational solution.

    What builders should actually care about

    If you’re building with models or you’re on a team deciding whether to run something in-house, your job is to short-circuit false choices. Don’t treat model X vs model Y as the only axis. Ask:

    • Can I evaluate this with a $20 test bed in an afternoon?
    • Will my existing toolchain accept this format without surgery?
    • What’s the realistic path from prototype to monitored production?

    If the answer to any of those is “no,” the model is less valuable than it looks on paper.

    Retail/PPC analogy (short)

    Think of models like ad creatives. A marginally better creative that takes two weeks to QA and publish will lose to a slightly worse creative you can deploy in an hour and iterate on with A/B tests. Velocity—small, safe bets—beats theoretical win rates in fast-moving systems.

    Two bets I’d place as a PM

    If you’re the product owner for a model or for tooling around models, here’s what I’d prioritize in order:

    • Developer experience: make the first 15 minutes delightful. If someone can’t get a demo running quickly, they’ll move on.
    • Inference options: supported runtimes, small hosted tier, and a conversion pipeline.
    • Operational playbooks: simple monitoring, cost estimates, and a migration checklist for customers who want to move away later.

    Closing: productize the last mile

    There’s an asymmetry in AI adoption: the heavy lifting of training and papers gets attention, but the quiet, mundane work of packaging decides who wins. If you’re a founder or an engineering lead, your healthiest obsession should be “how do we make this trivial to try?” Because the teams that answer that question will win more users than the teams that chase the last 1–2% of benchmark performance.

    Make it trivial to try, and people will. Make it hard, and performance won’t matter.

  • Agents Need Contracts, Not More Brains

    Agents Need Contracts, Not More Brains

    Why the next decade of agents will be decided by their contracts, not their brains

    There’s a familiar pattern I keep seeing whenever a hot new agent platform shows up: breathless demos of planning and autonomy, a bunch of infrastructure scaffolding, and then—inevitably—confusion the first time that agent needs to call a real tool in a real workplace.

    Two things landed in the last 48 hours that make this obvious in a useful way. One is chatter about a big vendor shipping an open agent platform. The other is a clear, practical writeup of the ReAct/tool-calling loop where you explicitly model state, tool schemas, and transitions. Together they highlight a simple truth: agents aren’t just models + compute. They are contracts between a thinking thing and the systems it touches.

    What I mean by “contracts”

    By contract I mean the agreed shape of interaction—the inputs, the outputs, the error modes, and who is responsible for recovery. Contracts sit between three actors: the LLM (the reasoner), the tool set (APIs, DBs, UIs), and the business that owns the outcome. A good contract makes the interaction predictable. A bad one lets subtle failures hide until they become disasters.

    Think of it like a marketplace listing. A great item description tells buyers precisely what they’ll get, what’s excluded, and what happens if something’s damaged in shipping. Tools need the same thing when agents use them: clear schemas, explicit side effects, and well-defined failure semantics.

    Why this matters more than model size right now

    Everyone wants to argue about parameter counts, token limits, or who trained what on which dataset. Those debates matter for capabilities, but not for production reliability. In practice, the majority of outages, hallucinations, and compliance incidents I see happen at the boundary—when an agent takes an action that touches people, money, or private data.

    Here’s the mental model: the LLM is the planner, but the world is deterministic only if you make it so. The agent’s brain can generate a plan, but unless the tool contract guarantees idempotency, transactional boundaries, and clear error codes, the plan will meet chaos. That’s not an ML problem; it’s a systems design problem.

    An everyday example

    Imagine a marketing agent that updates bids in a PPC campaign. The agent decides to raise bids on a promoted SKU because conversion metrics looked good. If the API call is retried without idempotency, your bids could double or worse. If the tool returns vague success messages, the agent may assume the change applied when it didn’t. That’s a measurable revenue leak you’ll notice on Monday morning.

    Three contract-level guarantees you should design for

    When you build agent-enabled systems, prioritize these guarantees before you tune models:

    • Idempotency: Every state-changing call should be safe to retry. If a request can’t be retried, make the contract explicit and force human confirmation.
    • Observability: Tools must emit machine-readable events for every action and every failure. The agent sees events; humans can trace them; alerting works.
    • Authority & scope: Each agent action must be scoped to an account/role and limited in blast radius. Prefer explicit capability tokens over vague “write” permissions.
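    A sketch of the authority-and-scope guarantee combined with observability (the token shape, event fields, and in-memory event list are illustrative stand-ins for a real capability system and event stream):

```python
import uuid

EVENTS = []  # stand-in for a structured, machine-readable event stream

def emit(event: dict) -> None:
    EVENTS.append(event)

class CapabilityToken:
    """Explicit, narrow capability instead of a vague 'write' permission:
    scoped to an account and an allow-list of actions."""
    def __init__(self, account: str, allowed_actions: set):
        self.account = account
        self.allowed_actions = allowed_actions

def call_tool(token: CapabilityToken, action: str, args: dict) -> dict:
    request_id = str(uuid.uuid4())
    if action not in token.allowed_actions:
        # Every failure emits an event too: humans can trace it, alerting works.
        emit({"request_id": request_id, "action": action, "outcome": "denied"})
        raise PermissionError(f"{action} outside token scope")
    # ... perform the (idempotent) call here ...
    emit({"request_id": request_id, "action": action, "outcome": "ok"})
    return {"request_id": request_id, "status": "ok"}
```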

    Where ReAct-style graphs help

    The recent practical guides on ReAct-style loops show that if you treat the agent’s reasoning loop as explicit state transitions, you get two big wins:

    • You can instrument and replay the loop. When something goes wrong you can reconstruct the exact decisions and tool outputs that led there.
    • You can encode stop conditions and human-handoff points. Instead of a monolithic “do it all” agent, you get a graph that can pause, ask, or escalate based on variables you control.
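    The loop above can be sketched as explicit state transitions with a step budget and a human-handoff condition (the tool execution is stubbed; a real loop would feed each observation back into the reasoner):

```python
def run_agent(plan_steps, max_steps=5, needs_human=lambda step: False):
    """Toy ReAct-style loop as explicit state transitions: every step is
    recorded in history, and the loop can stop or escalate mid-run."""
    state = {"step": 0, "history": [], "status": "running"}
    for step in plan_steps:
        if state["step"] >= max_steps:
            state["status"] = "stopped:budget"  # encoded stop condition
            break
        if needs_human(step):
            state["status"] = "escalated"  # human-handoff point
            break
        # "Act" phase: execute the tool call, record the observation.
        observation = f"ran:{step}"
        state["history"].append((step, observation))
        state["step"] += 1
    else:
        state["status"] = "done"
    return state
```

    Because the state dict captures every transition, a single mistaken sequence can be replayed step by step until the failure is understood.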

    That’s operational gold. When a business runs thousands of agent actions per day, being able to replay a single mistaken sequence until you understand the failure is what turns a reactive firefight into a continuous-improvement cycle.

    Design patterns that reduce risk

    I use a few patterns repeatedly when I’m driving product decisions for agent features:

    • Shadow mode first: Let the agent propose actions and write them to a queue or audit log instead of taking them. Let humans confirm or run a verification pass that replays the tools in sandboxed mode.
    • Progressive capability rollout: Start with read-only and scheduled writes, then add real-time write capabilities after you’ve observed behavior in production for a while.
    • Explicit compensation paths: Every destructive action needs a defined undo or compensation workflow. Build the undo API before you let agents touch the live system.
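    A sketch combining shadow mode with explicit compensation paths (the action names, undo registry, and in-memory queue are illustrative; the key property is that an action with no defined undo cannot even be proposed):

```python
AUDIT_QUEUE = []  # stand-in for a durable queue humans review

# Every destructive action must have a defined compensation workflow
# before agents may touch it. Entries here are illustrative.
COMPENSATIONS = {
    "issue_refund": "reverse_refund",
    "update_bid": "restore_previous_bid",
}

def propose(action: str, args: dict, shadow: bool = True) -> dict:
    if action not in COMPENSATIONS:
        # Build the undo before you build the action.
        return {"status": "rejected", "reason": "no compensation path"}
    entry = {"action": action, "args": args, "undo": COMPENSATIONS[action]}
    if shadow:
        AUDIT_QUEUE.append(entry)  # humans confirm, or a sandbox replays it
        return {"status": "queued"}
    return {"status": "executed", **entry}
```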

    Why open agent platforms raise the stakes

    Open agent frameworks and vendor platforms both make it easier to stitch together LLMs and tools. That’s great for innovation, but it increases the surface area for misunderstandings. An open platform with lots of connectors makes it easy to accidentally expose a tool without the right contract guarantees.

    Platforms will succeed when they treat connectors as first-class citizens: packaged with schemas, test harnesses, and safety gates. The platform’s job is not just to let you wire a model to an API; it is to help you ship a predictable contract that survives scale.

    Product implications

    For product folks, the practical question is: what do you ship first? My bias: ship the guardrails before the autonomy. Customers will forgive an agent that’s slow or conservative if it doesn’t break things. They will not forgive silent data leakage or thundering financial changes.

    So make autonomy a premium feature, not the default. Build visibility, role-based control, and sandboxing into the product experience. Then sell the autonomy story with a track record: “we ran this in shadow for 30 days and reduced handle time by 24% without any live write errors.” That is believably valuable.

    Short checklist for launch

    • Define the API contract (inputs, outputs, error codes).
    • Implement idempotency and audit events on every write path.
    • Run shadow-mode validation and collect replayable traces.
    • Roll out capabilities progressively with human-in-the-loop gates.
    • Document compensation workflows and test them under load.

    The long game

    In the long run we’ll get better models, and those models will make more credible plans. That’s exciting. But the thing that separates an agent demo from a sustainable product is not how clever the planner is—it’s whether the world it touches behaves in predictable, testable ways.

    If you’re building agent features this year, treat the tool boundary like a core product surface. Ship contracts, not conveniences. Build the undo before you build the action. And if you want a quick win, instrument the loop so you can replay, debug, and iterate without a blame game.

    One retail/personalization analogy

    In retail analytics, a bad data contract is like asking for nightly sales numbers but getting different definitions of “sale” from each store. Decisions computed on that data are brittle. Agents face the same trap: if each connector reports success differently, your agent’s decisions will be brittle—and customers will notice where it hurts their margins.

    What to do tomorrow

    Pick one high-impact agent action in your product and apply the checklist above. Run it in shadow for two weeks, capture traces, and see how often your contract ambiguity shows up. Fix those gaps before you turn the knob to full autonomy.

    That is boring work. It’s also what buys you a future where agents are a feature users trust instead of a liability they tolerate.

  • Agents need built-in security, not bolt-on audits

    Agents need built-in security, not bolt-on audits

    Problem

    Organizations are racing to deploy agentic systems — assistants that act on our behalf, call tools, and change state in the world. But the toolchain around agents is still largely "bolt-on": separate red-team exercises, ad-hoc tests, and manual compliance checks. That model doesn’t scale. When agents have real permissions (sending emails, executing code, accessing databases), delayed or fragmented security practices quickly become catastrophic.

    Explanation (what it is)

    By "built-in security" I mean evaluation, testing, and governance embedded directly in the agent platform and development lifecycle. Instead of running a vulnerability scan after you ship, the platform enforces tests during development, keeps full traceability of tool calls, and provides automated guardrails that are part of the runtime. The result: faster iterations with fewer surprises, and meaningful audit trails for operators and regulators.

    Mechanism (how it works)

    There are three core pieces to make security first-class:

    • Continuous evaluation hooks: unit-test-like checks for prompt templates, tool wrappers, and decision policies that run on every commit or model change.
    • Runtime enforcement: a policy layer that intercepts tool calls and enforces constraints (rate limits, data redaction, allowed endpoints), with fast, deterministic fallbacks when the agent is uncertain.
    • Observability and traceability: immutable logs that show the prompt, model outputs, tool inputs/outputs, and the policy decisions that led to a particular action.

    Architecturally, this is a mix of developer tooling and runtime plumbing. CI pipelines need test runners that can invoke the agent locally with mocked tools; the runtime must implement a policy decision point that can block or transform tool calls; and storage must capture the artifact chain for forensic review.
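As a minimal sketch of the runtime piece, the policy decision point can be a small function that looks up a per-tool check and returns allow, deny, or escalate. The tool names, policy table, and return values below are illustrative assumptions, not a real API:

```python
from dataclasses import dataclass
from typing import Any, Callable

@dataclass
class ToolCall:
    agent_id: str
    tool_name: str
    args: dict[str, Any]

# Hypothetical policy table: tool name -> predicate that approves the call.
POLICIES: dict[str, Callable[[ToolCall], bool]] = {
    "send_email": lambda c: c.args.get("to", "").endswith("@example.com"),
    "run_query": lambda c: "DROP" not in c.args.get("sql", "").upper(),
}

def policy_decision_point(call: ToolCall) -> str:
    """Return 'allow', 'deny', or 'escalate' for a proposed tool call."""
    check = POLICIES.get(call.tool_name)
    if check is None:
        return "deny"  # unknown tools are denied by default
    # Failing a check escalates to a human rather than silently proceeding.
    return "allow" if check(call) else "escalate"
```

The key design choice is the default: a tool the policy layer has never heard of is denied, and a failed check escalates instead of executing.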

    Steps (how to implement)

    A practical rollout in an engineering org looks like this:

    • Step 0: Inventory your agent surface. List all agents, capabilities, tool integrations, and privileges. Keep the list small and explicit.
    • Step 1: Add evaluation suites. For each agent, create lightweight tests: safety unit tests (jailbreak attempts), correctness tests (task outputs for canonical inputs), and privacy tests (data-leak scenarios). Run them in CI on every change.
    • Step 2: Wrap tools. Never let the LLM talk directly to infra. Introduce thin RPC wrappers with explicit schemas, argument validation, and permission checks. Instrument these wrappers to emit structured events.
    • Step 3: Enforce policies at runtime. Deploy a policy gateway that validates every tool call against a policy (who, what, why). Provide a fallback behavior: deny, ask-for-human, or sanitize input.
    • Step 4: Capture trace logs. Store prompts, model versions, tool inputs/outputs, and policy decisions in an append-only store with retention and export capabilities for audits.
    • Step 5: Automate red-team tests. Integrate scripted adversarial prompts into CI and schedule periodic fuzzing runs. Surface failures as blocking or advisory depending on severity.
    • Step 6: Governance hooks. Build simple approval flows for granting agents new privileges and require recorded rationale for any elevated access.
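Step 2 can be sketched as a thin wrapper that validates arguments against an explicit schema and emits a structured event (hashing arguments rather than logging raw values). The schema, tool name, and in-memory event sink are assumptions for illustration:

```python
import hashlib
import json
import time

# Hypothetical schema for one tool: required argument names and types.
RUN_SCRIPT_SCHEMA = {"script_name": str, "args_allowed": list}

EVENTS: list[dict] = []  # stand-in for a real structured event sink

def emit_event(tool_name: str, args: dict, decision: str) -> None:
    """Record a structured event; hash args instead of logging raw values."""
    args_hash = hashlib.sha256(
        json.dumps(args, sort_keys=True).encode()).hexdigest()
    EVENTS.append({"ts": time.time(), "tool": tool_name,
                   "args_hash": args_hash, "decision": decision})

def run_script(args: dict) -> str:
    """Thin wrapper: validate against the schema, then (pretend to) call infra."""
    for key, typ in RUN_SCRIPT_SCHEMA.items():
        if key not in args or not isinstance(args[key], typ):
            emit_event("run_script", args, "rejected")
            raise ValueError(f"schema violation on {key!r}")
    emit_event("run_script", args, "allowed")
    return f"queued {args['script_name']}"
```

Because the LLM never talks to infrastructure directly, every rejected call still leaves a structured trace you can query later.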

    Examples (hypothetical)

    • Hypothetical: A support agent that can close tickets and run small scripts in production. Without wrappers, a malformed prompt could trigger a script with destructive arguments. With the approach above, the agent’s "run_script" tool requires an immutable schema (script_name, args_allowed), and policy denies scripts that touch protected namespaces. A CI safety test ensures that common jailbreaks can’t escalate privileges.
    • Hypothetical: An HR agent that summarizes candidate data. Privacy tests ensure the agent never transmits raw PII in outbound tool calls. Runtime policy strips or redacts fields before the external logging system sees them.

    Mistakes / Pitfalls

    • Treating evaluation as optional. Running a few manual red-team exercises isn’t the same as continuous automated checks. Humans are inconsistent; automation is repeatable.
    • Over-restricting agents out of fear. If every tool call requires human approval, agents become useless. Design graduated responses: sandbox, sanitize, ask, deny — not only deny.
    • Log overload without structure. Dumping gigabytes of text into a lake is useless. Capture structured events: tool_name, args_hash, model_id, decision, outcome.
    • Blind trust in third-party tooling. Open-source evaluation tools are fantastic, but vendor acquisitions and changing licenses can shift risk. Keep your core test suites mirrored in your repo.
    • Forgetting economics. Tests, fuzzers, and traces cost money. Prioritize high-risk agents and high-impact tools first.

    Conclusion (what to do next)

    If you run agents in production today: start with inventory and implement thin tool wrappers this week. Add one automated safety test per agent and wire it into CI; you’ll catch more regressions than ad-hoc reviews ever will. If you’re building an agent platform: bake policy enforcement, structured tracing, and CI-first evaluations into your architecture from day one — customers will demand it and regulators will likely require it.

    Tone note: This isn’t about fear-mongering. Agents deliver huge value, but value + autonomy = responsibility. Treat security like composable infrastructure: small, testable pieces that fail predictably and report loudly. That’s how you scale agents without scaling risk.

  • Agents Are Here — Build with an Action Firewall

    Agents Are Here — Build with an Action Firewall

The agent era is not a feature release — it’s a change in failure modes.

    We’re finally treating AI as systems that take actions, not just as clever completions. Over the past 48 hours I’ve been digging into open-source frameworks and safety wrappers: the conversation is no longer “can we make agents?” but “how do we make them safe, observable, and useful in real infra?”

Take 1 — Attack surface beats hallucination: When an agent can run shell commands, edit files, or call your CI, hallucinations stop being the main risk. The real danger is silent side-effects: leaked tokens, accidental deploys, and task chains that escalate privileges. Open-source tooling that inserts an interception layer between agent and OS is the natural next step. Expect agent detection and response (ADR)-style middleware to be a standard part of any production agent stack.

    Take 2 — Taskflow orchestration is maturing: Declarative taskflows and orchestration primitives are moving from proofs-of-concept to audit-friendly patterns. They give you checkable steps, inputs, and outputs — which turns agents from black-box scribes into pipelines you can test and version. That doesn’t remove the need for human oversight, but it does make automated testing and security reviews tractable.

    Take 3 — Open-source + infra integration wins: The momentum is with projects that treat agents as first-class infra components: identity, least privilege, logging, and reversible actions. If you treat an agent like a library instead of a service, you end up with brittle, opaque setups. Treat it like infra and you can instrument, revoke, and iterate safely.

Practical takeaway for builders: Don’t ship agents without three things in place: (1) an action firewall that vets every external operation, (2) declarative taskflows so behavior is inspectable and testable, and (3) short-lived credentials plus tight audit logs. Start with small scopes: automation for safe, low-impact ops, then expand as your ADR and testing coverage mature.
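The third item, short-lived credentials, can be as simple as minting a scoped token tied to a TTL and checking both scope and expiry at use time. The scope strings and TTL here are illustrative assumptions:

```python
import secrets
import time

def mint_token(scope: str, ttl_seconds: int = 300) -> dict:
    """Issue a short-lived token bound to a single scope (hypothetical format)."""
    return {"token": secrets.token_hex(16), "scope": scope,
            "expires_at": time.time() + ttl_seconds}

def token_valid(tok: dict, required_scope: str) -> bool:
    """A token is only usable for its exact scope and before expiry."""
    return tok["scope"] == required_scope and time.time() < tok["expires_at"]
```

A token minted for `repo:read` simply fails the check when the agent tries to use it for a write, which keeps privilege drift visible in the audit log instead of silent.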

    Tone note: I say this as a CTO who trusts engineers — but not their default config. Agents amplify capability and mistakes equally. Build for the latter.

  • The least sexy checklist that will keep your agent from burning down the org (rewrite draft)

    The least sexy checklist that will keep your agent from burning down the org (rewrite draft)

    Here’s a fun new job title that nobody asked for: AI babysitter.

    If you’re shipping agents (or even “just” tool-calling features), you’re already in it. Because the moment an agent can do things — create tickets, merge code, email customers, change configs — you’ve put a small, fast, sometimes-wrong decision-maker in the middle of your business.

    And the problem isn’t that agents are evil. The problem is simpler: agents are confident. They will happily take a vague instruction and turn it into a concrete action. That’s the whole selling point. It’s also the risk.

    The trap: we’re treating agents like chatbots

    Most teams still design agent features like they’re building a chat UI. They worry about tone. They worry about whether the answer is correct. They worry about hallucinations.

    But once you connect tools, your real failure mode isn’t “wrong text.” It’s wrong action.

    Wrong action looks like:

    • Deleting the wrong customer record because “cleanup” sounded safe.
    • Posting an internal note to a public channel because the agent misread context.
    • Rotating an API key at 2pm because the agent thought it was in a staging environment.

    None of that requires a malicious model. It just requires a model doing what it’s built to do: pick the next plausible step and keep moving.

    What you’re really building: a junior operator with root access

    Here’s the mental model I use: an agent with tools is a junior operator you hired overnight. Smart, fast, tireless… and absolutely missing context you assume is obvious.

    Humans make mistakes because they’re tired or distracted. Agents make mistakes because they’re over-literal in weird places and over-confident in others.

    So the question isn’t “How do we make it smarter?” The question is:

    How do we make it safe when it’s wrong?

    Why the “least sexy checklist” matters

    Everyone wants the cool part: the demo where the agent writes code, files bugs, and closes the loop.

    The boring part is where you win long-term: guardrails, permissions, audit trails, and predictable failure.

    Because the first time your agent does something dumb in production, you’ll learn a harsh truth:

    Trust isn’t a feature. It’s a system.

    How agent failures actually happen (in real life)

    Let’s use a simple hypothetical. You give an agent access to:

    • GitHub (create branches, open PRs, merge)
    • PagerDuty (ack incidents)
    • Slack (post updates)
    • Terraform (apply changes)

    You think: “Great, it can help on-call.”

    Then an alert fires: latency spike. The agent reads logs, sees timeouts, and decides to “scale up the database.” It opens Terraform, changes the instance class, and applies.

    But it’s the wrong workspace. Or the change is safe but triggers a restart at the worst time. Or it scales the replica instead of the primary.

    Again: not malicious. Just a bad chain of reasonable steps.

    What to do instead (without turning your product into bureaucracy)

    I’m not going to give you a 47-item compliance spreadsheet. You’ll ignore it, and I don’t blame you.

    Here are the few moves that actually change outcomes:

    • Default to read-only and earn write access slowly. Let the agent observe and propose before it acts.
    • Make “dangerous” actions loud. If it can delete, publish, rotate keys, or run money-moving operations, require a human confirmation.
    • Scope permissions to the task. “Fix this incident” shouldn’t imply “edit infra everywhere.” Use short-lived credentials where you can.
    • Log everything like you’re going to debug it at 3am. Because you are.
    • Design for rollback. If an agent can change something, it should be able to undo it — or at least tell you exactly what changed.
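The "make dangerous actions loud" move above can be a one-function gate: safe verbs run directly, dangerous ones go through a human confirmation callback first. The verb set and return strings are assumptions for illustration:

```python
from typing import Callable

# Hypothetical set of verbs that always require a human in the loop.
DANGEROUS_VERBS = {"delete", "publish", "rotate_key", "send_money"}

def gate_action(verb: str, execute: Callable[[], str],
                confirm: Callable[[str], bool]) -> str:
    """Run safe verbs directly; route dangerous ones through human confirmation."""
    if verb in DANGEROUS_VERBS and not confirm(verb):
        return "blocked: awaiting human approval"
    return execute()
```

In practice `confirm` would post to Slack or a review queue; the point is that the agent cannot reach the dangerous code path without it.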

    Mistakes I keep seeing

    • One giant agent with every tool. That’s not a product — it’s a liability.
    • “It’s just a draft” thinking. If an agent can reach prod systems, it’s never “just a draft.”
    • No sandbox. Your first ten runs should be in a toy environment with fake data. Don’t learn in front of customers.
    • No notion of intent. The agent can’t read your mind. If your tools accept ambiguous commands, the agent will generate ambiguous commands.

    The point (and the opportunity)

    This is the part people miss: the teams that get this right aren’t “more paranoid.” They’re faster.

    When your agent has clean boundaries, you can ship new capabilities without holding your breath. You can let it do real work because you know the blast radius is contained.

    And that’s the business value: not the wow demo — the quiet confidence that your automation won’t embarrass you.

    What I’d do next (today)

    • Pick one workflow where an agent saves time but doesn’t need full write access.
    • Ship it with read-only + suggestion mode.
    • Add a human “approve” step for the handful of actions you’d regret.
    • Instrument the hell out of it for a week.

    Then expand. Slowly. Deliberately. Like an adult.

  • The least sexy checklist that will keep your agent from burning down the org

    The least sexy checklist that will keep your agent from burning down the org

    Enterprise AI is no longer a thought experiment. Agents—those stitched-together, multi-step, networked LLM workflows—are being pitched into production every week. But here’s the thing: most of the risk isn’t in the model. It’s in the plumbing, the permissions, and the way you let a chain of calls loose on corporate systems.

    Problem: agents do a lot, and permissions are fuzzy

    Agents are powerful because they can call tools, read documents, and act—sometimes across services and cloud boundaries. That capability is also a liability. One mis-scoped permission or a too-handy internet fetch and an agent can leak secrets, corrupt records, or trigger actions someone forgot to gate.

    Why this keeps happening

    People treat agents like glorified macros. They hand them tokens, point them at a repo or a calendar, and assume the AI will behave. But agents combine several failure modes: lateral API calls, credential reuse, over-broad retrievals, and opaque decision logic. Add emergent planning that reorders steps and you have a machine that can escalate a tiny read access into a write operation across services.

    Mechanism: where the plumbing exposes you

    • Tool chaining: Each step often needs a credential. If the agent holds a single long-lived token, every tool it touches becomes a blast radius.
    • Implicit trust in embeddings/RAG: Retrieval systems blur context boundaries; agents may confidently act on stale or incorrectly sourced data.
    • Action equivocation: Natural language leaves room for interpretation—“update” can mean append, overwrite, or delete.
    • Monitoring gaps: Observability is often built for humans, not for opaque, multi-hop agent traces.

    Checklist playbook: what to do tomorrow

    Skip the long policy drafts for now. Do these seven concrete things inside a week and you’ll massively reduce risk.

    • 1) Least-privilege tool tokens: Issue short-lived, scoped tokens per tool per agent instance. If you can, tie them to the agent run and revoke on completion.
    • 2) Action capability model: Explicitly register every action an agent can perform (read, list, create, update, delete), and require an allowlist lookup before the agent executes a step.
    • 3) Decision provenance headers: Force agents to emit structured reasons with each external call: what it asked, why, and what it expects to do with the result.
    • 4) RAG source tagging: When using retrieval, attach strong metadata to results (source id, freshness, trust score). Treat any low-trust result as “context only; human review required.”
    • 5) Human-in-the-loop gates: For destructive verbs (delete, modify production, send email), require a human confirmation token—ideally one-time use and recorded.
    • 6) Canary runs and simulation mode: Run agents in a simulated environment with canned responses before live runs. Compare planned against observed actions and block deviations.
    • 7) Audit-first telemetry: Log every step with immutable IDs and make the full trace available for quick playback. Not just status codes—log inputs, model traces, and final decisions.
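Items 2 and 3 above can be combined into one small function: an allowlist lookup per agent, plus a provenance record appended before anything executes. The agent name, capability set, and log shape are illustrative assumptions:

```python
# Hypothetical capability registry: agent -> actions it may perform.
CAPABILITIES = {"crm_cleaner": {"read", "list", "propose_update"}}

def execute_step(agent: str, action: str, reason: str, log: list) -> bool:
    """Allowlist lookup plus a decision-provenance record before any call."""
    allowed = action in CAPABILITIES.get(agent, set())
    log.append({"agent": agent, "action": action,
                "reason": reason, "allowed": allowed})
    return allowed
```

Note that the provenance record is written whether or not the step is allowed, so denied attempts show up in playback too.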

    Concrete examples (short)

    Example A: An agent is given access to a CRM to “clean bad contacts.” With least-privilege tokens you give it read/list and a sandboxed update queue. It can propose merges, but writes require a human confirmation token. That single change in flow prevents accidental mass-deletes.

    Example B: An agent integrates with cloud infra. Rather than giving it an org-wide cloud role, give it a task-scoped role that can only touch a single project. Use simulation to validate that the plan won’t escalate to org-level operations.

    Pitfalls people ignore

    • Over-reliance on single-run explainability: A single human-readable explanation from a model is not adequate provenance.
    • Assuming embedding trust: If your retrieval includes user-uploaded docs, treat them as untrusted—especially when the agent can act on them.
    • Rewarding speed not safety: KPIs that prize agent throughput will bias engineers to widen permissions instead of tightening them.
    • Fuzzy roles: When teams own different services, no one owns the agent’s permissions. The result: cross-team blame and drift.

    Next action (30–90 mins)

    Run a quick inventory: list every agent in dev or prod and for each record: the tokens it holds, the actions it can perform, and the sources it queries. If that list is more than three lines, schedule a 90-minute remediation sprint. Start with short-lived tokens and a single human gate on destructive actions.
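That inventory fits in a handful of dicts, and a trivial check can flag which agents need the human gate first. The field names and destructive-verb list below are assumptions, not a prescribed format:

```python
# Hypothetical inventory: one record per agent, three fields per record.
inventory = [
    {"agent": "ticket_triager", "tokens": ["zendesk:rw"],
     "actions": ["read", "tag", "close"], "sources": ["ticket_db"]},
]

DESTRUCTIVE = {"delete", "update", "close", "send"}

def needs_human_gate(entry: dict) -> bool:
    """Flag agents whose action list includes any destructive verb."""
    return bool(DESTRUCTIVE & set(entry["actions"]))
```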

    Why this matters

    Agents are already making the enterprise faster. They can also make enterprise mistakes cheaper—if you treat them like toys. A pragmatic checklist, implemented as code (not wordy policy), buys you time to adopt better tooling, monitoring, and evaluation practices.

    Make the plumbing boring again. Safety is the feature people stop noticing when it works.

  • Agents on the Desktop: What It Means to Put an Agent Between You and the OS

    Agents on the Desktop: What It Means to Put an Agent Between You and the OS

    Problem: we handed developers autonomous assistants and forgot the guardrails. In the rush to ship agent frameworks, teams are now running pieces of code that can execute shell commands, fetch arbitrary URLs, install packages, and write files — often with minimal human supervision. That’s not an abstract risk anymore. It’s a live operational vector on laptops and CI runners. If you are building or adopting agentic tooling, you need a practical security posture, not slogans.

    What it is: interception and Agent Detection & Response

At its core, Agent Detection & Response (ADR) is simply a control layer that sits between an AI agent and dangerous side effects. Think of it as EDR (endpoint detection and response) for agents: every tool call — a curl fetch, a package install, a file write, a shell exec — is intercepted, inspected, scored, and either allowed, blocked, or escalated. The pattern is familiar to security engineers; the novelty is integrating it with agents’ runtime hooks so you get real-time inspection without killing productivity.

    How it works (high level)

    • Hooking into runtimes: The ADR layer integrates with agent runtimes or extensions (editor plugins, agent SDKs) and intercepts tool calls before the OS sees them.
    • Multi-layer detection: Each action is evaluated by a set of detectors — URL reputation, package supply-chain heuristics, plugin scans, and local pattern rules. Scores pile up; a single high-confidence hit can block the action.
    • Privacy model: The usual compromise: metadata (hashes, URLs) can be sent to cloud reputation services while sensitive content stays on-device. Offline modes should exist for air-gapped environments.
    • Policy and escalation: Actions can be auto-blocked, allowed, or queued for human review. For developer workflows, low-friction escalation paths (notifications, one-click allow with audit) matter.

    Practical steps to implement ADR for your teams

    • Inventory agent runtimes: Know what agent platforms and editor plugins your teams run. If it can execute commands, it’s in scope.
    • Adopt interception hooks: Prefer agent frameworks that expose hook points. If none exist, deploy a shim that wraps common tool calls (git, npm/pip, curl, shell).
    • Define threat rules: Start with simple YAML rules: block raw `rm -rf /`, warn on `curl | bash`, require review for new global package installs. Iterate based on incidents.
    • Use layered detection: Combine lightweight local heuristics with optional reputation checks. Local checks reduce latency and keep secrets local; reputation adds contextual wisdom.
    • Audit logs and forensics: Capture each intercepted action, decision rationale, and requester context. Make logs easy to query; they are the single most valuable artifact when something goes sideways.
    • Developer ergonomics: Treat false positives as product defects. Provide clear, actionable messages and a fast path to override when appropriate — with audit trails.
    • Test adversarial prompts: Red-team agent prompts that try to escape the sandbox. If an agent can trick its own hooks, the controls are useless.
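The "define threat rules" step can start as a few local pattern rules evaluated against every intercepted command; this sketch mirrors the examples above (block `rm -rf /`, warn on `curl | bash`, review global installs), with the rule format and action names as assumptions:

```python
import re

# Illustrative local rules, mirroring the examples in the step above.
RULES = [
    {"pattern": r"rm\s+-rf\s+/(\s|$)", "action": "block"},
    {"pattern": r"curl\s+.*\|\s*(ba)?sh", "action": "warn"},
    {"pattern": r"npm\s+install\s+-g", "action": "review"},
]

def inspect_command(cmd: str) -> str:
    """Score an intercepted shell command against local pattern rules."""
    for rule in RULES:
        if re.search(rule["pattern"], cmd):
            return rule["action"]
    return "allow"
```

In a real deployment these rules would live in versioned config (the YAML mentioned above) and be iterated from incident data, but the interception shape is the same.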

    Examples (hypotheticals)

    Hypothetical A: An agent in a developer’s editor suggests installing a new package and runs an install command. The ADR layer intercepts and detects the package has no registry history and contains an unusual postinstall script. The action is queued for review and blocked until a human approves — preventing a supply-chain compromise.

    Hypothetical B: An internal agent tries to fetch a configuration file from an external URL. The URL reputation check flags it as suspicious based on heuristic patterns; the agent is required to surface the content to the user and ask for confirmation before proceeding. The engineer notices the mismatch and stops the flow.

    Hypothetical C: A CI-integrated agent attempts to write credentials into a config file. Local policy detects a credential pattern and blocks the write, creating an incident ticket automatically.

    Mistakes and pitfalls teams make

    • Treating ADR as optional: Security as an afterthought fails. If agents are given destructive capabilities, assume they will be abused or accidentally misused.
    • Over-reliance on cloud reputation: Sending full content to a cloud vendor for scoring is convenient, but it creates privacy and supply-chain dependencies. Always support a fully local mode.
    • Poor UX on false positives: Block-everything designs frustrate developers and lead to shadow IT or disabling protections. Balance safety and flow with good escalation UX.
    • Insufficient logging: Without clear logs you cannot reconstruct what an agent did — and you lose the ability to improve detection rules.
    • Not red-teaming agents: Agents can exploit their own tool integrations. Simulate prompt-injection and privilege escalation scenarios regularly.
    • Ignoring plugin ecosystems: The weakest link is often a third-party plugin. Scan and vet plugins before deployment.

    Conclusion — next actions

    If you run or plan to run agentic tooling on developer machines or CI, treat ADR like basic hygiene. Start small: inventory, add lightweight intercepts, and log everything. Then iterate: tweak detection rules, run red-team exercises, and improve developer UX so protections stick.

    Don’t wait for a headline. The agent era gives us powerful productivity gains — and a fresh attack surface. Build the interception layer today, or you’ll be rebuilding your infra after someone else’s agent writes into it.
