Dominic Plouffe (CTO)

Big data + agents. Less hype, more systems.

Category: Security

  • Agents need built-in security, not bolt-on audits

    Agents need built-in security, not bolt-on audits

    Problem

    Organizations are racing to deploy agentic systems — assistants that act on our behalf, call tools, and change state in the world. But the toolchain around agents is still largely "bolt-on": separate red-team exercises, ad-hoc tests, and manual compliance checks. That model doesn’t scale. When agents have real permissions (sending emails, executing code, accessing databases), delayed or fragmented security practices quickly become catastrophic.

    Explanation (what it is)

    By "built-in security" I mean evaluation, testing, and governance embedded directly in the agent platform and development lifecycle. Instead of running a vulnerability scan after you ship, the platform enforces tests during development, keeps full traceability of tool calls, and provides automated guardrails that are part of the runtime. The result: faster iterations with fewer surprises, and meaningful audit trails for operators and regulators.

    Mechanism (how it works)

    There are three core pieces to make security first-class:

    • Continuous evaluation hooks: unit-test-like checks for prompt templates, tool wrappers, and decision policies that run on every commit or model change.
    • Runtime enforcement: a policy layer that intercepts tool calls and enforces constraints (rate limits, data redaction, allowed endpoints), with fast, deterministic fallbacks when the agent is uncertain.
    • Observability and traceability: immutable logs that show the prompt, model outputs, tool inputs/outputs, and the policy decisions that led to a particular action.

    Architecturally, this is a mix of developer tooling and runtime plumbing. CI pipelines need test runners that can invoke the agent locally with mocked tools; the runtime must implement a policy decision point that can block or transform tool calls; and storage must capture the artifact chain for forensic review.
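    As a sketch, the policy decision point can be a small pure function over tool-call records. The `ToolCall` shape, allowlists, and confidence threshold below are illustrative assumptions, not any particular product's API:

```python
from dataclasses import dataclass, field

# Hypothetical tool-call record; field names are illustrative.
@dataclass
class ToolCall:
    tool: str
    args: dict = field(default_factory=dict)
    agent_id: str = ""

# Policy data: per-agent tool allowlists plus blocked endpoint substrings.
ALLOWED_TOOLS = {"support-agent": {"close_ticket", "send_email"}}
BLOCKED_ENDPOINTS = ("internal-admin", "prod-db")

def decide(call: ToolCall) -> str:
    """Return 'allow', 'deny', or 'escalate' for a single tool call."""
    if call.tool not in ALLOWED_TOOLS.get(call.agent_id, set()):
        return "deny"
    target = str(call.args.get("endpoint", ""))
    if any(blocked in target for blocked in BLOCKED_ENDPOINTS):
        return "deny"
    # Deterministic fallback when the agent is uncertain: ask a human.
    if call.args.get("confidence", 1.0) < 0.8:
        return "escalate"
    return "allow"
```

    The point is that the decision is deterministic and testable: the same call always produces the same verdict, which is exactly what the CI hooks above need.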

    Steps (how to implement)

    A practical rollout in an engineering org looks like this:

    • Step 0: Inventory your agent surface. List all agents, capabilities, tool integrations, and privileges. Keep the list small and explicit.
    • Step 1: Add evaluation suites. For each agent, create lightweight tests: safety unit tests (jailbreak attempts), correctness tests (task outputs for canonical inputs), and privacy tests (data-leak scenarios). Run them in CI on every change.
    • Step 2: Wrap tools. Never let the LLM talk directly to infra. Introduce thin RPC wrappers with explicit schemas, argument validation, and permission checks. Instrument these wrappers to emit structured events.
    • Step 3: Enforce policies at runtime. Deploy a policy gateway that validates every tool call against a policy (who, what, why). Provide a fallback behavior: deny, ask-for-human, or sanitize input.
    • Step 4: Capture trace logs. Store prompts, model versions, tool inputs/outputs, and policy decisions in an append-only store with retention and export capabilities for audits.
    • Step 5: Automate red-team tests. Integrate scripted adversarial prompts into CI and schedule periodic fuzzing runs. Surface failures as blocking or advisory depending on severity.
    • Step 6: Governance hooks. Build simple approval flows for granting agents new privileges and require recorded rationale for any elevated access.
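    Step 2 in code might look like this sketch of a thin wrapper: validate against an explicit schema, check the allowlist, emit a structured event, and only then execute. The `run_script` schema, script allowlist, and event fields are hypothetical:

```python
import hashlib
import json

# Hypothetical schema and allowlist for a "run_script" tool (Step 2).
SCHEMA = {"script_name": str, "args": list}
ALLOWED_SCRIPTS = {"restart_worker", "clear_cache"}

def run_script_wrapper(payload: dict, execute) -> dict:
    """Validate arguments, check permissions, emit a structured event,
    then call the real executor only if everything passes."""
    for field_name, field_type in SCHEMA.items():
        if not isinstance(payload.get(field_name), field_type):
            raise ValueError(f"bad or missing field: {field_name}")
    if payload["script_name"] not in ALLOWED_SCRIPTS:
        raise PermissionError(f"script not allowlisted: {payload['script_name']}")
    event = {
        "tool": "run_script",
        "args_hash": hashlib.sha256(
            json.dumps(payload, sort_keys=True).encode()
        ).hexdigest(),
        "decision": "allow",
    }
    print(json.dumps(event))  # stand-in for a structured event emitter
    return execute(payload)
```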

    Examples (hypothetical)

    • Hypothetical: A support agent that can close tickets and run small scripts in production. Without wrappers, a malformed prompt could trigger a script with destructive arguments. With the approach above, the agent’s "run_script" tool requires an immutable schema (script_name, args_allowed), and policy denies scripts that touch protected namespaces. A CI safety test ensures that common jailbreaks can’t escalate privileges.
    • Hypothetical: An HR agent that summarizes candidate data. Privacy tests ensure the agent never transmits raw PII in outbound tool calls. Runtime policy strips or redacts fields before the external logging system sees them.

    Mistakes / Pitfalls

    • Treating evaluation as optional. Running a few manual red-team exercises isn’t the same as continuous automated checks. Humans are inconsistent; automation is repeatable.
    • Over-restricting agents out of fear. If every tool call requires human approval, agents become useless. Design graduated responses: sandbox, sanitize, ask, deny — not only deny.
    • Log overload without structure. Dumping gigabytes of text into a lake is useless. Capture structured events: tool_name, args_hash, model_id, decision, outcome.
    • Blind trust in third-party tooling. Open-source evaluation tools are fantastic, but vendor acquisitions and changing licenses can shift risk. Keep your core test suites mirrored in your repo.
    • Forgetting economics. Tests, fuzzers, and traces cost money. Prioritize high-risk agents and high-impact tools first.
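    For reference, the structured-event shape from the logging pitfall can be as small as this; the field names follow the bullet above, and the hashing and truncation choices are assumptions:

```python
import hashlib
import json
import time

def trace_event(tool_name: str, args: dict, model_id: str,
                decision: str, outcome: str) -> dict:
    """Build one structured trace event. Raw args are hashed so the
    log stays compact and greppable instead of a wall of text."""
    return {
        "ts": time.time(),
        "tool_name": tool_name,
        "args_hash": hashlib.sha256(
            json.dumps(args, sort_keys=True).encode()
        ).hexdigest()[:16],
        "model_id": model_id,
        "decision": decision,
        "outcome": outcome,
    }
```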

    Conclusion (what to do next)

    If you run agents in production today: start with inventory and implement thin tool wrappers this week. Add one automated safety test per agent and wire it into CI; you’ll catch more regressions than ad-hoc reviews ever will. If you’re building an agent platform: bake policy enforcement, structured tracing, and CI-first evaluations into your architecture from day one — customers will demand it and regulators will likely require it.

    Tone note: This isn’t about fear-mongering. Agents deliver huge value, but value + autonomy = responsibility. Treat security like composable infrastructure: small, testable pieces that fail predictably and report loudly. That’s how you scale agents without scaling risk.

  • Agents Are Here — Build with an Action Firewall

    Agents Are Here — Build with an Action Firewall


    Hook: The agent era is not a feature release — it’s a change in failure modes.

    We’re finally treating AI as systems that take actions, not just as clever completions. Over the past 48 hours I’ve been digging into open-source frameworks and safety wrappers: the conversation is no longer “can we make agents?” but “how do we make them safe, observable, and useful in real infra?”

    Take 1 — Attack surface beats hallucination: When an agent can run shell commands, edit files, or call your CI, hallucinations stop being the main risk. The real danger is silent side-effects: leaked tokens, accidental deploys, and task chains that escalate privileges. Open-source tooling that inserts an interception layer between agent and OS is the natural next step. Expect Agent Detection & Response (ADR)-style middleware to be a standard part of any production agent stack.

    Take 2 — Taskflow orchestration is maturing: Declarative taskflows and orchestration primitives are moving from proofs-of-concept to audit-friendly patterns. They give you checkable steps, inputs, and outputs — which turns agents from black-box scribes into pipelines you can test and version. That doesn’t remove the need for human oversight, but it does make automated testing and security reviews tractable.
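    A declarative taskflow can be as simple as data plus a validator; here's a minimal sketch with made-up step and tool names, just to show how declaring inputs and outputs makes a plan checkable before anything runs:

```python
# A taskflow declared as data: each step names its tool, the inputs it
# needs, and the output it produces, so a reviewer (or a test) can audit
# the plan up front. Step and tool names are hypothetical.
FLOW = [
    {"step": "fetch_logs",  "tool": "log_search", "needs": [],          "produces": "logs"},
    {"step": "summarize",   "tool": "llm",        "needs": ["logs"],    "produces": "summary"},
    {"step": "post_update", "tool": "slack",      "needs": ["summary"], "produces": "message_id"},
]

def validate_flow(flow: list) -> list:
    """Check that every step's inputs were produced by an earlier step."""
    available, errors = set(), []
    for step in flow:
        for need in step["needs"]:
            if need not in available:
                errors.append(f"{step['step']} needs undeclared input: {need}")
        available.add(step["produces"])
    return errors
```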

    Take 3 — Open-source + infra integration wins: The momentum is with projects that treat agents as first-class infra components: identity, least privilege, logging, and reversible actions. If you treat an agent like a library instead of a service, you end up with brittle, opaque setups. Treat it like infra and you can instrument, revoke, and iterate safely.

    Practical takeaway for builders: Don’t ship agents without three things in place: (1) an action firewall that vets every external operation, (2) declarative taskflows so behavior is inspectable and testable, and (3) short-lived credentials plus tight audit logs. Start with small scopes: automation for safe, low-impact ops, then expand as your ADR and testing coverage matures.

    Tone note: I say this as a CTO who trusts engineers — but not their default config. Agents amplify capability and mistakes equally. Build for the latter.

  • The least sexy checklist that will keep your agent from burning down the org (rewrite draft)

    The least sexy checklist that will keep your agent from burning down the org (rewrite draft)

    Here’s a fun new job title that nobody asked for: AI babysitter.

    If you’re shipping agents (or even “just” tool-calling features), you’re already in it. Because the moment an agent can do things — create tickets, merge code, email customers, change configs — you’ve put a small, fast, sometimes-wrong decision-maker in the middle of your business.

    And the problem isn’t that agents are evil. The problem is simpler: agents are confident. They will happily take a vague instruction and turn it into a concrete action. That’s the whole selling point. It’s also the risk.

    The trap: we’re treating agents like chatbots

    Most teams still design agent features like they’re building a chat UI. They worry about tone. They worry about whether the answer is correct. They worry about hallucinations.

    But once you connect tools, your real failure mode isn’t “wrong text.” It’s wrong action.

    Wrong action looks like:

    • Deleting the wrong customer record because “cleanup” sounded safe.
    • Posting an internal note to a public channel because the agent misread context.
    • Rotating an API key at 2pm because the agent thought it was in a staging environment.

    None of that requires a malicious model. It just requires a model doing what it’s built to do: pick the next plausible step and keep moving.

    What you’re really building: a junior operator with root access

    Here’s the mental model I use: an agent with tools is a junior operator you hired overnight. Smart, fast, tireless… and absolutely missing context you assume is obvious.

    Humans make mistakes because they’re tired or distracted. Agents make mistakes because they’re over-literal in weird places and over-confident in others.

    So the question isn’t “How do we make it smarter?” The question is:

    How do we make it safe when it’s wrong?

    Why the “least sexy checklist” matters

    Everyone wants the cool part: the demo where the agent writes code, files bugs, and closes the loop.

    The boring part is where you win long-term: guardrails, permissions, audit trails, and predictable failure.

    Because the first time your agent does something dumb in production, you’ll learn a harsh truth:

    Trust isn’t a feature. It’s a system.

    How agent failures actually happen (in real life)

    Let’s use a simple hypothetical. You give an agent access to:

    • GitHub (create branches, open PRs, merge)
    • PagerDuty (ack incidents)
    • Slack (post updates)
    • Terraform (apply changes)

    You think: “Great, it can help on-call.”

    Then an alert fires: latency spike. The agent reads logs, sees timeouts, and decides to “scale up the database.” It opens Terraform, changes the instance class, and applies.

    But it’s the wrong workspace. Or the change is safe but triggers a restart at the worst time. Or it scales the replica instead of the primary.

    Again: not malicious. Just a bad chain of reasonable steps.

    What to do instead (without turning your product into bureaucracy)

    I’m not going to give you a 47-item compliance spreadsheet. You’ll ignore it, and I don’t blame you.

    Here are the few moves that actually change outcomes:

    • Default to read-only and earn write access slowly. Let the agent observe and propose before it acts.
    • Make “dangerous” actions loud. If it can delete, publish, rotate keys, or run money-moving operations, require a human confirmation.
    • Scope permissions to the task. “Fix this incident” shouldn’t imply “edit infra everywhere.” Use short-lived credentials where you can.
    • Log everything like you’re going to debug it at 3am. Because you are.
    • Design for rollback. If an agent can change something, it should be able to undo it — or at least tell you exactly what changed.
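    If you want the shape of that in code, a graduated-response gate is tiny. The action names below are made up:

```python
# Graduated responses instead of blanket denial: observe-and-propose by
# default, loud human confirmation for the scary verbs.
DANGEROUS = {"delete", "publish", "rotate_key", "transfer_funds"}

def gate(action: str, approved_by_human: bool = False) -> str:
    if action in DANGEROUS and not approved_by_human:
        return "propose"   # agent drafts the action; a human clicks approve
    if action in DANGEROUS:
        return "execute_and_log"
    return "execute"       # low-impact actions flow through, still logged
```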

    Mistakes I keep seeing

    • One giant agent with every tool. That’s not a product — it’s a liability.
    • “It’s just a draft” thinking. If an agent can reach prod systems, it’s never “just a draft.”
    • No sandbox. Your first ten runs should be in a toy environment with fake data. Don’t learn in front of customers.
    • No notion of intent. The agent can’t read your mind. If your tools accept ambiguous commands, the agent will generate ambiguous commands.

    The point (and the opportunity)

    This is the part people miss: the teams that get this right aren’t “more paranoid.” They’re faster.

    When your agent has clean boundaries, you can ship new capabilities without holding your breath. You can let it do real work because you know the blast radius is contained.

    And that’s the business value: not the wow demo — the quiet confidence that your automation won’t embarrass you.

    What I’d do next (today)

    • Pick one workflow where an agent saves time but doesn’t need full write access.
    • Ship it with read-only + suggestion mode.
    • Add a human “approve” step for the handful of actions you’d regret.
    • Instrument the hell out of it for a week.

    Then expand. Slowly. Deliberately. Like an adult.

  • Agents on the Desktop: What It Means to Put an Agent Between You and the OS

    Agents on the Desktop: What It Means to Put an Agent Between You and the OS

    Problem: we handed developers autonomous assistants and forgot the guardrails. In the rush to ship agent frameworks, teams are now running pieces of code that can execute shell commands, fetch arbitrary URLs, install packages, and write files — often with minimal human supervision. That’s not an abstract risk anymore. It’s a live operational vector on laptops and CI runners. If you are building or adopting agentic tooling, you need a practical security posture, not slogans.

    What it is: interception and Agent Detection & Response

    At its core, Agent Detection & Response (ADR) is simply a control layer that sits between an AI agent and dangerous side effects. Think of it as EDR for agents: every tool call — a curl fetch, a package install, a file write, a shell exec — is intercepted, inspected, scored, and either allowed, blocked, or escalated. The pattern is familiar to security engineers; the novelty is integrating it with agents’ runtime hooks so you get real-time inspection without killing productivity.

    How it works (high level)

    • Hooking into runtimes: The ADR layer integrates with agent runtimes or extensions (editor plugins, agent SDKs) and intercepts tool calls before the OS sees them.
    • Multi-layer detection: Each action is evaluated by a set of detectors — URL reputation, package supply-chain heuristics, plugin scans, and local pattern rules. Scores pile up; a single high-confidence hit can block the action.
    • Privacy model: The usual compromise: metadata (hashes, URLs) can be sent to cloud reputation services while sensitive content stays on-device. Offline modes should exist for air-gapped environments.
    • Policy and escalation: Actions can be auto-blocked, allowed, or queued for human review. For developer workflows, low-friction escalation paths (notifications, one-click allow with audit) matter.
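    A minimal sketch of that multi-layer scoring loop, with toy detectors and an assumed threshold (real detectors would pull reputation data and richer heuristics):

```python
# Each detector returns (score, confident). One confident high-score hit
# blocks outright; otherwise the accumulated score decides. Thresholds
# and detector logic here are illustrative.
def url_reputation(action: dict) -> tuple:
    bad = action.get("url", "").endswith(".xyz")  # toy heuristic
    return (0.9 if bad else 0.0, bad)

def shell_patterns(action: dict) -> tuple:
    hit = "rm -rf /" in action.get("cmd", "")
    return (1.0 if hit else 0.0, hit)

def evaluate(action: dict, detectors=(url_reputation, shell_patterns),
             threshold: float = 1.2) -> str:
    total = 0.0
    for detect in detectors:
        score, confident = detect(action)
        if confident and score >= 0.9:
            return "block"  # a single high-confidence hit is enough
        total += score
    return "block" if total >= threshold else "allow"
```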

    Practical steps to implement ADR for your teams

    • Inventory agent runtimes: Know what agent platforms and editor plugins your teams run. If it can execute commands, it’s in scope.
    • Adopt interception hooks: Prefer agent frameworks that expose hook points. If none exist, deploy a shim that wraps common tool calls (git, npm/pip, curl, shell).
    • Define threat rules: Start with simple YAML rules: block raw `rm -rf /`, warn on `curl | bash`, require review for new global package installs. Iterate based on incidents.
    • Use layered detection: Combine lightweight local heuristics with optional reputation checks. Local checks reduce latency and keep secrets local; reputation adds contextual wisdom.
    • Audit logs and forensics: Capture each intercepted action, decision rationale, and requester context. Make logs easy to query; they are the single most valuable artifact when something goes sideways.
    • Developer ergonomics: Treat false positives as product defects. Provide clear, actionable messages and a fast path to override when appropriate — with audit trails.
    • Test adversarial prompts: Red-team agent prompts that try to escape the sandbox. If an agent can trick its own hooks, the controls are useless.
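    The threat rules from the steps above can start as a small pattern table. These regexes and severities are starting points for iteration, not a vetted ruleset:

```python
import re

# Rules in the spirit of the examples above: block a raw filesystem wipe,
# warn on curl-pipe-to-shell, require review for global package installs.
RULES = [
    (re.compile(r"rm\s+-rf\s+/(\s|$)"), "block", "destructive filesystem wipe"),
    (re.compile(r"curl\s+[^|]*\|\s*(ba)?sh"), "warn", "piping remote script to shell"),
    (re.compile(r"(npm|pip)\s+install\s+-g"), "review", "new global package install"),
]

def check_command(cmd: str) -> tuple:
    """Return (action, message) for the first matching rule, else allow."""
    for pattern, action, message in RULES:
        if pattern.search(cmd):
            return action, message
    return "allow", ""
```

    Note the anchored `/(\s|$)` in the first rule: `rm -rf /tmp/build` should not trip the same alarm as `rm -rf /`.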

    Examples (hypotheticals)

    Hypothetical A: An agent in a developer’s editor suggests installing a new package and runs an install command. The ADR layer intercepts and detects the package has no registry history and contains an unusual postinstall script. The action is queued for review and blocked until a human approves — preventing a supply-chain compromise.

    Hypothetical B: An internal agent tries to fetch a configuration file from an external URL. The URL reputation check flags it as suspicious based on heuristic patterns; the agent is required to surface the content to the user and ask for confirmation before proceeding. The engineer notices the mismatch and stops the flow.

    Hypothetical C: A CI-integrated agent attempts to write credentials into a config file. Local policy detects a credential pattern and blocks the write, creating an incident ticket automatically.

    Mistakes and pitfalls teams make

    • Treating ADR as optional: Security as an afterthought fails. If agents are given destructive capabilities, assume they will be abused or accidentally misused.
    • Over-reliance on cloud reputation: Sending full content to a cloud vendor for scoring is convenient, but it creates privacy and supply-chain dependencies. Always support a fully local mode.
    • Poor UX on false positives: Block-everything designs frustrate developers and lead to shadow IT or disabling protections. Balance safety and flow with good escalation UX.
    • Insufficient logging: Without clear logs you cannot reconstruct what an agent did — and you lose the ability to improve detection rules.
    • Not red-teaming agents: Agents can exploit their own tool integrations. Simulate prompt-injection and privilege escalation scenarios regularly.
    • Ignoring plugin ecosystems: The weakest link is often a third-party plugin. Scan and vet plugins before deployment.

    Conclusion — next actions

    If you run or plan to run agentic tooling on developer machines or CI, treat ADR like basic hygiene. Start small: inventory, add lightweight intercepts, and log everything. Then iterate: tweak detection rules, run red-team exercises, and improve developer UX so protections stick.

    Don’t wait for a headline. The agent era gives us powerful productivity gains — and a fresh attack surface. Build the interception layer today, or you’ll be rebuilding your infra after someone else’s agent writes into it.

  • The least sexy checklist that will keep your agent from burning down the org

    The least sexy checklist that will keep your agent from burning down the org

    Enterprise AI is no longer a thought experiment. Agents—those stitched-together, multi-step, networked LLM workflows—are being pitched into production every week. But here’s the thing: most of the risk isn’t in the model. It’s in the plumbing, the permissions, and the way you let a chain of calls loose on corporate systems.

    Problem: agents do a lot, and permissions are fuzzy

    Agents are powerful because they can call tools, read documents, and act—sometimes across services and cloud boundaries. That capability is also a liability. One mis-scoped permission or a too-handy internet fetch and an agent can leak secrets, corrupt records, or trigger actions someone forgot to gate.

    Why this keeps happening

    People treat agents like glorified macros. They hand them tokens, point them at a repo or a calendar, and assume the AI will behave. But agents combine several failure modes: lateral API calls, credential reuse, over-broad retrievals, and opaque decision logic. Add emergent planning that reorders steps and you have a machine that can escalate a tiny read access into a write operation across services.

    Mechanism: where the plumbing exposes you

    • Tool chaining: Each step often needs a credential. If the agent holds a single long-lived token, every tool it touches joins the blast radius.
    • Implicit trust in embeddings/RAG: Retrieval systems blur context boundaries; agents may confidently act on stale or incorrectly sourced data.
    • Action equivocation: Natural language leaves room for interpretation — "update" can mean append, overwrite, or delete.
    • Monitoring gaps: Observability is often built for humans, not for opaque, multi-hop agent traces.

    Checklist playbook: what to do tomorrow

    Skip the long policy drafts for now. Do these seven concrete things inside a week and you’ll massively reduce risk.

    • 1) Least-privilege tool tokens: Issue short-lived, scoped tokens per tool per agent instance. If you can, tie them to the agent run and revoke on completion.
    • 2) Action capability model: Explicitly register every action an agent can perform (read, list, create, update, delete), and require an allowlist lookup before the agent executes a step.
    • 3) Decision provenance headers: Force agents to emit structured reasons with each external call: what it asked, why, and what it expects to do with the result.
    • 4) RAG source tagging: When using retrieval, attach strong metadata to results (source id, freshness, trust score). Treat any low-trust result as “context only; human review required.”
    • 5) Human-in-the-loop gates: For destructive verbs (delete, modify production, send email), require a human confirmation token—ideally one-time use and recorded.
    • 6) Canary runs and simulation mode: Run agents in a simulated environment with canned responses before live runs. Compare planned against observed actions and block deviations.
    • 7) Audit-first telemetry: Log every step with immutable IDs and make the full trace available for quick playback. Not just status codes—log inputs, model traces, and final decisions.
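    Items 1 through 3 can be sketched together: a short-lived, run-scoped token checked against a registered capability allowlist, returning a provenance record for every decision. The capability names, TTL, and record shape here are illustrative:

```python
import time
import uuid

# Registered capabilities per agent (item 2). Names are hypothetical.
CAPABILITIES = {"crm-cleaner": {"crm:read", "crm:list", "crm:propose_update"}}

def issue_token(agent: str, run_id: str, ttl_s: int = 300) -> dict:
    """Item 1: a token tied to one agent run, expiring quickly."""
    return {"agent": agent, "run_id": run_id, "expires": time.time() + ttl_s}

def authorize(token: dict, capability: str, reason: str) -> dict:
    """Allow a step only if the token is live and the capability is
    registered; always return a provenance record (item 3) for the log."""
    ok = (time.time() < token["expires"]
          and capability in CAPABILITIES.get(token["agent"], set()))
    return {"id": str(uuid.uuid4()), "run_id": token["run_id"],
            "capability": capability, "reason": reason,
            "decision": "allow" if ok else "deny"}
```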

    Concrete examples (short)

    Example A: An agent is given access to a CRM to “clean bad contacts.” With least-privilege tokens you give it read/list and a sandboxed update queue. It can propose merges, but writes require a human confirmation token. That single change in flow prevents accidental mass-deletes.

    Example B: An agent integrates with cloud infra. Rather than giving it an org-wide cloud role, give it a task-scoped role that can only touch a single project. Use simulation to validate that the plan won’t escalate to org-level operations.

    Pitfalls people ignore

    • Over-reliance on single-run explainability: A single human-readable explanation from a model is not adequate provenance.
    • Assuming embedding trust: If your retrieval includes user-uploaded docs, treat them as untrusted—especially when the agent can act on them.
    • Rewarding speed not safety: KPIs that prize agent throughput will bias engineers to widen permissions instead of tightening them.
    • Fuzzy roles: When teams own different services, no one owns the agent’s permissions. The result: cross-team blame and drift.

    Next action (30–90 mins)

    Run a quick inventory: list every agent in dev or prod and for each record: the tokens it holds, the actions it can perform, and the sources it queries. If that list is more than three lines, schedule a 90-minute remediation sprint. Start with short-lived tokens and a single human gate on destructive actions.

    Why this matters

    Agents are already making the enterprise faster. They can also make enterprise mistakes cheaper—if you treat them like toys. A pragmatic checklist, implemented as code (not wordy policy), buys you time to adopt better tooling, monitoring, and evaluation practices.

    Make the plumbing boring again. Safety is the feature people stop noticing when it works.

  • Agents at the Gates: Why Your Open-Source Agent Is the New Attack Surface

    Agents at the Gates: Why Your Open-Source Agent Is the New Attack Surface

    We’ve crossed from “language toy” to “active agent.” That’s exciting — until the agent starts touching your filesystem, executing shell commands, or pulling packages from the public registry without human supervision. If you run or plan to run open-source agent tooling (yes, I’m looking at you and your OpenClaw instance), this is not theoretical risk. It’s operational reality.

    The problem

    Open-source agents blur the line between a model that reasons and a system that acts. That blur is exactly where adversaries will plant leverage. You can harden your network and lock down your cloud account, but agents executed on developer machines or small servers often have direct paths to sensitive assets: local files, dev keys, CI pipelines, package managers. One careless skill or one malicious instruction inside a skill and you’ve got an insider process doing reconnaissance for an attacker.

    What it is

    Security middleware for agents is an interception layer that gates every action an agent tries to take. Think of it as EDR for LLM-driven automation: it observes tool calls, evaluates them against rules and heuristics, and either allows, blocks, or challenges the action. The goal is not to make agents useless — it is to make them accountable and observable.

    This class of tooling performs three things: detect, score, and act. Detection finds the intent to do something risky (e.g., exec, write, fetch). Scoring evaluates the risk with reputation checks and heuristics. Action enforces a policy: deny, prompt for human approval, or allow with logging.

    How it works

    Architecturally, the interception layer hooks into the agent runtime. When an agent issues a tool call — run a Bash command, write a file, fetch a URL, or install a package — the hook serializes the request to the middleware. The middleware then performs lightweight local checks (pattern detection, YAML-based rules), remote reputation lookups (hashes, domain reputations), and supply-chain checks for package installs.

    Crucially, the privacy model should keep sensitive payloads local whenever possible. Hashes and metadata can be sent to cloud services for reputation scoring, while command contents, file bodies, and code remain on-host unless you explicitly choose otherwise. That hybrid model balances detection quality with data minimization.
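    That hybrid model is easy to sketch: only a content hash and the bare domain leave the host. The request shape below is an assumption about whatever reputation backend you use, not a real API:

```python
import hashlib
from urllib.parse import urlparse

def reputation_request(command: str, url: str) -> dict:
    """Build the payload sent off-host for scoring: a hash of the
    command body plus the domain only. The command text itself,
    file bodies, and code never leave the machine."""
    return {
        "cmd_sha256": hashlib.sha256(command.encode()).hexdigest(),
        "domain": urlparse(url).netloc,  # domain only, no path or params
    }
```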

    Practical steps

    • Inventory your agents. Know every agent instance, who can access it, and what skills/plugins are installed.
    • Deploy an interception layer. Use a middleware that hooks into your agent runtimes and inspects tool calls. If you’re running OpenClaw or similar frameworks, treat this as mandatory for public-facing or shared instances.
    • Enforce policy by default. Block destructive ops by default and require explicit human approval for network fetches, package installs, and filesystem writes outside safe directories.
    • Use package safety checks. Vet packages before installation: registry existence, maintainers, age, and file reputation. Automate this for any agent-initiated installs.
    • Audit logs and realtime alerts. Log every intercepted call with context: what the agent asked, which skill issued it, and the model state snapshot if feasible. Push alerts for high-risk patterns.
    • Limit surface area. Run risky agents in isolated environments — ephemeral VMs, constrained containers, or throwaway dev boxes.
    • Human-in-the-loop gates. For operations touching secrets or production infra, require a human permit step; don’t let automation be the single point of decision.
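    The package-safety step can start as a plain heuristic scorer over registry metadata. The field names, thresholds, and weights here are assumptions; a real check would query your registry of choice:

```python
# Score registry metadata before an agent-initiated install: missing
# registry entry, very recent creation, no maintainers, or a postinstall
# script each raise a flag. Two or more flags block; one triggers review.
def vet_package(meta: dict) -> tuple:
    flags = []
    if not meta.get("exists_in_registry", False):
        flags.append("not in registry")
    if meta.get("age_days", 0) < 30:
        flags.append("very new package")
    if meta.get("maintainers", 0) < 1:
        flags.append("no maintainers")
    if meta.get("has_postinstall", False):
        flags.append("postinstall script present")
    verdict = "block" if len(flags) >= 2 else ("review" if flags else "allow")
    return verdict, flags
```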

    Examples (hypothetical)

    Example 1 — Supply chain probe: An agent tries to install a package named similarly to a popular library. The middleware flags the package: unpublished author, recent creation, and unusual files. The installation is blocked and escalated to a developer for review.

    Example 2 — Data exfil attempt: An agent composes a shell sequence that tars a credentials folder and posts it to a remote host. The interceptor detects the pattern (tar + curl to external domain), blocks the network call, and records the attempt with the offending skill’s identifier.

    Example 3 — Malicious skill: You inherit a community skill that appears benign but contains hidden commands. The interception layer runs plugin scans at session start, alerts on suspicious constructs, and quarantines the skill until verified.

    Mistakes and pitfalls

    • Relying on perfect detection. No interception layer is flawless. Attackers will adapt. Assume some things slip through; invest in defense-in-depth.
    • Overly permissive defaults. Shipping agents with lax policies to “make them useful” is just inviting compromise. Convenience should not be the default.
    • Sending sensitive payloads off-host. Don’t ship full command bodies or secrets to cloud reputation services. If your middleware requires that, don’t use it without legal and privacy review.
    • Alert fatigue. Too many low-quality alerts leads to blind acceptance. Tune rules, add severity levels, and ensure high-fidelity signals get attention.
    • Ignoring the human ops process. Humans need clear, fast ways to approve or deny actions. If escalation is slow, teams will bypass controls out of frustration.

    Conclusion — next action

    If you run open-source agents, don’t treat security as a separate checkbox. It needs to be the first design decision. Today’s agents are powerful because they act, not because they chat. That power demands accountability.

    Actionable next steps: inventory your agent endpoints, deploy an interception layer that refuses dangerous actions by default, and isolate any agent that touches production credentials. Make sure your team has a clear approve/deny workflow and that logs are auditable.

    I’m biased — I help run these tools — but I refuse to watch every new agent deployment turn into a mystery insider. You should be uncomfortable with agents that run free on any machine with secrets. Good. That discomfort is how you build a safe, useful system.
