Case Study · ADR-001

Blueface: Six-Agent Email Response System

Status: Accepted, Ready to be Deployed
Built: 2026 Q1/Q2
Client: Blueface
Sector: B2B SaaS · AI/ML Services
Platform: Cassidy AI
Author: Joseph Iyofor

The problem

30 to 60 emails per day, no consistent response system

Blueface receives 30 to 60 customer emails daily spanning four categories:

  • Sales inquiries: 40%
  • Technical questions: 30%
  • Support tickets: 20%
  • General inquiries: 10%

Manual response time averaged 2 to 4 hours per email, causing delayed sales cycles, inconsistent quality, and significant context switching across the team.

The challenge was not simply automating a response pipeline. The system had to draft accurately, maintain brand voice, allow human oversight on every single email, and handle revision feedback intelligently without routing every edit back through the full pipeline. Once built, it also needed a continuous quality layer that evaluates every run automatically rather than relying on manual spot checks.

The solution

Two communicating workflows plus independent evaluation

  • Workflow 1 (W1) handles email ingestion and initial draft generation across six specialised agents.
  • Workflow 2 (W2) handles the human feedback loop through five conditional paths.
  • Evy is a third independent evaluation workflow, triggered by webhook after each run, providing Step Eval scoring per agent and Run Eval scoring across the full six-agent output.

Workflow 1

Email Handler agents

Agent | Role
GabbySenti | Sentiment analysis and initial email characterisation
Router Coach | Orchestration gatekeeper: classifies, routes, briefs downstream agents
Hatches (shared) | Internal KB Expert: deep knowledge base research, case studies, pricing, technical specs
Scratches (shared) | Web Researcher: fills knowledge gaps that Hatches cannot resolve from the internal KB
Pen Pusher (shared) | Final draft generation, used in W1 and all W2 REVISE paths
Jesses | QA Validator: checks draft quality, accuracy, tone, and completeness before human review

W1 output: Draft + QA score → Slack notification → Sheets log (40-column record).
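As a rough illustration of how the six W1 agents could be chained, the sketch below runs them in the order the table lists them. That order is an assumption; the only conditional step taken from the role descriptions is that Scratches fills gaps Hatches could not resolve. `EmailContext`, `run_w1`, and the callable-per-agent shape are hypothetical names for this sketch, not Cassidy AI constructs.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class EmailContext:
    """Hypothetical state handed from agent to agent during W1."""
    body: str
    kb_gaps: list[str] = field(default_factory=list)  # set by Hatches
    trace: list[str] = field(default_factory=list)    # agents that ran


Agent = Callable[[EmailContext], None]


def run_w1(ctx: EmailContext, agents: dict[str, Agent]) -> EmailContext:
    """Run the W1 agents in table order (assumed), recording each step."""

    def step(name: str) -> None:
        agents[name](ctx)
        ctx.trace.append(name)

    for name in ("GabbySenti", "Router Coach", "Hatches"):
        step(name)
    # Scratches only runs when Hatches left gaps the internal KB could not fill.
    if ctx.kb_gaps:
        step("Scratches")
    for name in ("Pen Pusher", "Jesses"):
        step(name)
    return ctx
```

Recording a trace per run is one simple way to give a downstream evaluator a per-agent view of what actually executed.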

Workflow 2

Human Feedback Loop

Path | Trigger | What happens
Path 1, GO | No revision required | Extracts context, sends approved email via Gmail, updates W1 status, confirms in Slack, logs to W2 sheet
Path 2, Revise A | Tone/wording change | Router Coach briefs → Pen Pusher V2 redrafts → present in Slack
Path 3, Revise B | New KB content needed | Router Coach → Hatches V2 → Pen Pusher V2 → present + log
Path 4, Revise C | KB + external research needed | Router Coach → Hatches V2 → Scratches V2 → Pen Pusher V2
Path 5, IGNORE | Bot message / irrelevant input | Immediate termination
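The path table above maps naturally onto a small dispatch table. The sketch below is illustrative only: the agent chains are copied from the table, while the boolean flags passed to `select_path` are hypothetical stand-ins for however the feedback message is actually classified, which the case study does not describe.

```python
from enum import Enum


class Path(Enum):
    GO = "Path 1, GO"
    REVISE_A = "Path 2, Revise A"
    REVISE_B = "Path 3, Revise B"
    REVISE_C = "Path 4, Revise C"
    IGNORE = "Path 5, IGNORE"


# Agent chain per path, as listed in the W2 table. GO and IGNORE call no agents.
AGENT_CHAIN: dict[Path, list[str]] = {
    Path.GO: [],
    Path.REVISE_A: ["Router Coach", "Pen Pusher V2"],
    Path.REVISE_B: ["Router Coach", "Hatches V2", "Pen Pusher V2"],
    Path.REVISE_C: ["Router Coach", "Hatches V2", "Scratches V2", "Pen Pusher V2"],
    Path.IGNORE: [],
}


def select_path(*, irrelevant: bool, needs_revision: bool,
                needs_kb: bool, needs_web: bool) -> Path:
    """Pick a W2 path from hypothetical classification flags."""
    if irrelevant:
        return Path.IGNORE
    if not needs_revision:
        return Path.GO
    # Assumption: external research always follows a KB pass (Revise C).
    if needs_web:
        return Path.REVISE_C
    if needs_kb:
        return Path.REVISE_B
    return Path.REVISE_A
```

For example, feedback asking only for a warmer tone would resolve to `Path.REVISE_A`, so only Router Coach and Pen Pusher V2 run and the research agents are skipped.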

Evy

Continuous Evaluation Pipeline

Evy is an independent 9-step workflow triggered by webhook after each run. It performs two evaluation types:

  • Step Eval (per agent): scores accuracy, completeness, format, and Evy confidence, and issues a Bridge Recommendation.
  • Run Eval (full six-agent output): scores overall quality, accuracy, completeness, and deliverability, and reports a Ship Decision, a Bridge Recommendation, and the revision count.
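One way to picture the two evaluation types is as a pair of record shapes. The field names below follow the dimensions listed above; the types (floats for scores, a boolean Ship Decision) and the `pass_mark` parameter are assumptions, since the case study does not state Evy's scoring scale or pass threshold.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StepEval:
    """Per-agent evaluation record (scale and types are assumed)."""
    agent: str
    accuracy: float
    completeness: float
    format: float
    evy_confidence: float
    bridge_recommendation: str


@dataclass(frozen=True)
class RunEval:
    """Evaluation of the full six-agent output (types are assumed)."""
    overall_quality: float
    ship_decision: bool
    accuracy: float
    completeness: float
    deliverability: float
    bridge_recommendation: str
    revision_count: int


def below_pass(steps: list[StepEval], pass_mark: float) -> list[str]:
    """Names of agents whose lowest quality score falls under pass_mark."""
    return [
        s.agent
        for s in steps
        if min(s.accuracy, s.completeness, s.format) < pass_mark
    ]
```

Keeping Step Eval and Run Eval as separate records mirrors the split in the text: one answers which agent slipped, the other whether the combined output is shippable.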

Risk mitigations

Built into the architecture

Risk | Mitigation
Cascading errors from one agent's bad output | Bounded responsibilities limit blast radius. Human-in-the-loop gate on every email, zero dispatches without human approval.
Prompt injection via inbound email content | Hardened system instructions at every agent layer. Injection scenarios built into Evals regression suite as standing automated check.
Sensitive customer data leakage | Output scanning for sensitive content before draft enters dispatch queue. Encryption in transit.
HIPAA / compliance violation | Compliance review on every draft. Any compliance flag triggers immediate human escalation, zero-tolerance threshold.
Silent failures producing valid-looking but wrong outputs | Evy Step Eval catches per-agent failures on every production run. Bridge Recommendation generates specific fix instruction for every below-pass score.
Repeated mistakes on a specific email category | Run Eval surfaces whether the full pipeline output is shippable. Monitoring dashboards with real-time alerts on score degradation.
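The human approval gate named in the mitigations can be pictured as a guard in front of the send step. This is a minimal sketch under stated assumptions: `Draft`, `DispatchBlocked`, and `dispatch` are hypothetical names, and treating a compliance flag as a hard block on sending is one reading of "immediate human escalation", not a documented behaviour.

```python
from dataclasses import dataclass
from typing import Callable, Optional


class DispatchBlocked(Exception):
    """Raised when a draft is not allowed to leave the system."""


@dataclass
class Draft:
    email_id: str
    body: str
    approved_by: Optional[str] = None  # set only by a human reviewer
    compliance_flag: bool = False      # set by the compliance review


def dispatch(draft: Draft, send: Callable[[Draft], None]) -> None:
    """Send a draft only if it is unflagged and human-approved."""
    if draft.compliance_flag:
        # Assumed behaviour: a flagged draft is never sent automatically.
        raise DispatchBlocked("compliance flag set: escalate to a human")
    if not draft.approved_by:
        raise DispatchBlocked("no human approval recorded")
    send(draft)
```

Putting the check inside the only function that can send means an upstream agent error cannot bypass the gate by accident.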

Target outcomes

What the system is built to deliver

Metric | Target
Draft approval rate on first human review | 80%+
System latency (email receipt to draft in Slack) | Under 90 seconds
Factual accuracy (scored by Evy vs. KB) | 98%+
Cost per email across all agents and evaluation | Under $0.65
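The four targets can be expressed as a small threshold check over measured metrics. The thresholds below come from the table; the metric key names and the `missed_targets` helper are hypothetical, and rates are assumed to be expressed as fractions between 0 and 1.

```python
import operator

# (metric key, comparison, threshold). Keys are illustrative names only.
TARGETS = [
    ("first_review_approval_rate", operator.ge, 0.80),  # 80%+
    ("latency_seconds", operator.lt, 90.0),             # under 90 seconds
    ("factual_accuracy", operator.ge, 0.98),            # 98%+
    ("cost_per_email_usd", operator.lt, 0.65),          # under $0.65
]


def missed_targets(metrics: dict[str, float]) -> list[str]:
    """Return the keys of every target the measured metrics fail to meet."""
    return [
        key
        for key, meets, threshold in TARGETS
        if not meets(metrics[key], threshold)
    ]
```

A result of an empty list means all four targets are met for the period being measured.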

Key principle

Intelligence supports people; it does not replace them. Zero emails are dispatched without human approval.