Case Study · ADR-001

Blueface: Six-Agent Email Response System

Status: Accepted, Ready to be Deployed
Built: 2026 Q1/Q2
Client: Blueface
Sector: B2B SaaS · AI/ML Services
Platform: Cassidy AI
Author: Joseph Iyofor

The problem

30 to 60 emails per day, no consistent response system

Blueface receives 30 to 60 customer emails daily spanning four categories:

Sales inquiries: 40%
Technical questions: 30%
Support tickets: 20%
General inquiries: 10%

Manual response time averaged 2 to 4 hours per email, causing delayed sales cycles, inconsistent quality, and significant context switching across the team.

The challenge was not simply automating a response pipeline. It required a system that could draft accurately, maintain brand voice, allow human oversight on every single email, and handle revision feedback intelligently without routing every edit back through the full pipeline. And once built, the system needed a continuous quality layer that could evaluate every run automatically without relying on manual spot checks.

The solution

Two communicating workflows plus independent evaluation

Workflow 1 (W1) handles email ingestion and initial draft generation across six specialised agents.
Workflow 2 (W2) handles the human feedback loop through five conditional paths: one approval path, three revision paths, and one ignore path.
Evy is a third independent evaluation workflow, triggered by webhook after each run, providing Step Eval scoring per agent and Run Eval scoring across the full six-agent output.

Workflow 1

Email Handler agents

Agent	Role
GabbySenti	Sentiment analysis and initial email characterisation
Router Coach	Orchestration gatekeeper: classifies, routes, and briefs downstream agents
Hatches (shared)	Internal KB Expert: deep knowledge base research (case studies, pricing, technical specs)
Scratches (shared)	Web Researcher: fills knowledge gaps that Hatches cannot resolve from the internal KB
Pen Pusher (shared)	Final draft generation, used in W1 and all W2 REVISE paths
Jesses	QA Validator: checks draft quality, accuracy, tone, and completeness before human review

W1 output: Draft + QA score → Slack notification → Sheets log (40-column record).
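The W1 hand-off between the six agents can be sketched as a simple pipeline. This is a minimal illustration, not the Cassidy AI configuration: the strictly linear order, the `EmailContext` fields, and the stub agents are all assumptions, and a real run may skip Scratches when Hatches finds everything in the KB.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class EmailContext:
    """State handed from agent to agent (field names are hypothetical)."""
    body: str
    notes: dict[str, str] = field(default_factory=dict)
    trail: list[str] = field(default_factory=list)


Agent = Callable[[EmailContext], None]

# Agent names come from the table above; the linear order is an assumption.
W1_ORDER = ["GabbySenti", "Router Coach", "Hatches", "Scratches", "Pen Pusher", "Jesses"]


def make_stub(name: str) -> Agent:
    """Build a placeholder agent that only records that it ran."""
    def run(ctx: EmailContext) -> None:
        ctx.trail.append(name)
        ctx.notes[name] = f"<{name} output placeholder>"
    return run


def run_w1(body: str, agents: dict[str, Agent]) -> EmailContext:
    """Run every W1 agent in order against one inbound email."""
    ctx = EmailContext(body=body)
    for name in W1_ORDER:
        agents[name](ctx)
    return ctx


agents = {name: make_stub(name) for name in W1_ORDER}
result = run_w1("Do you offer volume pricing?", agents)
print(result.trail)
```

Keeping each agent behind the same small interface is one way to preserve the bounded responsibilities the design relies on: an agent can only read and extend the shared context.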

Workflow 2

Human Feedback Loop

Path	Trigger	What happens
Path 1, GO	No revision required	Extracts context, sends approved email via Gmail, updates W1 status, confirms in Slack, logs to W2 sheet
Path 2, Revise A	Tone/wording change	Router Coach briefs → Pen Pusher V2 redrafts → present in Slack
Path 3, Revise B	New KB content needed	Router Coach → Hatches V2 → Pen Pusher V2 → present + log
Path 4, Revise C	KB + external research needed	Router Coach → Hatches V2 → Scratches V2 → Pen Pusher V2
Path 5, IGNORE	Bot message / irrelevant input	Immediate termination
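The path table above maps naturally onto a lookup from path to agent sequence. The sketch below is illustrative only: the `Path` enum, the boolean flags passed to `choose_path`, and the rule that any need for external research selects Revise C are assumptions, not the production routing logic.

```python
from enum import Enum


class Path(Enum):
    GO = "Path 1, GO"
    REVISE_A = "Path 2, Revise A"
    REVISE_B = "Path 3, Revise B"
    REVISE_C = "Path 4, Revise C"
    IGNORE = "Path 5, IGNORE"


# Agent sequences copied from the table; GO and IGNORE invoke no drafting agents.
AGENT_SEQUENCE: dict[Path, list[str]] = {
    Path.GO: [],
    Path.REVISE_A: ["Router Coach", "Pen Pusher V2"],
    Path.REVISE_B: ["Router Coach", "Hatches V2", "Pen Pusher V2"],
    Path.REVISE_C: ["Router Coach", "Hatches V2", "Scratches V2", "Pen Pusher V2"],
    Path.IGNORE: [],
}


def choose_path(is_bot: bool, needs_rewrite: bool, needs_kb: bool, needs_web: bool) -> Path:
    """Pick the cheapest path that satisfies the feedback (hypothetical flags)."""
    if is_bot:
        return Path.IGNORE
    if needs_web:
        # Assumption: Revise C is the only path that includes Scratches V2.
        return Path.REVISE_C
    if needs_kb:
        return Path.REVISE_B
    if needs_rewrite:
        return Path.REVISE_A
    return Path.GO


path = choose_path(is_bot=False, needs_rewrite=True, needs_kb=True, needs_web=False)
print(path.value, AGENT_SEQUENCE[path])
```

Checking the lighter paths last means a tone-only edit never triggers KB or web research, which is the point of not routing every edit back through the full pipeline.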

Evy

Continuous Evaluation Pipeline

Independent 9-step Evy workflow triggered by webhook after each run. It performs two evaluation types. Step Eval (per agent) scores accuracy, completeness, format, and Evy confidence, and issues a Bridge Recommendation. Run Eval (full six-agent output) scores overall quality, accuracy, completeness, and deliverability, and reports a Ship Decision, a Bridge Recommendation, and the revision count.
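The two evaluation types can be pictured as two record shapes. The sketch below assumes scores on a 0 to 1 scale and a pass mark of 0.8; both are hypothetical, as are the exact field names and the sample values, since the case study does not state the scoring scale.

```python
from dataclasses import dataclass

PASS_MARK = 0.8  # hypothetical threshold; the real pass mark is not stated


@dataclass(frozen=True)
class StepEval:
    """Per-agent evaluation record (field names are assumptions)."""
    agent: str
    accuracy: float
    completeness: float
    format_score: float
    evy_confidence: float
    bridge_recommendation: str

    def passed(self) -> bool:
        # A step passes only if every scored dimension meets the pass mark.
        return min(self.accuracy, self.completeness, self.format_score) >= PASS_MARK


@dataclass(frozen=True)
class RunEval:
    """Evaluation of the full six-agent output (field names are assumptions)."""
    overall_quality: float
    ship_decision: bool
    accuracy: float
    completeness: float
    deliverability: float
    bridge_recommendation: str
    revision_count: int


def fixes_needed(steps: list[StepEval]) -> dict[str, str]:
    """Collect the Bridge Recommendation for every below-pass step."""
    return {s.agent: s.bridge_recommendation for s in steps if not s.passed()}


# Made-up sample records for illustration only.
steps = [
    StepEval("Hatches", 0.95, 0.90, 0.92, 0.88, "none"),
    StepEval("Pen Pusher", 0.70, 0.85, 0.90, 0.75, "Re-check pricing claims against the KB"),
]
print(fixes_needed(steps))
```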

Risk mitigations

Built into the architecture

Risk	Mitigation
Cascading errors from one agent's bad output	Bounded responsibilities limit blast radius. Human-in-the-loop gate on every email, zero dispatches without human approval.
Prompt injection via inbound email content	Hardened system instructions at every agent layer. Injection scenarios are built into the Evals regression suite as a standing automated check.
Sensitive customer data leakage	Output scanning for sensitive content before draft enters dispatch queue. Encryption in transit.
HIPAA / compliance violation	Compliance review on every draft. Any compliance flag triggers immediate human escalation, zero-tolerance threshold.
Silent failures producing valid-looking but wrong outputs	Evy Step Eval catches per-agent failures on every production run. Bridge Recommendation generates a specific fix instruction for every below-pass score.
Repeated mistakes on a specific email category	Run Eval surfaces whether the full pipeline output is shippable. Monitoring dashboards with real-time alerts on score degradation.
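The human-in-the-loop gate and the zero-tolerance compliance rule from the table can be expressed as a guard in front of the send step. This is a sketch under stated assumptions: `Draft`, `DispatchBlocked`, and the `send` callback are hypothetical names, and the real system dispatches through Gmail after approval in Slack.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Draft:
    """A drafted reply awaiting dispatch (hypothetical shape)."""
    email_id: str
    body: str
    approved_by: Optional[str] = None
    compliance_flag: bool = False


class DispatchBlocked(Exception):
    """Raised when a draft must not be sent."""


def dispatch(draft: Draft, send: Callable[[Draft], None]) -> None:
    """Send a draft only if it is compliance-clean and human-approved."""
    if draft.compliance_flag:
        raise DispatchBlocked("compliance flag set: escalate to a human")
    if not draft.approved_by:
        raise DispatchBlocked("no human approval recorded")
    send(draft)


sent: list[str] = []
approved = Draft("e-1", "Thanks for reaching out.", approved_by="reviewer")
dispatch(approved, lambda d: sent.append(d.email_id))
print(sent)
```

Placing the check inside the only function that can send means no upstream agent error can bypass it.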

Target outcomes

What the system is built to deliver

Metric	Target
Draft approval rate on first human review	80%+
System latency (email receipt to draft in Slack)	Under 90 seconds
Factual accuracy (scored by Evy vs. KB)	98%+
Cost per email across all agents and evaluation	Under $0.65
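The four targets above can be checked mechanically against observed figures. The sketch below is illustrative: the `Observed` record and its field names are assumptions, rates are expressed as fractions, and the sample numbers are made up rather than measured results.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Observed:
    """Observed figures for one reporting period (hypothetical shape)."""
    first_review_approval_rate: float  # fraction of drafts approved on first review
    latency_seconds: float             # email receipt to draft in Slack
    factual_accuracy: float            # fraction, as scored by Evy against the KB
    cost_per_email_usd: float          # all agents plus evaluation


def check_targets(o: Observed) -> dict[str, bool]:
    """Compare observed figures with the targets in the table."""
    return {
        "approval_rate": o.first_review_approval_rate >= 0.80,
        "latency": o.latency_seconds < 90,
        "factual_accuracy": o.factual_accuracy >= 0.98,
        "cost": o.cost_per_email_usd < 0.65,
    }


# Made-up sample values for illustration only.
sample = Observed(0.84, 72.0, 0.97, 0.58)
report = check_targets(sample)
print(report)
```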

Key principle

Intelligence supports people; it does not replace them. Zero emails are dispatched without human approval.
