docs(plans): add email triage agent spec by kovtcharov · Pull Request #796 · amd/gaia

kovtcharov · 2026-04-17T19:03:30Z

Summary

Adds a two-phase spec for a local-first email triage agent that runs inference on-device via Lemonade (Ryzen AI NPU/iGPU) — no email content transits a cloud API. Phase MVT ships in ~1.5 days (CC-assisted) by thin-wrapping existing primitives; Phase C1 polishes UX for v0.20.0; Phase C2 adds scheduled triage, Agent Inbox HITL, and in-tree Gmail MCP for v0.23.0. Slack is a first-class output channel from day one (webhook → MCP → interactive buttons across phases).

Key threads

MVT ships fast because ~95% of plumbing exists. §2.5 maps every required capability to an existing GAIA primitive (MCPClientMixin, DatabaseMixin, RAGSDK, TalkSDK, SummarizeAgent, ApiAgent, SSE). Why it matters: scoping the MVT as thin wrappers rather than new plumbing is what makes the ~1.5d estimate credible.
§22.4 catalogs in-flight PRs as prerequisites. It maps #606 (memory v2), #517 (autonomy M1/M3/M5), #495 (security.py), #622 (orchestrator), #779 (eval), #741 (vault), and #737 (Slack connector) to the spec risks each one collapses. Why it matters: the "minimum set to start MVT safely" is named explicitly (#495 + #741 + one of #606 / #517 M1), so sequencing is actionable.
Memory-PR conflict flagged (§22.4.4). #606 (memory v2) and #517 M1 (autonomy) overlap on the memory subsystem; §22.4.4 calls out the reconciliation as a prerequisite decision, not a runtime surprise.
§27 "Known Weaknesses, Unvalidated Claims, Decision Debt" names the research bets (Custom AI Labels on local 4B, per-relationship voice, auto-follow-up quality) and unvalidated claims cited in the spec (97.5% tool-call reliability, GongRzhe archive date, etc.) so C2 isn't treated as an engineering certainty.
Slack integration scoped as an output channel (§12.18). Webhook at MVT → Slack MCP at C1 → interactive approve/edit/reject buttons at C2. Aligned with messaging-integrations-plan.mdx (Messaging adapters: Telegram, Discord, Slack integration #635).

Test plan

Render preview of docs/plans/email-triage-agent.mdx via Mintlify dev or amd-gaia.ai preview — confirm frontmatter, tables, code blocks, and section numbering (1–28) render cleanly.
Verify docs/docs.json navigation entry places the page under Agent UI group next to email-calendar-integration.
Cross-reference check: every [Link](file.mdx) target exists (email-calendar-integration, autonomy-engine, security-model, agent-ui, setup-wizard, messaging-integrations-plan).
Scan §22.4 PR numbers against the current PR queue (gh pr list --repo amd/gaia --state open) to confirm they're still open and the recommended sequence is feasible.
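
The last test-plan step can be scripted. The following is a minimal sketch, assuming an authenticated `gh` CLI is on the PATH; the helper names (`closed_or_missing`, `fetch_open_prs`) are hypothetical, and the number list is copied from the PR description rather than read from the spec itself.

```python
import json
import subprocess

# Numbers named in the PR description as §22.4 prerequisites.
PREREQUISITES = [606, 517, 495, 622, 779, 741, 737]


def closed_or_missing(open_prs_json: str, required: list[int]) -> list[int]:
    """Return the required numbers that do not appear in the open-PR listing."""
    open_numbers = {entry["number"] for entry in json.loads(open_prs_json)}
    return [number for number in required if number not in open_numbers]


def fetch_open_prs(repo: str = "amd/gaia") -> str:
    """Return open PRs as a JSON array of {"number": ...} objects via the GitHub CLI."""
    result = subprocess.run(
        ["gh", "pr", "list", "--repo", repo, "--state", "open",
         "--limit", "500", "--json", "number"],
        check=True, capture_output=True, text=True,
    )
    return result.stdout
```

Calling `closed_or_missing(fetch_open_prs(), PREREQUISITES)` returns the numbers that are no longer open pull requests (merged, closed, or tracked as issues rather than PRs), which is the signal that the recommended sequence needs a second look.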

Two-phase local-first email triage agent — MVT (~1.5d CC-assisted) for v0.20.0, full EmailTriageAgent for v0.23.0. Covers auto-discovery, per-cohort autonomy, speech-act classification, undo ledger, Slack as first-class output channel, and an honest §27 catalog of research bets and unvalidated claims. §22.4 maps outstanding PRs to their prerequisite roles: #606 / #517 / #495 / #622 / #779 / #741 / #737. Landing the "minimum set" of #495 + #741 + one of #606 / #517 M1 collapses most of the missing-infrastructure workarounds before implementation starts.

§12.18 MCP Settings & One-Click Integration specifies the Agent UI surface users interact with to enable Gmail / Outlook / Slack — catalog cards, Connect-Flow modal with scope preview, discovery + manual-entry empty states, MCP server health panel, bulk actions. Cross-references Connector Hub work (#735, #736, #737, #738, #714) so if those ship first the email agent consumes the shared catalog instead of shipping a bespoke Settings page.

§12.19 Output Formatting Grammar locks down the visual grammar for every email-agent output — inbox-summary cards, tool cards with undo + "why?" affordances and risk-tier ribbons, thread-view headers, draft preview with provenance, rich Daily Brief format, low-confidence surfacing, voice output. Ensures outputs are consistent, skimmable, and distinct from generic text walls.

Slack section renumbered §12.18 → §12.20; all cross-references updated. §7.5 expanded to link to §12.18 and the Connector Hub issues. §12.0 Priority Index updated to include the new subsections per phase.

CI doc-verifier flagged three broken internal refs to `docs/guides/email.mdx` and `docs/sdk/sdks/email.mdx` (future deliverables, not yet in-tree) and one broken external URL to a MITRE ATLAS technique page that 404s.

- Unwrap the future-file refs from markdown links to plain code-text so the link checker doesn't try to resolve them.
- Replace the MITRE ATLAS 404 with the stable OWASP LLM Top 10 LLM01 (prompt injection) reference.
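
The "unwrap" fix in this commit can be illustrated with a short sketch. It assumes the links appear as ordinary inline markdown links whose target is exactly one of the two future paths; the exact link text used in the spec is not shown here, and `unwrap_future_links` is a hypothetical helper, not part of the repo's doc tooling.

```python
import re

# Targets named in the commit message as future deliverables (not yet in-tree).
FUTURE_TARGETS = {"docs/guides/email.mdx", "docs/sdk/sdks/email.mdx"}

# Inline markdown link: [label](target), where target has no spaces or ")".
INLINE_LINK = re.compile(r"\[([^\]]+)\]\(([^)\s]+)\)")


def unwrap_future_links(markdown: str, future_targets: set[str]) -> str:
    """Turn [label](target) into `target` when target is a not-yet-existing file."""
    def replace(match: re.Match) -> str:
        target = match.group(2)
        if target in future_targets:
            return f"`{target}`"   # plain code text: nothing for a link checker to resolve
        return match.group(0)      # leave every other link untouched
    return INLINE_LINK.sub(replace, markdown)
```

Links to files that already exist pass through unchanged, so the link checker keeps validating them.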

github-actions · 2026-04-17T23:10:58Z

Summary

Docs-only PR adding a 2,614-line two-phase spec for an Email Triage Agent at docs/plans/email-triage-agent.mdx, plus a navigation entry in docs/docs.json:361. The spec is exceptionally thorough: it maps every required capability to existing GAIA primitives (§2.5), catalogs in-flight prerequisite PRs with a recommended landing sequence (§22.4), is honest about its own weaknesses and unvalidated claims (§27), and defines a concrete MVT slice (§1.3) that rationalizes the "~1.5 days CC-assisted" estimate. All 10 internal MDX cross-references resolve to real files; docs.json placement under Agent UI next to email-calendar-integration matches the stated test plan.

The single most important thing to flag: the spec cites an external URL (jdhodges.com April 2026) for a "97.5% tool-call reliability" number. §27.1 already marks this as unvalidated — this is good practice, but I'd want the README / Executive Summary to not repeat the figure without the hedge (see Minor below).

Issues Found

🟢 Minor

1. Section numbering jumps from §2 directly to §2.5 (docs/plans/email-triage-agent.mdx:237)

Section 2 is titled "Why This Spec Exists (Relative to the Broader Plan)" but has no §2.1–§2.4 — the next subsection is §2.5. Either renumber to §2.1 or rename the heading to signal it's a standalone subsection (e.g. ### 2.1 What We Already Have …). Not a correctness issue; consistent numbering helps reviewers skim.

### 2.1 What We Already Have (Codebase Reality Check)

Note: if you keep the number as 2.5, the 30+ §2.5 references elsewhere in the doc don't need to change. If you rename to 2.1, search/replace §2.5 → §2.1 across the file.
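
If the rename route is taken, the search/replace needs to avoid touching look-alike references. A minimal sketch, assuming references are always written with the `§` prefix (bare headings such as `### 2.5 …` would still need a manual edit); `renumber` is a hypothetical helper.

```python
import re

# Match "§2.5" when it is not followed by another digit, so "§2.50" is left alone.
# "§12.5" and "§22.5" cannot match because "§" must be directly followed by "2.5".
SECTION_2_5 = re.compile(r"§2\.5(?!\d)")


def renumber(text: str) -> tuple[str, int]:
    """Rewrite §2.5 (and deeper refs such as §2.5.1) to §2.1; return new text and hit count."""
    return SECTION_2_5.subn("§2.1", text)
```

The returned count gives a quick sanity check against the number of §2.5 references the review estimates.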

2. "97.5%" figure is hedged in §6 but unhedged in §17.3 (lines 547 and 2083)

§6 Design Rule 2 states "The 97.5% reliability figure (jdhodges.com, April 2026) is a single-source claim; treat as hypothesis until validated by our eval harness" — good. But §17.3 C2 Success Criteria uses the same number as a success threshold ("T2 Hermes-format tool dispatch succeeds in ≥ 97% of cases on Qwen3.5-4B-GGUF (matches jdhodges.com April 2026 benchmark)") without the hedge, which treats the unvalidated claim as a measurable target. Consider either:

- **Reliability:** T2 Hermes-format tool dispatch succeeds at a rate to be established by our own eval harness (initial hypothesis ≥ 97%; confirm or revise before locking the target).

…or drop the external benchmark reference entirely in the criteria table so success is measured on our own fixtures, not a third-party blog's number.
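
One way to "confirm or revise" the hypothesis on our own fixtures is to require that a confidence bound, not just the raw pass rate, clears the target. The sketch below uses the Wilson score interval; that choice, the 95% level, and the function names are assumptions for illustration, not something the spec or the eval harness prescribes.

```python
import math


def wilson_lower_bound(successes: int, trials: int, z: float = 1.96) -> float:
    """Lower end of the Wilson score interval for a success rate (z=1.96 is roughly 95%)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    p_hat = successes / trials
    z2 = z * z
    centre = p_hat + z2 / (2 * trials)
    margin = z * math.sqrt(p_hat * (1 - p_hat) / trials + z2 / (4 * trials * trials))
    return (centre - margin) / (1 + z2 / trials)


def meets_target(successes: int, trials: int, target: float = 0.97) -> bool:
    """True only if the lower bound clears the target, not just the observed rate."""
    return wilson_lower_bound(successes, trials) >= target
```

Under these assumptions, even 100 successes out of 100 dispatches gives a lower bound of about 0.963, so a 100-case fixture set cannot confirm a 97% target on its own; a larger run is needed before locking the number.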

3. External commercial-product links will rot (§28 References)

~9 commercial product links (superhuman.com, shortwave.com, fyxer.com, etc.) are included for feature provenance. These work today but will degrade over 2–3 years as vendors reorganize their sites. Not blocking — just know that future edits to this plan may see broken external links and that's acceptable given the reference nature. No action needed unless you want to add an "accessed April 2026" note in the References header.

4. §12.0 Priority Index P1/P2 section references are dense (lines 1042-1044)

The priority matrix cells pack 7-10 cross-references each (e.g. "§12.3 stripped), Thread view with send-confirm modal (§12.4 core subset)..."). For a planning doc this is fine; for a reader trying to extract "what ships in MVT P0" this is hard to skim. Optional: consider pulling the MVT P0 list out into a simple bulleted list below the table. Not a blocker.

Strengths

§2.5 "Codebase Reality Check" — the capability → existing-primitive → file-path table is an excellent pattern for planning docs in this repo. It makes "~95% of plumbing exists" a verifiable claim, not rhetoric. Worth replicating for future plans.
§22.4 prerequisite-PR analysis — cataloguing in-flight PRs (#495 security.py, #517 autonomy M1/M3/M5, #606 memory v2, #622 orchestrator, #779 eval, #741 vault, #737 Slack connector) with the rework each one collapses, plus an explicit "minimum set to start MVT safely" (§22.4.5), turns this spec into a sequencing document and not just a design document. The memory-PR conflict callout (§22.4.4) is exactly the kind of cross-PR coordination hazard that usually surfaces only after someone has done duplicate work.
§27 "Known Weaknesses, Unvalidated Claims, Decision Debt" — honest meta-commentary about unvalidated claims, research bets, over-scoped and under-scoped areas. This is rare and valuable in a plan this large. Keeps future readers from treating C2 day-estimates as certainties.
§14 threat model is concrete (EchoLeak class cited, mitigations enumerated, explicit §14.7 "residual risk" acknowledging prompt injection is not solvable). The §4.6 "L5 templated auto-send" structural constraint is a genuinely strong design response to classifier-jailbreak risk.
All internal MDX cross-refs resolve — every [X](file.mdx) link (email-calendar-integration, autonomy-engine, security-model, agent-ui, setup-wizard, messaging-integrations-plan, ../sdk/core/agent-system, ../sdk/infrastructure/mcp, ../sdk/sdks/rag, ../sdk/sdks/audio) resolves to a real file.
MVT scope in §1.3 is disciplined — the "what MVT deliberately omits" list is as valuable as the "what MVT includes" list, and that's what makes the 1.5-day estimate defensible.

Verdict

Approve with suggestions.

This is a planning document, not production code, so the scope-clean / test-pass / lint-pass hurdles don't apply. The two content concerns above (§2 numbering gap, un-hedged 97.5% figure in §17.3) are minor polish items that can be folded into this PR or a follow-up. Navigation update in docs.json is correct, all internal links resolve, and the spec is structured to guide actual implementation sequencing rather than sit on a shelf.

Recommend merging; the §17.3 hedge is the one edit worth making before merge if the author is still iterating.

antmikinka

I see the vision and I like how well this is thought out!

@kovtcharov-amd

# GAIA v0.17.3 Release Notes

GAIA v0.17.3 is an extensibility and resilience release. You can now package your own agents into a custom GAIA installer and seed them on first launch, point GAIA at alternative OpenAI-compatible inference servers from the C++ library (Ollama, for example), and start from three new reference agents (weather, RAG Q&A, HTML mockup) that execute against real Lemonade hardware in CI. It also hardens the RAG cache against an insecure-deserialization class of bug (CWE-502) — all users should upgrade.

**Why upgrade:**

- **Ship your own GAIA** — Export and import agents between machines, follow a new guide to produce a custom installer that seeds your agents on first launch, and on Windows install everything in one step because the installer now includes the Lemonade Server MSI.
- **Work with alternative inference backends** — The C++ library now preserves OpenAI-compatible `/v1` base URLs instead of rewriting them to `/api/v1`, so servers that expose the standard `/v1` path (Ollama, for example) work out of the box.
- **Start from a working example** — Three new reference agents (weather via MCP, RAG document Q&A, HTML landing-page generator) with integration tests that actually execute against Lemonade on a Strix CI runner.
- **Safer RAG cache** — Replaces `pickle` deserialization with JSON + HMAC-SHA256 (CWE-502). Unsigned or tampered caches are rejected and transparently rebuilt on the next query.
- **Better document handling** — Encrypted or corrupted PDFs now produce distinct, actionable errors (`EncryptedPDFError`, `CorruptedPDFError`) instead of generic failures, and the RAG index is hardened for concurrent queries.

---

## What's New

### Custom Installers and Agent Portability

You can now package a custom GAIA installer that ships with your own agents pre-loaded, and move agents between machines with export/import (PR #795). On Windows, the official installer now includes the Lemonade Server MSI and runs it during install, so a fresh machine has the complete local-LLM stack after a single download (PR #781).

**What you can do:**

- Export an agent from `~/.gaia/agents/` to a portable bundle with `gaia agents export` and import it on another machine with `gaia agents import`
- Follow the new custom-installer playbook at [`docs/playbooks/custom-installer/index.mdx`](/playbooks/custom-installer) to distribute GAIA with your agents pre-loaded — useful for workshops, team deployments, and internal tooling
- On Windows, the installer now includes Lemonade Server — no separate download for a complete first-run experience

**Under the hood:**

- `gaia agents export` / `gaia agents import` CLI commands round-trip agents between machines as portable bundles
- First-launch agent seeder (`src/gaia/apps/webui/services/agent-seeder.cjs`) copies `<resourcesPath>/agents/<id>/` into `~/.gaia/agents/<id>/` the first time the app starts
- Windows NSIS installer embeds `lemonade-server-minimal.msi` into `$PLUGINSDIR` and runs it via `msiexec /i ... /qn /norestart` during install (auto-cleaned on exit)

---

### Broader Backend Compatibility in the C++ Library

The C++ library now preserves OpenAI-compatible `/v1` base URLs (PR #773) instead of rewriting them to `/api/v1`. That means inference servers that expose the standard OpenAI `/v1` path — for example, Ollama at `http://localhost:11434/v1` — work out of the box without needing a special adapter.

---

### Reference Agents and Real-Hardware Integration Tests

Three new example agents and a Strix-runner CI workflow land together (PR #340).

**What you can do:**

- Copy `examples/weather_agent.py`, `examples/rag_doc_agent.py`, or `examples/product_mockup_agent.py` as a starting point for your own agents
- Run the new integration tests locally against Lemonade to validate agents end-to-end, not just structurally

**Under the hood:**

- `tests/integration/test_example_agents.py` executes agents and validates responses with a 5-minute-per-test timeout
- `.github/workflows/test_examples.yml` runs on the self-hosted Strix runner (`stx` label) with Lemonade serving `Qwen3-4B-Instruct-2507-GGUF`
- Docs homepage refreshed with a technical value prop ("Agent SDK for AMD Ryzen AI") and MCP / CUA added to the capabilities list

---

### Smarter PDF Handling in RAG

Encrypted and corrupted PDFs now surface as distinct, actionable errors (`EncryptedPDFError`, `CorruptedPDFError`, `EmptyPDFError`) instead of generic failures or silent 0-chunk indexes (PR #784, closes #451). Encrypted PDFs are detected before extraction; corrupted PDFs are caught during extraction with a clear message. Combined with the indexing-failure surfacing in PR #723, you get a visible indexing-failed status the moment a document fails — and the RAG index itself is now thread-safe under concurrent queries (PR #746).

---

## Security

### RAG Cache Deserialization Replaced with JSON + HMAC

Fixes an insecure-deserialization issue in the RAG cache (CWE-502, PR #768). Previously, cached document indexes were serialized with Python `pickle`; if an attacker could write to `~/.gaia/` — via a shared drive, a sync conflict, or a malicious extension — loading that cache could execute arbitrary code. v0.17.3 replaces `pickle` with signed JSON: caches are now serialized as JSON and authenticated with HMAC-SHA256 using a per-install key stored at `~/.gaia/cache/hmac.key`. Unsigned or tampered caches are rejected and transparently rebuilt on the next query. Old `.pkl` caches from previous GAIA versions are ignored and re-indexed the next time you query a document.

**You should upgrade if you** share `~/.gaia/` across machines (Dropbox, iCloud, network home directories), run GAIA in a multi-user environment, or have ever imported RAG caches from another source.

---

## Bug Fixes

- **Ask Agent attaches files before sending to chat** (PR #725) — Dropped files are indexed into RAG and attached to the active session before the prompt is consumed, so the model sees the document on the first turn instead of the second.
- **Document indexing failures are surfaced** (PR #723) — A document that produces 0 chunks now raises `RuntimeError` in the SDK and surfaces as `indexing_status: failed` in the UI, instead of looking like a silent success. Covers RAG SDK, background indexing, and re-index paths.
- **Encrypted or corrupted PDFs produce actionable errors** (PR #784, closes #451) — RAG now raises distinct `EncryptedPDFError` and `CorruptedPDFError` exceptions instead of generic failures, so you see exactly what went wrong.
- **RAG index thread safety hardened** (PR #746) — Adds `RLock` protection around index mutation paths and rebuilds chunk/index state atomically before publishing it, so concurrent queries read consistent snapshots and failed rebuilds no longer leak partial state.
- **MCP JSON-RPC handler guards against non-dict bodies** (PR #803) — A malformed JSON-RPC payload (array, string, null) now returns HTTP 400 `Invalid Request: expected JSON object` instead of an HTTP 500 from a `TypeError`.
- **File-search count aligned with accessible results** (PR #754) — The returned count now matches the number of files the tool actually surfaces, instead of a pre-filter total that over-reported results the caller could not access.
- **Tracked block cursor replaces misplaced decorative cursor** (PR #727) — Fixes the mis-positioned blinking cursor in the chat input box, which now tracks the actual caret position via a mirror-div technique.
- **Ad-hoc sign the macOS app bundle instead of skipping code signing** (PR #765) — The `.app` bundle inside the DMG now carries an ad-hoc signature, so Gatekeeper presents a single "Open Anyway" bypass in System Settings instead of the unrecoverable "is damaged" error. Full Apple Developer ID signing is still being finalized.

---

## Release & CI

- **Publish workflow: single approval gate, no legacy Electron apps** (PR #758) — Removed the legacy jira and example standalone Electron apps from the publish pipeline; a single `publish` environment gate governs PyPI, npm, and installer publishing.
- **Claude CI modernization** (PR #797, PR #799, PR #783) — Migrated all four `claude-code-action` call sites to `v1.0.99` (pinned by SHA, fixes an issue-handler hang), bumped `--max-turns` from 20 to 50 on both `pr-review` and `pr-comment` for deeper analysis, upgraded to Opus 4.7, standardized 23 subagent definitions with explicit when-to-use sections and tool allowlists, and added agent-builder tooling (manifest schema, `lint.py --agents`, BuilderAgent mixins).

---

## Docs

- **Roadmap overhaul** (PR #710) — Milestone-aligned plans with voice-first as P0 and 9 new plan documents for upcoming initiatives.
- **Plan: email triage agent** (PR #796) — Specification for an upcoming email triage agent.
- **Docs/source drift resolved** (PR #794) — Fixed broken SDK examples across 15 docs, rewrote 5 spec files against the current source (including two that documented entire APIs that don't exist in code), added 20+ missing CLI flags to the CLI reference, and removed 2 already-shipped plan documents (installer, mcp-client).
- **FAQ: data-privacy answer clarified for external LLM providers** (PR #798) — Sharper guidance on what leaves your machine when you point GAIA at Claude or OpenAI.

---

## Full Changelog

**21 commits** since v0.17.2:

- `6d3f3f71` — fix: replace misplaced decorative cursor with tracked terminal block cursor (#727)
- `874cf2a3` — fix: Ask Agent indexes and attaches files before sending to chat (#725)
- `4fa121e2` — fix: surface document indexing failures instead of silent 0-chunk success (#723)
- `34b1d06e` — fix(ci): ad-hoc sign macOS DMG instead of skipping code signing (#765)
- `7188b83c` — Roadmap overhaul: milestone-aligned plans with voice-first P0 and 9 new plan documents (#710)
- `1beddac5` — cpp: support Ollama-compatible /v1 endpoints (#773)
- `cf9ac995` — fix: harden rag index thread safety (#746)
- `1c55c31b` — fix(ci): remove legacy electron apps from publish, single approval gate (#758)
- `52946a7a` — feat(installer): bundle Lemonade Server MSI into Windows installer (#774) (#781)
- `e96b3686` — ci(claude): review infra + conventions + subagent overhaul + agent-builder tooling (#783)
- `058674b5` — fix(rag): detect encrypted and corrupted PDFs with actionable errors (#451) (#784)
- `7bcb5d51` — fix: replace insecure pickle deserialization with JSON + HMAC in RAG cache (CWE-502) (#768)
- `a5167e5f` — fix: keep file-search count aligned with accessible results (#754)
- `da5ba458` — ci(claude): migrate to claude-code-action v1.0.99 + fix issue-handler hang (#797)
- `03f546b9` — ci(claude): bump pr-review and pr-comment --max-turns 20 -> 50 (#799)
- `4119d564` — docs(faq): clarify data-privacy answer re: external LLM providers (#798)
- `0cfbcf41` — Add example agents and integration test workflow (#340)
- `c4bd15fb` — docs: fix drift between docs and source (docs review pass 1 + 2) (#794)
- `407ed5b8` — docs(plans): add email triage agent spec (#796)
- `06fb04a4` — fix(mcp): guard JSON-RPC handler against non-dict body (#803)
- `880ad603` — feat(installer): custom installer guide, agent export/import, first-launch seeder (#795)

Full Changelog: [v0.17.2...v0.17.3](v0.17.2...v0.17.3)

---

## Release checklist

- [x] `util/validate_release_notes.py docs/releases/v0.17.3.mdx --tag v0.17.3` passes
- [x] `src/gaia/version.py` → `0.17.3`
- [x] `src/gaia/apps/webui/package.json` → `0.17.3`
- [x] Navbar label in `docs/docs.json` → `v0.17.3 · Lemonade 10.0.0`
- [x] All 21 PRs in the range (v0.17.2..HEAD) are represented in the notes
- [ ] Review from @kovtcharov-amd addressed
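
The Security section of the release notes above describes replacing `pickle` with JSON authenticated by HMAC-SHA256, where unsigned or tampered caches are rejected and rebuilt. The sketch below illustrates that general pattern with the Python standard library only. The envelope layout (`body` / `hmac` fields) and the function names are assumptions for illustration; GAIA's actual on-disk format and key handling are not shown in these notes.

```python
import hashlib
import hmac
import json
from typing import Optional


def sign_cache(payload: dict, key: bytes) -> str:
    """Serialize payload as JSON and wrap it with an HMAC-SHA256 tag (hypothetical envelope)."""
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    tag = hmac.new(key, body.encode("utf-8"), hashlib.sha256).hexdigest()
    return json.dumps({"body": body, "hmac": tag})


def load_cache(raw: str, key: bytes) -> Optional[dict]:
    """Return the payload, or None if the cache is malformed, unsigned, or tampered."""
    try:
        envelope = json.loads(raw)
        body, tag = envelope["body"], envelope["hmac"]
    except (ValueError, KeyError, TypeError):
        return None  # not JSON, or not the expected envelope shape
    if not isinstance(body, str) or not isinstance(tag, str):
        return None
    expected = hmac.new(key, body.encode("utf-8"), hashlib.sha256).hexdigest()
    # Constant-time comparison avoids leaking how many leading characters matched.
    if not hmac.compare_digest(expected.encode("utf-8"), tag.encode("utf-8")):
        return None
    return json.loads(body)
```

A caller would treat a `None` result as a cache miss and rebuild the index, which matches the "rejected and transparently rebuilt" behaviour the notes describe. Unlike `pickle`, parsing JSON never executes code from the file, and the tag is checked before the body is parsed.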

kovtcharov requested a review from kovtcharov-amd as a code owner April 17, 2026 19:03

github-actions Bot added the documentation Documentation changes label Apr 17, 2026

kovtcharov-amd self-assigned this Apr 17, 2026

kovtcharov-amd requested a review from antmikinka April 17, 2026 19:11

kovtcharov-amd added this to the v0.17.4 — Website launch and shell-safety [OSS] milestone Apr 17, 2026

antmikinka approved these changes Apr 17, 2026


kovtcharov added this pull request to the merge queue Apr 18, 2026

Merged via the queue into main with commit 407ed5b Apr 18, 2026
23 checks passed

kovtcharov deleted the kalin/email-triage-spec branch April 18, 2026 00:41

itomek mentioned this pull request Apr 20, 2026

Release v0.17.3 #831

Merged

6 tasks

kovtcharov merged 3 commits into main from kalin/email-triage-spec


3 participants
