Skip to content

Intake analyze prompts embed raw Exa-crawled web text without sanitize_untrusted_text (scoring path sanitizes; intake doesn't) #113

Description

@matthewod11-stack

Description

The codebase has a purpose-built prompt-injection defense applied in scoring signal-extraction — but the intake research path embeds raw Exa-crawled web text directly into LLM prompts with no sanitization. Exploit: a candidate plants injection text on their own crawled page → steers the analysis, search strategy, and findSimilar seeds. Bounded today (user's own key, structured output, flag-dead in prod UI), but it's the same class of attacker-controlled content sanitized in one path and raw in the other. (Audit finding 2.2.)

Current State

  • Defense exists: src-tauri/src/recruiting/scoring/sanitize.rs (injection test at :62-67), applied at signal_extract.rs:56,:85
  • Unsanitized embeds: analyze_company (recruiting/intake/prod.rs:326-330) and analyze_profile (:358-363), both feeding provider.structured_output

Suggested Fix

  • Apply sanitize_untrusted_text to crawled text before embedding in both intake prompt builders
  • Port/extend the existing injection test to cover the intake path

Verification

  • cargo test passes including the new intake injection test
  • recruiting clippy clean

Automation Hints

scope: src-tauri/src/recruiting/intake/prod.rs, src-tauri/src/recruiting/scoring/sanitize.rs
do-not-touch: everything outside src-tauri/src/recruiting/
approach: extract-and-move
risk: low
max-files-changed: 3
blocked-by: none
bail-if: sanitizer needs intake-specific behavior changes (escalate instead)

Priority

Medium — fold into the FHR-90/91 pre-flip work (Linear).

Metadata

Metadata

Assignees

No one assigned

    Labels

    securitySecurity vulnerability or hardeningtech-debtEligible for automated overnight fixing

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions