
feat: add content-aware first-person experiential voice mode #103

Closed
erhanurgun wants to merge 1 commit into blader:main from erhanurgun:feat/first-person-experiential-voice


@erhanurgun

Summary

Motivation

Voice Calibration (#64) teaches voice by example. PERSONALITY AND SOUL covers tone (opinions, rhythm, soul). Neither tells the rewrite when the perspective itself should shift.

A large slice of suitable input (blogs, tutorials, retros, opinion pieces, personal guides) reads better when the rewrite speaks as the author recounting lived experience, not as a third party summarizing them. The existing "use 'I' when it fits" hint is too thin to do this consistently; it produces neutral sentences with "I" pasted on, not memory and judgment.

This PR adds an explicit, content-aware mode for that case, and an explicit list of where it should NOT run (encyclopedic, academic, technical reference, neutral journalism, legal/policy text).

Changes

SKILL.md:

  • Frontmatter description: append a paragraph describing the first-person experiential mode and its triggers (explicit phrases + content-type auto-detection).
  • New section ## FIRST-PERSON EXPERIENTIAL VOICE (content-aware) after PERSONALITY AND SOUL. Contents:
    • When to apply (auto + explicit) and when not to.
    • Six transformation rules with before/after examples (lived moments, path-to-claim, honest reactions, real time markers, owned judgments, visible mind-changes).
    • Anti-patterns (fake humility, padding, Reddit voice, universalizing, fabricated specifics, first-person on someone else's behalf).
    • Note on calibrating against a writing sample when one is provided.
    • Quick before/after example.
  • Process list: insert perspective-mode decision as step 2, and a first-person self-check as the new step 10 (active only in that mode).
  • version unchanged at 2.5.1.
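The perspective-mode decision added as Process step 2 lives in SKILL.md as prose, but its shape can be sketched in a few lines of Python. This is a hypothetical illustration: the trigger phrases and content-type labels are assumptions, and letting the exclusion list override an explicit request is one reading of the PR, not something it states.

```python
# Hypothetical sketch of the "perspective-mode decision" (Process step 2).
# SKILL.md expresses this as prose instructions; the trigger phrases and
# content-type labels below are illustrative assumptions, not the PR's lists.

EXPLICIT_TRIGGERS = ("make it personal", "write it as my experience")  # assumed
AUTO_TYPES = {"blog", "tutorial", "retro", "opinion", "personal_guide"}
EXCLUDED_TYPES = {
    "encyclopedic", "academic", "technical_reference",
    "neutral_journalism", "legal_policy",
}


def first_person_mode(request: str, content_type: str) -> bool:
    """Return True if the rewrite should speak as the author recounting experience."""
    wants_it = any(phrase in request.lower() for phrase in EXPLICIT_TRIGGERS)
    if content_type in EXCLUDED_TYPES:
        # Assumption: the exclusion list wins even over an explicit request.
        return False
    return wants_it or content_type in AUTO_TYPES
```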

Net change: additive. No existing behavior is altered for content where the mode is not triggered.

Test plan

  • Run humanizer on a sample blog post; verify first-person mode auto-triggers and the rewrite reads as lived experience, not a summary with "I" attached.
  • Run humanizer on a Wikipedia-style entry; verify mode does NOT trigger and output stays third person.
  • Run humanizer on a tutorial with the explicit phrase "make it personal"; verify mode triggers.
  • Confirm the new section contains zero em dashes and zero en dashes.
  • Confirm frontmatter still parses (name, version, description, license, compatibility, allowed-tools intact).
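The last two test-plan items are mechanical enough to script. A minimal sketch, assuming SKILL.md uses `## ` headings and a `---`-delimited frontmatter; `section_text`, `dash_count` and `frontmatter_keys` are hypothetical helpers, and the shallow line scan stands in for a real YAML parser.

```python
# Hypothetical checks for the last two test-plan items. Assumes SKILL.md uses
# '## ' headings and a '---'-delimited frontmatter; a shallow line scan stands
# in for a real YAML parser.

EM_DASH, EN_DASH = "\u2014", "\u2013"
REQUIRED_KEYS = {"name", "version", "description", "license",
                 "compatibility", "allowed-tools"}


def section_text(markdown: str, heading: str) -> str:
    """Body of the '## <heading>' section, stopping at the next '## ' heading."""
    body, inside = [], False
    for line in markdown.splitlines():
        if line.startswith("## "):
            inside = line[3:].strip() == heading
            continue
        if inside:
            body.append(line)
    return "\n".join(body)


def dash_count(text: str) -> int:
    return text.count(EM_DASH) + text.count(EN_DASH)


def frontmatter_keys(markdown: str) -> set:
    """Top-level keys between the opening and closing '---' lines."""
    lines = markdown.splitlines()
    if not lines or lines[0].strip() != "---":
        return set()
    keys = set()
    for line in lines[1:]:
        if line.strip() == "---":
            break
        # Indented lines belong to a multi-line value, not a new key.
        if line and not line[0].isspace() and ":" in line:
            keys.add(line.split(":", 1)[0].strip())
    return keys
```

In use: `dash_count(section_text(skill_md, "FIRST-PERSON EXPERIENTIAL VOICE (content-aware)")) == 0` and `REQUIRED_KEYS <= frontmatter_keys(skill_md)`, where `skill_md` is the file's text.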

Notes

Open PRs #96 and #98 both bump version to 2.6.0. This PR intentionally leaves version untouched so they can be coordinated. Happy to rebase and bump if you'd prefer it bundled.

Adds FIRST-PERSON EXPERIENTIAL VOICE section after PERSONALITY AND SOUL
with auto/explicit triggers, transformation rules, and anti-patterns.
Updates Process with perspective-mode decision and first-person self-check.
Frontmatter description extended; SKILL version unchanged (defer to
maintainer for next coordinated bump).
@blader
Owner

blader commented May 27, 2026

Closing — this is off the skill's goal. Humanizer removes AI tells while preserving meaning; rewriting text into invented first-person lived experience ('I sat there refreshing the page...') fabricates content. That's a different tool. Thanks for the thorough PR regardless.

blader closed this May 27, 2026
duathron added a commit to duathron/humanizer-ext that referenced this pull request May 27, 2026
5 OQs from the seed-catalogue extraction answered + appended to
docs/de-seed-catalogue.md as a binding decisions log for Task 5 + Task 6:

- OQ1: build DE #7 via Opus + Wikipedia-AI-Cleanup-Editor manual curation
- OQ2: include all 3 DE-only patterns (#102 Konjunktiv II, #103 Anglizismen,
  #104 Nominalstil)
- OQ3: exclude all 6 Wikipedia-context-only entries from patterns/de.md
- OQ4: include EN-PARALLEL #8 with DE forms (gilt als / dient als / etc.);
  verify in Task 5 mining
- OQ5: defer universal pattern DE-token extensions to Task 6 follow-up

Numbering reshuffle: #100 + #101 reserved for prose-applicable DE-only
patterns surfaced during Task 5; #102 + #103 + #104 firm assignments per OQ2.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
duathron added a commit to duathron/humanizer-ext that referenced this pull request May 28, 2026
…ources (Phase 2 Task 4)

All $0 sources per maintainer /goal 'minimize Phase 2 Task 4 budget'.

Source A (Wikipedia DE AI-Cleanup tagged articles, 30 docs, CC-BY-SA-3.0):
  Real-world DE prose flagged by humans as AI-suspected via the
  Vorlage:KI-generiert template. Fetched via embeddedin API (50+ tagged
  articles available; sampled 30 with fixed seed 42). Mix of substantial
  AI tells and borderline cases — human-verified suspect baseline.
  Examples: Sara Noxx, Digitales Schlafmonitoring, Synthetische Daten,
  Verband evangelischer Pfarrerinnen und Pfarrer, Moonton, Hybridtechnik.

Source B (Claude CLI subscription generation, 90 docs, MIT):
  6 domains × 5 topics × 3 models (sonnet/haiku/opus) = 90 samples via
  `claude -p` subscription ($0). DE prompt templates ask for stereotypical
  AI-style content. Cross-model variation for intra-Anthropic idiolect
  diversity. ANTHROPIC_API_KEY stripped from subprocess env per
  _shared.run_skill convention.
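A sketch of the Source B job grid and the key-stripping convention. The six domain names are borrowed from the DE E2E domains and the `--model` flag usage is an assumption; the commit only states 6 × 5 × 3 jobs run through `claude -p` with `ANTHROPIC_API_KEY` removed from the subprocess environment.

```python
import os
from typing import Optional

# Sketch of the Source B generation grid. Domain names and the `--model`
# flag usage are assumptions; only "6 domains x 5 topics x 3 models via
# `claude -p`" comes from the commit.

DOMAINS = ["casual", "academic", "legal", "technical", "marketing", "career"]
MODELS = ["sonnet", "haiku", "opus"]


def generation_jobs(topics_by_domain: dict) -> list:
    """One (domain, topic, model) job per cell of the grid."""
    return [
        (domain, topic, model)
        for domain in DOMAINS
        for topic in topics_by_domain[domain]
        for model in MODELS
    ]


def subscription_env(base: Optional[dict] = None) -> dict:
    """Copy of the environment without ANTHROPIC_API_KEY, presumably so the
    CLI uses the logged-in subscription rather than API billing."""
    env = dict(os.environ if base is None else base)
    env.pop("ANTHROPIC_API_KEY", None)
    return env


def claude_argv(prompt: str, model: str) -> list:
    # Would be passed to subprocess.run(argv, env=subscription_env(),
    # capture_output=True, text=True); not executed here.
    return ["claude", "-p", prompt, "--model", model]
```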

Source C (Opus main-thread inline synthesis, 12 docs, MIT):
  2 samples per domain × 6 domains. Engineered to exercise specific DE tells:
  #7 AI vocabulary, #102 Konjunktiv II stacking, #103 Anglizismen-Leakage,
  #104 Nominalstil-Inflation, plus EN-parallels #22/#10/#15/#16/#23/#24/
  #32/#36/#37. Act as calibration anchors for per-pattern eval testing.

Total: 132 docs / 728 KB. Comfortably exceeds plan target of 75-100.
Sufficient signal volume for Task 5 mine_patterns.py LLR scoring against
the 46-doc human corpus (340 KB). AI/human ratio ~2.9× by docs, ~2.1× by KB.
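For reference, the textbook Dunning log-likelihood ratio (G²) for one n-gram compares its count in the AI and human corpora on a 2×2 table. This sketch is not `mine_patterns.py`, whose normalization may differ, so it is not expected to reproduce the LLR values quoted in these commits; `llr` and `favours_ai` are hypothetical names.

```python
import math

# Standard Dunning log-likelihood ratio (G^2) on a 2x2 table for one n-gram.
# Textbook formula only; not claimed to match mine_patterns.py.


def llr(count_ai: int, count_human: int, total_ai: int, total_human: int) -> float:
    """count_*: occurrences of the n-gram; total_*: all n-grams in that corpus."""
    observed = [
        [count_ai, total_ai - count_ai],
        [count_human, total_human - count_human],
    ]
    grand = total_ai + total_human
    row = [total_ai, total_human]
    col = [count_ai + count_human, grand - count_ai - count_human]
    g2 = 0.0
    for i in range(2):
        for j in range(2):
            o = observed[i][j]
            if o > 0:  # 0 * log(0) is taken as 0
                expected = row[i] * col[j] / grand
                g2 += o * math.log(o / expected)
    return 2.0 * g2


def favours_ai(count_ai: int, count_human: int, total_ai: int, total_human: int) -> bool:
    """LLR is direction-free; keep only n-grams relatively more frequent in AI text."""
    return count_ai / total_ai > count_human / total_human
```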

All redistributable license (CC-BY-SA-3.0 for Wikipedia, MIT for Anthropic-
generated + Opus inline). No fair-use research-only content needed.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
duathron added a commit to duathron/humanizer-ext that referenced this pull request May 28, 2026
…up consensus

Mined the DE corpus (132 AI / 46 human docs) via mine_patterns.py LLR
scoring. Top 100 candidates routed through 3-voice writer persona panel
(Academic + Marketing Copywriter + Journalist) for ✓/✗/◐ vote per ngram.

Consolidated keep-list at docs/de-mined-patterns.md (saved this commit):

Strong consensus (unanimous ✓) — into patterns/de.md Task 6:
  #7 DE AI Vocabulary additions:
    darüber hinaus, zusammenfassend, ganzheitliche, vorliegenden,
    der vorliegenden, umfassende (cluster only), darstellt
  #100 (NEW reserved DE-only):
    Anchor: 'im Rahmen der vorliegenden [Arbeit/Studie/Untersuchung]'
    DE academic-frame boilerplate; no EN equivalent, highest LLR among
    DE-only candidates (rank #32, LLR 31.97, 31:0 ratio)
  #101 (NEW reserved DE-only):
    Anchor: '[es/zusammenfassend] lässt sich [sagen/feststellen/festhalten]'
    DE impersonal-reflexive AI hedge; no EN equivalent, multiple high-LLR
    forms (lässt sich rank #5 LLR 93.73; zusammenfassend lässt sich sagen
    rank #29 LLR 33.00)
  #12 DE meta-commentary extensions:
    zusammenfassend, wichtig zu (beachten/betonen), full #101 family

Cluster-only (◐ ADJUST) — flag with threshold logic:
  zentrale Rolle, umfassende, implementierung (non-tech), überzeugt (unanchored)

Skip (artifacts + common DE):
  hedging (metadata), queens (Wikipedia bleed), substantivketten / übergänge
  (Source C body refs), dass / es / sich / ich / meine / bin / mich (common
  DE function words / first-person genre artifact)
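The panel routing above could be tallied mechanically. Only "unanimous ✓ means strong consensus" comes from the commit; treating any ◐ without a ✗ as cluster-only and everything else as skip is an assumption, and the artifact-based skips (metadata, Wikipedia bleed) still need a human call. `route` is a hypothetical name.

```python
# Hypothetical tally for the 3-persona panel. Only "unanimous ✓ = strong
# consensus" comes from the commit; the cluster-only and skip rules below
# are assumptions for illustration.

KEEP, ADJUST, REJECT = "✓", "◐", "✗"


def route(votes):
    """votes: one symbol per persona (Academic, Copywriter, Journalist)."""
    if len(votes) != 3 or any(v not in (KEEP, ADJUST, REJECT) for v in votes):
        raise ValueError(f"expected three votes from ✓/◐/✗, got {votes!r}")
    if all(v == KEEP for v in votes):
        return "strong"        # goes into patterns/de.md
    if REJECT not in votes:
        return "cluster-only"  # at least one ◐ and no ✗: flag with threshold logic
    return "skip"
```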

Mining-script bug noted for v3.6.0: YAML frontmatter strip catches headers
but Opus inline synthesis demonstrably references metadata terms in body
(tells_targeted leaks via prose). Workaround: drop tells_targeted from
synthesis frontmatter next pass.
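A sketch of the frontmatter split and the proposed workaround, assuming `---`-delimited YAML headers. Splitting the header off cannot remove terms the model already copied into the body, which is why the fix drops `tells_targeted` before synthesis; `split_frontmatter` and `drop_key` are hypothetical helpers.

```python
# Sketch of the frontmatter handling, assuming '---'-delimited YAML headers.


def split_frontmatter(text: str):
    """Return (header_lines, body); text without a header yields ([], text)."""
    lines = text.splitlines()
    if len(lines) < 2 or lines[0].strip() != "---":
        return [], text
    for i in range(1, len(lines)):
        if lines[i].strip() == "---":
            return lines[1:i], "\n".join(lines[i + 1:])
    return [], text  # unterminated header: leave the text alone


def drop_key(header_lines, key: str = "tells_targeted"):
    """Remove a top-level key together with its indented or '- ' list lines."""
    kept, skipping = [], False
    for line in header_lines:
        starts_new_key = bool(line) and not line[0].isspace() and not line.startswith("-")
        if starts_new_key:
            skipping = line.split(":", 1)[0].strip() == key
        if not skipping:
            kept.append(line)
    return kept
```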

OQ assignments updated:
  #100 reassigned from 'first prose-applicable DE-only Wiki pattern surfaced
       during mining' (per maintainer doc) to the academic-frame boilerplate
       discovered as highest-LLR DE-only signal
  #101 reassigned to impersonal-reflexive Nominalstil sub-pattern
  #102/#103/#104 remain Konjunktiv II / Anglizismen / Nominalstil per
       maintainer OQ2 decision (still open for Task 6 implementation)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
duathron added a commit to duathron/humanizer-ext that referenced this pull request May 28, 2026
878-line German pattern pack mirroring patterns/en.md structure + extending
with 5 DE-only patterns (#100-#104) per maintainer decisions + Task 5 mining
consensus.

EN-PARALLEL patterns (translated to DE with DE-specific trigger words +
before/after examples): #1, #2, #3, #4, #5, #7, #8, #9, #10, #11, #12, #13,
#16, #20, #21, #22, #23, #24, #27, #28, #30, #31, #32, #33, #34, #35, #36,
#37 (28 patterns).

DE-only patterns (#100-#104, no EN equivalent):

  #100 Akademische Rahmen-Floskel: 'im Rahmen der vorliegenden
       [Arbeit/Studie/Untersuchung/Analyse]' bureaucratic self-reference.
       Mining-derived (LLR 31.97, 31:0 AI:human).

  #101 Impersonales Reflexiv: '[es/zusammenfassend] lässt sich
       [sagen/feststellen/festhalten/zeigen]' AI hedge construction.
       Mining-derived (LLR 93.73 bigram + 33.00 four-gram).

  #102 Konjunktiv II Stacking: 3+ würde/wäre/hätte/könnte forms in close
       proximity for vague hedging. Per maintainer OQ2.

  #103 Anglizismen-Leakage: denglisch business buzzwords (insight,
       deliver, leveragen, Pain Points, ganzheitliche Customer Journey).
       Per maintainer OQ2.

  #104 Nominalstil-Inflation: noun-heavy bureaucratic nominalization
       ('die Durchführung der Analyse' vs 'analysieren'). Per maintainer
       OQ2.
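Rough detectors for #101 and #102 might look like this. The regex and the 25-token proximity window are assumptions (the commit only says "3+ forms in close proximity"), and a real pass would still need to soften the legal and academic cases the override matrix marks as light. `REFLEXIVE_101` and `konjunktiv_stacked` are hypothetical names.

```python
import re

# Rough detectors for two DE-only patterns; regex and window are assumptions.

# #101: "[es/zusammenfassend] lässt sich [... up to 3 words ...] sagen/feststellen/..."
REFLEXIVE_101 = re.compile(
    r"\b(?:es|zusammenfassend)\s+lässt\s+sich\s+(?:\w+\s+){0,3}?"
    r"(?:sagen|feststellen|festhalten|zeigen)\b",
    re.IGNORECASE,
)

# #102: inflected Konjunktiv II forms of würde/wäre/hätte/könnte.
KONJUNKTIV_II = {
    "würde", "würden", "würdest", "würdet",
    "wäre", "wären", "wärest", "wärst", "wäret", "wärt",
    "hätte", "hätten", "hättest", "hättet",
    "könnte", "könnten", "könntest", "könntet",
}


def konjunktiv_stacked(text: str, threshold: int = 3, window: int = 25) -> bool:
    """True if `threshold` Konjunktiv II forms fall within `window` tokens."""
    tokens = re.findall(r"\w+", text.lower())
    hits = [i for i, tok in enumerate(tokens) if tok in KONJUNKTIV_II]
    return any(
        hits[k + threshold - 1] - hits[k] < window
        for k in range(len(hits) - threshold + 1)
    )
```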

DE PERSONALITY AND SOUL section mirrors EN with DE-appropriate register
notes. Critical addition: domain note excludes DE career writing from
soul-adding (DE Anschreiben register is formal-modest, opposite of US/UK
puffery — adding soul makes them weaker).

#7 DE AI Vocabulary trigger list: 33 phrases combining mined tokens
(darüber hinaus, zusammenfassend, ganzheitlich, vorliegenden, umfassende,
darstellt) with manually curated additions (vielfältig, facettenreich,
nachhaltig, innovativ, zukunftsweisend, transformativ, intuitiv,
nahtlos, robust, im Hinblick auf, vor diesem Hintergrund,
es ist wichtig zu betonen, zentrale Rolle spielen, etc.) per OQ1.

Excluded per OQ3: 6 Wikipedia-context-only DE-only entries flagged by
DE Wiki AI-Cleanup project (productivity spikes, citation format,
non-existent categories) — not applicable to general prose. Header
documents the exclusion so future contributors don't re-add them.

Tests: 207 → 211 passes (+4 DE pack tests: existence, expected pattern
IDs, PERSONALITY section presence, no overlap with universal pack).
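The content checks among the four DE pack tests (the existence test is a plain file check) can be sketched against the pack text. The `### <id>.` heading format and the `universal_ids` argument are assumptions about how `patterns/de.md` and the universal pack are laid out; `check_pack` is a hypothetical name.

```python
import re

# Hypothetical content checks for patterns/de.md. The heading format
# ("### 102. Konjunktiv II Stacking") is an assumption.

EN_PARALLEL = {1, 2, 3, 4, 5, 7, 8, 9, 10, 11, 12, 13, 16, 20, 21, 22, 23,
               24, 27, 28, 30, 31, 32, 33, 34, 35, 36, 37}
DE_ONLY = set(range(100, 105))
HEADING = re.compile(r"^#{2,4}\s+(\d+)\.", re.MULTILINE)


def pattern_ids(pack_text: str) -> set:
    return {int(n) for n in HEADING.findall(pack_text)}


def check_pack(pack_text: str, universal_ids) -> list:
    """Return a list of problems; an empty list means the pack passes."""
    problems = []
    ids = pattern_ids(pack_text)
    missing = (EN_PARALLEL | DE_ONLY) - ids
    if missing:
        problems.append(f"missing pattern IDs: {sorted(missing)}")
    if "## PERSONALITY AND SOUL" not in pack_text:
        problems.append("no PERSONALITY AND SOUL section")
    clash = ids & set(universal_ids)
    if clash:
        problems.append(f"IDs also defined in the universal pack: {sorted(clash)}")
    return problems
```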

Maintainer flagged for future review:
  #11 Elegant Variation: DE synonym system richer than EN, signal less sharp
  #34 Trailing Emphasis Fragments: less common in DE, signal stronger
       when present
  #36 Conditional Frame Stacking: overlaps with #102 Konjunktiv II;
       cross-referenced
  #8 Copula Avoidance: 'gilt als' legitimate legal term of art, apply
       lightly in legal domain
  #13 Passive Voice: DE academic uses passive more heavily than EN;
       SKIP in academic AND legal domains (will be enforced in Task 7
       domains/de_overrides.md)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
duathron added a commit to duathron/humanizer-ext that referenced this pull request May 28, 2026
…reer register (Phase 2 Task 7)

245-line DE override file mirroring domains/en_overrides.md schema +
extending with 5 DE-only pattern rows (#100-#104) + DE career section
(475 words) + DACH cultural-register inversion note.

Override matrix (22 rows × 6 columns: Pattern + 5 non-casual domains):
- All EN-PARALLEL pattern overrides translated to DE
- 5 maintainer-flagged DE-specific adaptations applied:
  - #11 Elegant Variation: light across all domains (DE richer synonym
    system; sharper signal lost when applied strictly)
  - #13 Passive Voice: SKIP in academic AND legal (DE academic uses
    passive MORE than EN; was SKIP only in academic for EN)
  - #8 Copula Avoidance: light in legal ('gilt als' legitimate legal
    term of art)
  - #34 Trailing Fragments: kept strict where EN was strict (less common
    in DE, signal stronger when present)
  - #36 / #102 cross-reference (Konjunktiv overlaps in academic + legal)
- 5 DE-only pattern rows:
  - #100 Akademische Rahmen-Floskel: strict everywhere (even academic)
  - #101 Impersonales Reflexiv: light in academic + legal, strict elsewhere
  - #102 Konjunktiv II: light in academic + legal, strict elsewhere
  - #103 Anglizismen-Leakage: light in technical + marketing, strict elsewhere
  - #104 Nominalstil-Inflation: SKIP in legal (DE Behördendeutsch),
    light in academic, strict elsewhere
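The override matrix is easiest to validate as data. This sketch encodes only the rows whose cells are fully stated in the commit; defaulting unlisted patterns and cells to `strict` is an assumption, and `row` / `handling` are hypothetical names.

```python
# Sketch of the DE override matrix as data (5 non-casual domain columns).
# Only rows fully stated in the commit are encoded; the "strict" default
# for unlisted cells and patterns is an assumption.

DOMAINS = ("academic", "legal", "technical", "marketing", "career")


def row(default: str = "strict", **cells: str) -> dict:
    unknown = set(cells) - set(DOMAINS)
    if unknown:
        raise ValueError(f"unknown domains: {sorted(unknown)}")
    return {d: cells.get(d, default) for d in DOMAINS}


DE_OVERRIDES = {
    11: row("light"),                              # Elegant Variation
    100: row(),                                    # strict everywhere
    101: row(academic="light", legal="light"),
    102: row(academic="light", legal="light"),
    103: row(technical="light", marketing="light"),
    104: row(legal="skip", academic="light"),      # Behördendeutsch in legal
}


def handling(pattern_id: int, domain: str) -> str:
    if domain not in DOMAINS:
        raise ValueError(f"{domain!r} is not a column of the override matrix")
    return DE_OVERRIDES.get(pattern_id, {}).get(domain, "strict")
```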

DE-specific domain guidance paragraphs:
- academic: DE passive + Nominalstil heavier than EN; #101 + #104 softened
- legal: Konjunktiv II for indirect speech is standard; #102 softened;
  'gilt als'/'fungiert als' can be legal terms of art
- technical: Anglizismen-Leakage softened (English tech terms unavoidable);
  flag denglisch verb constructions strictly
- marketing: Denglisch in marketing is register marker (softened); DE
  buzzword-in-phrase rule + brand-tier audit step + 5-point preserve-
  everything checklist with DE examples
- career: DACH formal-modest register (INVERSE of US/UK assertive
  self-promotion that EN career assumes). DE-specific AI tells: 'Mit
  großem Interesse', 'leidenschaftlich', 'ergebnisorientiert',
  'ganzheitlich denkend', 'es würde mich außerordentlich freuen', etc.
  All 5 career preserve rules in DE (Metriken sind heilig, Eigennamen +
  Daten + Titel, Fachvokabular, Stellenausschreibungs-Schlüsselphrasen,
  konkrete Achievement-Aussagen).
- casual: 'Ich' more weighty in DE; 'Man' constructions acceptable.
  Critical casual constraint (concept-noun preservation) translated.

Tests: 211 -> 214 (+3 DE override tests: existence, table+guidance,
pattern ID validity).

Maintainer flags for review:
- #11 'light' across ALL domains is broader than EN; tighten to strict in
  career if FP eval over-softens
- #15 included for EN symmetry, not strictly required by Task 7 spec
- #36/#102 cross-reference is in a trailing blockquote (not inline);
  may want inline academic + legal mentions for visibility

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
duathron added a commit to duathron/humanizer-ext that referenced this pull request May 28, 2026
One case per DE domain at evals/corpus/de/e2e/ai_<domain>_01.json:

  casual:    KI-Coding-Tools blog opening; tests concept-noun preservation
             (Iterationsgeschwindigkeit, Zusammenarbeit, Kreativität,
             organisatorische Agilität must survive humanization)
  academic:  Transformer-Architektur abstract; tests #100/#101/#104 in
             academic register (per de_overrides, #101 + #104 light,
             #100 strict everywhere); preserves multilingual corpora +
             cross-lingual transfer learning + low-resource fine-tuning
  legal:     Datenschutzklausel; tests DE legal register (#13 + #24 +
             #104 SKIP per de_overrides); preserves DSGVO compliance,
             72h Meldepflicht, Drittstaatenübermittlung, Art. 33 DSGVO
  technical: DataFlow CLI README intro; tests #15/#16 SKIP +
             #103 Anglizismen light + fabrication check (don't invent
             'exponential' backoff); preserves Kubernetes/Go/PostgreSQL
             + transiente Fehler + Backoff-Strategie
  marketing: AuraSound One smart speaker landing copy; tests #4 SKIP +
             #32 light + #103 light; preserves product name +
             360°-Surround + Smart-Home-Integration + dimming + Premium
             tier + 'für deinen Alltag' lifestyle hook (5-point checklist)
  career:    DE Anschreiben for Senior Software Engineer; tests INVERSE
             register (formal-modest 'Sie', NOT US/UK puffery); preserves
             metrics-are-sacred (18mo migration, 40% p99 latency,
             3x scale, Kubernetes/Go/PostgreSQL stack, Stellentitel,
             Firmenname, DSGVO contribution); strips chatbot opener +
             sycophancy + AI-CV-clichés (leidenschaftlich,
             ergebnisorientiert, ganzheitliches Verständnis)
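A possible shape for one E2E case plus the two checks these cases exercise: preserved terms survive verbatim and AI clichés are gone. The JSON field names (`domain`, `input`, `must_preserve`, `must_strip`) are assumptions; the commit does not show the schema of `ai_<domain>_01.json`.

```python
import json

# Hypothetical E2E case shape and output checks; field names are assumptions.

REQUIRED_FIELDS = ("domain", "input", "must_preserve")


def load_case(raw: str) -> dict:
    case = json.loads(raw)
    missing = [f for f in REQUIRED_FIELDS if f not in case]
    if missing:
        raise ValueError(f"case is missing fields: {missing}")
    case.setdefault("must_strip", [])
    return case


def missing_terms(output: str, must_preserve: list) -> list:
    """Preserved terms (metrics, names, DSGVO refs) must survive verbatim."""
    return [term for term in must_preserve if term not in output]


def leftover_tells(output: str, must_strip: list) -> list:
    """AI clichés that should be gone, matched case-insensitively."""
    low = output.lower()
    return [tell for tell in must_strip if tell.lower() in low]
```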

Each case engineered to exercise specific DE patterns + maintainer-flagged
register handling per docs/de-corpus-sources.md + domains/de_overrides.md.

Comparable to EN E2E (6 cases at v3.4.1 incl. career). Volume target met.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>