feat: add content-aware first-person experiential voice mode #103
Closed
erhanurgun wants to merge 1 commit into
Adds FIRST-PERSON EXPERIENTIAL VOICE section after PERSONALITY AND SOUL with auto/explicit triggers, transformation rules, and anti-patterns. Updates Process with perspective-mode decision and first-person self-check. Frontmatter description extended; SKILL version unchanged (defer to maintainer for next coordinated bump).
Owner
Closing: this is off the skill's goal. Humanizer removes AI tells while preserving meaning; rewriting text into invented first-person lived experience ('I sat there refreshing the page...') fabricates content. That's a different tool. Thanks for the thorough PR regardless.
duathron added a commit to duathron/humanizer-ext that referenced this pull request on May 27, 2026
5 OQs from the seed-catalogue extraction answered + appended to docs/de-seed-catalogue.md as a binding decisions log for Task 5 + Task 6:
- OQ1: build DE blader#7 via Opus + Wikipedia-AI-Cleanup-Editor manual curation
- OQ2: include all 3 DE-only patterns (blader#102 Konjunktiv II, blader#103 Anglizismen, blader#104 Nominalstil)
- OQ3: exclude all 6 Wikipedia-context-only entries from patterns/de.md
- OQ4: include EN-PARALLEL blader#8 with DE forms (gilt als / dient als / etc.); verify in Task 5 mining
- OQ5: defer universal pattern DE-token extensions to Task 6 follow-up
Numbering reshuffle: blader#100 + blader#101 reserved for prose-applicable DE-only patterns surfaced during Task 5; blader#102 + blader#103 + blader#104 firm assignments per OQ2.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
duathron added a commit to duathron/humanizer-ext that referenced this pull request on May 28, 2026
…ources (Phase 2 Task 4)
All $0 sources per maintainer /goal 'minimize Phase 2 Task 4 budget'.
Source A (Wikipedia DE AI-Cleanup tagged articles, 30 docs, CC-BY-SA-3.0): Real-world DE prose flagged by humans as AI-suspected via the Vorlage:KI-generiert template. Fetched via embeddedin API (50+ tagged articles available; sampled 30 with fixed seed 42). Mix of substantial AI tells and borderline cases: a human-verified suspect baseline. Examples: Sara Noxx, Digitales Schlafmonitoring, Synthetische Daten, Verband evangelischer Pfarrerinnen und Pfarrer, Moonton, Hybridtechnik.
Source B (Claude CLI subscription generation, 90 docs, MIT): 6 domains × 5 topics × 3 models (sonnet/haiku/opus) = 90 samples via `claude -p` subscription ($0). DE prompt templates ask for stereotypical AI-style content. Cross-model variation for intra-Anthropic idiolect diversity. ANTHROPIC_API_KEY stripped from subprocess env per _shared.run_skill convention.
Source C (Opus main-thread inline synthesis, 12 docs, MIT): 2 samples per domain × 6 domains. Engineered to exercise specific DE tells: blader#7 AI vocabulary, blader#102 Konjunktiv II stacking, blader#103 Anglizismen-Leakage, blader#104 Nominalstil-Inflation, plus EN-parallels blader#22/blader#10/blader#15/blader#16/blader#23/blader#24/blader#32/blader#36/blader#37. Act as calibration anchors for per-pattern eval testing.
Total: 132 docs / 728 KB. Comfortably exceeds plan target of 75-100. Sufficient signal volume for Task 5 mine_patterns.py LLR scoring against the 46-doc human corpus (340 KB). AI/human ratio 2.1× by docs, 2.1× by KB.
All redistributable license (CC-BY-SA-3.0 for Wikipedia, MIT for Anthropic-generated + Opus inline). No fair-use research-only content needed.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
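The corpus arithmetic in this commit (6 domains × 5 topics × 3 models = 90, and 30 articles sampled with fixed seed 42) can be sketched as follows. This is a minimal illustration, not the repository's script: the domain names are assumed to match the six DE domains named elsewhere in this thread, and the topic labels, `generation_plan`, and `sample_articles` are hypothetical.

```python
import itertools
import random

# Assumption: the six domains are the ones used for the DE e2e cases.
DOMAINS = ["casual", "academic", "legal", "technical", "marketing", "career"]
MODELS = ["sonnet", "haiku", "opus"]
TOPICS_PER_DOMAIN = 5


def generation_plan():
    """Enumerate every (domain, topic, model) combination once."""
    plan = []
    for domain, topic_idx, model in itertools.product(
        DOMAINS, range(TOPICS_PER_DOMAIN), MODELS
    ):
        # Topic labels are placeholders; the real topics are not in the commit.
        plan.append(
            {"domain": domain, "topic": f"{domain}_topic_{topic_idx + 1}", "model": model}
        )
    return plan


def sample_articles(titles, k=30, seed=42):
    """Draw a reproducible sample: sort first so input order cannot change it."""
    rng = random.Random(seed)
    return rng.sample(sorted(titles), k)
```

Sorting before sampling is a design choice of the sketch: an API may return tagged articles in a different order between runs, and a fixed seed alone would not make the sample reproducible in that case.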
duathron added a commit to duathron/humanizer-ext that referenced this pull request on May 28, 2026
…up consensus
Mined the DE corpus (132 AI / 46 human docs) via mine_patterns.py LLR scoring. Top 100 candidates routed through 3-voice writer persona panel (Academic + Marketing Copywriter + Journalist) for ✓/✗/◐ vote per ngram. Consolidated keep-list at docs/de-mined-patterns.md (saved this commit):
Strong consensus (unanimous ✓), into patterns/de.md Task 6:
- blader#7 DE AI Vocabulary additions: darüber hinaus, zusammenfassend, ganzheitliche, vorliegenden, der vorliegenden, umfassende (cluster only), darstellt
- blader#100 (NEW reserved DE-only): Anchor: 'im Rahmen der vorliegenden [Arbeit/Studie/Untersuchung]' DE academic-frame boilerplate; no EN equivalent, highest LLR among DE-only candidates (rank #32, LLR 31.97, 31:0 ratio)
- blader#101 (NEW reserved DE-only): Anchor: '[es/zusammenfassend] lässt sich [sagen/feststellen/festhalten]' DE impersonal-reflexive AI hedge; no EN equivalent, multiple high-LLR forms (lässt sich rank #5 LLR 93.73; zusammenfassend lässt sich sagen rank #29 LLR 33.00)
- blader#12 DE meta-commentary extensions: zusammenfassend, wichtig zu (beachten/betonen), full blader#101 family
Cluster-only (◐ ADJUST), flag with threshold logic: zentrale Rolle, umfassende, implementierung (non-tech), überzeugt (unanchored)
Skip (artifacts + common DE): hedging (metadata), queens (Wikipedia bleed), substantivketten / übergänge (Source C body refs), dass / es / sich / ich / meine / bin / mich (common DE function words / first-person genre artifact)
Mining-script bug noted for v3.6.0: YAML frontmatter strip catches headers but Opus inline synthesis demonstrably references metadata terms in body (tells_targeted leaks via prose). Workaround: drop tells_targeted from synthesis frontmatter next pass.
OQ assignments updated:
- blader#100 reassigned from 'first prose-applicable DE-only Wiki pattern surfaced during mining' (per maintainer doc) to the academic-frame boilerplate discovered as highest-LLR DE-only signal
- blader#101 reassigned to impersonal-reflexive Nominalstil sub-pattern
- blader#102/blader#103/blader#104 remain Konjunktiv II / Anglizismen / Nominalstil per maintainer OQ2 decision (still open for Task 6 implementation)
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
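For readers unfamiliar with LLR scoring, here is a minimal sketch of the log-likelihood ratio (Dunning's G²) for one n-gram across two corpora. It is an assumption that mine_patterns.py uses this exact formulation; the function name `llr` and its arguments are illustrative, and the sketch does not attempt to reproduce the scores reported above, since the corpus token counts are not given.

```python
import math


def _term(observed, expected):
    """One cell of the G^2 sum; an empty cell contributes nothing."""
    return observed * math.log(observed / expected) if observed > 0 else 0.0


def llr(k_ai, n_ai, k_human, n_human):
    """G^2 for an n-gram seen k_* times among n_* n-grams in each corpus.

    High values mean the n-gram's rate differs strongly between the
    AI corpus and the human corpus; equal rates give 0.
    """
    total_k = k_ai + k_human
    total_n = n_ai + n_human
    # 2x2 contingency table: (this n-gram, any other) x (AI, human)
    observed = [k_ai, n_ai - k_ai, k_human, n_human - k_human]
    expected = [
        n_ai * total_k / total_n,
        n_ai * (total_n - total_k) / total_n,
        n_human * total_k / total_n,
        n_human * (total_n - total_k) / total_n,
    ]
    return 2.0 * sum(_term(o, e) for o, e in zip(observed, expected))
```

The statistic is symmetric, so a miner would also compare the two rates to keep only n-grams over-represented on the AI side. A 31:0 split such as the one reported for the academic-frame boilerplate scores high because the human-side cell is empty.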
duathron added a commit to duathron/humanizer-ext that referenced this pull request on May 28, 2026
878-line German pattern pack mirroring patterns/en.md structure + extending with 5 DE-only patterns (blader#100-blader#104) per maintainer decisions + Task 5 mining consensus.
EN-PARALLEL patterns (translated to DE with DE-specific trigger words + before/after examples): blader#1, blader#2, blader#3, blader#4, blader#5, blader#7, blader#8, blader#9, blader#10, blader#11, blader#12, blader#13, blader#16, blader#20, blader#21, blader#22, blader#23, blader#24, blader#27, blader#28, blader#30, blader#31, blader#32, blader#33, blader#34, blader#35, blader#36, blader#37 (28 patterns).
DE-only patterns (blader#100-blader#104, no EN equivalent):
- blader#100 Akademische Rahmen-Floskel: 'im Rahmen der vorliegenden [Arbeit/Studie/Untersuchung/Analyse]' bureaucratic self-reference. Mining-derived (LLR 31.97, 31:0 AI:human).
- blader#101 Impersonales Reflexiv: '[es/zusammenfassend] lässt sich [sagen/feststellen/festhalten/zeigen]' AI hedge construction. Mining-derived (LLR 93.73 bigram + 33.00 four-gram).
- blader#102 Konjunktiv II Stacking: 3+ würde/wäre/hätte/könnte forms in close proximity for vague hedging. Per maintainer OQ2.
- blader#103 Anglizismen-Leakage: denglisch business buzzwords (insight, deliver, leveragen, Pain Points, ganzheitliche Customer Journey). Per maintainer OQ2.
- blader#104 Nominalstil-Inflation: noun-heavy bureaucratic verbing ('die Durchführung der Analyse' vs 'analysieren'). Per maintainer OQ2.
DE PERSONALITY AND SOUL section mirrors EN with DE-appropriate register notes. Critical addition: domain note excludes DE career writing from soul-adding (DE Anschreiben register is formal-modest, opposite of US/UK puffery; adding soul makes them weaker).
blader#7 DE AI Vocabulary trigger list: 33 phrases combining mined tokens (darüber hinaus, zusammenfassend, ganzheitlich, vorliegenden, umfassende, darstellt) with manually curated additions (vielfältig, facettenreich, nachhaltig, innovativ, zukunftsweisend, transformativ, ganzheitlich, intuitiv, nahtlos, robust, im Hinblick auf, vor diesem Hintergrund, es ist wichtig zu betonen, zentrale Rolle spielen, etc.) per OQ1.
Excluded per OQ3: 6 Wikipedia-context-only DE-only entries flagged by DE Wiki AI-Cleanup project (productivity spikes, citation format, non-existent categories); not applicable to general prose. Header documents the exclusion so future contributors don't re-add them.
Tests: 207 → 211 passes (+4 DE pack tests: existence, expected pattern IDs, PERSONALITY section presence, no overlap with universal pack).
Maintainer flagged for future review:
- blader#11 Elegant Variation: DE synonym system richer than EN, less sharp
- blader#34 Trailing Emphasis Fragments: less common in DE, signal stronger when present
- blader#36 Conditional Frame Stacking: overlaps with blader#102 Konjunktiv II; cross-referenced
- blader#8 Copula Avoidance: 'gilt als' legitimate legal term of art, apply lightly in legal domain
- blader#13 Passive Voice: DE academic uses passive more heavily than EN; SKIP in academic AND legal domains (will be enforced in Task 7 domains/de_overrides.md)
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
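The Konjunktiv II stacking rule described above ("3+ würde/wäre/hätte/könnte forms in close proximity") could be checked mechanically along these lines. This is a rough sketch, not the pattern pack's logic: the commit does not define "close proximity", so the 40-token window is an assumption, and `has_konjunktiv_stacking` is a hypothetical name.

```python
import re

# The four stems named in the commit, plus common person/number endings.
KONJUNKTIV_FORM = re.compile(r"(?:würde|wäre|hätte|könnte)(?:n|t|st)?")


def has_konjunktiv_stacking(text, min_hits=3, window=40):
    """Return True if min_hits Konjunktiv II forms fall within `window` tokens.

    `window` is an assumed stand-in for "close proximity".
    """
    tokens = re.findall(r"\w+", text.lower())
    hits = [i for i, tok in enumerate(tokens) if KONJUNKTIV_FORM.fullmatch(tok)]
    # Slide over consecutive groups of min_hits matches and measure their span.
    for start in range(len(hits) - min_hits + 1):
        if hits[start + min_hits - 1] - hits[start] < window:
            return True
    return False
```

A single polite "Ich würde gern kommen" stays below the threshold, while a sentence that chains wäre, könnte, and hätte is flagged. Legitimate indirect speech would also trip this check, which is presumably why the rule is softened in some domains.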
duathron added a commit to duathron/humanizer-ext that referenced this pull request on May 28, 2026
…reer register (Phase 2 Task 7)
245-line DE override file mirroring domains/en_overrides.md schema + extending with 5 DE-only pattern rows (blader#100-blader#104) + DE career section (475 words) + DACH cultural-register inversion note.
Override matrix (22 rows × 6 columns: Pattern + 5 non-casual domains):
- All EN-PARALLEL pattern overrides translated to DE
- 5 maintainer-flagged DE-specific adaptations applied:
  - blader#11 Elegant Variation: light across all domains (DE richer synonym system; sharper signal lost when applied strictly)
  - blader#13 Passive Voice: SKIP in academic AND legal (DE academic uses passive MORE than EN; was SKIP only in academic for EN)
  - blader#8 Copula Avoidance: light in legal ('gilt als' legitimate legal term of art)
  - blader#34 Trailing Fragments: kept strict where EN was strict (less common in DE, signal stronger when present)
  - blader#36 / blader#102 cross-reference (Konjunktiv overlaps in academic + legal)
- 5 DE-only pattern rows:
  - blader#100 Akademische Rahmen-Floskel: strict everywhere (even academic)
  - blader#101 Impersonales Reflexiv: light in academic + legal, strict elsewhere
  - blader#102 Konjunktiv II: light in academic + legal, strict elsewhere
  - blader#103 Anglizismen-Leakage: light in technical + marketing, strict elsewhere
  - blader#104 Nominalstil-Inflation: SKIP in legal (DE Behördendeutsch), light in academic, strict elsewhere
DE-specific domain guidance paragraphs:
- academic: DE passive + Nominalstil heavier than EN; blader#101 + blader#104 softened
- legal: Konjunktiv II for indirect speech is standard; blader#102 softened; 'gilt als'/'fungiert als' can be legal terms of art
- technical: Anglizismen-Leakage softened (English tech terms unavoidable); flag denglisch verb constructions strictly
- marketing: Denglisch in marketing is register marker (softened); DE buzzword-in-phrase rule + brand-tier audit step + 5-point preserve-everything checklist with DE examples
- career: DACH formal-modest register (INVERSE of US/UK assertive self-promotion that EN career assumes). DE-specific AI tells: 'Mit großem Interesse', 'leidenschaftlich', 'ergebnisorientiert', 'ganzheitlich denkend', 'es würde mich außerordentlich freuen', etc. All 5 career preserve rules in DE (Metriken sind heilig, Eigennamen + Daten + Titel, Fachvokabular, Stellenausschreibungs-Schlüsselphrasen, konkrete Achievement-Aussagen).
- casual: 'Ich' more weighty in DE; 'Man' constructions acceptable. Critical casual constraint (concept-noun preservation) translated.
Tests: 211 -> 214 (+3 DE override tests: existence, table+guidance, pattern ID validity).
Maintainer flags for review:
- blader#11 'light' across ALL domains is broader than EN; relax to strict in career if FP eval over-softens
- blader#15 included for EN symmetry, not strictly required by Task 7 spec
- blader#36/blader#102 cross-reference is in a trailing blockquote (not inline); may want inline academic + legal mentions for visibility
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
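The five DE-only rows of the override matrix can be written down as a small lookup table. This sketch encodes only what the commit message states for patterns 100-104; the dictionary layout, the `row` helper, and `strictness` are hypothetical, and the real domains/de_overrides.md is a Markdown table whose exact schema is not shown here.

```python
# The five non-casual domains that form the matrix columns.
DOMAINS = ("academic", "legal", "technical", "marketing", "career")


def row(default, **exceptions):
    """Build one matrix row: a default level plus per-domain exceptions."""
    return {domain: exceptions.get(domain, default) for domain in DOMAINS}


# Levels as stated in the commit message for the DE-only patterns.
DE_ONLY_OVERRIDES = {
    100: row("strict"),                                      # strict everywhere
    101: row("strict", academic="light", legal="light"),
    102: row("strict", academic="light", legal="light"),
    103: row("strict", technical="light", marketing="light"),
    104: row("strict", legal="skip", academic="light"),
}


def strictness(pattern_id, domain):
    """Look up how strictly a DE-only pattern applies in a matrix domain."""
    if domain not in DOMAINS:
        # The matrix has no casual column, so the sketch refuses to guess.
        raise ValueError(f"no override column for domain {domain!r}")
    return DE_ONLY_OVERRIDES[pattern_id][domain]
```

For example, `strictness(104, "legal")` yields `"skip"`, matching the Behördendeutsch exception, while `strictness(100, "academic")` stays `"strict"`.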
duathron added a commit to duathron/humanizer-ext that referenced this pull request on May 28, 2026
One case per DE domain at evals/corpus/de/e2e/ai_<domain>_01.json:
casual — KI-Coding-Tools blog opening; tests concept-noun preservation
(Iterationsgeschwindigkeit, Zusammenarbeit, Kreativität,
organisatorische Agilität must survive humanization)
academic — Transformer-Architektur abstract; tests blader#100/blader#101/blader#104 in
academic register (per de_overrides, blader#101 + blader#104 light,
blader#100 strict everywhere); preserves multilingual corpora +
cross-lingual transfer learning + low-resource fine-tuning
legal — Datenschutzklausel; tests DE legal register (blader#13 + blader#24 +
blader#104 SKIP per de_overrides); preserves DSGVO compliance,
72h Meldepflicht, Drittstaatenübermittlung, Art. 33 DSGVO
technical — DataFlow CLI README intro; tests blader#15/blader#16 SKIP +
blader#103 Anglizismen light + fabrication check (don't invent
'exponential' backoff); preserves Kubernetes/Go/PostgreSQL
+ transiente Fehler + Backoff-Strategie
marketing — AuraSound One smart speaker landing copy; tests blader#4 SKIP +
blader#32 light + blader#103 light; preserves product name + 360°-
Surround + Smart-Home-Integration + dimming + Premium tier
+ 'für deinen Alltag' lifestyle hook (5-point checklist)
career — DE Anschreiben for Senior Software Engineer; tests INVERSE
register (formal-modest 'Sie', NOT US/UK puffery); preserves
metrics-are-sacred (18mo migration, 40% p99 latency,
3x scale, Kubernetes/Go/PostgreSQL stack, Stellentitel,
Firmenname, DSGVO contribution); strips chatbot opener +
sycophancy + AI-CV-clichés (leidenschaftlich,
ergebnisorientiert, ganzheitliches Verständnis)
Each case engineered to exercise specific DE patterns + maintainer-flagged
register handling per docs/de-corpus-sources.md + domains/de_overrides.md.
Comparable to EN E2E (6 cases at v3.4.1 incl. career). Volume target met.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
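The "preserves ..." and "strips ..." expectations in these cases suggest a simple mechanical check on humanized output. The sketch below assumes a case shape with `must_preserve` and `must_strip` lists; those field names and `check_case` are hypothetical, since the schema of the ai_<domain>_01.json files is not shown in the commit.

```python
def check_case(case, output):
    """Report preserved-term and stripped-term violations for one e2e case.

    Matching is case-insensitive substring matching, so 'leidenschaftlich'
    also catches inflected forms such as 'leidenschaftlicher'.
    """
    lowered = output.casefold()
    missing = [
        term for term in case.get("must_preserve", []) if term.casefold() not in lowered
    ]
    leaked = [
        term for term in case.get("must_strip", []) if term.casefold() in lowered
    ]
    return {"missing": missing, "leaked": leaked, "passed": not missing and not leaked}


# Illustrative case loosely modelled on the career description above.
career_case = {
    "must_preserve": ["40%", "Kubernetes", "PostgreSQL"],
    "must_strip": ["leidenschaftlich", "ergebnisorientiert"],
}
```

A check like this only covers literal survival of terms. The fabrication check mentioned for the technical case (not inventing 'exponential' backoff) would need a comparison against the input as well.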
Summary
Adds FIRST-PERSON EXPERIENTIAL VOICE (content-aware) to SKILL.md, placed after PERSONALITY AND SOUL.
Motivation
Voice Calibration (#64) teaches voice by example. PERSONALITY AND SOUL covers tone (opinions, rhythm, soul). Neither tells the rewrite when the perspective itself should shift.
A large slice of suitable input (blogs, tutorials, retros, opinion pieces, personal guides) reads better when the rewrite speaks as the author recounting lived experience, not as a third party summarizing them. The existing "use 'I' when it fits" hint is too thin to do this consistently; it produces neutral sentences with "I" pasted on, not memory and judgment.
This PR adds an explicit, content-aware mode for that case, and an explicit list of where it should NOT run (encyclopedic, academic, technical reference, neutral journalism, legal/policy text).
Changes
SKILL.md: ## FIRST-PERSON EXPERIENTIAL VOICE (content-aware) after PERSONALITY AND SOUL. Contents: auto/explicit triggers, transformation rules, and anti-patterns.
version unchanged at 2.5.1.
Net change: additive. No existing behavior is altered for content where the mode is not triggered.
Test plan
Notes
Open PRs #96 and #98 both bump version to 2.6.0. This PR intentionally leaves version untouched so they can be coordinated. Happy to rebase and bump if you'd prefer it bundled.