feat(site-migrator): generate YAML frontmatter from nitpicker DB and prepend to extracted pages #904

Merged
YusukeHirao merged 3 commits into dev from feat/site-migrator-frontmatter-from-db on Jun 18, 2026

YusukeHirao commented Jun 18, 2026

Summary

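  • Completely replaces frontmatter generation in `@d-zero/site-migrator`: the parse5-based approach (`extractFrontmatter`) gives way to direct reads from the `.nitpicker` DB (`getFrontmatter` + `@nitpicker/query`'s `getPageDetail`).
  • Adds `formatFrontmatter`, which generates a `---\n…\n---\n` YAML block compatible with the downstream scaffold pipeline, and integrates it into the `extractPages` pipeline (`getPageHtml` / `getFrontmatter` → `extractMainContent` → prepend → `writeFile`).
  • The output `.html` is an intermediate artifact: a YAML frontmatter block followed by the HTML with its layout stripped. It is meant to be consumed by the downstream scaffold pipeline.
  • Planned follow-ups: integrating asset-reference rewriting (separate PR) / connecting to the scaffold project structure (separate PR).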

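Key changes

| File | Description |
| --- | --- |
| `src/archive/get-frontmatter.ts` (new) | Calls `@nitpicker/query`'s `getPageDetail` and maps between the flat meta columns and `Frontmatter`. Title fields go through `splitTitle`; whitespace-only columns are treated as empty |
| `src/html/split-title.ts` (new) | `splitTitle`, extracted from the old `extractFrontmatter` as a pure function with zero parse5 dependency |
| `src/html/format-frontmatter.ts` (new) | Produces scaffold-pipeline-compatible output with js-yaml (`forceQuotes` / `quotingType: '"'` / `lineWidth: -1` / `indent: 2` / `noRefs`). `og.*` / `twitter.*` are nested maps; empty ones are omitted as a whole subtree |
| `src/page-extractor/extract-pages.ts` | Runs `getPageHtml` / `getFrontmatter` in parallel with `Promise.allSettled`. Meta failures are fail-soft (the body is still written) |
| `src/types.ts` | Removes `TwitterFrontmatter.url` (no corresponding DB column; `og.url` is used instead) |
| `src/index.ts` | Removes `extractFrontmatter` from the exports; adds `getFrontmatter` / `formatFrontmatter` / `splitTitle` |
| `src/html/extract-frontmatter.{ts,spec.ts}` | Deleted |
| `src/cli.ts` | Adds a "YAML frontmatter prepended" overview at the top of `USAGE` |
| `package.json` | Bumps `@nitpicker/{crawler,query}`; adds `js-yaml` + `@types/js-yaml` |
| `README.md` | Reflects the move to the DB and the known limitations (Quirks Mode, etc.) |
| `cspell.json` | Adds "absolutised" to the dictionary (separate commit) |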

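Design decisions

  • The DB is the canonical source for metadata: a new nitpicker release expanded the beholder-based flat meta into ~47 columns, so there is no longer any need to go through parse5.
  • `rawTitle` is reproduced via `splitTitle`: the DB has no `rawTitle` column, but applying the split operation to the full text of `pages.title` yields an equivalent result.
  • Metadata is best-effort: when `getFrontmatter` throws, only the HTML body is written. Body extraction has already succeeded at that point, so letting a metadata failure take the body down with it would make the loss asymmetric.
  • `twitter.url` removed: the DB does not store it. This follows the convention of substituting `og.url`.
  • YAML style: pinned for compatibility with the downstream scaffold pipeline.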

Tests

  • 93 unit tests pass (of which this PR adds 7 in get-frontmatter.spec.ts, 7 in split-title.spec.ts, 9 in format-frontmatter.spec.ts, 3 in extract-pages.spec.ts, and 1 in migrate.spec.ts).
  • Opened the output `.html` via `file://` with chrome-devtools MCP (canary + flags enabled) and checked the DOM, console, and network after the frontmatter prepend. The Quirks Mode warning is expected for an intermediate artifact (stated in the README).

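Notes

`npmMinimalAgeGate: 7d` in `.yarnrc.yml` is unchanged. The newly released `@nitpicker/{crawler,query}` are already pinned in yarn.lock, and `yarn install` resolves from the lockfile rather than performing a fresh resolution, so it passes the 7-day gate (the gate runs only at resolution time, e.g. `yarn add`).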

Test plan

  • yarn build
  • yarn lint
  • yarn test
  • Run `dz-migrate` against a sample `.nitpicker` and visually confirm that the output `.html` starts with a YAML frontmatter block, that `og.*` / `twitter.*` are nested, and that empty fields are omitted

🤖 Generated with Claude Code

Used in the site-migrator JSDoc and README for the next commit to describe
nitpicker DB columns that store pre-absolutised URLs (canonical, og:image,
twitter:image, etc.).
…prepend to extracted pages

Switch frontmatter generation away from parse5-based HTML scraping (the now-
deleted `extractFrontmatter`) onto direct reads from the `.nitpicker` DB via
`@nitpicker/query`'s `getPageDetail`, which exposes the page meta as a flat
column schema. The new `getFrontmatter(session, url)` maps each non-empty DB
column onto the existing `Frontmatter` shape, runs title-shaped fields through
`splitTitle` (because the DB stores the full `<title>` text and has no pre-split
column), and drops whitespace-only / null values so downstream consumers never
see placeholder strings.
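
The mapping described above can be sketched roughly as follows. This is an illustration only: the column names (`og_title`, `twitter_card`, and so on), the `FlatMeta` type, and the `toFrontmatter` helper are assumptions, since the actual `.nitpicker` schema and the `Frontmatter` type are not shown here, and the `splitTitle` step is left out.

```typescript
// Hypothetical flat row; the real column names in the .nitpicker DB are not shown in this PR.
type FlatMeta = Record<string, string | null | undefined>;

// Simplified stand-in for the package's Frontmatter type (assumed shape).
interface Frontmatter {
  title?: string;
  description?: string;
  og?: Record<string, string>;
  twitter?: Record<string, string>;
}

/** Returns the trimmed value, or undefined for null / whitespace-only input. */
function clean(value: string | null | undefined): string | undefined {
  const trimmed = value?.trim();
  return trimmed ? trimmed : undefined;
}

/** Collects non-empty columns sharing a prefix (e.g. "og_") into one nested map. */
function collect(row: FlatMeta, prefix: string): Record<string, string> | undefined {
  const out: Record<string, string> = {};
  for (const [column, raw] of Object.entries(row)) {
    const value = clean(raw);
    if (column.startsWith(prefix) && value !== undefined) {
      out[column.slice(prefix.length)] = value;
    }
  }
  // Omit the whole subtree when nothing survived.
  return Object.keys(out).length > 0 ? out : undefined;
}

function toFrontmatter(row: FlatMeta): Frontmatter {
  const fm: Frontmatter = {};
  const title = clean(row["title"]);
  if (title !== undefined) fm.title = title;
  const description = clean(row["description"]);
  if (description !== undefined) fm.description = description;
  const og = collect(row, "og_");
  if (og) fm.og = og;
  const twitter = collect(row, "twitter_");
  if (twitter) fm.twitter = twitter;
  return fm;
}

// Whitespace-only, empty, and null columns disappear; `twitter` is omitted entirely.
const example = toFrontmatter({
  title: "Home | Example",
  description: "   ",
  og_title: "Home",
  og_image: null,
  twitter_card: "",
});
// example is { title: "Home | Example", og: { title: "Home" } }
```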

`splitTitle` is lifted out of the old parse5 extractor into its own
`html/split-title.ts` module — pure string operation, parse5-free.
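
For readers unfamiliar with this kind of helper, a pure title splitter might look like the sketch below. Everything in it is assumed for illustration: the fields of `TitlePair`, the separator set, and the name `splitTitleSketch`. The real `splitTitle` may behave differently.

```typescript
// Hypothetical shape; the PR exports a `TitlePair` type but does not show its fields.
interface TitlePair {
  title: string; // leading segment of the <title> text
  rawTitle: string; // full, trimmed <title> text
}

// Assumed separators; the real implementation may use a different set.
const SEPARATORS = [" | ", " - "];

function splitTitleSketch(fullTitle: string): TitlePair {
  const rawTitle = fullTitle.trim();
  // Find the earliest occurrence of any separator.
  let cut = -1;
  for (const separator of SEPARATORS) {
    const index = rawTitle.indexOf(separator);
    if (index !== -1 && (cut === -1 || index < cut)) cut = index;
  }
  const title = cut === -1 ? rawTitle : rawTitle.slice(0, cut).trim();
  return { title, rawTitle };
}
```

Because the function only touches strings, it can be applied directly to the full title text read from the DB without parsing any HTML.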

`formatFrontmatter` (new) serialises a `Frontmatter` to the `---\n…\n---\n`
YAML block consumed by the downstream scaffold pipeline. Conventions are
pinned for that pipeline's expectations (js-yaml with `forceQuotes`,
`quotingType: '"'`, `lineWidth: -1`, `indent: 2`, `noRefs: true`; stable key
order; nested `og:` / `twitter:` maps; empty sub-objects omitted).
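
As a rough sketch of the block shape described above, the following wraps a serialised `Frontmatter` in `---` delimiters after pruning empty sub-objects. The serialiser is injected to keep the sketch dependency-free; `prune`, `formatFrontmatterSketch`, and the choice to return an empty string when nothing remains are assumptions, not the package's actual behaviour.

```typescript
// Serialiser is injected so the sketch has no dependencies. With js-yaml it would be:
//   (v) => yaml.dump(v, { forceQuotes: true, quotingType: '"', lineWidth: -1, indent: 2, noRefs: true })
type Dump = (value: unknown) => string;

/** Recursively drops blank strings and empty sub-objects; returns undefined if nothing is left. */
function prune(value: unknown): unknown {
  if (value === null || value === undefined) return undefined;
  if (typeof value === "string") return value.trim() === "" ? undefined : value;
  if (typeof value === "object" && !Array.isArray(value)) {
    const out: Record<string, unknown> = {};
    for (const [key, child] of Object.entries(value as Record<string, unknown>)) {
      const kept = prune(child);
      if (kept !== undefined) out[key] = kept;
    }
    return Object.keys(out).length > 0 ? out : undefined;
  }
  return value;
}

function formatFrontmatterSketch(frontmatter: Record<string, unknown>, dump: Dump): string {
  const pruned = prune(frontmatter);
  if (pruned === undefined) return ""; // assumption: emit nothing when no field survives
  const body = dump(pruned).replace(/\n*$/, "\n"); // normalise to exactly one trailing newline
  return `---\n${body}---\n`;
}

// YAML 1.2 is designed as a superset of JSON, so JSON.stringify serves as a stand-in serialiser.
const block = formatFrontmatterSketch(
  { title: "Home", og: { title: "Home", image: "" }, twitter: { card: "" } },
  (value) => JSON.stringify(value, null, 2),
);
```

In this example `block` starts with `---\n`, ends with `\n---\n`, and contains neither `image` nor `twitter`, because both were empty.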

`extractPages` now runs `getPageHtml` and `getFrontmatter` in parallel via
`Promise.allSettled`, prepends the formatted YAML block to the extracted HTML,
and writes the combined buffer. Meta reads are best-effort: if `getFrontmatter`
rejects (e.g. transient SQLite contention), the page's already-extracted body
is still written without a frontmatter block, rather than discarding the body
along with the metadata failure.
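
The fail-soft behaviour described above can be sketched like this. The loader parameters are hypothetical stand-ins for `getPageHtml` and for `getFrontmatter` followed by `formatFrontmatter`; the `extractMainContent` step and the file write are omitted, and `combine` / `buildOutput` are names invented for the sketch.

```typescript
type Settled<T> = PromiseSettledResult<T>;

/** Decides what to write from the two settled results; pure, so easy to test. */
function combine(html: Settled<string>, meta: Settled<string>): string {
  // Without a body there is nothing to write: propagate the failure.
  if (html.status === "rejected") throw html.reason;
  // Meta is best-effort: a rejection just means no frontmatter block.
  const frontmatterBlock = meta.status === "fulfilled" ? meta.value : "";
  return frontmatterBlock + html.value;
}

// Hypothetical loaders; both start immediately, so the two reads overlap.
async function buildOutput(
  loadHtml: () => Promise<string>,
  loadFrontmatterBlock: () => Promise<string>,
): Promise<string> {
  const [html, meta] = await Promise.allSettled([loadHtml(), loadFrontmatterBlock()]);
  return combine(html, meta);
}
```

Using `Promise.allSettled` rather than `Promise.all` is what makes this possible: `Promise.all` would reject as soon as the metadata read failed, and the successfully loaded body would be lost.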

`TwitterFrontmatter.url` is removed — `.nitpicker` has no corresponding column
and the standard convention is to read `og.url` when a Twitter-specific URL is
needed. CLI `USAGE` now mentions the YAML prepend so operators reading `--help`
understand the output format.

Public API churn (package is `private: true` so no npm consumers):
- removed: `extractFrontmatter`
- added:   `getFrontmatter`, `formatFrontmatter`, `splitTitle`, `TitlePair`
YusukeHirao force-pushed the feat/site-migrator-frontmatter-from-db branch from d0519a4 to 3a03c6d June 18, 2026 05:16
…ypes JSDoc

jsdoc/no-undefined-types warns because eslint-plugin-jsdoc only resolves
{@link} against type / class names, not arbitrary export functions. The
warnings were surfacing as CI annotations on the PR diff (R14 / R23).
Switch the function references to backtick-quoted identifiers; the actual
type links (ArchiveSession.close) keep their {@link}.
YusukeHirao merged commit 9ad8f60 into dev Jun 18, 2026
1 check passed
YusukeHirao deleted the feat/site-migrator-frontmatter-from-db branch June 18, 2026 06:02