feat(site-migrator): rewrite same-origin URL refs and assign per-page integer ids #905

Merged
YusukeHirao merged 2 commits into dev from feat/site-migrator-rewrite-refs on Jun 18, 2026

@YusukeHirao (Member) commented Jun 18, 2026

Summary

  • extractPages assigns a deterministic integer id to every page URL and prepends `---\nid: <number>\n---\n` to the top of the output. Numbering follows the scheme the downstream scaffold pipeline expects (root section: 5, 10, 15…; Nth subdirectory section: N×10000 + steps of 5), and the order is stabilized with pathComparator from @d-zero/shared/sort/path and alphabeticalComparator from @d-zero/shared/sort/alphabetical. If a section exceeds 2000 pages, it throws explicitly instead of silently aliasing.
  • Adds rewritePageRefs, which rewrites same-origin references in the body for the downstream scaffold pipeline.
    • <a href> / <form action> / <iframe src> / <link rel=canonical|alternate|prev|next href> → {{<id>}}?q=foo#frag template.
    • Other asset attributes (img/script/source/embed/video/audio/track/srcset/<link rel=stylesheet|icon>, etc.) → root-relative path.
    • External origins, mailto: / tel: / sms: / javascript: / data: / blob: / vbscript: / file: URLs, and bare #fragment references (including ones with leading whitespace) are left untouched.
  • The lookup is built only once, by buildPageIdLookup (building it per page would be O(N²)). It absorbs trailing-slash drift and prefers an exact ?query match over the pathname-only fallback, so if both /list?p=1 and /list?p=2 are in pageIds, each resolves to its own id.
  • Failures in rewritePageRefs / getFrontmatter are fail-soft. The HTML body is still written, rewriteError / metaError is attached to the onResult outcome as a warning, and MigrateReport aggregates them as pagesRewriteFailed / pagesMetaFailed. The CLI only adds a one-line warning to stderr.
  • Adds a fourth argument, tagAttrs, to AssetResolver. It is needed to dispatch <link href> by rel.
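
The numbering scheme from the first Summary bullet can be sketched roughly as follows. This is an illustrative sketch, not the code in this PR: `assignPageIds`, `sectionOf`, `compare`, and `prependIdFrontmatter` are hypothetical names, the plain string comparison stands in for `pathComparator` / `alphabeticalComparator`, and the rule that a section is the first directory segment is an assumption.

```typescript
// Hypothetical sketch of the numbering scheme; not the PR's implementation.
const SECTION_SIZE = 10000; // id range reserved for each section
const STEP = 5; // gap between consecutive ids
const MAX_PAGES_PER_SECTION = SECTION_SIZE / STEP; // 2000

// Stand-in for pathComparator / alphabeticalComparator: plain code-unit order.
function compare(a: string, b: string): number {
  return a < b ? -1 : a > b ? 1 : 0;
}

// Assumption: a page's section is its first directory segment; pages that sit
// directly under "/" belong to the root section (represented as "").
function sectionOf(pathname: string): string {
  const segments = pathname.split("/").filter(Boolean);
  const inSubdirectory =
    segments.length > 1 || (segments.length === 1 && pathname.endsWith("/"));
  return inSubdirectory ? (segments[0] ?? "") : "";
}

function assignPageIds(pathnames: readonly string[]): Map<string, number> {
  // Group pages by section.
  const bySection = new Map<string, string[]>();
  for (const pathname of pathnames) {
    const section = sectionOf(pathname);
    const bucket = bySection.get(section) ?? [];
    bucket.push(pathname);
    bySection.set(section, bucket);
  }

  // Root is section 0; subdirectory sections are numbered 1..N in sorted order.
  const subSections = Array.from(bySection.keys())
    .filter((section) => section !== "")
    .sort(compare);
  const orderedSections = ["", ...subSections];

  const ids = new Map<string, number>();
  orderedSections.forEach((section, n) => {
    const pages = (bySection.get(section) ?? []).slice().sort(compare);
    if (pages.length > MAX_PAGES_PER_SECTION) {
      // Page 2001 would receive the first id of the next section.
      throw new Error(`Section "${section}" has more than ${MAX_PAGES_PER_SECTION} pages`);
    }
    pages.forEach((pathname, i) => {
      ids.set(pathname, n * SECTION_SIZE + (i + 1) * STEP);
    });
  });
  return ids;
}

// Prepends the id frontmatter block to a page body.
function prependIdFrontmatter(id: number, body: string): string {
  return `---\nid: ${id}\n---\n${body}`;
}
```

The 2000-page limit falls out of the arithmetic: with a step of 5, the 2001st page of section N would get N×10000 + 10005, which is the first id of section N+1.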

Test plan

  • yarn test (127 tests pass)
  • yarn lint (cspell / eslint / prettier / textlint all OK)
  • yarn build
  • CI all green

… integer ids

extractPages now assigns a deterministic integer id to every page in items
(directory-grouped: root 5/10/15…, Nth subdir N×10000+step5) and prepends
`id: <number>` to the YAML frontmatter. Body HTML is streamed through
rewritePageRefs, which converts same-origin <a href>/<form action>/<iframe src>
and <link rel=canonical|alternate|prev|next href> into the `{{<id>}}` template
the downstream scaffold pipeline consumes, while other same-origin asset URLs
(img, script, stylesheets, srcset, etc.) become root-relative paths.
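
As a rough illustration of the classification described above, the sketch below rewrites a single attribute value. It is not the PR's `rewritePageRefs`: `rewriteRef`, `RefContext`, and `isPageRef` are hypothetical names, the pathname-keyed `pageIds` map is a simplification, and two behaviors are assumptions (same-origin page links with no known id are left unchanged, and asset paths keep their query and fragment). `srcset`, which holds several candidates per attribute, is not handled here.

```typescript
// Hypothetical sketch of the per-attribute decision; not the PR's implementation.
const SKIPPED_SCHEMES = /^(?:mailto|tel|sms|javascript|data|blob|vbscript|file):/i;
const PAGE_LINK_RELS = new Set(["canonical", "alternate", "prev", "next"]);

interface RefContext {
  tag: string; // lower-case tag name, e.g. "a"
  attr: string; // lower-case attribute name, e.g. "href"
  rel?: string; // rel attribute of the same tag, if any
  pageUrl: string; // absolute URL of the page being rewritten
  pageIds: ReadonlyMap<string, number>; // pathname -> id (simplified lookup)
}

// True when the attribute points at a page rather than an asset.
function isPageRef(ctx: RefContext): boolean {
  if (ctx.tag === "a" && ctx.attr === "href") return true;
  if (ctx.tag === "form" && ctx.attr === "action") return true;
  if (ctx.tag === "iframe" && ctx.attr === "src") return true;
  if (ctx.tag === "link" && ctx.attr === "href") {
    const tokens = (ctx.rel ?? "").toLowerCase().split(/\s+/);
    return tokens.some((token) => PAGE_LINK_RELS.has(token));
  }
  return false;
}

function rewriteRef(raw: string, ctx: RefContext): string {
  const trimmed = raw.trim();
  // Fragment-only refs (even with leading whitespace) and special schemes stay as-is.
  if (trimmed === "" || trimmed.startsWith("#") || SKIPPED_SCHEMES.test(trimmed)) {
    return raw;
  }

  let target: URL;
  try {
    target = new URL(trimmed, ctx.pageUrl);
  } catch {
    return raw; // unparsable values are left untouched
  }
  if (target.origin !== new URL(ctx.pageUrl).origin) return raw; // external origin

  if (!isPageRef(ctx)) {
    // Asset: root-relative path (keeping query and fragment is an assumption).
    return target.pathname + target.search + target.hash;
  }

  const id = ctx.pageIds.get(target.pathname);
  // Assumption: a same-origin page with no known id is left unchanged.
  return id === undefined ? raw : `{{${id}}}${target.search}${target.hash}`;
}
```

For example, on the page `https://example.com/blog/post.html`, an `<a href="../about.html?q=foo#frag">` whose target has id 10 becomes `{{10}}?q=foo#frag`, while `<img src="img/a.png">` becomes `/blog/img/a.png`.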

Lookups absorb trailing-slash drift and prefer exact ?query matches before the
pathname-only fallback so URLs differing only by query string keep their own
ids. Section overflow (>2000 pages) throws rather than silently aliasing into
the next section.
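
The lookup priority described above can be sketched as two maps built once. This is a minimal sketch under stated assumptions, not the PR's `buildPageIdLookup`: `buildLookup`, `normalizePath`, and `PageIdLookup` are hypothetical names, keys are assumed to be root-relative `path?query` strings, and the choice of which id backs the pathname-only fallback (a query-less entry if one exists, otherwise the first entry seen) is a guess.

```typescript
// Hypothetical sketch of the two-tier lookup; not the PR's implementation.
interface PageIdLookup {
  resolve(pathname: string, search: string): number | undefined;
}

// "/docs/" and "/docs" collapse to the same key; the root "/" is kept as-is.
function normalizePath(pathname: string): string {
  return pathname.length > 1 && pathname.endsWith("/")
    ? pathname.slice(0, -1)
    : pathname;
}

// Builds both maps in a single pass so that every page can share them.
function buildLookup(pageIds: ReadonlyMap<string, number>): PageIdLookup {
  const exact = new Map<string, number>(); // normalized path + "?query"
  const byPath = new Map<string, number>(); // normalized path only

  pageIds.forEach((id, key) => {
    const q = key.indexOf("?");
    const path = normalizePath(q === -1 ? key : key.slice(0, q));
    const search = q === -1 ? "" : key.slice(q);
    exact.set(path + search, id);
    // Assumption: a query-less entry always owns the fallback slot;
    // otherwise the first entry seen for that path does.
    if (search === "" || !byPath.has(path)) byPath.set(path, id);
  });

  return {
    resolve(pathname: string, search: string): number | undefined {
      const path = normalizePath(pathname);
      // Exact query match first, pathname-only fallback second.
      return exact.get(path + search) ?? byPath.get(path);
    },
  };
}
```

Building the maps once and passing the lookup to every page keeps the total work linear in the number of pages, whereas rebuilding it inside each page's rewrite would be quadratic.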

rewritePageRefs / getFrontmatter failures stay fail-soft: the body is still
written and the outcome carries rewriteError / metaError; migrate() aggregates
both via pagesRewriteFailed / pagesMetaFailed.
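
A minimal sketch of the fail-soft flow follows. Only the field names `rewriteError`, `metaError`, `pagesRewriteFailed`, and `pagesMetaFailed` come from this PR; `processPage`, `summarize`, `PageOutcome`, `Report`, `pagesWritten`, and the exact shapes are hypothetical, and so is the assumption that a failed step leaves the body from the previous step in place.

```typescript
// Hypothetical sketch of fail-soft processing; not the PR's implementation.
interface PageOutcome {
  url: string;
  body: string; // what gets written, even when a step fails
  rewriteError?: Error;
  metaError?: Error;
}

interface Report {
  pagesWritten: number;
  pagesRewriteFailed: number;
  pagesMetaFailed: number;
}

interface Steps {
  rewrite(html: string): string; // stand-in for rewritePageRefs
  frontmatter(url: string): string; // stand-in for getFrontmatter
}

function toError(value: unknown): Error {
  return value instanceof Error ? value : new Error(String(value));
}

function processPage(url: string, html: string, steps: Steps): PageOutcome {
  const outcome: PageOutcome = { url, body: html };
  try {
    outcome.body = steps.rewrite(html);
  } catch (error) {
    outcome.rewriteError = toError(error); // keep the unrewritten body
  }
  try {
    outcome.body = steps.frontmatter(url) + outcome.body;
  } catch (error) {
    outcome.metaError = toError(error); // write the body without frontmatter
  }
  return outcome;
}

// Counts failures across all outcomes; every page still counts as written.
function summarize(outcomes: readonly PageOutcome[]): Report {
  return {
    pagesWritten: outcomes.length,
    pagesRewriteFailed: outcomes.filter((o) => o.rewriteError !== undefined).length,
    pagesMetaFailed: outcomes.filter((o) => o.metaError !== undefined).length,
  };
}
```

Recording the error on the outcome instead of rethrowing lets one bad page degrade gracefully while the report still shows how many pages need attention.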

AssetResolver gains a fourth `tagAttrs` argument so resolvers can read sibling
attributes (used to gate <link href> on rel).
@YusukeHirao YusukeHirao force-pushed the feat/site-migrator-rewrite-refs branch from 286db90 to d125c60 June 18, 2026 12:58
@YusukeHirao YusukeHirao merged commit 22a63da into dev Jun 18, 2026
1 check passed
@YusukeHirao YusukeHirao deleted the feat/site-migrator-rewrite-refs branch June 18, 2026 13:04