feat(kamado)!: replace jsdom with linkedom#115
Merged
Conversation
BREAKING CHANGE: DomSerializeOptions no longer accepts `url`. The previous lib.dom-shaped Window/Element types are replaced with linkedom derived DomWindow/DomElement types, so hook signatures change.
…linkedom BREAKING CHANGE: ManipulateDOMOptions no longer accepts `host`; it existed only to pass `url` into the (now removed) DomSerializeOptions. Hook signatures use DomElement/DomWindow from kamado/utils/dom.
- Strip leading HTML comments before fragment detection so a license banner ahead of <!doctype html> no longer routes the document through the fragment path. - Serialize fragments via tmpContainer.innerHTML so text and comment nodes survive (linkedom's .children is element-only). - Null-guard documentElement so a bare "<!doctype html>" input returns "" instead of throwing on .outerHTML. - Auto-create <head> when missing so downstream regex transforms (e.g. inject-to-head) still find an anchor under linkedom (which does not auto-insert one).
…tor scope - Detect real "data:" URLs (single colon), not just "data://", so genuine data URIs no longer slip past the protocol guard. - Strip query string and fragment before the extension check so a cache- busted src like "hero.png?v=abc" is sized normally. - Apply selector only to <img>; <picture> > <source> elements always participate so an img-shaped selector doesn't silently drop sources. - Drop the redundant array spread on linkedom's querySelectorAll (already a real Array at runtime).
- Re-parse the imageSizes output via linkedom and assert through DOM queries instead of fragile substring/regex matches, so the selector test cannot pass when target/other lookups silently return empty. - Craft the URL-guard srcs so each one would slip into trackDependency if its protocol/extension guard were removed, then assert the captured dependency set is empty. The earlier version relied on fs.stat returning null and passed even when the guards were missing.
- New packages/kamado/src/utils/dom.spec.ts directly covers the four recent getDOM hardening points so a regression in fragment detection, text/comment preservation, null documentElement, or auto-head insertion fails on its own. - Re-parse imageSizes output via linkedom in the <img> and <picture> > <source> tests so attribute order or quoting shifts cannot let the assertions pass vacuously. - Drop "should pass through pug file without compiler" — it documented an undefined-behavior path (pug source reaching the DOM serializer without a compiler) and the recent linkedom switch turned its former empty-string output into "p Hello, world!\n", a quiet behavioral surprise nobody actually relies on. - Add a comment on the layout snapshot explaining that <head></head> is the auto-inserted one and must not disappear.
…ints Add a "DOM Library (linkedom)" section to ARCHITECTURE.md / .ja.md listing every place linkedom diverges from jsdom (auto-head, null documentElement, baseURI/location, .children excluding text/comments, setAttribute ordering) and the file in src/utils/dom.ts that compensates for each. Names dom.spec.ts and manipulate-dom.spec.ts as the front-line guards so future linkedom bumps know what to re-verify.
…ntext
linkedom does not populate window.location / Document.baseURI, so reading
<a>.href returns the raw attribute instead of an absolute URL. The
previous host/url plumbing into JSDOM was removed in this branch and
left no first-class replacement — hooks had to fish baseURL out of
ctx.context.pkg.production manually.
- ManipulateDOMHookContext now extends TransformContext with:
baseURL: string | undefined — serve→devServer URL, build→pkg.production.baseURL/host
getHref(el): string | null — resolves the element's href against baseURL
- New kamado/utils/dom export resolveHref(el, base?) underpins getHref
and is available for direct use outside the manipulateDOM pipeline.
- Tests cover resolveHref unit-level (absolute, relative, protocol-
relative, malformed, missing attr, URL object base) and integration
via manipulateDOM (build vs serve base, no-base→null, absolute pass-
through).
- ARCHITECTURE.md / .ja.md updated to point at the new hook surface.
resolveHref previously returned whatever new URL() produced, so a hook that interpolated the result into HTML could become an XSS sink for `javascript:` / `data:` / `vbscript:` / `file:` hrefs, and a base URL with basic-auth credentials leaked the password into every resolved href. The helper now refuses opaque/local schemes and clears URL.username / URL.password before returning, keeping it safe-by-default for HTML output.
imageSizes joined rootDir + src segments with no escape check, so a crafted attribute like `<img src="../../../etc/passwd.png">` resolved outside rootDir; fs.stat would then read arbitrary bytes, image-size would try to parse them, and trackDependency would record the escaped path in the incremental-build manifest. Switch to path.resolve and verify the result stays within rootDir before touching the filesystem.
…erver gaps - Trim baseURL/host and treat empty strings as missing so `baseURL: process.env.PROD_BASE_URL ?? ''` no longer silently disables every ctx.getHref() in production. - Validate the candidate URL via `new URL(...)` and return undefined on parse failure so misconfigured values like `'example.com'` (no scheme) surface as 'no base' instead of throwing inside resolveHref on every href. - Default the pkg.production.host fallback to `https://` (it lands in user-visible HTML now via ctx.getHref — http:// would be a mixed- content / SEO regression). - Optional-chain devServer access so partial test/dev harnesses don't TypeError on missing host/port; return undefined instead.
- Trim the src attribute before the protocol guards so a stray leading space (common in CMS-authored content) no longer slips past the http(s)://, //, and data: checks and ends up pointing trackDependency at a nonsense path under rootDir. - Cache by srcPath (with `?` / `#` stripped) plus stat size, not raw src. Otherwise rotating cache-buster query strings forced a fresh image-size parse on every build despite the on-disk file being unchanged, defeating the cache layer the PR added for exactly that case.
…etter side effect - Comment-strip regex now matches both --> and --!> (the HTML5-tolerated variant) so a banner like `<!-- copyright --!><html>...</html>` is no longer misclassified as a fragment. - Auto-head detection walks documentElement.children explicitly rather than reading document.head, which in some linkedom versions materializes a <head> on first access. The previous if/insert branch was effectively dead code carried by that side effect; switching to a children walk makes the insertion path actually run when needed and survives a linkedom upgrade that drops the auto-creation behavior.
The previous `as unknown as readonly DomElement[]` cast assumed linkedom's querySelectorAll returns a real Array. It currently does (0.18.12), but a future shape change to a non-Array iterable would silently break flatMap flattening and crash on the first <img>. Switch to a for...of loop that works for both arrays and iterables so future linkedom upgrades degrade gracefully.
…inator
- Add resolveHref tests for the rejected schemes (javascript:, data:,
vbscript:, file:) and for basic-auth credential stripping on both the
relative-via-base and absolute paths.
- Add a fragment-classification test that pins the --!> comment
terminator as recognized: `<!-- copyright --!><html>...</html>` must
serialize as a full document, not a stripped-down fragment.
- Restore the pug-without-compiler test (page-compiler.spec.ts) with the
new expected output ('p Hello, world!\\n') so the linkedom-era
text-pass-through behavior is pinned. It's undefined behavior in the
formal sense (users should always register a compiler) but worth a
test to keep the symptom from silently flipping again.
Adds a getBaseURL spec helper and 9 tests that pin each branch of resolveBaseURL — empty/whitespace baseURL → undefined, scheme-less baseURL → undefined, trim before validation, host-only → https:// fallback, empty host → undefined, serve-mode missing devServer → undefined (no TypeError), and serve-mode partial devServer → undefined. The hardening that landed in c53bfc0 was previously only assured by default-context tests; reverting the patch now fails the matching case-specific test instead of silently passing.
- Path-traversal test: `<img src="../../../etc/passwd.png">` must not reach trackDependency or emit width/height. Pre-fix, the join-segments approach would have escaped rootDir; the fix uses path.resolve + path.relative to enforce containment. - Site-absolute src test: `/site-root.svg` must rebase under rootDir (not the filesystem root) so the traversal guard doesn't over-reject legitimate site-root-relative URLs. - Whitespace trim tests: cover both the leading-space external URL case (must still register as external) and a local src wrapped in spaces (must still resolve and size).
…urface Adds 4 integration tests reaching the resolveHref guards through the manipulateDOM hook (the surface users actually touch): - javascript: URL → ctx.getHref returns null (XSS guard) - data: URL → null - baseURL with basic-auth userinfo → resolved href has credentials stripped and contains neither the username nor the password - empty href attribute → null The unit tests in dom.spec.ts already pin resolveHref; this layer guarantees the wiring through manipulateDOM's hook context is not silently bypassed.
…ow-list - Auto-inserted <head> must be the first child of <html>, not appended somewhere arbitrary. The earlier 'contains <head>' assertion would pass even if linkedom dropped the head after <body>. - An existing <head> placed AFTER <body> must not be duplicated; the children-walk detection in getDOM has to be position-agnostic. - mailto: and tel: URLs are intentionally NOT in REJECTED_SCHEMES; pin that decision so a future contributor doesn't blanket-block opaque schemes by accident.
…rcPath-based keying - beforeEach clears the on-disk @d-zero/builder/image-sizes cache so a same-name same-byte-length fixture from a previous run can't leak width/height into the next test (cache key is srcPath:size, not a full file path). - New "cache-busted src values share the cache entry" test rewrites the file with same byte length but different parsed sizes between the two transforms and asserts the second access returns the cached result. If the cache key regressed to raw src, the test would catch it because the second parse would yield the new bytes.
resolveHref's REJECTED_SCHEMES set names `vbscript:` as one of the dangerous schemes it refuses to emit. Documentation that mentions the list trips cspell unless the word is whitelisted.
…olveHref - DomWindow/DomDocument/DomElement: explain the re-export rationale and call out that linkedom does not implement layout APIs / URL reflection. - DomSerializeOptions.hook: spell out when the hook runs and that mutations land in the returned string. - domSerialize: document the fragment-vs-full-document shape preservation and the bare-doctype empty-string fallback. - resolveHref: consolidate the duplicated JSDoc blocks into a single one with explicit @param / @returns; mention dangerous-scheme rejection and basic-auth credential scrubbing in the contract.
…M API README updates: - Replace the stale hook signature (`Element[]` / `Window` / `TransformContext`) with `DomElement[]` / `DomWindow` / `ManipulateDOMHookContext`, and document `ctx.baseURL` + `ctx.getHref`. - Remove the deleted `options.host` entry. - Spell out the selector's <img>-only scope and the query/fragment stripping on extension checks under imageSizes. - Add a "Resolving Anchor URLs with ctx.getHref" worked example. JSDoc updates (manipulate-dom.ts, image.ts): - ManipulateDOMHookContext: document each member with @param/@returns semantics; correct the host fallback to https:// (matching the impl). - ManipulateDOMOptions: fill in hook/imageSizes member docs. - manipulateDOM: rewrite the summary to mention the hook context wiring. - resolveBaseURL: document return-undefined-on-bad-input contract. - ImageSizesOptions / imageSizes: scope selector to <img>, mention the query/fragment strip and path-traversal guard.
…ager Lockfile metadata had drifted to version 9 (written by a stray yarn 4.14.1 invocation during local development) while the project pins `packageManager: "yarn@4.12.0"`. CI's `yarn install --immutable` refuses to rewrite the version field, failing the install step. Regenerate with the pinned yarn so the lockfile matches what CI uses.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
yarn bench --full --pages=500 --runs=3で median 約3.9倍速、RSS 約47%削減、yarn.lockから38件超の transitive を削減ctx.baseURL/ctx.getHref(el)をmanipulateDOMhook 第3引数 (ManipulateDOMHookContext) に追加し、jsdom 時代のwindow.location.hrefベースの URL 解決を取り戻すresolveHref(el, base?)/DomWindow/DomDocument/DomElementをkamado/utils/domから公開javascript:/data:/vbscript:/file:スキームをresolveHrefが拒否、basic-auth credential を結果から除去、<img src>の path traversal (..経由の rootDir 脱出) をブロックBenchmarks
yarn bench --full --pages=500 --runs=3Breaking changes
DomSerializeOptions.urlmanipulateDOMの第3引数ctx.baseURLを読むManipulateDOMOptions.hostctx.baseURLを上記経由で取得、または直接pkg.production.baseURLを設定manipulateDOMhook 第3引数の型TransformContextManipulateDOMHookContext(TransformContext を拡張、baseURLとgetHrefを追加)Element[]/Window)DomElement[]/DomWindow(getComputedStyle等の layout API は不在、<a>.hrefも絶対URLを返さない —ctx.getHrefを使用)Migration の詳細・before/after の書き換え例は
packages/kamado/ARCHITECTURE.md(DOM Library セクション) とpackages/@kamado-io/page-compiler/README.mdを参照してください。linkedom 挙動差分の補正 (kamado 側で吸収)
<head>を自動補完しないgetDOMが documentElement の children を走査して欠落時に挿入parseHTML('<!doctype html>')のdocumentElementがnull--!>を comment terminator として認識しない-->に正規化Document.baseURI/Window.location未実装ctx.getHref(el)を新設Test plan
yarn lint— cleanyarn build— 全 5 パッケージ成功yarn test— 467 passed (39 test files)yarn bench --full --pages=500 --runs=3を取り、本 PR と比較済みurl/hostを渡している箇所がないか確認 (TypeScript で typechecking 漏れた as any 経由の余地あり)yarn dev) を起動し、サンプルページのスクリプト/メタタグ注入 (inject-to-head) が引き続き機能することを目視確認Notes
2.0.0-alpha.16) なので破壊変更を伴うが、リリース時に2.0.0-alpha.17以降で公開予定🤖 Generated with Claude Code