
feat(beholder)!: expand meta extraction with frontmatter-keys schema and Wappalyzer tag detection #878

Merged
YusukeHirao merged 1 commit into dev from feat/beholder-frontmatter-meta-tags on Jun 16, 2026


@YusukeHirao

Member

Summary

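  • Comprehensively expanded the meta information read from <head> according to the frontmatter-keys.md spec. Redesigned from flat keys ('og:type', noindex, etc.) to a nested structure (og.type, robots.noindex, etc.). About 500 keys are extracted with types via a whitelist; unknown keys are all retained in Meta.others.
  • Added simple-wappalyzer (MIT) as a dependency to detect ~2000 technologies. Detected providers trigger id-extractors.ts, which extracts the actual IDs for GA4/GTM/UA/FB Pixel/Hotjar/Clarity, etc. and stores them in Meta.tags.
  • Changed the architecture so that the browser side returns a sequence of attribute tuples from collectHead() and the Node side classifies them in classify() as a pure function. Executed in parallel inside raceWithTimeout.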
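
The browser/Node split described in the last bullet can be illustrated with a minimal sketch. The PR names `RawHeadEntry` and `classify()` but does not show their definitions, so the entry shape, the `SketchMeta` type, the `KNOWN_OG` whitelist, and the `classifySketch` function below are all assumptions made for illustration, not the package's actual code.

```typescript
// Hypothetical entry shape: a tag name plus its attributes as [name, value] tuples.
type RawHeadEntry = {
  tag: 'meta' | 'link' | 'script' | 'base' | 'iframe';
  attrs: ReadonlyArray<readonly [string, string]>;
};

// Hypothetical, heavily reduced result shape (the real Meta has many more fields).
type SketchMeta = {
  og: Record<string, string>;
  others: { meta: Record<string, string> };
};

// Tiny stand-in for the whitelist of known keys.
const KNOWN_OG = new Set(['type', 'title', 'url']);

function attr(entry: RawHeadEntry, name: string): string | undefined {
  return entry.attrs.find(([n]) => n === name)?.[1];
}

// Pure function: no DOM access, so it can run and be unit-tested on the Node side.
function classifySketch(entries: readonly RawHeadEntry[]): SketchMeta {
  const meta: SketchMeta = { og: {}, others: { meta: {} } };
  for (const entry of entries) {
    if (entry.tag !== 'meta') continue;
    const key = attr(entry, 'property') ?? attr(entry, 'name');
    const content = attr(entry, 'content');
    if (key === undefined || content === undefined) continue;
    const ogKey = key.startsWith('og:') ? key.slice(3) : undefined;
    if (ogKey !== undefined && KNOWN_OG.has(ogKey)) {
      meta.og[ogKey] = content; // whitelisted key goes to its nested slot
    } else {
      meta.others.meta[key] = content; // unknown key is kept, not dropped
    }
  }
  return meta;
}
```

Keeping the classifier free of browser APIs means the browser step only has to serialize attributes, and all mapping logic can be tested with plain arrays.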

Breaking Changes

The Meta type and the getMeta() API of @d-zero/beholder have changed in a breaking way. A major bump (2.x → 3.0.0) is required.

| Before | After |
| --- | --- |
| meta.canonical | meta.link?.canonical |
| meta.alternate | meta.link?.alternateHreflang (array) and others, split by type/media |
| meta.noindex | meta.robots?.noindex |
| meta.nofollow | meta.robots?.nofollow |
| meta.noarchive | meta.robots?.noarchive |
| meta['og:type'] | meta.og?.type |
| meta['og:title'] | meta.og?.title |
| meta['og:url'] | meta.og?.url |
| meta['og:site_name'] | meta.og?.siteName |
| meta['og:description'] | meta.og?.description |
| meta['og:image'] | meta.og?.image (array) |
| meta['twitter:card'] | meta.twitter?.card |
| getMeta(page, timeout?) | getMeta(page, { url, html?, statusCode?, headers? }, timeout?) |
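
For callers migrating from the flat keys, a sketch of reading the relocated fields may help. `MetaSlice` below is an assumed partial slice of the new shape covering only fields named in the table; the value types (`string`, `boolean`, `string[]`) and the `summarize` helper are hypothetical and not taken from the package.

```typescript
// Assumed partial slice of the new Meta shape; value types are guesses.
type MetaSlice = {
  link?: { canonical?: string };
  robots?: { noindex?: boolean; nofollow?: boolean; noarchive?: boolean };
  og?: { type?: string; title?: string; image?: string[] };
};

// Hypothetical helper showing the new access paths with safe defaults.
function summarize(meta: MetaSlice) {
  return {
    canonical: meta.link?.canonical ?? null, // was meta.canonical
    noindex: meta.robots?.noindex ?? false, // was meta.noindex
    ogType: meta.og?.type ?? null, // was meta['og:type']
    firstOgImage: meta.og?.image?.[0] ?? null, // og.image is now an array
  };
}
```

Because the nested groups are optional, every access needs optional chaining, which is why the table writes the new paths with `?.`.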

New required fields: title, jsonLd, speculationRules, originTrial, tags, others. All other fields are optional.

New capabilities

  • Parsed objects for viewport / robots / format-detection / referrer / theme-color (per media), etc.
  • All Open Graph subcategories (article / book / profile / music / video)
  • Twitter Cards / Apple iOS / Microsoft tile / Verification (Google/Bing/Yandex, etc.) / Dublin Core / DC Terms / Geo / Citation / RDFa / Microdata / AMP / Pinterest / Fediverse / Microformats2
  • JSON-LD / SpeculationRules are returned already run through JSON.parse (on parse failure, the raw text is kept together with a parseError)
  • Meta.tags: the technology stack aggregated by Wappalyzer category, with entries[] as a flat list
  • Meta.others: retains every unknown meta name / property / http-equiv / itemprop / link rel / script type / iframe
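
The JSON-LD bullet describes keeping the raw text and a `parseError` when parsing fails. A minimal sketch of that pattern follows; the `ParsedJson` union and the `parseJsonBlock` name are assumptions for illustration, and the package's real result shape may differ.

```typescript
// Hypothetical result shape: either the parsed value, or the raw text plus an error message.
type ParsedJson = { value: unknown } | { raw: string; parseError: string };

function parseJsonBlock(raw: string): ParsedJson {
  try {
    return { value: JSON.parse(raw) };
  } catch (error) {
    // Keep the original text so a malformed block is still visible to the caller.
    const parseError = error instanceof Error ? error.message : String(error);
    return { raw, parseError };
  }
}
```

Returning the failure as data rather than throwing lets one malformed `<script type="application/ld+json">` block coexist with valid ones in the same result.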

Test plan

  • yarn lint passes (added meta-tag-related terms to the cspell dictionary)
  • yarn build passes (27 packages)
  • yarn test passes (757 tests, including 119 new tests for meta extraction)
  • Confirm CI passes
  • No regressions in the existing e2e/crawl tests

…and Wappalyzer tag detection

BREAKING CHANGE: `Meta` is restructured from flat keys (`noindex`, `canonical`,
`'og:type'`, `'twitter:card'`, ...) into a nested shape backed by
`frontmatter-keys.md`. New required fields: `title`, `jsonLd`,
`speculationRules`, `originTrial`, `tags`, `others`. `getMeta(page)` now takes
a context object `getMeta(page, { url, html?, statusCode?, headers? }, timeout?)`.
Old top-level shortcuts (`canonical`, `alternate`, `noindex`, `nofollow`,
`noarchive`, `'og:*'`, `'twitter:card'`) are removed; values move to
`meta.link.canonical`, `meta.robots.*`, `meta.og.*`, `meta.twitter.*` etc.

Changes:
- New `src/meta/` module: `types.ts`, `keys.ts`, `parsers.ts`, `classify.ts`,
  `id-extractors.ts`, `tag-detection.ts`, plus ambient `simple-wappalyzer.d.ts`
- Browser-side `collectHead()` serializes every `<meta>`, `<link>`, structured-data
  `<script>`, `<base>`, `<iframe>` plus a curated set of `window` globals into
  `RawHeadEntry[]`; Node-side `classify()` maps these to typed Meta fields
- `simple-wappalyzer` (MIT) added as a dependency for technology detection;
  detected providers run through `id-extractors.ts` for real ID extraction
  (GA4, GTM, UA, FB Pixel, Hotjar, Clarity, ...)
- Unknown markup is preserved under `Meta.others` (meta/property/httpEquiv/
  itemprop/link/script/iframe buckets) so nothing is silently dropped
- Tests: parsers/classify/id-extractors/tag-detection units + getMeta
  error/timeout fallback
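
The provider-triggered ID extraction mentioned above could look roughly like the following sketch. The PR does not show `id-extractors.ts`, so the provider names used as keys, the regular expressions, and the `extractIds` function are assumptions; the patterns only reflect the commonly seen `G-…`, `GTM-…`, and `UA-…-…` ID formats.

```typescript
// Hypothetical provider-name keys and patterns; not taken from id-extractors.ts.
const ID_PATTERNS: Record<string, RegExp> = {
  'Google Analytics 4': /\bG-[A-Z0-9]{6,}\b/g,
  'Google Tag Manager': /\bGTM-[A-Z0-9]{4,}\b/g,
  'Universal Analytics': /\bUA-\d{4,10}-\d{1,4}\b/g,
};

// Only providers that were detected are searched for, so undetected
// technologies never contribute IDs.
function extractIds(detected: readonly string[], html: string): Record<string, string[]> {
  const result: Record<string, string[]> = {};
  for (const provider of detected) {
    const pattern = ID_PATTERNS[provider];
    if (!pattern) continue;
    // String.prototype.match with a global regex returns all matches or null.
    result[provider] = Array.from(new Set(html.match(pattern) ?? []));
  }
  return result;
}
```

Gating extraction on detection keeps the regexes from matching look-alike strings on pages that do not use the corresponding service.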
@YusukeHirao YusukeHirao requested a review from yusasa16 as a code owner June 16, 2026 13:59
@YusukeHirao YusukeHirao merged commit e4a5b6c into dev Jun 16, 2026
2 checks passed
@YusukeHirao YusukeHirao deleted the feat/beholder-frontmatter-meta-tags branch June 16, 2026 14:12