#1890 - feat(playwright): capture rendered DOM on non-2xx responses #1891

Merged: jnioche merged 1 commit into main from 1890 on Apr 27, 2026

rzo1 (Contributor) commented Apr 27, 2026

Add two opt-in configuration flags to the Playwright protocol so that single-page applications that return a non-2xx stub document and then hydrate via JavaScript can be crawled correctly:

  • playwright.capture.content.on.error: also capture page.content() when the origin response status is not 2xx (default false, preserves current behavior).
  • playwright.override.status.on.content: when content was captured for a non-2xx response, report status 200 to downstream components and preserve the original status in the response metadata under playwright.origin.status (default false).

Document both flags in the module README and playwright-conf.yaml.
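
For illustration, the interaction between the two flags can be sketched as below. This is a minimal, standalone sketch and not the code merged in this PR: the class `StatusOverrideSketch`, its `Result` holder, and the `resolve` method are hypothetical, and it assumes that the pre-existing behavior for a non-2xx response is to return empty content and that the original status is stored as a string. Only the two flag names and the `playwright.origin.status` key come from the description above.

```java
import java.util.HashMap;
import java.util.Map;

/** Illustrative sketch only; all class and method names here are hypothetical. */
final class StatusOverrideSketch {

    // Flag names and metadata key as given in the PR description.
    static final String CAPTURE_ON_ERROR = "playwright.capture.content.on.error";
    static final String OVERRIDE_STATUS = "playwright.override.status.on.content";
    static final String ORIGIN_STATUS_KEY = "playwright.origin.status";

    /** Hypothetical holder for what downstream components would see. */
    static final class Result {
        final int status;
        final String content;
        final Map<String, String> metadata;

        Result(int status, String content, Map<String, String> metadata) {
            this.status = status;
            this.content = content;
            this.metadata = metadata;
        }
    }

    static boolean is2xx(int status) {
        return status >= 200 && status < 300;
    }

    /**
     * Decides the reported status, content and metadata from the origin status,
     * the rendered DOM (what page.content() would return) and the two flags.
     */
    static Result resolve(Map<String, Boolean> conf, int originStatus, String renderedDom) {
        // Both flags default to false, which preserves the current behavior.
        boolean captureOnError = conf.getOrDefault(CAPTURE_ON_ERROR, false);
        boolean overrideStatus = conf.getOrDefault(OVERRIDE_STATUS, false);
        Map<String, String> metadata = new HashMap<>();

        if (is2xx(originStatus)) {
            // Successful responses are unaffected by either flag.
            return new Result(originStatus, renderedDom, metadata);
        }
        if (!captureOnError) {
            // Assumption: without the flag, no content is kept for non-2xx responses.
            return new Result(originStatus, "", metadata);
        }
        if (overrideStatus) {
            // Report 200 downstream and keep the real status in the metadata.
            metadata.put(ORIGIN_STATUS_KEY, Integer.toString(originStatus));
            return new Result(200, renderedDom, metadata);
        }
        // Content is captured, but the original status is still reported.
        return new Result(originStatus, renderedDom, metadata);
    }

    public static void main(String[] args) {
        Map<String, Boolean> conf = new HashMap<>();
        conf.put(CAPTURE_ON_ERROR, true);
        conf.put(OVERRIDE_STATUS, true);
        Result r = resolve(conf, 404, "<html><body>hydrated</body></html>");
        System.out.println(r.status + " " + r.metadata);
    }
}
```

With both flags enabled, a 404 stub that hydrates into real content is reported as status 200, with `playwright.origin.status` set to 404 in the metadata. With only the first flag enabled, the content is kept but the 404 is still reported.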

Thank you for contributing to Apache StormCrawler.

In order to streamline the review of the contribution we ask you
to ensure the following steps have been taken:

For all changes

  • Is there an issue associated with this PR? Is it referenced in the commit message?

  • Does your PR title start with #XXXX where XXXX is the issue number you are trying to resolve?

  • Has your PR been rebased against the latest commit within the target branch (typically main)?

  • Is your initial contribution a single, squashed commit?

  • Is the code properly formatted with mvn git-code-format:format-code -Dgcf.globPattern="**/*" -Dskip.format.code=false?

For code changes

  • Have you ensured that the full suite of tests is executed via mvn clean verify?
  • Have you written or updated unit tests to verify your changes?
  • If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under ASF 2.0?
  • If applicable, have you updated the LICENSE file, including the main LICENSE file?
  • If applicable, have you updated the NOTICE file, including the main NOTICE file?

Note

Please ensure that once the PR is submitted, you check GitHub Actions for build issues and submit an update to your PR as soon as possible.

rzo1 requested a review from jnioche April 27, 2026 07:49
rzo1 self-assigned this Apr 27, 2026
rzo1 added this to the 3.6.0 milestone Apr 27, 2026
jnioche merged commit 0394f83 into main Apr 27, 2026
2 checks passed
jnioche deleted the 1890 branch April 27, 2026 09:21
2 participants