diff --git a/.claude/preferences/editorial-preferences.md b/.claude/preferences/editorial-preferences.md index bc518d1c9..5aa7a09ed 100644 --- a/.claude/preferences/editorial-preferences.md +++ b/.claude/preferences/editorial-preferences.md @@ -4,6 +4,17 @@ This file captures Frances's editorial judgment — corrections made to Claude's --- +## Parameter descriptions + +**Copy existing wording precisely when a parameter is already documented elsewhere.** +When a parameter appears in another SenseML method's docs and the underlying implementation is the same, copy the description verbatim. Only drop clauses that are structurally specific to the other method's context (e.g., a conditional "If you use the Width or Height parameters..." that doesn't apply). Do not paraphrase, tighten, or rewrite. + +*Why:* Rewriting introduces subtle differences that require correction. The wording is already right — use it. + +*Example:* `percentOverlapX` / `percentOverlapY` already documented in `intersection.md` with confirmed shared implementation (`getLinesInRegion` in `method-helpers.ts`). Required two interruptions to get Claude to stop paraphrasing. 
(2026-06, branch `region`) + +--- + ## Examples **Evaluate updating existing examples before authoring new ones** diff --git a/.claude/sessions/2026-07-02-region-asImage/checklist.md b/.claude/sessions/2026-07-02-region-asImage/checklist.md new file mode 100644 index 000000000..bb0878674 --- /dev/null +++ b/.claude/sessions/2026-07-02-region-asImage/checklist.md @@ -0,0 +1,4 @@ +# Checklist — region: asImage + percentOverlapX/Y + +- [x] Investigate Vale style check failure: `style 'Google' does not exist on StylesPath` — fixed: Google package was never synced; ran `vale sync` to download it, added `asImage`/`percentOverlapX`/`percentOverlapY`/`includeImages`/`isAbsoluteOffset` to vocab accept list +- [x] Decide whether `asImage` needs a full example — no example needed (user decision) diff --git a/.claude/sessions/2026-07-02-region-asImage/frances-edits.md b/.claude/sessions/2026-07-02-region-asImage/frances-edits.md new file mode 100644 index 000000000..f1f3faa9c --- /dev/null +++ b/.claude/sessions/2026-07-02-region-asImage/frances-edits.md @@ -0,0 +1,100 @@ +# Frances's edits — region/asImage session (2026-07-02) + +Consolidated diff: `aacc76421..bd7a671a4` (4 commits) + +Files touched: `document-range.md`, `region.md`, `query-group.md`, new `concepts/images.md` + +--- + +## Frances's stated reasons + +1. Query Group and Document Range both have image-related parameters — they were siloed from each other and from the new `asImage` Region parameter. +2. Three methods now support image-related capabilities. That's enough to warrant surfacing image processing as a concept topic. Previously only Document Range supported it. + +--- + +## Changes by file + +### NEW: `docs/Senseml reference/concepts/images.md` + +Created from scratch. A 2-column lookup table (Use case | Method) with three rows: + +| Use case | Method | +|---|---| +| Use an LLM to extract structured data from an image. 
For example, extract facts about a photo of a building, such as its color and whether it's multistory-story or single-story. | use the [Query Group](doc:query-group) method with the Multimodal Engine parameter configured | +| Extract an image from a known region as an encoded string. For example, use this option when your documents contain complex charts, from which neither LLM-based nor layout-based methods can reliably extract structured data. Extract the chart as an image and show it to human to interpret. | use the [Region](doc:region) method with the As Image parameter configured | +| Search for non-labeled, non-text images in a range. For example, search for unlabeled photos of houses in a real estate document, and extract the images' coordinates. This option returns images' coordinates, which you can then use to render the images yourself. | use the [Document Range](doc:document-range) method with the Include Images parameter configured | + +Plus a Notes section with: +- Coordinate conventions (top-left origin, not bottom-left as in PDF.js; in inches; ordered clockwise from top-left) +- "This topic is about processing non-text images. For information about processing text images, see [OCR](doc:ocr)." + +Title of page: "Image processing" + +--- + +### `document-range.md` + +**`includeImages` parameter description** — from: +> "Returns the zero-indexed page number and coordinates of regions containing images in the document range. **Notes**: If you set `true`, also set`"type": "images"` in the `field` object (see Examples section for an example). Returns image region coordinates, not image bytes or text lines. To extract structured data from images, see the [Query Group](doc:query-group) method and configure the Multimodal Engine parameter." + +To: +> "If true, Sensible searches for images in the document range and returns the zero-indexed page number and coordinates of regions containing images in the range. 
**Notes**: If you set `true`, also set `"type": "images"` in the `field` object (see Examples section for an example). Returns image region coordinates, not image bytes or text lines. Sensible doesn't support this parameter for scanned documents. For rendering the image coordinates returned by this parameter, see [Notes](doc:document-range#notes). For alternatives to this parameter, see [Image processing](doc:images)." + +**Notes section heading** — from `## Extracting images` to `## Extracting images from Document Range coordinates` + +**Notes section body** — from: +> "The Document Range supports extracting non-text images that you can then render. For example, extract photos of buildings embedded in an inspection report and save them to a backend. It doesn't support extracting structured data from the images. +> +> **Note:** To extract structured data from an image, use the [Query Group](doc:query-group) method with the Multimodal Engine parameter configured. For example, extract facts about the building, such as whether it's multistory-story or single-story. +> +> To extract images, set `"includeImages":true` for the Document Range method. Sensible returns the image region coordinates rather than the actual encoded bytes of images. If you want to extract the images themselves, you can use a PDF library in your chosen programming language to follow these general steps: +> * Render the page containing the image to a bitmap. Page numbers are zero-indexed in the Sensible output. +> * Convert Sensible's coordinates for the image region to pixel per inch (PPI) coordinates. Sensible's region coordinates follow these conventions: +> * they're in reference to a 0.0 origin at the top left corner of the page (not the bottom left origin, as is for example the convention with the popular PDF.js library) +> * they're in inches (to convert inches to pixels, multiply the inches coordinates by your PPI setting...) 
+> * they're ordered clockwise from top left: (top left), (top right), (bottom right), (bottom left)" + +To: +> "When you use the Document Range's Include Images parameter to search for images in a range, the Document Range returns the coordinates of images it finds rather than the encoded bytes of the image. If you want to extract the images themselves, use a PDF library in your chosen programming language to follow these general steps: +> * Render the page containing the image to a bitmap. Page numbers are zero-indexed in the Sensible output. +> * Convert Sensible's [coordinates](doc:image#coordinate-conventions) for the image region to pixel per inch (PPI) coordinates. +> * Extract a partial bitmap defined by the PPI coordinates of the image from the rendered page. +> * Encode the bitmap to bytes in the image format of your choice." + +(Coordinate conventions bullets removed — now live in `images.md` and linked via `doc:image#coordinate-conventions`.) + +**Parameter table** — column widths normalized (no content changes to other rows). + +--- + +### `region.md` + +**`asImage` parameter description** — from: +> "When true, Sensible renders the region's bounding rectangle from the PDF page and returns it as a `data:image/png;base64,...` string. Use this to capture visual content — such as signatures, stamps, or checkboxes — instead of extracting text." + +To: +> "When true, Sensible returns the region as an image. Sensible returns the image's [coordinates](doc:images#notes) and its base64-encoded string, for example, `data:image/png;base64data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAABMgAAABaCAYAAA....` Use this option to capture visual content instead of extracting text. For example, use this option when your documents contain complex charts, from which neither LLM-based nor layout-based methods can reliably extract structured data. Extract the chart as an image and show it to human to interpret. 
For alternatives to this parameter, see [Image processing](doc:images)." + +--- + +### `query-group.md` + +**Parameter table** — column widths normalized (no content changes to existing rows). + +**`multimodalEngine` description** — added at end: +> "For alternatives to the Multimodal parameter, see [Image processing](doc:images)." + +--- + +## Speculated reasons (beyond stated) + +**Coordinate conventions consolidated into `images.md`** — previously lived only in `document-range.md`. Now that Region's `asImage` also returns coordinates, a single canonical home that both methods can link to was needed. The concept topic created that home. + +**`asImage` description rewrite: mechanism-first → scenario-first** — Claude's original description led with the implementation ("renders the bounding rectangle... returns it as a data:image/png;base64,... string"). Frances rewrote to lead with what Sensible does from the user's perspective, consistent with her stated editorial preference for scenario-first framing. She also replaced the specific use-case list (signatures, stamps, checkboxes) with the chart-interpretation example, which better distinguishes `asImage` from the Signature method (which also deals with visual content in bounded regions). + +**"show it to an end-user to review" → "show it to human to interpret"** — "end-user" and "review" imply a UI display context. "Human to interpret" is broader and makes the use case more legible: the point is that automated extraction failed and a human judgment is required. + +**Scanned-doc limitation added to `includeImages`** — either always true and previously undocumented, or noticed while reviewing `includeImages` in the context of the broader image processing topic. + +**`query-group.md` table reformat** — triggered by opening the file to add the `[Image processing]` cross-ref; the very wide column widths were inconsistent with other tables and were normalized while editing. 
diff --git a/.claude/sessions/2026-07-02-region-asImage/friction-log.md b/.claude/sessions/2026-07-02-region-asImage/friction-log.md new file mode 100644 index 000000000..f976d38d9 --- /dev/null +++ b/.claude/sessions/2026-07-02-region-asImage/friction-log.md @@ -0,0 +1,42 @@ +# Friction log — region: asImage + percentOverlapX/Y (PR #3351, #3375) + +### 1. Loose paraphrasing instead of verbatim copy + +**What happened:** When adding `percentOverlapX` and `percentOverlapY` to the region parameter table, Claude rewrote the descriptions rather than copying the existing wording from `intersection.md`, where these same parameters are already documented. Required two interruptions to correct. + +**First correction:** "I want you to more precisely copy the wording of the percentOverlap params in intersection for this. don't be loose the way you were just now." + +**Second correction (id row):** "I prefer you keep the original wording but add 'by default' like 'where "contained" by default means...' — Configure these thresholds with the Percent Overlap X and Percent Overlap Y parameters." + +**Rule:** When wording already exists in another doc for the same parameter (confirmed to share the same underlying implementation), copy it precisely. Only adapt what's structurally required (e.g., dropping an opening clause that's intersection-specific). Do not paraphrase. + +--- + +### 2. Organized session artifact by commit instead of by end state + +**What happened:** `frances-edits.md` was initially written commit-by-commit (4 sections, one per commit). User corrected: when commits are iterative stages toward one end goal, the artifact should reflect the final diff, organized by file — not the path taken to get there. + +**Rule:** Session edit artifacts should use the consolidated diff between the baseline and the final state. Organize by file, not by commit. 
Only use commits as the unit of analysis if they represent meaningfully distinct decisions (e.g., different features, separate reviewers). Intermediate commits that are just stages of one authoring session are noise. + +--- + +### 3. Vale style check failed + +**What happened:** Vale MCP server (`mcp__vale__check_file`) was not available as a callable tool (not in deferred tool list). Fallback to CLI `vale` also failed: + +``` +[E100] [loadStyles] Runtime error +style 'Google' does not exist on StylesPath +``` + +**Result:** Step 5 (style check) was not completed. Files were not checked before committing. + +**Root cause:** Vale is not configured correctly in this environment — `StylesPath` does not contain the Google style. Needs investigation. + +--- + +### 4. Artifact location: memory vs. repo + +**What happened:** Friction log and checklist were initially written to the memory system (`~/.claude/projects/.../memory/`). User redirected to commit them to the repo instead. + +**Rule:** Session artifacts (friction, checklists, open questions) belong in the repo under `.claude/sessions//`, not in the memory system. Memory is for cross-project, persistent preferences and user context. diff --git a/.claude/skills/update-docs-from-pr/description.md b/.claude/skills/update-docs-from-pr/description.md new file mode 100644 index 000000000..683b247b3 --- /dev/null +++ b/.claude/skills/update-docs-from-pr/description.md @@ -0,0 +1,46 @@ +# update-docs-from-pr — How it works + +This skill takes a pull request number from the `sensible-hq/sensible` engine repo and produces a docs PR in `sensible-docs`. It runs six steps. + +--- + +## Step 1 — Fetch the PR (deterministic) + +Runs two fixed `gh` commands in parallel to get the PR's metadata and full diff, then scans the PR body for references to related PRs and fetches those too. This is mechanical: the commands either succeed or fail, and the output is fixed given a PR number. 
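As a rough illustration of Step 1, the sketch below builds one metadata command and one diff command, runs them in parallel, and scans the PR body for related PR references. This description doesn't show the skill's actual commands, so the `gh pr view --json` field list, the `#1234`-style reference pattern, and the helper names (`build_fetch_commands`, `find_related_prs`, `fetch_pr`) are assumptions made for illustration only.

```python
import json
import re
import subprocess
from concurrent.futures import ThreadPoolExecutor

REPO = "sensible-hq/sensible"  # engine repo named in this description


def build_fetch_commands(pr_number, repo=REPO):
    """Return two argv lists: one for PR metadata, one for the full diff.

    The JSON field list is an assumption; the skill may request other fields.
    """
    pr = str(pr_number)
    return [
        ["gh", "pr", "view", pr, "--repo", repo, "--json", "number,title,body,url"],
        ["gh", "pr", "diff", pr, "--repo", repo],
    ]


def find_related_prs(body, current_pr):
    """Collect '#1234'-style references in a PR body, excluding the PR itself.

    The reference pattern is hypothetical; the skill may match other forms.
    """
    numbers = {int(n) for n in re.findall(r"#(\d+)", body or "")}
    numbers.discard(int(current_pr))
    return sorted(numbers)


def fetch_pr(pr_number, repo=REPO):
    """Run both commands in parallel; raises CalledProcessError if either fails."""

    def run(argv):
        result = subprocess.run(argv, check=True, capture_output=True, text=True)
        return result.stdout

    with ThreadPoolExecutor(max_workers=2) as pool:
        # map() preserves input order: metadata first, then the diff
        metadata_raw, diff = pool.map(run, build_fetch_commands(pr_number, repo))
    metadata = json.loads(metadata_raw)
    related = find_related_prs(metadata.get("body", ""), pr_number)
    return metadata, diff, related
```

A call such as `fetch_pr(1234)` (a placeholder PR number) needs an installed, authenticated `gh` CLI. The two pure helpers can be exercised without network access, which keeps the deterministic part of the step easy to check.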
+ +## Step 2 — Identify affected docs (non-deterministic) + +Claude reads the diff and infers which doc files need updating — by searching for existing mentions of changed features/parameters and mapping the code changes to the doc taxonomy (preprocessors, methods, field types, API, etc.). This is a judgment call. The same PR could reasonably lead to different conclusions about which pages need attention. Hints passed as arguments steer this step. + +## Step 3 — Plan the changes (non-deterministic) + +Claude loads the style guide (overview, template, sentence guidance, editorial preferences) and decides for each affected area whether to create a new page, update an existing page, or both. Writing or editing content to accurately reflect the engine change — including choosing what to say, what to emphasize, and what examples to include — is inherently generative and non-deterministic. + +## Step 4 — Create a branch and make the changes (mixed) + +Branch creation is deterministic (`git checkout -b fe__docs`). The actual file edits are non-deterministic: Claude writes or rewrites doc content, follows the style guide, and decides how to structure new parameters and examples. + +## Step 5 — Style check (mostly deterministic) + +Vale is run on every modified file via the MCP server. Errors and warnings are fixed; this is largely rule-driven and deterministic. Applying suggestions (the "maybe fix" tier) requires judgment — Claude decides whether a suggestion fits existing conventions — so that part is non-deterministic. + +## Step 6 — Commit and open a PR (deterministic) + +Stages only the changed files, commits with a fixed message format referencing the source PR, pushes the branch, and opens a docs PR with a structured body. All of this is mechanical. + +--- + +## Summary + +| Step | Nature | +|------|--------| +| 1. Fetch PR + related PRs | Deterministic | +| 2. Identify affected docs | Non-deterministic | +| 3. Plan and write content | Non-deterministic | +| 4. 
Branch creation | Deterministic | +| 4. File edits | Non-deterministic | +| 5. Vale errors/warnings | Deterministic | +| 5. Vale suggestions | Non-deterministic | +| 6. Commit, push, open PR | Deterministic | + +The riskiest non-deterministic steps are 2 and 3 — if Claude misidentifies which docs need updating, or misreads what the engine change means, everything downstream will be wrong. Review the PR diff yourself if the change is subtle. diff --git a/.github/styles/config/vocabularies/Sensible/accept.txt b/.github/styles/config/vocabularies/Sensible/accept.txt index 24ba9fd51..1a797d7da 100644 --- a/.github/styles/config/vocabularies/Sensible/accept.txt +++ b/.github/styles/config/vocabularies/Sensible/accept.txt @@ -42,6 +42,11 @@ Sensible's [Aa]nyco # SenseML parameter names (appear in table Name columns) +asImage +percentOverlapX +percentOverlapY +includeImages +isAbsoluteOffset [Mm]ultimodal [Mm]ultimodalEngine [Mm]ulticolumn diff --git a/docs/Senseml reference/concepts/images.md b/docs/Senseml reference/concepts/images.md new file mode 100644 index 000000000..bf74d583f --- /dev/null +++ b/docs/Senseml reference/concepts/images.md @@ -0,0 +1,35 @@ + + +``` +title: Image processing +excerpt: '' +deprecated: false +hidden: false +metadata: + title: '' + description: 'Extract images and image data from documents' + robots: index +next: + description: '' +``` + +You have the following options for processing non-text images in documents: + +| Use case | Method | +| ------------------------------------------------------------ | ------------------------------------------------------------ | +| Use an LLM to extract structured data from an image. For example, extract facts about a photo of a building, such as its color and whether it's multistory or single-story. | use the [Query Group](doc:query-group) method with the Multimodal Engine parameter configured | +| Extract an image from a known region as an encoded string. 
For example, use this option when your documents contain complex charts, from which neither LLM-based nor layout-based methods can reliably extract structured data. Extract the chart as an image and render it for a human to interpret. | use the [Region](doc:region) method with the As Image parameter configured | +| Search for non-labeled, non-text images in a range. For example, search for unlabeled photos of houses in a real estate document, and extract the images' coordinates. This option returns images' coordinates, which you can then use to render the images yourself. | use the [Document Range](doc:document-range) method with the Include Images parameter configured | + +## Notes + +- Sensible's rectangular coordinates for images follow these conventions: + + * they're in reference to a 0.0 origin at the *top left* corner of the page (not the bottom left origin, as is for example the convention with the popular PDF.js library) + + * they're in inches (to convert inches to pixels, multiply the inches coordinates by your PPI setting. For example, an x-coordinate of 3.156 inches is \~227 pixels for a PPI setting of 72 (72 PPI \* 3.156 inches)). + + * they're ordered clockwise from top left: (top left), (top right), (bottom right), (bottom left) + +- This topic is about processing non-text images. For information about processing text images, see [OCR](doc:ocr). + diff --git a/docs/Senseml reference/layout-based-methods/document-range.md b/docs/Senseml reference/layout-based-methods/document-range.md index e81fbc988..252df43e8 100644 --- a/docs/Senseml reference/layout-based-methods/document-range.md +++ b/docs/Senseml reference/layout-based-methods/document-range.md @@ -24,15 +24,15 @@ You can use this method to return the coordinates of regions containing images. **Note:** For additional parameters available for this method, see [Global parameters for methods](doc:method#global-parameters-for-methods). 
The following table shows parameters most relevant to or specific to this method. -| key | value | description | -| ----------------- | -------------------------------------------------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | -| id (**required**) | `documentRange` | Optionally set `"type": "paragraph"` in the Field object to include newlines (`\n`) in the output. | -| stop | [Match object](doc:match) or array of Match objects. default: `none` | Stops extraction at the top boundary of the matched line. The matched line isn't included in the method output. If this parameter and the Num Lines parameter are unspecified, matches to the end of the document. | -| numLines | integer. | Alternative to the Stop parameter. Extracts the specified number of lines succeeding the anchor. | -| includeAnchor | boolean. default: `false` | Includes the anchor line in the method output. If true, included in the total line count for the Num Lines parameter. | -| includeImages | boolean. default: `false` | Returns the zero-indexed page number and coordinates of regions containing images in the document range . **Notes**:
If you set `true`, also set`"type": "images"` in the `field` object (see Examples section for an example).
Returns image region coordinates, not image bytes or text lines. To extract structured data from images, see the [Query Group](doc:query-group) method and configure the Multimodal Engine parameter. | -| offsetY | number in inches. | Specifies the number of inches to offset the start of the document range from the top boundary of the anchor line.
Positive values offset down the page, negative values offset up the page.
If the offset falls below all lines on the page containing the anchor, the offset starts at the top boundary of the first line on the next page that contains lines.
For an example, see the Examples section. | -| stopOffsetY | number in inches. | Specifies the number of inches to offset the end of the document range from the top boundary of the stop line.
Positive values offset down the page, negative values offset up the page.
If the offset falls below all lines on the page containing the anchor, the offset starts at the top boundary of the first line on the next page that contains lines. | +| key | value | description | +| ----------------- | ------------------------------------------------------------ | ------------------------------------------------------------ | +| id (**required**) | `documentRange` | Optionally set `"type": "paragraph"` in the Field object to include newlines (`\n`) in the output. | +| stop | [Match object](doc:match) or array of Match objects. default: `none` | Stops extraction at the top boundary of the matched line. The matched line isn't included in the method output. If this parameter and the Num Lines parameter are unspecified, matches to the end of the document. | +| numLines | integer. | Alternative to the Stop parameter. Extracts the specified number of lines succeeding the anchor. | +| includeAnchor | boolean. default: `false` | Includes the anchor line in the method output. If true, included in the total line count for the Num Lines parameter. | +| includeImages | boolean. default: `false` | If true, Sensible searches for images in the document range and returns the zero-indexed page number and coordinates of regions containing images in the range. **Notes**:
If you set `true`, also set `"type": "images"` in the `field` object (see Examples section for an example).
Returns image region coordinates, not image bytes or text lines. Sensible doesn't support this parameter for scanned documents.
To render images from the coordinates that this parameter returns, see [Notes](doc:document-range#notes).
For alternatives to this parameter, see [Image processing](doc:images). | +| offsetY | number in inches. | Specifies the number of inches to offset the start of the document range from the top boundary of the anchor line.
Positive values offset down the page, negative values offset up the page.
If the offset falls below all lines on the page containing the anchor, the offset starts at the top boundary of the first line on the next page that contains lines.
For an example, see the Examples section. | +| stopOffsetY | number in inches. | Specifies the number of inches to offset the end of the document range from the top boundary of the stop line.
Positive values offset down the page, negative values offset up the page.
If the offset falls below all lines on the page containing the anchor, the offset starts at the top boundary of the first line on the next page that contains lines. | # Examples @@ -257,18 +257,11 @@ The following image shows the example document used with this example config: # Notes -## Extracting images +## Extracting images from Document Range coordinates -The Document Range supports extracting non-text images that you can then render. For example, extract photos of buildings embedded in an inspection report and save them to a backend. It doesn't support extracting structured data from the images. - -**Note:** To extract structured data from an image, use the [Query Group](doc:query-group) method with the Multimodal Engine parameter configured. For example, extract facts about the building, such as whether it's multistory-story or single-story. - -To extract images, set `"includeImages":true` for the Document Range method. Sensible returns the image region coordinates rather than the actual encoded bytes of images. If you want to extract the images themselves, you can use a PDF library in your chosen programming language to follow these general steps: +When you use the Document Range's Include Images parameter to search for images in a range, the Document Range returns the coordinates of images it finds rather than the encoded bytes of the image. If you want to extract the images themselves, use a PDF library in your chosen programming language to follow these general steps: * Render the page containing the image to a bitmap. Page numbers are zero-indexed in the Sensible output. -* Convert Sensible's coordinates for the image region to pixel per inch (PPI) coordinates. 
Sensible's region coordinates follow these conventions: - * they're in reference to a 0.0 origin at the *top left* corner of the page (not the bottom left origin, as is for example the convention with the popular PDF.js library) - * they're in inches (to convert inches to pixels, multiply the inches coordinates by your PPI setting. For example, an x-coordinate of 3.156 inches is \~227 pixels for a PPI setting of 72 (72 PPI \* 3.156 inches)). - * they're ordered clockwise from top left: (top left), (top right), (bottom right), (bottom left) +* Convert Sensible's [coordinates](doc:images#notes) for the image region to pixel per inch (PPI) coordinates. * Extract a partial bitmap defined by the PPI coordinates of the image from the rendered page. * Encode the bitmap to bytes in the image format of your choice. diff --git a/docs/Senseml reference/layout-based-methods/region.md b/docs/Senseml reference/layout-based-methods/region.md index 77c25664b..37fc2f731 100644 --- a/docs/Senseml reference/layout-based-methods/region.md +++ b/docs/Senseml reference/layout-based-methods/region.md @@ -10,7 +10,7 @@ metadata: next: description: '' --- -Extracts data in a rectangular region, defined in inches. The region extracts lines contained inside the region (for the definition of "contained", see the Parameters section). +Extracts data in a rectangular region, defined in inches. The region extracts lines contained inside the region (for the definition of "contained," see the Parameters section). In general, use this method: @@ -27,13 +27,16 @@ In general, use this method: | key | value | description | | ---------------------- | --------------------------------- | ------------------------------------------------------------ | -| id (**required**) | `region` | Extracts lines contained in the region, where "contained" means:
- condition 1: the region and the line's widths overlap by more than 90% of the smaller of the two's width.
AND
- condition 2: the region and the line's heights overlap by more than 80% of the smaller of the two's height. | +| id (**required**) | `region` | Extracts lines contained in the region, where "contained" by default means:
- condition 1: the region and the line's widths overlap by more than 90% of the smaller of the two's width.
AND
- condition 2: the region and the line's heights overlap by more than 80% of the smaller of the two's height.
Configure these thresholds with the Percent Overlap X and Percent Overlap Y parameters. | | start (**required**) | `above`, `below`, `left`, `right` | Defines the initial coordinates of the region's top-left corner (its "start point") relative to the anchor line's boundaries. For example, `right` specifies that the corner is at the midpoint of the anchor line's right boundary, and `below` specifies that the corner is at the midpoint of the anchor line's bottom boundary. | | offsetX (**required**) | number | Horizontally shifts the region's top-left corner from the point defined in the Start parameter by the specified number of inches. Positive values offset to the right, negative values offset to the left.
You can visually determine this number in the Sensible app by changing the number and watching the green region box resize, or by clicking a point in the document in the Sensible app, then dragging to display inch dimensions. | | offsetY (**required**) | number | Vertically shifts the region's top-left corner from the point defined in the Start parameter by the specified number of inches. Positive values offset down the page, negative values offset up the page.
You can visually determine this number in the Sensible app by changing the number and watching the green region box resize, or by clicking a point in the document in the Sensible app, then dragging to display inch dimensions. | | width (**required**) | number | The width in inches of the region. 
You can visually determine this number in the Sensible app by changing the number and watching the green region box resize, or by clicking a point in the document in the Sensible app, then dragging to display inch dimensions. | | height (**required**) | number | The height in inches of the region. 
You can visually determine this number in the Sensible app by changing the number and watching the green region box resize, or by clicking a point in the document in the Sensible app, then dragging to display inch dimensions. | | isAbsoluteOffset | boolean. default: `false` | Makes the offsets relative to the 0,0 origin at the top left of the page rather than to the point defined in the Start parameter. | +| asImage | boolean. default: `false` | When true, Sensible returns the region as a PNG data URI, for example, `data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAABMgAAABaCAYAAA....` Sensible also returns the image's [coordinates](doc:images#notes). Use this option to capture visual content instead of extracting text. For example, use this option when your documents contain complex charts, from which neither LLM-based nor layout-based methods can reliably extract structured data. Extract the chart as an image and render it for a human to interpret.
For alternatives to this parameter, see [Image processing](doc:images). | +| percentOverlapX | number. default: `0.9` | Configures the strictness of the criteria by which a region "contains" a line. By default, Sensible determines that a region contains a line if their widths overlap by more than 90% of the smaller of the two's width. Loosen the criteria if a line can partly fall outside a region. For example, if you set this parameter to 0.5, then Sensible determines that a region contains a line if their widths overlap by more than 50% of the smaller of the two's width. Note the line must also meet the Percent Overlap Y parameter's criteria. | +| percentOverlapY | number. default: `0.8` | Configures strictness in the same manner as the Percent Overlap X parameter, but applies to height instead of width. | ## Syntax example @@ -51,7 +54,10 @@ The following example shows the preceding parameters documented with in-line com "offsetY": 0.00, /* vertically shifts the region's top-left corner specified in the Start parameter by the specified number of inches. positive: down, negative: up */ "width": 0.00, /* width of the region in inches */ "height": 0.00, /* height of the region in inches */ - "isAbsoluteOffset": false /* default: false. if true, offsets are relative to the top-left of the page, not to the Start parameter */ + "isAbsoluteOffset": false, /* default: false. if true, offsets are relative to the top-left of the page, not to the Start parameter */ + "asImage": false, /* default: false. if true, returns the region rendered as a data:image/png;base64,... string instead of extracting text */ + "percentOverlapX": 0.9, /* default: 0.9. fraction of width overlap required for a line to be inside the region; 0 accepts any overlap */ + "percentOverlapY": 0.8 /* default: 0.8. 
same as percentOverlapX, but for height */ } } ``` diff --git a/docs/Senseml reference/llm-based-methods/query-group.md b/docs/Senseml reference/llm-based-methods/query-group.md index fb7d37e7c..21fe8a550 100644 --- a/docs/Senseml reference/llm-based-methods/query-group.md +++ b/docs/Senseml reference/llm-based-methods/query-group.md @@ -62,22 +62,22 @@ For information about how this method works, see [Notes](doc:query-group#notes). ## Query group parameters -| key | value | description | interactions | -| :-------------------- | :--------------------------------------------------------------------------------------------------- | :----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | -| id (**required**) | `queryGroup` | | | -| queries | array of objects | An array of query objects, where each extracts a single fact and outputs a single field. Each query contains the following parameters:
`id` (**required**) - The ID for the extracted field.
`description` (**required**) - A free-text question about information in the document. For example, `"what's the policy period?"` or `"what's the client's first and last name?"`. For more information about how to write questions (or "prompts"), see [Query Group](https://docs.sensible.so/docs/query-group-tips) extraction tips. | | -| | | _**CHAIN PROMPTS**_ | | -| source_ids | array of field IDs in the current config
or
object | If specified, prompts an LLM to extract data from another field's output. For example, if you extract a field `_checking_transactions` and specify it in this parameter, then Sensible searches for the answer to `what is the largest transaction?` in `_checking_transactions`, rather than searching the whole document to locate the [context](doc:prompt). Note that the `_checking_transactions` field must precede the `largest_transaction` field in the fields array in this example. For more information about this example, see [Example: Transform fields](doc:query-group#example-transform-fields).

You can use a JavaScript-flavored regular expression to specify all field IDs that contain a pattern. For example, to specify all the field IDs containing the text `wage` extracted from a W-2 form, you can write `"source_ids": { "pattern": ".*wage.*" }`. For an example, see [Example: Chain prompts with regex](doc:query-group#example-chain-prompts-with-regex).
When you use regex, Sensible populates the IDs in the array in the same order in which they're defined in the config. Sensible automatically expands your pattern to include string beginning and ending characters. For example, it expands the pattern `".*wage.*"` to `"^.*wage.*$"`.

To extract repeating data, such as a list, specify the Source Ids parameter for the [List](doc:list#parameters) method rather than for the Query Group method.

For more information about chaining prompts, see [Advanced LLM prompt configuration](doc:prompt#locate-context-by-pipelining-prompts).

The source field IDs can reference document data extracted from the current extraction, or caller-provided data such as data from a previous extraction or from a system of record. To supply caller-provided values, use the [Extra Data](doc:extra-data) method to create a field from a value in the request's `extra_data` object, then include that field ID in `source_ids`. | If you configure this parameter, then the following parameters aren't supported:
- Anchor parameter in the field
- Confidence Signals
- Multimodal Engine parameter
- Search By Summarization parameter
- Page Range parameter | -| | | _**EXTRACT FROM IMAGES**_ | | -| multimodalEngine | object | Configure this parameter to:
- Extract data from images embedded in a document, for example, photos, charts, or illustrations.
- Troubleshoot extracting from complex text layouts, such as overlapping lines, lines between lines, and handwriting. For example, use this as an alternative to the [Signature](doc:signature) method, the [Nearest Checkbox](doc:nearest-checkbox) method, the [OCR engine](doc:ocr-engine), and line [preprocessors](doc:preprocessors).

This parameter sends an image of the document region containing the target data to a multimodal LLM so that you can ask questions about text and non-text images. This bypasses Sensible's [OCR](doc:ocr) and direct-text extraction processes for the region.
This parameter has the following parameters:

- `region`: The document region to send as an image to the multimodal LLM. Configurable with the following options :

    - To automatically select top-scoring document chunks as the region, specify `"region": "automatic"`. If you configure this option, then help Sensible locate the region by including queries in the group that target text [lines](doc:lines) near the image you want to extract from.

    - To manually specify a region, specify an [anchor](doc:anchor) close to the region you want to capture. Specify the region's dimensions in inches relative to the anchor using the [Region](doc:region) method's parameters, for example:
`"region": { "start": "below", "width": 8, "height": 1.2, "offsetX": -2.5,"offsetY": -0.25}`


- `onlyImages`: boolean. default: false. Configure this to troubleshoot image resolution. If set to true, Sensible sends only the images it detects overlapping the region and omits any [lines](doc:lines) overlapping the region. Sends the images at their original resolution. For an example, see [Example: troubleshoot image resolution](doc:query-group#example-troubleshoot-image-resolution). | If you configure this parameter, then the Confidence Signals parameter isn't supported. | -| | | _**TROUBLESHOOT PROMPT**_ | | -| llmEngine | object | Configures the LLM model Sensible uses to extract data from [context](doc:prompt).
Configure this parameter to troubleshoot situations in which Sensible correctly identifies the part of the document that contains the answers to your prompts, but the LLM's answer contains problems. For example, Sensible returns an LLM error because the answer isn't properly formatted, or the LLM doesn't follow instructions in your prompt.

Contains the following parameters:
`provider`:
- If set to `open-ai` (default), Sensible uses an OpenAI model.
- If set to `anthropic`, Sensible uses an Anthropic model.
- If set to `google`, Sensible uses a Google model.
For more information about models, see [LLM models](doc:llm-models). | | -| confidenceSignals | boolean
or
`"strict"`
or
object
default: `true` | If true, Sensible prompts the LLM to report any uncertainties it has about the accuracy of its response. For more information, see [Qualifying LLM accuracy](doc:confidence).

If `"strict"`, Sensible returns null for a field if its confidence signal is `incorrect_answer`.

If an object, has `engine` and `strict` parameters, for example `{ "engine": "google", "strict": true }` :
- `engine`: the LLM provider to use for confidence signals. `"open-ai"`, `"anthropic"`, or `"google"`. Default: `"open-ai"`. For example, specify a provider to avoid output changes if Sensible updates the default provider. For more information, see [LLM models](doc:llm-models).
- `strict`: see previous description for behavior.

If you specify an object, at least one property must be present. For example: `{ "engine": "open-ai" }`. | Not supported if you specify the Multimodal Engine parameter or Source Ids parameter | -| | | _**FIND CONTEXT**_ | | -| searchBySummarization | boolean,
or
`outline`, or
`page` (equivalent to `true`)
default: `false`
| (Recommended) Configure this to search for [context](doc:prompt) using summaries of document chunks.
If you set `page`, each page is a chunk.
If you set `outline`, an LLM outlines the document, and each segment of the outline is a chunk.
For more information, see [Advanced LLM prompt configuration](doc:prompt#recommended-locate-context-by-summarizing-document).
This parameter is compatible with documents up to 1,280 pages long.
For an example, see the [Multicolumn](doc:multicolumn#examples) preprocessor. | If you configure this parameter for a document 5 pages or under in length, Sensible submits the entire document as context, bypassing summarization.
If you configure this parameter for a document over 5 pages long, then Sensible sets the Chunk Count parameter to 5 and ignores any configured value. | -| pageRange | object | Configures the possible page range for finding the context in the document.
If specified, Sensible creates chunks in the page range and ignores other pages. For example, use this parameter to improve performance, or to avoid extracting unwanted data if your prompt has multiple candidate answers.

Contains the following parameters:
`startPage`: Zero-based index of the page at which Sensible starts creating chunks (inclusive).
`endPage`: Zero-based index of the page at which Sensible stops creating chunks (exclusive). | Sensible ignores this parameter when searching for a field's [anchor](doc:anchor). If you want to exclude the field's anchor using a page range, use the [Page Range](doc:page-range) preprocessor instead. | -| | | _**CONFIGURE CONTEXT SIZE**_ | | -| chunkCount | number.
default: 5 | The number of top-scoring document chunks Sensible combines as context as part of the full prompt it submits to an LLM. | | +| key | value | description | interactions | +| :-------------------- | :----------------------------------------------------------- | :----------------------------------------------------------- | ------------------------------------------------------------ | +| id (**required**) | `queryGroup` | | | +| queries | array of objects | An array of query objects, where each extracts a single fact and outputs a single field. Each query contains the following parameters:
`id` (**required**) - The ID for the extracted field.
`description` (**required**) - A free-text question about information in the document. For example, `"what's the policy period?"` or `"what's the client's first and last name?"`. For more information about how to write questions (or "prompts"), see [Query Group](https://docs.sensible.so/docs/query-group-tips) extraction tips. | | +| | | _**CHAIN PROMPTS**_ | | +| source_ids | array of field IDs in the current config
or
object | If specified, prompts an LLM to extract data from another field's output. For example, if you extract a field `_checking_transactions` and specify it in this parameter, then Sensible searches for the answer to `what is the largest transaction?` in `_checking_transactions`, rather than searching the whole document to locate the [context](doc:prompt). Note that the `_checking_transactions` field must precede the `largest_transaction` field in the fields array in this example. For more information about this example, see [Example: Transform fields](doc:query-group#example-transform-fields).

You can use a JavaScript-flavored regular expression to specify all field IDs that contain a pattern. For example, to specify all the field IDs containing the text `wage` extracted from a W-2 form, you can write `"source_ids": { "pattern": ".*wage.*" }`. For an example, see [Example: Chain prompts with regex](doc:query-group#example-chain-prompts-with-regex).
When you use regex, Sensible populates the IDs in the array in the same order in which they're defined in the config. Sensible automatically expands your pattern to include string beginning and ending characters. For example, it expands the pattern `".*wage.*"` to `"^.*wage.*$"`.

To extract repeating data, such as a list, specify the Source Ids parameter for the [List](doc:list#parameters) method rather than for the Query Group method.

For more information about chaining prompts, see [Advanced LLM prompt configuration](doc:prompt#locate-context-by-pipelining-prompts).

The source field IDs can reference document data extracted from the current extraction, or caller-provided data such as data from a previous extraction or from a system of record. To supply caller-provided values, use the [Extra Data](doc:extra-data) method to create a field from a value in the request's `extra_data` object, then include that field ID in `source_ids`. | If you configure this parameter, then the following parameters aren't supported:
- Anchor parameter in the field
- Confidence Signals
- Multimodal Engine parameter
- Search By Summarization parameter
- Page Range parameter | +| | | _**EXTRACT FROM IMAGES**_ | | +| multimodalEngine | object | Configure this parameter to:
- Extract data from images embedded in a document, for example, photos, charts, or illustrations.
- Troubleshoot extracting from complex text layouts, such as overlapping lines, lines between lines, and handwriting. For example, use this as an alternative to the [Signature](doc:signature) method, the [Nearest Checkbox](doc:nearest-checkbox) method, the [OCR engine](doc:ocr-engine), and line [preprocessors](doc:preprocessors).

This parameter sends an image of the document region containing the target data to a multimodal LLM so that you can ask questions about text and non-text images. This bypasses Sensible's [OCR](doc:ocr) and direct-text extraction processes for the region.
This parameter has the following parameters:

- `region`: The document region to send as an image to the multimodal LLM. Configurable with the following options:

    - To automatically select top-scoring document chunks as the region, specify `"region": "automatic"`. If you configure this option, then help Sensible locate the region by including queries in the group that target text [lines](doc:lines) near the image you want to extract from.

    - To manually specify a region, specify an [anchor](doc:anchor) close to the region you want to capture. Specify the region's dimensions in inches relative to the anchor using the [Region](doc:region) method's parameters, for example:
`"region": { "start": "below", "width": 8, "height": 1.2, "offsetX": -2.5,"offsetY": -0.25}`


- `onlyImages`: boolean. default: false. Configure this to troubleshoot image resolution. If set to true, Sensible sends only the images it detects overlapping the region and omits any [lines](doc:lines) overlapping the region. Sends the images at their original resolution. For an example, see [Example: troubleshoot image resolution](doc:query-group#example-troubleshoot-image-resolution).
For alternatives to the Multimodal Engine parameter, see [Image processing](doc:images). | If you configure this parameter, then the Confidence Signals parameter isn't supported. | +| | | _**TROUBLESHOOT PROMPT**_ | | +| llmEngine | object | Configures the LLM model Sensible uses to extract data from [context](doc:prompt).
Configure this parameter to troubleshoot situations in which Sensible correctly identifies the part of the document that contains the answers to your prompts, but the LLM's answer contains problems. For example, Sensible returns an LLM error because the answer isn't properly formatted, or the LLM doesn't follow instructions in your prompt.

Contains the following parameters:
`provider`:
- If set to `open-ai` (default), Sensible uses an OpenAI model.
- If set to `anthropic`, Sensible uses an Anthropic model.
- If set to `google`, Sensible uses a Google model.
For more information about models, see [LLM models](doc:llm-models). | | +| confidenceSignals | boolean
or
`"strict"`
or
object
default: `true` | If true, Sensible prompts the LLM to report any uncertainties it has about the accuracy of its response. For more information, see [Qualifying LLM accuracy](doc:confidence).

If `"strict"`, Sensible returns null for a field if its confidence signal is `incorrect_answer`.

If you specify an object, it has `engine` and `strict` parameters, for example `{ "engine": "google", "strict": true }`:
- `engine`: the LLM provider to use for confidence signals. `"open-ai"`, `"anthropic"`, or `"google"`. Default: `"open-ai"`. For example, specify a provider to avoid output changes if Sensible updates the default provider. For more information, see [LLM models](doc:llm-models).
- `strict`: see previous description for behavior.

If you specify an object, at least one property must be present. For example: `{ "engine": "open-ai" }`. | Not supported if you specify the Multimodal Engine parameter or Source Ids parameter | +| | | _**FIND CONTEXT**_ | | +| searchBySummarization | boolean,
or
`outline`, or
`page` (equivalent to `true`)
default: `false`
| (Recommended) Configure this to search for [context](doc:prompt) using summaries of document chunks.
If you set `page`, each page is a chunk.
If you set `outline`, an LLM outlines the document, and each segment of the outline is a chunk.
For more information, see [Advanced LLM prompt configuration](doc:prompt#recommended-locate-context-by-summarizing-document).
This parameter is compatible with documents up to 1,280 pages long.
For an example, see the [Multicolumn](doc:multicolumn#examples) preprocessor. | If you configure this parameter for a document 5 pages or under in length, Sensible submits the entire document as context, bypassing summarization.
If you configure this parameter for a document over 5 pages long, then Sensible sets the Chunk Count parameter to 5 and ignores any configured value. | +| pageRange | object | Configures the possible page range for finding the context in the document.
If specified, Sensible creates chunks in the page range and ignores other pages. For example, use this parameter to improve performance, or to avoid extracting unwanted data if your prompt has multiple candidate answers.

Contains the following parameters:
`startPage`: Zero-based index of the page at which Sensible starts creating chunks (inclusive).
`endPage`: Zero-based index of the page at which Sensible stops creating chunks (exclusive). | Sensible ignores this parameter when searching for a field's [anchor](doc:anchor). If you want to exclude the field's anchor using a page range, use the [Page Range](doc:page-range) preprocessor instead. | +| | | _**CONFIGURE CONTEXT SIZE**_ | | +| chunkCount | number.
default: 5 | The number of top-scoring document chunks Sensible combines as context as part of the full prompt it submits to an LLM. | | ## Examples
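
The following sketch shows how the Multimodal Engine parameter's manual region option might fit into a config. It assumes the anchor goes at the field level and reuses the region dimensions from the parameter description. The anchor text, field ID, and prompt are hypothetical placeholders, not values from a real document.

```json
{
  "fields": [
    {
      /* hypothetical anchor: a text line close to the image you want to ask about */
      "anchor": "quarterly revenue",
      "method": {
        "id": "queryGroup",
        "multimodalEngine": {
          /* region dimensions in inches, copied from the parameter description's example */
          "region": {
            "start": "below",
            "width": 8,
            "height": 1.2,
            "offsetX": -2.5,
            "offsetY": -0.25
          }
        },
        "queries": [
          {
            /* hypothetical field ID and prompt */
            "id": "highest_revenue_quarter",
            "description": "which quarter has the highest revenue in the chart?"
          }
        ]
      }
    }
  ]
}
```

This sketch leaves out the Confidence Signals parameter, since the parameter table states it isn't supported together with the Multimodal Engine parameter. To let Sensible choose the region instead, replace the region object with `"region": "automatic"` and include queries that target text lines near the image.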