
fix(tokenize): use camelCase @SerializedName values on TextAnalyzer + add round-trip test #568

Open
g-despot wants to merge 3 commits into main from fix-tokenizer


@g-despot
Contributor

Summary

The TextAnalyzer record was annotated with snake_case @SerializedName values (ascii_fold, ascii_fold_ignore, stopword_preset), but Weaviate's REST schema expects camelCase keys (asciiFold, asciiFoldIgnore, stopwordPreset). As a result, every per-property analyzer setting was silently dropped on the wire. This was verified by comparing the stored schema after creating a property with Property.text(... textAnalyzer(TextAnalyzer.of(t -> t.foldAscii(true).keepAscii("é")))): before the patch, the property comes back with no textAnalyzer field at all; after the patch, it stores { "asciiFold": true, "asciiFoldIgnore": ["é"] }.
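The failure mode can be reproduced in miniature without Gson or the Weaviate client. The sketch below rests on two assumptions: a hypothetical WireName annotation stands in for Gson's @SerializedName, and the server is modeled as keeping only the schema keys it recognizes while ignoring everything else (consistent with the silent drop described above, not a description of Weaviate's actual code). The record and method names are illustrative only; it needs JDK 16+ for records.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.RecordComponent;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

public class WireNameDemo {

    // Hypothetical stand-in for Gson's @SerializedName: names the JSON key of a component.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.RECORD_COMPONENT)
    @interface WireName {
        String value();
    }

    // Pre-patch shape: snake_case wire names.
    record SnakeAnalyzer(
            @WireName("ascii_fold") Boolean foldAscii,
            @WireName("ascii_fold_ignore") String[] keepAscii) {}

    // Post-patch shape: camelCase wire names.
    record CamelAnalyzer(
            @WireName("asciiFold") Boolean foldAscii,
            @WireName("asciiFoldIgnore") String[] keepAscii) {}

    // Assumption: the server keeps only keys its schema knows and ignores the rest.
    static final Set<String> SCHEMA_KEYS = Set.of("asciiFold", "asciiFoldIgnore", "stopwordPreset");

    // Serialize a record to a key -> value map, using each component's @WireName as the key.
    static Map<String, Object> toWire(Record r) {
        Map<String, Object> out = new LinkedHashMap<>();
        for (RecordComponent c : r.getClass().getRecordComponents()) {
            try {
                Object value = c.getAccessor().invoke(r);
                if (value != null) {
                    out.put(c.getAnnotation(WireName.class).value(), value);
                }
            } catch (ReflectiveOperationException e) {
                throw new IllegalStateException(e);
            }
        }
        return out;
    }

    // Simulate what the server persists: unknown keys are silently dropped.
    static Map<String, Object> stored(Map<String, Object> wire) {
        Map<String, Object> kept = new LinkedHashMap<>(wire);
        kept.keySet().retainAll(SCHEMA_KEYS);
        return kept;
    }

    public static void main(String[] args) {
        String[] ignore = {"\u00e9"}; // "é"
        System.out.println(stored(toWire(new SnakeAnalyzer(true, ignore))).keySet()); // []
        System.out.println(stored(toWire(new CamelAnalyzer(true, ignore))).keySet()); // [asciiFold, asciiFoldIgnore]
    }
}
```

Nothing fails at serialization time in either case; the snake_case keys are simply never recognized, which is why only a round-trip check against the stored schema exposes the bug.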

This PR changes the three @SerializedName values to camelCase and adds TextAnalyzerITest, which covers both the schema round-trip (collection.config.get() returns the persisted analyzer) and the behavioral effect (a "cafe" filter on a "Café Crème" row matches only the property configured with foldAscii(true)). The new test fails against the snake_case version and passes after the fix; this was confirmed by toggling the values locally.
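The behavioral half of the test depends on ASCII folding. As a rough sketch of the expected effect, and explicitly not Weaviate's tokenizer, folding can be approximated with java.text.Normalizer: decompose each character to NFD, drop combining marks, and leave untouched any character in an ignore set that plays the role of asciiFoldIgnore. The fold and matches helpers and the whitespace-token matching are hypothetical simplifications.

```java
import java.text.Normalizer;
import java.util.Locale;
import java.util.Set;

public class AsciiFoldDemo {

    // Hypothetical approximation of ASCII folding: NFD-decompose each character and strip
    // combining marks, except characters listed in `ignore` (mirroring asciiFoldIgnore).
    // Weaviate's real tokenizer may fold differently; this only illustrates the effect.
    static String fold(String text, Set<String> ignore) {
        StringBuilder out = new StringBuilder();
        text.codePoints().forEach(cp -> {
            String ch = new String(Character.toChars(cp));
            if (ignore.contains(ch)) {
                out.append(ch); // keep ignored characters as-is
                return;
            }
            String decomposed = Normalizer.normalize(ch, Normalizer.Form.NFD);
            out.append(decomposed.replaceAll("\\p{M}", "")); // drop accents
        });
        return out.toString();
    }

    // Simplified "filter": does any lowercased whitespace token of the (optionally folded)
    // stored text equal the lowercased query?
    static boolean matches(String stored, String query, boolean foldAscii, Set<String> ignore) {
        String text = foldAscii ? fold(stored, ignore) : stored;
        String q = query.toLowerCase(Locale.ROOT);
        for (String token : text.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (token.equals(q)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String row = "Caf\u00e9 Cr\u00e8me"; // "Café Crème"
        System.out.println(matches(row, "cafe", false, Set.of()));         // false
        System.out.println(matches(row, "cafe", true, Set.of()));          // true
        System.out.println(matches(row, "cafe", true, Set.of("\u00e9")));  // false: é is kept
    }
}
```

Under this model, the folding flag and the ignore list both change which rows a filter matches, so if either is dropped on the wire the property quietly behaves as if no analyzer were configured.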

Test plan

  • WEAVIATE_VERSION=1.37.2 mvn verify -Dit.test=TextAnalyzerITest -Dmaven.javadoc.skip=true: passes against the patched record and fails 2/2 when the @SerializedName values are reverted to snake_case
  • Full docs Java v6 suite (pytest -m java_v6 -k "Tokenization or SearchProfile"): green against the patched SNAPSHOT

🤖 Generated with Claude Code


@orca-security-eu (Bot) left a comment


Orca Security Scan Summary

Status   Check                    Issues by priority
Passed   Infrastructure as Code   high 0, medium 0, low 0, info 0
Passed   SAST                     high 0, medium 0, low 0, info 0
Passed   Secrets                  high 0, medium 0, low 0, info 0
Passed   Vulnerabilities          high 0, medium 0, low 0, info 0

@g-despot g-despot requested a review from bevzzz April 30, 2026 13:53
@g-despot g-despot requested a review from a team as a code owner April 30, 2026 13:54