feat(pii): Base SpanData PII on relay-conventions#5997
Conversation
This replaces the default PII value on `SpanData` with a function that looks up a field's name (as defined by the `metastructure` annotation) in `sentry-conventions`. If the name isn't found, the function returns `Pii::True` to be safe.
| ) | ||
| ( | ||
| [^/\\\r\n]+ | ||
| [^/\\\r\n\x00]+ |
There was a problem hiding this comment.
If this regex can match a null byte it causes problems with redactions.
| "url.full": { | ||
| "type": "string", | ||
| "value": "https://www.service.io/users/01234-qwerty/settings/98765-adfghj", | ||
| "value": "https://www.service.io/users/[user]/settings/98765-adfghj", |
There was a problem hiding this comment.
This attribute has become pii: "true" in the current conventions version.
| "sentry.description": { | ||
| "type": "string", | ||
| "value": "GET https://www.service.io/users/01234-qwerty/settings/98765-adfghj", | ||
| "value": "GET https://www.service.io/users/[user]/settings/98765-adfghj", |
There was a problem hiding this comment.
This attribute is filled from url.full, hence inherits the redaction.
Dav1dde
left a comment
There was a problem hiding this comment.
This is a change with potentially big impact, are you confident we have sufficient test coverage already?
| match self.pii() { | ||
| Pii::True => Some(Cow::Borrowed(&PII_TRUE_FIELD_ATTRS)), | ||
| Pii::False => None, | ||
| Pii::Maybe => Some(Cow::Borrowed(&PII_MAYBE_FIELD_ATTRS)), | ||
| match self.attrs().pii { | ||
| PiiMode::Static(Pii::True) => Some(Cow::Borrowed(&PII_TRUE_FIELD_ATTRS)), | ||
| PiiMode::Static(Pii::False) => None, | ||
| PiiMode::Static(Pii::Maybe) => Some(Cow::Borrowed(&PII_MAYBE_FIELD_ATTRS)), | ||
| PiiMode::Dynamic(f) => Some(Cow::Owned(DEFAULT_FIELD_ATTRS.pii_dynamic(f))), |
There was a problem hiding this comment.
Previously, when deriving the attributes for recursion, we would eagerly resolve a dynamic PII value function into a static value. Now we instead pass the dynamic function through.
| /// Other fields in `span.data`. | ||
| #[metastructure( | ||
| additional_properties, | ||
| pii = "true", |
There was a problem hiding this comment.
Additional properties lose unconditional PII scrubbing safety net
Medium Severity
The other field (additional properties bag) had its explicit pii = "true" removed and now inherits the struct-level dynamic span_data_pii_from_conventions function. When inner_attrs propagates this dynamic function to child entries, any attribute that exists in sentry-conventions with pii = false will no longer be scrubbed — even for unknown, user-supplied data landing in additional_properties. This is a relaxation of the previous unconditional scrubbing policy for the catch-all bag. Combined with the new PiiMode::Dynamic arm in inner_attrs always returning Some(...), deeply nested values within these entries also inherit the dynamic lookup rather than the previous flat Pii::True.
Additional Locations (1)
Reviewed by Cursor Bugbot for commit e48ff37. Configure here.
There was a problem hiding this comment.
As long as we have test coverage for these cases, I don't see that as a problem
|
Oh, and to actually answer this:
No, not necessarily 😞. I would probably extend the |
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
There are 2 total unresolved issues (including 1 from previous review).
❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.
Reviewed by Cursor Bugbot for commit 7fa8c6d. Configure here.
| PiiMode::Static(Pii::True) => Some(Cow::Borrowed(&PII_TRUE_FIELD_ATTRS)), | ||
| PiiMode::Static(Pii::False) => None, | ||
| PiiMode::Static(Pii::Maybe) => Some(Cow::Borrowed(&PII_MAYBE_FIELD_ATTRS)), | ||
| PiiMode::Dynamic(f) => Some(Cow::Owned(DEFAULT_FIELD_ATTRS.pii_dynamic(f))), |
There was a problem hiding this comment.
Dynamic PII in inner_attrs over-scrubs nested field values
Medium Severity
The PiiMode::Dynamic(f) arm in inner_attrs passes span_data_pii_from_conventions through to children of complex-typed fields (e.g., gen_ai.input.messages of type Annotated<Value>). When such a field contains a nested object, each sub-key (like "role" or "content") is looked up in conventions via state.keys().next(). Sub-keys not found in conventions get Pii::True, whereas previously (with pii = "maybe" on SpanData) they inherited Pii::Maybe. This upgrades scrubbing from "only specific path selectors" to "generic selectors like $string match," causing unintended data removal in nested structures.
Additional Locations (1)
Reviewed by Cursor Bugbot for commit 7fa8c6d. Configure here.


This replaces the default PII value on
SpanDatawith a function thatlooks up a field's name (as defined by the
metastructureannotation)in
sentry-conventions. If the name isn't found, the function returnsPii::Trueto be safe.It also updates
sentry-conventionsfor getsentry/sentry-conventions@523f1f0.Additionally, it fixes a very annoying edge case bug in PII string processing: the logic assumes that a null byte can be freely inserted into strings to mark already removed chunks, but some PII regexes do match null bytes. I've corrected one such regex that otherwise would cause a test failure.
Closes INGEST-919.