purolator: make tracking status description-aware for Undeliverable/Other scans#971
purolator: make tracking status description-aware for Undeliverable/Other scans#971HansDaigle wants to merge 1 commit into
Conversation
|
@HansDaigle is attempting to deploy a commit to the karrio Team on Vercel. A member of the Team first needs to authorize it. |
|
Implementation note: Purolator can emit the same |
|
Follow-up: added French-description handling as well (accent-insensitive normalization + FR fallback keywords for Undeliverable), since Purolator settings support language=fr. Added dedicated test coverage in modules/connectors/purolator/tests/purolator/test_tracking.py. |
danh91
left a comment
There was a problem hiding this comment.
Review
This one is different from the Canada Post PR (#970), and it deserves a different verdict. The core approach here is justified — but a few specific items need addressing before it lands.
Why description-aware mapping is warranted here
In main, the current Purolator connector maps all five scan types with no description awareness:
in_transit = [""] # nothing maps here
delivered = ["Delivery"]
delivery_failed = ["Undeliverable"]
out_for_delivery = ["OnDelivery"]That means every Undeliverable event — "Available for pickup", "Returned to sender", "Arrived at sort facility", "Delivery delayed" — maps to delivery_failed. That's not a minor inaccuracy, it's categorically wrong for most Undeliverable events. ProofOfPickUp doesn't map at all, so pickups fall to unknown.
Purolator intentionally designed five coarse scan types as administrative buckets. The description is the actual semantic signal. This is the API design, not an edge case — it's fundamentally different from Canada Post, where numeric codes are specific enough to carry meaning on their own. Here, not using descriptions means systematically wrong status output.
The normalization approach is also robust in a way that addresses the main objection to description matching:
- Unicode NFKD normalization + combining-character stripping → accent-insensitive for French/English
- Whitespace normalization → handles extra spaces
- Trailing period normalization → handles "text." vs "text"
- Apostrophe unification → handles
'vs' - Keyword fallback for
Undeliverable→ resilient to wording drift in the most important category
147,766 events and 141 distinct description variants is credible validation. This isn't theoretical.
What's solid
normalize_tracking_description— correct and robustUNDELIVERABLE_KEYWORD_RULES— the right approach; exact matches for known variants, keyword fallback for new wording- The two-layer lookup (normalized exact match → keyword fallback → default) is the right design
picked_up,return_to_sender,ready_for_pickup,delivery_delayed,pendingadded correctly- Null-safe scan/depot handling (
if scans else None,getattr(scan, "Depot", None)) — good defensive fix
Issues that need addressing
1. Delivery → in_transit override
"Delivery": {
"__default__": TrackingStatus.delivered.name,
"Transferring to Shipping Centre - please wait for further instructions": TrackingStatus.in_transit.name,
}This is the most dangerous entry in the mapping. You're overriding a ScanType=Delivery event to in_transit, asserting that Purolator mislabeled it. If it was observed in your production dataset, document how frequent it is and under what circumstances Purolator emits Delivery for an in-facility transfer. If it was rare (e.g., 1 out of 147K), reconsider whether it belongs in a generic mapping or should just pass through as delivered with the description visible. The description already passes through verbatim — integrators who need to act on this specific description can read it.
2. Undeliverable → pending entries
"Undeliverable": {
"Shipment created - interim manifest received": TrackingStatus.pending.name,
"Shipment created - final manifest received": TrackingStatus.pending.name,
"Shipment created": TrackingStatus.pending.name,
"Shipper created a label": TrackingStatus.pending.name,
...
}Why does Purolator emit ScanType=Undeliverable for pre-pickup events? These descriptions sound like initial label/manifest events that should appear under Other (where you already map them to pending). If Purolator is genuinely emitting Undeliverable for these, document it — the combination is surprising enough that it warrants an inline comment explaining what you observed. If they weren't actually seen under Undeliverable in your dataset, remove them.
3. TrackingIncidentReason lookup is inconsistent with the normalized mapping
In tracking.py:
reason=next(
(
r.name
for r in list(provider_units.TrackingIncidentReason)
if scan.ScanType in r.value or scan.Description in r.value
),
None,
)scan.Description in r.value does exact string membership check against the enum lists. This isn't normalized — accents, whitespace, trailing periods, apostrophes all break it. You've built a robust normalization layer for status; the reason lookup bypasses it entirely.
Two options: either apply normalize_tracking_description to both sides of the comparison, or drop the reason mapping from this PR and address it separately once the status mapping is stable. The status improvement is the high-value piece here — don't let an inconsistent reason lookup create new issues.
Rebase
Same drift issue as your other open PRs. The functional diff here is meaningful and worth reviewing clean.
Path forward
- Justify or remove the
Delivery→in_transitoverride — if it stays, add an inline comment citing the observed event frequency from your dataset - Remove
Undeliverable→pendingentries or add inline comments explaining why Purolator emits those combinations - Either normalize the reason lookup or defer it to a follow-up PR
- Rebase
danh91
left a comment
There was a problem hiding this comment.
The approach is sound and the improvement is real. Using description-aware mapping for Undeliverable events is the right call — Purolator intentionally designed five coarse scan types as buckets, and the description is the actual semantic signal. Without this, all Undeliverable events ("Available for pickup", "Returned to sender", "Weather delay", "No one home") collapse into delivery_failed, which is genuinely wrong for most of them.
The normalization layer (unicode NFKD, accent-stripping, whitespace and punctuation normalization) combined with the keyword fallback for unseen Undeliverable variants is robust and well-validated against real production data.
One ask before merging: please rebase on current main. The branch has drifted significantly — the functional diff is small and clean, but the accumulated divergence makes it hard to review confidently and will cause conflicts on merge. Once rebased this is ready to go.
622b703 to
5877deb
Compare
|
Rebased against current |
Summary
ScanTypeplus audited scan descriptionsUndeliverablewordingReview updates
karrioapi/karrio:mainas a single focused commit.Validation
Result:
25 passed, 1 warning.