purolator: make tracking status description-aware for Undeliverable/Other scans by HansDaigle · Pull Request #971 · karrioapi/karrio

HansDaigle · 2026-02-20T15:35:40Z

Summary

normalize Purolator tracking status using ScanType plus audited scan descriptions
keep safe defaults for unknown descriptions and keyword fallback for unseen Undeliverable wording
emit shipment-level and event-level normalized statuses from the shared mapper
keep the diff limited to Purolator tracking parser, units, and tests

Review updates

Rebuilt on current karrioapi/karrio:main as a single focused commit.
Removed branch drift so the PR only contains the Purolator tracking-status normalization change.

Validation

.venv/bin/python -m pytest modules/connectors/purolator/tests/purolator -q

Result: 25 passed, 1 warning.

vercel · 2026-02-20T15:35:45Z

@HansDaigle is attempting to deploy a commit to the karrio Team on Vercel.

A member of the Team first needs to authorize it.

HansDaigle · 2026-02-20T15:36:21Z

Implementation note: Purolator can emit the same ScanType with different Description values that imply different outcomes. This PR intentionally maps by ScanType + Description (with keyword fallback) so these cases do not collapse into a single status.

HansDaigle · 2026-02-20T15:50:47Z

Follow-up: added French-description handling as well (accent-insensitive normalization + FR fallback keywords for Undeliverable), since Purolator settings support language=fr. Added dedicated test coverage in modules/connectors/purolator/tests/purolator/test_tracking.py.

danh91

Review

This one is different from the Canada Post PR (#970), and it deserves a different verdict. The core approach here is justified — but a few specific items need addressing before it lands.

Why description-aware mapping is warranted here

In main, the current Purolator connector maps all five scan types with no description awareness:

in_transit = [""]           # nothing maps here
delivered  = ["Delivery"]
delivery_failed = ["Undeliverable"]
out_for_delivery = ["OnDelivery"]

That means every Undeliverable event — "Available for pickup", "Returned to sender", "Arrived at sort facility", "Delivery delayed" — maps to delivery_failed. That's not a minor inaccuracy, it's categorically wrong for most Undeliverable events. ProofOfPickUp doesn't map at all, so pickups fall to unknown.

Purolator intentionally designed five coarse scan types as administrative buckets. The description is the actual semantic signal. This is the API design, not an edge case — it's fundamentally different from Canada Post, where numeric codes are specific enough to carry meaning on their own. Here, not using descriptions means systematically wrong status output.

The normalization approach is also robust in a way that addresses the main objection to description matching:

Unicode NFKD normalization + combining-character stripping → accent-insensitive for French/English
Whitespace normalization → handles extra spaces
Trailing period normalization → handles "text." vs "text"
Apostrophe unification → handles ' vs '
Keyword fallback for Undeliverable → resilient to wording drift in the most important category

147,766 events and 141 distinct description variants is credible validation. This isn't theoretical.

What's solid

normalize_tracking_description — correct and robust
UNDELIVERABLE_KEYWORD_RULES — the right approach; exact matches for known variants, keyword fallback for new wording
The two-layer lookup (normalized exact match → keyword fallback → default) is the right design
picked_up, return_to_sender, ready_for_pickup, delivery_delayed, pending added correctly
Null-safe scan/depot handling (if scans else None, getattr(scan, "Depot", None)) — good defensive fix

Issues that need addressing

1. Delivery → in_transit override

"Delivery": {
    "__default__": TrackingStatus.delivered.name,
    "Transferring to Shipping Centre - please wait for further instructions": TrackingStatus.in_transit.name,
}

This is the most dangerous entry in the mapping. You're overriding a ScanType=Delivery event to in_transit, asserting that Purolator mislabeled it. If it was observed in your production dataset, document how frequent it is and under what circumstances Purolator emits Delivery for an in-facility transfer. If it was rare (e.g., 1 out of 147K), reconsider whether it belongs in a generic mapping or should just pass through as delivered with the description visible. The description already passes through verbatim — integrators who need to act on this specific description can read it.

2. Undeliverable → pending entries

"Undeliverable": {
    "Shipment created - interim manifest received": TrackingStatus.pending.name,
    "Shipment created - final manifest received": TrackingStatus.pending.name,
    "Shipment created": TrackingStatus.pending.name,
    "Shipper created a label": TrackingStatus.pending.name,
    ...
}

Why does Purolator emit ScanType=Undeliverable for pre-pickup events? These descriptions sound like initial label/manifest events that should appear under Other (where you already map them to pending). If Purolator is genuinely emitting Undeliverable for these, document it — the combination is surprising enough that it warrants an inline comment explaining what you observed. If they weren't actually seen under Undeliverable in your dataset, remove them.

3. TrackingIncidentReason lookup is inconsistent with the normalized mapping

In tracking.py:

reason=next(
    (
        r.name
        for r in list(provider_units.TrackingIncidentReason)
        if scan.ScanType in r.value or scan.Description in r.value
    ),
    None,
)

scan.Description in r.value does exact string membership check against the enum lists. This isn't normalized — accents, whitespace, trailing periods, apostrophes all break it. You've built a robust normalization layer for status; the reason lookup bypasses it entirely.

Two options: either apply normalize_tracking_description to both sides of the comparison, or drop the reason mapping from this PR and address it separately once the status mapping is stable. The status improvement is the high-value piece here — don't let an inconsistent reason lookup create new issues.

Rebase

Same drift issue as your other open PRs. The functional diff here is meaningful and worth reviewing clean.

Path forward

Justify or remove the Delivery → in_transit override — if it stays, add an inline comment citing the observed event frequency from your dataset
Remove Undeliverable → pending entries or add inline comments explaining why Purolator emits those combinations
Either normalize the reason lookup or defer it to a follow-up PR
Rebase

Superseded by updated review.

danh91

The approach is sound and the improvement is real. Using description-aware mapping for Undeliverable events is the right call — Purolator intentionally designed five coarse scan types as buckets, and the description is the actual semantic signal. Without this, all Undeliverable events ("Available for pickup", "Returned to sender", "Weather delay", "No one home") collapse into delivery_failed, which is genuinely wrong for most of them.

The normalization layer (unicode NFKD, accent-stripping, whitespace and punctuation normalization) combined with the keyword fallback for unseen Undeliverable variants is robust and well-validated against real production data.

One ask before merging: please rebase on current main. The branch has drifted significantly — the functional diff is small and clean, but the accumulated divergence makes it hard to review confidently and will cause conflicts on merge. Once rebased this is ready to go.

HansDaigle · 2026-05-12T13:24:48Z

Rebased against current karrioapi/karrio:main and confirmed the PR remains a focused Purolator tracking-status normalization change. Local verification: .venv/bin/python -m pytest modules/connectors/purolator/tests/purolator -q -> 25 passed, 1 warning.

danh91 force-pushed the main branch from 0ec3464 to 258e650 Compare February 28, 2026 21:28

danh91 closed this Mar 1, 2026

danh91 reopened this Mar 1, 2026

danh91 previously requested changes Mar 23, 2026

View reviewed changes

danh91 approved these changes Mar 23, 2026

View reviewed changes

fix(purolator): normalize tracking status by description

5877deb

HansDaigle force-pushed the fix/purolator-tracking-status-mapping branch from 622b703 to 5877deb Compare May 12, 2026 04:08

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

purolator: make tracking status description-aware for Undeliverable/Other scans#971

purolator: make tracking status description-aware for Undeliverable/Other scans#971
HansDaigle wants to merge 1 commit into
karrioapi:mainfrom
HansDaigle:fix/purolator-tracking-status-mapping

HansDaigle commented Feb 20, 2026 •

edited

Loading

Uh oh!

vercel Bot commented Feb 20, 2026

Uh oh!

HansDaigle commented Feb 20, 2026

Uh oh!

HansDaigle commented Feb 20, 2026

Uh oh!

danh91 left a comment

Uh oh!

danh91 left a comment

Uh oh!

HansDaigle commented May 12, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Conversation

HansDaigle commented Feb 20, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Review updates

Validation

Uh oh!

vercel Bot commented Feb 20, 2026

Uh oh!

HansDaigle commented Feb 20, 2026

Uh oh!

HansDaigle commented Feb 20, 2026

Uh oh!

danh91 left a comment

Choose a reason for hiding this comment

Review

Why description-aware mapping is warranted here

What's solid

Issues that need addressing

Rebase

Path forward

Uh oh!

danh91 left a comment

Choose a reason for hiding this comment

Uh oh!

HansDaigle commented May 12, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

HansDaigle commented Feb 20, 2026 •

edited

Loading