Skip to content

purolator: make tracking status description-aware for Undeliverable/Other scans#971

Open
HansDaigle wants to merge 1 commit into
karrioapi:mainfrom
HansDaigle:fix/purolator-tracking-status-mapping
Open

purolator: make tracking status description-aware for Undeliverable/Other scans#971
HansDaigle wants to merge 1 commit into
karrioapi:mainfrom
HansDaigle:fix/purolator-tracking-status-mapping

Conversation

@HansDaigle

@HansDaigle HansDaigle commented Feb 20, 2026

Copy link
Copy Markdown
Contributor

Summary

  • normalize Purolator tracking status using ScanType plus audited scan descriptions
  • keep safe defaults for unknown descriptions and keyword fallback for unseen Undeliverable wording
  • emit shipment-level and event-level normalized statuses from the shared mapper
  • keep the diff limited to Purolator tracking parser, units, and tests

Review updates

  • Rebuilt on current karrioapi/karrio:main as a single focused commit.
  • Removed branch drift so the PR only contains the Purolator tracking-status normalization change.

Validation

.venv/bin/python -m pytest modules/connectors/purolator/tests/purolator -q

Result: 25 passed, 1 warning.

@vercel

vercel Bot commented Feb 20, 2026

Copy link
Copy Markdown

@HansDaigle is attempting to deploy a commit to the karrio Team on Vercel.

A member of the Team first needs to authorize it.

@HansDaigle

Copy link
Copy Markdown
Contributor Author

Implementation note: Purolator can emit the same ScanType with different Description values that imply different outcomes. This PR intentionally maps by ScanType + Description (with keyword fallback) so these cases do not collapse into a single status.

@HansDaigle

Copy link
Copy Markdown
Contributor Author

Follow-up: added French-description handling as well (accent-insensitive normalization + FR fallback keywords for Undeliverable), since Purolator settings support language=fr. Added dedicated test coverage in modules/connectors/purolator/tests/purolator/test_tracking.py.

danh91
danh91 previously requested changes Mar 23, 2026

@danh91 danh91 left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review

This one is different from the Canada Post PR (#970), and it deserves a different verdict. The core approach here is justified — but a few specific items need addressing before it lands.


Why description-aware mapping is warranted here

In main, the current Purolator connector maps all five scan types with no description awareness:

in_transit = [""]           # nothing maps here
delivered  = ["Delivery"]
delivery_failed = ["Undeliverable"]
out_for_delivery = ["OnDelivery"]

That means every Undeliverable event — "Available for pickup", "Returned to sender", "Arrived at sort facility", "Delivery delayed" — maps to delivery_failed. That's not a minor inaccuracy, it's categorically wrong for most Undeliverable events. ProofOfPickUp doesn't map at all, so pickups fall to unknown.

Purolator intentionally designed five coarse scan types as administrative buckets. The description is the actual semantic signal. This is the API design, not an edge case — it's fundamentally different from Canada Post, where numeric codes are specific enough to carry meaning on their own. Here, not using descriptions means systematically wrong status output.

The normalization approach is also robust in a way that addresses the main objection to description matching:

  • Unicode NFKD normalization + combining-character stripping → accent-insensitive for French/English
  • Whitespace normalization → handles extra spaces
  • Trailing period normalization → handles "text." vs "text"
  • Apostrophe unification → handles ' vs '
  • Keyword fallback for Undeliverable → resilient to wording drift in the most important category

147,766 events and 141 distinct description variants is credible validation. This isn't theoretical.


What's solid

  • normalize_tracking_description — correct and robust
  • UNDELIVERABLE_KEYWORD_RULES — the right approach; exact matches for known variants, keyword fallback for new wording
  • The two-layer lookup (normalized exact match → keyword fallback → default) is the right design
  • picked_up, return_to_sender, ready_for_pickup, delivery_delayed, pending added correctly
  • Null-safe scan/depot handling (if scans else None, getattr(scan, "Depot", None)) — good defensive fix

Issues that need addressing

1. Deliveryin_transit override

"Delivery": {
    "__default__": TrackingStatus.delivered.name,
    "Transferring to Shipping Centre - please wait for further instructions": TrackingStatus.in_transit.name,
}

This is the most dangerous entry in the mapping. You're overriding a ScanType=Delivery event to in_transit, asserting that Purolator mislabeled it. If it was observed in your production dataset, document how frequent it is and under what circumstances Purolator emits Delivery for an in-facility transfer. If it was rare (e.g., 1 out of 147K), reconsider whether it belongs in a generic mapping or should just pass through as delivered with the description visible. The description already passes through verbatim — integrators who need to act on this specific description can read it.

2. Undeliverablepending entries

"Undeliverable": {
    "Shipment created - interim manifest received": TrackingStatus.pending.name,
    "Shipment created - final manifest received": TrackingStatus.pending.name,
    "Shipment created": TrackingStatus.pending.name,
    "Shipper created a label": TrackingStatus.pending.name,
    ...
}

Why does Purolator emit ScanType=Undeliverable for pre-pickup events? These descriptions sound like initial label/manifest events that should appear under Other (where you already map them to pending). If Purolator is genuinely emitting Undeliverable for these, document it — the combination is surprising enough that it warrants an inline comment explaining what you observed. If they weren't actually seen under Undeliverable in your dataset, remove them.

3. TrackingIncidentReason lookup is inconsistent with the normalized mapping

In tracking.py:

reason=next(
    (
        r.name
        for r in list(provider_units.TrackingIncidentReason)
        if scan.ScanType in r.value or scan.Description in r.value
    ),
    None,
)

scan.Description in r.value does exact string membership check against the enum lists. This isn't normalized — accents, whitespace, trailing periods, apostrophes all break it. You've built a robust normalization layer for status; the reason lookup bypasses it entirely.

Two options: either apply normalize_tracking_description to both sides of the comparison, or drop the reason mapping from this PR and address it separately once the status mapping is stable. The status improvement is the high-value piece here — don't let an inconsistent reason lookup create new issues.


Rebase

Same drift issue as your other open PRs. The functional diff here is meaningful and worth reviewing clean.


Path forward

  1. Justify or remove the Deliveryin_transit override — if it stays, add an inline comment citing the observed event frequency from your dataset
  2. Remove Undeliverablepending entries or add inline comments explaining why Purolator emits those combinations
  3. Either normalize the reason lookup or defer it to a follow-up PR
  4. Rebase

@danh91 danh91 dismissed their stale review March 23, 2026 05:44

Superseded by updated review.

@danh91 danh91 left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The approach is sound and the improvement is real. Using description-aware mapping for Undeliverable events is the right call — Purolator intentionally designed five coarse scan types as buckets, and the description is the actual semantic signal. Without this, all Undeliverable events ("Available for pickup", "Returned to sender", "Weather delay", "No one home") collapse into delivery_failed, which is genuinely wrong for most of them.

The normalization layer (unicode NFKD, accent-stripping, whitespace and punctuation normalization) combined with the keyword fallback for unseen Undeliverable variants is robust and well-validated against real production data.

One ask before merging: please rebase on current main. The branch has drifted significantly — the functional diff is small and clean, but the accumulated divergence makes it hard to review confidently and will cause conflicts on merge. Once rebased this is ready to go.

@HansDaigle HansDaigle force-pushed the fix/purolator-tracking-status-mapping branch from 622b703 to 5877deb Compare May 12, 2026 04:08
@HansDaigle

Copy link
Copy Markdown
Contributor Author

Rebased against current karrioapi/karrio:main and confirmed the PR remains a focused Purolator tracking-status normalization change. Local verification: .venv/bin/python -m pytest modules/connectors/purolator/tests/purolator -q -> 25 passed, 1 warning.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants