Fix GroupedLinear FP8 calibration loop by fallintoplace · Pull Request #3101 · NVIDIA/TransformerEngine

fallintoplace · 2026-06-06T14:39:38Z

Summary

remove the unused outer calibration loop in GroupedLinear
calibrate each input and weight quantizer once per GEMM
avoid repeating calibration num_gemms times for every GEMM

Validation

ran python3 -m py_compile transformer_engine/pytorch/module/grouped_linear.py
ran git diff --check -- transformer_engine/pytorch/module/grouped_linear.py

Signed-off-by: Minh Vu <vuhoangminh97@gmail.com>

greptile-apps · 2026-06-06T14:43:10Z

Greptile Summary

This PR fixes a bug in _GroupedLinear.forward where FP8 calibration was redundantly applied num_gemms × num_gemms times instead of once per GEMM. The original code had an outer for i in range(num_gemms) loop whose variable was immediately shadowed by two identical inner loops, making the outer loop a no-op wrapper that caused each calibration to repeat num_gemms times for every GEMM.

Root cause removed: the outer for i loop (with its shadowed variable) and the two separate inner loops are collapsed into a single correct loop that calls input_quantizers[i].calibrate(inputmats[i]) and weight_quantizers[i].calibrate(weights[i]) exactly once each.
Scope: change is confined to the non-grouped-tensor forward path under fp8_calibration=True; the grouped-tensor path is unaffected.

Confidence Score: 5/5

Safe to merge — the change is a minimal, well-scoped removal of a redundant nested loop in the FP8 calibration path with no behavioural side effects beyond fixing the over-calibration.

The diff is six lines touching a single code path that only runs during FP8 calibration. The original loop structure was unambiguously buggy (outer loop variable shadowed by two inner loops), and the replacement correctly calls each quantizer exactly once. The surrounding GEMM code, the grad bookkeeping, and every other code path are untouched.

No files require special attention.

Important Files Changed

Filename	Overview
transformer_engine/pytorch/module/grouped_linear.py	Removes redundant nested calibration loops; single loop now calibrates each input and weight quantizer exactly once per GEMM — straightforward and correct fix.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[forward called with fp8_calibration=True] --> B[Build inputmats via torch.split]
    B --> C[Build weights_fp8]
    C --> D[general_grouped_gemm]
    D --> E{fp8_calibration?}
    E -- Yes --> F["for i in range(num_gemms)"]
    F --> G["input_quantizers[i].calibrate(inputmats[i])"]
    G --> H["weight_quantizers[i].calibrate(weights[i])"]
    H --> F
    F -- done --> I[cpu_offloading / grad bookkeeping]
    E -- No --> I

_{Reviews (2): Last reviewed commit: "Merge branch 'main' into fix/grouped-lin..." | Re-trigger Greptile}

vthumbe1503 · 2026-06-09T01:49:04Z

/te-ci pytorch

vthumbe1503

LGTM

github-actions Bot added the community-contribution PRs from external contributor outside the core maintainers, representing community-driven work. label Jun 6, 2026

fallintoplace marked this pull request as ready for review June 6, 2026 14:40

fallintoplace requested a review from ksivaman as a code owner June 6, 2026 14:40

Fix grouped linear FP8 calibration loop

08c86d7

Signed-off-by: Minh Vu <vuhoangminh97@gmail.com>

fallintoplace force-pushed the fix/grouped-linear-calibration branch from 506ef2d to 08c86d7 Compare June 6, 2026 14:41

Merge branch 'main' into fix/grouped-linear-calibration

bd983db

vthumbe1503 approved these changes Jun 9, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix GroupedLinear FP8 calibration loop#3101

Fix GroupedLinear FP8 calibration loop#3101
fallintoplace wants to merge 2 commits into
NVIDIA:mainfrom
fallintoplace:fix/grouped-linear-calibration

fallintoplace commented Jun 6, 2026

Uh oh!

greptile-apps Bot commented Jun 6, 2026 •

edited

Loading

Uh oh!

vthumbe1503 commented Jun 9, 2026

Uh oh!

vthumbe1503 left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

fallintoplace commented Jun 6, 2026

Summary

Validation

Uh oh!

greptile-apps Bot commented Jun 6, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Greptile Summary

Confidence Score: 5/5

Important Files Changed

Flowchart

Uh oh!

vthumbe1503 commented Jun 9, 2026

Uh oh!

vthumbe1503 left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

greptile-apps Bot commented Jun 6, 2026 •

edited

Loading