fix reduction scope for column scale bounds in p?geequ #163
Merged
langou merged 1 commit on May 11, 2026
When computing the condition number and bounds of the column scale factors in P?GEEQU, the reductions for RCMAX, RCMIN, and INFO used the 'Columnwise' scope. The column scale factors are distributed across process columns and are already identical within each process column, so a 'Columnwise' reduction only combined the bounds locally within one process column. As a result, the global condition number was incorrect and a zero scale factor was not detected across the whole process grid, which could lead to a deadlock when some processes exit with INFO > 0 while others continue.
This change switches the reduction scope from 'Columnwise' to 'Rowwise' and the corresponding topology variable from COLCTOP to ROWCTOP, so the column scale factor bounds are reduced across the entire process grid.
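The zero-scale-factor case can be illustrated with a small Python model of the process grid. This is a sketch under stated assumptions, not ScaLAPACK code: the helper names (owner_pcol, local_rcmin, reduce_rcmin) are hypothetical, the block-cyclic layout is assumed to start on process column 0, and the two scopes are modeled only by which processes they combine.

```python
# Toy model of a BLACS process grid; this is not ScaLAPACK code.
# Assumption: global column j (0-based) lives on process column (j // nb) % npcol,
# and the column maxima are already identical within each process column.

def owner_pcol(j, nb, npcol):
    """Process column owning global column j under a block-cyclic layout."""
    return (j // nb) % npcol

def local_rcmin(col_max, nb, npcol):
    """Smallest column maximum seen by each process column before the reduction."""
    mins = [float("inf")] * npcol
    for j, value in enumerate(col_max):
        pc = owner_pcol(j, nb, npcol)
        mins[pc] = min(mins[pc], value)
    return mins

def reduce_rcmin(local_mins, scope):
    """RCMIN held by each process column after a min-reduction with the given scope."""
    if scope == "Columnwise":
        # Combines processes that share a process column; they already agree.
        return list(local_mins)
    if scope == "Rowwise":
        # Combines across process columns; every process gets the global minimum.
        return [min(local_mins)] * len(local_mins)
    raise ValueError(f"unknown scope: {scope}")

# Twelve columns, the tenth (index 9) entirely zero; NB = 8, NPCOL = 2.
col_max = [1.0] * 12
col_max[9] = 0.0
local = local_rcmin(col_max, nb=8, npcol=2)

before = reduce_rcmin(local, "Columnwise")
after = reduce_rcmin(local, "Rowwise")

# True where a process column would take the RCMIN == 0 error path.
takes_error_path_before = [m == 0.0 for m in before]
takes_error_path_after = [m == 0.0 for m in after]
print(takes_error_path_before)  # [False, True]
print(takes_error_path_after)   # [True, True]
```

In this model only the process column that owns the zero column sees RCMIN == 0 under the 'Columnwise' scope, so the grid disagrees about whether to return an error. Under the 'Rowwise' scope every process column takes the same path.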
langou approved these changes on May 11, 2026
Fix reduction scope for column scale bounds in p?geequ
Summary
p{s,d,c,z}geequ compute COLCND = min(C(j)) / max(C(j)) over the global column-scale vector. The min/max reduction is currently issued along the column axis of the BLACS process grid, but at that point C has already been replicated along that same axis, so the reduction is a no-op and each process column ends up with the min/max of only its own subset of global columns. The returned COLCND (and INFO, on the singular-column path) is therefore wrong whenever NPCOL ≥ 2 and the per-process-column scale ranges differ.
The fix switches three reductions in each of p{s,d,c,z}geequ.f from 'Columnwise'/COLCTOP to 'Rowwise'/ROWCTOP so the bounds are combined across process columns. R, ROWCND, AMAX, and the per-column entries of C are unaffected.
Root cause
In SRC/pdgeequ.f (other precisions identical), after the per-column max is computed and replicated along each process column, the reductions of RCMAX, RCMIN, and INFO use the 'Columnwise' scope with COLCTOP, so they combine only processes that already hold the same values.
After the patch, RCMIN/RCMAX are the true global min/max of C, so COLCND = max(RCMIN, SMLNUM) / min(RCMAX, BIGNUM) matches the serial LAPACK ?GEEQU result. The singular-column report (INFO = M + j_first_zero) is fixed the same way.
Reproducer
Build a matrix whose column-scale spread is asymmetric across process columns, e.g. A(i,j) = 10^(j-1). With M = N = 12, NB = 8, on NPROW × NPCOL = 1 × 2, process column 0 holds global columns 1 to 8 and process column 1 holds columns 9 to 12.
Serial DGEEQU on the same matrix returns COLCND = 1.0e-11. Before the patch, PDGEEQU returns the rank-0 process column's local value 1.0e-7, a factor-10^4 error. After the patch it returns 1.0e-11.
A minimal Fortran example
Observed (this matrix, several process grids):
R, ROWCND, and AMAX are correct on all grids and all builds; only COLCND (and the singular-column INFO report on the same code path) is affected. MKL inherits the bug from netlib; this PR is the only build that matches serial LAPACK on NPCOL ≥ 2.
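The COLCND values quoted for the reproducer can be cross-checked without a parallel run. The sketch below is a Python model, not the PDGEEQU source: the function name colcnd_per_process_column is hypothetical, the block-cyclic layout is assumed to start on process column 0, and the row-scaling step is left out because it multiplies every column maximum by the same factor here and so does not change the min/max ratio.

```python
import math

def colcnd_per_process_column(n, nb, npcol, scope):
    """COLCND each process column would hold for A(i,j) = 10**(j-1) in this model.

    Assumes every process column owns at least one global column.
    """
    # Column maxima of A, with j running from 0 to n-1 (exact integer powers).
    col_max = [float(10 ** j) for j in range(n)]
    lo = [math.inf] * npcol
    hi = [0.0] * npcol
    for j, value in enumerate(col_max):
        pc = (j // nb) % npcol  # assumed block-cyclic owner of global column j
        lo[pc] = min(lo[pc], value)
        hi[pc] = max(hi[pc], value)
    if scope == "Rowwise":
        # Combine the bounds across process columns (the patched behaviour).
        lo = [min(lo)] * npcol
        hi = [max(hi)] * npcol
    # With scope == "Columnwise" the bounds stay local to each process column.
    return [l / h for l, h in zip(lo, hi)]

before = colcnd_per_process_column(12, 8, 2, "Columnwise")
after = colcnd_per_process_column(12, 8, 2, "Rowwise")
single = colcnd_per_process_column(12, 8, 1, "Columnwise")
print(before)
print(after)
print(single)
```

In this model the 'Columnwise' scope leaves process column 0 with about 1.0e-7 and process column 1 with about 1.0e-3, while the 'Rowwise' scope gives about 1.0e-11 everywhere. With NPCOL = 1 both scopes agree, consistent with the bug appearing only for NPCOL ≥ 2.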