Skip to content

fix reduction scope for column scale bounds in p?geequ#163

Merged
langou merged 1 commit into
Reference-ScaLAPACK:masterfrom
kyungminlee:fix-geequ-colscale-reduction
May 11, 2026
Merged

fix reduction scope for column scale bounds in p?geequ#163
langou merged 1 commit into
Reference-ScaLAPACK:masterfrom
kyungminlee:fix-geequ-colscale-reduction

Conversation

@kyungminlee

Copy link
Copy Markdown
Contributor

Fix reduction scope for column scale bounds in p?geequ

Summary

p{s,d,c,z}geequ compute COLCND = min(C(j)) / max(C(j)) over the global column-scale vector. The min/max reduction is currently issued along the column axis of the BLACS process grid, but at that point C has already been replicated along that same axis — so the reduction is a no-op and each process column ends up with the min/max of only its own subset of global columns. The returned COLCND (and INFO, on the singular-column path) is therefore wrong whenever NPCOL ≥ 2 and the per-process-column scale ranges differ.

The fix switches three reductions in each of p{s,d,c,z}geequ.f from 'Columnwise'/COLCTOP to 'Rowwise'/ROWCTOP so the bounds are combined across process columns. R, ROWCND, AMAX, and the per-column entries of C are unaffected.

Root cause

In SRC/pdgeequ.f (other precisions identical), after the per-column max is computed and replicated along each process column:

  321  CALL DGAMX2D( ICTXT, 'Columnwise', COLCTOP, 1, NQ, C(JJA), 1, ...)   ! replicates C down each proc-column
       ...
       DO 100 J = JJA, JJA+NQ-1
          RCMIN = MIN( RCMIN, C(J) )      ! local min over THIS proc-column's NQ cols
          RCMAX = MAX( RCMAX, C(J) )
  100  CONTINUE
- 332  CALL DGAMX2D( ICTXT, 'Columnwise', COLCTOP, 1, 1, RCMAX, 1, ...)     ! BUG: along axis already replicated → no-op
- 334  CALL DGAMN2D( ICTXT, 'Columnwise', COLCTOP, 1, 1, RCMIN, 1, ...)     ! BUG
+ 332  CALL DGAMX2D( ICTXT, 'Rowwise',    ROWCTOP, 1, 1, RCMAX, 1, ...)     ! combine across proc-columns
+ 334  CALL DGAMN2D( ICTXT, 'Rowwise',    ROWCTOP, 1, 1, RCMIN, 1, ...)
       ...
- 346  CALL IGAMX2D( ICTXT, 'Columnwise', COLCTOP, 1, 1, INFO, 1, ...)      ! same axis bug on the singular-column path
+ 346  CALL IGAMX2D( ICTXT, 'Rowwise',    ROWCTOP, 1, 1, INFO, 1, ...)

After the patch, RCMIN/RCMAX are the true global min/max of C, so COLCND = max(RCMIN, SMLNUM) / min(RCMAX, BIGNUM) matches the serial LAPACK ?GEEQU result. The singular-column report (INFO = M + j_first_zero) is fixed the same way.

Reproducer

Build a matrix whose column-scale spread is asymmetric across process columns, e.g. A(i,j) = 10^(j-1). With M = N = 12, NB = 8, on NPROW × NPCOL = 1 × 2:

  • proc col 0 owns global cols 1..8 → C(j) ∈ {10^11, …, 10^4} → local COLCND = 10^-7
  • proc col 1 owns global cols 9..12 → C(j) ∈ {10^3, …, 10^0} → local COLCND = 10^-3
  • true global → C ∈ {10^11, …, 10^0} → COLCND = 10^-11

Serial DGEEQU on the same matrix returns COLCND = 1.0e-11. Before the patch, PDGEEQU returns the rank-0 process column's local value 1.0e-7 — a factor-10^4 error. After the patch it returns 1.0e-11.

A minimal Fortran example

NPROW = 1 ; NPCOL = 2 ; M = 12 ; N = 12 ; NB = 8
...
DO J = 1, NQ
   GJ = INDXL2G( J, NB, MYCOL, 0, NPCOL )
   DO I = 1, NP
      A( I + (J-1)*LLDA ) = 10.0D0**(GJ-1)   ! constant within each global column
   END DO
END DO
CALL PDGEEQU( M, N, A, 1, 1, DESCA, R, C, ROWCND, COLCND, AMAX, INFO )
! Compare COLCND against serial DGEEQU on rank 0.

Observed (this matrix, several process grids):

grid netlib master this PR serial DGEEQU
1×1 1.0e-11 1.0e-11 1.0e-11
1×2 1.0e-07 1.0e-11 1.0e-11
2×2 1.0e-07 1.0e-11 1.0e-11
1×4 1.0e-07 1.0e-11 1.0e-11
2×3 1.0e-07 1.0e-11 1.0e-11

R, ROWCND, and AMAX are correct on all grids and all builds; only COLCND (and the singular-column INFO report on the same code path) is affected. MKL inherits the bug from netlib; this PR is the only build that matches serial LAPACK on NPCOL ≥ 2.

When calculating the condition numbers and bounds of the column
scale factors in P?GEEQU, the reduction scope for RCMAX, RCMIN,
and INFO was incorrectly set to 'Columnwise'. Because column scale
factors are distributed across process rows, this meant that the
bounds were only reduced locally within each process column.

This caused the global condition number to be incorrect and prevented
the detection of zero scale factors across the process grid, potentially
leading to a deadlock when some processes exit with INFO > 0 while
others continue.

Changed the reduction scope from 'Columnwise' to 'Rowwise' and
updated the corresponding topology variable to ROWCTOP. This ensures
that column scale factor bounds are correctly reduced across the
entire process grid.
@langou langou merged commit ff7d5d6 into Reference-ScaLAPACK:master May 11, 2026
11 of 12 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants