Weitzman Diversity Approximator on Pareto Fronts

Source code for the paper:
A Permutation-Based Reformulation for Approximating Weitzman Diversity for Benchmarking in Multi-Objective Optimization
[Parallel Problem Solving From Nature, 2026]

[Authors]
Mahboubeh Nezhadmoghaddam
Adrián Isaí Morales-Paredes
Julio Juárez
Jesús Guillermo Falcón-Cardona
Víctor Adrián Sosa Hernández

This repository contains the implementations of four heuristic algorithms for approximating the Weitzman diversity indicator for Pareto fronts (euclidean distance is used), along with a branch & bound implementation for the ground truth (n<= 36) and all scripts required to reproduce the paper's results.

Background

The Weitzman diversity of a finite set $A$ is defined recursively as

$$W(A) = \max_{a_i \in A} \bigl[ W(A \setminus {a_i}) + d(a_i,, A \setminus {a_i}) \bigr]$$

where $d(a_i, Q) = \min_{a_j \in Q} d(a_i, a_j)$ is the minimum distance from point $a_i$ to the remaining set $Q$, and $W({a_i}) = 0$.

Computing $W(A)$ requires evaluating all $n!$ removal sequences and is only feasible for small values of $n$. This paper studies four heuristic algorithms that approximate $W(A)$ efficiently by constructing removal sequences.

Requirements

Python >= 3.10
NumPy >= 1.24, SciPy >= 1.10, Matplotlib >= 3.7
NetworkX >= 3.0, PyYAML >= 6.0, tqdm >= 4.64, pandas >= 2.0

All dependencies are declared in pyproject.toml and installed automatically.

Installation

# Clone the repository
git clone https://github.com/AdrianMP1/Weitzman-Approximator.git Weitzman_Project
cd Weitzman_Project

# (Recommended) activate your environment first
conda activate <your-env>       # or: source .venv/bin/activate

# Install the weitzman package in editable mode
pip install -e .

Verify the installation:

python -c "import weitzman; print('OK')"

Reproducing the paper results

All commands must be run from the Weitzman_Project/ directory.

Run the full batch [Everything with a single execution]

Runs all four algorithms on every (kind, geom, card) data cell, computes the brute-force ground truth for small instances ($n \le 12$ by default) while running B&B for instances $n \leq 28$ (it can be set up to 36, but the execution time is significant), and writes the aggregated results/data.csv automatically.

python main.py batch --config experiment.yaml

Results are saved to results/batch/<kind>_<geom>_<card>/ (one directory per cell) and produces the results/figures/ along with the .csv files. The B&B solver is a bottleneck on instances with $n \geq 36$, by default it is set as n_max: 28 on exact_solver on configs/experiment.yaml. The $n=36$ is feasible, but takes some hours.

Cells that already have results are skipped automatically; use --force to re-run everything from scratch.

[Optional] - Re-aggregate without re-running

If you need to rebuild data.csv without re-running any algorithms (e.g. after adding a new metric):

python main.py aggregate

[Extra] Ad-hoc single-directory run (development)

For quick checks on one data folder without the full batch:

python main.py run --config debug.yaml -v

Useful flags:

Flag	Effect
`--config debug.yaml`	Fast single-algorithm run on `m3_p4`
`--seed N`	Override the random seed in the config
`--no-plots`	Skip figure generation (run only)
`--force`	Re-run even if output exists (batch only)
`-v` / `-vv`	Increase log verbosity

Repository structure

Weitzman_Project/
├── main.py                          <- single entry point (run / batch / plot)
├── pyproject.toml
├── configs/
│   ├── experiment.yaml              <- full experiment (all algorithms, all instances)
│   └── debug.yaml                   <- fast single-algorithm sanity check
├── data/
│   ├── CovLoss(Concave-Convex-Linear)/
│   │   ├── PFAs_CovLoss_Concave/m3_p4 … m3_p19/
│   │   ├── PFAs_CovLoss_Convex/m3_p4 … m3_p19/
│   │   └── PFAs_CovLoss_Linear/m3_p4 … m3_p19/
│   └── UnifLoss(Concave-Convex-Linear)/
│       ├── PFAs_UnifLoss_Concave/m3_p4 … m3_p19/
│       ├── PFAs_UnifLoss_Convex/m3_p4 … m3_p19/
│       └── PFAs_UnifLoss_Linear/m3_p4 … m3_p19/
├── experiments/
│   ├── run_batch.py                 <- orchestrates the full pipeline end-to-end
│   ├── run_heuristics.py            <- run all configured algorithms (used by run_batch)
│   ├── run_brute_force.py           <- O(n!) exact solver for small n
│   ├── run_exact_solver.py          <- B&B exact solver for small n
│   ├── aggregate_results.py         <- collects per-instance .npy to produce a data.csv
│   ├── compute_kendall_tau.py       <- Kendall tau correlation table
│   └── plot_results.py              <- regenerate figures from a finished run
└── weitzman/
    ├── algorithms/                  <- one module per heuristic + brute force
    ├── metrics/                     <- Weitzman computation (B&B), Pure Diversity
    ├── plotting/                    <- trendlines, box plots
    ├── io/                          <- .POF loaders, config loader, writers
    └── utils/                       <- core math, run context, logging

Naming convention for data subfolders: m3_pX denotes instances on a three-objective ($m = 3$) and a parameter $p = X$. The number of points is $\binom{p+2}{2}$ (e.g. m3_p4 -> 15 points, m3_p10 -> 66 points, m3_p19 -> 210 points).

Data

The data/ directory is organised by Coverage and Uniformity, then for Pareto front geometry:

Group	Geometries	Instances per geometry
`CovLoss`	Concave, Convex, Linear	16 sizes × 6 coverage degree values
`UnifLoss`	Concave, Convex, Linear	16 sizes × 6 uniformity degree values

Each .POF file contains one three-objective Pareto front. Files at size m3_p4 have 15 points; files at m3_p19 have 210 points.

Configuration reference

Both configs (experiment.yaml and debug.yaml) share the same schema.

experiment:
  name: "Weitzman_Project"
  seed: 42

data:
  instances_dir: "data/CovLoss/PFAs_CovLoss_Linear/m3_p4" # For single runs, not batch
  instance_pattern: ".POF"
  labeled: false          # true for files with " -> label" suffix on each line

algorithms:
  # ["all"] runs every registered algorithm.
  # Alternatively list any subset:
  # ["farthest_neighbour", "twice_around", "christofides", "global_max_min"]
  names: ["all"]

  config:
    farthest_neighbour:
      kind: "max"         # "max" = farthest-first, "min" = nearest-first
      reverse: true

    twice_around:
      mst_mode: "max"     # "max" = maximum spanning tree, "min" = minimum spanning tree
      reverse: true

    christofides:
      mst_mode: "max"     # same as TAT
      reverse: true

    global_max_min:       # No parameters
      parameter_value: "None" # Dummy variable

exact_solver:
  n_max: 28               # run B&B exact solver for instances with n <= n_max

brute_force:
  n_range: [4, 12]        # only instances with n in [low, high] are processed

plots:
  show: false
  dpi: 200
  fontsize: 12

Algorithms

Key	Full name	Complexity	Description
`farthest_neighbour`	Farthest-Neighbour	$O(n^3)$	Greedy farthest-first insertion; exhaustive starting vertices (FN)
`twice_around`	Twice-Around-the-Tree	$O(n^2 \log n)$	Maximum spanning tree -> doubled edges -> Euler circuit -> Hamiltonian shortcut (TAT)
`christofides`	Christofides-inspired	$O(n^3)$	Maximum spanning tree -> odd-degree matching -> Euler circuit -> Hamiltonian shortcut (CHR)
`global_max_min`	Global Max-Min	$O(n^3)$	Greedy sequence: $i^* = \arg\max_{u \notin Q} \min_{v \in Q} d(u, v)$; exhaustive starting vertices (GMM)

The brute-force solver enumerates all $n!$ sequences and serves as ground truth (B&B too); it is not a heuristic and is not included in the algorithm registry.

Output format

After python main.py batch, the results tree is:

results/
├── data.csv                             <- aggregated results (all cells, all algorithms)
├── kendall_tau_coverage_pivot.csv       <- Kendall Tau values (all cards, all algorithms, all geometries)
├── kendall_tau_uniformity_pivot.csv     <- Kendall Tau values (all cards, all algorithms, all geometries)
├── batch/
│   └── <kind>_<geom>_<card>/            <- e.g. coverage_Linear_m3_p4
│       ├── <algorithm>/
│       │   ├── values/
│       │   │   └── values_<instance>.npy <- shape (n,): W score per starting vertex
│       │   ├─── sequences/
│       │   │     └── sequences_<instance>.npy
│       │   └── timing.json              <- execution time summary
│       ├── exact/
│       │   ├── values/
│       │   │   └── values_<instance>.npy <- shape (n,): W score per starting vertex
│       │   ├── sequences/
│       │   │   └── sequences_<instance>.npy
│       │   └── timing.json              <- execution time summary
│       └── factorial/                   <- brute-force (only for n ≤ n_range[1])
│           ├── values/
│           │   └── values_<NNN>_points.npy  <- shape (n!,): all W values
│           ├── best_sequences/
│           └── worst_sequences/
└── figures/                             <- after python main.py trendline or batch
    ├── Coverage/
    │   ├── Linear/
    │   │   └── trendline_<kind>_<geom>_<card>.{pdf,png}
    │   ├── Concave/
    │   └── Convex/
    └── Uniformity/

data.csv columns:

Column	Description
`kind`	`coverage` or `uniformity`
`geom`	`Concave`, `Convex`, or `Linear`
`card`	number of Pareto front points
`lattice_deg`	coverage/uniformity parameter (0.6 – 1.0)
`algorithm`	registry key (e.g. `farthest_neighbour`)
`min`, `q1`, `median`, `q3`, `max`	distribution of values across starting vertices
`W-value`	exact W(A) from B&B (NaN or empty when n > n_max)
`PD`	Pure Diversity metric with Euclidean distance
`time_s`	execution time in seconds

For ad-hoc single-directory runs (python main.py run), output goes to results/runs/run_<YYYYMMDD_HHMMSS_hostname>/ with the same per-algorithm layout.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Weitzman Diversity Approximator on Pareto Fronts

Table of Contents

Background

Requirements

Installation

Reproducing the paper results

Run the full batch [Everything with a single execution]

[Optional] - Re-aggregate without re-running

[Extra] Ad-hoc single-directory run (development)

Repository structure

Data

Configuration reference

Algorithms

Output format

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 9 Commits
configs		configs
data		data
experiments		experiments
weitzman		weitzman
.gitignore		.gitignore
README.md		README.md
main.py		main.py
pyproject.toml		pyproject.toml

Folders and files

Latest commit

History

Repository files navigation

Weitzman Diversity Approximator on Pareto Fronts

Table of Contents

Background

Requirements

Installation

Reproducing the paper results

Run the full batch [Everything with a single execution]

[Optional] - Re-aggregate without re-running

[Extra] Ad-hoc single-directory run (development)

Repository structure

Data

Configuration reference

Algorithms

Output format

About

Topics

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages