Source code for the paper:
A Permutation-Based Reformulation for Approximating Weitzman Diversity for Benchmarking in Multi-Objective Optimization
[Parallel Problem Solving From Nature, 2026]
[Authors]
Mahboubeh Nezhadmoghaddam
Adrián Isaí Morales-Paredes
Julio Juárez
Jesús Guillermo Falcón-Cardona
Víctor Adrián Sosa Hernández
This repository contains implementations of four heuristic algorithms for approximating the Weitzman diversity indicator on Pareto fronts (using Euclidean distance), along with a branch-and-bound (B&B) implementation for the ground truth (n <= 36) and all scripts required to reproduce the paper's results.
- Background
- Requirements
- Installation
- Reproducing the paper results
- Repository structure
- Data
- Configuration reference
- Algorithms
- Output format
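The Weitzman diversity of a finite set $A$ with pairwise distance $d$ is defined recursively as

$$W(A) = \max_{a \in A} \left[ W(A \setminus \{a\}) + d(a, A \setminus \{a\}) \right],$$

where $d(a, S) = \min_{b \in S} d(a, b)$ is the distance from a point to its nearest neighbour in $S$, and the diversity of a single-point set is a fixed constant (commonly 0).
Computing $W(A)$ exactly by this recursion becomes prohibitively expensive as the set grows, which is why this repository pairs heuristic approximations with exact solvers that are only run on small instances.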
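For intuition, the standard recursive definition of Weitzman diversity (remove one point, add its distance to the nearest remaining point, and take the best choice) can be sketched in a few lines. This is a minimal illustration, not the repository's B&B solver: the function name `weitzman_recursive` is hypothetical, the diversity of a single point is assumed to be 0, and the memoised recursion still visits every subset, so it is only usable for a handful of points.

```python
from functools import lru_cache
from math import dist


def weitzman_recursive(points):
    """Exact Weitzman diversity via the textbook recursion (exponential time)."""
    pts = [tuple(p) for p in points]

    @lru_cache(maxsize=None)
    def w(subset):
        # subset is a frozenset of indices into pts.
        # Assumption: a set with zero or one point has diversity 0.
        if len(subset) <= 1:
            return 0.0
        best = 0.0
        for a in subset:
            rest = subset - {a}
            # Distance from a to its nearest neighbour among the remaining points.
            nearest = min(dist(pts[a], pts[b]) for b in rest)
            best = max(best, w(rest) + nearest)
        return best

    return w(frozenset(range(len(pts))))


# Three collinear points at 0, 1 and 3.
print(weitzman_recursive([(0.0,), (1.0,), (3.0,)]))
```

For the three collinear points, removing the middle point last gives 3 (the span) plus 1 (the middle point's nearest-neighbour distance).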
- Python >= 3.10
- NumPy >= 1.24, SciPy >= 1.10, Matplotlib >= 3.7
- NetworkX >= 3.0, PyYAML >= 6.0, tqdm >= 4.64, pandas >= 2.0
All dependencies are declared in pyproject.toml and installed automatically.
# Clone the repository
git clone https://github.com/AdrianMP1/Weitzman-Approximator.git Weitzman_Project
cd Weitzman_Project
# (Recommended) activate your environment first
conda activate <your-env> # or: source .venv/bin/activate
# Install the weitzman package in editable mode
pip install -e .
Verify the installation:
python -c "import weitzman; print('OK')"
All commands must be run from the Weitzman_Project/ directory.
The batch command runs all four algorithms on every (kind, geom, card) data cell, computes the brute-force ground truth for small instances (those with n inside brute_force.n_range), and builds results/data.csv automatically.
python main.py batch --config experiment.yaml
Results are saved to results/batch/<kind>_<geom>_<card>/ (one directory per
cell), and the figures are written to results/figures/ along with the .csv files. The B&B solver is the bottleneck on larger instances; the n_max key under exact_solver in configs/experiment.yaml (28 in the reference configuration) caps the instance size it is run on.
Cells that already have results are skipped automatically; use --force to
re-run everything from scratch.
If you need to rebuild data.csv without re-running any algorithms
(e.g. after adding a new metric):
python main.py aggregate
For quick checks on one data folder without the full batch:
python main.py run --config debug.yaml -v
Useful flags:
| Flag | Effect |
|---|---|
| --config debug.yaml | Fast single-algorithm run on m3_p4 |
| --seed N | Override the random seed in the config |
| --no-plots | Skip figure generation (run only) |
| --force | Re-run even if output exists (batch only) |
| -v / -vv | Increase log verbosity |
Weitzman_Project/
├── main.py                       <- single entry point (run / batch / plot)
├── pyproject.toml
├── configs/
│   ├── experiment.yaml           <- full experiment (all algorithms, all instances)
│   └── debug.yaml                <- fast single-algorithm sanity check
├── data/
│   ├── CovLoss(Concave-Convex-Linear)/
│   │   ├── PFAs_CovLoss_Concave/m3_p4 … m3_p19/
│   │   ├── PFAs_CovLoss_Convex/m3_p4 … m3_p19/
│   │   └── PFAs_CovLoss_Linear/m3_p4 … m3_p19/
│   └── UnifLoss(Concave-Convex-Linear)/
│       ├── PFAs_UnifLoss_Concave/m3_p4 … m3_p19/
│       ├── PFAs_UnifLoss_Convex/m3_p4 … m3_p19/
│       └── PFAs_UnifLoss_Linear/m3_p4 … m3_p19/
├── experiments/
│   ├── run_batch.py              <- orchestrates the full pipeline end-to-end
│   ├── run_heuristics.py         <- run all configured algorithms (used by run_batch)
│   ├── run_brute_force.py        <- O(n!) exact solver for small n
│   ├── run_exact_solver.py       <- B&B exact solver for small n
│   ├── aggregate_results.py      <- collects per-instance .npy to produce a data.csv
│   ├── compute_kendall_tau.py    <- Kendall tau correlation table
│   └── plot_results.py           <- regenerate figures from a finished run
└── weitzman/
    ├── algorithms/               <- one module per heuristic + brute force
    ├── metrics/                  <- Weitzman computation (B&B), Pure Diversity
    ├── plotting/                 <- trendlines, box plots
    ├── io/                       <- .POF loaders, config loader, writers
    └── utils/                    <- core math, run context, logging
Naming convention for data subfolders: m3_pX denotes instances on a
three-objective Pareto front whose size is set by the parameter X
(m3_p4 -> 15 points, m3_p10 -> 66 points, m3_p19 -> 210 points).
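The three sizes quoted above coincide with the number of points of a simplex lattice with X divisions in m = 3 objectives, i.e. the binomial coefficient C(X + m - 1, m - 1). The sketch below assumes that this relation also holds for the folders that are not listed explicitly; the helper name `lattice_size` is hypothetical and not part of the repository.

```python
from math import comb


def lattice_size(p, m=3):
    """Number of points of a simplex lattice with p divisions in m objectives."""
    return comb(p + m - 1, m - 1)


# Reproduce the sizes quoted for the three named folders.
for p in (4, 10, 19):
    print(f"m3_p{p}: {lattice_size(p)} points")
```

This gives a quick way to check the expected cardinality of a folder before launching the more expensive exact solvers on it.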
The data/ directory is organised first by loss type (coverage or uniformity), then by Pareto front geometry:
| Group | Geometries | Instances per geometry |
|---|---|---|
| CovLoss | Concave, Convex, Linear | 16 sizes × 6 coverage degree values |
| UnifLoss | Concave, Convex, Linear | 16 sizes × 6 uniformity degree values |
Each .POF file contains one three-objective Pareto front. Files at size m3_p4
have 15 points; files at m3_p19 have 210 points.
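As a rough illustration of how such a file could be read, the sketch below parses an iterable of text lines into points. It is not the loader shipped in weitzman/io/: the function name `parse_pof` is hypothetical, and the layout (one point per line, whitespace-separated coordinates, blank lines and `#` lines ignored, an optional " -> label" suffix as described for the `labeled` config option) is an assumption about the .POF format.

```python
def parse_pof(lines, labeled=False):
    """Parse lines of a .POF-like file into (points, labels).

    Assumed layout: one point per line, coordinates separated by whitespace,
    optionally followed by " -> label" when labeled=True.
    """
    points, labels = [], []
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and (assumed) comment lines
        label = None
        if labeled and "->" in line:
            # Split on the last arrow so the coordinates stay on the left.
            line, label = (part.strip() for part in line.rsplit("->", 1))
        points.append(tuple(float(tok) for tok in line.split()))
        labels.append(label)
    return points, labels


points, labels = parse_pof(["0.0 0.5 0.5 -> A", "", "1.0 0.0 0.0 -> B"], labeled=True)
print(points, labels)
```

To read a real file, pass an open file handle, for example `parse_pof(open(path), labeled=False)`.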
Both configs (experiment.yaml and debug.yaml) share the same schema.
experiment:
  name: "Weitzman_Project"
  seed: 42
data:
  instances_dir: "data/CovLoss/PFAs_CovLoss_Linear/m3_p4"  # For single runs, not batch
  instance_pattern: ".POF"
  labeled: false  # true for files with " -> label" suffix on each line
algorithms:
  # ["all"] runs every registered algorithm.
  # Alternatively list any subset:
  # ["farthest_neighbour", "twice_around", "christofides", "global_max_min"]
  names: ["all"]
  config:
    farthest_neighbour:
      kind: "max"  # "max" = farthest-first, "min" = nearest-first
      reverse: true
    twice_around:
      mst_mode: "max"  # "max" = maximum spanning tree, "min" = minimum spanning tree
      reverse: true
    christofides:
      mst_mode: "max"  # same as TAT
      reverse: true
    global_max_min:  # No parameters
      parameter_value: "None"  # Dummy variable
exact_solver:
  n_max: 28  # run B&B exact solver for instances with n <= n_max
brute_force:
  n_range: [4, 12]  # only instances with n in [low, high] are processed
plots:
  show: false
  dpi: 200
  fontsize: 12

| Key | Full name | Complexity | Description |
|---|---|---|---|
| farthest_neighbour | Farthest-Neighbour |  | Greedy farthest-first insertion; exhaustive starting vertices (FN) |
| twice_around | Twice-Around-the-Tree |  | Maximum spanning tree -> doubled edges -> Euler circuit -> Hamiltonian shortcut (TAT) |
| christofides | Christofides-inspired |  | Maximum spanning tree -> odd-degree matching -> Euler circuit -> Hamiltonian shortcut (CHR) |
| global_max_min | Global Max-Min |  | Greedy sequence: |
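To make the Twice-Around-the-Tree row concrete, here is a minimal pure-Python sketch of the pipeline it describes. It is not the repository's implementation: the name `twice_around_order` is hypothetical, the maximum spanning tree is built with a simple O(n^2) Prim variant, and the "doubled edges -> Euler circuit -> shortcut" steps are collapsed into a depth-first preorder walk of the tree, which visits vertices in the same order as an Euler circuit of the doubled tree with repeats skipped. The `reverse` option from the config is not modelled.

```python
from math import dist


def twice_around_order(points, start=0):
    """Visiting order from a maximum spanning tree, read off in DFS preorder."""
    n = len(points)
    if n == 0:
        return []
    in_tree = [False] * n
    best_w = [float("-inf")] * n  # heaviest known edge linking each vertex to the tree
    parent = [None] * n
    children = [[] for _ in range(n)]
    best_w[start] = 0.0
    # Prim's algorithm, always attaching the vertex with the heaviest link.
    for _ in range(n):
        u = max((v for v in range(n) if not in_tree[v]), key=lambda v: best_w[v])
        in_tree[u] = True
        if parent[u] is not None:
            children[parent[u]].append(u)
        for v in range(n):
            if not in_tree[v]:
                w = dist(points[u], points[v])
                if w > best_w[v]:
                    best_w[v], parent[v] = w, u
    # Preorder walk = Euler circuit of the doubled tree with repeated vertices skipped.
    order, stack = [], [start]
    while stack:
        u = stack.pop()
        order.append(u)
        stack.extend(reversed(children[u]))
    return order


print(twice_around_order([(0.0,), (1.0,), (3.0,)]))
```

Running the function once per `start` vertex would mirror the per-starting-vertex layout of the stored results, although whether the repository does exactly this is an assumption.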
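The brute-force solver enumerates all n! orderings of the points, so it is restricted to the instance sizes given by brute_force.n_range.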
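A brute-force enumeration of this kind can be sketched as follows. The sketch assumes that an ordering is scored by summing, for every point after the first, its distance to the nearest point that precedes it; this is what unrolling the Weitzman recursion over a fixed removal order gives, but the scoring used in the repository may differ in detail. The names `ordering_score` and `brute_force_scores` are hypothetical.

```python
from itertools import permutations
from math import dist


def ordering_score(points, order):
    """Sum of each point's distance to its nearest predecessor in the ordering."""
    total = 0.0
    for k in range(1, len(order)):
        total += min(dist(points[order[k]], points[order[j]]) for j in range(k))
    return total


def brute_force_scores(points):
    """Score every ordering of the points; the list has n! entries."""
    idx = range(len(points))
    return [(ordering_score(points, perm), perm) for perm in permutations(idx)]


scores = brute_force_scores([(0.0,), (1.0,), (3.0,)])
best_value, best_order = max(scores)
print(len(scores), best_value, best_order)
```

The maximum over all orderings is the exact value under this assumption, and keeping the full list of scores is what makes the factorial growth in memory and time unavoidable.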
After python main.py batch, the results tree is:
results/
├── data.csv                              <- aggregated results (all cells, all algorithms)
├── kendall_tau_coverage_pivot.csv        <- Kendall Tau values (all cards, all algorithms, all geometries)
├── kendall_tau_uniformity_pivot.csv      <- Kendall Tau values (all cards, all algorithms, all geometries)
├── batch/
│   └── <kind>_<geom>_<card>/             <- e.g. coverage_Linear_m3_p4
│       ├── <algorithm>/
│       │   ├── values/
│       │   │   └── values_<instance>.npy <- shape (n,): W score per starting vertex
│       │   ├── sequences/
│       │   │   └── sequences_<instance>.npy
│       │   └── timing.json               <- execution time summary
│       ├── exact/
│       │   ├── values/
│       │   │   └── values_<instance>.npy <- shape (n,): W score per starting vertex
│       │   ├── sequences/
│       │   │   └── sequences_<instance>.npy
│       │   └── timing.json               <- execution time summary
│       └── factorial/                    <- brute-force (only for n ≤ n_range[1])
│           ├── values/
│           │   └── values_<NNN>_points.npy <- shape (n!,): all W values
│           ├── best_sequences/
│           └── worst_sequences/
└── figures/                              <- after python main.py trendline or batch
    ├── Coverage/
    │   ├── Linear/
    │   │   └── trendline_<kind>_<geom>_<card>.{pdf,png}
    │   ├── Concave/
    │   └── Convex/
    └── Uniformity/
data.csv columns:
| Column | Description |
|---|---|
| kind | coverage or uniformity |
| geom | Concave, Convex, or Linear |
| card | number of Pareto front points |
| lattice_deg | coverage/uniformity parameter (0.6 – 1.0) |
| algorithm | registry key (e.g. farthest_neighbour) |
| min, q1, median, q3, max | distribution of values across starting vertices |
| W-value | exact W(A) from B&B (NaN or empty when n > n_max) |
| PD | Pure Diversity metric with Euclidean distance |
| time_s | execution time in seconds |
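One way to use these columns is to compare each heuristic's best value with the exact one. The sketch below reads the file with the standard library and computes max / W-value for the rows that have an exact value. The helper name `approximation_ratios` is hypothetical, the ratio is only an illustrative choice of summary, and the sample rows are placeholders written for this example, not results from the paper.

```python
import csv
import io
import math


def approximation_ratios(csv_file):
    """Return (algorithm, card, max / W-value) for rows with an exact W-value."""
    out = []
    for row in csv.DictReader(csv_file):
        raw = (row.get("W-value") or "").strip()
        if not raw:
            continue  # empty cell: no exact value for this instance
        exact = float(raw)
        if math.isnan(exact) or exact == 0.0:
            continue  # NaN marks n > n_max; zero would make the ratio undefined
        out.append((row["algorithm"], row["card"], float(row["max"]) / exact))
    return out


# Placeholder rows for illustration only; the numbers are made up.
sample = io.StringIO(
    "kind,geom,card,lattice_deg,algorithm,min,q1,median,q3,max,W-value,PD,time_s\n"
    "coverage,Linear,15,1.0,farthest_neighbour,1.0,1.5,2.0,2.5,3.0,4.0,0.0,0.1\n"
    "coverage,Linear,210,1.0,farthest_neighbour,1.0,1.5,2.0,2.5,3.0,,0.0,0.1\n"
    "coverage,Linear,210,1.0,twice_around,1.0,1.5,2.0,2.5,3.0,nan,0.0,0.1\n"
)
print(approximation_ratios(sample))
```

For the real file, replace `sample` with `open("results/data.csv", newline="")`.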
For ad-hoc single-directory runs (python main.py run), output goes to
results/runs/run_<YYYYMMDD_HHMMSS_hostname>/ with the same per-algorithm layout.