Lunar Lander

This project contains a notebook-based exercise for evolving a Lunar Lander controller with genetic programming.

Requirements

A Python version earlier than 3.10.

Install

Create and activate a virtual environment with a Python version lower than 3.10, then install the dependencies:

pip install -r requirements.txt

Run

Open solution.ipynb and run the notebook cells in order.

Automated experiments

Most experiments do not require running the whole notebook. Use run_experiment.py to run only the required training/evaluation steps and log the output:

.venv/bin/python run_experiment.py --mode train --experiment-name baseline

Each run creates a timestamped folder in experiment_runs/ containing:

run_config.json and requested_config.json with the hyperparameters
metrics.json with train/test scores, score standard deviation, and survival/crash rates
generation_history.csv with fitness, tree size, score standard deviation, and survival/crash rates per generation
best_tree.txt and optimized best_tree.pkl
best_lander_before_optimization.gif and optimized best_lander.gif unless --no-video is used
generation_artifacts/ with an optimized pickled model plus before/after optimization GIFs every 10 generations
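
For analysis outside the notebook, the files listed above can be read with the standard library alone. The sketch below assumes the most recently modified folder under experiment_runs/ is the run of interest and that generation_history.csv has a header row. The helper names latest_run_dir, load_history, and load_metrics are hypothetical and not part of the repository.

```python
import csv
import json
from pathlib import Path


def latest_run_dir(root="experiment_runs"):
    """Return the most recently modified run folder under root (assumption: mtime order)."""
    run_dirs = [p for p in Path(root).iterdir() if p.is_dir()]
    if not run_dirs:
        raise FileNotFoundError(f"no run folders found in {root}")
    return max(run_dirs, key=lambda p: p.stat().st_mtime)


def load_history(run_dir):
    """Read generation_history.csv into a list of dicts, one per generation."""
    with open(Path(run_dir) / "generation_history.csv", newline="") as handle:
        return list(csv.DictReader(handle))


def load_metrics(run_dir):
    """Read metrics.json as a plain dict; its exact keys are not assumed here."""
    with open(Path(run_dir) / "metrics.json") as handle:
        return json.load(handle)
```

csv.DictReader returns every value as a string, so numeric columns such as best_fitness need an explicit float() conversion before plotting or comparison.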

Useful examples:

# Fast smoke run
.venv/bin/python run_experiment.py --mode train --pop-size 8 --max-gens 1 --n-jobs 1 --coeff-opt-steps 0

# Full configured run with a fixed seed
.venv/bin/python run_experiment.py --mode train --seed 42 --pop-size 128 --max-gens 35 --num-episodes 15

# Default 20-generation run with generation 10 and 20 artifacts
.venv/bin/python run_experiment.py --mode train --experiment-name default_20gens --max-gens 20

# Hyperparameter sweep only
.venv/bin/python run_experiment.py --mode sweep --n-trials 20 --sweep-gens 10 --sweep-episodes 5

# Sweep first, then train once with the best sweep parameters
.venv/bin/python run_experiment.py --mode sweep-train --n-trials 20 --sweep-gens 10 --sweep-episodes 5

You can also keep settings in a JSON file and run that from the terminal or a notebook cell:

{
  "mode": "train",
  "experiment_name": "baseline_seed_42",
  "seed": 42,
  "pop_size": 64,
  "max_gens": 20,
  "num_episodes": 10,
  "n_jobs": 1,
  "video": true
}

.venv/bin/python run_experiment.py --config-json experiment_config.json
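
From a notebook cell, the same configuration can be written and launched in Python. This is a sketch: the setting names are taken from the JSON example above, the interpreter path matches the commands in this README, and the helper names write_config, build_command, and run_from_config are hypothetical.

```python
import json
import subprocess
from pathlib import Path


def write_config(path, **settings):
    """Write the experiment settings to a JSON file and return its path."""
    path = Path(path)
    path.write_text(json.dumps(settings, indent=2))
    return path


def build_command(config_path, python=".venv/bin/python"):
    """Build the argument list for a run driven by --config-json."""
    return [python, "run_experiment.py", "--config-json", str(config_path)]


def run_from_config(config_path):
    """Launch the run and raise CalledProcessError if the script exits non-zero."""
    subprocess.run(build_command(config_path), check=True)
```

For example, write_config("experiment_config.json", mode="train", seed=42, pop_size=64, max_gens=20, num_episodes=10, n_jobs=1, video=True) followed by run_from_config("experiment_config.json") mirrors the terminal command above.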

For the most reproducible runs, use --n-jobs 1. Parallel fitness evaluation is faster, but random tree initialization can be less deterministic across worker processes.

How `run_experiment.py` works

run_experiment.py is the main script for repeatable runs outside the notebook. It creates a new timestamped directory for every run, stores the requested configuration, trains the genetic programming population, evaluates the final best model, and writes the metrics/artifacts needed for analysis and the presentation.

In train mode, the script:

Loads defaults from config.py, optionally overridden by CLI arguments or --config-json.
Sets the runtime seed and training hyperparameters.
Builds the GP leaf nodes and evolves the population for --max-gens.
Evaluates every individual over --num-episodes episodes per generation.
Records generation-level fitness, survival/crash rates, and standard-deviation metrics in generation_history.csv.
Saves optimized model checkpoints and before/after optimization GIFs every --artifact-interval generations. The default interval is 10.
Selects the best evolutionary tree, optimizes its constants, evaluates it on the test episodes, and saves the optimized final model.

In sweep mode, the script uses Optuna to search hyperparameters and writes the trial results. In sweep-train mode, it first runs the sweep and then trains one final model with the best found parameters.

The final model files are:

best_tree.pkl: optimized final model.
best_tree.txt: readable representation of the optimized final model.
best_lander_before_optimization.gif: final best tree before coefficient optimization.
best_lander.gif: final best tree after coefficient optimization.

The periodic checkpoint files are stored in generation_artifacts/:

generation_0010_tree.pkl: optimized model from generation 10.
generation_0010_before_optimization.gif: raw evolutionary tree before coefficient optimization.
generation_0010_after_optimization.gif: same tree after coefficient optimization.
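
The checkpoint names above follow a zero-padded pattern. The following is a minimal sketch of the schedule, assuming checkpoints fall on positive multiples of --artifact-interval (consistent with the generation 10 and 20 artifacts of a 20-generation run) and that the padding is always four digits. checkpoint_files is a hypothetical helper, not a function in run_experiment.py.

```python
def checkpoint_files(max_gens, artifact_interval=10):
    """Map each checkpoint generation to the file names expected in generation_artifacts/."""
    files = {}
    # Assumption: checkpoints are written at interval, 2 * interval, ... up to max_gens.
    for gen in range(artifact_interval, max_gens + 1, artifact_interval):
        prefix = f"generation_{gen:04d}"
        files[gen] = [
            f"{prefix}_tree.pkl",
            f"{prefix}_before_optimization.gif",
            f"{prefix}_after_optimization.gif",
        ]
    return files
```

A helper like this is useful for checking that a finished run contains every expected checkpoint before comparing models across generations.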

`generation_history.csv` columns

Each row summarizes the whole population at one generation. Generation 0 is the initialized population before any evolutionary step. Each later row is recorded after that generation has been evaluated.

An episode is counted as crashed when the environment terminates and the final reward is <= -100, which is the Lunar Lander crash penalty. Any episode that does not meet that crash condition is counted as survived. An agent is counted as survived only if it has zero crashed episodes during its fitness evaluation.
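
The classification rule can be sketched in a few lines. The function names and the (terminated, final_reward) tuple layout are illustrative assumptions; the script's own bookkeeping may be structured differently.

```python
CRASH_REWARD = -100  # Lunar Lander crash penalty, as stated above


def episode_crashed(terminated, final_reward):
    """An episode is a crash when the environment terminated with a final reward <= -100."""
    return bool(terminated) and final_reward <= CRASH_REWARD


def agent_survived(episodes):
    """An agent survives only if none of its (terminated, final_reward) episodes crashed."""
    return not any(episode_crashed(t, r) for t, r in episodes)
```

Under this rule an episode that ends without the environment terminating counts as survived regardless of its final reward, and a single crashed episode is enough to mark the whole agent as crashed.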

Column	Meaning
`generation`	Generation index. `0` is the initial population.
`best_fitness`	Highest fitness value in the population for this generation.
`mean_fitness`	Average fitness across all agents in the population.
`std_fitness`	Standard deviation of fitness across the population. Lower values mean agents are more consistent with each other.
`best_tree_size`	Number of nodes in the best-fitness tree.
`mean_tree_size`	Average number of nodes across all trees in the population.
`population_size`	Number of agents/trees evaluated in the generation.
`survived_agents`	Number of agents with zero crashed episodes.
`crashed_agents`	Number of agents with at least one crashed episode.
`agent_survival_rate`	`survived_agents / population_size`.
`agent_crash_rate`	`crashed_agents / population_size`.
`total_episodes`	Total evaluated episodes in this generation, normally `population_size * num_episodes`.
`survived_episodes`	Number of episodes not classified as crashes.
`crashed_episodes`	Number of episodes classified as crashes.
`episode_survival_rate`	`survived_episodes / total_episodes`.
`episode_crash_rate`	`crashed_episodes / total_episodes`.
`mean_episode_score`	Average episode reward across every episode played by every agent in the generation.
`std_episode_score`	Standard deviation of those episode rewards. Lower values mean episode outcomes are more consistent.

Fitness is the sum of rewards collected across the training episodes for one agent. Because each agent is evaluated over multiple episodes, mean_episode_score is useful for understanding typical per-episode behavior, while best_fitness shows which individual the evolutionary process is selecting.
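
To make the relationship between the fitness columns and the episode columns concrete, the sketch below derives a few of them from raw per-agent episode rewards. It assumes the population standard deviation is used, which the script may or may not do, and summarize_generation is a hypothetical name.

```python
from statistics import mean, pstdev


def summarize_generation(rewards_per_agent):
    """Summarize one generation from a list of per-agent episode reward lists."""
    # Fitness is the sum of an agent's rewards across its training episodes.
    fitness = [sum(rewards) for rewards in rewards_per_agent]
    # Episode-level columns pool every episode played by every agent.
    all_rewards = [r for rewards in rewards_per_agent for r in rewards]
    return {
        "best_fitness": max(fitness),
        "mean_fitness": mean(fitness),
        "std_fitness": pstdev(fitness),  # assumption: population standard deviation
        "total_episodes": len(all_rewards),
        "mean_episode_score": mean(all_rewards),
        "std_episode_score": pstdev(all_rewards),
    }
```

With two agents and two episodes each, rewards of [10, 20] and [-5, 5] give fitness values of 30 and 0, so best_fitness is 30 while mean_episode_score is 7.5.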
