- Generate chess datasets for predicting the best move and describing board positions.
- Evaluate OpenAI models (and compatible open-source models via OpenRouter) on the datasets.
- Train Qwen and SmolVLM on the datasets.
- Evaluate Qwen and SmolVLM on the datasets.

Install dependencies:

pip install -r requirements.txt

Stockfish Installation Instructions

To install Stockfish (used to generate the best moves), follow these instructions:

mkdir stockfish_engine
cd stockfish_engine
wget https://github.com/official-stockfish/Stockfish/archive/refs/tags/sf_17.1.zip
unzip sf_17.1.zip
cd Stockfish-sf_17.1/src
make clean
make -j build ARCH=x86-64-modern
./stockfish

python generate_random.py --save_folder dataset_folder --dataset_size 1024

- Generates N images (N is the --dataset_size value), each 384x384 pixels, with a corresponding 8x8 npy array for each image.
- To change the image size, open chess_ui.py and change self.setFixedSize(384,384).

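The relationship between the 384x384 image and the 8x8 array can be made concrete with a little arithmetic. The sketch below is not taken from the repository: `square_bbox` is a hypothetical helper, and it assumes the board fills the whole image with no border and that row 0, column 0 is the top-left square.

```python
def square_bbox(row, col, image_size=384, squares=8):
    """Return (left, top, right, bottom) pixel bounds of one board square.

    Assumptions (not confirmed by the repository): the board fills the
    whole image with no border, and row 0 / col 0 is the top-left square.
    """
    if not (0 <= row < squares and 0 <= col < squares):
        raise ValueError("row and col must be in [0, squares)")
    side = image_size // squares  # 384 // 8 == 48 pixels per square
    left, top = col * side, row * side
    return (left, top, left + side, top + side)


# Usage: pixel bounds of the square at row 2, column 3.
print(square_bbox(2, 3))
```

If the image size in chess_ui.py is changed, choosing a multiple of 8 keeps every square the same whole number of pixels under these assumptions.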

python generate_best_move.py --dataset_size 1024 --dataset_name dataset_folder --num_attempts 1200

- Generates dataset_size images and saves a single best_moves.pkl file containing the 10 best moves for each position, plus a color.pkl file indicating which color plays the next move.
- num_attempts is a hyperparameter that controls the number of attempts made to generate a valid position (a position with at least 1 valid move). Generally, a value just above dataset_size works well.

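The interplay between dataset_size and num_attempts can be illustrated with a bounded retry loop. This is only a sketch of the idea, not the code in generate_best_move.py: `collect_valid`, `generate`, and `is_valid` are hypothetical names, and the toy "positions" are plain numbers rather than chess boards.

```python
import random


def collect_valid(generate, is_valid, dataset_size, num_attempts):
    """Try at most num_attempts candidates and keep the first dataset_size valid ones."""
    kept = []
    for _ in range(num_attempts):
        if len(kept) == dataset_size:
            break  # enough valid positions collected
        candidate = generate()
        if is_valid(candidate):
            kept.append(candidate)
    return kept


# Toy stand-ins: a "position" is a random number and roughly 90% are "valid".
rng = random.Random(0)
positions = collect_valid(
    generate=rng.random,
    is_valid=lambda p: p < 0.9,
    dataset_size=1024,
    num_attempts=1200,
)
print(len(positions))
```

If too many candidates are rejected, the loop runs out of attempts and returns fewer than dataset_size items, which is why num_attempts should sit somewhat above dataset_size.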

python test_openai.py --dataset dataset_folder | tee openai_results.txt

- Saves the results in openai_results.txt. The results go to a text file because the model outputs are often not neatly formatted, and manual post-processing is required for a fair evaluation.

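As a first pass before manual cleanup, a regular expression can pull a move out of free-form model output. The sketch below is an assumption-laden illustration: it assumes moves are written in UCI-style coordinates (such as e2e4), which this README does not confirm, and `extract_first_move` is a hypothetical helper, not part of the repository's scripts.

```python
import re

# UCI-style move: from-square, to-square, optional promotion piece
# (for example e2e4 or e7e8q). Assumed format, not confirmed by the repo.
UCI_MOVE = re.compile(r"\b([a-h][1-8][a-h][1-8][qrbn]?)\b")


def extract_first_move(text):
    """Return the first UCI-looking move found in text, or None if there is none."""
    match = UCI_MOVE.search(text.lower())
    return match.group(1) if match else None


# Usage: lines that return None are the ones that still need manual cleaning.
for line in ["The best move is E2E4.", "I am not sure."]:
    print(repr(line), "->", extract_first_move(line))
```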

python test_openai_best.py --folder dataset_folder | tee openai_results_best.txt

python sft_train.py --model_name "HuggingFaceTB/SmolVLM2-2.2B-Instruct" --save_name "save_weights_here/" --dataset_name "dataset_name/" --task "describe_board"

task can be describe_board or best_move.

For simplicity, there are two scripts for training Qwen: one for describing the board position and one for predicting the best move.

python sft_qwen.py --model "Qwen/Qwen2-VL-7B-Instruct" --dataset_path "dataset_name/" --limit 1024 --name "save_name/"
python sft_qwen_move.py --model "Qwen/Qwen2-VL-7B-Instruct" --dataset_path "dataset_name/" --limit 1024 --name "save_name/"

python run_eval_smol.py --model "saved_weights/" --dataset_path "dataset_name/" --limit 128 --task "describe_board"
python run_eval_qwen.py --model "saved_weights/" --dataset_path "dataset_name/" --limit 128
python run_eval_qwen_move.py --model "saved_weights/" --dataset_path "dataset_name/" --limit 128

All the evaluation scripts above print their output, so they should be run as:

python command | tee text_results.txt

This is because the outputs often need manual formatting. Once you have text_results.txt, use the following scripts to get the accuracy numbers:

- Clean/Process:

python parse_text_describe.py --input "text_results.txt" --output "output.npy"

This processes text_results.txt and also prints the lines that cannot be processed, so that you can clean them manually and rerun the script. Results are saved in output.npy.

- Print accuracy:

python get_accuracy.py --processed_output "output.npy" --test_dataset "dataset_name/" --task "describe_board"

- Clean/Process:

python parse_best_move.py --input "text_results.txt" --output "output.pkl"

- Print accuracy:

python get_accuracy.py --processed_output "output.pkl" --test_dataset "dataset_name/" --task "best_move"

python generate_img_diffusion.py --input_img "input_chess_board.png" --save_img "output_chess_board.png"

input_img is the image of the board position. The output image (--save_img) is the image of the board position after the best move (generated by diffusion xl).