Benchmark your laptop or desktop for local LLMs: CPU/RAM/GPU probes, quick microbenchmarks, optional live Ollama tokens/sec, a rough estimate of LLM tokens per year, and the equivalent hourly Claude output-token cost. Includes helpers to wire Cursor to Ollama (including over an ngrok tunnel). One dependency: psutil.
Repository: github.com/evilmucedin/llm-local-performance-test
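The microbenchmarks are not described in detail here. As a rough illustration of the kind of probe involved, the sketch below times a large in-memory buffer copy with time.perf_counter and reports best-of-N throughput. The function name copy_bandwidth_gbps and its defaults are hypothetical, not the tool's actual benchmark.

```python
import time


def copy_bandwidth_gbps(size_mb: int = 64, repeats: int = 5) -> float:
    """Hypothetical microbenchmark: best-of-N buffer copy throughput in GB/s.

    A copy reads the source and writes a new buffer, so this is a crude proxy
    for memory bandwidth, which is often the limiting factor for local token
    generation.
    """
    if repeats < 1:
        raise ValueError("repeats must be >= 1")
    src = bytearray(size_mb * 1024 * 1024)
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        copy = bytes(src)  # forces a full copy of src
        best = min(best, time.perf_counter() - start)
        del copy
    # Guard against a zero timer delta on very small buffers.
    return len(src) / max(best, 1e-9) / 1e9


if __name__ == "__main__":
    print(f"buffer copy: {copy_bandwidth_gbps():.1f} GB/s")
```

Taking the best of several runs filters out one-off stalls (page faults on first touch, scheduler noise), which keeps numbers more comparable across machines.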
- Comparable numbers across machines (fixed Ollama num_ctx/num_predict on the full run).
- Claude cost comparison: prints the estimated hourly dollar value of the annual token estimate at Claude output-token pricing.
- --simple mode: no Ollama required; good for CI or air-gapped checks.
- cursor-ollama: start/configure Ollama and merge OpenAI-compatible Ollama settings into Cursor’s settings.json.
- Apache-2.0: use and fork freely.
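The Claude comparison reduces to simple arithmetic. Below is a minimal sketch. It assumes a sustained tokens/sec rate and an optional duty cycle, which is a hypothetical knob; the tool's own adjustments, such as model-size scaling, are not reproduced. It also assumes you supply a per-million output-token price from Anthropic's current pricing.

```python
HOURS_PER_YEAR = 365 * 24
SECONDS_PER_YEAR = HOURS_PER_YEAR * 3600


def tokens_per_year(tokens_per_sec: float, duty_cycle: float = 1.0) -> float:
    """Annual tokens if the machine generates at tokens_per_sec for duty_cycle of the year."""
    return tokens_per_sec * SECONDS_PER_YEAR * duty_cycle


def hourly_claude_equivalent_usd(annual_tokens: float, usd_per_million_output: float) -> float:
    """Dollar value per hour of the annual estimate, priced as Claude output tokens."""
    tokens_per_hour = annual_tokens / HOURS_PER_YEAR
    return tokens_per_hour * usd_per_million_output / 1_000_000
```

For example, 10 tokens/sec at full duty is 315,360,000 tokens per year, or 36,000 tokens per hour. At an illustrative price of $15 per million output tokens, that works out to $0.54 per hour.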
git clone https://github.com/evilmucedin/llm-local-performance-test.git
cd llm-local-performance-test
python3 -m venv .venv
source .venv/bin/activate # Windows: .venv\Scripts\activate
pip install -e .
Or run from a clone without installing (you still need pip install psutil in your environment):
python run.py --simple
# Full run (contacts the Ollama API at OLLAMA_HOST if it is running; uses the best installed model by default)
llm-local-perf
python -m llm_local_perf
python run.py
# Full run with an explicit Ollama model
llm-local-perf --ollama-model qwen2.5-coder:7b
python run.py --ollama-model qwen2.5-coder:7b
# Lighter path: no Ollama inference
llm-local-perf --simple
By default, full mode selects the best installed Ollama model it can find, preferring non-embedding models with the largest parsed parameter count or downloaded size. Full-mode token/year and Claude-equivalent hourly cost estimates are adjusted by the selected Ollama model size when the model tag includes a size such as :7b, :32b, or :0.5b.
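The selection rule can be approximated against the model list returned by Ollama's GET /api/tags endpoint, where each entry has name and size fields. The sketch below makes two assumptions, neither of which is the tool's exact logic: embedding models are recognized by "embed" appearing in the name, and the parameter count comes from a tag suffix such as :7b or :0.5b.

```python
import re
from typing import Optional

# Matches a size suffix like ":7b", ":32b", ":0.5b", or ":8b-instruct-q4_K_M".
_PARAMS_RE = re.compile(r":(\d+(?:\.\d+)?)b\b", re.IGNORECASE)


def parse_param_billions(tag: str) -> Optional[float]:
    """Return the parameter count in billions parsed from a model tag, or None."""
    match = _PARAMS_RE.search(tag)
    return float(match.group(1)) if match else None


def pick_best_model(models: list[dict]) -> Optional[str]:
    """Pick the largest non-embedding model: parsed params first, then download size."""
    candidates = [m for m in models if "embed" not in m.get("name", "").lower()]
    if not candidates:
        return None

    def rank(m: dict) -> tuple[float, int]:
        params = parse_param_billions(m["name"])
        # Tags without a size suffix rank below any tagged model, then by bytes.
        return (params if params is not None else -1.0, m.get("size", 0))

    return max(candidates, key=rank)["name"]
```

Ranking on a (parameters, size) tuple lets the download size act as a tiebreaker, and as the only signal when no tag carries a size suffix.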
Environment:
- OLLAMA_HOST: Ollama base URL (default http://localhost:11434).
- OLLAMA_BENCH_TIMEOUT: per-request timeout for the main benchmark, in seconds (default 5800).
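A live measurement can be sketched against Ollama's POST /api/generate endpoint. Its non-streaming response includes eval_count (generated tokens) and eval_duration (nanoseconds). The sketch reads OLLAMA_HOST and OLLAMA_BENCH_TIMEOUT with the defaults listed above and pins num_ctx/num_predict in options. Two parts are assumptions: the values 2048 and 256 are placeholders rather than the tool's fixed settings, and prepending a scheme to a bare host:port is not documented behavior.

```python
import json
import os
import urllib.request
from collections.abc import Mapping


def ollama_settings(env: Mapping[str, str] = os.environ) -> tuple[str, float]:
    """Resolve the Ollama base URL and per-request timeout (seconds) from the environment."""
    host = env.get("OLLAMA_HOST", "http://localhost:11434").rstrip("/")
    if "://" not in host:  # assumption: accept bare host:port values
        host = "http://" + host
    timeout = float(env.get("OLLAMA_BENCH_TIMEOUT", "5800"))
    return host, timeout


def tokens_per_second(response: dict) -> float:
    """Generation speed from Ollama's eval_count and eval_duration (ns) fields."""
    return response["eval_count"] / (response["eval_duration"] / 1e9)


def generate(host: str, model: str, prompt: str, timeout: float,
             num_ctx: int = 2048, num_predict: int = 256) -> dict:
    """Run one non-streaming /api/generate request with fixed context and output length."""
    payload = json.dumps({
        "model": model,
        "prompt": prompt,
        "stream": False,
        "options": {"num_ctx": num_ctx, "num_predict": num_predict},
    }).encode("utf-8")
    request = urllib.request.Request(
        f"{host}/api/generate", data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.load(resp)


if __name__ == "__main__":
    host, timeout = ollama_settings()
    result = generate(host, "qwen2.5-coder:7b", "Write a haiku about GPUs.", timeout)
    print(f"{tokens_per_second(result):.1f} tokens/sec")
```

Using eval_duration rather than wall-clock time excludes model loading and prompt processing, so the figure reflects generation speed alone.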
For GPU listing on Linux/NVIDIA, nvidia-smi should be on PATH if you want discrete VRAM reported.
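Here is a sketch of the NVIDIA probe. It assumes the standard nvidia-smi query flags; with nounits, memory.total is reported in MiB. The helper names are hypothetical, and the probe returns an empty list when nvidia-smi is missing or fails.

```python
import shutil
import subprocess


def parse_nvidia_smi(csv_text: str) -> list[tuple[str, int]]:
    """Parse 'name, memory.total' CSV rows (no header, no units) into (name, MiB) pairs."""
    gpus = []
    for line in csv_text.splitlines():
        if not line.strip():
            continue
        # Split on the last comma so a comma inside a GPU name is harmless.
        name, mem_mib = line.rsplit(",", 1)
        if not mem_mib.strip().isdigit():  # e.g. "[N/A]" on some devices
            continue
        gpus.append((name.strip(), int(mem_mib.strip())))
    return gpus


def list_nvidia_gpus() -> list[tuple[str, int]]:
    """Return discrete NVIDIA GPUs with total VRAM in MiB, or [] if nvidia-smi is unavailable."""
    exe = shutil.which("nvidia-smi")
    if exe is None:
        return []
    try:
        result = subprocess.run(
            [exe, "--query-gpu=name,memory.total", "--format=csv,noheader,nounits"],
            capture_output=True, text=True, timeout=10,
        )
    except (OSError, subprocess.TimeoutExpired):
        return []
    if result.returncode != 0:
        return []
    return parse_nvidia_smi(result.stdout)
```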
- Architecture — package layout, execution modes, estimation flow, pricing assumptions, and output conventions.
- Coding-agent guide — validation commands and advice for coding tools such as Claude Code and Pi.
- Claude guide — short Claude-specific entry point that links back to the canonical agent guide.
On Ubuntu/Debian-like machines, use the helper script to install Ollama and pull the best large coding model that fits the current hardware:
./scripts/install-best-coding-ollama-ubuntu.sh
The script queries the Ollama Library for available coding-model tags, reads published model sizes/context windows, checks local NVIDIA VRAM and system RAM, prints the compatibility table, then pulls the highest-priority compatible model. It currently prefers large coding models such as qwen2.5-coder:32b, qwen3-coder:30b, codestral:22b, and deepseek-coder-v2:16b when hardware allows them.
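The script itself is shell. To illustrate its compatibility check, here is a Python sketch that walks the priority order named above and returns the first model whose published size fits in VRAM or system RAM after subtracting headroom. Several details are assumptions: the fit rule, the headroom defaults, and passing sizes in as a dict. The headroom parameters mirror the GPU_HEADROOM_GB and RAM_HEADROOM_GB overrides by name only.

```python
from typing import Optional

# Priority order from this README; the real script may consider more tags.
PRIORITY = ["qwen2.5-coder:32b", "qwen3-coder:30b", "codestral:22b", "deepseek-coder-v2:16b"]


def pick_compatible(model_sizes_gb: dict[str, float], vram_gb: float, ram_gb: float,
                    gpu_headroom_gb: float = 2.0, ram_headroom_gb: float = 4.0) -> Optional[str]:
    """Return the highest-priority model that fits in VRAM or RAM after headroom, else None."""
    for tag in PRIORITY:
        size = model_sizes_gb.get(tag)
        if size is None:  # size unknown, e.g. tag not listed in the library
            continue
        fits_gpu = size <= vram_gb - gpu_headroom_gb
        fits_ram = size <= ram_gb - ram_headroom_gb  # CPU fallback (assumption)
        if fits_gpu or fits_ram:
            return tag
    return None
```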
Useful overrides:
DRY_RUN=1 ./scripts/install-best-coding-ollama-ubuntu.sh # inspect selection only
MODEL=qwen2.5-coder:32b ./scripts/install-best-coding-ollama-ubuntu.sh
GPU_HEADROOM_GB=4 RAM_HEADROOM_GB=8 ./scripts/install-best-coding-ollama-ubuntu.sh
The cursor-ollama command ensures a local Ollama daemon is running, resolves/pulls a model, writes Cursor’s OpenAI-compatible Ollama settings, and optionally launches Cursor.
cursor-ollama [--no-launch]
cursor-ollama --ollama-host 'https://your-subdomain.ngrok-free.app' # tunnel URL for Cursor
Environment:
- OLLAMA_LOCAL_HOST: local API for ollama serve, pulls, and health checks (default http://localhost:11434).
- OLLAMA_HOST: base URL stored in Cursor settings; defaults to OLLAMA_LOCAL_HOST. Set this to your ngrok (or other) public URL when Cursor must use the tunneled endpoint.
- CURSOR_OLLAMA_MODEL: default model (default qwen2.5-coder:7b).
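The variable precedence described here (OLLAMA_HOST falls back to OLLAMA_LOCAL_HOST) and the settings merge can be sketched as follows. Ollama serves its OpenAI-compatible API under /v1. This README does not say whether cursor-ollama appends that path or which keys Cursor actually reads, so the key names in the usage note are hypothetical placeholders. The plain json.loads also assumes settings.json contains no comments.

```python
import json
import os
from collections.abc import Mapping
from pathlib import Path


def resolve_cursor_ollama(env: Mapping[str, str] = os.environ) -> dict[str, str]:
    """Apply the documented defaults for the local daemon, the Cursor-facing URL, and the model."""
    local = env.get("OLLAMA_LOCAL_HOST", "http://localhost:11434").rstrip("/")
    return {
        "local_host": local,  # used for ollama serve, pulls, and health checks
        "cursor_host": env.get("OLLAMA_HOST", local).rstrip("/"),  # e.g. an ngrok URL
        "model": env.get("CURSOR_OLLAMA_MODEL", "qwen2.5-coder:7b"),
    }


def merge_settings_text(existing: str, updates: dict) -> str:
    """Merge updates into a JSON settings document, keeping all unrelated keys."""
    settings = json.loads(existing) if existing.strip() else {}
    settings.update(updates)
    return json.dumps(settings, indent=2) + "\n"


def merge_settings_file(path: Path, updates: dict) -> None:
    """Read, merge, and rewrite a settings.json file in place."""
    existing = path.read_text(encoding="utf-8") if path.exists() else ""
    path.write_text(merge_settings_text(existing, updates), encoding="utf-8")
```

For example, cfg = resolve_cursor_ollama() followed by merge_settings_file(path, {"example.openaiBaseUrl": cfg["cursor_host"] + "/v1", "example.model": cfg["model"]}). Both keys are stand-ins for whatever Cursor expects.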
Posting is up to you — here is neutral text you can use on Mastodon, Bluesky, X, Hacker News, or Reddit (follow each site’s self-promotion rules):
Short
Free Python tool: benchmark your PC for local LLMs (Ollama tokens/sec + rough “tokens/year” estimate) + optional Cursor↔Ollama setup. Apache-2.0.
https://github.com/evilmucedin/llm-local-performance-test
Show HN–style title + body
Show HN: llm-local-performance-test — benchmark your machine for local LLM throughput
CLI: microbenchmarks, an optional Ollama run with fixed generation settings for fair comparison, and a naive yearly token estimate. Includes cursor-ollama to point Cursor at Ollama (e.g. via ngrok). Python 3.10+, depends only on psutil.
GitHub “About” topics (set under repo ⚙️)
Ideas: ollama, llm, benchmark, local-llm, machine-learning, cursor-editor, python, nvidia, inference
Origins: derived from experimental llmTest5-style scripts; this repo is the packaged, maintained form.
See LICENSE.