llm-local-performance-test

Benchmark your laptop or desktop for local LLMs: CPU/RAM/GPU probes, quick microbenchmarks, an optional live Ollama tokens/sec measurement, a rough estimate of LLM tokens per year, and the equivalent hourly cost at Claude output-token pricing. Includes helpers to wire Cursor to Ollama (including over an ngrok tunnel). One dependency: psutil.

Repository: github.com/evilmucedin/llm-local-performance-test

Why use it

Comparable numbers across machines (fixed Ollama num_ctx / num_predict on the full run).
Claude cost comparison — prints the estimated hourly dollar value of the annual token estimate at Claude output-token pricing.
--simple mode — no Ollama required; good for CI or air-gapped checks.
cursor-ollama — start/configure Ollama and merge OpenAI-compatible Ollama settings into Cursor’s settings.json.
Apache-2.0 — use and fork freely.
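The README does not publish the exact formula behind the yearly and hourly figures. The sketch below assumes a naive annual estimate equal to sustained tokens/sec times the seconds in a year. It then takes the hourly value as that annual total spread back over the hours in a year and priced per million output tokens. The function names and the $15-per-million price are illustrative placeholders. They are not the tool's code or a quoted Claude rate.

```python
SECONDS_PER_HOUR = 3600
HOURS_PER_YEAR = 24 * 365  # naive: assumes 24/7 generation and ignores leap years


def tokens_per_year(tokens_per_second: float) -> float:
    """Naive annual estimate: sustained throughput times seconds in a year."""
    return tokens_per_second * SECONDS_PER_HOUR * HOURS_PER_YEAR


def hourly_claude_equivalent(tokens_per_year_est: float,
                             usd_per_million_output_tokens: float) -> float:
    """Dollar value of one hour's share of the annual estimate.

    usd_per_million_output_tokens is a placeholder; check current Claude pricing.
    """
    tokens_per_hour = tokens_per_year_est / HOURS_PER_YEAR
    return tokens_per_hour / 1_000_000 * usd_per_million_output_tokens


if __name__ == "__main__":
    tpy = tokens_per_year(40.0)
    print(f"{tpy:,.0f} tokens/year")
    print(f"${hourly_claude_equivalent(tpy, 15.0):.2f}/hour")
```

At 40 tokens/sec this gives 1,261,440,000 tokens/year and about $2.16/hour at the placeholder price. The README also says full-mode estimates are adjusted by model size. This sketch leaves that adjustment out.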

Install

git clone https://github.com/evilmucedin/llm-local-performance-test.git
cd llm-local-performance-test
python3 -m venv .venv
source .venv/bin/activate   # Windows: .venv\Scripts\activate
pip install -e .

Or run directly from a clone without installing (psutil must still be installed in your environment, e.g. pip install psutil):

python run.py --simple

Run

# Full run (contacts Ollama API at OLLAMA_HOST if running; uses the best installed model by default)
llm-local-perf

python -m llm_local_perf
python run.py

# Full run with an explicit Ollama model
llm-local-perf --ollama-model qwen2.5-coder:7b
python run.py --ollama-model qwen2.5-coder:7b

# Lighter path: no Ollama inference
llm-local-perf --simple

By default, full mode selects the best installed Ollama model it can find, preferring non-embedding models with the largest parsed parameter count or downloaded size. Full-mode tokens/year and Claude-equivalent hourly cost estimates are adjusted by the selected Ollama model size when the model tag includes a size such as :7b, :32b, or :0.5b.
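Here is a rough sketch of the selection rule described above. It assumes model entries shaped like those returned by Ollama's GET /api/tags endpoint, each with a "name" and a "size" in bytes. The "embed" name check is an assumption. So is the tie-break order: parsed parameter count first, then downloaded size. The helper names are hypothetical.

```python
import re
from typing import Iterable, Optional

# Matches a size suffix in the tag, e.g. "qwen2.5-coder:7b" -> 7, "qwen2.5:0.5b" -> 0.5.
_SIZE_RE = re.compile(r":(\d+(?:\.\d+)?)b\b", re.IGNORECASE)


def parse_param_billions(model_name: str) -> Optional[float]:
    """Return the parameter count (in billions) parsed from the tag, if any."""
    match = _SIZE_RE.search(model_name)
    return float(match.group(1)) if match else None


def is_embedding_model(model_name: str) -> bool:
    # Heuristic assumption: embedding models usually carry "embed" in the name.
    return "embed" in model_name.lower()


def pick_best_model(models: Iterable[dict]) -> Optional[str]:
    """Pick a model from /api/tags-style entries: {"name": ..., "size": bytes}."""
    candidates = [m for m in models if not is_embedding_model(m["name"])]
    if not candidates:
        return None

    def rank(m: dict) -> tuple:
        params = parse_param_billions(m["name"])
        # Entries with a parsed size rank above those without; ties fall back to bytes.
        return (params is not None, params or 0.0, m.get("size", 0))

    return max(candidates, key=rank)["name"]
```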

Environment:

OLLAMA_HOST — Ollama base URL (default http://localhost:11434).
OLLAMA_BENCH_TIMEOUT — per-request timeout for the main benchmark in seconds (default 5800).
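The following is a minimal sketch of a single non-streaming benchmark call against Ollama's POST /api/generate. It reads OLLAMA_HOST and OLLAMA_BENCH_TIMEOUT with the defaults listed above. Ollama's response reports eval_count (generated tokens) and eval_duration (nanoseconds), and tokens/sec follows from those two fields. The helper names and the num_ctx/num_predict values you pass are hypothetical. The README does not state the fixed values the full run uses. OLLAMA_HOST is assumed to include the scheme.

```python
import json
import os
import urllib.request

DEFAULT_HOST = "http://localhost:11434"
DEFAULT_TIMEOUT_S = 5800.0


def ollama_settings() -> tuple[str, float]:
    """Read the base URL and per-request timeout, falling back to README defaults."""
    host = os.environ.get("OLLAMA_HOST", DEFAULT_HOST).rstrip("/")
    timeout = float(os.environ.get("OLLAMA_BENCH_TIMEOUT", DEFAULT_TIMEOUT_S))
    return host, timeout


def build_generate_payload(model: str, prompt: str, num_ctx: int, num_predict: int) -> dict:
    # Fixing num_ctx and num_predict keeps runs comparable across machines.
    return {
        "model": model,
        "prompt": prompt,
        "stream": False,
        "options": {"num_ctx": num_ctx, "num_predict": num_predict},
    }


def tokens_per_second(response: dict) -> float:
    # eval_count = generated tokens; eval_duration is reported in nanoseconds.
    return response["eval_count"] / (response["eval_duration"] / 1e9)


def run_generate(payload: dict) -> dict:
    """POST to /api/generate and return the decoded JSON response."""
    host, timeout = ollama_settings()
    req = urllib.request.Request(
        f"{host}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)
```

For example, tokens_per_second(run_generate(build_generate_payload("qwen2.5-coder:7b", "Write a haiku.", 4096, 256))) needs a running Ollama daemon with that model already pulled.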

For GPU listing on Linux/NVIDIA, nvidia-smi should be on PATH if you want discrete VRAM reported.
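You may want to check what nvidia-smi will report before running the benchmark. A small sketch like the one below queries GPU names and total memory through nvidia-smi's --query-gpu CSV output. With nounits, memory.total is given in MiB. The function names are hypothetical, and the tool does not necessarily probe GPUs this way.

```python
import shutil
import subprocess

QUERY = ["nvidia-smi", "--query-gpu=name,memory.total", "--format=csv,noheader,nounits"]


def parse_nvidia_smi_csv(text: str) -> list[tuple[str, int]]:
    """Parse 'name, memory.total' rows; memory.total is MiB when nounits is used."""
    gpus = []
    for line in text.strip().splitlines():
        if not line.strip():
            continue
        # Split on the last comma so a comma inside a GPU name would not break parsing.
        name, mem = (field.strip() for field in line.rsplit(",", 1))
        gpus.append((name, int(mem)))
    return gpus


def list_nvidia_gpus() -> list[tuple[str, int]]:
    """Return (name, MiB) per GPU, or [] when nvidia-smi is not on PATH."""
    if shutil.which("nvidia-smi") is None:
        return []  # no discrete VRAM can be reported
    out = subprocess.run(QUERY, capture_output=True, text=True, check=True).stdout
    return parse_nvidia_smi_csv(out)
```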

Documentation

Architecture (ARCHITECTURE.md): package layout, execution modes, estimation flow, pricing assumptions, and output conventions.
Coding-agent guide (AGENTS.md): validation commands and advice for coding tools such as Claude Code and Pi.
Claude guide (CLAUDE.md): short Claude-specific entry point that links back to the canonical agent guide.

Ubuntu Ollama coding-model installer

On Ubuntu/Debian-like machines, use the helper script to install Ollama and pull the best large coding model that fits the current hardware:

./scripts/install-best-coding-ollama-ubuntu.sh

The script queries the Ollama Library for available coding-model tags, reads published model sizes/context windows, checks local NVIDIA VRAM and system RAM, prints the compatibility table, then pulls the highest-priority compatible model. It currently prefers large coding models such as qwen2.5-coder:32b, qwen3-coder:30b, codestral:22b, and deepseek-coder-v2:16b when hardware allows them.

Useful overrides:

DRY_RUN=1 ./scripts/install-best-coding-ollama-ubuntu.sh       # inspect selection only
MODEL=qwen2.5-coder:32b ./scripts/install-best-coding-ollama-ubuntu.sh
GPU_HEADROOM_GB=4 RAM_HEADROOM_GB=8 ./scripts/install-best-coding-ollama-ubuntu.sh
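The selection step can be pictured as a scan of candidates in priority order. The sketch below makes two assumptions. First, a model counts as compatible when its published size fits in VRAM minus GPU_HEADROOM_GB, or in system RAM minus RAM_HEADROOM_GB. Second, MODEL short-circuits the scan. The real shell script's fit rule, defaults, and use of context windows may differ, and the Python names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Candidate:
    tag: str
    size_gb: float  # published download size from the Ollama Library


def pick_model(candidates: Sequence[Candidate], vram_gb: float, ram_gb: float,
               gpu_headroom_gb: float, ram_headroom_gb: float,
               override: Optional[str] = None) -> Optional[str]:
    """Return the first candidate (list is in priority order) that fits after headroom."""
    if override:
        return override  # mirrors MODEL=... skipping automatic selection
    for c in candidates:
        fits_gpu = c.size_gb <= vram_gb - gpu_headroom_gb
        fits_ram = c.size_gb <= ram_gb - ram_headroom_gb
        if fits_gpu or fits_ram:
            return c.tag
    return None
```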

Cursor + Ollama

The cursor-ollama command ensures a local Ollama daemon, resolves/pulls a model, writes Cursor’s OpenAI-compatible Ollama settings, and optionally launches Cursor.

cursor-ollama [--no-launch]
cursor-ollama --ollama-host 'https://your-subdomain.ngrok-free.app'   # tunnel URL for Cursor

OLLAMA_LOCAL_HOST — local API for ollama serve, pulls, and health checks (default http://localhost:11434).
OLLAMA_HOST — base URL stored in Cursor settings; defaults to OLLAMA_LOCAL_HOST. Set this to your ngrok (or other) public URL when Cursor must use the tunneled endpoint.
CURSOR_OLLAMA_MODEL — default model (default qwen2.5-coder:7b).
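The sketch below shows how the two host variables relate and how settings can be merged without clobbering the rest of settings.json. Ollama serves an OpenAI-compatible API under /v1, so the sketch appends that path. The README does not say whether cursor-ollama does the same or which keys it writes, so the key names here are explicitly hypothetical. The sketch also assumes settings.json is plain JSON without comments.

```python
import json
import os
from pathlib import Path


def resolve_hosts() -> tuple[str, str]:
    """Local API for serve/pull/health checks, and the URL Cursor should use."""
    local = os.environ.get("OLLAMA_LOCAL_HOST", "http://localhost:11434").rstrip("/")
    public = os.environ.get("OLLAMA_HOST", local).rstrip("/")  # defaults to local
    return local, public


def cursor_updates(public_base: str, model: str) -> dict:
    # HYPOTHETICAL key names; the keys cursor-ollama actually writes are not shown here.
    return {"hypothetical.openaiBaseUrl": f"{public_base}/v1", "hypothetical.model": model}


def merge_settings(path: Path, updates: dict) -> dict:
    """Merge updates into a JSON settings file, keeping all unrelated keys."""
    settings = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
    settings.update(updates)
    path.parent.mkdir(parents=True, exist_ok=True)
    tmp = path.with_suffix(path.suffix + ".tmp")
    tmp.write_text(json.dumps(settings, indent=2) + "\n", encoding="utf-8")
    tmp.replace(path)  # atomic rename on the same filesystem
    return settings
```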

Tell others (copy-paste)

Posting is up to you — here is neutral text you can use on Mastodon, Bluesky, X, Hacker News, or Reddit (follow each site’s self-promotion rules):

Short

Free Python tool: benchmark your PC for local LLMs (Ollama tokens/sec + rough “tokens/year” estimate) + optional Cursor↔Ollama setup. Apache-2.0.
https://github.com/evilmucedin/llm-local-performance-test

Show HN–style title + body

Show HN: llm-local-performance-test — benchmark your machine for local LLM throughput
CLI: microbenchmarks, optional Ollama run with fixed generation settings for fair comparison, naive yearly token estimate. Includes cursor-ollama to point Cursor at Ollama (e.g. via ngrok). Python 3.10+, depends only on psutil.

GitHub “About” topics (set under repo ⚙️)
Ideas: ollama, llm, benchmark, local-llm, machine-learning, cursor-editor, python, nvidia, inference

Origins: derived from experimental llmTest5-style scripts; this repo is the packaged, maintained form.

License

See LICENSE.
