

llm-local-performance-test


Benchmark your laptop or desktop for local LLMs: CPU/RAM/GPU probes, quick microbenchmarks, optional live Ollama tokens/sec, a rough estimate of LLM tokens per year, and the equivalent hourly Claude output-token cost. Includes helpers to wire Cursor to Ollama (including over an ngrok tunnel). One dependency: psutil.
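The CPU/RAM probes can be approximated with psutil, the project's one dependency. This is a minimal illustration, not the tool's actual probe code: the function name probe_hardware and its field names are hypothetical, and the sketch falls back to the standard library when psutil is not installed.

```python
import os
import platform

try:
    import psutil  # the project's single runtime dependency
except ImportError:  # keep the sketch usable without psutil
    psutil = None


def probe_hardware() -> dict:
    """Collect a few basic CPU/RAM facts (hypothetical field names)."""
    info = {
        "platform": platform.platform(),
        "logical_cpus": os.cpu_count(),  # may be None if undeterminable
        "physical_cpus": None,
        "total_ram_gb": None,
    }
    if psutil is not None:
        # cpu_count(logical=False) can return None on some platforms.
        info["physical_cpus"] = psutil.cpu_count(logical=False)
        info["total_ram_gb"] = round(psutil.virtual_memory().total / 1024**3, 1)
    return info


if __name__ == "__main__":
    for key, value in probe_hardware().items():
        print(f"{key}: {value}")
```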

Repository: github.com/evilmucedin/llm-local-performance-test

Why use it

  • Comparable numbers across machines (fixed Ollama num_ctx / num_predict on the full run).
  • Claude cost comparison — prints the estimated hourly dollar value of the annual token estimate at Claude output-token pricing.
  • --simple mode — no Ollama required; good for CI or air-gapped checks.
  • cursor-ollama — start/configure Ollama and merge OpenAI-compatible Ollama settings into Cursor’s settings.json.
  • Apache-2.0 — use and fork freely.
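The Claude comparison is plain arithmetic once you have a tokens-per-year figure. The sketch below assumes the naive yearly estimate means generating nonstop at the measured rate for 365 days, and it takes the price per million output tokens as a parameter. The $15 figure is a placeholder, not a quoted Claude price, and the tool's real formula (including its model-size adjustment) may differ.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600  # 31,536,000
HOURS_PER_YEAR = 365 * 24           # 8,760


def tokens_per_year(tokens_per_sec: float) -> float:
    """Naive estimate: assume 24/7 generation at the measured rate."""
    return tokens_per_sec * SECONDS_PER_YEAR


def hourly_claude_value(tokens_per_year_estimate: float, usd_per_million_output: float) -> float:
    """Dollar value of one hour's share of the annual estimate at a given output price."""
    tokens_per_hour = tokens_per_year_estimate / HOURS_PER_YEAR
    return tokens_per_hour * usd_per_million_output / 1_000_000


# Hypothetical inputs: 10 tokens/sec and a placeholder price of $15 per million output tokens.
yearly = tokens_per_year(10.0)
print(f"{yearly:,.0f} tokens/year, ${hourly_claude_value(yearly, 15.0):.2f}/hour")
# prints: 315,360,000 tokens/year, $0.54/hour
```

Under this assumption the hourly figure reduces to tokens/sec * 3600 * price / 1,000,000, so the yearly number is mainly a presentation step.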

Install

git clone https://github.com/evilmucedin/llm-local-performance-test.git
cd llm-local-performance-test
python3 -m venv .venv
source .venv/bin/activate   # Windows: .venv\Scripts\activate
pip install -e .

Or run from a clone without installing (you still need pip install psutil in your environment):

python run.py --simple

Run

# Full run (contacts the Ollama API at OLLAMA_HOST if Ollama is running; uses the best installed model by default)
llm-local-perf

python -m llm_local_perf
python run.py

# Full run with an explicit Ollama model
llm-local-perf --ollama-model qwen2.5-coder:7b
python run.py --ollama-model qwen2.5-coder:7b

# Lighter path: no Ollama inference
llm-local-perf --simple

By default, full mode selects the best installed Ollama model it can find, preferring non-embedding models with the largest parsed parameter count or downloaded size. Full-mode tokens/year and Claude-equivalent hourly cost estimates are adjusted for the selected model's size when its tag includes a size such as :7b, :32b, or :0.5b.
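Model selection by tag can be sketched as follows. This is an illustrative reimplementation, not the tool's code: the helper names are hypothetical, the embedding check is a simple name heuristic, and it assumes models with a parsed size always outrank models without one, using bytes on disk (the size field Ollama's /api/tags listing reports) as the tie-breaker. Only the tag after the colon is parsed, so a name like qwen2.5 is not mistaken for a 2.5b model.

```python
import re
from typing import Optional

# A size token such as "7b", "32b" or "0.5b" inside the tag part of "name:tag".
_SIZE_RE = re.compile(r"(?<![\d.])(\d+(?:\.\d+)?)b(?![a-z])", re.IGNORECASE)


def parse_param_billions(model_name: str) -> Optional[float]:
    """Return the parameter count in billions parsed from the tag, or None."""
    _, _, tag = model_name.partition(":")
    match = _SIZE_RE.search(tag)
    return float(match.group(1)) if match else None


def is_embedding_model(model_name: str) -> bool:
    # Heuristic assumption: embedding models usually carry "embed" in their name.
    return "embed" in model_name.lower()


def pick_best_model(models: list[dict]) -> Optional[str]:
    """Pick a model from entries shaped like Ollama's /api/tags "models" list."""
    candidates = [m for m in models if not is_embedding_model(m["name"])]
    if not candidates:
        return None

    def rank(m: dict) -> tuple[float, int]:
        params = parse_param_billions(m["name"])
        # Parsed parameter count first; bytes on disk break ties or unknown sizes.
        return (params if params is not None else -1.0, m.get("size", 0))

    return max(candidates, key=rank)["name"]


# Illustrative entries; the sizes are made up.
INSTALLED = [
    {"name": "nomic-embed-text:latest", "size": 300_000_000},
    {"name": "qwen2.5-coder:7b", "size": 5_000_000_000},
    {"name": "qwen2.5:0.5b", "size": 400_000_000},
]
print(pick_best_model(INSTALLED))  # prints qwen2.5-coder:7b
```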

Environment:

  • OLLAMA_HOST — Ollama base URL (default http://localhost:11434).
  • OLLAMA_BENCH_TIMEOUT — per-request timeout for the main benchmark in seconds (default 5800).
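A single timed generation against Ollama's REST API might look like this, using only the standard library. It is a sketch: it assumes OLLAMA_HOST holds a full URL with a scheme, reads the two variables above with their documented defaults, and derives tokens/sec from the eval_count and eval_duration (nanoseconds) fields of a non-streaming /api/generate response. The num_ctx and num_predict values are left to the caller because this README does not state the values the full run fixes.

```python
import json
import os
import urllib.request

DEFAULT_HOST = "http://localhost:11434"


def ollama_settings() -> tuple[str, float]:
    """Resolve OLLAMA_HOST and OLLAMA_BENCH_TIMEOUT with their documented defaults."""
    host = os.environ.get("OLLAMA_HOST", DEFAULT_HOST).rstrip("/")
    timeout = float(os.environ.get("OLLAMA_BENCH_TIMEOUT", "5800"))
    return host, timeout


def tokens_per_second(response: dict) -> float:
    """Generation speed from eval_count and eval_duration (nanoseconds)."""
    return response["eval_count"] / (response["eval_duration"] / 1e9)


def run_generation(model: str, prompt: str, num_ctx: int, num_predict: int) -> float:
    """Send one non-streaming /api/generate request and return tokens/sec."""
    host, timeout = ollama_settings()
    payload = {
        "model": model,
        "prompt": prompt,
        "stream": False,
        # Fixed generation settings keep results comparable across machines.
        "options": {"num_ctx": num_ctx, "num_predict": num_predict},
    }
    request = urllib.request.Request(
        f"{host}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return tokens_per_second(json.load(resp))
```

For example, run_generation("qwen2.5-coder:7b", "Explain binary search.", num_ctx=4096, num_predict=256) returns one tokens/sec sample; those argument values are arbitrary.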

For GPU listing on Linux/NVIDIA, nvidia-smi should be on PATH if you want discrete VRAM reported.
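To check what VRAM nvidia-smi exposes, you can query it directly in CSV mode. The sketch below is illustrative: the helper names are hypothetical, memory.total is read in MiB because of the nounits flag, and it returns an empty list when nvidia-smi is missing or exits with an error.

```python
import shutil
import subprocess


def parse_nvidia_smi_csv(output: str) -> list[tuple[str, int]]:
    """Parse 'name, memory.total' rows (MiB, no units) into (name, MiB) pairs."""
    gpus = []
    for line in output.strip().splitlines():
        name, _, mem = line.rpartition(",")  # memory is the last column
        if mem.strip().isdigit():  # skip rows such as "[N/A]"
            gpus.append((name.strip(), int(mem.strip())))
    return gpus


def list_nvidia_gpus() -> list[tuple[str, int]]:
    """Return (GPU name, VRAM MiB) pairs, or [] when nvidia-smi is unavailable."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,memory.total", "--format=csv,noheader,nounits"],
        capture_output=True,
        text=True,
        check=False,
    )
    if result.returncode != 0:
        return []
    return parse_nvidia_smi_csv(result.stdout)
```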

Documentation

  • Architecture — package layout, execution modes, estimation flow, pricing assumptions, and output conventions.
  • Coding-agent guide — validation commands and advice for coding tools such as Claude Code and Pi.
  • Claude guide — short Claude-specific entry point that links back to the canonical agent guide.

Ubuntu Ollama coding-model installer

On Ubuntu/Debian-like machines, use the helper script to install Ollama and pull the best large coding model that fits the current hardware:

./scripts/install-best-coding-ollama-ubuntu.sh

The script queries the Ollama Library for available coding-model tags, reads published model sizes/context windows, checks local NVIDIA VRAM and system RAM, prints the compatibility table, then pulls the highest-priority compatible model. It currently prefers large coding models such as qwen2.5-coder:32b, qwen3-coder:30b, codestral:22b, and deepseek-coder-v2:16b when hardware allows them.

Useful overrides:

DRY_RUN=1 ./scripts/install-best-coding-ollama-ubuntu.sh       # inspect selection only
MODEL=qwen2.5-coder:32b ./scripts/install-best-coding-ollama-ubuntu.sh
GPU_HEADROOM_GB=4 RAM_HEADROOM_GB=8 ./scripts/install-best-coding-ollama-ubuntu.sh
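The selection rule the script describes (walk a priority list and pull the first compatible model) can be illustrated in Python. This is not the script itself: the Candidate type and function names are hypothetical, and it assumes a model counts as compatible when its published size fits in VRAM minus GPU_HEADROOM_GB, or otherwise in system RAM minus RAM_HEADROOM_GB. The real fit test may also weigh context windows or other factors.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Candidate:
    tag: str
    size_gb: float  # published download size, used as a rough memory requirement


def fits(c: Candidate, vram_gb: float, ram_gb: float,
         gpu_headroom_gb: float, ram_headroom_gb: float) -> bool:
    # Assumption: fit in VRAM after headroom, or else in system RAM after headroom.
    return (c.size_gb <= vram_gb - gpu_headroom_gb
            or c.size_gb <= ram_gb - ram_headroom_gb)


def choose(priority: list[Candidate], vram_gb: float, ram_gb: float,
           gpu_headroom_gb: float, ram_headroom_gb: float) -> Optional[str]:
    """Return the first (highest-priority) candidate that fits, else None."""
    for c in priority:
        if fits(c, vram_gb, ram_gb, gpu_headroom_gb, ram_headroom_gb):
            return c.tag
    return None


# Placeholder sizes for illustration, not published figures.
PRIORITY = [
    Candidate("qwen2.5-coder:32b", 20.0),
    Candidate("qwen3-coder:30b", 19.0),
    Candidate("codestral:22b", 13.0),
    Candidate("deepseek-coder-v2:16b", 9.0),
]
print(choose(PRIORITY, vram_gb=16, ram_gb=16, gpu_headroom_gb=4, ram_headroom_gb=8))
# prints deepseek-coder-v2:16b
```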

Cursor + Ollama

The cursor-ollama command ensures a local Ollama daemon is running, resolves (and pulls, if needed) a model, writes Cursor’s OpenAI-compatible Ollama settings, and optionally launches Cursor.

cursor-ollama [--no-launch]
cursor-ollama --ollama-host 'https://your-subdomain.ngrok-free.app'   # tunnel URL for Cursor

Environment:

  • OLLAMA_LOCAL_HOST — local API for ollama serve, pulls, and health checks (default http://localhost:11434).
  • OLLAMA_HOST — base URL stored in Cursor settings; defaults to OLLAMA_LOCAL_HOST. Set this to your ngrok (or other) public URL when Cursor must use the tunneled endpoint.
  • CURSOR_OLLAMA_MODEL — default model (default qwen2.5-coder:7b).
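Host resolution and the settings merge can be sketched as below. The environment variable names and defaults come from the list above; the settings keys are hypothetical placeholders (this README does not name the keys cursor-ollama writes), the sketch assumes settings.json is plain JSON without comments, and it points Cursor at Ollama's OpenAI-compatible endpoint under /v1.

```python
import json
import os
from pathlib import Path

# Hypothetical key names; the real keys cursor-ollama writes are not documented here.
BASE_URL_KEY = "example.openaiBaseUrl"
MODEL_KEY = "example.ollamaModel"


def resolve_hosts() -> tuple[str, str]:
    """Return (local API URL, URL to store in Cursor settings)."""
    local = os.environ.get("OLLAMA_LOCAL_HOST", "http://localhost:11434").rstrip("/")
    # OLLAMA_HOST defaults to the local URL; set it to an ngrok URL for tunnels.
    public = os.environ.get("OLLAMA_HOST", local).rstrip("/")
    return local, public


def build_updates() -> dict:
    """Settings to merge: the OpenAI-compatible /v1 base URL plus a model name."""
    _, public = resolve_hosts()
    model = os.environ.get("CURSOR_OLLAMA_MODEL", "qwen2.5-coder:7b")
    return {BASE_URL_KEY: f"{public}/v1", MODEL_KEY: model}


def merge_settings(path: Path, updates: dict) -> dict:
    """Merge updates into a JSON file, keeping every key already present."""
    settings = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
    settings.update(updates)
    path.write_text(json.dumps(settings, indent=2) + "\n", encoding="utf-8")
    return settings
```

A call such as merge_settings(Path("settings.json"), build_updates()) keeps unrelated keys and overwrites only the two placeholder keys.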

Tell others (copy-paste)

Posting is up to you — here is neutral text you can use on Mastodon, Bluesky, X, Hacker News, or Reddit (follow each site’s self-promotion rules):

Short

Free Python tool: benchmark your PC for local LLMs (Ollama tokens/sec + rough “tokens/year” estimate) + optional Cursor↔Ollama setup. Apache-2.0.
https://github.com/evilmucedin/llm-local-performance-test

Show HN–style title + body

Show HN: llm-local-performance-test — benchmark your machine for local LLM throughput
CLI: microbenchmarks, optional Ollama run with fixed generation settings for fair comparison, naive yearly token estimate. Includes cursor-ollama to point Cursor at Ollama (e.g. via ngrok). Python 3.10+, depends only on psutil.

GitHub “About” topics (set under repo ⚙️)
Ideas: ollama, llm, benchmark, local-llm, machine-learning, cursor-editor, python, nvidia, inference


Origins: derived from experimental llmTest5-style scripts; this repo is the packaged, maintained form.

License

See LICENSE.

About

App for testing the performance of a laptop or PC for local LLM generation.
