Add CarMax mirror (port 40015)#24
Open
Violet24K wants to merge 6 commits into
…com.
- 13 SQLAlchemy models (User / Store / Vehicle / SavedVehicle / Comparison + ComparisonItem / Reservation / TestDrive / Appraisal / FinancePreQual / Order / Review / Article)
- 59 routes covering search / browse / detail / research / compare / saved / sell-my-car / pre-qual / reserve / test-drive / checkout / account / articles / FAQ / MaxCare / stores / auth
- Token-overlap scored search with multi-field weighting
- 141 deterministically-seeded vehicles across 31 templates
- 12 real CarMax store locations
- 5 benchmark users with pre-populated saved / reservation / test-drive / appraisal / order data
- 20 WebVoyager tasks in tasks.jsonl (6 Easy / 9 Medium / 5 Hard, including 2 disambiguation tasks)
- Idempotent seed at function level; byte-identical reset verified
Adds a Flask mirror of carmax.com as the 16th
WebHarbor site, with full inventory search, vehicle research, comparison,
sell-my-car appraisal, financing pre-qualification, reserve, test drive,
and checkout flows.
Companion HuggingFace PR: https://huggingface.co/datasets/ChilleD/WebHarbor/discussions/15
What's in this PR
Site code (sites/carmax/)
- app.py
- seed_data.py
- templates/*.html
- static/css/main.css: … (#1660a8) + yellow (#FFD900) brand styling
- scrape_carmax.py
- scrape_articles.py
- tasks.jsonl

Registration (3 files modified)
- websyn_start.sh: added carmax to SITES, switched the three hardcoded 15s to ${#SITES[@]} so future additions don't need triple edits.
- control_server.py: added 'carmax' to the SITES list.
- Dockerfile: EXPOSE 8101 40000-40015 (was 40000-40014).

Quality-of-life additions
- .gitattributes: forces LF line endings on *.sh and Dockerfile so a Windows checkout doesn't break the container entrypoint (hit this exact issue during initial Docker testing: exec /opt/websyn_start.sh: no such file or directory).
- scripts/verify_carmax.sh: single-command end-to-end verifier (build → run → reset → md5sum) for the new site.
Mirror functional coverage
59 routes across these areas:
- /cars, /cars/<make>, /cars/<make>/<model>, /cars/<make>/<model>/<year>, /cars/<make>/<model>/<trim>, /cars/<make>/<model>/<trim>/<year>, with filter params for body style, drive type, fuel type, mileage cap, price range, color, store, etc.

Search uses scored token-overlap with field-weighted scoring (make/model = 5, trim/body/color = 3, features/specs = 1). It is explicitly NOT strict-AND, so queries like "honda civic sport" return results even when one token misses on a given vehicle.
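The scoring described above can be illustrated with a minimal sketch. The weights come from this description; the dictionary field names, the tokenizer, and the helper names `tokenize`, `score`, and `search` are assumptions for illustration and may not match what app.py actually does.

```python
import re

# Weights as stated in the PR description; the field names are assumed.
FIELD_WEIGHTS = {
    "make": 5, "model": 5,
    "trim": 3, "body": 3, "color": 3,
    "features": 1, "specs": 1,
}


def tokenize(text):
    """Lowercase the text and split it into a set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def score(query, vehicle):
    """Sum weight * (number of query tokens found) over every weighted field."""
    query_tokens = tokenize(query)
    total = 0
    for field, weight in FIELD_WEIGHTS.items():
        overlap = query_tokens & tokenize(str(vehicle.get(field, "")))
        total += weight * len(overlap)
    return total


def search(query, vehicles):
    """Return vehicles with a positive score, best match first (not strict-AND)."""
    scored = [(score(query, v), v) for v in vehicles]
    hits = [pair for pair in scored if pair[0] > 0]
    hits.sort(key=lambda pair: pair[0], reverse=True)
    return [v for _, v in hits]
```

Under these assumptions, "honda civic sport" scores 13 against a Honda Civic Sport and 10 against a Honda Civic LX, so the LX still appears in the results, ranked lower.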
Benchmark tasks
sites/carmax/tasks.jsonl ships 20 tasks following the WebVoyager schema (web_name, id, ques, web, upstream_url).

I hand-traced each task against the seed DB: every task has a verifiable answer, and for any task that asks for spec-level info the answer is not visible at the search-result level.
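A reviewer who wants to sanity-check the file's shape could use something like the sketch below. The five field names come from the schema listed above; the helper name `validate_tasks` and the requirement that ids be unique are assumptions, not something this PR ships.

```python
import json

# Field names taken from the schema listed in this PR description.
REQUIRED_FIELDS = {"web_name", "id", "ques", "web", "upstream_url"}


def validate_tasks(lines):
    """Return (line_number, problem) tuples; an empty list means every line passed."""
    problems = []
    seen_ids = set()
    for number, raw in enumerate(lines, start=1):
        raw = raw.strip()
        if not raw:
            continue  # tolerate blank lines
        try:
            task = json.loads(raw)
        except json.JSONDecodeError as exc:
            problems.append((number, f"invalid JSON: {exc.msg}"))
            continue
        if not isinstance(task, dict):
            problems.append((number, "not a JSON object"))
            continue
        missing = REQUIRED_FIELDS - task.keys()
        if missing:
            problems.append((number, "missing fields: " + ", ".join(sorted(missing))))
        if "id" in task:
            task_id = str(task["id"])
            # Assumption: task ids are expected to be unique within the file.
            if task_id in seen_ids:
                problems.append((number, f"duplicate id: {task_id}"))
            seen_ids.add(task_id)
    return problems
```

It could be run as `validate_tasks(open("sites/carmax/tasks.jsonl", encoding="utf-8"))`, which iterates the file line by line.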
Verification
md5sum sites/carmax/instance/carmax.db sites/carmax/instance_seed/carmax.db
c6e3b281258bd8a460f7030a54b74c21 instance/carmax.db
c6e3b281258bd8a460f7030a54b74c21 instance_seed/carmax.db
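For anyone without md5sum at hand (for example on a Windows checkout), the same comparison can be sketched in Python. The helper names `md5_of` and `same_bytes` are hypothetical and not part of this PR.

```python
import hashlib
from pathlib import Path


def md5_of(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with Path(path).open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_bytes(live_db, seed_db):
    """True when both files hash to the same MD5 digest."""
    return md5_of(live_db) == md5_of(seed_db)
```

Calling `same_bytes("sites/carmax/instance/carmax.db", "sites/carmax/instance_seed/carmax.db")` after a reset should return True if the reset is byte-identical as reported above.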
Idempotency
Both seed_database() (line 675) and seed_benchmark_users() (line 722) gate the whole function on populated-DB checks, not per row. Every seeded created_at / saved_at / added_at uses a frozen SEED_NOW = datetime(2026, 1, 15, 12, 0, 0) (18 references). Zero calls to datetime.utcnow() anywhere in seed_data.py.

Asset side (HuggingFace dataset)
carmax.tar.gz (~280 MB) was uploaded to ChilleD/WebHarbor in https://huggingface.co/datasets/ChilleD/WebHarbor/discussions/15. .assets-revision is bumped to that PR's merge SHA in this PR.

Contents of the tarball (extracts in place into sites/carmax/):
- instance_seed/carmax.db: the frozen seed DB
- static/images/vehicles/: 738 real CarMax stock photos covering 115/138 unique (year, make, model) tuples (~86% coverage)
- static/images/articles/: 10 article hero images

The 18 missing (year, make, model) tuples (Ford F-150 all years, BMW 3 Series all years, Mercedes-Benz C-Class all years, 2023 Toyota Corolla / Kia Sorento / Subaru Outback, 2021-22 Hyundai Elantra) have no evox stock photos on the carmax CDN; those vehicles fall back to a CarMax-branded SVG placeholder. This matches the live site's behavior for those exact combinations.
Test users (benchmark)
Five users with password CarMax!2026, each pre-populated for auth-gated tasks:
- alice.j@test.com
- bob.k@test.com
- carol.l@test.com
- dan.m@test.com
- emma.n@test.com

(Skill suggests bob.c / carol.d / david.k with TestPass123!, but since tasks.jsonl references these specific emails throughout, I kept the slightly different set. Functionally equivalent.)
Pre-PR checks
- python3 -m py_compile sites/carmax/app.py: clean
- python3 -m py_compile sites/carmax/seed_data.py: clean
- bash scripts/build.sh webharbor:dev: succeeds (image ~6.2 GB)
- /reset/carmax: byte-identical (md5 above)
- tasks.jsonl: every task has a verifiable answer in the seed

Anything that might want reviewer attention
- Test user emails differ from the skill's suggested bob.c@test.com / carol.d@test.com set; kept for tasks.jsonl internal consistency.
- The 18 missing (year, make, model) tuples fall back to the SVG placeholder because the carmax CDN has no evox photos for those (make, model, year) combinations. Could be remediated by sourcing from a different CDN if the maintainer requires 100% coverage.
- SEED_NOW = datetime(2026, 1, 15, 12, 0, 0) matches the project's existing 2026 date pinning convention; please flag if a different reference date is preferred.
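As context for the SEED_NOW item, here is a minimal sketch of the pattern: a function-level gate plus a frozen timestamp. It uses the standard-library sqlite3 module instead of the SQLAlchemy models in seed_data.py so that it runs standalone; the table layout and the two sample rows are made up for illustration, and only the SEED_NOW value is taken from this PR.

```python
import sqlite3
from datetime import datetime

# Frozen reference time, value taken from this PR description.
SEED_NOW = datetime(2026, 1, 15, 12, 0, 0)


def seed_database(conn):
    """Seed once; return True if rows were inserted, False if the gate skipped it."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS vehicle ("
        "id INTEGER PRIMARY KEY, make TEXT, model TEXT, created_at TEXT)"
    )
    # Function-level gate: if the table already has any rows, do nothing at all.
    (count,) = conn.execute("SELECT COUNT(*) FROM vehicle").fetchone()
    if count:
        return False
    rows = [("Honda", "Civic"), ("Toyota", "Camry")]  # illustrative data only
    conn.executemany(
        "INSERT INTO vehicle (make, model, created_at) VALUES (?, ?, ?)",
        [(make, model, SEED_NOW.isoformat()) for make, model in rows],
    )
    conn.commit()
    return True
```

Because every timestamp derives from SEED_NOW rather than the wall clock, two fresh seeds insert the same values, and a second call on a populated DB is a no-op. Byte-identity of the resulting SQLite file depends on more than this sketch shows.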
Happy to address any review feedback.