Turning 18 months of San Francisco 311 case photos into a clustered, browsable map of what the city is actually complaining about — and a causal estimate of where those complaints sit longest.
The pipeline pulls public 311 cases from the Socrata API, downloads each attached photo, captions it with a vision model, embeds the captions, and runs Leiden community detection over the kNN graph of embeddings. The resulting clusters get auto-named by a second model pass and projected into 2D / 3D with UMAP for an interactive browser atlas. Then we run an embedding-matched causal study on top of the same vectors to estimate which neighborhoods take longer to close cases that look the same to a model.
- 355 cases · 982 photos · 8 discovered visual clusters, automatically named.
- Mission cases take 2.21 hours longer to close than visually-matched controls in the rest of SF (95% CI [0.53, 3.95], p = 0.002).
- South of Market cases close 1.79 hours faster than their matched controls (p < 0.001).
- Mean cosine similarity between treated cases and their k=3 nearest matched controls: 0.81 — controls really are visually similar.
Six stages, each checkpointed to parquet so reruns are cheap:
| stage | tech | output |
|---|---|---|
| pull | Socrata API + async httpx | cases.parquet (18 mo of 311 cases) |
| download | async httpx, on-disk cache | photos_index.parquet + photos/*.jpg |
| caption | gpt-4o-mini (low-detail vision) | captions.parquet (scene/issues/severity) |
| embed | text-embedding-3-small (1536d) | embeddings.parquet |
| cluster | Leiden + sub-Leiden + UMAP | clusters.parquet, cluster_names.json |
| causal | kNN matching on embeddings | viz/causal.json (ATT per neighborhood) |
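As a rough illustration of the pull stage, the sketch below builds paged query parameters for a Socrata endpoint. `$where`, `$order`, `$limit` and `$offset` are standard SoQL parameters; the column names `requested_datetime` and `media_url`, the page size, and the function name are assumptions for illustration, not taken from this repo. The dataset's endpoint URL is deliberately left out.

```python
from datetime import datetime, timedelta, timezone

EIGHTEEN_MONTHS = timedelta(days=548)  # roughly 18 months


def socrata_pages(since: datetime, page_size: int = 1000, max_rows: int = 5000):
    """Yield one SoQL query-parameter dict per page of results."""
    # Column names below are assumed; check them against the dataset schema.
    where = (
        f"requested_datetime > '{since:%Y-%m-%dT%H:%M:%S}' "
        "AND media_url IS NOT NULL"
    )
    for offset in range(0, max_rows, page_size):
        yield {
            "$where": where,
            "$order": "requested_datetime",  # stable ordering so pages don't overlap
            "$limit": page_size,
            "$offset": offset,
        }


if __name__ == "__main__":
    start = datetime.now(timezone.utc) - EIGHTEEN_MONTHS
    for params in socrata_pages(start, max_rows=2000):
        print(params["$offset"], params["$where"])
```

Each dict could be passed as the `params` argument of an httpx GET request; the loop would normally stop early once a page comes back with fewer than `page_size` rows.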
Run end-to-end:
uv sync
cp .env.example .env # add OPENAI_API_KEY
uv run python pipeline.py --stages all
Or run any subset:
uv run python pipeline.py --stages caption,embed,cluster
Each stage skips work that's already cached on disk, so a rerun on the same data takes seconds, not hours.
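The skip-if-cached behavior can be sketched as follows. This is a minimal file-level version, assuming one output file per stage; the real stages may cache at a finer granularity (per photo or per row). `run_stage` and its `force` flag are hypothetical names, not the repo's API.

```python
from pathlib import Path
from typing import Callable


def run_stage(
    name: str,
    output: Path,
    build: Callable[[Path], None],
    force: bool = False,
) -> bool:
    """Run `build` only if `output` is missing. Return True if work was done."""
    if output.exists() and not force:
        print(f"[{name}] cached at {output}, skipping")
        return False
    # Write to a temporary path first, then rename, so an interrupted run
    # does not leave a half-written checkpoint that later looks "cached".
    tmp = output.with_name(output.name + ".tmp")
    build(tmp)
    tmp.replace(output)
    print(f"[{name}] wrote {output}")
    return True
```

A stage would pass its own writer as `build`, for example a function that writes a parquet file to the path it receives.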
Cluster names are generated by a second LLM pass over the top-K captions in each Leiden community — no hand labels. The largest clusters (trash/disarray, parking, overgrowth) are exactly what a city ops team would expect; the smaller ones (mobility & maintenance) surface things that don't show up cleanly in 311's own service-name taxonomy.
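For readers unfamiliar with the graph that Leiden runs over, here is a minimal pure-Python sketch of building a kNN graph from embedding vectors using cosine similarity. It is a brute-force illustration under assumed names (`cosine`, `knn_edges`); the actual pipeline presumably uses a vectorized library, and the value of k is not stated in this README.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def knn_edges(vectors, k):
    """Return undirected edges (i, j, similarity), i < j, linking each
    vector to its k most similar neighbours."""
    edges = {}
    for i, v in enumerate(vectors):
        sims = [(cosine(v, w), j) for j, w in enumerate(vectors) if j != i]
        sims.sort(reverse=True)  # most similar first
        for sim, j in sims[:k]:
            # Store each pair once, regardless of which endpoint found it.
            edges[(min(i, j), max(i, j))] = sim
    return [(i, j, s) for (i, j), s in sorted(edges.items())]


if __name__ == "__main__":
    toy = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
    for i, j, s in knn_edges(toy, k=1):
        print(i, j, round(s, 3))
```

The weighted edge list could then be loaded into python-igraph and partitioned with leidenalg's `find_partition`; that step is omitted here because the README does not state the partition type or resolution used.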
Question. Holding the visual content of the complaint constant, does the neighborhood where the case sits change how long it takes to close?
Setup. For each treated case in neighborhood N, find the k=3 nearest controls by cosine distance on caption embeddings, drawn from cases in the rest of SF. The within-pair difference in response_hours averaged across the treated set is the ATT under unconfoundedness given the visual content captured by the embedding. 95% CIs come from a paired bootstrap; p-values from a permutation test that re-shuffles the treatment label.
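The setup above can be sketched in pure Python as follows. This is one reading of the design, not the code in `causal_match.py`: each treated case is compared with the mean of its k most similar controls, the ATT is the average of those differences, and the bootstrap resamples treated cases together with their matched sets. Function names and the toy data are hypothetical, and the permutation test is not shown.

```python
import math
import random
from statistics import mean


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def matched_diffs(treated, controls, k=3):
    """For each treated (vector, hours) pair, return its hours minus the
    mean hours of its k most similar controls."""
    diffs = []
    for vec, hours in treated:
        nearest = sorted(controls, key=lambda c: cosine(vec, c[0]), reverse=True)[:k]
        diffs.append(hours - mean(h for _, h in nearest))
    return diffs


def att_with_ci(diffs, n_boot=2000, seed=0):
    """ATT = mean matched difference; 95% percentile CI from resampling
    the treated units (with their matched sets) with replacement."""
    rng = random.Random(seed)
    boots = sorted(mean(rng.choices(diffs, k=len(diffs))) for _ in range(n_boot))
    lo = boots[int(0.025 * n_boot)]
    hi = boots[int(0.975 * n_boot) - 1]
    return mean(diffs), (lo, hi)


if __name__ == "__main__":
    # Toy data only: (embedding, response_hours)
    treated = [([1.0, 0.0], 10.0), ([0.0, 1.0], 6.0)]
    controls = [([1.0, 0.1], 8.0), ([0.9, 0.0], 7.0), ([0.0, 1.0], 5.0),
                ([0.1, 1.0], 4.0), ([1.0, 1.0], 100.0)]
    att, ci = att_with_ci(matched_diffs(treated, controls, k=2))
    print(att, ci)
```

Matching with replacement (a control can serve several treated cases) is assumed here; the README does not say which variant is used.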
Mission is the headline: 35 treated cases, +2.2 hours slower than visually-matched controls, robust across k ∈ {1, 3, 5}. South of Market is the inverse — visually-similar complaints close noticeably faster there. The other neighborhoods have tiny samples (n ≤ 7) and CIs that cross zero, so they're descriptive only.
Caveat that always applies to matched designs: this rules out confounding from anything captured in the photo (severity, scene type, foreground objects) but cannot rule out confounding from anything the photo can't see — time of day the report came in, the specific responder dispatched, weather, etc.
viz/index.html is a single-file local atlas: 3D UMAP scatter on the left (auto-rotating, click any point to see the photo), Mapbox map on the right. viz/causality.html is the matched-pairs explorer for the causal study.
cd viz && python3 -m http.server 8000
# then open http://localhost:8000/
No build step, no framework: just static HTML + the prebuilt data.json / causal.json.
Python 3.11, uv, httpx, polars, openai, leidenalg + python-igraph, umap-learn, scikit-learn, duckdb, pyarrow, scipy.
LLM choices were made for cost:
- Captioning: gpt-4o-mini at low detail ≈ $0.0001 / image. Fine for trash / encampment / overgrowth, misses small detail.
- Embedding: text-embedding-3-small (1536d) ≈ $0.02 / 1M tokens.
- Cluster naming: gpt-4o-mini again, one call per cluster.
Total OpenAI spend for the full 355-case run was under $0.50.
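As a back-of-envelope check on that total, the arithmetic below combines the per-unit prices listed above with the photo count from the run summary. The tokens-per-caption figure is a made-up placeholder, since caption lengths are not reported here, and the cluster-naming calls are left out for the same reason.

```python
PHOTOS = 982                      # from the run summary
CAPTION_USD_PER_IMAGE = 0.0001    # gpt-4o-mini, low detail (approximate)
EMBED_USD_PER_MTOK = 0.02         # text-embedding-3-small (approximate)
TOKENS_PER_CAPTION = 150          # hypothetical; not reported in this README

caption_cost = PHOTOS * CAPTION_USD_PER_IMAGE
embed_cost = PHOTOS * TOKENS_PER_CAPTION / 1_000_000 * EMBED_USD_PER_MTOK

print(f"captioning ~ ${caption_cost:.3f}")
print(f"embedding  ~ ${embed_cost:.4f}")
print(f"subtotal   ~ ${caption_cost + embed_cost:.3f}")
```

Under these assumptions captioning dominates and the subtotal stays well below the stated $0.50 ceiling, leaving room for the naming calls and retries.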
pipeline.py                  # CLI entry; argparse --stages
stages/
  pull.py                    # Socrata API → cases.parquet
  download.py                # async photo download
  caption.py                 # vision captions → structured tags
  embed.py                   # text-embedding-3-small
  cluster.py                 # Leiden + sub-Leiden + UMAP + naming
  causal_match.py            # kNN embedding matching → ATT
common/
  io.py                      # parquet helpers, paths, schema contracts
  http.py                    # shared httpx.AsyncClient + retry
  env.py                     # .env loading
  progress.py                # throughput logging
viz/
  index.html                 # 3D UMAP + map atlas
  causality.html             # matched-pairs explorer
  build_data.py              # parquet → viz/data.json
  build_readme_images.py     # regenerate the figures in this README
tests/                       # one offline smoke test per stage
docs/
  superpowers/specs/         # design doc
  superpowers/plans/         # implementation plan
  img/                       # README figures
uv run python viz/build_readme_images.py
Regenerates docs/img/{umap,clusters,causal,pipeline}.png from viz/data.json and viz/causal.json.



