fix(install): visible wait on held apt lock instead of a silent spinner (#740) #213
LukasWodka wants to merge 1 commit
…er (#740)
On a fresh cloud VM, unattended-upgrades/apt-daily hold the dpkg
frontend lock for the first few minutes after boot. The system-deps
step runs apt-get update/install under spin_cmd, which redirects output
and animates a spinner, hiding the fact that apt is simply blocked on
the lock. The install looks frozen for minutes and users abort.
Add wait_apt_lock(): before the apt spinner in install_system_deps, poll
the dpkg/apt locks via fuser and surface a clear, non-spinner message
("Waiting for the system package lock - unattended-upgrades can hold it
for a few minutes on a fresh VM...") plus a ticking heartbeat so it is
obviously alive. Bounded by TRACEBLOC_APT_LOCK_TIMEOUT (default 300s);
on timeout it prints actionable guidance and proceeds (apt queues behind
the holder) rather than looping forever. Apt-only - a no-op on other PMs.
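The wait described above could look roughly like the following. This is a minimal sketch, not the PR's actual code: the function names and `TRACEBLOC_APT_LOCK_TIMEOUT` come from this message, while the `PM` variable, the 2-second poll interval, the heartbeat text, and the timeout wording are assumptions made for illustration.

```shell
# Sketch only: names follow the commit message; bodies are assumptions.

# Hypothetical stand-in for the real probe: exit 0 means "held",
# non-zero means "free". Here it always reports "free".
_apt_lock_held() { return 1; }

wait_apt_lock() {
  # Apt-only: a silent no-op on other package managers.
  # PM is an assumed variable holding the detected package manager.
  [ "${PM:-apt}" = "apt" ] || return 0

  local timeout="${TRACEBLOC_APT_LOCK_TIMEOUT:-300}"
  local interval=2   # assumed poll interval, in seconds
  local elapsed=0

  # Free lock: print nothing and return immediately.
  _apt_lock_held || return 0

  printf '%s\n' "Waiting for the system package lock - unattended-upgrades can hold it for a few minutes on a fresh VM..."
  while _apt_lock_held; do
    if [ "$elapsed" -ge "$timeout" ]; then
      # Bounded: warn and carry on instead of looping forever.
      printf '\n%s\n' "Package lock still held after ${elapsed}s; proceeding (apt queues behind the holder)."
      return 0
    fi
    # Same-line heartbeat with a ticking elapsed counter.
    printf '\r  still waiting... %ss' "$elapsed"
    sleep "$interval"
    elapsed=$((elapsed + interval))
  done
  printf '\n'
  return 0
}
```

Returning 0 on timeout is what lets the caller fall through to the normal apt spinner, so the worst case is the old behaviour preceded by an explanation.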
The lock probe is split into _apt_lock_held so it can be stubbed at the
function boundary in tests (the bats sandbox cannot take a real kernel
lock). Also documents the "no silent op > a few seconds" progress
contract for the known long-running install steps.
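A probe along these lines would satisfy the description. It is a hedged sketch: the three paths are the standard Debian/Ubuntu lock locations, and the `APT_LOCK_FILES` override is a hypothetical addition (not in the PR) so the probe can be pointed at scratch files.

```shell
# Sketch only: exit 0 means "held", non-zero means "free".
_apt_lock_held() {
  # Without fuser the state is unknowable, so report "free" and let
  # apt's own internal waiting take over.
  command -v fuser >/dev/null 2>&1 || return 1

  local f
  # APT_LOCK_FILES is a hypothetical override used only for illustration.
  for f in ${APT_LOCK_FILES:-/var/lib/dpkg/lock-frontend /var/lib/apt/lists/lock /var/lib/dpkg/lock}; do
    # fuser exits 0 when at least one process has the file open.
    if [ -e "$f" ] && fuser "$f" >/dev/null 2>&1; then
      return 0
    fi
  done
  return 1
}
```

One caveat with this approach: when run without root, `fuser` may not see processes owned by other users, so a held lock can be reported as free. That fails in the safe direction, since apt then waits on the lock itself.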
Tests: extend setup-linux.bats with a held-then-released lock (asserts
the wait message + proceed), a never-clearing lock (bounded timeout,
no infinite spin), a free-lock silent no-op, a non-apt no-op, the
install_system_deps ordering (wait before spinner), and the no-fuser
fallback.
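The held-then-released and never-clearing cases rely on stubbing the probe at the function boundary. The real tests are bats; the plain-shell sketch below only illustrates the technique, and every helper name in it (`stub_lock_held_for`, `poll_until_free`) is hypothetical.

```shell
# Sketch only: plain-shell illustration of the stubbing technique.

# Redefine _apt_lock_held to report "held" for the first N probes,
# then "free", simulating unattended-upgrades releasing the lock.
stub_lock_held_for() {
  held_probes="$1"
  probes=0
  _apt_lock_held() {
    probes=$((probes + 1))
    [ "$probes" -le "$held_probes" ]
  }
}

# Stub sleep so the loop runs instantly.
sleep() { :; }

# Minimal loop under test (a stand-in for wait_apt_lock).
poll_until_free() {
  local max="$1" tries=0
  while _apt_lock_held; do
    tries=$((tries + 1))
    if [ "$tries" -ge "$max" ]; then
      echo "timeout after $tries probes"
      return 0
    fi
    sleep 1
  done
  echo "free after $tries waits"
}

stub_lock_held_for 3
poll_until_free 10      # prints: free after 3 waits

stub_lock_held_for 1000
poll_until_free 5       # prints: timeout after 5 probes
```

Because the stub replaces the whole probe, no real kernel lock is needed, which is the constraint the bats sandbox imposes.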
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
👋 Heads-up: Code review queue is at 17 / 8, above the WIP limit. The team convention is to review existing PRs before opening new work. Open PRs currently in Code review (oldest first):
Pull from review before opening new work. (This is a nudge from the kanban WIP check, not a block.)
Closing as superseded. The simpler What this PR adds that the shipped version doesn't:
If we want any of those refinements later, refile each as a small targeted PR against current
Closed during conflict cleanup 2026-06-08.
Closes #740
Summary
On a fresh cloud VM, `unattended-upgrades`/`apt-daily` hold the dpkg frontend lock for the first few minutes after boot. The system-deps step runs `apt-get update`/`install` under `spin_cmd`, which redirects output and animates a spinner, hiding that apt is just blocked on the lock. The install looks frozen for minutes and users abort. This adds a visible wait-for-lock before the apt spinner, with a heartbeat and a bounded timeout, so no long apt step is silent. (Package-name skew was already handled in #720; this is the lock/visibility dimension only.)
Related
Closes #740 · Part of tracebloc/backend#736 (install-journey epic) · Builds on #720 (conntrack package-name skew).
Type of change
What changed
- `scripts/lib/setup-linux.sh`:
  - `wait_apt_lock()`: called in `install_system_deps` before the `$PM_UPDATE` / install spinner. Polls the dpkg/apt locks; while held, prints a clear non-spinner message ("Waiting for the system package lock - unattended-upgrades can hold it for a few minutes on a fresh VM…") plus a same-line heartbeat with a ticking elapsed counter (proof of life). Bounded by `TRACEBLOC_APT_LOCK_TIMEOUT` (default 300s); on timeout it warns with the likely holder plus actionable `lsof` / `systemctl status` guidance and proceeds (apt queues behind the holder) rather than looping forever. Apt-only: a silent no-op on dnf/yum/zypper/pacman (out of scope here).
  - `_apt_lock_held()`: the single lock probe (`fuser` on `lock-frontend` / `lists/lock` / `dpkg/lock`), split out so tests can stub it at the function boundary. If `fuser` is absent it reports "free" so we never block on an unknowable state (apt's own internal waiting then takes over).
  - `_apt_lock_holder_hint()`: best-effort name of the holding service for the timeout message.
  - `spin_cmd`: the wait runs before the spinner, so there is no double-spin.
- `scripts/tests/setup-linux.bats`: 7 new tests (see below).
Test plan
Verified locally (macOS, bats 1.13.0)
- `bash -n` on every shell script in `scripts/`: all parse.
- `shellcheck --severity=error` (mirrors the CI gate) on the libs + entrypoints: clean. Zero warnings on `setup-linux.sh` even at `--severity=warning`.
- `bats scripts/tests/setup-linux.bats`: all 7 new tests pass:
  - `wait_apt_lock: held lock emits a visible wait, then proceeds when it clears`
  - `wait_apt_lock: never-clearing lock times out cleanly (no infinite spin)`
  - `wait_apt_lock: free lock is a silent no-op`
  - `wait_apt_lock: non-apt distro skips the apt lock wait entirely`
  - `install_system_deps: waits on the apt lock before the install spinner`
  - `_apt_lock_held: no fuser → reports free (does not block)`
Tests mock the lock at the function boundary (`_apt_lock_held` returns "held" for the first N probes, then "free", simulating `unattended-upgrades` releasing it) and stub `sleep` so they run instantly, because the bats sandbox can't take a real kernel lock. Limitation: these assert the loop/messaging/timeout logic, not a real held `/var/lib/dpkg/lock-frontend`. The real-lock path is exercised by the `distro-prereqs` job's actual apt run in CI.
Pre-existing local failures (NOT from this PR)
Two existing tests, `install_docker_engine: Amazon Linux -> dnf docker` and `… RHEL clone (#719) -> docker-ce dnf repo`, fail on a clean `develop` checkout on macOS because their branch is guarded by `[[ -f /etc/os-release ]]`, which is false on macOS, so the mocked `grep` never runs. They pass in CI (Linux, where `/etc/os-release` exists) and are untouched by this change.
Needs CI
- `installer-tests.yaml` → `unit-bash` (bats) on Linux: full suite incl. the two macOS-only failures above.
- `installer-tests.yaml` → `distro-prereqs`: real apt path on Ubuntu/Debian images exercises `wait_apt_lock` against a live (usually free) lock.
Deployment notes
New optional env var `TRACEBLOC_APT_LOCK_TIMEOUT` (seconds, default `300`) tunes how long to wait before proceeding. No other config or rollout changes.
Checklist