A dashboard hang from missing per-task timeouts
My homelab home page started returning 504 for its /api/dashboard route. Nothing else was obviously broken. The page just hung.
What was happening
The dashboard aggregator fans out to a dozen subsystems (mail queue, bandwidth, torrent stats, etc.) in parallel and renders whatever comes back. Bandwidth was one of those subsystems. When I looked at the MariaDB process list, six queries against the bandwidth schema were sitting in Sending data state. They had been there for about eight hours, and they had eaten the entire five-connection pool.
That alone wouldn't have killed the dashboard. The killer was that the aggregator did this:
const results = await Promise.allSettled(tasks);
No per-task timeout. allSettled waits for every promise to either resolve or reject. The bandwidth task was never going to do either, so the whole dashboard waited with it.
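To see the failure mode in isolation, here is a minimal sketch. `neverSettles` stands in for the stuck bandwidth task, and `settledWithin` is a hypothetical helper written only for this demonstration; neither comes from the real aggregator.

```javascript
// Hypothetical stand-in for the stuck bandwidth task: a promise that never settles.
const neverSettles = () => new Promise(() => {});

// Hypothetical helper: resolves to true if `p` settles within `ms`, false otherwise.
const settledWithin = (p, ms) => {
  let timer;
  const timedOut = new Promise(resolve => {
    timer = setTimeout(() => resolve(false), ms);
  });
  const settled = p.then(() => true, () => true);
  return Promise.race([settled, timedOut]).finally(() => clearTimeout(timer));
};

async function demo() {
  const healthy = [
    Promise.resolve('mail ok'),
    Promise.reject(new Error('torrent down')),
  ];
  const withStuck = [...healthy, neverSettles()];

  // allSettled copes with the rejection, but not with the promise that never settles.
  return {
    healthy: await settledWithin(Promise.allSettled(healthy), 50),
    withStuck: await settledWithin(Promise.allSettled(withStuck), 50),
  };
}
```

Calling `demo()` resolves to `{ healthy: true, withStuck: false }`: a rejected task is fine, a task that never answers blocks the whole batch.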
What I found
The stuck queries refused to die. KILL QUERY returned success but did nothing. systemctl restart mariadb hung at the stop step for several minutes before timing out. Only systemctl kill -s KILL mariadb actually killed the process and let it come back. Connection pool drained, dashboard responded again.
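The underlying query plan was fine. The queries were stuck on a network read against a downstream replica that had partially failed. The right database-side fix is a per-statement execution limit: MAX_EXECUTION_TIME in MySQL, or max_statement_time in MariaDB, which is what this box runs. A sensible wait_timeout is worth having too, but it only closes idle connections, so it would not have caught queries that were still executing. Either way, the dashboard had no business waiting on the database forever in the first place.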
The fix
Wrap every aggregator task in its own deadline:
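// Reject with a labelled error if p has not settled within ms milliseconds.
// This only stops the caller from waiting; it does not cancel the work behind p.
const withDeadline = (p, ms, label) => {
  let timer;
  const deadline = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error(`${label} timeout`)), ms);
  });
  // Clear the timer once the race settles so it does not linger after a fast task.
  return Promise.race([p, deadline]).finally(() => clearTimeout(timer));
};

const results = await Promise.allSettled(
  tasks.map(t => withDeadline(t.run(), 4000, t.name))
);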
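Four seconds is generous for a dashboard tile. Anything slower than that should render as "unavailable" and let the page load. The deadline only stops the dashboard from waiting, though: a stuck query still holds its pool connection until the database gives up on it, so the database-side limit is still needed.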
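Turning the settled results into tiles could look roughly like the sketch below. The tile shape and the `toTiles` name are assumptions made for illustration, not the dashboard's real rendering code. It relies on `Promise.allSettled` returning results in the same order as its input, so `results[i]` belongs to `tasks[i]`.

```javascript
// Hypothetical tile shape: { name, status: 'ok' | 'unavailable', data?, reason? }.
const toTiles = (tasks, results) =>
  results.map((result, i) =>
    result.status === 'fulfilled'
      ? { name: tasks[i].name, status: 'ok', data: result.value }
      : {
          name: tasks[i].name,
          status: 'unavailable',
          // A timeout and a real failure both end up here; keep the message for logging.
          reason: String(result.reason?.message ?? result.reason),
        }
  );

// Example with made-up results: one tile answered, one hit its deadline.
const exampleTiles = toTiles(
  [{ name: 'mail' }, { name: 'bandwidth' }],
  [
    { status: 'fulfilled', value: { queued: 3 } },
    { status: 'rejected', reason: new Error('bandwidth timeout') },
  ]
);
```

Here `exampleTiles[1]` comes out as an "unavailable" tile with the reason `bandwidth timeout`, while the mail tile renders normally.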
What I'd do differently
Promise.allSettled is a trap when any of the tasks are I/O-bound against external services. It looks defensive — "I'll just collect whatever finishes" — but it has no opinion on what counts as finishing. If you're using it, you almost always want a deadline wrapper around each task. I'll add that as a lint-rule-in-my-head for any future aggregator code.
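One way to make that mental lint rule harder to forget is to hide `Promise.allSettled` behind a helper that cannot be called without a deadline. The sketch below is one possible shape, not the code running on the dashboard. `settleAllWithin` is a hypothetical name, and passing an `AbortSignal` to `run` is an assumption: the tasks in this post take no arguments, and a task only benefits if it actually listens to the signal.

```javascript
// Hypothetical helper: run every task with its own deadline, then collect the outcomes.
// Each task is { name, run(signal) }; the signal argument is an assumption of this sketch.
const settleAllWithin = (tasks, ms) =>
  Promise.allSettled(
    tasks.map(task => {
      const controller = new AbortController();
      let timer;
      const deadline = new Promise((_, reject) => {
        timer = setTimeout(() => {
          controller.abort(); // tell the task to stop, if it listens
          reject(new Error(`${task.name} timeout`));
        }, ms);
      });
      // Promise.resolve().then(...) turns a synchronous throw in run() into a rejection.
      const work = Promise.resolve().then(() => task.run(controller.signal));
      return Promise.race([work, deadline]).finally(() => clearTimeout(timer));
    })
  );
```

The aggregator call then becomes `await settleAllWithin(tasks, 4000)`, and there is no code path that reaches `allSettled` without a timeout attached.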