A risk email every fifteen minutes about nothing
The risk-monitor cron on a side project I run was emailing me every 15 minutes with "risk factor appeared" or "risk factor resolved" for low-severity factors that nobody needed to know about. The fix was a one-block severity gate that I should have written the first day.
What was happening
The cron evaluates several risk factors per user, assigns a severity to each (LOW, MODERATE, HIGH, CRITICAL), and fires an email when the set of active factors changes. The "changed" comparison fired on any churn, including a low-severity stationary_long factor that appeared, resolved, re-appeared, and re-resolved on its own every few cron ticks because the underlying check was a little flaky around its threshold.
So my inbox got an email every 15 minutes about a factor that was, by its own severity tag, low enough not to warrant an email.
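To make the churn concrete, here is a minimal sketch of set-change detection. The helper name activeSetChanged and the array-of-names representation are assumptions for illustration; the post does not show the cron's actual comparison code.

```php
<?php
// Hypothetical helper: factors are represented as arrays of factor-name strings.
function activeSetChanged(array $previous, array $current): bool
{
    // A change is any factor present in one tick but not in the other.
    return array_diff($previous, $current) !== []
        || array_diff($current, $previous) !== [];
}

// Simulate four cron ticks where stationary_long flickers around its threshold.
$ticks = [
    ['stationary_long'],
    [],
    ['stationary_long'],
    [],
];

$previous = [];
$changes = 0;
foreach ($ticks as $current) {
    if (activeSetChanged($previous, $current)) {
        $changes++; // with no severity gate, each of these would send an email
    }
    $previous = $current;
}

echo $changes, "\n"; // prints 4: every tick counts as a change
```

Under these assumptions, a single flickering LOW factor registers as a change on every tick, which is exactly the pattern that filled the inbox.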
What I found
The email-send branch wasn't checking severity at all. It only checked "did the set change?", which is a fine condition for in-app feed updates but a terrible condition for paging email.
The fix
One gate, four conditions:
    // Assumes the RISK_* constants are integers ordered LOW < MODERATE < HIGH < CRITICAL.
    $shouldEmail =
        $currentMaxSeverity >= RISK_MODERATE
        || $changeType === 'severity_increased'
        || ($changeType === 'risks_resolved'
            && $previousSeverity >= RISK_MODERATE);

    if (!$shouldEmail) {
        error_log("risk_monitor: change at LOW severity, suppressing email");
        return;
    }

    sendRiskEmail(...);
In plain English:
- Don't email about factors below MODERATE.
- Do email if severity went up, even from LOW to MODERATE.
- Do email when factors clear, but only if the previous state was MODERATE or worse.
Net effect: the LOW factor churn still shows up in the in-app feed (where it belongs) and stays out of email (where it never belonged).
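One way to check the three rules is to wrap the gate in a function and run a few cases through it. This is a sketch under assumptions: the integer values of the RISK_* constants, the RISK_NONE level, and the 'risks_appeared' change type are all hypothetical; only 'severity_increased' and 'risks_resolved' come from the gate itself.

```php
<?php
// Assumed integer encoding; the post names the levels but not their values.
const RISK_NONE = 0;      // hypothetical: no active factors
const RISK_LOW = 1;
const RISK_MODERATE = 2;
const RISK_HIGH = 3;
const RISK_CRITICAL = 4;

// Same three clauses as the gate above, wrapped in a function for testing.
function shouldEmail(int $currentMaxSeverity, int $previousSeverity, string $changeType): bool
{
    return $currentMaxSeverity >= RISK_MODERATE
        || $changeType === 'severity_increased'
        || ($changeType === 'risks_resolved'
            && $previousSeverity >= RISK_MODERATE);
}

// Each case is [current max severity, previous severity, change type].
// 'risks_appeared' is a hypothetical change type used only for this example.
$cases = [
    'LOW appears'     => [RISK_LOW, RISK_NONE, 'risks_appeared'],
    'LOW resolves'    => [RISK_NONE, RISK_LOW, 'risks_resolved'],
    'LOW to MODERATE' => [RISK_MODERATE, RISK_LOW, 'severity_increased'],
    'HIGH resolves'   => [RISK_NONE, RISK_HIGH, 'risks_resolved'],
];

$results = [];
foreach ($cases as $label => [$current, $previous, $changeType]) {
    $results[$label] = shouldEmail($current, $previous, $changeType);
    echo $label, ': ', $results[$label] ? 'email' : 'suppress', "\n";
}
```

The first two cases are the LOW churn and are suppressed; the last two are sent. One caveat: the second clause only keeps LOW churn quiet if a factor appearing at LOW is not itself classified as 'severity_increased'. How the change type gets assigned is not shown here, so that is worth verifying against the real code.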
What I'd do differently
The original mistake was conflating two channels. The in-app feed is high-volume, low-friction — show me everything. Email is low-volume, high-friction — only show me things I need to act on. Different channels need different filters.
Same lesson applies to push notifications versus SMS versus phone calls in this app: each channel has a severity floor. Anything that wakes someone up at 03:00 should be at the top of the severity ladder, not the bottom. I would not get that right by default — I have to write it down as a rule and enforce it in the dispatch code.
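Written down as a rule, the per-channel floor could look something like the sketch below. The channel names, the floor assigned to each channel, the integer severity values, and the channelsFor helper are all assumptions for illustration, not the app's actual dispatch code.

```php
<?php
// Assumed integer encoding for the severity ladder.
const RISK_LOW = 1;
const RISK_MODERATE = 2;
const RISK_HIGH = 3;
const RISK_CRITICAL = 4;

// Hypothetical floors: the more intrusive the channel, the higher the floor.
const CHANNEL_FLOORS = [
    'feed'  => RISK_LOW,
    'email' => RISK_MODERATE,
    'push'  => RISK_HIGH,
    'sms'   => RISK_CRITICAL,
];

// Return every channel whose floor is at or below the given severity.
function channelsFor(int $severity): array
{
    $channels = [];
    foreach (CHANNEL_FLOORS as $channel => $floor) {
        if ($severity >= $floor) {
            $channels[] = $channel;
        }
    }
    return $channels;
}

echo implode(',', channelsFor(RISK_LOW)), "\n";      // prints: feed
echo implode(',', channelsFor(RISK_MODERATE)), "\n"; // prints: feed,email
echo implode(',', channelsFor(RISK_CRITICAL)), "\n"; // prints: feed,email,push,sms
```

Keeping the floors in one table means a new channel cannot be added without someone choosing its floor explicitly. This is the simplest form of the rule; it does not cover the "resolved from MODERATE or worse" case, which would need the previous severity as well.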