The dict.get() default that wasn't

A message-ingest pipeline on my homelab went quiet for a week before I noticed. Not loudly broken — no alert fired for days, the process was "running," the log file even had recent writes. It just wasn't storing anything. When I finally dug in, the root cause was a single line that looks completely fine until you stare at it:

src = env.get("source") if env.get("source", "").startswith("+") else env.get("sourceNumber")

The intent is obvious: if source is a +-prefixed phone number, use it; otherwise fall back. The env.get("source", "") is supposed to make that null-safe — return an empty string when there's no source, and "".startswith("+") is a harmless False.

Except that's not what dict.get does.

The footgun

dict.get(key, default) returns the default only when the key is absent. If the key is present with a value of None, you get None back — the default is never consulted:

>>> {}.get("source", "")          # key absent
''
>>> print({"source": None}.get("source", ""))   # key present, value None
None

The upstream JSON had "source": null on certain envelope shapes. So env.get("source", "") returned None, and calling .startswith("+") on None raised AttributeError. That exception killed the parser, which broke the pipe feeding it, which made the producer emit a "connection closed, reconnecting" warning and retry. Every cycle: crash, reconnect, crash.
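Here is a minimal reproduction of that failure, starting from JSON text rather than a hand-built dict. The two envelope payloads and the phone number are made up for illustration, and original_pick is just a hypothetical wrapper around the line shown earlier:

```python
import json

# Two hypothetical envelope shapes: one omits "source", one sets it to null.
absent = json.loads('{"sourceNumber": "+15550100"}')
null_valued = json.loads('{"source": null, "sourceNumber": "+15550100"}')

# Key absent: the default is used.
assert absent.get("source", "") == ""

# Key present with JSON null: json.loads maps null to None, and the
# default is ignored because the key exists.
assert null_valued.get("source", "") is None


def original_pick(env):
    # The line from the pipeline, unchanged.
    return env.get("source") if env.get("source", "").startswith("+") else env.get("sourceNumber")


print(original_pick(absent))  # falls back to sourceNumber

try:
    original_pick(null_valued)
except AttributeError as exc:
    error_message = str(exc)
    print(error_message)  # 'NoneType' object has no attribute 'startswith'
```

The first envelope works, which is why the line survived testing against payloads that simply left the field out.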

The fix is boring, which is the point:

src = env.get("source") if (env.get("source") or "").startswith("+") else env.get("sourceNumber")

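(d.get(k) or "") coerces both "absent" and "present-but-null" to the empty string, along with any other falsy value such as 0 or an empty list. Any time a value can be JSON null, that's the pattern you want, not the default argument. It does not protect against a truthy non-string value, such as a number, which would still fail on .startswith.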
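If the envelope might also carry a non-string source, a slightly stricter variant is possible. That is an assumption on my part; the incident here only involved null. This sketch swaps the or "" idiom for an isinstance check, and the function name pick_source and the sample numbers are hypothetical:

```python
def pick_source(env):
    """Return env["source"] if it is a "+"-prefixed string, else env["sourceNumber"]."""
    source = env.get("source")
    # isinstance rejects None and also non-string values such as numbers,
    # which `or ""` alone would not catch.
    if isinstance(source, str) and source.startswith("+"):
        return source
    return env.get("sourceNumber")


print(pick_source({"source": "+15550100"}))                            # +15550100
print(pick_source({"source": None, "sourceNumber": "+15550111"}))      # +15550111
print(pick_source({"sourceNumber": "+15550111"}))                      # +15550111
print(pick_source({"source": 15550100, "sourceNumber": "+15550111"}))  # +15550111
```

It also reads the key once instead of twice, which makes the fallback order easier to see at a glance.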

Why it stayed hidden for a week

One bug got it broken; a second one kept it broken quietly. The job is a cron task guarded by flock -n so two copies never run at once. When the crashing process eventually wedged — stuck holding the lock instead of exiting — every subsequent cron invocation hit the non-blocking lock, got rejected, and exited silently with no output.
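The silent rejection is easy to reproduce. The job itself used the flock command; the sketch below uses Python's fcntl.flock with LOCK_NB as a stand-in for flock -n, so it illustrates the behavior rather than the actual cron setup. The lock path, the function name try_lock, and the message text are all hypothetical:

```python
import fcntl
import os
import sys
import tempfile


def try_lock(path):
    """Try to take an exclusive, non-blocking lock.

    Returns the open file (keep it open to hold the lock), or None if
    another open handle already holds the lock.
    """
    handle = open(path, "w")
    try:
        fcntl.flock(handle, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        handle.close()
        return None
    return handle


lock_path = os.path.join(tempfile.mkdtemp(), "ingest.lock")  # hypothetical path

first = try_lock(lock_path)   # stands in for the wedged process
second = try_lock(lock_path)  # stands in for the next cron invocation

if second is None:
    # Say so on stderr instead of exiting with no output at all.
    print("ingest: lock still held by a previous run, skipping", file=sys.stderr)
```

The point of the last branch is that a rejected run leaves a trace somewhere. A skipped run once in a while is normal; every run being skipped is an outage, and only a visible message or counter lets you tell the two apart.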

That had a nasty side effect on monitoring. My healthcheck asserted freshness by watching the log file's mtime. During the crash-loop phase the log kept getting fresh tracebacks appended every minute, so it looked healthy. Only once the process wedged and the log stopped growing did the monitor finally notice — days into an outage that had already lost a week of data the producer had acknowledged upstream and discarded.

What I changed my mind about

Monitor the thing you actually care about, not a proxy for it. The log was never the deliverable — the stored rows were. A healthcheck that asserted "the datastore got a write in the last N minutes" would have paged on day one. A log that's full of exceptions is more active than a healthy-but-idle one, so log-freshness is exactly backwards as a liveness signal for a pipeline that's supposed to be quietly succeeding.
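A minimal sketch of that kind of check, assuming the datastore is SQLite and each stored row carries a Unix timestamp. The table name messages, the column received_at, and the function name are hypothetical; the post does not depend on any particular store:

```python
import sqlite3
import time


def datastore_is_fresh(conn, max_age_seconds, now=None):
    """True if the newest row in `messages` is at most max_age_seconds old."""
    now = time.time() if now is None else now
    (newest,) = conn.execute("SELECT MAX(received_at) FROM messages").fetchone()
    # MAX() over an empty table yields NULL, which arrives as None.
    # Treat "no rows at all" as stale rather than comparing against None.
    if newest is None:
        return False
    return (now - newest) <= max_age_seconds


# Demo with an in-memory database and fixed clock values.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE messages (body TEXT, received_at REAL)")

print(datastore_is_fresh(conn, 600, now=1000.0))  # False: nothing stored yet
conn.execute("INSERT INTO messages VALUES ('hi', 900.0)")
print(datastore_is_fresh(conn, 600, now=1000.0))  # True: newest row is 100 s old
print(datastore_is_fresh(conn, 600, now=2000.0))  # False: newest row is 1100 s old
```

The threshold has to be longer than the longest quiet gap you expect from the source, otherwise a healthy-but-idle pipeline pages you too. Note the None check on the query result: it is the same null-versus-absent trap as the original bug, one layer down.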

Two cheap lessons, then:

dict.get(k, default) is not null-safe. Reach for (d.get(k) or "") whenever the value can be null.
Assert on the output artifact, not on a log file. Logs lie in both directions — silent when things succeed quietly, noisy when they fail.