Route files were losing the immutable bit
I set chattr +i on my backend's main.py after a sync bug kept
silently overwriting it. Months later, the iOS app started losing
connectivity on a recurring schedule. Routes were vanishing again.
The immutable bit had vanished too.
What was happening
Something (a container reboot, a maintenance script, an apt
upgrade; I never positively identified which) was clearing the
file attributes on main.py from time to time. Once the file
was mutable again, the original sync bug (long since masked, not
fixed) would happily revert recent changes, including the route
definitions the iOS app depended on.
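Checking the bit on a schedule would at least make its disappearance visible. A minimal sketch, assuming lsattr (from e2fsprogs) is installed and the filesystem supports file attributes; the function names and the /srv/backend/main.py path in the note below are hypothetical:

```python
import subprocess


def flags_include_immutable(lsattr_line: str) -> bool:
    """Return True if an lsattr output line shows the immutable flag.

    lsattr prints a flags field, whitespace, then the path. The
    immutable attribute appears as a lowercase 'i' in the flags field.
    """
    fields = lsattr_line.split(maxsplit=1)
    # Only inspect the flags field, never the path (which may contain 'i').
    return bool(fields) and "i" in fields[0]


def is_immutable(path: str) -> bool:
    """Run lsattr on a path and report whether the immutable bit is set."""
    result = subprocess.run(
        ["lsattr", "-d", path], capture_output=True, text=True, check=True
    )
    return flags_include_immutable(result.stdout)
```

Calling is_immutable("/srv/backend/main.py") from a cron job and alerting on False would flag the loss before the sync bug gets a chance to act on it.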
The user-visible failure was always the same: iOS app shows
"connecting…" indefinitely. curl https://x.example.com/<route>
returns 404. The route was right there in the source on disk —
just an older copy of the source.
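That symptom is easy to probe from a script. A rough sketch using only the standard library; the function names are made up for illustration, and the URL passed in is whichever route the app depends on:

```python
import urllib.error
import urllib.request


def route_status(url: str, timeout: float = 5.0) -> int:
    """Return the HTTP status code for a GET request to url."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # urllib raises on 4xx/5xx; the status code is on the exception.
        return err.code


def route_is_missing(status: int) -> bool:
    """A 404 means the server answered but the route is not registered."""
    return status == 404
```

A 404 here, while the server is otherwise up, distinguishes "the route definition is gone" from "the backend is down," which would surface as a connection error or a 5xx instead.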
What I found
Two facts I'd been treating as independent:
- Route definitions live in main.py.
- main.py is the file that gets reverted.
Co-locating them was the bug. As long as the routes I cared about lived in the file most at risk, every loss of the immutable bit became a route-availability incident.
The fix
Pull route definitions out of main.py and into a sidecar process
on its own port. A small FastAPI app listening on :8089,
deployed and supervised independently, mounted into the main
server's path tree via nginx instead of by Python import.
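# proxy config: main app, with a route subtree carved out
# (both blocks sit inside the site's server { } block)
location / {
    proxy_pass http://127.0.0.1:8088;
}
# Longest-prefix match sends /api/routes/... to the sidecar; the trailing
# slash on proxy_pass strips the /api/routes/ prefix before forwarding.
location /api/routes/ {
    proxy_pass http://127.0.0.1:8089/;
}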
The sidecar's files don't get touched by whatever process is
clobbering main.py. Even if the main backend reverts, the route
surface keeps responding.
Re-applied the immutable bit on main.py in the same pass, of
course, because the original defense still has value — but
without route definitions inside it, the worst case is "the
assistant chat surface temporarily regresses to last week's
behavior," not "the iOS app can't reach the server at all."
What I'd do differently
The general lesson: don't put critical-availability state in a file you've already had to defend with a hack. The immutable bit was a workaround for one failure mode (sync overwrite). When something else started defeating the workaround, the smart move was to relocate the critical state, not to add a second workaround on top.
I'd also put an audit watch on the file the next time I suspect
attribute loss, instead of guessing at culprits. dmesg doesn't log
attribute changes, but the kernel's audit subsystem (auditd) can
record which process touched the file if you actually ask it.
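One way to ask is a file watch through the Linux audit subsystem. The sketch below only builds and runs the auditctl and ausearch commands; it assumes auditd is installed and running and that the script has root. The helper names and the key string are hypothetical. Whether the specific flag-clearing call shows up depends on how the bit is cleared, so the watch is deliberately broad (reads, writes, and attribute changes) to catch whatever process opens the file.

```python
import subprocess


def audit_watch_command(path: str, key: str) -> list[str]:
    """Build an auditctl command that watches one file.

    -w sets the watched path, -p rwa requests read, write, and
    attribute-change events, and -k tags the records with a search key.
    """
    return ["auditctl", "-w", path, "-p", "rwa", "-k", key]


def audit_search_command(key: str) -> list[str]:
    """Build an ausearch command listing records tagged with key.

    -i asks ausearch to interpret numeric fields into readable text.
    """
    return ["ausearch", "-k", key, "-i"]


def install_watch(path: str, key: str) -> None:
    """Install the watch; requires root and a running audit subsystem."""
    subprocess.run(audit_watch_command(path, key), check=True)
```

After the next incident, running the command from audit_search_command("mainpy-attrs") should list the processes that touched the file, which narrows the list of suspects even if it does not name the culprit outright.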