Backfilling Signal history that the server doesn't have
I wanted my homelab assistant to have access to my Signal history the same way it has my iMessage and email. Signal's whole point is that the server doesn't keep your messages, which means there's no API to call. A freshly linked device only sees messages from the moment it was linked.
What was happening
Signal Desktop on the Mac mini has had everything since I first linked it months ago. The messages live in a SQLCipher database on disk:
~/Library/Application Support/Signal/sql/db.sqlite
SQLCipher is SQLite with transparent encryption. You can't just open it; you need the key, and Signal Desktop stores that key in the macOS Keychain.
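Reading the secret back out is one call to macOS's security tool. Here is a minimal Python sketch. The service name "Signal Safe Storage" is my assumption (check Keychain Access for the exact item on your Mac), and read_keychain_secret is a hypothetical helper. Depending on the Signal Desktop version, what comes back may be the database key itself or a secret that unlocks an encrypted key kept in Signal's config.json, so treat this as the first step of key recovery rather than the whole thing.

```python
import subprocess

# Assumed Keychain service name; verify it in Keychain Access before relying on it.
SIGNAL_KEYCHAIN_SERVICE = "Signal Safe Storage"


def read_keychain_secret(service: str = SIGNAL_KEYCHAIN_SERVICE) -> str:
    """Return the secret stored in a generic-password Keychain item.

    `security find-generic-password -w` prints only the secret on stdout.
    macOS shows an authorization prompt the first time a process asks for it.
    """
    result = subprocess.run(
        ["security", "find-generic-password", "-w", "-s", service],
        capture_output=True,
        text=True,
        check=True,  # raises CalledProcessError if the item doesn't exist
    )
    return result.stdout.rstrip("\n")  # drop the trailing newline security adds
```

Clicking "Always Allow" on that prompt is what makes it a one-time authorization rather than a prompt per run.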
What I found
Two-phase plan, because the two halves of the problem don't talk to each other:
- Future messages. Run signal-cli in an LXC as a new device, linked via QR code. From that point on, a small cron job pipes signal-cli -o json receive --timeout 55 into an ingest script every minute. New messages land in a local SQLite database for the assistant to query.
- Past messages. Pull the decryption key out of the Keychain (one prompt to authorize), decrypt the Signal Desktop database into a plaintext SQLite file, then bulk-import every message into the same store under a signal/ topic prefix.
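The decrypt step can follow SQLCipher's documented sqlcipher_export() pattern: open the encrypted file with the key, attach an empty database with a blank key, and export into it. This minimal Python sketch assumes the sqlcipher command-line shell is installed (for example via Homebrew) and that you already have the database key as a hex string; build_export_script and decrypt_signal_db are hypothetical names. Quitting Signal Desktop first is probably wise so the database isn't changing underneath the export.

```python
import re
import subprocess
from pathlib import Path

# Where Signal Desktop keeps its encrypted database on macOS.
SIGNAL_DB = Path.home() / "Library/Application Support/Signal/sql/db.sqlite"


def build_export_script(hex_key: str, plaintext_path: str) -> str:
    """Return SQL that writes a decrypted copy of the SQLCipher database."""
    # Assumption: the key is hex-encoded and is passed to SQLCipher as a raw key.
    if not re.fullmatch(r"(?:[0-9a-fA-F]{2})+", hex_key):
        raise ValueError("expected a hex-encoded key")
    quoted_path = plaintext_path.replace("'", "''")  # SQL string-literal escaping
    return "\n".join([
        f"PRAGMA key = \"x'{hex_key}'\";",
        # An empty KEY on the attached database means it is written unencrypted.
        f"ATTACH DATABASE '{quoted_path}' AS plaintext KEY '';",
        "SELECT sqlcipher_export('plaintext');",
        "DETACH DATABASE plaintext;",
        "",
    ])


def decrypt_signal_db(hex_key: str, out_path: Path, db_path: Path = SIGNAL_DB) -> None:
    """Feed the export script to the sqlcipher shell on stdin."""
    script = build_export_script(hex_key, str(out_path))
    # -bail stops at the first SQL error so check=True actually sees a failure.
    subprocess.run(["sqlcipher", "-bail", str(db_path)], input=script, text=True, check=True)
```

Calling decrypt_signal_db(key, Path("/tmp/signal-plain.db")) should leave a copy any ordinary SQLite tooling can read. It is unencrypted, so delete it once the import is done.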
Once both are running, the historical archive and the live feed get unified behind a single cass_signal.py query tool. Live messages get a ·live marker so I can tell at a glance whether a result came from the archive or the running pipeline.
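The merge step in a tool like that might look like the following minimal sketch. It assumes two hypothetical layouts: an archive table (topic, ts, kind, body) for the imported history and a messages table (ts, sender, thread, body) for the live feed. It shows only body search; search and Hit are illustrative names, not the real cass_signal.py.

```python
import sqlite3
from dataclasses import dataclass

LIVE_MARKER = "·live"

# Hypothetical table layouts for the two stores.
ARCHIVE_SQL = "SELECT ts, topic, body FROM archive WHERE body LIKE ? ORDER BY ts"
LIVE_SQL = "SELECT ts, 'signal/' || thread, body FROM messages WHERE body LIKE ? ORDER BY ts"


@dataclass
class Hit:
    ts: int      # epoch milliseconds
    topic: str   # e.g. "signal/Family"
    body: str
    live: bool   # True if the row came from the signal-cli pipeline

    def render(self) -> str:
        marker = f" {LIVE_MARKER}" if self.live else ""
        return f"[{self.ts}] {self.topic}: {self.body}{marker}"


def search(archive: sqlite3.Connection, live: sqlite3.Connection, needle: str) -> list[Hit]:
    """Body search across both stores, merged into one timeline."""
    pattern = f"%{needle}%"
    hits = [Hit(*row, live=False) for row in archive.execute(ARCHIVE_SQL, (pattern,))]
    hits += [Hit(*row, live=True) for row in live.execute(LIVE_SQL, (pattern,))]
    return sorted(hits, key=lambda h: h.ts)
```

Tagging each row with its origin at query time, rather than storing a flag, keeps the two stores independent: either side can be rebuilt without touching the other.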
The fix
The signal-cli side is a cron entry that fires every minute:
* * * * * /usr/local/bin/cass-signal-receive.sh
The script wraps the receive call and pipes JSON into the ingest:
signal-cli -o json receive --timeout 55 \
| /opt/assistant/venv/bin/python3 \
/opt/assistant/cass_signal_ingest.py \
--db /opt/assistant/data/signal/incoming.db
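The post doesn't show cass_signal_ingest.py, so this is a hedged sketch of what an ingest script with that --db flag could look like. The JSON field names (envelope, source, sourceNumber, timestamp, dataMessage.message, dataMessage.groupInfo.groupId) reflect my understanding of signal-cli's JSON output and should be checked against your version; messages you send from another device arrive in a different envelope shape and are simply skipped here. The table layout is hypothetical.

```python
import argparse
import json
import sqlite3
import sys
from typing import Iterable, Optional, Tuple

# Hypothetical live-feed table.
SCHEMA = """
CREATE TABLE IF NOT EXISTS messages (
    ts     INTEGER NOT NULL,  -- envelope timestamp (epoch ms)
    sender TEXT    NOT NULL,
    thread TEXT    NOT NULL,  -- group id, or the sender for 1:1 chats
    body   TEXT    NOT NULL,
    PRIMARY KEY (ts, sender)  -- re-delivered envelopes are ignored, not duplicated
)
"""

Row = Tuple[int, str, str, str]


def parse_line(line: str) -> Optional[Row]:
    """Map one line of signal-cli JSON output to a row, or None if it isn't text."""
    env = json.loads(line).get("envelope", {})
    data = env.get("dataMessage") or {}
    body, ts = data.get("message"), env.get("timestamp")
    if not body or ts is None:
        return None  # receipts, typing indicators, reactions, ...
    sender = env.get("sourceNumber") or env.get("source") or "unknown"
    group_id = (data.get("groupInfo") or {}).get("groupId")
    return (ts, sender, group_id or sender, body)


def ingest(lines: Iterable[str], conn: sqlite3.Connection) -> int:
    """Insert every text message from lines; return how many were new."""
    conn.execute(SCHEMA)
    rows = [row for row in (parse_line(ln) for ln in lines if ln.strip()) if row]
    before = conn.total_changes
    with conn:  # one transaction per cron run
        conn.executemany("INSERT OR IGNORE INTO messages VALUES (?, ?, ?, ?)", rows)
    return conn.total_changes - before


def main() -> None:
    parser = argparse.ArgumentParser(description="Ingest signal-cli JSON from stdin")
    parser.add_argument("--db", required=True)
    args = parser.parse_args()
    new = ingest(sys.stdin, sqlite3.connect(args.db))
    print(f"{new} new message(s)", file=sys.stderr)
```

Calling main() at the bottom of the file makes it a drop-in for the pipeline above. Keying rows on (timestamp, sender) is what keeps a re-delivered envelope from showing up twice.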
The historical archive shipped first: about 4.5 months of messages, roughly 3K rows, all queryable through stats, per-thread dumps, date filters, body search, and call history.
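The bulk import can be sketched as pulling rows out of the decrypted copy and writing them under the signal/ prefix, plus a per-thread count of the kind a stats view needs. The Signal Desktop column names (sent_at, type, body, conversationId on messages; name, profileName, e164 on conversations) and the type values are assumptions about its schema, which can change between releases. The archive table and both function names are hypothetical.

```python
import sqlite3

# Assumed Signal Desktop schema; confirm with `.schema messages` on the decrypted copy.
EXPORT_SQL = """
SELECT COALESCE(m.sent_at, 0),
       m.type,
       COALESCE(m.body, ''),
       COALESCE(c.name, c.profileName, c.e164, m.conversationId)
FROM messages AS m
LEFT JOIN conversations AS c ON c.id = m.conversationId
WHERE m.type IN ('incoming', 'outgoing', 'call-history')
"""

# Hypothetical archive table in the assistant's store.
STORE_SCHEMA = """
CREATE TABLE IF NOT EXISTS archive (
    topic TEXT    NOT NULL,  -- e.g. 'signal/Family'
    ts    INTEGER NOT NULL,  -- sent_at, epoch ms
    kind  TEXT    NOT NULL,  -- incoming / outgoing / call-history
    body  TEXT    NOT NULL,
    UNIQUE (topic, ts, kind, body)  -- lets the import be re-run without duplicates
)
"""


def import_archive(plain: sqlite3.Connection, store: sqlite3.Connection) -> int:
    """Copy messages from the decrypted Signal DB; return how many were new."""
    store.execute(STORE_SCHEMA)
    rows = [
        (f"signal/{thread}", sent_at, kind, body)
        for sent_at, kind, body, thread in plain.execute(EXPORT_SQL)
    ]
    before = store.total_changes
    with store:
        store.executemany("INSERT OR IGNORE INTO archive VALUES (?, ?, ?, ?)", rows)
    return store.total_changes - before


def thread_stats(store: sqlite3.Connection) -> list:
    """Per-thread message counts, busiest thread first."""
    return store.execute(
        "SELECT topic, COUNT(*) AS n FROM archive GROUP BY topic ORDER BY n DESC, topic"
    ).fetchall()
```

The UNIQUE constraint is the design choice that matters: re-running the import after a fresh decrypt only adds what's new instead of doubling the archive.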
What I'd do differently
The "new device only sees forward" rule isn't really a Signal quirk; it's the design. I should have set up signal-cli the day I started seriously using Signal, so there would be no archive-recovery problem to solve later. The same lesson applies to anything end-to-end encrypted: if you ever want a personal archive, start collecting the moment you start using the platform, not when you finally have a reason.