Counting a word across Signal messages with the macOS keychain
I wanted to count how many times I'd used a specific word across all my outgoing Signal messages. Sounded like five lines of SQL. The interesting part was the SQLCipher-encrypted database and how to get the key out of the macOS keychain without breaking Signal.
What was happening
Signal Desktop on macOS stores its message database at ~/Library/Application Support/Signal/sql/db.sqlite, but it's encrypted at rest with a SQLCipher key. The key itself lives in the macOS Keychain, scoped to the Signal app's bundle identifier. You can't just open the file with sqlite3.
What I found
Three parts to extract the data cleanly:
Pull the key. Signal stores its 64-character hex key under the Signal Safe Storage keychain entry. security will print it if you ask:
security find-generic-password -ws 'Signal Safe Storage'
It prompts once on first run, then macOS remembers. Wrap that in a Python subprocess.check_output and you have the key.
Open the DB with SQLCipher, not stock sqlite3. Stock sqlite3 from Homebrew is built without SQLCipher. You need either pysqlcipher3 or to shell out to sqlcipher. Easiest is the CLI:
sqlcipher db.sqlite <<SQL
PRAGMA key = "x'$KEY'";
SELECT body FROM messages WHERE type = 'outgoing';
SQL
The x'...' syntax tells SQLCipher the key is hex-encoded raw bytes, not a passphrase.
Don't read the live DB. Signal Desktop holds the DB open with WAL mode and writes to it constantly. If you read it while Signal is running you get partial writes, lock failures, or worse, corrupt the WAL. Two options:
- Snapshot via
sqlite3 .backup(works with SQLCipher too). rsyncthe file with--inplace --no-whole-fileto a side directory and read the copy.
The whole counter is then:
import subprocess, json, re
from datetime import datetime, timezone
KEY = subprocess.check_output([
"security", "find-generic-password", "-ws", "Signal Safe Storage"
]).decode().strip()
# rsync db + wal to /tmp/signal-snapshot/ first
rows = subprocess.check_output([
"sqlcipher", "/tmp/signal-snapshot/db.sqlite",
f'PRAGMA key = "x\'{KEY}\'"; SELECT sent_at, body FROM messages WHERE type=\'outgoing\' AND body IS NOT NULL;'
]).decode().splitlines()
WORD = re.compile(r'\bpookie\b', re.IGNORECASE)
today = datetime.now(timezone.utc).strftime('%Y-%m-%d')
today_count = sum(
len(WORD.findall(body))
for line in rows
for sent_at, _, body in [line.split('|', 2)]
if datetime.fromtimestamp(int(sent_at)/1000, tz=timezone.utc).strftime('%Y-%m-%d') == today
)
The case-insensitive regex with \b boundaries handles "pookies" as two if you wrote "pookie pookie" but as one occurrence if you wrote "pookies". Tradeoff: word boundaries don't catch the plural as the same word, but they do prevent matching "spookie" or other substrings. I chose word boundaries because the false positives were more annoying than the false negatives.
A LaunchAgent runs the counter every 15 minutes and at login:
<plist version="1.0">
<dict>
<key>Label</key><string>com.cwfrazier.pookie</string>
<key>ProgramArguments</key>
<array>
<string>/Users/cwfrazier/bin/pookie-count.py</string>
</array>
<key>StartInterval</key><integer>900</integer>
<key>RunAtLoad</key><true/>
<key>StandardOutPath</key><string>/Users/cwfrazier/Library/Logs/pookie.log</string>
<key>StandardErrorPath</key><string>/Users/cwfrazier/Library/Logs/pookie.err</string>
</dict>
</plist>
It POSTs the daily/monthly/yearly totals to a small endpoint on the homelab with an X-Pookie-Secret header, which writes atomically to a JSON file the dashboard reads.
What I'd do differently
I should have started with a Signal Desktop snapshot rather than rsyncing the live file. The .backup command in SQLCipher takes care of WAL races for you. My rsync setup occasionally failed verification mid-WAL update and silently advanced the cursor anyway, which meant some days lost data until I noticed the dashboard showed (stale).
The other lesson: macOS LaunchAgents need Full Disk Access to read anything under ~/Library/Messages or to traverse most of Application Support reliably. The fix is to grant FDA to the actual resolved Python binary (not the venv shim) in System Settings → Privacy & Security. If you grant it to a wrapper script or to python3 symlink, the spawned process loses the privilege when launchd resolves the binary path.