My per-todo chat was a toy LLM in a tool-capable house
My web todo app has a little chat button on every todo row. The idea: open a todo, ask the assistant "what should I do here?", get a tool-driven answer with real context. The reality, for a while: the per-todo chat was wired straight to a vanilla model call with no tools. It would politely tell me "I don't have access to that," while the main assistant on the same host had SSH, AWS, the works.
What was happening
The chat route inside todo_routes.py was calling
categorize.chat(...), which made a plain Anthropic API call
with a system prompt and the message history. No tool definitions,
no MCP, no file system access. So whenever I said "Do it" or
"check on that," the reply was a correct refusal: the model
genuinely couldn't do it. The model wasn't lying; the route was
wired wrong.
Meanwhile, the main chat on the same backend used the Claude Agent
SDK with permission_mode="bypassPermissions", a memory MCP, a
working directory, and full tool access. Two chat surfaces, two
completely different power levels, both labeled "Cass."
What I found
The reason it shipped that way was friction. The relevant files
were chattr +i'd (a defense against an unrelated sync bug; see
"OneDrive was overwriting my backend"), so I couldn't just edit
categorize.py in place. The path of least resistance was a new
module with the same function signatures, swapped in for the
originals in memory at startup rather than on disk.
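```python
# todo_agent.py: new file, not locked
import asyncio

from claude_agent_sdk import AssistantMessage, ClaudeAgentOptions, ResultMessage, TextBlock, query

# MCP_SERVERS is an assumed name for the dict of MCP server configs
# (including "memory") that main chat already passes to the SDK.
from main import MCP_SERVERS, SYSTEM_PROMPT

# _scope_block() and _render_history() are small helpers not shown here
# (_render_history is an assumed name); describe(), which main.py also
# patches in, lives in this file too.

async def chat(item, history, user_msg, related=None):
    system = SYSTEM_PROMPT + "\n\n" + _scope_block(item, related)
    parts = []  # text received so far; survives cancellation by wait_for
    try:
        return await asyncio.wait_for(
            _run_agent(system, history, user_msg, parts),
            timeout=90,
        )
    except asyncio.TimeoutError:
        # return whatever we've accumulated so far rather than nothing
        return "".join(parts)

async def _run_agent(system, history, user_msg, parts):
    options = ClaudeAgentOptions(
        system_prompt=system,
        permission_mode="bypassPermissions",
        mcp_servers=MCP_SERVERS,
        cwd="/opt/assistant/data/workspace",
    )
    # query() takes no history argument, so earlier turns go into the prompt
    prompt = _render_history(history) + "\n\n" + user_msg
    async for message in query(prompt=prompt, options=options):
        if isinstance(message, AssistantMessage):
            parts.extend(b.text for b in message.content if isinstance(b, TextBlock))
        elif isinstance(message, ResultMessage):
            return message.result or "".join(parts)
    return "".join(parts)
```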
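The except asyncio.TimeoutError branch can only hand back a partial reply if the agent loop writes its text somewhere the caller still holds after asyncio.wait_for cancels the task. Here is a self-contained sketch of that pattern, with a fake agent standing in for the SDK (fake_agent, chat_with_deadline, and the chunk strings are made up for illustration):

```python
import asyncio

CHUNKS = ["step 1. ", "step 2. ", "step 3."]


async def fake_agent(parts: list[str]) -> str:
    """Stand-in for the agent loop: appends each text chunk as it 'arrives'."""
    for chunk in CHUNKS:
        parts.append(chunk)       # caller's list, so it outlives cancellation
        await asyncio.sleep(0.2)  # pretend tool calls take time
    return "".join(parts)


async def chat_with_deadline(timeout: float) -> tuple[str, bool]:
    """Return (reply, timed_out); on timeout the reply is whatever arrived so far."""
    parts: list[str] = []
    try:
        return await asyncio.wait_for(fake_agent(parts), timeout=timeout), False
    except asyncio.TimeoutError:
        # wait_for cancelled fake_agent, but `parts` still holds its output
        return "".join(parts), True


if __name__ == "__main__":
    print(asyncio.run(chat_with_deadline(5.0)))  # full reply, not timed out
    print(asyncio.run(chat_with_deadline(0.3)))  # partial reply, timed out
```

The design choice is that the buffer belongs to the caller, not to the task that gets cancelled; a helper that only reads the task's local state after cancellation has nothing to return.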
Then the monkey-patch, done right after route registration in
the mutable main.py:
```python
# main.py (mutable), right after route registration
import categorize
import todo_agent

categorize.chat = todo_agent.chat
categorize.describe = todo_agent.describe
```
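todo_routes.py is locked, but at call time it does
categorize.chat(...), which is an attribute lookup on the module
object, and that object lives in memory and is mutable. From then
on every call lands on todo_agent.chat without anyone having to
edit a locked file. This only works because the route calls
through the module: an import of the form "from categorize import
chat" in todo_routes.py would have bound the old function at
import time, and the patch would never reach it.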
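A self-contained illustration of why the call site matters, using throwaway in-memory modules (the module contents and function names here are made up; nothing touches the real files). One caller looks the function up on the module at call time; the other binds the name once, the way a from-import would:

```python
import types

# A stand-in "categorize" module built in memory.
categorize = types.ModuleType("categorize")
categorize.chat = lambda msg: f"toy reply to {msg!r}"


# Call site A: attribute lookup at call time, like todo_routes.py.
def route_via_module(msg):
    return categorize.chat(msg)


# Call site B: the name is bound once, as `from categorize import chat` would do.
chat = categorize.chat


def route_via_from_import(msg):
    return chat(msg)


# The monkey-patch: rebind the attribute on the module object.
def agent_chat(msg):
    return f"agent reply to {msg!r}"


categorize.chat = agent_chat

print(route_via_module("Do it"))       # agent reply to 'Do it'
print(route_via_from_import("Do it"))  # toy reply to 'Do it'
```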
The fix
After the swap, the per-todo chat does what the main chat does.
Smoke test: on a todo that read "Sonarr volume critically low,"
I typed "Do it." The per-todo agent SSH'd into the right
container, ran df, pulled the real free-space number, fetched
related memory entries, and replied with an actual plan. That's
the experience I'd intended six months earlier.
A few caveats I deliberately left:
- Streaming back to the per-todo UI isn't wired yet. The UI shows "replying…" until the POST returns. Anything close to the 90s cap will look hung even when it's working.
- The per-todo chat doesn't persist a ClaudeSDKClient between turns; each turn rebuilds from history. Fine for short bursts, bad for long ones.
- If I ever undo the immutable bit on the right files, the monkey-patch should go: replace categorize.chat directly and drop the indirection.
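If the per-todo chat did keep a client alive between turns, one plausible shape is a cache keyed by todo id. This is a sketch under assumptions, not what the app does: SessionCache and FakeClient are hypothetical names, and the cache only relies on async connect() and disconnect(), the lifecycle the SDK's ClaudeSDKClient exposes. With the real SDK, the factory would construct a ClaudeSDKClient with the same options as the one-shot path.

```python
import asyncio
from typing import Callable, Protocol


class AgentClient(Protocol):
    """The only lifecycle the cache relies on."""

    async def connect(self) -> None: ...
    async def disconnect(self) -> None: ...


class SessionCache:
    """Keep one connected client per todo id instead of rebuilding each turn."""

    def __init__(self, factory: Callable[[str], AgentClient]) -> None:
        self._factory = factory
        self._clients: dict[str, AgentClient] = {}
        # One lock for simplicity: serializes client creation across all todos.
        self._lock = asyncio.Lock()

    async def get(self, todo_id: str) -> AgentClient:
        async with self._lock:
            client = self._clients.get(todo_id)
            if client is None:
                client = self._factory(todo_id)
                await client.connect()
                self._clients[todo_id] = client
            return client

    async def drop(self, todo_id: str) -> None:
        async with self._lock:
            client = self._clients.pop(todo_id, None)
        if client is not None:
            await client.disconnect()


class FakeClient:
    """Test double standing in for a real client."""

    def __init__(self) -> None:
        self.connected = False

    async def connect(self) -> None:
        self.connected = True

    async def disconnect(self) -> None:
        self.connected = False
```

Eviction (idle timeouts, a cap on open sessions) is deliberately left out; without it, you keep one live client for every todo ever opened.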
What I'd do differently
The monkey-patch works, but it's load-bearing on a future maintainer (me) remembering it exists. If the patch line gets deleted, the per-todo chat silently regresses to the old toy-model behavior with no error, just bad replies. That's a nasty failure mode.
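One cheap mitigation for that silent-regression risk is to assert the swap at startup, so a deleted patch line crashes the boot instead of quietly degrading replies. A sketch, assuming main.py can call it right after the two assignments; verify_patched is a hypothetical helper, and the demo uses throwaway in-memory modules:

```python
import types


def verify_patched(core: types.ModuleType, agent: types.ModuleType,
                   names: tuple[str, ...] = ("chat", "describe")) -> None:
    """Raise if any core.<name> is not the very same object as agent.<name>."""
    # getattr without a default: a missing function is also a loud failure
    stale = [n for n in names if getattr(core, n) is not getattr(agent, n)]
    if stale:
        raise RuntimeError(f"{core.__name__} is not patched for: {', '.join(stale)}")


if __name__ == "__main__":
    # Stand-ins for categorize and todo_agent.
    categorize = types.ModuleType("categorize")
    todo_agent = types.ModuleType("todo_agent")
    categorize.chat = categorize.describe = lambda *a: "toy"
    todo_agent.chat = todo_agent.describe = lambda *a: "agent"
    try:
        verify_patched(categorize, todo_agent)
    except RuntimeError as e:
        print("startup check failed:", e)
    categorize.chat, categorize.describe = todo_agent.chat, todo_agent.describe
    verify_patched(categorize, todo_agent)  # passes silently
```

In the real main.py this would be a single line, verify_patched(categorize, todo_agent), placed after the patch.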
The standing rule I should have followed in the first place: when the chat in one part of the app calls a function from a "core" module, and the core module gets locked for a separate reason, the chat call should go through a dependency-injection seam, not through a direct module reference. Then swapping behavior is a config change, not a monkey-patch.
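What that seam could look like, as a minimal sketch: routes ask a tiny registry for the chat backend instead of naming a module, and startup picks the implementation. register_chat_backend, get_chat_backend, and todo_chat_route are hypothetical names, and the backend is assumed to have the same async shape as chat(item, history, user_msg, related=None):

```python
import asyncio
from typing import Awaitable, Callable, Optional

# Anything with the shape of chat(item, history, user_msg, related=None).
ChatFn = Callable[..., Awaitable[str]]

_chat_backend: Optional[ChatFn] = None


def register_chat_backend(fn: ChatFn) -> None:
    """Called once at startup (e.g. from main.py) to pick the implementation."""
    global _chat_backend
    _chat_backend = fn


def get_chat_backend() -> ChatFn:
    """Called by routes at request time; raises instead of silently degrading."""
    if _chat_backend is None:
        raise RuntimeError("no chat backend registered")
    return _chat_backend


async def todo_chat_route(item, history, user_msg, related=None):
    """A route handler that depends on the seam, not on categorize or todo_agent."""
    chat = get_chat_backend()
    return await chat(item, history, user_msg, related)


if __name__ == "__main__":
    async def demo_chat(item, history, user_msg, related=None):
        return f"agent reply to {user_msg!r}"

    register_chat_backend(demo_chat)
    print(asyncio.run(todo_chat_route({"title": "demo"}, [], "Do it")))
```

With this in place, main.py would call register_chat_backend(todo_agent.chat) at startup, and undoing the workaround later is a one-line change there rather than a hunt for a monkey-patch. Forgetting to register fails loudly on the first request instead of falling back to the toy model.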