My per-todo chat was a toy LLM in a tool-capable house

FILE 0x8E·MY PER-TODO CHAT WAS A TOY LLM IN A TOOL-CAPABLE HOUSE

May 11, 2026 · llm, agents, python

My web todo app has a little chat button on every todo row. The idea: open a todo, ask the assistant "what should I do here?", get a tool-driven answer with real context. The reality, for a while: the per-todo chat was wired straight to a vanilla model call with no tools. It would politely tell me "I don't have access to that," while the main assistant on the same host had SSH, AWS, the works.

What was happening

The chat route in todo_routes.py called categorize.chat(...), which made a plain Anthropic API call with a system prompt and the message history. No tool definitions, no MCP, no file system access. So every time it answered "Do it" or "check on that" by declining, it was being accurate: it had no way to act. The model wasn't lying; the route was wired wrong.

Meanwhile, the main chat on the same backend used the Claude Agent SDK with permission_mode="bypassPermissions", a memory MCP, a working directory, and full tool access. Two chat surfaces, two completely different power levels, both labeled "Cass."

What I found

The reason it shipped that way was friction. The relevant files were locked with chattr +i (a defense against an unrelated sync bug; see "OneDrive was overwriting my backend"), so I couldn't just edit categorize.py in place. The path of least resistance was a new module that mirrors the function signatures and rebinds the function references in memory at module load, leaving the locked files on disk untouched.

# todo_agent.py: new file, not locked
# _scope_block, _partial_reply and describe live in this module too (not shown)
import asyncio
from claude_agent_sdk import sdk_query
from main import SYSTEM_PROMPT  # reuse the main chat's system prompt

async def chat(item, history, user_msg, related=None):
    # Same signature as categorize.chat, so it can be swapped in unchanged.
    system = SYSTEM_PROMPT + "\n\n" + _scope_block(item, related)
    try:
        # Hard cap so the POST always returns, even if the agent keeps going.
        return await asyncio.wait_for(
            _run_agent(system, history, user_msg),
            timeout=90,
        )
    except asyncio.TimeoutError:
        # return whatever we've accumulated so far rather than nothing
        return _partial_reply()

async def _run_agent(system, history, user_msg):
    # Same power level as the main chat: tools, memory MCP, working directory.
    result = await sdk_query(
        prompt=user_msg,
        system_prompt=system,
        history=history,
        permission_mode="bypassPermissions",
        mcp_servers=["memory"],
        cwd="/opt/assistant/data/workspace",
    )
    return result.text
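
The timeout comment above only holds if the partial text lives somewhere the cancelled task can't take with it. A minimal sketch of one way to do that, assuming the agent can be consumed as an async stream of text chunks. fake_agent_stream and collect_with_timeout are hypothetical names for illustration, not part of todo_agent.py or the SDK:

```python
import asyncio

async def fake_agent_stream(delay=0.2):
    """Hypothetical stand-in for an agent that yields text chunks over time."""
    for chunk in ["step one. ", "step two. ", "step three."]:
        yield chunk
        await asyncio.sleep(delay)

async def collect_with_timeout(stream, timeout):
    """Consume `stream`; return (text_so_far, timed_out)."""
    # The list is owned by this coroutine, not by the task that gets
    # cancelled, so whatever was appended before the timeout survives.
    chunks = []

    async def _consume():
        async for chunk in stream:
            chunks.append(chunk)

    try:
        await asyncio.wait_for(_consume(), timeout=timeout)
        return "".join(chunks), False
    except asyncio.TimeoutError:
        return "".join(chunks), True
```

Called with a timeout shorter than the stream, this returns the chunks received so far plus a timed_out flag that the route could pass on to the UI.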

Then the monkey-patch, done right after route registration in the mutable main.py:

import categorize
import todo_agent
categorize.chat = todo_agent.chat
categorize.describe = todo_agent.describe

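todo_routes.py is locked, but at call time it does categorize.chat(...), which is an attribute lookup on the module object, and that object lives in memory and is mutable. Every call lands on todo_agent.chat from then on without anyone having to edit a locked file. This only works because the route goes through the module; had it used from categorize import chat, it would hold its own reference to the old function and the patch would never reach it.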
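
To make the lookup-time point concrete, here is a small self-contained sketch. The module object and both route functions are hypothetical stand-ins built in memory, not the real categorize.py or todo_routes.py:

```python
import types

# Hypothetical stand-in for the locked categorize module.
categorize = types.ModuleType("categorize")
categorize.chat = lambda msg: "toy: I don't have access to that"

def route_via_module(msg):
    # Mirrors `import categorize` + `categorize.chat(...)`:
    # the attribute is looked up on the module at every call.
    return categorize.chat(msg)

# Mirrors `from categorize import chat`: the name is bound once,
# to whatever function object the module held at import time.
chat = categorize.chat

def route_via_name(msg):
    return chat(msg)

def agent_chat(msg):
    return "agent: handled " + msg

# The monkey-patch: rebind the attribute on the in-memory module.
categorize.chat = agent_chat

patched_reply = route_via_module("Do it")   # sees the new function
stale_reply = route_via_name("Do it")       # still calls the old one
```

The first route picks up the replacement; the second keeps calling the original function, which is the failure the patch would have hit under a from-import.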

The fix

After the swap, the per-todo chat does what the main chat does. Smoke test: I opened a todo that said "Sonarr volume critically low" and sent "Do it." The per-todo agent SSH'd into the right container, ran df, pulled the real disk free number, fetched related memory entries, and replied with an actual plan. That's the experience I'd intended six months earlier.

A few caveats I deliberately left:

Streaming back to the per-todo UI isn't wired yet. The UI shows "replying…" until the POST returns. Anything close to the 90s cap will look hung even when it's working.
The per-todo chat doesn't persist a ClaudeSDKClient between turns; each turn rebuilds from history, so the whole conversation is replayed on every message. Fine for short bursts, bad for long ones.
If I ever clear the immutable bit on those files, the monkey-patch should go: edit categorize.chat directly and drop the indirection.

What I'd do differently

The monkey-patch works, but it's load-bearing on a future maintainer (me) remembering it exists. If the patch line gets deleted, the per-todo chat silently regresses to the old toy-model behavior with no error, just bad replies. That's a nasty failure mode.
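
One cheap way to turn that silent regression into a loud one is a startup check that the patch is actually in place. A minimal sketch, assuming it runs in main.py right after the patch lines; assert_patched is a hypothetical helper, and the two modules here are in-memory stand-ins so the example runs on its own:

```python
import types

def assert_patched(module, attr, expected):
    """Raise at startup if module.attr is not the expected replacement."""
    actual = getattr(module, attr, None)
    if actual is not expected:
        raise RuntimeError(
            f"{module.__name__}.{attr} is {actual!r}, expected {expected!r}; "
            "the monkey-patch is missing"
        )

# Hypothetical stand-ins for the real modules.
categorize = types.ModuleType("categorize")
todo_agent = types.ModuleType("todo_agent")
categorize.chat = lambda *args: "toy reply"
todo_agent.chat = lambda *args: "agent reply"

categorize.chat = todo_agent.chat                     # the patch line
assert_patched(categorize, "chat", todo_agent.chat)   # passes silently
```

If someone deletes the patch line but leaves the check, the app refuses to start instead of quietly serving toy replies.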

The standing rule I should have followed in the first place: when the chat in one part of the app calls a function from a "core" module, and the core module gets locked for a separate reason, the chat call should go through a dependency-injection seam, not through a direct module reference. Then swapping behavior is a config change, not a monkey-patch.
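
A rough sketch of what such a seam could look like, assuming a small backend registry and a config key that picks the implementation. Every name here (register_backend, get_backend, CONFIG, todo_chat_backend, handle_todo_chat) is hypothetical; none of it exists in the app today:

```python
import asyncio
from typing import Awaitable, Callable, Dict

# A chat backend takes (item, history, user_msg) and returns the reply text.
ChatBackend = Callable[[dict, list, str], Awaitable[str]]

_BACKENDS: Dict[str, ChatBackend] = {}

def register_backend(name: str):
    """Decorator that files a chat implementation under a config name."""
    def decorator(fn: ChatBackend) -> ChatBackend:
        _BACKENDS[name] = fn
        return fn
    return decorator

def get_backend(name: str) -> ChatBackend:
    """Look up a backend; an unknown name fails loudly instead of degrading."""
    if name not in _BACKENDS:
        raise LookupError(
            f"unknown chat backend {name!r}; known: {sorted(_BACKENDS)}"
        )
    return _BACKENDS[name]

@register_backend("plain")
async def plain_chat(item, history, user_msg):
    return "plain: no tools available"

@register_backend("agent")
async def agent_chat(item, history, user_msg):
    return f"agent: working on {item['title']}"

CONFIG = {"todo_chat_backend": "agent"}  # hypothetical config key

async def handle_todo_chat(item, history, user_msg):
    # The route depends on the seam, never on a specific module's function.
    backend = get_backend(CONFIG["todo_chat_backend"])
    return await backend(item, history, user_msg)
```

With this shape the locked route file never needs to change: swapping the toy model for the agent is one config value, and a typo in that value raises at the first request rather than falling back silently.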