Voice on watchOS without the Speech framework
I wanted to talk to my homelab assistant from the Apple Watch.
Action Button down, dictate, get a spoken reply back. Standard
project except for one detail: I have cerebral palsy, so the
dictation has to honor the system's "Listen for Atypical Speech"
setting, and SFSpeechRecognizer doesn't.
What was happening
My first instinct was to import Speech into the watchOS target,
spin up an SFSpeechRecognizer, and stream audio. That doesn't
work — the Speech module isn't available on watchOS at all. The
target won't even build.
Even if it did, the Atypical Speech model is a system-wide
accessibility setting tied to Siri's dictation, not something you
opt into by configuring SFSpeechRecognizer. The only way to get
the corrected transcription is to route through whatever surface
the OS uses for system dictation.
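For code shared between an iOS and a watchOS target, a compile-time check is one way to avoid the build failure described above. This is a minimal sketch, assuming the Speech module is absent from the watchOS SDK as noted; the constant name hasSpeechFramework is made up for illustration.

```swift
// canImport is evaluated per target at compile time, so the Speech branch
// is skipped entirely when the module is not in the SDK being built against.
#if canImport(Speech)
import Speech
let hasSpeechFramework = true
#else
let hasSpeechFramework = false
#endif

// hasSpeechFramework is a hypothetical name used only in this sketch.
print("Speech framework available:", hasSpeechFramework)
```

This only keeps the target building. It does nothing for the Atypical Speech requirement, which still needs the system dictation surface.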
What I found
TextFieldLink in SwiftUI. It's a watchOS-only view that opens
the system text input panel — the same one you get from the
keyboard glyph in Messages — which honors every accessibility
setting the OS has. The user dictates into it and you get a
plain String back, already corrected.
The whole interaction model becomes:
- User taps a mic button (or, with one Shortcut hop, presses the Action Button).
- SwiftUI opens TextFieldLink.
- User dictates.
- App POSTs the text to a /voice/chat endpoint on my server.
- Server runs a fast model, returns a short reply.
- Watch displays the reply and speaks it via AVSpeechSynthesizer.
No on-device STT code on my side. The accessibility behavior is the OS's responsibility, which is where it belongs.
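The last step in the list, speaking the reply, could look roughly like this. It is a sketch rather than the app's actual code: the ReplySpeaker class and the "en-US" voice are assumptions, and only AVSpeechSynthesizer itself comes from the description above.

```swift
import AVFoundation

/// Hypothetical helper that speaks server replies aloud.
final class ReplySpeaker {
    // Stored as a property so the synthesizer outlives the utterance;
    // a synthesizer created as a local can be deallocated before it speaks.
    private let synthesizer = AVSpeechSynthesizer()

    func speak(_ reply: String) {
        // Cut off any reply still being spoken before starting the new one.
        if synthesizer.isSpeaking {
            synthesizer.stopSpeaking(at: .immediate)
        }
        let utterance = AVSpeechUtterance(string: reply)
        // Assumed locale; nil here would fall back to the system default voice.
        utterance.voice = AVSpeechSynthesisVoice(language: "en-US")
        synthesizer.speak(utterance)
    }
}
```

Holding one ReplySpeaker for the lifetime of the view (for example as a stored property on a view model) avoids the reply being cut short when a short-lived instance goes away.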
The fix
@State private var dictated: String = ""

var body: some View {
    TextFieldLink(prompt: Text("Ask")) {
        Image(systemName: "mic.fill")
    } onSubmit: { text in
        dictated = text
        Task { await send(text) }
    }
}
The server side is a small endpoint that takes the dictated text and a
conv_id. The watch persists the conv_id in UserDefaults so the
conversation survives across taps. The request body:
struct VoiceRequest: Encodable {
    let text: String
    let conv_id: String?
}
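To show how that struct and the persisted conv_id might fit together, here is a sketch of building the POST. Several details are assumptions that the post does not state: the UserDefaults key "voice.conv_id", the base URL passed in by the caller, the function names, and the idea that the server hands back an id to store.

```swift
import Foundation
#if canImport(FoundationNetworking)
import FoundationNetworking // URLRequest lives here on Linux
#endif

struct VoiceRequest: Encodable {
    let text: String
    let conv_id: String?
}

// Hypothetical UserDefaults key for the persisted conversation id.
let convIDKey = "voice.conv_id"

/// Builds the POST for /voice/chat. The base URL is supplied by the caller.
func makeVoiceRequest(text: String,
                      baseURL: URL,
                      defaults: UserDefaults = .standard) throws -> URLRequest {
    // With synthesized Encodable, a nil conv_id is omitted from the JSON.
    let body = VoiceRequest(text: text, conv_id: defaults.string(forKey: convIDKey))
    var request = URLRequest(url: baseURL.appendingPathComponent("voice/chat"))
    request.httpMethod = "POST"
    request.setValue("application/json", forHTTPHeaderField: "Content-Type")
    request.httpBody = try JSONEncoder().encode(body)
    return request
}

/// Stores an id (assumed to come back from the server) so the next tap
/// continues the same conversation.
func rememberConversation(_ id: String, defaults: UserDefaults = .standard) {
    defaults.set(id, forKey: convIDKey)
}
```

Passing the UserDefaults instance in as a parameter is a design choice for the sketch: it lets the persistence be exercised against a throwaway suite instead of the app's real defaults.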
The bearer token is baked into Info.plist at build time, sourced from the server's env file via xcodegen, so I can rotate it without editing Swift.
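Reading the token back at runtime might look like the sketch below. The Info.plist key "VoiceAPIToken" and both function names are hypothetical; the real key is whatever the xcodegen spec writes.

```swift
import Foundation
#if canImport(FoundationNetworking)
import FoundationNetworking // URLRequest lives here on Linux
#endif

/// Looks up the token in an Info.plist dictionary.
/// "VoiceAPIToken" is a hypothetical key name used for illustration.
func bearerToken(from info: [String: Any]? = Bundle.main.infoDictionary) -> String? {
    guard let token = info?["VoiceAPIToken"] as? String, !token.isEmpty else {
        return nil // missing or empty: treat as "not configured"
    }
    return token
}

/// Attaches the token as a standard HTTP bearer Authorization header.
func authorize(_ request: inout URLRequest, token: String) {
    request.setValue("Bearer \(token)", forHTTPHeaderField: "Authorization")
}
```

Returning nil for an empty value means a build that skipped the substitution is reported as "not configured" up front, not as an authorization failure from the server.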
What I'd do differently
I burned an evening trying to make SFSpeechRecognizer work on
watchOS before I read the docs carefully enough to notice the
framework wasn't there. Next time I want a platform-native
input, I'll start by listing what's available in that
platform's SDK rather than assuming the iOS surface area carries
over. TextFieldLink turned out to be the better answer anyway:
the OS does all the accessibility work for free.