The best AI interface is voice-in, text-out.

2025-01-01 - Alex Gamble

I’m not a super fast typist, so communicating my question to ChatGPT feels slow. I really like the OpenAI Advanced Voice Mode, but having the model speak its response back to me feels slow too, particularly when the response includes code.

To me, the ideal interface would be a hybrid of these two: voice-in, text-out. I’ve built a prototype of this using the OpenAI Realtime API.

Using my Mac’s native transcription for input gets me some of the way there, but it’s not seamless:

I have to click the microphone icon in the ChatGPT interface.
I have to wait for the transcription to finish.
I have to submit the transcription to ChatGPT.
I have to wait for the model to respond.

Removing even these small amounts of friction makes the experience feel much more natural.

If you’d like to try out the prototype, I’ve open sourced the code on GitHub.