Alex Gamble

The best AI interface is voice-in, text-out.

2025-01-01 - Alex Gamble

I’m not a super fast typist, so communicating my question to ChatGPT feels slow. I really like the OpenAI Advanced Voice Mode, but having the model speak its response back to me feels slow too, particularly when the response includes code.

To me, the ideal interface would be a hybrid of these two: voice-in, text-out. I’ve built a prototype of this using the OpenAI Realtime API.
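As a rough illustration of what voice-in, text-out can look like at the protocol level, here is a minimal sketch. It is not the prototype's code. It assumes the event names from the Realtime API beta (`session.update`, `input_audio_buffer.append`, `response.text.delta`, `response.text.done`), which should be checked against the current API reference. The helper names (`session_update_event`, `audio_append_event`, `TextCollector`) are hypothetical, and the WebSocket connection itself is left out.

```python
import base64
import json


def session_update_event() -> str:
    """Build a session.update event asking for text-only responses.

    Assumption: listing only "text" under "modalities" stops the model
    from speaking its reply, while audio input is still accepted.
    """
    return json.dumps({
        "type": "session.update",
        "session": {
            "modalities": ["text"],
            "input_audio_format": "pcm16",
            # Assumption: server-side voice activity detection decides
            # when the speaker has finished and triggers a response.
            "turn_detection": {"type": "server_vad"},
        },
    })


def audio_append_event(pcm16_chunk: bytes) -> str:
    """Wrap a chunk of raw PCM16 microphone audio as a base64 append event."""
    return json.dumps({
        "type": "input_audio_buffer.append",
        "audio": base64.b64encode(pcm16_chunk).decode("ascii"),
    })


class TextCollector:
    """Accumulate streamed text deltas from server events into one reply."""

    def __init__(self) -> None:
        self.parts: list[str] = []
        self.done = False

    def handle(self, raw_event: str) -> None:
        event = json.loads(raw_event)
        if event.get("type") == "response.text.delta":
            self.parts.append(event.get("delta", ""))
        elif event.get("type") == "response.text.done":
            self.done = True

    @property
    def text(self) -> str:
        return "".join(self.parts)
```

In a real client, the first event would be sent once after connecting, the second for each microphone chunk, and every incoming message would be passed to `TextCollector.handle` so the reply can be rendered as it streams in.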

Using my Mac’s native transcription for inputs gets some of the way there, but it’s not seamless:

Removing even these small amounts of friction makes the experience feel much more natural.

If you’d like to try out the prototype, I’ve open-sourced the code on GitHub.

Follow me on Twitter or GitHub.