Conversational Audio AIs

Selvan
1 day ago



Gemini’s Multimodal Live API and ElevenLabs Conversational AI can both take audio as input and generate audio in response.

Both offerings run over WebSocket, come with out-of-the-box voice activity detection (VAD), and handle interruptions automatically.

They are well suited to turn-based conversation (chat). These models buffer the incoming audio until they detect a pause. The pause is treated as the end of the user's turn; the service then runs speech-to-text (STT) -> LLM -> text-to-speech (TTS), handles any user interruption, and the loop continues.
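The turn-taking loop described above can be sketched roughly as follows. This is only an illustration of pause-based end-of-turn detection, not how either service implements its VAD: the `TurnDetector` class, the energy threshold, the silent-frame count, and the `stt`/`llm`/`tts` callables are all hypothetical stand-ins.

```python
from typing import Callable, List, Optional

Frame = List[int]  # one short frame of PCM samples


def frame_energy(frame: Frame) -> float:
    """Mean absolute amplitude of a frame; 0.0 for an empty frame."""
    return sum(abs(s) for s in frame) / len(frame) if frame else 0.0


class TurnDetector:
    """Buffers speech frames and returns them once a long enough pause follows.

    Both default values are assumptions chosen for illustration only.
    """

    def __init__(self, energy_threshold: float = 500.0, silence_frames_needed: int = 3):
        self.energy_threshold = energy_threshold
        self.silence_frames_needed = silence_frames_needed
        self._buffer: List[Frame] = []
        self._silent_run = 0
        self._heard_speech = False

    def push(self, frame: Frame) -> Optional[List[Frame]]:
        """Feed one frame; return the buffered turn when the user has paused."""
        if frame_energy(frame) >= self.energy_threshold:
            # Speech: keep buffering and reset the silence counter.
            self._heard_speech = True
            self._silent_run = 0
            self._buffer.append(frame)
            return None
        if not self._heard_speech:
            return None  # leading silence, nothing to buffer yet
        self._silent_run += 1
        if self._silent_run < self.silence_frames_needed:
            return None  # pause is still too short to end the turn
        # Pause is long enough: hand over the turn and reset for the next one.
        turn = self._buffer
        self._buffer, self._silent_run, self._heard_speech = [], 0, False
        return turn


def run_turn(turn: List[Frame],
             stt: Callable[[List[Frame]], str],
             llm: Callable[[str], str],
             tts: Callable[[str], bytes]) -> bytes:
    """One pass of the STT -> LLM -> TTS chain for a completed turn."""
    return tts(llm(stt(turn)))
```

In use, each incoming audio frame goes through `push`; when it returns a turn, that turn is handed to `run_turn`, and the resulting audio is played back until the next turn (or an interruption) arrives.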

Sending and receiving audio, VAD signals, interruptions, and transcriptions are all just JSON events sent or received over the WebSocket.
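To make the "everything is a JSON event" idea concrete, here is a minimal sketch of encoding and dispatching such events. The event names (`audio_chunk`, `transcript`, `interruption`) and the `audio_base64` field are made up for illustration; they are not the actual Gemini or ElevenLabs message schemas, so check each provider's documentation for the real ones.

```python
import base64
import json
from typing import Dict, List


def encode_audio_event(pcm: bytes) -> str:
    """Wrap raw audio bytes in a JSON text message (hypothetical schema)."""
    return json.dumps({
        "type": "audio_chunk",
        "audio_base64": base64.b64encode(pcm).decode("ascii"),
    })


def new_state() -> Dict[str, List]:
    """Client-side state: audio queued for playback and transcripts received."""
    return {"playback": [], "transcripts": []}


def handle_event(raw: str, state: Dict[str, List]) -> None:
    """Dispatch one incoming JSON message by its (assumed) "type" field."""
    event = json.loads(raw)
    kind = event.get("type")
    if kind == "audio_chunk":
        # Agent audio arrives base64-encoded; decode and queue it for playback.
        state["playback"].append(base64.b64decode(event["audio_base64"]))
    elif kind == "transcript":
        state["transcripts"].append(event["text"])
    elif kind == "interruption":
        # The user spoke over the agent: drop whatever audio is still queued.
        state["playback"].clear()
    # Unknown event types are ignored in this sketch.
```

The same strings would be passed to and read from whatever WebSocket client is in use; the transport is left out here so the sketch stays self-contained.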

#elevenlabs #gemini2.0 #audio #llm
