
OpenAI introduces new tool that will simplify AI voice assistant development

With the new tool, developers can now manage both input (what the user says) and output (how the application responds) in a single API call. Illustration: Technology and Startup Desk

OpenAI recently introduced the ‘Realtime API’ in public beta, a tool designed to help developers create low-latency, voice-interactive applications.

According to OpenAI, this new API enables natural speech-to-speech conversations by integrating multiple processes such as speech recognition and text-to-speech in a single step. This is expected to simplify the development of applications that include real-time voice interactions.

What does it do?

Previously, developers who wanted to build an AI voice assistant had to chain several steps together. First they converted the user's audio into text with a speech-recognition tool, then fed that text into an AI model to generate a response, and finally converted the response back into speech. Each extra step could add delay and make conversations sound less natural.
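
That three-step chain can be sketched in a few lines of Python. The functions below are hypothetical placeholders standing in for separate network calls (speech-to-text, text generation, text-to-speech), not real OpenAI SDK methods; the point is how latency accumulates across the steps.

```python
# Sketch of the old multi-step voice-assistant pipeline described above.
# Each function is a placeholder for a separate network round trip.

def transcribe(audio_bytes: bytes) -> str:
    """Placeholder for a speech-to-text call (step 1)."""
    return "What's the weather like today?"

def generate_reply(prompt: str) -> str:
    """Placeholder for a text-generation call (step 2)."""
    return f"Here is a response to: {prompt}"

def synthesize(text: str) -> bytes:
    """Placeholder for a text-to-speech call (step 3)."""
    return text.encode("utf-8")

def voice_assistant_turn(audio_in: bytes) -> bytes:
    # Each step waits for the previous one to finish, so delays
    # from three separate requests add up before the user hears anything.
    text = transcribe(audio_in)
    reply = generate_reply(text)
    return synthesize(reply)
```

The Realtime API's pitch is to collapse these three round trips into one.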

The Realtime API simplifies this process by combining everything into one step. Developers can now handle both input (what the user says) and output (how the application responds) in a single API call. According to OpenAI's official blog, this means conversations flow more smoothly with less waiting, and responses sound more human because emotion and tone are preserved.

How does it work?

The Realtime API connects applications to OpenAI's GPT-4o model. It uses a WebSocket connection that allows messages to be exchanged between the application and the AI in real time. OpenAI says this new system is faster and more fluid than previous methods, which could sometimes feel robotic or laggy.
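
Over a WebSocket, client and server exchange small JSON event messages rather than one big request and response. The sketch below shows roughly what such messages might look like; the endpoint URL and event names are illustrative assumptions based on typical event-style WebSocket APIs, not details confirmed in this article.

```python
import json

# Illustrative endpoint; the real URL and event names may differ.
REALTIME_URL = "wss://api.openai.com/v1/realtime"

def make_audio_event(audio_b64: str) -> str:
    """Client -> server: append a base64-encoded chunk of microphone audio."""
    return json.dumps({"type": "input_audio_buffer.append", "audio": audio_b64})

def make_response_request() -> str:
    """Client -> server: ask the model to start responding to what it heard."""
    return json.dumps({"type": "response.create"})
```

Because audio is streamed as it is captured and the reply is streamed back the same way, the model can start answering before the user has even finished a sentence, which is what makes the conversation feel fluid.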

OpenAI is currently testing the Realtime API with select partners. For example, Speak, a language learning app, uses it to power role-playing conversations where users can practice speaking in a foreign language. Another app, Healthify, uses the API to allow users to have natural conversations with Ria, an AI coach who helps with nutrition and fitness advice.

When will it be available?

The Realtime API is now available in public beta to all paid developers. Pricing is token-based and depends on whether the input is text or audio: audio input costs $0.06 per minute and audio output costs $0.24 per minute.
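
Those per-minute rates make it easy to estimate the cost of a conversation. A quick back-of-the-envelope calculation, using only the figures quoted above:

```python
# Cost estimate from the per-minute rates quoted in the article:
# $0.06/min for audio input, $0.24/min for audio output.

AUDIO_IN_PER_MIN = 0.06
AUDIO_OUT_PER_MIN = 0.24

def audio_cost(input_minutes: float, output_minutes: float) -> float:
    """Total audio cost in dollars for a conversation."""
    return input_minutes * AUDIO_IN_PER_MIN + output_minutes * AUDIO_OUT_PER_MIN

# A 10-minute conversation where user and assistant each speak ~5 minutes:
# 5 * 0.06 + 5 * 0.24 = 0.30 + 1.20 = $1.50
```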

In addition to the Realtime API, OpenAI will soon add voice features to the Chat Completions API, letting developers send and receive either voice or text, albeit at slower speeds than real-time conversation.