Voice AI Settings

This section explains how to configure the settings for Voice AI agents, including ASR (Automatic Speech Recognition), TTS (Text-to-Speech), and other voice-related parameters.

Connect Phone Numbers

If no phone numbers are connected to your AI agent, click the “Click here” option in this section.

Clicking it redirects you to the settings where you can connect a phone number to your AI agent. To learn more about configuring phone numbers, refer to the Channels section.
If a number is already connected, its details are displayed in this section.

STT and TTS Settings

Before we proceed with configuring STT (Speech-to-Text) and TTS (Text-to-Speech), it is important to first understand what these terms mean.

STT (Speech-to-Text), also called ASR (Automatic Speech Recognition), is the process of converting spoken input into text, allowing the AI agent to interpret and process it.
TTS (Text-to-Speech) refers to the conversion of text-based output generated by the AI agent into spoken audio, enabling the system to communicate verbally.

Let's start with the ASR configuration. Several ASR providers are available, such as Google and Microsoft. This part also covers how to configure the VAD timeout and Speech timeout for ASR.

Click the dropdown to select an ASR provider from the list, then select the language required for your AI agent.

The following ASR settings are important for your Voice AI agent:
BOT Interruption: Turn this on if you want users to be able to interrupt the AI agent while it is speaking.
Sensitivity level: When interruption is enabled, this parameter sets how readily user speech is detected as an interruption.
Speech Timeout: If there is no voice activity between the user and the AI agent for the configured duration, the system triggers a timeout and prompts the user to confirm that they are still on the call.
VAD Timeout: VAD stands for Voice Activity Detection. This is the duration for which the ASR system waits to detect whether the user continues speaking after a brief pause. When a user pauses momentarily, the ASR remains active for this period before considering the input complete. Typically, the VAD timeout is set between 1000 ms and 1500 ms.
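
To make the difference between the two timeouts concrete, here is a minimal Python sketch of how they might behave. It is an illustration under stated assumptions, not the platform's implementation: the function names, the representation of speech as (start, end) spans in milliseconds, and the default values (1200 ms for the VAD timeout, 5000 ms for the speech timeout) are all hypothetical.

```python
from typing import List, Optional, Tuple


def end_of_utterance_ms(
    speech_segments: List[Tuple[int, int]],
    vad_timeout_ms: int = 1200,  # assumed default, within the typical 1000-1500 ms range
) -> Optional[int]:
    """Return the time (ms) at which the user's input is treated as complete.

    speech_segments holds (start_ms, end_ms) spans where voice activity was
    detected, sorted by start time. A pause shorter than vad_timeout_ms is
    treated as part of the same utterance.
    """
    if not speech_segments:
        return None
    # Look at each pause between consecutive speech spans.
    for (_, end), (next_start, _) in zip(speech_segments, speech_segments[1:]):
        if next_start - end >= vad_timeout_ms:
            # The pause outlasted the VAD timeout: input ended here.
            return end + vad_timeout_ms
    # No long pause in between: input ends after the final span plus the timeout.
    return speech_segments[-1][1] + vad_timeout_ms


def speech_timeout_fired(
    last_activity_ms: int,
    now_ms: int,
    speech_timeout_ms: int = 5000,  # assumed value; set this in the agent settings
) -> bool:
    """Return True when silence has lasted at least the speech timeout."""
    return now_ms - last_activity_ms >= speech_timeout_ms


# A 500 ms pause is shorter than the VAD timeout, so both spans form one input.
short_pause = end_of_utterance_ms([(0, 800), (1300, 2000)])
# A 1700 ms pause exceeds the VAD timeout, so the input ends after the first span.
long_pause = end_of_utterance_ms([(0, 800), (2500, 3000)])
```

In this sketch the VAD timeout decides when a single user input is finished, while the speech timeout decides when prolonged silence should trigger a prompt to the user.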

Next, let's look at the TTS settings and the use of recorded audio. You can use multiple TTS options within one AI agent. To use prerecorded audio files, enable the toggle. Please refer to the video below to see how to configure TTS for your AI agent.

Call Disposition Flow

In certain cases, it is necessary to trigger an API call or perform an action immediately after the call is disconnected. Configure any such post-call activities in a dedicated flow, then select that flow from the dropdown menu.
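
As an illustration of the kind of action a post-call flow might perform, the Python sketch below builds an HTTP POST request that logs a call summary to an external system. Everything specific in it is an assumption made for the example: the function name, the summary fields (`call_id`, `duration_seconds`, `outcome`), and the endpoint URL are hypothetical and do not describe the platform's own API. The flow itself is still configured in the platform.

```python
import json
import urllib.request
from typing import Any, Dict


def build_post_call_request(
    call_summary: Dict[str, Any],
    endpoint_url: str,
) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request carrying the call summary."""
    body = json.dumps(call_summary).encode("utf-8")
    return urllib.request.Request(
        endpoint_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical summary fields and endpoint, for illustration only.
summary = {"call_id": "example-123", "duration_seconds": 42, "outcome": "completed"}
request = build_post_call_request(summary, "https://crm.example.com/api/call-logs")

# Sending is left out so the sketch runs without network access:
# urllib.request.urlopen(request)
```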

Default Replies

When the speech timeout is exceeded, the AI agent delivers predefined messages in sequence. After the first speech timeout, the first default message is spoken. If there is still no voice activity, the second message is triggered. If the silence continues, the agent delivers the last message and then terminates the call.
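
The escalation described above can be sketched in a few lines of Python. This is a simplified model, not the platform's code: the function names and the example messages are hypothetical, and the sketch assumes the timeouts being counted are consecutive.

```python
from typing import List, Optional


def next_default_reply(replies: List[str], timeouts_so_far: int) -> Optional[str]:
    """Return the message for the nth consecutive speech timeout (1-based).

    Returns None if the count is outside the configured list of replies.
    """
    if 1 <= timeouts_so_far <= len(replies):
        return replies[timeouts_so_far - 1]
    return None


def should_end_call(replies: List[str], timeouts_so_far: int) -> bool:
    """Return True once the last configured message has been delivered."""
    return timeouts_so_far >= len(replies)


# Example messages are placeholders; the real ones are set in Default Replies.
replies = [
    "Are you still there?",
    "I'm unable to hear you. Can you hear me?",
    "I'll end the call now. Goodbye.",
]

for count in (1, 2, 3):
    message = next_default_reply(replies, count)
    print(count, message, "-> end call" if should_end_call(replies, count) else "")
```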
