AI Agent Modes

At Salpre AI, we provide two distinct types of AI Call Agents for Salpre Box users: Classic Mode and Voice2Voice Mode. Each mode is designed to meet different needs, giving you flexibility in how your AI interacts with callers.


1. Classic Mode

Classic Mode is our standard and most reliable model, built on the widely adopted STT → LLM → TTS pipeline.

How It Works

  1. Voice Input – The caller speaks, and the Salpre Box captures the audio.
  2. Speech to Text (STT) – The audio is sent to a speech recognition provider (e.g., Google, Deepgram) to convert the voice into text.
  3. Language Understanding (LLM) – The text is passed to a Large Language Model (such as the GPT models behind ChatGPT), which generates a natural, intelligent response.
  4. Text to Speech (TTS) – The response is sent to a TTS provider (e.g., ElevenLabs) to convert it back into realistic human-sounding speech.
  5. Response Delivery – The AI’s spoken response is sent directly back to the caller’s phone.
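The five steps above can be sketched as a chain of three swappable stages. This is a minimal illustration, not Salpre Box's actual implementation. The stage type aliases and the `fake_*` stub providers are hypothetical stand-ins for real STT, LLM, and TTS vendor APIs, and "audio" is represented as plain bytes.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical stage signatures; real providers are network APIs, not local functions.
SpeechToText = Callable[[bytes], str]
LanguageModel = Callable[[str], str]
TextToSpeech = Callable[[str], bytes]


@dataclass
class ClassicPipeline:
    """Chains STT -> LLM -> TTS, mirroring steps 2-4 of Classic Mode."""
    stt: SpeechToText
    llm: LanguageModel
    tts: TextToSpeech

    def handle_turn(self, caller_audio: bytes) -> bytes:
        transcript = self.stt(caller_audio)  # step 2: voice -> text
        reply_text = self.llm(transcript)    # step 3: text -> response text
        return self.tts(reply_text)          # step 4: response text -> voice


# Hypothetical stub providers: here "audio" is just UTF-8 encoded text.
def fake_stt(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_llm(text: str) -> str:
    return f"You said: {text}"


def fake_tts(text: str) -> bytes:
    return text.encode("utf-8")


pipeline = ClassicPipeline(stt=fake_stt, llm=fake_llm, tts=fake_tts)
# Step 1 (captured audio) goes in; step 5 (audio sent back to the caller) comes out.
reply_audio = pipeline.handle_turn(b"What are your opening hours?")
print(reply_audio.decode("utf-8"))  # You said: What are your opening hours?
```

Each stage depends only on the output of the previous one. Changing the STT vendor or the TTS voice therefore means replacing a single callable. The trade-off is that every turn waits for all three stages in sequence.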

Advantages

  • High Accuracy – Uses best-in-class STT and TTS providers.
  • Intelligent Responses – Powered by advanced LLMs for nuanced and context-aware replies.
  • Language Flexibility – Supports a wide range of languages, depending on your chosen providers.
  • Customizable – You can select preferred STT/TTS vendors and configure the AI’s tone and style.
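A caller must be both transcribed and answered in speech. In practice, a Classic agent therefore handles the languages that both the chosen STT provider and the chosen TTS provider support. The sketch below shows this with a hypothetical configuration object. The provider names, field names, and language sets are made-up examples. They are not real vendor coverage or actual Salpre Box settings.

```python
from dataclasses import dataclass

# Hypothetical language coverage per provider (ISO 639-1 codes).
# Illustrative only; real coverage is listed in each vendor's documentation.
STT_LANGUAGES = {
    "provider_a": {"en", "es", "de", "fr", "tr"},
    "provider_b": {"en", "es", "pt"},
}
TTS_LANGUAGES = {
    "voice_vendor_x": {"en", "es", "de", "tr"},
}


@dataclass
class ClassicAgentConfig:
    """Hypothetical Classic Mode settings: vendor choice plus tone/style."""
    stt_provider: str
    tts_provider: str
    tone: str = "friendly"  # stored for prompt styling; not used in this sketch

    def supported_languages(self) -> set[str]:
        # A language works end to end only if it can be both transcribed and spoken.
        return STT_LANGUAGES[self.stt_provider] & TTS_LANGUAGES[self.tts_provider]


config = ClassicAgentConfig(stt_provider="provider_a", tts_provider="voice_vendor_x", tone="formal")
print(sorted(config.supported_languages()))  # ['de', 'en', 'es', 'tr']
```

In this example, switching to `provider_b` would shrink the set to English and Spanish. This is why Classic Mode's language support depends on the provider pairing.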

Ideal For

  • Businesses needing highly accurate and natural conversations.
  • Customer support and sales where context and reasoning matter most.
  • Multilingual teams who require broad language support.

2. Voice2Voice Mode

Voice2Voice Mode is our next-generation option, leveraging the latest direct voice-to-voice AI models such as OpenAI Voice Agents and ElevenLabs Voice Models.

How It Works

Unlike Classic Mode, Voice2Voice skips the separate speech-to-text and text-to-speech steps. A single model takes the caller's audio as input and responds directly in voice. This removes hand-offs between services, making responses faster and more natural in real-time calls.
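The latency difference comes from the number of sequential steps per turn. In Classic Mode each turn waits for three services in a row, while Voice2Voice makes one model call. The sketch below models this with made-up timings, assuming every stage runs strictly one after another. Real systems often stream partial results between stages, which can narrow the gap.

```python
# Illustrative per-turn latency model. All millisecond figures are hypothetical
# placeholders, not measurements of Salpre Box or any specific provider.
CLASSIC_STAGES_MS = {
    "speech_to_text": 300,
    "language_model": 500,
    "text_to_speech": 250,
}
VOICE2VOICE_STAGES_MS = {
    "speech_to_speech_model": 400,
}
NETWORK_HOP_MS = 50  # hypothetical cost of each hand-off to a service


def turn_latency_ms(stages: dict[str, int], hop_ms: int = NETWORK_HOP_MS) -> int:
    """Sequential model: each stage waits for the previous one, plus one hop per stage."""
    return sum(stages.values()) + hop_ms * len(stages)


classic = turn_latency_ms(CLASSIC_STAGES_MS)
v2v = turn_latency_ms(VOICE2VOICE_STAGES_MS)
print(classic, v2v)  # 1200 450
```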

Advantages

  • Ultra-Low Latency – Responses arrive almost instantly because there is no separate transcription or speech-synthesis step between the caller and the model.
  • Natural Voice Flow – Feels closer to speaking with a human, thanks to direct audio processing.
  • Future-Proof – We believe Voice2Voice is the future of real-time AI conversations.

Current Limitations

  • Lower Intelligence – Reasoning ability is currently more limited than in Classic Mode, which relies on a full text-based LLM.
  • Language Restrictions – Currently supports fewer languages than Classic Mode.
  • Evolving Technology – Still under rapid development; capabilities improve with each release.

Ideal For

  • Businesses prioritizing speed and real-time flow over deep reasoning.
  • Use cases like quick greetings, confirmations, or simple FAQ handling.
  • Teams interested in adopting cutting-edge AI voice technologies early.

Choosing the Right Mode

Feature | Classic Mode | Voice2Voice Mode
--- | --- | ---
Latency | Slight delay (due to the STT → LLM → TTS pipeline) | Near-instant
Intelligence | Advanced, context-aware, customizable | Basic to medium
Language Support | Wide (depends on STT/TTS providers) | Limited (expanding)
Voice Quality | Natural, customizable voices | Very natural, human-like
Best For | Customer support, sales, multilingual use | Quick interactions, cutting-edge experiments
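The comparison table can also be read as a simple decision rule. The function below is a hypothetical helper, not a Salpre Box feature. It prefers Classic Mode whenever deep reasoning is needed or the caller's language is outside a made-up Voice2Voice language list. Otherwise it picks Voice2Voice when low latency matters most, and falls back to the more mature Classic Mode.

```python
from enum import Enum

# Hypothetical list; the real set of Voice2Voice languages is limited and changes over time.
VOICE2VOICE_LANGUAGES = {"en", "es"}


class AgentMode(Enum):
    CLASSIC = "Classic Mode"
    VOICE2VOICE = "Voice2Voice Mode"


def choose_mode(language: str, needs_deep_reasoning: bool, latency_critical: bool) -> AgentMode:
    """Apply the comparison table as a rule of thumb."""
    if needs_deep_reasoning:
        # Customer support and sales: context and reasoning matter most.
        return AgentMode.CLASSIC
    if language not in VOICE2VOICE_LANGUAGES:
        # Classic Mode's language coverage is wider (depends on STT/TTS providers).
        return AgentMode.CLASSIC
    if latency_critical:
        # Quick greetings, confirmations, simple FAQ handling.
        return AgentMode.VOICE2VOICE
    return AgentMode.CLASSIC  # default to the more mature option


print(choose_mode("en", needs_deep_reasoning=False, latency_critical=True).value)  # Voice2Voice Mode
print(choose_mode("de", needs_deep_reasoning=False, latency_critical=True).value)  # Classic Mode
```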

Our Commitment

At Salpre AI, we are continuously improving both agent types. We regularly integrate new STT, TTS, and Voice2Voice models into Salpre Box, ensuring that our customers always have access to the most advanced voice AI technologies available.

We firmly believe that while Classic Mode offers mature reliability today, Voice2Voice will shape the future of AI-powered calls.
