Time To Get Loud With Voice-Driven Experience: How Leaders and Innovators Are Advancing Voice With Amazon Nova 2 Sonic
Let’s Talk: Responding to the Steady Call for Conversation
Talking beats typing. At least that’s what today’s digital behavior patterns reveal. People prefer talking to their devices, from their phone to their refrigerator, no matter how easy applications make typing out thoughts. Contributing to this behavior is how the human brain works when ingesting, analyzing, and synthesizing thought. Study after study shows that humans think faster while talking than while typing. This has special impact in the world of experience delivery, where people in search of solutions to problems want thoughtful conversations, not text boxes.
For the most part, this demand signal has been answered with one-sided, bot-driven conversations. Thankfully, those days are changing—quickly. With the advent of large language models (LLMs) and the age of GenAI, conversation and dialogue can be ingested and understood at deep and meaningful levels, generating voice-driven responses in the real time of a natural conversation. Responses are generated in seconds, empowering a true exchange of ideas. The very voices generated include an expanding landscape of languages that sound more authentic than ever and are capable of showing emotion, empathy, formality, and humor. Voice is an invitation for collaboration, and, increasingly, voice AI is building a trusted relationship.
One of the fastest innovators in next-generation voice experience is Amazon’s Nova 2 Sonic model, a speech-to-speech foundation model that brings speech and language understanding together with voice generation capabilities. Nova Sonic was introduced in April 2025, followed by the release of Nova 2 Sonic in December 2025. The speech-to-speech model generates humanlike voice conversations that can be applied to an increasingly broad set of AI-powered applications. In a short period of time, Nova 2 Sonic has proven that although voice is key, speed is the difference.
Early voice models required multiple sequential steps, or cascading pipelines, to deliver conversational experiences: one step to ingest speech, another to transcribe, and yet another to analyze the transcription. The path to conversation might include multiple applications, models, and steps just to document sentiment, tone, or intent. Nova 2 Sonic, by comparison, unifies these processes into a single model that handles speech recognition and voice generation directly and is trained to comprehend and respond in kind to prosody and paralinguistic features such as pauses, interruptions, emotion, tone, style, and requests. It is available via the Amazon Bedrock platform and enables developers to build interactive and responsive voice interfaces that are multilingual and capable of function calling and tool use.
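To make the architectural contrast concrete, the following is a minimal Python sketch. It does not describe how Nova 2 Sonic or any cascaded product is actually implemented. The `Stage` class, the helper functions, the stage names, and every latency figure are hypothetical placeholders chosen only for illustration; none are measurements. The sketch shows one structural point: when steps run one after another, their delays add up and every boundary between steps is a handoff, whereas a single-model design has one stage and no handoffs.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Stage:
    """One processing step in a voice pipeline (hypothetical)."""
    name: str
    latency_ms: float  # illustrative placeholder, NOT a measured value


def turn_latency_ms(stages: Sequence[Stage]) -> float:
    """Stages run sequentially, so the per-turn delay is their sum."""
    return sum(stage.latency_ms for stage in stages)


def handoffs(stages: Sequence[Stage]) -> int:
    """Number of boundaries where one stage passes output to the next."""
    return max(len(stages) - 1, 0)


# Cascaded design: each step waits for the previous one to finish.
cascaded = [
    Stage("ingest speech", 100.0),
    Stage("transcribe", 300.0),
    Stage("analyze transcription", 400.0),
    Stage("generate voice", 200.0),
]

# Unified design: a single speech-to-speech stage handles the whole turn.
unified = [Stage("speech-to-speech model", 500.0)]

for label, pipeline in (("cascaded", cascaded), ("unified", unified)):
    print(f"{label}: {turn_latency_ms(pipeline):.0f} ms, "
          f"{handoffs(pipeline)} handoffs")
```

Replacing the placeholder figures with numbers measured in a real deployment would turn this into a rough per-turn latency budget.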
As real-time, contextual bidirectional voice conversations become central to the modern experience, use cases have multiplied quickly and far beyond the early applications of customer support and service. Outbound marketing campaigns can include personalized proactive voice outreach. Language learning platforms can provide conversational practice with nativelike pronunciation and real-time feedback. Employees can ask about health benefit updates or corporate policies and procedures in the middle of the night anywhere around the world. Field service technicians can call in and speak with their own contextually relevant personalized assistant, at the ready to synthesize and summarize job completion details on the go. Now, when it comes to conversations, voice is possible and readily available.
Amazon Nova 2 Sonic has earned a place on many developers’ foundation model short lists, thanks to its capacity to deliver low-latency, cost-efficient, and high-performing results in a single-model architecture. Customers have told Constellation Research that one of the most distinct differentiators is that the model is designed for complex conversations yet is available without the burden of overwhelming cost. This has led to the creation of fluid, bidirectional conversations that don’t just sound human but also interact and understand when humans interrupt, cross-talk, or address multiple threads of need.
While innovative applications continue to emerge, interviews with three customers revealed key strategies and best practices worth noting for developers in the discovery phase of introducing voice AI to their applications.
Taking the Lead With Nova 2 Sonic: 3 Key Best Practices
1. Seek Out Speed With Model Compression and Agility
Crescendo is an award-winning AI-native, people-first CX solution deployed primarily in contact centers. Its team of AI experts, who have years of hands-on expertise with LLMs, automation, and infrastructure design, set out not merely to deploy voice AI capabilities but, rather, to rethink the efficiency, accuracy, and speed of a new construct for voice-first experiences.
Crescendo’s approach focuses on differentiation from competitor applications by intentionally rejecting old architectures or intent-based chatbot models that have proven to be too rigid and unintentionally fragile. When tasked with complex workflows or dynamic conversations, these old workflows can break under the pressure of a customer’s natural pace and flow of conversation.
“We are in a very heated market right now,” notes Tod Famous, chief product officer at Crescendo. “Delivering engagements using the cheapest model might win for some. But at Crescendo, cheap hasn’t been our interest.” Instead, speed, with precision, is the goal.
The accuracy of Nova Sonic was just the starting point: The agility of the model became an opportunity to differentiate and exceed expectations in an already hype-weary market. “Speed is what delivers the great experience,” says Famous. “People who are in a conversation just start a banter. When you call someone, it is human nature to get into the vibe of a conversation. When a call is slow or feels ‘robotic,’ you do not want to be there. Satisfaction plummets.”
For the people-first strategy of Crescendo, the legacy drag of traditional models and architectures that required as many as three or four vendor solutions also brought unwanted unpredictability across the conversational value chain. “There is a noticeable delay, a latency when building a CX voice assistant,” says Famous. “Until now, developers have had to find ways to compress that latency or just accept that slow was the standard. Nova Sonic delivers the type of conversational cadence and reactiveness that you’d expect from a person. It’s noticeably better.”
2. Trusted Collaboration Isn’t a Request; It’s a Mandate
EBSCO Information Services is a leading provider of research databases and a major provider of library technologies. It is one of the largest providers of ebooks and clinical decision solutions for universities, colleges, K–12 schools, hospitals, corporations, governments, and public libraries worldwide. Research, subscription management, trust, reliability, and unwavering support of people sit at the heart of EBSCO’s mission to be a source of trusted research content. A team dedicated to advancing content innovation quickly identified an opportunity to integrate AI into the EBSCO experience. Working with Amazon Web Services (AWS) led to the deployment of a trusted conversational research assistant, rather than a chatbot, that serves as a collaborator.
According to Maria Riuoux Nagy, director of Strategic Proprietary Content Innovation at EBSCO, content creation comes with a deep and binding responsibility when you are the builder. For EBSCO, that responsibility is not negotiable. As a leading provider and curator of content, EBSCO could not allow any gray area in the validity, accuracy, and authenticity of AI-generated responses. “When you’re the authority on content, the expert in knowing, you are in the driver’s seat for AI. You have to validate AI’s output. That is what builds trust,” says Nagy.
The challenge presented to the Nova Sonic model: Deliver conversations that make users feel that they are collaborating with AI, not just asking AI a question. “Almost immediately, Nova Sonic handled dialogue differently,” says Nagy. “The first time I used it, I didn’t know what I wanted to say and I was hesitating. Nova Sonic just learned. It understood the pauses humans make and the interruptions of life like the dog barking or when Prime is being delivered. Instead of disconnecting or ignoring those moments, Nova Sonic kept the conversation’s context and content intact.”
EBSCO’s assistant is now differentiating the content experience via trusted conversational explorations with a collaborator, partner, and navigator. “This makes the difference between barking a command or typing a prompt and feeling that through a natural conversation, a customer is collaborating with the AI assistant,” Nagy says.
3. Don’t Sacrifice Accuracy for Cost Efficiency: Demand Both
In the world of healthcare, conversations present a major hurdle and roadblock: time. Healthcare providers, especially the nurses and doctors directly engaging with patients, do not have the time to take detailed summaries of discussions while also remaining present and focused on the patient in front of them. Voice transcriptions can work but are often time-consuming, inaccurate, and cost-prohibitive. Challenges such as stress, exhaustion, and burnout in medicine are real.
This is why Blake Anderson, MD, data scientist, CEO/CTO, and cofounder of Switchboard, MD, started to look at using the Nova Sonic model as a means to deliver a highly accurate, secure conversational interface to address specific health provider needs. Switchboard, MD is an agentic-AI-powered platform specifically designed to manage high volumes of patient communications in a secure environment. Core to Switchboard, MD’s vision is the belief that the industry needs to restore a “human connection” in medicine.
Anderson was no stranger to leveraging customer voice, handling real-time call flows with speech-to-text transcription tools. However, as call volume and spend in transcriptions increased, the question of cost versus quality emerged. He considered other open source models that proved to be more cost-effective but determined that accuracy out of the box failed the critical litmus test of trust. “Cheap but wrong is useless,” he says.
The initial goal was real-time transcription and audio ingest, but those needs quickly evolved into a more focused speech-to-speech AI capability to “level up” the offering. Anderson soon discovered that Nova Sonic’s transcriptions were as good as its voice conversations, delivering an immediate transcription of both the incoming audio and the generated outgoing speech. The transcriptions were highly accurate, and the speed of delivery could drive performance when that data was used to further train and enhance actions.
Speech-to-speech interactions became a foundation for a broader intelligent voice system. Nova Sonic became the core of a speech-driven virtual assistant within the office environment, automating specific repetitive tasks and freeing office team time. But it also delivered a trigger for intelligent workflows that drive a more personalized and empathetic patient experience. Switchboard, MD was able to develop a solution that would not trade cost efficiency for speed or sacrifice accuracy for experience.
Conclusion
The common theme across all three organizations interviewed is that not just any conversation would satisfy a customer’s next-generation expectation. The baseline of expectation was to have human-realistic, natural, flowing conversations without sacrificing speed, performance, accuracy, or cost. Each organization made subtle shifts in interface, collection, data understanding, and process to deliver unique and differentiated outcomes.
For organizations looking to follow the lead of these innovators, three key questions emerge to better confirm whether driving speech-to-speech AI experiences forward is right for the brand.
1. Do voice conversations play a role in shaping and advancing relationships?
Conversations are central to establishing and growing relationships. Trust is also critical to establishing durability in those relationships. Can you afford to lose trust by delivering an inaccurate response or a slow and irrelevant conversation? Consider where and how trust can be built or broken. The three customers interviewed each noted that model trust was tested at every stage, from implementation to ongoing utilization, and at every turn, Nova Sonic delivered. From the capacity to test, retest, and validate output to accelerating exploration and experimentation, the Nova Sonic model enabled them to meet customers’ expectations, especially when developers and customers expected more precision.
2. What will serve your enterprise: speed or velocity?
There is a subtle yet critical difference between building for speed and planning for velocity. Speed simply requires moving faster. Velocity requires direction and purpose. As these customers noted, speed was often a requirement, but what differentiated their experience with the Nova Sonic model was the combination of speed and performance, with the quality of output accelerating the impact of their business outcomes.
3. Does the business have a vision of what AI conversations can deliver and the culture to explore beyond the initial use case?
Each Nova Sonic customer Constellation Research spoke with had at least two more concepts of new or more innovative applications where speech and voice could transform the mundane into the exceptional. Although obvious use cases such as virtual assistants, contact center virtual agents, and research assistants became quick starting points and early wins for AI, the reality of speech-to-speech foundation models is that use cases may be bound only by builder creativity. The more customers and users lean into their instinct to talk rather than type, the more new voice-driven interfaces will emerge, creating an opportunity to shift from passively listening to voices to engaging users with AI that talks back.
What stands out most in these customer conversations is not the technical advances in the Nova Sonic model offering and how this technology can be integrated and leveraged. Instead, it is the creativity and innovation that customers have unlocked by daring to question if, where, and how a person might be ready to have a conversation. In a time when business can be won or lost depending on whether the experience is positive or negative, leading with voice will be a differentiator—but not for long. Soon conversations that feel more like collaborations between a business and its customers, employees, partners, and markets will be an expectation. Organizations that are ready to embrace the change won’t need to break the bank or compromise on experience expectations if they carefully and thoughtfully embrace these new model innovations.