Amazon held an event to highlight new devices and a large language model (LLM) and generative AI tutorial broke out.
In a sign that products are becoming more about algorithms and generative AI than hardware design, the star of Amazon's Devices and Services event was an LLM that makes Alexa more conversational, absorbs real-time information, and reliably makes the right API calls. The Amazon LLM is also proactive and can use the company's Vision ID to know when you're about to talk and to read body language. Amazon's Echo devices are a consumer use case with an enterprise LLM behind it.
Simply put, this new LLM should negate the need to repeatedly say "Alexa." Interactions should also become less transactional. Dave Limp, outgoing chief of Amazon's devices unit, said the new proprietary LLM is built on five foundational capabilities.
- Conversation. Amazon has drawn on nine years of data about what makes a conversation. Conversations are built on words, body language, eye contact and gestures, which is why the model is built so Alexa can recognize those cues on screened devices.
- Real-world applications. Alexa isn't just a chat box in a browser. As a result, it has to interact with APIs and make correct choices.
- Personalization. An LLM in the home must be personalized to you and your family.
- Personality. A more conversational Alexa will be able to offer opinions to go along with its jokes.
- Trust. Performance with privacy matters.
Limp's demo with Alexa and its new LLM illustrated some key upgrades. Alexa was able to stop and start a conversation and remember previous context. Amazon said the revamped Alexa will roll out early next year.
Rohit Prasad, senior vice president and head scientist of Amazon Artificial General Intelligence, said what makes Alexa's LLM unique is that "it doesn't just tell you things but does things."
As a result, Amazon tuned the LLM for voice as well as multiple points of context. Real-time connections to APIs will make Alexa more natural and better integrated into the smart home. Key points about Alexa's new LLM upgrades:
- Alexa's automatic-speech recognition (ASR) system has been revamped with new machine learning models, algorithms and hardware, and is moving to a large text-to-speech (LTTS) model trained on thousands of hours of audio data instead of the hundreds of hours used previously.
- The ASR model is built on a multibillion-parameter model trained on short, goal-oriented speech as well as long-form conversations. Amazon said the large ASR model will move from CPUs to hardware-accelerated processing and use frames of input speech based on 30-millisecond snapshots of the speech signal frequency spectrum.
- Alexa's new speech-to-speech model is LLM-based and produces output directly from input speech. The move will enable Alexa to laugh and use other conversational devices.
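The 30-millisecond framing mentioned above is a standard ASR front-end step, and can be sketched in a few lines. The 16 kHz sample rate is an assumption (it is typical for speech models, but Amazon hasn't stated it); at that rate a 30 ms frame holds 480 samples.

```python
# Sketch of splitting an audio signal into 30 ms frames for an ASR front end.
# Assumes a 16 kHz sample rate; real pipelines usually use overlapping frames.
def frame_signal(samples, sample_rate=16000, frame_ms=30):
    """Split a 1-D sequence of audio samples into non-overlapping 30 ms frames."""
    frame_len = int(sample_rate * frame_ms / 1000)  # 480 samples at 16 kHz
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, frame_len)]


one_second = [0.0] * 16000           # one second of silence at 16 kHz
frames = frame_signal(one_second)    # 33 full 30 ms frames of 480 samples each
```

In practice each frame would then be transformed into a frequency-spectrum representation (the "snapshots of the speech signal frequency spectrum" Amazon describes), and production front ends typically overlap frames with a shorter hop rather than tiling them edge to edge as this sketch does.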
While the models stole the show, Amazon also launched a series of features, including automatic lighting, call translation and emergency assist services, to go along with new hardware: the Echo Show 8, Echo Hub, and new Fire TV Sticks and Fire TVs with generative AI updates.