• Sesame introduces its base AI model, powering the viral virtual assistant Maya with enhanced intelligence and adaptability.
• The model advances personalization, making AI interactions more intuitive and user-centric.
• Analysts predict this development will transform AI-driven assistance, sparking industry-wide innovation.

U.S.-based AI voice assistant developer Sesame has recently unveiled its base AI model, the technological core powering its widely acclaimed virtual assistant, Maya.

The model, named CSM-1B, produces "residual vector quantization (RVQ) audio codes" from both text and audio inputs, as described by Sesame on the AI development platform Hugging Face.

RVQ is a technique for encoding audio into discrete tokens called codes. It is used in several recent AI audio technologies, including Google’s SoundStream and Meta’s EnCodec.
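To make the idea of "codes" concrete, here is a minimal sketch of residual vector quantization in plain Python. It is not Sesame's implementation. The two tiny codebooks, the sample frame, and the function names are all hypothetical and chosen only for illustration, whereas real codecs learn much larger codebooks from data. Each stage picks the nearest codebook entry to whatever the previous stages left unexplained, and the list of chosen indices is the code for that audio frame.

```python
from typing import List, Sequence

Vector = Sequence[float]
Codebook = Sequence[Vector]


def nearest(codebook: Codebook, x: Vector) -> int:
    """Return the index of the codebook entry closest to x (squared Euclidean distance)."""
    return min(
        range(len(codebook)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(codebook[i], x)),
    )


def rvq_encode(x: Vector, codebooks: Sequence[Codebook]) -> List[int]:
    """Quantize x stage by stage; each stage encodes the residual left by earlier stages."""
    residual = list(x)
    codes = []
    for codebook in codebooks:
        idx = nearest(codebook, residual)
        codes.append(idx)
        # Subtract the chosen entry so the next stage only sees the remaining error.
        residual = [r - c for r, c in zip(residual, codebook[idx])]
    return codes


def rvq_decode(codes: Sequence[int], codebooks: Sequence[Codebook]) -> List[float]:
    """Reconstruct an approximation by summing the selected entry from every stage."""
    out = [0.0] * len(codebooks[0][0])
    for idx, codebook in zip(codes, codebooks):
        out = [o + c for o, c in zip(out, codebook[idx])]
    return out


# Toy, hand-written codebooks (hypothetical values, not taken from any real codec).
CODEBOOKS = [
    [(0.0, 0.0), (1.0, 1.0), (-1.0, 1.0), (1.0, -1.0)],      # coarse stage
    [(0.0, 0.0), (0.25, 0.0), (0.0, 0.25), (-0.25, -0.25)],  # fine stage
]

frame = (1.2, 0.9)  # stands in for one frame of audio encoder output
codes = rvq_encode(frame, CODEBOOKS)
approx = rvq_decode(codes, CODEBOOKS)
print(codes, approx)  # prints: [1, 1] [1.25, 1.0]
```

In this toy case the frame is stored as just two small integers, and adding the second stage moves the reconstruction from (1.0, 1.0) to (1.25, 1.0), closer to the original (1.2, 0.9).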

“The model open-sourced here is a base generation model,” Sesame writes in CSM-1B’s Hugging Face and GitHub repositories.
“It is capable of producing a variety of voices, but it has not been fine-tuned on any specific voice […] The model has some capacity for non-English languages due to data contamination in the training data, but it likely won’t do well.”

The model is designed to offer a more personalized, context-aware, and natural conversational experience.

The model’s key advancements include deep personalization, dynamic contextual understanding, and real-time learning capabilities.

Unlike traditional AI assistants, Sesame’s technology enables Maya to continuously refine its responses based on user preferences and situational context, delivering a more intuitive experience.

Sesame, co-founded by Oculus co-creator Brendan Iribe, gained viral attention in late February for its assistant technology, which comes close to crossing the uncanny valley.

Its assistants, Maya and Miles, incorporate natural pauses and speech disfluencies, and they can be interrupted mid-conversation, similar to OpenAI’s Voice Mode.

The model uses machine learning to improve with each user interaction, making the assistant more adaptive over time.


Edited by Harshajit Sarmah