The Korean startup Nari Labs launched Dia, an open-source text-to-speech model that claims to surpass leading commercial offerings like ElevenLabs and Sesame. It was developed by two university students with zero funding.
Dia Details
- The 1.6-billion-parameter model supports advanced features such as emotional tones, multiple speaker tags, and nonverbal cues like laughter, coughs, and screams.
- The work was inspired by Google's NotebookLM, and Nari used Google's TPU Research Cloud program for compute access.
- Side-by-side comparisons show Dia outperforming ElevenLabs Studio and Sesame CSM-1B in timing, expressiveness, and handling of scripts with nonverbal cues.
- Nari Labs founder Toby Kim said the startup plans to build a consumer app for creating and remixing social content with the model.
Why it matters
Dia is a living testament to Sam Altman's tweet "You can just do things," with two inexperienced college students training an open-source model that rivals the leading voice technology on the market. There's never been a better time to try building something, with AI opening up access to learning like never before.