With zero funding, students create SOTA speech AI that challenges market leaders

Korean startup Nari Labs has launched Dia, an open-source text-to-speech model that claims to outperform leading commercial offerings like ElevenLabs and Sesame. It was developed by two university students with zero funding.

Dia details
  • The 1.6-billion-parameter model supports advanced features such as emotional tones, multiple speaker tags, and nonverbal cues like laughter, coughs, and screams.
  • The work was inspired by Google's NotebookLM, and Nari also used Google's TPU Research Cloud program for access to compute.
  • Side-by-side tests show Dia outperforming ElevenLabs Studio and Sesame CSM-1B in timing, expressiveness, and handling of nonverbal scripts.
  • Nari Labs founder Toby Kim said the startup plans to develop a consumer app, built on the model, focused on creating and remixing social content.
Why it matters

Dia is a living testament to Sam Altman's tweet “You can just do things,” with two inexperienced college students training an open-source model that rivals the leading voice technology on the market. There has never been a better time to try building something, with AI opening up access to learning like never before.
