Spotify patents speech synthesis technology

According to a document submitted to the United States Patent and Trademark Office (USPTO), the artificial intelligence technology uses a two-model system: the first model converts text into an audio representation, and the second adds speech attributes such as emotion, intent, accent and projection.

The technology could be used in a variety of applications, including audiobooks, podcasts and even games. However, it also carries risks.

Spotify's patent is an important step in the development of speech synthesis technology. As technologies like this mature, synthetic voices can become increasingly realistic, to the point of being indistinguishable from a real human voice. This can boost an entire market and spark many discussions about the issues involved.

Spotify's technology also has the potential to be used in more controversial applications, such as creating deepfakes. Deepfakes are videos or audio recordings that have been manipulated to make someone appear to say or do something they never said or did. This technology can be used to spread misinformation or defame people.

The tool was created by a team of scientists and engineers from the company. The system works by feeding text into a synthesizer built with an AI prediction network configured to convert text into speech data. This speech data is then fed to a neural network-based vocoder, or another synthesizer built specifically for vocal data, which adds the speech attributes conveyed in the initial text, such as emotion, intent, projection, rhythm and accent, when generating the speech.

Spotify project schematic (image taken from the document submitted by Spotify)
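
To make the two-stage flow described above more concrete, here is a minimal toy sketch in Python. It is not Spotify's implementation and uses no trained models: the names `SpeechAttributes`, `text_to_speech_data`, `vocoder` and `synthesize` are hypothetical, the "prediction network" is replaced by a simple character-to-pitch mapping, and the attributes are passed in by hand rather than inferred from the text. It only illustrates how text could pass through a first stage that produces intermediate speech data and a second stage that applies attributes such as rhythm and projection while rendering audio.

```python
import math
from dataclasses import dataclass


@dataclass
class SpeechAttributes:
    """Hypothetical container for some of the attributes the article lists."""
    emotion: str = "neutral"  # carried along but unused in this toy example
    rhythm: float = 1.0       # values above 1.0 shorten each frame (faster speech)
    projection: float = 1.0   # loudness multiplier, clamped to [0, 1] below


def text_to_speech_data(text: str) -> list[float]:
    """Stage 1 stand-in: map text to an intermediate representation.

    A real system would use a trained prediction network. Here each
    non-space character simply becomes one "frame" holding a pitch in Hz.
    """
    return [110.0 + (ord(ch) % 32) * 10.0 for ch in text if not ch.isspace()]


def vocoder(frames: list[float], attrs: SpeechAttributes,
            sample_rate: int = 8000, frame_ms: int = 50) -> list[float]:
    """Stage 2 stand-in: render frames to waveform samples, applying attributes."""
    samples_per_frame = max(1, round(sample_rate * frame_ms / 1000 / attrs.rhythm))
    gain = min(max(attrs.projection, 0.0), 1.0)
    waveform = []
    for pitch in frames:
        for n in range(samples_per_frame):
            # One sine tone per frame; real vocoders are far more elaborate.
            waveform.append(gain * math.sin(2 * math.pi * pitch * n / sample_rate))
    return waveform


def synthesize(text: str, attrs: SpeechAttributes) -> list[float]:
    """Chain the two stages: text -> speech data -> waveform."""
    return vocoder(text_to_speech_data(text), attrs)
```

Calling `synthesize("hi there", SpeechAttributes(rhythm=2.0, projection=0.5))` returns a shorter and quieter list of samples than the default attributes would, which mirrors the idea that the second stage shapes how the same text is spoken.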

Spotify's technology is still in development, but the company has plans to use it in its products and services. For example, the technology could be used to create personalized audiobooks for each user, or to generate podcasts that are more engaging and interesting for listeners.

* The text of this article was partially generated by artificial intelligence tools, state-of-the-art language models that assist in the preparation, review, translation and summarization of texts. Text inputs were written by Curto News, and responses from AI tools were used to improve the final content.
It is important to highlight that AI tools are just tools, and the final responsibility for the published content lies with Curto News. By using these tools responsibly and ethically, our objective is to expand communication possibilities and democratize access to quality information. 🤖

About the Author

Uesley Durães
