Text-to-speech has come a long way since the robotic voice developed for Stephen Hawking in 1986, a peculiar voice which he kept until his death in March 2018, aged 76. That computer-generated voice was created by MIT engineer Dennis Klatt and based on Klatt's own voice. Hawking's later communication system also used predictive-text algorithms developed by SwiftKey, a British company later acquired by Microsoft.

You can hear that voice in the video of Stephen Hawking's final public lecture (A Brief History Of Time). Despite the advances in text-to-speech synthesis, Stephen Hawking refused to upgrade his voice. The original 1980s sound had become part of his public persona.

What does a synthetic voice sound like today?

Since I'm passionate about the possibilities of AI-assisted creative automation, I tested the three leading text-to-speech engines: Amazon Polly, Microsoft Azure Cognitive Services and Google Cloud. The purpose of this article is to give you my honest opinion about the way they render the human voice based on the same text prompt.

For the sake of this quick experiment, I will use a male voice for all three services. There will obviously be differences due to the tone of voice but I've tried to pick the best example for each provider. The text prompt will be a poem by Thomas Hardy: "She Opened The Door". As a bonus, we'll conclude the review with a human recording.

You can test Google's text-to-speech offering on its public test page. The most advanced synthetic voices at Google are named WaveNet voices, powered by machine learning algorithms. I rendered the test with WaveNet Voice D. The text is properly read, with no obvious mistakes, but it lacks emotion.

Bear in mind that it's impossible to download the MP3 rendering from the public test page. I've used Audio Hijack to record the output of my browser.

Let's see how Amazon Polly performs given the same poetic prompt. State-of-the-art AI synthetic voices are called Neural at Amazon. I've hired Matthew to perform Thomas Hardy's poem "She Opened The Door". I prefer this voice to the one tested at Google but that's subjective (the tone of voice is different). In terms of performance, I would say Matthew is slightly more engaged than his Google counterpart but he still lacks the emotion of a human actor.

Note: you can download the MP3 rendition straight from the Amazon Polly public console.

Let's see how Microsoft compares to its competitors. You can test Microsoft's text-to-speech offering at Microsoft Azure Cognitive Services Text-To-Speech. The most advanced voices are also named "neural" at Microsoft. I've pasted each poem section as a single line in the Azure console, to avoid long pauses between each line.

Guy, the voice I tested, isn't an accomplished actor but in my opinion he performs marginally better than his silicon colleagues. A glaring issue is that he doesn't understand that he's reading a poem, a feat which requires much more emotion than reciting a training manual. Guy has a second voice option (Newscast) but it didn't really fit the use case. I also used Audio Hijack to capture the MP3 recording.

Can you control the expressivity of synthetic voices?

On all services, using the API or the console, you can add SSML tags to your texts to insert pauses and other pronunciation instructions, which can in turn improve the expressivity of the performance. However, doing this requires time-consuming manual inputs.

You can also create your own custom voice, based on a voice talent's recordings, to develop a unique rendering, powered by machine learning. This also requires some time and fine-tuning.

What's the price of text-to-speech voice synthesis?

Compared to a human reader, it's of course very cheap.
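
To give a rough idea of what the SSML tags mentioned above look like in practice, here is a minimal Python sketch that wraps lines of a poem in SSML and adds a pause after each line. It is only an illustration under stated assumptions: the function name `poem_to_ssml` and the sample lines are hypothetical placeholders (not the actual poem text), and the `speak`, `prosody` and `break` tags come from the W3C SSML specification. Each provider supports its own subset of SSML; Azure, for instance, generally expects extra attributes on `speak` and a `voice` element, so check the provider's documentation before sending this markup to a real service.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def poem_to_ssml(lines, pause_ms=400, rate="slow"):
    """Wrap poem lines in SSML, adding a pause after each line.

    Hypothetical helper for illustration; it only builds a string
    and does not call any cloud text-to-speech API.
    """
    parts = []
    for line in lines:
        # Escape &, < and > so the text cannot break the XML markup.
        parts.append(f'{escape(line)}<break time="{pause_ms}ms"/>')
    body = " ".join(parts)
    return f'<speak><prosody rate="{rate}">{body}</prosody></speak>'


# Placeholder lines, not the actual poem text.
sample = ["First line of the poem,", "Second line & its echo."]
ssml = poem_to_ssml(sample)

# Parsing the result raises an error if the markup is not well-formed XML.
ET.fromstring(ssml)
print(ssml)
```

The resulting string can be pasted into a provider's console or passed to its API as SSML input instead of plain text. Tuning the pause length and speaking rate line by line is exactly the kind of manual work that makes this approach time-consuming.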