Microsoft

Speech Service

  1. Improve number reading for Switzerland

    Hello,
    The fr-CH voice, French (Switzerland), Male, "fr-CH-Guillaume", does not read numbers the way we do in Switzerland; it reads them as in France.
    Specifically, 70 should be read as "septante" rather than "soixante-", and 90 should be read as "nonante" rather than "quatre-vingt-". In addition, depending on the region, 80 is read as "huitante". Relative to most other languages, this should be the norm.
    Best regards,

    1 vote
    0 comments  ·  Text to Speech
  2. Does TTS support the Speech SDK when using containers?

    Hi, can we use the Speech SDK to access the TTS service in a container?
    Why, when using containers, does STT only support the SDK while TTS only supports the REST API?

    In our tests, the REST API seems slower than the SDK. Why? Thanks.

    1 vote
    0 comments  ·  Text to Speech
  3. Mbps is read as Megabytes per second instead of Megabits per second

    MBps and Mbps are very different things. MBps is 8 times larger than Mbps, so Aria needs to know the difference between the two.
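
Until the voice distinguishes the two units itself, one possible workaround is to expand them before synthesis with the standard SSML `sub` element. The sketch below is only an illustration: the `UNIT_ALIASES` table and the `expand_units` helper are hypothetical names, not part of any Microsoft API, and the matching is deliberately case-sensitive so that `Mbps` and `MBps` stay distinct.

```python
import re
from xml.sax.saxutils import escape, quoteattr

# Hypothetical alias table; keys are matched case-sensitively on purpose.
UNIT_ALIASES = {
    "Mbps": "megabits per second",
    "MBps": "megabytes per second",
}

# Match a known unit only when it is not embedded in a longer word.
_UNIT_RE = re.compile(
    r"(?<![A-Za-z])(" + "|".join(map(re.escape, UNIT_ALIASES)) + r")(?![A-Za-z])"
)


def expand_units(text: str) -> str:
    """Return an SSML fragment with known units wrapped in <sub alias=...>."""
    parts = []
    last = 0
    for m in _UNIT_RE.finditer(text):
        # Escape the plain text between matches so it is valid inside SSML.
        parts.append(escape(text[last:m.start()]))
        unit = m.group(1)
        parts.append(f"<sub alias={quoteattr(UNIT_ALIASES[unit])}>{escape(unit)}</sub>")
        last = m.end()
    parts.append(escape(text[last:]))
    return "".join(parts)


print(expand_units("The link runs at 100 Mbps, about 12.5 MBps."))
```

The returned fragment still has to be placed inside a `speak`/`voice` envelope before it is sent to the service.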

    1 vote
    0 comments  ·  Text to Speech
  4. This chapter has many issues with Aria Neural TTS

    This textbook (Computer Security Handbook by Seymour Bosworth et al., Chapter 33) seems to cause countless errors using the Aria Neural voice. (attached txt and pdf were trimmed to respect the copyright of the author)

    It messes up the chapter markers, reading "33.1.1" as "January First, Thirty Three" (and likewise for all the other section markers).

    802.11 is pronounced "eight hundred and two point one one"

    SSIDs is pronounced "sids"

    BSSIDs is pronounced "bsids"

    2Mb/s (as well as other Mb/s numbers) is pronounced "two em bee slash ess" but should be pronounced "two megabits per second".

    LAN is…

    1 vote
    0 comments  ·  Text to Speech
  5. 802.11 is mispronounced

    802.11 as a wireless standard should be pronounced "eight-o-two-eleven" instead of "eight hundred and two point one one"

    1 vote
    0 comments  ·  Text to Speech
  6. Japanese sentence is mispronounced

    このレストランではタバコを吸ってはいけません。is mispronounced. The はいけません part should be pronounced waikemasen and not haikemasen. This is with the Nanami Neural voice.

    1 vote
    0 comments  ·  Text to Speech
  7. No SSML restrictions on creating TTS audio tuning files

    Please remove the restriction of one piece of SSML per SSML file, so that multiple tunings (breaks, pronunciation, intonation, etc.) can be applied in Audio Content Creation.
    Source: "Improve synthesis with the Audio Content Creation tool" > "Create an audio tuning file":
    "SSML restrictions: Each SSML file can only contain a single piece of SSML."
    (https://docs.microsoft.com/en-US/azure/cognitive-services/speech-service/how-to-audio-content-creation)

    1 vote
    0 comments  ·  Text to Speech
  8. BUG - The TTS engine (in English) doesn't pronounce numbers after words well: COVID-19, NASDAQ 100, S&P 500, NIKKEI 225

    This is a bug, not a feature. "COVID-19" sounds like "covid 19". The same happens with NASDAQ 100, S&P 500, etc.
    This affects the neural voices.

    2 votes
    0 comments  ·  Text to Speech
  9. Move Irish text-to-IPA phonetic conversion upstream

    Irish (Gaeilge) TTS works if correct IPA syntax is used when sending the TTS request with the IPA option included.

    Example:

    Ba chuid mhór den togra seo teacht ar ábhar a bheadh chomh maith nó níos fearr ná an fhuinseog agus an t-ábhar a bheith níos inmharthana.

    Translated to IPA:

    "bˠɑː xɪdʲ woːˈr dʲəɴʲ tʲɔɡˈrə ʃoː tʃæxt eˈr ɑːwəˈr ɑː vʲəh xɔv mˠɑːh ɴˠoː ɴʲiːsˠ fʲæˈrr ɴˠɑː en ɪɴʲʃoːɡ əɡʊsˠ en tʲɑːwəˈr ɑː vʲeɪh ɴʲiːsˠ ɪɴʲwəˈrhɑːɴˠɑː"

    The IPA (Irish) text can be read and spoken accurately by a neural en-GB or en-US voice.

    I have attached a file that converts…
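
Sending IPA alongside the text, as this post describes, corresponds to the standard SSML `phoneme` element with `alphabet="ipa"`. Below is a minimal sketch of building such a request body. The helper name `ipa_ssml` is hypothetical, and the voice name used in the usage line is an assumption for illustration; the post only says "Neural EN-GB or EN-US voice". Whether a whole sentence of IPA in one `ph` attribute is accepted by a given voice is also not confirmed here.

```python
from xml.sax.saxutils import escape, quoteattr


def ipa_ssml(text: str, ipa: str, voice: str, lang: str = "en-GB") -> str:
    """Wrap text in an SSML <phoneme> element carrying its IPA transcription."""
    return (
        '<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" '
        f"xml:lang={quoteattr(lang)}>"
        f"<voice name={quoteattr(voice)}>"
        # The visible text is kept as fallback; the ph attribute drives pronunciation.
        f'<phoneme alphabet="ipa" ph={quoteattr(ipa)}>{escape(text)}</phoneme>'
        "</voice></speak>"
    )


# Usage with the first two words of the example above; the voice name is assumed.
print(ipa_ssml("Ba chuid", "bˠɑː xɪdʲ", voice="en-GB-RyanNeural"))
```

Doing the conversion upstream, as requested, would make this wrapping step unnecessary on the client side.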

    3 votes
    0 comments  ·  Text to Speech
  10. en-GB Neural voice Mia pronounces number 4 too quickly

    The en-GB Mia neural voice pronounces the number 4 too quickly in sentences. The problem is the same whether it is written as a digit or as a word. I'm able to work around this by adjusting the prosody speed of just that number :)
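
The workaround mentioned in this post can be expressed with the SSML `prosody` element. Here is a rough sketch, assuming a hypothetical helper `slow_down` and an arbitrary rate of `-20%`; neither comes from the post, and the right rate would need to be tuned by ear.

```python
import re
from xml.sax.saxutils import escape

# Standalone "4" or "four" (any case); "14" and "fourteen" are not matched.
# Note: the "4" in a decimal such as "4.5" would still match.
_FOUR_RE = re.compile(r"\b(4|four)\b", re.IGNORECASE)


def slow_down(text: str, rate: str = "-20%") -> str:
    """Return an SSML fragment with each standalone 4/four wrapped in <prosody>."""
    escaped = escape(text)  # escape first so the inserted tags are not escaped
    return _FOUR_RE.sub(
        lambda m: f'<prosody rate="{rate}">{m.group(0)}</prosody>', escaped
    )


print(slow_down("Please proceed to gate 4."))
```

As with any SSML fragment, the result has to be placed inside a `speak`/`voice` envelope before synthesis.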

    2 votes
    1 comment  ·  Text to Speech
  11. School

    Teacher

    1 vote
    0 comments  ·  Text to Speech
  12. Possibility to speak English in different foreign accents

    For example, a German, French, Spanish, or Italian person speaking English each has their own accent. This would be perfect for applications like Air Traffic Control.

    6 votes
    2 comments  ·  Text to Speech
  13. Units pronounced incorrectly (e.g., milliwatt written as mW)

    Some units are not pronounced correctly.
    In my case, "9 Mega Watt" written as 9MW is spoken correctly, but 9 milliwatts (mW) is spoken incorrectly as "Mega Watt". Can you fix this, or provide a separate way to customize acronyms?

    2 votes
    1 comment  ·  Text to Speech
  14. ARM32 Support for Microsoft.CognitiveServices.Speech API (on Raspbian)

    I'd like to see the Speech API come to the Raspberry Pi, meaning FULL ARM32 support in the Linux implementation of this SDK. This would enable the Raspberry Pi maker community to use Azure Speech seamlessly in their devices.

    2 votes
    0 comments  ·  Text to Speech
  15. GovCloud - Cognitive Services Endpoints in overview

    It would be beneficial for GovCloud users to have access to a list of the available endpoints for the service under the overview, instead of just the token-issuing endpoint, which forces one to comb through out-of-date documentation to find the secured GovCloud endpoints.

    1 vote
    0 comments  ·  Text to Speech
  16. Azure TTS Cognitive Service Voice Limit Issue

    I am very new to the Text-to-Speech (TTS) cognitive services of Microsoft Azure. I was able to convert the given text into an audio file using the Azure TTS service. It works fine when I have a single voice element in my SSML XML document. An example of working SSML:

    <speak version="1.0" xml:lang="en-US">
    
    <voice xml:lang="en-US" xml:gender="Male" name="en-US-Jessa24kRUS">
    Hello, this is my sample text to convert into audio?
    </voice>
    </speak>

    But when I have multiple voice tags (based on gender), it causes an error. The SSML for that case is:

    <speak version="1.0" xml:lang="en-US">
    
    <voice xml:lang="en-US" xml:gender="Male" name="en-US-Guy24kRUS"> What’s your
    2 votes
    0 comments  ·  Text to Speech
  17. Voice has changed

    Something changed with the zh-CN-XiaoxiaoNeural voice - the 'sentiment' expression no longer works, as of a few days ago. Using exactly the same code, but the output is quite different (no longer sounds sentimental). Was an update posted? If so, where do we find a list of changes?

    1 vote
    0 comments  ·  Text to Speech
  18. Neural TTS in French

    Hi,

    The new neural text-to-speech feature looks amazing, but one language is missing: French :)

    I don't know if it is something on the list, or if it is coming soon, but I'm waiting for this feature to switch from Google WaveNet.
    I'm pretty sure that a French voice generated by this new neural TTS from MS Cognitive Services will be a game changer.

    The French language is complex (the emphasis, the punctuation, etc.), but if MS can provide the same awesome quality as they have in English, we will be able to build something incredible...

    5 votes
    0 comments  ·  Text to Speech
  19. Stop TTS synthesizer

    Once the synthesizer starts synthesizing audio, it can't be stopped. It would be nice to have some method that would stop/interrupt currently running audio synthesis.

    3 votes
    1 comment  ·  Text to Speech
  20. Mark Labels for TTS Speech API

    The Azure Speech API should offer JSON mark labels for Text to Speech audio. This would let developers use the audio file together with the JSON mark labels to build text that tracks the audio in their app. The competitor has a similar solution. I found Azure TTS to be superior, but I am forced to use the competitor's solution because JSON mark labels are missing. Speech mark labels should be in JSON format and available for all languages. They should provide information such as the begin and end timestamps of each sound, mapped to the text, phrases, and sentences.

    3 votes
    Under Review  ·  0 comments  ·  Text to Speech