Microsoft

Speech Service

  1. Does TTS support the Speech SDK when using containers?

    Hi, can we use the Speech SDK to access the TTS service in a container?
    Why, when using containers, does STT support only the SDK while TTS supports only the REST API?

    In our tests, the REST API seems slower than the SDK. Why? Thanks.

    1 vote
    0 comments  ·  Text to Speech
  2. Mbps is read as Megabytes per second instead of Megabits per second

    MBps and Mbps are very different things. MBps is 8 times larger than Mbps, so Aria needs to know the difference between the two.

    1 vote
    0 comments  ·  Text to Speech
  3. This chapter has many issues with Aria Neural TTS

    This textbook (Computer Security Handbook by Seymour Bosworth et al., Chapter 33) seems to cause countless errors using the Aria Neural voice. (attached txt and pdf were trimmed to respect the copyright of the author)

    It misreads the section markers, saying "January First, Thirty Three" where the text reads "33.1.1" (and similarly for all the other section markers).

    802.11 is pronounced "eight hundred and two point one one"

    SSIDs is pronounced "sids"

    BSSIDs is pronounced "bsids"

    2Mb/s (as well as other Mb/s numbers) is pronounced "two em bee slash ess" which should be pronounced 2 Megabits per second.

    LAN is…

    1 vote
    0 comments  ·  Text to Speech
  4. 802.11 is mispronounced

    802.11 as a wireless standard should be pronounced "eight-o-two-eleven" instead of "eight hundred and two point one one"

    1 vote
    0 comments  ·  Text to Speech
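The pronunciation reports in items 2 to 4 can often be worked around on the client side with the SSML `sub` element, whose `alias` attribute gives the text to speak in place of the written token. The following is a minimal sketch under assumptions: the `ALIASES` table, the helper names `add_aliases` and `to_ssml`, and the chosen spoken forms are illustrative and not part of any SDK, and the voice name `en-US-AriaNeural` is assumed for the Aria voice mentioned above.

```python
import re
from xml.sax.saxutils import escape, quoteattr

# Hypothetical spoken forms; adjust as needed. Keys are matched
# case-sensitively, so "Mbps" (megabits) and "MBps" (megabytes) stay distinct.
ALIASES = {
    "Mbps": "megabits per second",
    "MBps": "megabytes per second",
    "Mb/s": "megabits per second",
    "802.11": "eight oh two eleven",
}

# Longest keys first so a longer token wins over a shorter one.
_PATTERN = re.compile(
    "|".join(re.escape(k) for k in sorted(ALIASES, key=len, reverse=True))
)


def add_aliases(text: str) -> str:
    """Wrap each known token in an SSML <sub alias="..."> element."""
    out = []
    pos = 0
    for match in _PATTERN.finditer(text):
        out.append(escape(text[pos:match.start()]))  # XML-escape plain text
        token = match.group(0)
        out.append(f"<sub alias={quoteattr(ALIASES[token])}>{escape(token)}</sub>")
        pos = match.end()
    out.append(escape(text[pos:]))
    return "".join(out)


def to_ssml(text: str, voice: str = "en-US-AriaNeural") -> str:
    """Build a complete SSML document (voice name is an assumption)."""
    return (
        '<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" '
        'xml:lang="en-US">'
        f'<voice name="{voice}">{add_aliases(text)}</voice></speak>'
    )


if __name__ == "__main__":
    print(to_ssml("802.11 networks started at 2 Mbps, not 2 MBps."))
```

Matching is case-sensitive on purpose, so `Mbps` and `MBps` map to different spoken forms.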
  5. Japanese sentence is mispronounced

    このレストランではタバコを吸ってはいけません。is mispronounced. The はいけません part should be pronounced waikemasen and not haikemasen. This is with the Nanami Neural voice.

    1 vote
    0 comments  ·  Text to Speech
  6. I would like the film, which is now in Polish, in German

    The film in German; it is currently in Polish.

    1 vote
    0 comments  ·  Speech to Text
  7. No SSML restrictions on creating TTS audio tuning files

    Please remove the restriction of one piece of SSML per SSML file, so that multiple tunings (breaks, pronunciation, intonation, etc.) can be applied in Audio Content Creation.
    Source: Improve synthesis with the Audio Content Creation tool
    > Create an audio tuning file:
    "SSML restrictions Each SSML file can only contain a single piece of SSML."
    (https://docs.microsoft.com/en-US/azure/cognitive-services/speech-service/how-to-audio-content-creation)

    1 vote
    0 comments  ·  Text to Speech
  8. BUG - The TTS engine (in English) mispronounces numbers that follow words: COVID-19, NASDAQ 100, S&P 500, NIKKEI 225

    This is a bug, not a feature. "COVID-19" sounds like "covid 19". Likewise NASDAQ 100, S&P 500, etc.
    This applies to the neural voices.

    2 votes
    0 comments  ·  Text to Speech
  9. Move Irish text-to-IPA phonetic conversion upstream.

    Irish (Gaeilge) TTS works if correct IPA syntax is used and the TTS request is sent with the IPA option included.

    Example:

    Ba chuid mhór den togra seo teacht ar ábhar a bheadh chomh maith nó níos fearr ná an fhuinseog agus an t-ábhar a bheith níos inmharthana.

    Translated to IPA:

    "bˠɑː xɪdʲ woːˈr dʲəɴʲ tʲɔɡˈrə ʃoː tʃæxt eˈr ɑːwəˈr ɑː vʲəh xɔv mˠɑːh ɴˠoː ɴʲiːsˠ fʲæˈrr ɴˠɑː en ɪɴʲʃoːɡ əɡʊsˠ en tʲɑːwəˈr ɑː vʲeɪh ɴʲiːsˠ ɪɴʲwəˈrhɑːɴˠɑː"

    The Irish IPA text can be read and spoken accurately by a neural en-GB or en-US voice.

    I have attached a file that converts…

    3 votes
    0 comments  ·  Text to Speech
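The "IPA option" in item 9 presumably refers to the SSML `phoneme` element with `alphabet="ipa"`. As a minimal sketch, assuming that is the mechanism, the helper below (the name `phoneme` and its parameters are illustrative, not an SDK function) wraps a word and its IPA transcription in such an element. Whether a given voice accepts every IPA symbol in the transcription above is not confirmed here.

```python
from xml.sax.saxutils import escape, quoteattr


def phoneme(ipa: str, written: str) -> str:
    """Return an SSML <phoneme> element that speaks `ipa` for `written`.

    `written` is the orthographic text kept inside the element;
    `ipa` goes into the `ph` attribute.
    """
    return f'<phoneme alphabet="ipa" ph={quoteattr(ipa)}>{escape(written)}</phoneme>'


if __name__ == "__main__":
    # One word from the example sentence with its IPA form from the post.
    print(phoneme("tʲɔɡˈrə", "togra"))
```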
  10. Have scripts ready for voice input when creating a custom voice.

    To create a new voice, have scripts ready to be read by the user. As the computer recognizes the user's voice, the sound is recorded and synthesized into a custom voice for the user.

    2 votes
    0 comments  ·  Custom Voice
  11. en-GB Neural voice Mia pronounces number 4 too quickly

    The en-GB Mia neural voice pronounces the number 4 too quickly in sentences. The problem is the same whether it is written as a digit or as a word. I'm able to work around this by adjusting the prosody rate of just that number :)

    2 votes
    1 comment  ·  Text to Speech
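The workaround described in item 11 can be sketched with the SSML `prosody` element and its `rate` attribute. This is only an illustration under assumptions: the helper name `slow_down`, the default word list, and the choice of the predefined rate `slow` are not from the original post.

```python
import re
from xml.sax.saxutils import escape


def slow_down(text: str, words=("4", "four"), rate: str = "slow") -> str:
    """Wrap whole-word matches of `words` in <prosody rate="..."> tags.

    Only the matched words are slowed; the rest of the sentence keeps
    the default speaking rate.
    """
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(w) for w in words) + r")\b",
        re.IGNORECASE,
    )
    parts = []
    pos = 0
    for match in pattern.finditer(text):
        parts.append(escape(text[pos:match.start()]))
        parts.append(f'<prosody rate="{rate}">{escape(match.group(0))}</prosody>')
        pos = match.end()
    parts.append(escape(text[pos:]))
    return "".join(parts)


if __name__ == "__main__":
    print(slow_down("Please go to gate 4, not gate 14."))
```

The word-boundary match leaves "14" and "fourteen" untouched, so only the standalone number is affected. The returned fragment still has to be placed inside the usual `speak` and `voice` elements.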
  12. 1 vote
    1 comment  ·  Speech to Text
  13. If sounds can be recognized as characters, developers working on other languages could map them to the corresponding words.

    2 votes
    1 comment  ·  Custom Speech
  14. Translating speech service for a new language

    What are the possibilities for speech translation (into languages that are currently not available on the list) on Azure Speech Service?

    1 vote
    1 comment  ·  Speech Translation
  15. Retrain a previously trained model on custom speech portal

    I previously trained a custom speech model and am now trying to retrain it, but I don't see an option to do so. The portal only gives me the option to train from the baseline model.

    3 votes
    1 comment  ·  Custom Speech
  16. Properties to define: maximum audio recognition time from microphone, or stop recognition on silence

    Hello,

    I am doing speech recognition using the Android SDK, and I plan to move to containers in the future. According to the documentation, stopping on silence is the default. How do I define a maximum audio recognition time, as follows?

    If the user has been speaking for more than 15 seconds, the SDK should automatically stop the recognizer on the Android end. If the user has spoken for less than 15 seconds and was silent in between, stopping should be based on silence detection. The Speech SDK should stop the microphone on Android either on silence detection (when spoken…

    1 vote
    1 comment  ·  Speech to Text
  17. Segmentation length config for recognized result

    From https://github.com/Azure-Samples/cognitive-services-speech-sdk/issues/610#event-3282436941

    The reason for this request is that the recognized text is sometimes too long to render nicely, e.g. in a mobile app that limits recognized text to 2 lines of at most 20 characters each (or fewer).

    E.g.

    Utterance: I will go to bookstore this afternoon to check if any new arrivals. After that Jack will pick up me there to gym for practice. We need to prepare a match in two weeks. Dinner will be taken in gym to save commute overhead. I will arrive home around 8:30 in the evening.

    Current result from speech sdk:
    RECOGNIZED: Text=I…

    2 votes
    0 comments  ·  Speech to Text
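Until a segmentation setting like the one requested in item 17 exists, the display constraint can be approximated on the client after recognition. The sketch below is an assumption-laden illustration, not a Speech SDK feature: the function name `to_screens` and its parameters are hypothetical, and it only re-wraps the final text with Python's standard `textwrap` module.

```python
import textwrap
from typing import List


def to_screens(text: str, width: int = 20, lines_per_screen: int = 2) -> List[List[str]]:
    """Split recognized text into screens of at most `lines_per_screen`
    lines, each at most `width` characters, breaking at word boundaries."""
    lines = textwrap.wrap(text, width=width)
    return [
        lines[i:i + lines_per_screen]
        for i in range(0, len(lines), lines_per_screen)
    ]


if __name__ == "__main__":
    recognized = "I will go to bookstore this afternoon"
    for screen in to_screens(recognized):
        print(screen)
    # ['I will go to', 'bookstore this']
    # ['afternoon']
```

This only reflows text that has already been recognized. It does not change how the service segments utterances, which is what the request asks for.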
  18. Fluency format of recognized result

    from https://github.com/Azure-Samples/cognitive-services-speech-sdk/issues/598#event-3275944556

    I suggest adding a fluency format for scenarios such as formal meeting transcription and translation, which do not expect spoken-form text.

    E.g. :

    Utterance: "i want to ah, to book a flight to Denver, i mean, to Boston, the day, the day after, after Monday. "

    RECOGNIZED: "I want to are to book a flight to Denver. I mean to Boston the day, the day after after Monday."

    Expected: " I want to book a flight to Boston the day after Monday."

    Thank you.

    1 vote
    0 comments  ·  Speech to Text
  19. Adding custom headers to speech to text Websocket requests

    Add the ability to set custom headers in the Speech to Text SDK so that intermediate servers can verify the headers to authenticate and authorize requests. This is required for the container versions.

    1 vote
    0 comments  ·  Speech to Text
  20. School

    Teacher

    1 vote
    0 comments  ·  Text to Speech