Microsoft

Speech Service

  1. About the display of the text recognized by SpeechToText (Speech SDK)

    The results returned by the SpeechRecognizer's Recognized event are not split at punctuation marks, and sentences are joined together even when the speaker changes.
    As a result, there is no way to tell when the speaker changed.
    I would like this improved so that an event is raised for each punctuation mark.
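    Until the service raises such an event, one client-side workaround is to split the text of each Recognized result at punctuation and handle each piece separately. The sketch below assumes the text has already been taken from the event (for example `evt.result.text` in the Python Speech SDK); the punctuation set is an assumption to tune per language. It only approximates a per-punctuation event and does not reveal speaker changes, which would need speaker information from the service.

```python
import re
from typing import Callable, List

# Punctuation treated as a segment boundary (an assumption; tune per language).
# Includes the full-width marks used in CJK text.
_BOUNDARY = re.compile(r"(?<=[.!?。！？])\s*")


def split_at_punctuation(text: str) -> List[str]:
    """Split one Recognized result into punctuation-terminated segments."""
    return [seg.strip() for seg in _BOUNDARY.split(text) if seg.strip()]


def emit_segments(text: str, on_segment: Callable[[str], None]) -> None:
    """Call on_segment once per segment, mimicking a per-punctuation event."""
    for segment in split_at_punctuation(text):
        on_segment(segment)
```

    Note that a decimal such as "3.5" would also be split by this pattern.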

    38 votes
    2 comments  ·  Speech to Text
  2. Ability for voices to speak English with different foreign accents

    For example, German, French, Spanish, and Italian speakers of English each have their own accent. This would be perfect for applications such as Air Traffic Control.

    6 votes
    2 comments  ·  Text to Speech
  3. en-GB neural voice Mia pronounces the number 4 too quickly

    The en-GB Mia neural voice pronounces the number 4 too quickly in sentences. The problem is the same whether it is written as a digit or as a word. I'm able to work around this by adjusting the prosody rate of just that number :)
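    The workaround described above can be expressed in SSML by wrapping each standalone "4" or "four" in a `<prosody>` element with a `rate` attribute. A minimal sketch, assuming the voice name `en-GB-MiaNeural` and a slowdown of `-20%` (both illustrative values, not tested recommendations):

```python
import re
from xml.sax.saxutils import escape

# Illustrative defaults only: the voice name and slowdown are assumptions.
VOICE = "en-GB-MiaNeural"
FOUR_RATE = "-20%"

# Match "4" or "four" as a whole word; "14" or "fourteen" are not matched.
_FOUR = re.compile(r"\b(4|four)\b", re.IGNORECASE)


def build_ssml(text: str, voice: str = VOICE, rate: str = FOUR_RATE) -> str:
    """Return an SSML document that slows down every standalone 4/four."""
    body = _FOUR.sub(
        lambda m: f'<prosody rate="{rate}">{m.group(0)}</prosody>', escape(text)
    )
    return (
        '<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" '
        'xml:lang="en-GB">'
        f'<voice name="{voice}">{body}</voice>'
        "</speak>"
    )
```

    The returned string would be passed to the synthesizer as SSML (for example via `speak_ssml_async` in the Python SDK).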

    2 votes
    1 comment  ·  Text to Speech
  4. Confidence score on word level

    The lack of a word-level confidence score is a showstopper for my company's project. It would be extremely useful for us to have the confidence score included in the "Words" list, which contains the words and their timestamps.

    According to this answer: https://social.msdn.microsoft.com/Forums/en-US/4979ca92-aa0f-4d09-b010-fc2eeb1bde80/speech-results-confidence-score-on-word-level?forum=AzureCognitiveService#8ae67445-4e23-49ea-b694-a8d877dc2dd0
    the feature is not public, and we suspect that it could be provided quickly.

    I'd be grateful for every vote for this idea!
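    For reference, this is roughly how word timings are read today. With word-level timestamps requested, the Speech SDK's detailed JSON result contains an `NBest` list whose entries carry a `Words` array with `Word`, `Offset`, and `Duration` (in 100-nanosecond ticks). The sketch below assumes that layout; check it against the JSON your SDK version returns. A per-word `Confidence` key is read as optional, since its absence is exactly what this idea is about.

```python
import json
from dataclasses import dataclass
from typing import List, Optional

TICKS_PER_SECOND = 10_000_000  # Offset/Duration are in 100-ns ticks


@dataclass
class WordTiming:
    word: str
    start_s: float
    duration_s: float
    confidence: Optional[float]  # None when the service omits it


def parse_words(result_json: str) -> List[WordTiming]:
    """Extract per-word timings from the top NBest entry of a detailed result."""
    data = json.loads(result_json)
    nbest = data.get("NBest") or []
    if not nbest:
        return []  # e.g. a NoMatch result
    words = []
    for w in nbest[0].get("Words", []):
        words.append(
            WordTiming(
                word=w["Word"],
                start_s=w["Offset"] / TICKS_PER_SECOND,
                duration_s=w["Duration"] / TICKS_PER_SECOND,
                confidence=w.get("Confidence"),  # assumed key; may be absent
            )
        )
    return words
```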

    18 votes
    10 comments  ·  Speech to Text
  5. Retrain a previously trained model in the Custom Speech portal

    I previously trained a custom speech model and am trying to retrain it, but I don't see an option to do so. The portal only gives me the option to train from the baseline model.

    2 votes
    1 comment  ·  Custom Speech
  6. 1 vote
    1 comment  ·  Speech to Text
  7. If sounds could be recognized as characters, it would help developers working in other languages map them to the corresponding words.

    1 vote
    1 comment  ·  Custom Speech
  8. Units pronounced incorrectly (e.g. milliwatt as mW)

    Some units are not pronounced correctly.
    In my case, I am using "9 Mega Watt" written as 9MW and it is spoken correctly, but 9 milliwatts (9mW) is spoken wrong: it says "megawatt". Can you fix this, or provide separate access for customizing acronyms?
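    Until the service handles this, SSML's `<sub>` element lets the caller spell out how a written form should be spoken via its `alias` attribute. A minimal sketch that wraps unit symbols in `<sub>`, assuming a hypothetical `UNIT_ALIASES` table supplied by the application; matching is case-sensitive so that `mW` and `MW` stay distinct:

```python
import re
from xml.sax.saxutils import escape, quoteattr
from typing import Dict

# Hypothetical application-supplied table: written unit -> spoken form.
UNIT_ALIASES: Dict[str, str] = {
    "mW": "milliwatts",
    "MW": "megawatts",
}


def apply_unit_aliases(text: str, aliases: Dict[str, str] = UNIT_ALIASES) -> str:
    """Return SSML-safe text with each unit symbol wrapped in <sub alias=...>."""
    keys = sorted(aliases, key=len, reverse=True)  # prefer longer symbols
    # A unit may follow a digit ("9mW") but must not be part of a longer word.
    pattern = re.compile(
        r"(?<![A-Za-z])(" + "|".join(map(re.escape, keys)) + r")(?![A-Za-z])"
    )
    out = []
    for i, part in enumerate(pattern.split(text)):
        if i % 2 == 1:  # odd indices are the captured unit symbols
            out.append(f"<sub alias={quoteattr(aliases[part])}>{escape(part)}</sub>")
        else:
            out.append(escape(part))
    return "".join(out)
```

    The result would go inside the `<voice>` element of the SSML document.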

    2 votes
    1 comment  ·  Text to Speech
  9. Speech translation for a new language

    What are the options for speech translation into languages that are not currently on the list in Azure Speech Service?

    1 vote
    1 comment  ·  Speech Translation
  10. Properties to define: maximum audio recognition time from the microphone, or stop recognition on silence

    Hello,

    I am doing speech recognition with the Android SDK and plan to move to containers in the future. According to the documentation, stopping on silence is the default. How do I define a maximum recognition time, as described below?

    If the user has been speaking for more than 15 seconds, the SDK should automatically stop the recognizer on the Android side. If the user has spoken for less than 15 seconds and was silent in between, stopping should be based on silence detection. The Speech SDK should stop the microphone on Android either on silence detection (when spoken…
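    The requested behaviour (a hard cap on speaking time, otherwise stop on silence) can be kept in application code. The sketch below is language-neutral logic written in Python; `StopPolicy` is a hypothetical helper, not an SDK class. The app would feed it speech-start and silence signals from the recognizer's events and call its own stop routine (for example `stopContinuousRecognitionAsync()` on the Java `SpeechRecognizer`) when `should_stop` returns true. The clock is injectable so the logic can be tested.

```python
import time
from typing import Callable, Optional


class StopPolicy:
    """Hypothetical helper: stop after max_s seconds of speech, else on silence."""

    def __init__(self, max_s: float = 15.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_s = max_s
        self.clock = clock
        self.speech_started: Optional[float] = None

    def on_speech_start(self) -> None:
        # Call when the recognizer first reports speech (e.g. first interim result).
        if self.speech_started is None:
            self.speech_started = self.clock()

    def should_stop(self, silence_detected: bool = False) -> bool:
        if silence_detected:
            return True  # under the cap, defer to silence detection
        if self.speech_started is None:
            return False  # nothing spoken yet
        return self.clock() - self.speech_started >= self.max_s
```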

    1 vote
    1 comment  ·  Speech to Text
  11. BUG - The TTS engine (in English) doesn't pronounce numbers after words well: COVID-19, NASDAQ 100, S&P 500, NIKKEI 225

    This is a bug, not a feature. COVID-19 sounds like "covid 19", and the same goes for NASDAQ 100, S&P500, etc.
    This affects the neural voices.

    1 vote
    0 comments  ·  Text to Speech
  12. IVR access to the Direct Line Speech API

    How can an IVR system call this API? More details about the protocol used, and a sample, would be highly appreciated.

    1 vote
    2 comments  ·  Sample Requests
  13. Move Irish text-to-IPA phonetic conversion upstream

    Irish (Gaeilge) TTS works with a neural voice, but only if correct IPA syntax is used before sending the request to the Cognitive Services TTS API. I have found a quick way to convert Irish sentences into IPA text. I suggest that this conversion from Irish text to IPA be made available on the Microsoft server side, so that Irish (Gaeilge) TTS would be available to everyone. Please see the attached example of an Irish (Gaeilge) TTS MP3 produced with this method.

    http://lmknjb.com/irishTTS/IRISHTTSSAMPLE.mp3
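    The author's conversion method is not described here, but its output would typically be fed to the TTS API through SSML's `<phoneme alphabet="ipa" ph="...">` element. A minimal sketch of that last step, assuming a caller-supplied lexicon that maps lowercase Irish words to IPA strings (the lexicon is the hard part and is not shown); words missing from it are left as plain text:

```python
import re
from typing import Dict
from xml.sax.saxutils import escape, quoteattr

_WORD = re.compile(r"\w+")  # Unicode-aware, so fadas (á, é, í, ó, ú) are matched


def irish_to_ssml_phonemes(text: str, lexicon: Dict[str, str]) -> str:
    """Wrap each word found in lexicon in an IPA <phoneme> element."""
    out = []
    pos = 0
    for m in _WORD.finditer(text):
        out.append(escape(text[pos:m.start()]))  # spaces/punctuation between words
        word = m.group(0)
        ipa = lexicon.get(word.lower())
        if ipa is None:
            out.append(escape(word))
        else:
            out.append(f'<phoneme alphabet="ipa" ph={quoteattr(ipa)}>{escape(word)}</phoneme>')
        pos = m.end()
    out.append(escape(text[pos:]))
    return "".join(out)
```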

    1 vote
    0 comments  ·  Text to Speech
  14. Segmentation length configuration for recognized results

    From https://github.com/Azure-Samples/cognitive-services-speech-sdk/issues/610#event-3282436941

    The reason for and scenario behind this request is that the recognized output text is sometimes too long to render in a friendly way, e.g. a mobile app that limits recognized text to 2 lines of at most 20 characters each (or fewer).

    E.g.

    Utterance: I will go to bookstore this afternoon to check if any new arrivals. After that Jack will pick up me there to gym for practice. We need to prepare a match in two weeks. Dinner will be taken in gym to save commute overhead. I will arrive home around 8:30 in the evening.

    Current result from the Speech SDK:
    RECOGNIZED: Text=I…
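    Until such a setting exists, a client can re-segment each recognized text for display. A minimal sketch using Python's standard `textwrap` module, assuming the 2-line, 20-character limit from the scenario above; it changes only how text is rendered, not how the service segments results:

```python
import textwrap
from typing import List


def to_screens(text: str, width: int = 20, lines_per_screen: int = 2) -> List[List[str]]:
    """Break recognized text into screens of at most lines_per_screen lines of width chars."""
    # textwrap breaks at whitespace and splits over-long words, so no line exceeds width.
    lines = textwrap.wrap(text, width=width)
    return [lines[i:i + lines_per_screen] for i in range(0, len(lines), lines_per_screen)]
```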

    2 votes
    0 comments  ·  Speech to Text
  15. Have scripts ready for voice input when creating a custom voice.

    To create a new voice, have scripts ready to be read by the user. As the computer recognizes the user's voice, the sound is recorded and synthesized into a custom voice for the user.

    1 vote
    0 comments  ·  Custom Voice
  16. Show multi-language translations on a single screen during a presentation

    Hello,
    I have the following use case (an international wedding): the presenter speaks French, and I would like to show the live German and Catalan translations on a single screen for the audience. Is this possible? I know the conversation feature is readily available, but not everybody in the audience has a smartphone.
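    With the Speech SDK, a translation recognizer can be configured with one source language (e.g. `fr-FR`) and several target languages, and each recognized result carries a mapping of translations keyed by target language code (`result.translations` in the Python SDK). A single shared display then only needs to render that mapping. A minimal sketch of the rendering step, assuming `de` and `ca` as the target codes, that Catalan is available as a target, and hypothetical display labels:

```python
from typing import Dict, Sequence

# Hypothetical display labels for the assumed target language codes.
LABELS = {"de": "Deutsch", "ca": "Català"}


def render_screen(translations: Dict[str, str],
                  order: Sequence[str] = ("de", "ca")) -> str:
    """Format one result's translations as labelled blocks for a shared display."""
    blocks = []
    for code in order:
        text = translations.get(code, "")  # empty if a translation is missing
        blocks.append(f"[{LABELS.get(code, code)}]\n{text}")
    return "\n\n".join(blocks)
```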

    1 vote
    2 comments  ·  Speech to Text
  17. Norwegian language needs improvement in grammar

    Norwegian needs a grammatical rethink with regard to compound words.

    Currently the service treats the parts of a Norwegian compound word as individual words and puts spaces between them. For instance, when I say "konsollvindu" (console window) to the speech-to-text service, it outputs "konsoll vindu" with a space between the words.

    This.

    Must.

    Absolutely.

    Be.

    Fixed.

    I have a degree in linguistics. Please hire me to fix this if you need help, because you really do.
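    As a stopgap, compounds can be rejoined after recognition. A minimal sketch, assuming a hypothetical caller-supplied set of known compound words; it greedily merges up to three adjacent words whose concatenation is in the set. It ignores linking letters (as in compounds formed with -s-) and words with attached punctuation, so it is far from a real fix:

```python
from typing import List, Set


def join_compounds(text: str, compounds: Set[str], max_parts: int = 3) -> str:
    """Greedily merge adjacent words whose concatenation is a known compound."""
    words = text.split()
    out: List[str] = []
    i = 0
    while i < len(words):
        # Try the longest run first, down to two parts.
        for n in range(min(max_parts, len(words) - i), 1, -1):
            candidate = "".join(words[i:i + n])
            if candidate.lower() in compounds:
                out.append(candidate)
                i += n
                break
        else:  # no compound starts here; keep the word as-is
            out.append(words[i])
            i += 1
    return " ".join(out)
```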

    2 votes
    5 comments  ·  Speech to Text
  18. LUIS reference grammar ID fails in West Europe when included

    In the West Europe region, the service returns "Specified grammar type is not supported!" when we pass in a LUIS reference grammar ID (a.k.a. IntentRecognizer).

    This causes the speech service to fail with a "WebSocket is already in CLOSING or CLOSED state." error. If the ID is not included, the service works correctly.

    This may be related to this issue in the Cognitive Services Speech SDK repo: https://github.com/Azure-Samples/cognitive-services-speech-sdk/issues/127

    3 votes
    1 comment  ·  Speech to Text
  19. Language Support for Greek

    Is Greek on the roadmap? Please let me know when it is planned. If not, please add it.

    2 votes
    0 comments  ·  Speech to Text
  20. Fluency format for recognized results

    From https://github.com/Azure-Samples/cognitive-services-speech-sdk/issues/598#event-3275944556

    I suggest adding a fluency format for scenarios such as formal meeting transcription and translation, where spoken-style text is not wanted.

    E.g.:

    Utterance: "i want to ah, to book a flight to Denver, i mean, to Boston, the day, the day after, after Monday. "

    RECOGNIZED: "I want to are to book a flight to Denver. I mean to Boston the day, the day after after Monday."

    Expected: "I want to book a flight to Boston the day after Monday."

    Thank you.
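    A rough client-side approximation is to drop filler words and immediate word repetitions. A minimal sketch with a hypothetical filler list; it does not handle self-corrections such as "to Denver, i mean, to Boston" or phrase-level repeats such as "the day, the day after", and in the example above the filler was transcribed as "are", which a word list like this would not catch:

```python
import re

# Hypothetical filler list; real disfluency removal would need a trained model.
FILLERS = {"ah", "uh", "um", "er", "erm"}
_NON_WORD = re.compile(r"[^\w']")


def _bare(token: str) -> str:
    """Lowercase a token and strip punctuation, keeping apostrophes."""
    return _NON_WORD.sub("", token).lower()


def remove_simple_disfluencies(text: str) -> str:
    """Drop filler tokens and immediate word repetitions ("after after" -> "after")."""
    out = []
    for tok in text.split():
        bare = _bare(tok)
        if bare in FILLERS:
            continue  # drop filler words such as "ah,"
        if bare and out and _bare(out[-1]) == bare:
            continue  # drop an immediate repetition of the previous word
        out.append(tok)
    return " ".join(out)
```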

    1 vote
    0 comments  ·  Speech to Text