Microsoft

Speech Service

  1. About the display of the character acquired by SpeechToText (SpeechSDK)

    The results obtained by the SpeechRecognizer's Recognized event are not broken by punctuation marks, and sentences are connected even if the speaker changes.
    Therefore, it is not possible to know the timing of the change of the speaker.
    I want you to improve it so that an event occurs for each punctuation mark.

    38 votes
    Sign in
    (thinking…)
    Sign in with: Facebook Google
    Signed in as (Sign out)

    We’ll send you updates on this idea

    2 comments  ·  Speech to Text  ·  Flag idea as inappropriate…  ·  Admin →
  2. Confidence score on word level

    The lack of a confidence score on word level feature is a show stopper for my company's project. It would be extremely useful for us to have the confidence score included within "Words" list , which consist of words and their timestamps.

    According to this answer: https://social.msdn.microsoft.com/Forums/en-US/4979ca92-aa0f-4d09-b010-fc2eeb1bde80/speech-results-confidence-score-on-word-level?forum=AzureCognitiveService#8ae67445-4e23-49ea-b694-a8d877dc2dd0
    the feature is not public and we suspect that it could be provided quickly.

    I'd be grateful for each vote for this idea!

    18 votes
    Sign in
    (thinking…)
    Sign in with: Facebook Google
    Signed in as (Sign out)

    We’ll send you updates on this idea

    10 comments  ·  Speech to Text  ·  Flag idea as inappropriate…  ·  Admin →
  3. Possibility to talk English, but in the different foreign accents.

    Like a German/French/Spanish/Italian person speaking English, all have their own accent. Perfect for applications like Air Traffic Control etc.

    6 votes
    Sign in
    (thinking…)
    Sign in with: Facebook Google
    Signed in as (Sign out)

    We’ll send you updates on this idea

    2 comments  ·  Text to Speech  ·  Flag idea as inappropriate…  ·  Admin →
  4. 6 votes
    Sign in
    (thinking…)
    Sign in with: Facebook Google
    Signed in as (Sign out)

    We’ll send you updates on this idea

    3 comments  ·  Custom Voice  ·  Flag idea as inappropriate…  ·  Admin →
  5. Neural TTS in french

    Hi,

    The new neural text 2 speech feature looks amazing, but one language is missing : French :)

    I don't know if it something on the list, or if it is coming soon, but I'm waiting for this feature to switch from Google Wavenet.
    I'm pretty sure that the french voice generated by this new neural TTS by MS Cognitive service will be a game changer.

    The french language is complex, the emphasis, the punctation etc... but if MS can provide the same awesome quality as they have done in english... we will be able to build something incredible...

    5 votes
    Sign in
    (thinking…)
    Sign in with: Facebook Google
    Signed in as (Sign out)

    We’ll send you updates on this idea

    0 comments  ·  Text to Speech  ·  Flag idea as inappropriate…  ·  Admin →
  6. Xamarin Android & Xamarin IOS SDK

    There are no Xamarin Android & Xamarin IOS SDK for the latest custom speech used by Microsoft.

    We hope we can use the SDK ASAP, since we think Xamarin is one of the Microsoft core product and widely used.

    5 votes
    Sign in
    (thinking…)
    Sign in with: Facebook Google
    Signed in as (Sign out)

    We’ll send you updates on this idea

    2 comments  ·  Speech to Text  ·  Flag idea as inappropriate…  ·  Admin →
  7. Please support spelling words

    Let's say I'm building an application where I want to know a user's name or address. Whereas I can find the address online, the name might be something unique.

    In this case, I'd like to let the user spell his name to the speech service. However, the results are not very good currently.

    I'd love if there was an option to tell the cognitive services that I'm spelling something or that I only send letters to it.

    Adding a LUIS model or custom intents to the recognizer didn't improve the results either. Very clear names always lead to some characters…

    4 votes
    Sign in
    (thinking…)
    Sign in with: Facebook Google
    Signed in as (Sign out)

    We’ll send you updates on this idea

    0 comments  ·  Speech to Text  ·  Flag idea as inappropriate…  ·  Admin →
  8. Add support for other audio formats and bitrates

    Add support for other audio formats and bitrates

    4 votes
    Sign in
    (thinking…)
    Sign in with: Facebook Google
    Signed in as (Sign out)

    We’ll send you updates on this idea

    3 comments  ·  Speech to Text  ·  Flag idea as inappropriate…  ·  Admin →
  9. LUIS Reference Grammar ID fails West Europe when included

    For West Europe region, the service is returning "Specified grammar type is not supported!" when we pass in LUIS reference grammar ID (a.k.a. IntentRecognizer).

    This causes the speech service to fail with a "WebSocket is already in CLOSING or CLOSED state." error when the LUIS reference grammar ID is passed in. If it is not included, the service works correctly.

    May be related to this issue in the Cognitive Services Speech SDK repo: https://github.com/Azure-Samples/cognitive-services-speech-sdk/issues/127

    3 votes
    Sign in
    (thinking…)
    Sign in with: Facebook Google
    Signed in as (Sign out)

    We’ll send you updates on this idea

    1 comment  ·  Speech to Text  ·  Flag idea as inappropriate…  ·  Admin →
  10. Support more languages/derivatives for Custom Voice Fonts

    Support more languages/derivatives for Custom Voice Fonts. In particular, I'm looking to create a custom voice for en-GB, and building it using en-US doesn't quite work. I would also be looking to create custom voices for other flavours of English, en-AU being a priority.

    3 votes
    Sign in
    (thinking…)
    Sign in with: Facebook Google
    Signed in as (Sign out)

    We’ll send you updates on this idea

    3 comments  ·  Custom Voice  ·  Flag idea as inappropriate…  ·  Admin →
  11. Is there a way to stream audio via WebSocket and get Speech to Text results AND get a copy of the recording on Azure Storage?

    We are currently using Bing Speech with LUIS, but looking to convert to Speech service.

    Right now we have multiple recorders that operate in the browser, Flash, WebRTC, HTML5.

    Each of these has to connect to Bing Speech to Text to get realtime translation and LUIS results to drive actions in the application. Additionally we are currently streaming the audio to Amazon S3. Ideally we would like to stream the audio only once, and have it picked up by Microsoft from Speech to Text AND be able to retrieve a URL for later use.

    Having to maintain two streams has…

    3 votes
    Sign in
    (thinking…)
    Sign in with: Facebook Google
    Signed in as (Sign out)

    We’ll send you updates on this idea

    3 comments  ·  Speech to Text  ·  Flag idea as inappropriate…  ·  Admin →
  12. 3 votes
    Sign in
    (thinking…)
    Sign in with: Facebook Google
    Signed in as (Sign out)

    We’ll send you updates on this idea

    2 comments  ·  Sample Requests  ·  Flag idea as inappropriate…  ·  Admin →
    Completed  ·  Allison Light responded

    Thanks for letting us know about the broken code! We’ve updated our documentation to link to the built-in Windows 10 Speech API which is the suggested way to call Speech API through UWP applications. You can read more about it using the links below.

    Documentation: https://msdn.microsoft.com/en-us/library/windows/apps/windows.media.speechrecognition.aspx.
    Sample: https://github.com/Microsoft/Windows-universal-samples/tree/master/Samples/SpeechRecognitionAndSynthesis

  13. en-GB Neural voice Mia pronounces number 4 too quickly

    En-GB mia neural voice pronounces the number 4 too quickly in sentences. Whether it is the digit or the word, same problem. I’m able to workaround this by adjusting the prosody speed of just that number :)

    2 votes
    Sign in
    (thinking…)
    Sign in with: Facebook Google
    Signed in as (Sign out)

    We’ll send you updates on this idea

    1 comment  ·  Text to Speech  ·  Flag idea as inappropriate…  ·  Admin →
  14. Retrain a previously trained model on custom speech portal

    I had previously trained a custom speech model and I trying to retrain that model but I am not seeing an option to retrain it, It only gives me an option to train the baseline model.

    2 votes
    Sign in
    (thinking…)
    Sign in with: Facebook Google
    Signed in as (Sign out)

    We’ll send you updates on this idea

    1 comment  ·  Custom Speech  ·  Flag idea as inappropriate…  ·  Admin →
  15. segmentation length config for recognized result

    from https://github.com/Azure-Samples/cognitive-services-speech-sdk/issues/610#event-3282436941

    The reason and scenario of this asking is that, some time the output recognized text is too long to render friendly, e.g. mobile app with recognized text limit of 2 lines each max 20-char (or less).

    E.g.

    Utterance: I will go to bookstore this afternoon to check if any new arrivals. After that Jack will pick up me there to gym for practice. We need to prepare a match in two weeks. Dinner will be taken in gym to save commute overhead. I will arrive home around 8:30 in the evening.

    Current result from speech sdk:
    RECOGNIZED: Text=I…

    2 votes
    Sign in
    (thinking…)
    Sign in with: Facebook Google
    Signed in as (Sign out)

    We’ll send you updates on this idea

    0 comments  ·  Speech to Text  ·  Flag idea as inappropriate…  ·  Admin →
  16. Pronounced Unites Incorrectly. (E.g mili Watt as mW)

    Some of the unites are not pronounces correctly.
    In my case i am using "9 Mega Watt" as (9MW) and its speaking correctly, but for 9 mili Watt (mW), its speaking wrong and saying Mega Watt. Can you update it, or provide separate access for customization acronyms.

    2 votes
    Sign in
    (thinking…)
    Sign in with: Facebook Google
    Signed in as (Sign out)

    We’ll send you updates on this idea

    1 comment  ·  Text to Speech  ·  Flag idea as inappropriate…  ·  Admin →
  17. Language Support for Greek

    Is Greek on the roadmap? Please let me know when it is planned. If not, please add it.

    2 votes
    Sign in
    (thinking…)
    Sign in with: Facebook Google
    Signed in as (Sign out)

    We’ll send you updates on this idea

    0 comments  ·  Speech to Text  ·  Flag idea as inappropriate…  ·  Admin →
  18. REST API support for custom phrase lists

    REST API support for custom phrase lists

    2 votes
    Sign in
    (thinking…)
    Sign in with: Facebook Google
    Signed in as (Sign out)

    We’ll send you updates on this idea

    0 comments  ·  Speech to Text  ·  Flag idea as inappropriate…  ·  Admin →
  19. ARM32 Support for Microsoft.CognitiveServices.Speech API (on Raspbian)

    I'd like to see the speech API coming to Raspberry PI, meaning having FULL ARM32 support for the linux implementation of this SDK. This would enable the raspi maker community to use Azure Speech seamlessly in their devices.

    2 votes
    Sign in
    (thinking…)
    Sign in with: Facebook Google
    Signed in as (Sign out)

    We’ll send you updates on this idea

    0 comments  ·  Text to Speech  ·  Flag idea as inappropriate…  ·  Admin →
  20. Norwegian language needs improvement in grammar

    Norwegian needs a grammatical rethink with regards to compound words.

    Currently in Norwegian we distinguish individual words by putting spaces between them, so for instance when I mention to the speech-to-text-service the console window ("konsollvindu" in Norwegian), it outputs it as "konsoll vindu" with a space between the words.

    This.

    Must.

    Absolutely.

    Be.

    Fixed.

    I am a linguist by degree. Please hire me to fix this if you need help, because you really do.

    2 votes
    Sign in
    (thinking…)
    Sign in with: Facebook Google
    Signed in as (Sign out)

    We’ll send you updates on this idea

    5 comments  ·  Speech to Text  ·  Flag idea as inappropriate…  ·  Admin →
← Previous 1 3
  • Don't see your idea?

Feedback and Knowledge Base