Microsoft

Speech Service

  1. About the display of the text returned by SpeechToText (SpeechSDK)

    The results delivered by the SpeechRecognizer's Recognized event are not split at punctuation marks, and sentences run together even when the speaker changes.
    As a result, it is not possible to tell when the speaker changes.
    Please improve this so that an event is raised at each punctuation mark.
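
    Until something like this exists, one client-side stopgap might be to split the text of each Recognized result at sentence-ending punctuation. The sketch below is only an illustration: `split_on_punctuation` is a made-up helper, not part of the Speech SDK, and it cannot recover speaker changes, only sentence boundaries.

```python
import re

# One or more non-terminator characters, followed by any run of
# sentence-ending marks (ASCII and full-width variants).
_SENTENCE = re.compile(r"[^.!?。！？]+[.!?。！？]*")


def split_on_punctuation(recognized_text: str) -> list[str]:
    """Split a recognized string into sentence-like chunks.

    Hypothetical helper: it only post-processes text and knows nothing
    about who was speaking. Decimal numbers such as "3.5" are split too.
    """
    chunks = _SENTENCE.findall(recognized_text)
    return [chunk.strip() for chunk in chunks if chunk.strip()]


if __name__ == "__main__":
    for sentence in split_on_punctuation("Hello. How are you? Fine"):
        print(sentence)
```

    Each chunk could then be handled as if it had arrived in its own event.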

    38 votes
    2 comments  ·  Speech to Text
  2. Confidence score on word level

    The lack of word-level confidence scores is a showstopper for my company's project. It would be extremely useful for us to have the confidence score included in the "Words" list, which consists of words and their timestamps.

    According to this answer: https://social.msdn.microsoft.com/Forums/en-US/4979ca92-aa0f-4d09-b010-fc2eeb1bde80/speech-results-confidence-score-on-word-level?forum=AzureCognitiveService#8ae67445-4e23-49ea-b694-a8d877dc2dd0
    the feature is not public and we suspect that it could be provided quickly.
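
    To illustrate what we would do with it, here is a rough sketch of how we might consume such a list. The per-word "Confidence" key is exactly the thing being requested and is hypothetical; the "Word", "Offset" and "Duration" keys and the sample values are assumptions made up for this example.

```python
def low_confidence_words(words: list[dict], threshold: float = 0.6) -> list[tuple]:
    """Return (word, offset) pairs whose confidence is below the threshold.

    "Confidence" is a hypothetical per-word field (the feature requested
    here); the other keys are assumed for illustration.
    """
    return [
        (entry["Word"], entry["Offset"])
        for entry in words
        if entry["Confidence"] < threshold
    ]


# Made-up sample data, not real service output.
sample_words = [
    {"Word": "turn", "Offset": 1000000, "Duration": 2500000, "Confidence": 0.93},
    {"Word": "right", "Offset": 3600000, "Duration": 3000000, "Confidence": 0.41},
    {"Word": "now", "Offset": 6700000, "Duration": 2000000, "Confidence": 0.88},
]

if __name__ == "__main__":
    print(low_confidence_words(sample_words))
```

    With scores like these we could flag individual words for manual review instead of rejecting a whole utterance.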

    I'd be grateful for each vote for this idea!

    18 votes
    10 comments  ·  Speech to Text
  3. Possibility to speak English in different foreign accents

    For example, a German, French, Spanish or Italian person speaking English; each has their own accent. This would be perfect for applications like air traffic control.

    6 votes
    2 comments  ·  Text to Speech
  4. 6 votes
    3 comments  ·  Custom Voice
  5. Neural TTS in French

    Hi,

    The new neural text-to-speech feature looks amazing, but one language is missing: French :)

    I don't know if it is something on the list, or if it is coming soon, but I'm waiting for this feature before switching from Google WaveNet.
    I'm pretty sure that the French voice generated by this new neural TTS from MS Cognitive Services will be a game changer.

    The French language is complex (the emphasis, the punctuation, etc.), but if MS can provide the same awesome quality as they have in English, we will be able to build something incredible...

    5 votes
    0 comments  ·  Text to Speech
  6. Xamarin Android & Xamarin iOS SDK

    There is no Xamarin Android or Xamarin iOS SDK for the latest Custom Speech service from Microsoft.

    We hope to be able to use such an SDK as soon as possible, since Xamarin is one of Microsoft's core products and is widely used.

    5 votes
    2 comments  ·  Speech to Text
  7. Please support spelling words

    Let's say I'm building an application where I want to know a user's name or address. Whereas I can find the address online, the name might be something unique.

    In this case, I'd like to let the user spell his name to the speech service. However, the results are not very good currently.

    I'd love it if there were an option to tell Cognitive Services that I'm spelling something, or that I'm only sending letters to it.

    Adding a LUIS model or custom intents to the recognizer didn't improve the results either. Very clear names always lead to some characters…

    4 votes
    0 comments  ·  Speech to Text
  8. Add support for other audio formats and bitrates

    Add support for other audio formats and bitrates

    4 votes
    3 comments  ·  Speech to Text
  9. Move Irish text-to-phonetic-IPA conversion upstream.

    Irish (Gaeilge) TTS works if correct IPA syntax is used when sending the TTS request with the IPA option included.

    Example:

    Ba chuid mhór den togra seo teacht ar ábhar a bheadh chomh maith nó níos fearr ná an fhuinseog agus an t-ábhar a bheith níos inmharthana.

    Translated to IPA:

    "bˠɑː xɪdʲ woːˈr dʲəɴʲ tʲɔɡˈrə ʃoː tʃæxt eˈr ɑːwəˈr ɑː vʲəh xɔv mˠɑːh ɴˠoː ɴʲiːsˠ fʲæˈrr ɴˠɑː en ɪɴʲʃoːɡ əɡʊsˠ en tʲɑːwəˈr ɑː vʲeɪh ɴʲiːsˠ ɪɴʲwəˈrhɑːɴˠɑː"

    The Irish IPA text can be read and spoken accurately by the neural en-GB or en-US voices.

    I have attached a file that converts…
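
    For reference, a minimal sketch of the kind of request body described above: the IPA string is wrapped in an SSML `<phoneme alphabet="ipa">` element inside a voice element. The helper name `ipa_to_ssml` and the default voice name are assumptions for illustration; substitute whichever neural voice is actually available to you.

```python
from xml.sax.saxutils import escape, quoteattr

SSML_NS = "http://www.w3.org/2001/10/synthesis"


def ipa_to_ssml(ipa: str, fallback_text: str,
                voice: str = "en-GB-MiaNeural", lang: str = "en-GB") -> str:
    """Build an SSML document that asks the voice to speak an IPA string.

    Hypothetical helper; the default voice name is an assumption.
    fallback_text is the original spelling, kept as the element content.
    """
    return (
        f'<speak version="1.0" xmlns="{SSML_NS}" xml:lang={quoteattr(lang)}>'
        f"<voice name={quoteattr(voice)}>"
        f'<phoneme alphabet="ipa" ph={quoteattr(ipa)}>'
        f"{escape(fallback_text)}"
        "</phoneme></voice></speak>"
    )


if __name__ == "__main__":
    print(ipa_to_ssml("bˠɑː xɪdʲ woːˈr", "Ba chuid mhór"))
```

    The request is to have the Irish-to-IPA step done by the service, so that a client would not need to build the `ph` value itself.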

    3 votes
    0 comments  ·  Text to Speech
  10. Retrain a previously trained model on custom speech portal

    I previously trained a custom speech model and am now trying to retrain it, but I do not see an option to do so. The portal only gives me the option to train the baseline model.

    3 votes
    1 comment  ·  Custom Speech
  11. REST API support for custom phrase lists

    REST API support for custom phrase lists

    3 votes
    0 comments  ·  Speech to Text
  12. LUIS Reference Grammar ID fails in West Europe when included

    For the West Europe region, the service returns "Specified grammar type is not supported!" when we pass in a LUIS reference grammar ID (a.k.a. IntentRecognizer).

    This causes the speech service to fail with a "WebSocket is already in CLOSING or CLOSED state." error when the LUIS reference grammar ID is passed in. If it is not included, the service works correctly.

    May be related to this issue in the Cognitive Services Speech SDK repo: https://github.com/Azure-Samples/cognitive-services-speech-sdk/issues/127

    3 votes
    1 comment  ·  Speech to Text
  13. Stop TTS synthesizer

    Once the synthesizer starts synthesizing audio, it can't be stopped. It would be nice to have some method that would stop/interrupt currently running audio synthesis.

    3 votes
    1 comment  ·  Text to Speech
  14. Support more languages/derivatives for Custom Voice Fonts

    Support more languages/derivatives for Custom Voice Fonts. In particular, I'm looking to create a custom voice for en-GB, and building it using en-US doesn't quite work. I would also be looking to create custom voices for other flavours of English, en-AU being a priority.

    3 votes
    3 comments  ·  Custom Voice
  15. Mark Labels for TTS Speech API

    Azure Speech API should offer JSON mark labels for Text to Speech audio. This would allow developers to use the audio file together with the JSON mark labels to create text that tracks the audio in their app. The competitor has a similar solution. I found Azure TTS to be superior but am forced to use the competitor's solution due to the lack of JSON mark labels. Speech mark labels should be in JSON format and available for all languages. They should provide information such as the begin and end timestamps of each sound, mapped to the text, phrases and sentences.

    3 votes
    Under Review  ·  0 comments  ·  Text to Speech
  16. Is there a way to stream audio via WebSocket and get Speech to Text results AND get a copy of the recording on Azure Storage?

    We are currently using Bing Speech with LUIS, but looking to convert to Speech service.

    Right now we have multiple recorders that operate in the browser: Flash, WebRTC and HTML5.

    Each of these has to connect to Bing Speech to Text to get realtime translation and LUIS results to drive actions in the application. In addition, we are currently streaming the audio to Amazon S3. Ideally we would like to stream the audio only once, have it picked up by Microsoft for Speech to Text, AND be able to retrieve a URL to the recording for later use.

    Having to maintain two streams has…

    3 votes
    3 comments  ·  Speech to Text
  17. 3 votes
    2 comments  ·  Sample Requests
    Completed  ·  Allison Light responded

    Thanks for letting us know about the broken code! We’ve updated our documentation to link to the built-in Windows 10 Speech API which is the suggested way to call Speech API through UWP applications. You can read more about it using the links below.

    Documentation: https://msdn.microsoft.com/en-us/library/windows/apps/windows.media.speechrecognition.aspx
    Sample: https://github.com/Microsoft/Windows-universal-samples/tree/master/Samples/SpeechRecognitionAndSynthesis

  18. BUG - The TTS engine (in English) doesn't pronounce numbers after words well: COVID-19, NASDAQ 100, S&P 500, NIKKEI 225

    This is a bug, not a feature request. With the neural voices, "covid-19" sounds like "covid 19". The same happens with NASDAQ 100, S&P500, etc.

    2 votes
    0 comments  ·  Text to Speech
  19. Have scripts ready for voice input when creating a custom voice.

    To create a new voice, have scripts ready to be read by the user. As the computer recognizes the user's voice, the sound is recorded and synthesized into a custom voice for the user.

    2 votes
    0 comments  ·  Custom Voice
  20. en-GB Neural voice Mia pronounces number 4 too quickly

    The en-GB Mia neural voice pronounces the number 4 too quickly in sentences. Whether it is the digit or the word, the problem is the same. I'm able to work around this by adjusting the prosody speed of just that number :)
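
    In case it helps anyone else, the workaround can be sketched roughly like this: wrap each standalone "4" or "four" in an SSML `<prosody>` element with a slower rate before sending the text. `slow_down_four` is a made-up helper name, and the default rate of "slow" is just an assumed starting point to tune by ear.

```python
import re
from xml.sax.saxutils import escape

# Standalone digit 4 or the word "four" (so "14" and "fourteen" are left alone).
_FOUR = re.compile(r"\b(4|four)\b", re.IGNORECASE)


def slow_down_four(text: str, rate: str = "slow") -> str:
    """Return an SSML fragment with every standalone 4/four slowed down.

    Hypothetical helper: the result still has to be placed inside the
    usual <speak><voice>...</voice></speak> wrapper.
    """
    escaped = escape(text)  # protect &, < and > before adding markup
    return _FOUR.sub(
        lambda match: f'<prosody rate="{rate}">{match.group(0)}</prosody>',
        escaped,
    )


if __name__ == "__main__":
    print(slow_down_four("Gate 4 opens at four."))
```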

    2 votes
    1 comment  ·  Text to Speech