Microsoft

Speech Service

  1. Confidence score on word level

    The lack of a word-level confidence score is a showstopper for my company's project. It would be extremely useful for us to have the confidence score included within the "Words" list, which consists of words and their timestamps.

    According to this answer: https://social.msdn.microsoft.com/Forums/en-US/4979ca92-aa0f-4d09-b010-fc2eeb1bde80/speech-results-confidence-score-on-word-level?forum=AzureCognitiveService#8ae67445-4e23-49ea-b694-a8d877dc2dd0
    the feature is not public and we suspect that it could be provided quickly.

    I'd be grateful for each vote for this idea!

    15 votes
    5 comments  ·  Speech to Text
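
For readers unfamiliar with the request, here is a minimal Python sketch of the gap it describes. It assumes the detailed recognition output is a JSON object with an `NBest` list whose entries carry an utterance-level `Confidence` and a `Words` list of `Word`/`Offset`/`Duration` entries, with offsets in 100-nanosecond units. The payload is made-up sample data and `word_timings` is a hypothetical helper, not part of any SDK.

```python
import json

# Made-up sample payload; the field names are an assumption about the
# detailed output format, and the values are illustrative only.
SAMPLE = """
{
  "RecognitionStatus": "Success",
  "NBest": [
    {
      "Confidence": 0.93,
      "Display": "Hello world.",
      "Words": [
        {"Word": "hello", "Offset": 1000000, "Duration": 4000000},
        {"Word": "world", "Offset": 5500000, "Duration": 4500000}
      ]
    }
  ]
}
"""

TICKS_PER_SECOND = 10_000_000  # assumes offsets are in 100-nanosecond units


def word_timings(payload: str):
    """Return (utterance_confidence, rows) for the top hypothesis.

    Each row is (word, start_s, end_s, confidence). confidence is None when
    a word entry has no "Confidence" key, which is the situation this idea
    describes.
    """
    best = json.loads(payload)["NBest"][0]
    rows = []
    for w in best.get("Words", []):
        start = w["Offset"] / TICKS_PER_SECOND
        end = (w["Offset"] + w["Duration"]) / TICKS_PER_SECOND
        rows.append((w["Word"], start, end, w.get("Confidence")))
    return best["Confidence"], rows


if __name__ == "__main__":
    utterance_conf, rows = word_timings(SAMPLE)
    print("utterance confidence:", utterance_conf)
    for word, start, end, conf in rows:
        print(f"{word}: {start:.2f}s-{end:.2f}s confidence={conf}")
```

With data shaped like this, a client can only attach the single utterance-level score to every word, which is why a per-word value inside "Words" is being requested.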
  2. Norwegian language needs improvement in grammar

    Norwegian needs a grammatical rethink with regard to compound words.

    In Norwegian, spaces separate individual words, and compounds are written as a single word. So when, for instance, I say "console window" ("konsollvindu" in Norwegian) to the speech-to-text service, it outputs "konsoll vindu", with a space between the two parts.

    This.

    Must.

    Absolutely.

    Be.

    Fixed.

    I am a linguist by degree. Please hire me to fix this if you need help, because you really do.

    2 votes
    5 comments  ·  Speech to Text
  3. LUIS Reference Grammar ID fails in West Europe when included

    For the West Europe region, the service returns "Specified grammar type is not supported!" when we pass in a LUIS reference grammar ID (a.k.a. IntentRecognizer).

    This causes the speech service to fail with a "WebSocket is already in CLOSING or CLOSED state." error when the LUIS reference grammar ID is passed in. If it is not included, the service works correctly.

    May be related to this issue in the Cognitive Services Speech SDK repo: https://github.com/Azure-Samples/cognitive-services-speech-sdk/issues/127

    3 votes
    1 comment  ·  Speech to Text
  4. ARM32 Support for Microsoft.CognitiveServices.Speech API (on Raspbian)

    I'd like to see the Speech API come to the Raspberry Pi, meaning FULL ARM32 support for the Linux implementation of this SDK. This would enable the raspi maker community to use Azure Speech seamlessly in their devices.

    2 votes
    0 comments  ·  Text to Speech
  5. Neural TTS in French

    Hi,

    The new neural text-to-speech feature looks amazing, but one language is missing: French :)

    I don't know if it is something on the list, or if it is coming soon, but I'm waiting for this feature to switch from Google WaveNet.
    I'm pretty sure that the French voice generated by this new neural TTS from MS Cognitive Services will be a game changer.

    The French language is complex (the emphasis, the punctuation, etc.), but if MS can provide the same awesome quality as they have in English, we will be able to build something incredible...

    5 votes
    0 comments  ·  Text to Speech
  6. GovCloud - Cognitive Services Endpoints in overview

    It would be beneficial for GovCloud users to have secure access to a list of available endpoints for the service under the overview, instead of just the token-issuing endpoint. At present one is forced to comb through out-of-date documentation to find the secured GovCloud endpoints.

    1 vote
    0 comments  ·  Text to Speech
  7. Pronunciation Support for words in Portuguese (Brazil)

    Custom Speech has good accuracy compared to other solutions, but it doesn't support pronunciation for Portuguese (Brazil), so words that are important for business aren't recognized. It's a very important feature for business usage. The UI only offers support for the English language.

    1 vote
    1 comment  ·  Custom Speech
  8. Azure TTS Cognitive Service Voice Limit Issue

    I am very new to the Text-to-Speech (TTS) cognitive service of Microsoft Azure. I was able to successfully convert the given text into an audio file using Azure's TTS service. It works fine when I have a single voice element in my SSML XML document. An example of working SSML is:

    <speak version="1.0" xml:lang="en-US">
    
    <voice xml:lang="en-US" xml:gender="Male" name="en-US-Jessa24kRUS">
    Hello, this is my sample text to convert into audio?
    </voice>
    </speak>

    But when I have multiple voice tags (one per gender), it causes an error. The SSML for that case is:

    <speak version="1.0" xml:lang="en-US">
    
    <voice xml:lang="en-US" xml:gender="Male" name="en-US-Guy24kRUS"> What’s your
    1 vote
    0 comments  ·  Text to Speech
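
Since the failing SSML is cut off above, the cause of the error cannot be determined from the post. One thing that may be worth ruling out is malformed XML. The following is a minimal Python sketch, assuming that multiple `<voice>` elements are placed as siblings under a single `<speak>` root. The voice names are the two that appear in the post, the texts are placeholders, only the `name` attribute is set on each voice, and `build_ssml` is a hypothetical helper. Whether the service accepts this document is not confirmed here.

```python
import xml.etree.ElementTree as ET

XML_NS = "http://www.w3.org/XML/1998/namespace"  # namespace behind xml:lang


def build_ssml(segments, lang="en-US"):
    """Build one <speak> document with one <voice> element per segment.

    segments is a list of (voice_name, text) pairs. Text is escaped by
    ElementTree, so characters such as & or < cannot break the markup.
    """
    speak = ET.Element("speak", {"version": "1.0", f"{{{XML_NS}}}lang": lang})
    for voice_name, text in segments:
        voice = ET.SubElement(speak, "voice", {"name": voice_name})
        voice.text = text
    return ET.tostring(speak, encoding="unicode")


if __name__ == "__main__":
    ssml = build_ssml([
        ("en-US-Guy24kRUS", "Placeholder text for the first voice."),
        ("en-US-Jessa24kRUS", "Placeholder text for the second voice."),
    ])
    # Parsing the result back confirms that the document is well-formed.
    ET.fromstring(ssml)
    print(ssml)
```

Generating the document with an XML library rather than string concatenation guarantees that every `<voice>` tag is closed and that special characters in the text are escaped.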
  9. Voice has changed

    Something changed with the zh-CN-XiaoxiaoNeural voice - the 'sentiment' expression no longer works, as of a few days ago. Using exactly the same code, but the output is quite different (no longer sounds sentimental). Was an update posted? If so, where do we find a list of changes?

    1 vote
    0 comments  ·  Text to Speech
  10. Automatic determination of English locales

    At present, we have to specify the exact locale of the input language for features such as "Speech to Text", for example en-US, en-AU, and so on.
    Users may not know which one to choose, so it would be easier to use if the service recognized the locale automatically from their voice.

    Please let me know if you have any prospects for the future.

    1 vote
    1 comment  ·  Speech to Text
  11. cris.ai window resizing hides buttons and features.

    Depending on screen resizing, certain elements on cris.ai will not be visible. There seems to be an issue with the underlying architecture of the site since this is the second issue that arose from resizing the window. Both times, things went wrong when the window occupied half or less than half of the monitor screen.

    1 vote
    2 comments  ·  Custom Speech
  12. Please support spelling words

    Let's say I'm building an application where I want to know a user's name or address. Whereas I can find the address online, the name might be something unique.

    In this case, I'd like to let the user spell his name to the speech service. However, the results are not very good currently.

    I'd love it if there were an option to tell the cognitive services that I'm spelling something, or that I'm only sending letters to it.

    Adding a LUIS model or custom intents to the recognizer didn't improve the results either. Very clear names always lead to some characters…

    4 votes
    0 comments  ·  Speech to Text
  13. How do I prevent the speaker from being picked up by the microphone

    We are currently using Bing Speech with LUIS, but looking to convert to Speech service.

    Right now we have multiple recorders that operate in the browser: Flash, WebRTC, and HTML5.

    Each of these has to connect to Bing Speech to Text to get realtime translation and LUIS results to drive actions in the application. Additionally, we are currently streaming the audio to Amazon S3. Ideally we would like to stream the audio only once, have it picked up by Microsoft for Speech to Text, AND be able to retrieve a URL for later use.

    1 vote
    1 comment  ·  Custom Speech
  14. Stop TTS synthesizer

    Once the synthesizer starts synthesizing audio, it can't be stopped. It would be nice to have some method that would stop/interrupt currently running audio synthesis.

    1 vote
    1 comment  ·  Text to Speech
  15. audio+transcription not available

    For training a model, I only have access to use "related text" and not audio+transcription.

    2 votes
    0 comments  ·  Speech to Text
  16. Proper error messaging when model training fails

    When training a model on speech.microsoft.com fails, we need proper error messaging. Right now it literally says "Failed", which provides absolutely no useful information for debugging.

    Why do you hate your support people that they have to deal with customers whose only info is "it failed"...

    Note that this is happening with training data that has already been successfully uploaded.

    1 vote
    1 comment  ·  Custom Speech
  17. Support more languages/derivatives for Custom Voice Fonts

    Support more languages/derivatives for Custom Voice Fonts. In particular, I'm looking to create a custom voice for en-GB, and building it using en-US doesn't quite work. I would also be looking to create custom voices for other flavours of English, en-AU being a priority.

    3 votes
    3 comments  ·  Custom Voice
  18. New language development

    Please add the Nigerian Yoruba language, or give specific users the opportunity to develop a model for a local language.
    Thank you

    1 vote
    0 comments  ·  Custom Speech
  19. Add support for KWS on iOS

    Add support for wake word detection on the iOS platform

    2 votes
    1 comment  ·  Speech to Text
  20. Add support for other audio formats and bitrates

    Add support for other audio formats and bitrates

    4 votes
    2 comments  ·  Speech to Text