Microsoft

Speech service

  1. Automatic determination of English locales

    At present, features such as "Speech to Text" require us to specify the exact locale of the input language, for example en-US or en-AU.
    Users may not know which one to choose, so the service would be easier to use if it recognized the locale automatically from the speaker's voice.

    Please let me know whether this is planned for the future.

    1 vote
    1 comment  ·  Speech to Text
  2. Norwegian language needs improvement in grammar

    Norwegian needs a grammatical rethink with regard to compound words.

    In Norwegian, spaces separate individual words, and a compound word is written as a single word. However, when I mention the console window ("konsollvindu" in Norwegian) to the speech-to-text service, it outputs "konsoll vindu", with a space between the two parts.

    This.

    Must.

    Absolutely.

    Be.

    Fixed.

    I am a linguist by degree. Please hire me to fix this if you need help, because you really do.

    2 votes
    5 comments  ·  Speech to Text
  3. cris.ai window resizing hides buttons and features.

    Depending on screen resizing, certain elements on cris.ai will not be visible. There seems to be an issue with the underlying architecture of the site since this is the second issue that arose from resizing the window. Both times, things went wrong when the window occupied half or less than half of the monitor screen.

    1 vote
    2 comments  ·  Custom Speech
  4. How do I prevent the speaker from being picked up by the microphone

    We are currently using Bing Speech with LUIS, but looking to convert to Speech service.

    Right now we have multiple recorders that operate in the browser, Flash, WebRTC, HTML5.

    Each of these has to connect to Bing Speech to Text to get realtime translation and LUIS results to drive actions in the application. Additionally we are currently streaming the audio to Amazon S3. Ideally we would like to stream the audio only once, and have it picked up by Microsoft from Speech to Text AND be able to retrieve a URL for later use.

    1 vote
    1 comment  ·  Custom Speech
  5. Neural TTS in French

    Hi,

    The new neural text-to-speech feature looks amazing, but one language is missing: French :)

    I don't know if it is something on the list, or if it is coming soon, but I'm waiting for this feature to switch from Google WaveNet.
    I'm pretty sure that the French voice generated by this new neural TTS from MS Cognitive Services will be a game changer.

    The French language is complex (the emphasis, the punctuation, etc.), but if MS can provide the same awesome quality as they have in English, we will be able to build something incredible...

    5 votes
    0 comments  ·  Text to Speech
  6. Confidence score on word level

    The lack of a word-level confidence score is a show stopper for my company's project. It would be extremely useful for us to have the confidence score included in the "Words" list, which consists of words and their timestamps.

    According to this answer: https://social.msdn.microsoft.com/Forums/en-US/4979ca92-aa0f-4d09-b010-fc2eeb1bde80/speech-results-confidence-score-on-word-level?forum=AzureCognitiveService#8ae67445-4e23-49ea-b694-a8d877dc2dd0
    the feature is not public and we suspect that it could be provided quickly.

    I'd be grateful for each vote for this idea!

    13 votes
    5 comments  ·  Speech to Text
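
To illustrate the request above: if each entry in the "Words" list carried a confidence value, a client could flag uncertain words for manual review. The sketch below is only an assumption of what that might look like. The `Confidence` field is hypothetical (it is exactly what this idea asks for), the `Word`, `Offset`, and `Duration` names are assumed from the word-timestamp output, and the sample values are invented.

```python
from typing import Dict, List, Union

# One entry of the "Words" list. "Confidence" is the HYPOTHETICAL field
# this idea asks for; it is not part of the response described in the post.
WordEntry = Dict[str, Union[str, int, float]]


def low_confidence_words(words: List[WordEntry], threshold: float = 0.6) -> List[str]:
    """Return the words whose hypothetical Confidence is below threshold."""
    return [str(w["Word"]) for w in words if float(w["Confidence"]) < threshold]


# Invented sample data, for illustration only.
sample_words: List[WordEntry] = [
    {"Word": "open", "Offset": 500000, "Duration": 2500000, "Confidence": 0.93},
    {"Word": "the", "Offset": 3100000, "Duration": 1200000, "Confidence": 0.88},
    {"Word": "console", "Offset": 4400000, "Duration": 4100000, "Confidence": 0.41},
]

flagged = low_confidence_words(sample_words)
print(flagged)
```

With per-word scores, an application could highlight only the flagged words instead of asking the user to re-check a whole utterance.
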
  7. Please support spelling words

    Let's say I'm building an application where I want to know a user's name or address. Whereas I can find the address online, the name might be something unique.

    In this case, I'd like to let the user spell his name to the speech service. However, the results are not very good currently.

    I'd love it if there were an option to tell the cognitive services that I'm spelling something, or that I'm only sending letters.

    Adding a LUIS model or custom intents to the recognizer didn't improve the results either. Very clear names always lead to some characters…

    3 votes
    0 comments  ·  Speech to Text
  8. New language development

    Please add the Nigerian Yoruba language, or give specific users the opportunity to develop a model for a local language.
    Thank you

    1 vote
    0 comments  ·  Custom Speech
  9. Stop TTS synthesizer

    Once the synthesizer starts synthesizing audio, it can't be stopped. It would be nice to have some method that would stop/interrupt currently running audio synthesis.

    1 vote
    1 comment  ·  Text to Speech
  10. audio+transcription not available

    When training a model, I can only use "related text"; the audio + transcription option is not available to me.

    1 vote
    0 comments  ·  Speech to Text
  11. Proper error messaging when model training fails

    When training a model on speech.microsoft.com fails, we need proper error messaging. Right now it literally says "Failed", which provides absolutely no useful information for debugging.

    Why do you hate your support people that they have to deal with customers whose only info is "it failed"...

    Note that this is happening with training data that has already been uploaded successfully.

    1 vote
    1 comment  ·  Custom Speech
  12. Add support for KWS on iOS

    Add support for wake word detection on the iOS platform.

    2 votes
    1 comment  ·  Speech to Text
  13. Support more languages/derivatives for Custom Voice Fonts

    Support more languages/derivatives for Custom Voice Fonts. In particular, I'm looking to create a custom voice for en-GB, and building it using en-US doesn't quite work. I would also be looking to create custom voices for other flavours of English, en-AU being a priority.

    3 votes
    3 comments  ·  Custom Voice
  14. Mark Labels for TTS Speech API

    The Azure Speech API should offer JSON mark labels for Text to Speech audio. With the audio file and the JSON mark labels, developers could create text that tracks the audio in their app. A competitor has a similar solution. I found Azure TTS to be superior, but I am forced to use the competitor's solution because of the lack of JSON mark labels. The speech mark labels should be in JSON format and available for all languages. They should provide information such as the begin and end timestamps of each sound, mapped to the corresponding text, phrase, and sentence.

    2 votes
    Under Review  ·  0 comments  ·  Text to Speech
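
As a rough illustration of the request above, the sketch below shows how an app might use such mark labels to find the text being spoken at a given playback position. The JSON layout and the field names `text`, `start_ms`, and `end_ms` are assumptions made up for this example; the post does not specify a format, and no such output is described in it.

```python
import json
from typing import List, Optional, TypedDict


class Mark(TypedDict):
    # Hypothetical mark label: a piece of text and its time span in the audio.
    text: str
    start_ms: int
    end_ms: int


# Invented sample of what JSON mark labels could look like.
MARKS_JSON = """
[
  {"text": "Hello", "start_ms": 0, "end_ms": 400},
  {"text": "world", "start_ms": 450, "end_ms": 900}
]
"""


def mark_at(marks: List[Mark], position_ms: int) -> Optional[Mark]:
    """Return the mark being spoken at position_ms, or None during a gap."""
    for mark in marks:
        if mark["start_ms"] <= position_ms < mark["end_ms"]:
            return mark
    return None


marks: List[Mark] = json.loads(MARKS_JSON)
current = mark_at(marks, 500)
print(current["text"] if current else "(no mark at this position)")
```

An audio player would call `mark_at` with its current playback position and highlight the returned text.
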
  15. Add support for other audio formats and bitrates

    Add support for other audio formats and bitrates

    4 votes
    2 comments  ·  Speech to Text
  16. Gaeilge (Irish)

    Please add Gaeilge, the Irish language.

    1 vote
    0 comments  ·  Speech to Text
  17. Is there a way to stream audio via WebSocket and get Speech to Text results AND get a copy of the recording on Azure Storage?

    We are currently using Bing Speech with LUIS, but looking to convert to Speech service.

    Right now we have multiple recorders that operate in the browser, Flash, WebRTC, HTML5.

    Each of these has to connect to Bing Speech to Text to get realtime translation and LUIS results to drive actions in the application. Additionally we are currently streaming the audio to Amazon S3. Ideally we would like to stream the audio only once, and have it picked up by Microsoft from Speech to Text AND be able to retrieve a URL for later use.

    Having to maintain two streams has…

    3 votes
    3 comments  ·  Speech to Text
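
One way to picture the "stream once" request above is a fan-out step that forwards every audio chunk to several consumers. The sketch below is generic and makes no Speech service or Azure Storage calls; `fan_out` and both sinks are hypothetical stand-ins for a recognizer stream and a storage writer.

```python
from typing import Callable, Iterable, List

# A sink is anything that accepts one chunk of audio bytes.
ChunkSink = Callable[[bytes], None]


def fan_out(chunks: Iterable[bytes], sinks: List[ChunkSink]) -> int:
    """Forward each chunk to every sink; return the total bytes read."""
    total = 0
    for chunk in chunks:
        for sink in sinks:
            sink(chunk)
        total += len(chunk)
    return total


# Stand-ins for the two consumers (assumptions, not real SDK objects):
to_recognizer: List[bytes] = []  # would be a speech-to-text input stream
to_storage = bytearray()         # would be a blob or file writer

sent = fan_out(
    [b"\x00\x01", b"\x02\x03\x04"],
    [to_recognizer.append, to_storage.extend],
)
print(sent, len(to_recognizer), len(to_storage))
```

The client uploads the audio once, and the copy for later retrieval is made wherever the fan-out runs.
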
  18. Please continue to support the Korean language supported by the Bing Speech API

    I would like to use the new Speech SDK. I am currently using the Bing Speech API, but if Korean is added, I will switch to the Speech SDK.

    1 vote
    1 comment  ·  Speech to Text
  19. 6 votes
    3 comments  ·  Custom Voice
  20. Speech Service very high response time

    We are using https://docs.microsoft.com/en-us/azure/cognitive-services/speech-service/quickstart-js-browser
    to fill in user data on screen from voice input (recognizeOnceAsync), but I am facing several problems:
    1. There is no way to stop the recognizer immediately once I have spoken a few words. It takes 4-5 seconds to stop microphone processing.
    2. Suppose I want to enter the user's first name by voice. It takes 5-6 seconds to complete the process, which is not acceptable. I need your input on how to improve this.
    3. Every result returned from the Speech service has an extra period (.) appended at the end. I am not sure why this happens.

    1 vote
    1 comment  ·  Speech to Text
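
For the third point in the post above, one possible client-side workaround is to strip the trailing period before writing the recognized text into a form field. This is a minimal sketch, assuming the result arrives as a plain string; `clean_field_value` is a hypothetical helper and not part of the Speech SDK.

```python
def clean_field_value(recognized_text: str) -> str:
    """Trim whitespace and drop one trailing period, keeping an ellipsis."""
    text = recognized_text.strip()
    if text.endswith(".") and not text.endswith("..."):
        text = text[:-1].rstrip()
    return text


print(clean_field_value("John."))
```

This only tidies the displayed value; it does not address the response time described in the first two points.
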