Microsoft

Speech service

  1. Stop TTS synthesizer

    Once the synthesizer starts synthesizing audio, it can't be stopped. It would be nice to have some method that would stop/interrupt currently running audio synthesis.

    1 vote
    0 comments  ·  Text to Speech
  2. audio+transcription not available

    For training a model, I only have access to use "related text" and not audio+transcription.

    1 vote
    0 comments  ·  Speech to Text
  3. Proper error messaging when model training fails

    When training a model on speech.microsoft.com fails, we need proper error messaging. Right now it literally says "Failed", which provides no useful information for debugging.

    Why do you hate your support people that they have to deal with customers whose only info is "it failed"...

    Note that this is happening with training data that has already been uploaded successfully.

    1 vote
    0 comments  ·  Custom Speech
  4. Add support for other audio formats and bitrates

    Add support for other audio formats and bitrates

    3 votes
    2 comments  ·  Speech to Text
  5. Support more languages/derivatives for Custom Voice Fonts

    Support more languages/derivatives for Custom Voice Fonts. In particular, I'm looking to create a custom voice for en-GB, and building it using en-US doesn't quite work. I would also be looking to create custom voices for other flavours of English, en-AU being a priority.

    2 votes
    2 comments  ·  Custom Voice
  6. Add support for KWS on iOS

    Add support for wake word detection on the iOS platform.

    2 votes
    0 comments  ·  Speech to Text
  7. I need Tamil NLP

    Tamil NLP

    5 votes
    2 comments  ·  Custom Voice
  8. Is there a way to stream audio via WebSocket and get Speech to Text results AND get a copy of the recording on Azure Storage?

    We are currently using Bing Speech with LUIS, but looking to convert to Speech service.

    Right now we have multiple recorders that operate in the browser, Flash, WebRTC, HTML5.

    Each of these has to connect to Bing Speech to Text to get realtime translation and LUIS results to drive actions in the application. Additionally we are currently streaming the audio to Amazon S3. Ideally we would like to stream the audio only once, and have it picked up by Microsoft from Speech to Text AND be able to retrieve a URL for later use.

    Having to maintain two streams has…

    2 votes
    2 comments  ·  Speech to Text
  9. Mark Labels for TTS Speech API

    Azure Speech API should offer JSON mark labels for Text to Speech audio. This would let developers combine the audio file with the JSON mark labels to build text that tracks the audio in an app. The competitor has a similar solution. I found Azure TTS to be superior but am forced to use the competitor's solution due to the lack of JSON mark labels. Speech mark labels should be in JSON format and available for all languages. They should provide information such as the begin and end timestamps of each sound, mapped to the text at the word, phrase, and sentence level.
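To make the request concrete, here is a minimal sketch of how an app might consume such mark labels to highlight the word being spoken at a given playback position. The JSON schema (`type`, `text`, `begin_ms`, `end_ms`) and the helper names `load_marks` and `mark_at` are invented for illustration; they are not an existing Azure format or API.

```python
import json
from bisect import bisect_right

# Hypothetical mark labels: this schema is an assumption made for the example,
# not something the Speech service is known to return.
MARKS_JSON = """
[
  {"type": "word", "text": "Hello", "begin_ms": 0,   "end_ms": 420},
  {"type": "word", "text": "world", "begin_ms": 480, "end_ms": 900}
]
"""


def load_marks(raw):
    """Parse the JSON marks and sort them by start time."""
    marks = json.loads(raw)
    return sorted(marks, key=lambda m: m["begin_ms"])


def mark_at(marks, position_ms):
    """Return the mark covering position_ms, or None if it falls in a gap."""
    starts = [m["begin_ms"] for m in marks]
    # Index of the last mark that starts at or before the playback position.
    i = bisect_right(starts, position_ms) - 1
    if i >= 0 and position_ms < marks[i]["end_ms"]:
        return marks[i]
    return None
```

An audio player would call `mark_at(marks, current_position_ms)` on each time update and highlight the returned text, which is the "audio tracking text" use case described above.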

    1 vote
    Under Review  ·  0 comments  ·  Text to Speech
  10. Xamarin Android & Xamarin iOS SDK

    There are no Xamarin Android or Xamarin iOS SDKs for Microsoft's latest custom speech service.

    We hope to be able to use such an SDK as soon as possible, since Xamarin is one of Microsoft's core products and is widely used.

    4 votes
    0 comments  ·  Speech to Text
  11. Gaeilge (Irish)

    Please add Gaeilge, the Irish language.

    1 vote
    0 comments  ·  Speech to Text
  12. Please continue to support the Korean language supported by Bing Speech API

    I would like to use the new Speech SDK. I am currently using the Bing Speech API, but if Korean is added, I will switch to the Speech SDK.

    1 vote
    0 comments  ·  Speech to Text
  13. Speech Service very high response time

    We are using https://docs.microsoft.com/en-us/azure/cognitive-services/speech-service/quickstart-js-browser
    to fill in user data on screen based on voice input (recognizeOnceAsync), but I am facing multiple problems:
    1. There is no way to stop the recognizer immediately once I have spoken a few words. It takes 4-5 seconds to stop microphone processing.
    2. Suppose I want to enter the user's first name by voice. It takes 5-6 seconds to complete the process, which is not acceptable. I need your input on how to improve this.
    3. Every result returned from the Speech service has an extra period (.) appended at the end. I am not sure why this happens.
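For point 3, one possible client-side workaround is to strip the final period before writing the text into a form field. The sketch below assumes the period comes from automatic punctuation applied to the recognized text, which the post does not confirm; `strip_trailing_period` is a hypothetical helper, not part of the Speech SDK.

```python
def strip_trailing_period(recognized_text: str) -> str:
    """Remove a single sentence-final period from a recognized result.

    Assumption: the period was appended by automatic punctuation and is
    unwanted for short form inputs such as a first name.
    """
    text = recognized_text.strip()
    # Remove only one final period; leave ellipses ("...") untouched.
    if text.endswith(".") and not text.endswith(".."):
        text = text[:-1].rstrip()
    return text
```

The helper only touches the very end of the string, so periods inside the text (for example in "St. John") are preserved.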

    1 vote
    0 comments  ·  Speech to Text
  14. Samples

    I haven't been able to find samples of the TTS anywhere.

    2 votes
    1 comment  ·  Speech to Text
  15. Need AMR to be supported in Speech to Text services

    I saw that currently only the WAV audio format is supported.
    https://docs.microsoft.com/en-us/azure/cognitive-services/speech/getstarted/getstartedrest?tabs=Powershell

    However, my customer is using a Cordova app, and the default format in Android Cordova is AMR audio.
    We hope Azure Speech to Text can accept AMR as well.

    These cloud services accept various audio formats, including AMR, so I think it is technically feasible for Azure.
    http://www.folio3.com/speech-to-text-services/

    1 vote
    0 comments  ·  Speech to Text
  16. Connector for PowerApps

    Please provide a connector which can be used in PowerApps.

    I have tried to use the service from the demo in an Azure function, but this currently fails with an unknown error :(

    2 votes
    0 comments  ·  Speech to Text
  17. improve the recognition of numbers

    Words like "ten thousand" are recognized as "10 1.000" but should be "10000". This is a problem if you want to allow input of numbers bigger than 1000.

    1 vote
    0 comments  ·  Speech to Text
  18. Capitalization

    This service is not doing a good job of capitalizing the first word in sentences. I say "I ate lunch period it was good" and I get

    I ate lunch period it was good. This later becomes

    I ate lunch. it was good

    For some reason, recognizing a period as punctuation doesn't capitalize the first word of the next sentence.
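Until the service handles this, the missing capitalization could be patched on the client. The following is a minimal sketch, not a fix in the service itself; `capitalize_sentences` is a hypothetical helper, and it assumes sentences end with ".", "!" or "?" followed by whitespace.

```python
import re

# A lowercase letter at the start of the text, or right after sentence-ending
# punctuation plus whitespace.
_SENTENCE_START = re.compile(r"(^|[.!?]\s+)([a-z])")


def capitalize_sentences(text: str) -> str:
    """Uppercase the first letter of each sentence in recognized text."""
    return _SENTENCE_START.sub(
        lambda m: m.group(1) + m.group(2).upper(), text
    )
```

Applied to the example above, `capitalize_sentences("I ate lunch. it was good")` yields "I ate lunch. It was good". The rule is deliberately naive: it would also capitalize the word after an abbreviation such as "e.g. ", so it suits short dictated sentences better than general prose.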

    1 vote
    0 comments  ·  Speech to Text
  19. 3 votes
    2 comments  ·  Sample Requests
    Completed  ·  Allison Light responded

    Thanks for letting us know about the broken code! We’ve updated our documentation to link to the built-in Windows 10 Speech API, which is the suggested way to call the Speech API from UWP applications. You can read more about it using the links below.

    Documentation: https://msdn.microsoft.com/en-us/library/windows/apps/windows.media.speechrecognition.aspx
    Sample: https://github.com/Microsoft/Windows-universal-samples/tree/master/Samples/SpeechRecognitionAndSynthesis

  20. confidence number value per word or per speech fragment

    I am doing a POC with speech recognition for long speeches.
    https://docs.microsoft.com/de-de/azure/cognitive-services/speech/concepts#recognition-modes

    The recognition mode "conversation" with format "detailed" delivers message responses of type "SpeechPhrase" including confidence value.

    The recognition mode "dictation" with format "detailed" delivers message responses of type "SpeechFragment" and "SpeechPhrase" (including confidence value). But the fragments do not contain any information about confidence value.
    With the C# service library and the recognition mode "dictation" you'll get partial results with a confidence value (enum). But this is not our desired solution, because the confidence value seems to belong to the whole phrase (Confidence: Indicates the level of confidence…

    1 vote
    Under Review  ·  0 comments  ·  Speech to Text