Microsoft

Bing Speech

Welcome to the Bing Speech Forum

The Cognitive Services Speech Service is replacing Bing Speech. Please refer to its forum for speech product feedback.


Categories

Documentation – Any ideas or suggestions for the API Reference or Documentation.

Language Support – Submit a request to have a particular language supported.

Samples & SDK Requests – Let us know if you would like to see a tutorial or sample provided.

Speech to Text – API & SDK – Ideas and feature requests to Speech Recognition and Speech to Text (STT).

Text to Speech – Ideas and feature requests for Text to Speech (TTS) – API only


Attention!

We have moved our Customer Feedback & Ideas for Azure Cognitive Services portal to the Azure Feedback Forum.

Please go to the link below to access our new Feedback and Ideas Page.


  1. Control/reduce the amount of silence at end of text-to-speech clip

    Based on my testing, the current text-to-speech outputs appear to have between 659ms and 672ms of silence at the end. The start of the audio has between 62ms and 86ms of silence.

    When using multiple generated phrases in sequence, the long gap at the end makes the flow sound unnatural. At present, I have to post-process the generated audio using NAudio to clip the silence at the end. I have found that leaving about 100ms at the end allows the clips to be played in sequence and sound natural.

    It would be good if the amount of…

    10 votes

    Under Review  ·  0 comments  ·  Text to Speech - API Only
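The post-processing workaround described in idea 1 can be sketched roughly as follows. The poster used NAudio (C#); this is a plain-Python stand-in that works on raw 16-bit PCM samples, and the function name, the `threshold` value, and the synthetic clip are all assumptions made for illustration. Only the roughly 100ms tail comes from the post.

```python
from array import array


def trim_trailing_silence(samples, sample_rate, keep_ms=100, threshold=200):
    """Shorten trailing silence so that about keep_ms of it remains.

    samples: sequence of signed 16-bit PCM values (mono).
    threshold: absolute amplitude at or below which a sample counts as
    silence (an assumed value; tune it for real audio).
    """
    # Find the index just past the last sample louder than the threshold.
    last_loud = 0
    for i in range(len(samples) - 1, -1, -1):
        if abs(samples[i]) > threshold:
            last_loud = i + 1
            break
    # Number of samples that make up the tail we want to keep.
    keep = sample_rate * keep_ms // 1000
    return samples[:min(len(samples), last_loud + keep)]


# Synthetic clip at 16 kHz: 0.5 s of "speech" followed by 0.665 s of silence.
rate = 16000
clip = array("h", [1000] * 8000 + [0] * 10640)
trimmed = trim_trailing_silence(clip, rate)
print(len(clip), len(trimmed))
```

A fixed amplitude threshold is the simplest possible silence detector; real recordings with background noise may need a windowed energy measure instead.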
  2. Support SSML for customizing pronunciation

    The text-to-speech HTML interface accepts the input as an SSML document. I was trying to use the following features of SSML without any luck:

    <say-as interpret-as="ordinal">3</say-as> - should say "third"
    <phoneme> - to render speech by its phonetic pronunciation. For example, I get the wrong pronunciation of 'record'. I want to force it to use the correct pronunciation (i.e. record player vs. record a song).
    Use SSML to Control Synthesized Speech

    Most of the things I tried were from the old Microsoft Speech SDK documentation. Is there any guidance on what is supported, what is not supported and any plans…

    5 votes

    Under Review  ·  0 comments  ·  Text to Speech - API Only
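For reference, an SSML document using the two elements mentioned in idea 2 might be assembled as in the sketch below. The element and attribute names follow the W3C SSML specification; the helper functions and the IPA transcriptions are illustrative assumptions, and whether the service actually honors `say-as` and `phoneme` is exactly the open question raised in the post.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr

SSML_NS = "http://www.w3.org/2001/10/synthesis"


def ordinal(number):
    """Wrap a number so a compliant engine reads it as an ordinal."""
    return '<say-as interpret-as="ordinal">%s</say-as>' % escape(str(number))


def phoneme(text, ipa):
    """Attach an IPA pronunciation to a word (transcription is assumed)."""
    return '<phoneme alphabet="ipa" ph=%s>%s</phoneme>' % (
        quoteattr(ipa), escape(text))


def speak(body, lang="en-US"):
    """Wrap already-escaped SSML content in a <speak> root element."""
    return '<speak version="1.0" xmlns="%s" xml:lang=%s>%s</speak>' % (
        SSML_NS, quoteattr(lang), body)


ssml = speak(
    "This is the " + ordinal(3) + " time I will "
    + phoneme("record", "rɪˈkɔːrd") + " a song for the "
    + phoneme("record", "ˈrɛkərd") + " player."
)

# Parsing checks only that the document is well-formed XML,
# not that any speech engine supports these elements.
root = ET.fromstring(ssml)
print(ssml)
```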
  3. Using Cognitive Service APIs from Unity

    Add sample code to help developers interested in using Microsoft Cognitive Service APIs from a Unity project. (via MSDN forum)

    For Speech recognition specifically, the REST API can't do live recognition, so having a Unity example would help developers who want to code in this platform.

    14 votes

    Under Review  ·  3 comments  ·  Samples & SDK Request
  4. Web client for Bing Speech APIs

    The existing REST APIs don't provide functionality that the "native" clients do.

    It would be great to have a JavaScript web client (with partial results and silence detection) that works via WebSockets, WebRTC, HTTP/2, or another existing standard. For example, the demo on this page: https://www.microsoft.com/cognitive-services/en-us/speech-api

    82 votes

    Completed  ·  5 comments  ·  Samples & SDK Request
  5. C++ client for Bing Speech APIs

    It would be great to add C++ (Linux) support for the Bing Speech APIs.

    6 votes

    Under Review  ·  2 comments  ·  Samples & SDK Request
  6. Time stamps of recognized text.

    Suppose I have an audio file in which Mr. Satya Nadella says, "Our mission at Microsoft is to make things that help you make things and make things happen," and suppose the total duration of the audio file is 20 seconds. I want SAPI to return the recognized text with the start and end time of each word, in this form:
    Word(speakStartTime, speakEndTime): Our(1,2) Mission(3,4) at(5,6), and so on.

    15 votes

    Under Review  ·  2 comments  ·  Speech to Text - API & SDK
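The output shape requested in idea 6 could be modeled along these lines. This is only a sketch of the requested format: `TimedWord`, its field names, and `format_timed_words` are hypothetical and do not correspond to any existing API response, and the times are the placeholder values from the post.

```python
from dataclasses import dataclass


@dataclass
class TimedWord:
    """One recognized word with its start and end time in seconds."""
    text: str
    start: float
    end: float


def format_timed_words(words):
    """Render words as Word(start,end) pairs separated by spaces."""
    return " ".join(f"{w.text}({w.start:g},{w.end:g})" for w in words)


# Placeholder timings taken from the example in the post.
words = [
    TimedWord("Our", 1, 2),
    TimedWord("Mission", 3, 4),
    TimedWord("at", 5, 6),
]
print(format_timed_words(words))  # Our(1,2) Mission(3,4) at(5,6)
```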
  7. Have consistency across API SDKs in a category

    There is a big difference in programming language and platform support across even a single category. For instance, under Speech, I can use the TextToSpeech API via HTTP on multiple platforms and in multiple languages (JavaScript/Node, PHP, Python, etc.), but the SpeechToText SDK is only available on Windows (.NET), Android (Java), and iOS, with no HTTP option. It would be nice to have close to the same options for both, considering they are both under the single category of "Speech".

    3 votes

    Under Review  ·  0 comments  ·  Samples & SDK Request