Microsoft

Bing Speech

Welcome to the Bing Speech Forum

The Cognitive Services Speech Service is replacing Bing Speech. Please refer to the Speech Service forum for speech product feedback.


Categories

Documentation – Any ideas or suggestions for the API Reference or Documentation.

Language Support – Submit a request to have a particular language supported.

Samples & SDK Requests – Let us know if you would like to see a tutorial or sample provided.

Speech to Text – API & SDK – Ideas and feature requests to Speech Recognition and Speech to Text (STT).

Text to Speech – Ideas and feature requests for Text to Speech (TTS) – API only


  1. Web client for Bing Speech APIs

    The existing REST APIs don't provide functionality that the "native" clients do.

    It would be great to have a JavaScript web client (with partial results and silence detection) that works via WebSockets, WebRTC, HTTP/2, or another existing standard. For example, the demo on this page: https://www.microsoft.com/cognitive-services/en-us/speech-api

    82 votes
    Completed  ·  5 comments  ·  Samples & SDK Request
  2. Woman voice for Slovenian language

    We need a woman voice as the customers have branded bots (personas)

    27 votes
    0 comments  ·  Language Support
  3. Text to Speech - Request to improve or replace Korean voices

    Text to Speech: Korean - KR / HeamiRUS

    Currently TTS is supported by HeamiRUS.
    But Korean voices are too mechanical.
    I tried the Read aloud feature in Microsoft Edge, but the Korean voice is too awkward.
    English is quite natural.
    Please improve this or replace it with a new voice.

    And Korean voices are only female voices.
    Please add a male voice later.

    I used Google Translator

    25 votes
    23 comments  ·  Language Support
  4. Modify Speech Recognition API to allow continuous speech

    I need functionality where, when a user clicks the START button, Speech Recognition starts and continues until the user hits the STOP button.

    17 votes
    Under Review  ·  2 comments  ·  Speech to Text - API & SDK
  5. Time stamps of recognized text.

    Suppose I have an audio file in which Mr. Satya Nadella says, "Our mission at Microsoft is to make things that help you make things and make things happen." Suppose the total duration of the audio file is 20 seconds. I want SAPI to return the recognized text in this form:
    Word (speakStartTime, speakEndTime): Our(1,2) mission(3,4) at(5,6) and so on.

    15 votes
    Under Review  ·  2 comments  ·  Speech to Text - API & SDK
  6. Using Cognitive Service APIs from Unity

    Add sample code to help developers interested in using Microsoft Cognitive Service APIs from a Unity project. (via MSDN forum)

    For Speech recognition specifically, the REST API can't do live recognition, so having a Unity example would help developers who want to code in this platform.

    14 votes
    Under Review  ·  3 comments  ·  Samples & SDK Request
  7. Bing Speech API: support multiple audio formats

    Please add support for other audio formats such as Ogg and AAC.

    13 votes
    Under Review  ·  5 comments  ·  Speech to Text - API & SDK
  8. HIPAA compliance

    Make the Speech to Text data HIPAA compliant so sensitive client information can be spoken and converted to text.

    11 votes
    Under Review  ·  1 comment  ·  Speech to Text - API & SDK
  9. Add Node.js examples on GitHub for Bing Speech APIs

    Bing speech recognition is excellent when comparing its accuracy to other NLP services, and adding examples to GitHub is great, but many developers use non-MSFT environments to build their apps. Please add a clean Node.js example. Thanks!

    11 votes
    Under Review  ·  1 comment  ·  Samples & SDK Request
  10. Keyword Spotting or HotWord "Hey Cortana"

    A big missing feature of SaaS speech recognition is offline keyword spotting, or a hotword like "Hey Cortana" or "Ok Google", to capture audio and send it to Oxford.

    Companies like Nuance have an SDK with that, and CMUSphinx has an implementation. It would be great to have something from Microsoft.

    10 votes
    Under Review  ·  1 comment  ·  Speech to Text - API & SDK
  11. Control/reduce the amount of silence at end of text-to-speech clip

    Based on my testing, the current text-to-speech outputs appear to have anywhere between 659ms and 672ms of silence at the end. The start of the audio has between 62ms and 86ms of silence.

    When using multiple generated phrases in sequence, the long gap at the end makes the flow sound unnatural. At present, I have to post-process the generated audio using NAudio to clip the silence at the end. I have found that leaving about 100ms at the end allows the clips to be played in sequence while sounding natural.

    It would be good if the amount of…
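
A minimal sketch of the post-processing workaround described in this idea, written in Python rather than NAudio. The function name, the amplitude threshold, and the assumption that the clip is available as a list of signed 16-bit mono PCM samples are illustrative and not part of the original post; only the roughly 100ms tail comes from it.

```python
def trim_trailing_silence(samples, sample_rate, keep_ms=100, threshold=500):
    """Cut trailing silence down to at most keep_ms milliseconds.

    samples     -- sequence of signed PCM sample values (assumed mono)
    sample_rate -- samples per second, e.g. 16000
    keep_ms     -- how much trailing silence to leave (100ms per the post)
    threshold   -- assumed amplitude below which a sample counts as silence
    """
    # Find the index just past the last sample louder than the threshold.
    last_loud = 0
    for i in range(len(samples) - 1, -1, -1):
        if abs(samples[i]) > threshold:
            last_loud = i + 1
            break

    # Keep the audible part plus a short tail of silence.
    keep = int(sample_rate * keep_ms / 1000)
    end = min(len(samples), last_loud + keep)
    return samples[:end]


# Example: one second of "speech" followed by 660ms of silence at 16kHz.
clip = [1000] * 16000 + [0] * 10560
trimmed = trim_trailing_silence(clip, sample_rate=16000)
```

A fixed threshold is the simplest choice here; real clips with background noise may need a different value or an energy-based measure.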

    10 votes
    Under Review  ·  0 comments  ·  Text to Speech - API Only
  12. Korean language support in text to speech api

    It would be nice if we could have Korean language support.

    8 votes
    Completed  ·  6 comments  ·  Language Support
  13. Speech to text - Dutch

    Dutch support for the Bing Speech to Text!

    7 votes
    Under Review  ·  0 comments  ·  Language Support
  14. Need timestamp information for speech to text

    Hello,

    Please include timestamps in your speech to text api output.

    Thank you.

    williamj

    7 votes
    Under Review  ·  2 comments  ·  Speech to Text - API & SDK
  15. Maximum request length

    There's no clear documentation on the maximum request length that the Text-to-Speech API can support. My plan was to chunk my text according to this maximum request length. Since my application is aggressive/greedy (it tries to max out every call), I often get a 413 error, "RequestEntityTooLarge".

    I found this on Microsoft's web site: "The maximum amount of audio returned for a given request must not exceed 15 seconds." That is quite useless because the length of the audio cannot be known from the client side at the time the request is generated. And I found that…
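
The chunking plan described in this idea could be sketched as below. This is only an illustration: `chunk_text` and `max_chars` are hypothetical names, and the character limit is a stand-in chosen by the caller, since the post states that the real limit is undocumented and is expressed in seconds of audio rather than in characters.

```python
import re


def chunk_text(text, max_chars):
    """Greedily pack whole sentences into chunks of at most max_chars."""
    # Split after sentence-ending punctuation followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        if not sentence:
            continue
        # A single sentence longer than the limit is hard-split.
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


parts = chunk_text("One. Two. Three.", max_chars=10)
```

Because a character count is only a proxy for audio duration, a caller would presumably still need to handle a 413 response by retrying with a smaller `max_chars`.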

    7 votes
    Under Review  ·  0 comments  ·  Text to Speech - API Only
  16. Provide per word timecodes on final result

    When returning results, other ASR services usually give you an array of words with a per-word timecode and confidence.

    7 votes
    Under Review  ·  5 comments  ·  Speech to Text - API & SDK
  17. Support speech to text for long interview / meeting recordings

    I'm working on a podcast with my friends and we do lots of interviews. I'd like to use the speech-to-text API to convert the recordings to transcripts to make post-editing easier. However, according to the documentation, the input audio is limited to less than 14 seconds.

    This feature would also be useful to generate transcripts of meeting recordings for searching.

    7 votes
    Under Review  ·  4 comments  ·  Speech to Text - API & SDK
  18. Speech API / Speaker Verification working together

    It would be nice to combine Speech to Text with Speaker Verification. The goal: extract text from a specific speaker :-)

    7 votes
    Under Review  ·  4 comments  ·  Speech to Text - API & SDK
  19. Speech to Text API - Korean language

    https://azure.microsoft.com/ko-kr/services/cognitive-services/speech/

    We know that Text to Speech is supported in Korean.
    (Text to Speech: Korean - KR / HeamiRUS)

    However, Speech to Text does not support Korean. Only English, Chinese, French, German, Italian, and Spanish are in the list. There is no Korean in the list. Please add Korean.

    6 votes
    Completed  ·  1 comment  ·  Language Support
  20. BOTS: AUDIO TO TEXT

    It would be great if you could use it via a URL. I'm building bots on Facebook Messenger. Facebook provides the developer with the .mp4 audio URL. It would be great to send this directly to the API. It is a pain to download, then convert, then send to the API. Too slow for chat.

    6 votes
    Under Review  ·  2 comments  ·  Speech to Text - API & SDK