Microsoft

Bing Speech

Welcome to the Bing Speech Forum

The Cognitive Services Speech Service is replacing Bing Speech. Please refer to its forum for speech product feedback.


Categories

Documentation – Any ideas or suggestions for the API Reference or Documentation.

Language Support – Submit a request to have a particular language supported.

Samples & SDK Requests – Let us know if you would like to see a tutorial or sample provided.

Speech to Text – API & SDK – Ideas and feature requests for Speech Recognition and Speech to Text (STT).

Text to Speech – Ideas and feature requests for Text to Speech (TTS), API only.


Attention!

We have moved our Customer Feedback & Ideas for Azure Cognitive Services portal to the Azure Feedback Forum.

Please go to the link below to access our new Feedback and Ideas Page.

  1. Support low-latency Opus audio for speech recognition

    I am super happy to hear that Opus audio can be used for uploading speech to the speech-to-text API. However, I have a concern: because of the way Ogg paging and framing work, Opus packets are buffered for several seconds before being sent on the stream. This makes Ogg Opus useless for real-time speech transcription (though it is fine for the REST API).
    For real-time transcription over WebSocket, I would appreciate an Opus protocol that works around the Ogg buffering issue, for example by using RTP headers or a custom length-prefix scheme that frames the raw Opus packets. I have…
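
    A length-prefix scheme of the kind suggested here could look like the sketch below. It assumes a hypothetical framing in which each raw Opus packet is preceded by its length as a 2-byte big-endian integer; this is not a format the service is known to accept, and `frame_packets` / `unframe_packets` are illustrative names only.

```python
import struct
from typing import Iterable, List

# Hypothetical framing: <uint16 big-endian length><raw Opus packet> repeated.
MAX_PACKET = 0xFFFF


def frame_packets(packets: Iterable[bytes]) -> bytes:
    """Concatenate packets, each prefixed with its length as a uint16."""
    out = bytearray()
    for pkt in packets:
        if len(pkt) > MAX_PACKET:
            raise ValueError("packet too large for a 2-byte length prefix")
        out += struct.pack(">H", len(pkt))
        out += pkt
    return bytes(out)


def unframe_packets(data: bytes) -> List[bytes]:
    """Split a length-prefixed byte string back into the original packets."""
    packets: List[bytes] = []
    pos = 0
    while pos < len(data):
        if pos + 2 > len(data):
            raise ValueError("truncated length prefix")
        (length,) = struct.unpack_from(">H", data, pos)
        pos += 2
        if pos + length > len(data):
            raise ValueError("truncated packet")
        packets.append(data[pos:pos + length])
        pos += length
    return packets
```

    Because every packet carries its own length, a client could write each framed packet to the WebSocket as soon as the encoder emits it instead of waiting for an Ogg page to fill. The 2-byte prefix caps a packet at 65,535 bytes, which the sketch checks explicitly.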

    1 vote
    0 comments  ·  Speech to Text - API & SDK
  2. Add word timings, IPA syllables, and a confusion network to the speech recognition response

    I know you have this information available in the speech decoder; can you please expose it via the public API?
    - The list of phrase elements (words) and their timestamps within the audio stream
    - IPA phonemes for each phrase element
    - Confusion network output from the lattice

    Right now I am forced to reconstruct / approximate this data after the fact and it would be 1000x easier if the API could just give it to me.

    1 vote
    0 comments  ·  Speech to Text - API & SDK
  3. 1 vote
    1 comment  ·  Speech to Text - API & SDK
  4. Speech recognition

    Speech recognition should transcribe spelling...

    If I spell out a word, I would like the response to be the letters of that word. For example, if I say 'double you A tee ee or', it should return 'w a t e r'.

    Is this already available in Bing Speech?
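
    Until the service supports this, a client could post-process the transcript. The sketch below assumes the recognizer returns letter names as ordinary words (for example "double you A tee ee or"); the `LETTER_NAMES` table and the `letters_from_spelling` helper are hypothetical and would need extending for whatever spellings the recognizer actually produces.

```python
import re
from typing import Dict, List

# Hypothetical mapping from spoken letter names to letters.
LETTER_NAMES: Dict[str, str] = {
    "ay": "a", "bee": "b", "see": "c", "sea": "c", "dee": "d", "ee": "e",
    "ef": "f", "gee": "g", "aitch": "h", "eye": "i", "jay": "j", "kay": "k",
    "el": "l", "em": "m", "en": "n", "oh": "o", "pee": "p", "cue": "q",
    "ar": "r", "or": "r", "are": "r", "es": "s", "tee": "t", "tea": "t",
    "you": "u", "vee": "v", "ex": "x", "why": "y", "zee": "z", "zed": "z",
}


def letters_from_spelling(transcript: str) -> List[str]:
    """Convert a transcript of spoken letter names into a list of letters."""
    tokens = re.findall(r"[a-z]+", transcript.lower())
    letters: List[str] = []
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        # "double you" spans two tokens but stands for the single letter w.
        if tok == "double" and i + 1 < len(tokens) and tokens[i + 1] in ("you", "u"):
            letters.append("w")
            i += 2
            continue
        if len(tok) == 1:
            letters.append(tok)  # a bare letter such as "A" is kept as-is
        elif tok in LETTER_NAMES:
            letters.append(LETTER_NAMES[tok])
        else:
            raise ValueError(f"unrecognized letter name: {tok!r}")
        i += 1
    return letters
```

    For the example in this post, `" ".join(letters_from_spelling("double you A tee ee or"))` yields `"w a t e r"`.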

    2 votes
    Under Review  ·  0 comments  ·  Speech to Text - API & SDK
  5. WebSocket refuses handshake

    The WebSocket handshake is refused with close code -1.
    Can you help me solve this problem?
    Thanks.

    1 vote
    1 comment  ·  Speech to Text - API & SDK
  6. Is there a way to stream audio via WebSocket and get Speech to Text results AND get a copy of the recording on Azure Storage?

    Right now we have multiple recorders that operate in the browser: Flash, WebRTC, and HTML5.

    Each of these has to connect to Bing Speech to Text to get real-time translation and LUIS results that drive actions in the application. Additionally, we currently stream the audio to Amazon S3. Ideally we would like to stream the audio only once, have Microsoft pick it up for Speech to Text, AND be able to retrieve a URL to the recording for later use.

    Having to maintain two streams has led to a number of problems where one isn't a consistent recording, and the transcription…
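
    On the client side, capturing the audio once and handing each chunk to both consumers avoids keeping two independent streams in sync. The sketch below assumes a hypothetical `send_to_recognizer` callback (for example, a WebSocket send) and a writable `recording` object that is uploaded to storage afterwards; it does not call any Azure Storage or Speech API. The default chunk size of 3,200 bytes corresponds to 100 ms of 16 kHz, 16-bit mono PCM.

```python
import io
from typing import BinaryIO, Callable


def tee_stream(source: BinaryIO,
               send_to_recognizer: Callable[[bytes], None],
               recording: BinaryIO,
               chunk_size: int = 3200) -> int:
    """Read audio once and hand every chunk to both consumers.

    Returns the number of bytes forwarded.
    """
    total = 0
    while True:
        chunk = source.read(chunk_size)
        if not chunk:
            break
        recording.write(chunk)     # copy kept for a later upload to storage
        send_to_recognizer(chunk)  # identical bytes go to recognition
        total += len(chunk)
    return total


if __name__ == "__main__":
    sent = []
    saved = io.BytesIO()
    n = tee_stream(io.BytesIO(b"\x00\x01" * 8000), sent.append, saved)
    print(n, len(sent), saved.getvalue() == b"".join(sent))
```

    Since both consumers see the same bytes in the same order, the stored recording and the transcribed audio cannot drift apart the way two separately captured streams can.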

    2 votes
    1 comment  ·  Speech to Text - API & SDK
  7. Internal Server Error occurs.

    It returned results when I ran it a few days ago, but today I get an Internal Server Error and no results are returned. Please confirm.

    Bing Speech REST API

    internal class BingSpeechHelper
    {
        private const string INTERACTIVE = "interactive";
        private const string CONVERSATION = "conversation";
        private const string DICTATION = "dictation";

        private const string LANGUAGE = "en-US";
        private readonly string _requestUri;

        public BingSpeechHelper()
        {
            //&format=detailed
            _requestUri =
                $@"https://speech.platform.bing.com/speech/recognition/{INTERACTIVE}/cognitiveservices/v1?language={LANGUAGE}";
        }

        public async Task<string> GetTextFromAudioAsync(string recordedFilename)
        {
            var file = await ApplicationData.Current.LocalFolder.GetFileAsync(recordedFilename);

            using (var fileStream = new FileStream(file.Path, FileMode.Open, FileAccess.Read))
            {
                using (var client = new…
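
    For comparison when debugging, the request URI that the helper above builds, together with the headers a REST call of this kind typically carries, can be reproduced as follows. The header names and the `Content-Type` value are assumptions based on the 16 kHz PCM WAV format described elsewhere in this forum, not a confirmed fix for the error; `build_request_uri` and `build_headers` are hypothetical helpers.

```python
from typing import Dict
from urllib.parse import urlencode

# Recognition modes named by the constants in the C# helper.
MODES = ("interactive", "conversation", "dictation")
BASE = "https://speech.platform.bing.com/speech/recognition/{mode}/cognitiveservices/v1"


def build_request_uri(mode: str = "interactive", language: str = "en-US",
                      detailed: bool = False) -> str:
    """Rebuild the request URI; detailed=True appends format=detailed."""
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode}")
    params = {"language": language}
    if detailed:
        params["format"] = "detailed"
    return BASE.format(mode=mode) + "?" + urlencode(params)


def build_headers(subscription_key: str) -> Dict[str, str]:
    """Assumed headers for uploading 16 kHz, 16-bit mono PCM WAV audio."""
    return {
        "Ocp-Apim-Subscription-Key": subscription_key,
        "Content-Type": "audio/wav; codec=audio/pcm; samplerate=16000",
        "Accept": "application/json",
    }
```

    Printing these values next to what the C# client actually sends makes it easier to tell a malformed request apart from a service-side failure.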

    2 votes
    1 comment  ·  Speech to Text - API & SDK
  8. Make a Microsoft Flow / PowerApps connector

    Support all of the requests (speech to text, text to speech, etc.). Integrate with Flow!

    3 votes
    0 comments  ·  Speech to Text - API & SDK
  9. Bond.IO DLL error on the latest version of the Bing.Speech assembly

    Version 2.0.2 of this package installs Bond assemblies version 7.0.1; however, when calling "RecognizeAsync", it looks for version 1.0.0.0 of Bond.IO.dll, which obviously doesn't exist. This is easily reproducible by taking the SpeechClientSample and updating the Microsoft.Speech.Bing NuGet package to the latest 'Stable' build.

    2 votes
    1 comment  ·  Speech to Text - API & SDK
  10. How can I disable punctuation in the recognized result when I use Bing Speech recognition?

    How can I disable punctuation in the recognized result when I use Bing Speech recognition? Can I turn this feature off?
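
    One client-side workaround is sketched below under assumptions about the REST response shapes: with `format=detailed`, each entry of an `NBest` list is assumed to carry a `Lexical` form without punctuation, and the first entry is taken as the best hypothesis; with the simple format, only `DisplayText` is assumed present, and sentence punctuation is stripped from it. The `unpunctuated_text` helper is hypothetical, and the lexical form may also differ from the display text in casing and number formatting.

```python
import json

# Sentence punctuation removed from DisplayText; apostrophes and hyphens are
# kept so that words such as "don't" stay intact.
SENTENCE_PUNCTUATION = ".,?!;:"


def unpunctuated_text(response_json: str) -> str:
    """Return recognized text without punctuation (assumed response shapes)."""
    result = json.loads(response_json)
    nbest = result.get("NBest")
    if nbest:
        # Detailed format (assumed): lexical form of the first hypothesis.
        return nbest[0]["Lexical"]
    # Simple format (assumed): strip sentence punctuation from the display text.
    display = result.get("DisplayText", "")
    return display.translate(str.maketrans("", "", SENTENCE_PUNCTUATION))
```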

    1 vote
    Under Review  ·  1 comment  ·  Speech to Text - API & SDK
  11. Support for different PCM sample rates (8kHz) in CreateSpeechRecognizerWithStream

    If you create a recognizer with CreateSpeechRecognizerWithStream, you need to provide an AudioInputStreamFormat. This class only supports 16000 for SamplesPerSec. I'd like support for 8000 so that I can hook directly into a VoIP call. I was able to get it working with NAudio, but that requires an inline transcode, which can be expensive.
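
    The inline transcode can be as simple as doubling the sample rate. The sketch below upsamples 8 kHz, 16-bit little-endian mono PCM to 16 kHz by inserting the average of each neighbouring pair of samples (linear interpolation). It is a crude substitute for a proper filtered resampler, and `upsample_8k_to_16k` is a hypothetical helper.

```python
import array
import sys


def upsample_8k_to_16k(pcm: bytes) -> bytes:
    """Double the sample rate of 16-bit little-endian mono PCM.

    Each input sample is followed by the average of itself and the next
    sample; the last sample is averaged with itself (i.e. repeated).
    """
    samples = array.array("h")  # signed 16-bit integers
    samples.frombytes(pcm)      # raises ValueError on an odd byte count
    if sys.byteorder == "big":
        samples.byteswap()      # input is little-endian; convert to native
    out = array.array("h")
    for i, s in enumerate(samples):
        nxt = samples[i + 1] if i + 1 < len(samples) else s
        out.append(s)
        out.append((s + nxt) // 2)
    if sys.byteorder == "big":
        out.byteswap()          # back to little-endian for output
    return out.tobytes()
```

    The loop does one addition and one shift-like division per sample, so the cost is small compared with decoding; a windowed-sinc or polyphase resampler would reduce the imaging artefacts that plain linear interpolation leaves behind.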

    1 vote
    0 comments  ·  Speech to Text - API & SDK
  12. Support for Xamarin

    Hi,
    I wanted to know if you could add a Speech Client Library for Xamarin with features such as intermediate results during recognition.
    Thanks!

    2 votes
    2 comments  ·  Speech to Text - API & SDK
  13. InitialSilenceTimeout error

    When sending a request, I get the InitialSilenceTimeout error with Duration and Offset both 0. That would indicate a problem with my audio file, but that can't be true! I'm using WAV files with 16-bit PCM encoding and a 16 kHz sampling rate. Could someone please tell me what the problem could be and/or point me to a speech file or database that is known to work? Thank you!
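
    A quick way to rule out format problems is to inspect the file locally. The sketch below uses Python's standard `wave` module, which reads only uncompressed PCM WAV files and raises `wave.Error` otherwise; `check_wav_format` is a hypothetical helper that reports anything other than mono, 16-bit, 16 kHz audio, as well as files that are empty or entirely silent.

```python
import io
import wave
from typing import List


def check_wav_format(data: bytes) -> List[str]:
    """List problems found in a WAV file expected to be 16 kHz 16-bit mono."""
    problems: List[str] = []
    with wave.open(io.BytesIO(data), "rb") as wav:
        if wav.getnchannels() != 1:
            problems.append(f"expected mono, got {wav.getnchannels()} channels")
        if wav.getsampwidth() != 2:
            problems.append(f"expected 16-bit samples, got {8 * wav.getsampwidth()}-bit")
        if wav.getframerate() != 16000:
            problems.append(f"expected 16000 Hz, got {wav.getframerate()} Hz")
        frames = wav.readframes(wav.getnframes())
    if not frames:
        problems.append("file contains no audio frames")
    elif not any(frames):
        # Every byte is zero: the recognizer would hear only silence.
        problems.append("all samples are zero (silent audio)")
    return problems
```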

    1 vote
    1 comment  ·  Speech to Text - API & SDK
  14. I downloaded the files from GitHub, ran npm install, etc., but kept receiving "Speech Recognition SDK not found"

    I followed GitHub's directions for the download and install, got a new key from you, and entered the key in each of the scripts that required one. But I continued to receive the same "Speech Recognition SDK not found" message, even though I can see the SDK in the directory as GitHub states. By the way, your 7-day FREE trial is lame.

    2 votes
    Under Review  ·  1 comment  ·  Speech to Text - API & SDK
  15. I need to capture the speaker's voice immediately and feed it to the API instead of recording, converting to .wav, saving, etc.

    Hi, I am using the Bing Speech API. For speech to text, I need to capture the speaker's voice immediately and feed it as input to the API instead of recording, converting to .wav (mono, 16-bit, 16 kHz), saving, etc. We need the user to speak and the program to capture the speech immediately and pass it to the API.
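
    One way to skip the record-convert-save cycle is to send audio as it is captured: a WAV header followed by raw PCM chunks. The sketch below assumes a hypothetical `read_pcm` callback that returns 16 kHz, 16-bit mono PCM from the microphone, and writes a placeholder size into the header because the length of a live stream is unknown; whether the service accepts such a header is an assumption. A generator like `live_audio_body` can be passed as the request body to an HTTP client that supports chunked transfer encoding, such as `requests`.

```python
import struct
from typing import Callable, Iterator

SAMPLE_RATE = 16000
BITS = 16
CHANNELS = 1
BYTES_PER_SECOND = SAMPLE_RATE * CHANNELS * BITS // 8  # 32000


def wav_header(data_size: int = 0xFFFFFFFF - 36) -> bytes:
    """Build a 44-byte PCM WAV header; the default size is a placeholder."""
    block_align = CHANNELS * BITS // 8
    return (b"RIFF" + struct.pack("<I", 36 + data_size) + b"WAVE"
            + b"fmt " + struct.pack("<IHHIIHH", 16, 1, CHANNELS, SAMPLE_RATE,
                                    BYTES_PER_SECOND, block_align, BITS)
            + b"data" + struct.pack("<I", data_size))


def live_audio_body(read_pcm: Callable[[int], bytes],
                    chunk_ms: int = 100) -> Iterator[bytes]:
    """Yield the WAV header, then PCM chunks as soon as they are captured.

    `read_pcm(n)` (hypothetical) blocks until up to n bytes are available
    and returns b"" when capture stops.
    """
    chunk_bytes = BYTES_PER_SECOND * chunk_ms // 1000
    yield wav_header()
    while True:
        chunk = read_pcm(chunk_bytes)
        if not chunk:
            return
        yield chunk
```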

    2 votes
    Under Review  ·  1 comment  ·  Speech to Text - API & SDK
  16. en-IN language: low quality, mainly in bidirectional conversations and noisy audio

    We have experienced very low quality/accuracy with the en-IN (Indian English) base models, mainly on bidirectional conversations with noise, overlapping speech, etc., mostly in call-centre audio, phone calls, and mobile conversations.

    For unidirectional (one-way) conversations such as demos, webinars, and presentations, quality/accuracy is better as compared with en-US (US English).

    Early adoption of this service is eagerly awaited for our business requirements. We are willing to share insights and sample audio files for analysis and improvement.

    2 votes
    Under Review  ·  0 comments  ·  Speech to Text - API & SDK
  17. Block profane speech

    Please provide an option to block profane speech in Bing Speech-to-Text.

    1 vote
    Under Review  ·  1 comment  ·  Speech to Text - API & SDK
  18. Diagnostic messages when responding

    I have been unable to implement the system in Java because there are no diagnostic messages giving even a hint of what is wrong with the streaming data I am sending. Hence I have given up and will use two other APIs, both of which work.

    1 vote
    Under Review  ·  2 comments  ·  Speech to Text - API & SDK
  19. ARM64 version of libandroid_platform.so is needed

    Your SDK cannot be used with the vast majority of current Android hardware.

    Your team has commented elsewhere (GitHub?) that you intend to open-source this portion of the product rather than provide prebuilt binaries. If that's true, could you give a date for when it will happen?

    1 vote
    Under Review  ·  1 comment  ·  Speech to Text - API & SDK
  20. Need timestamp information for speech to text

    Hello,

    Please include timestamps in your speech to text API output.

    Thank you.

    williamj
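
    Phrase-level results from the REST API include `Offset` and `Duration` fields. The sketch below assumes they are expressed in 100-nanosecond units, converts them to seconds, and formats a timestamp; word-level timings are not covered. `phrase_span_seconds` and `format_timestamp` are hypothetical helpers.

```python
import json
from typing import Tuple

TICKS_PER_SECOND = 10_000_000  # assumed: Offset/Duration in 100 ns units


def phrase_span_seconds(response_json: str) -> Tuple[float, float]:
    """Return (start, end) of the recognized phrase in seconds."""
    result = json.loads(response_json)
    start = result["Offset"] / TICKS_PER_SECOND
    end = (result["Offset"] + result["Duration"]) / TICKS_PER_SECOND
    return start, end


def format_timestamp(seconds: float) -> str:
    """Format seconds as HH:MM:SS.mmm."""
    ms = round(seconds * 1000)
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d}.{ms:03d}"
```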

    7 votes
    Under Review  ·  2 comments  ·  Speech to Text - API & SDK
