How can we improve Speech service?

Is there a way to stream audio via WebSocket and get Speech to Text results AND get a copy of the recording on Azure Storage?

We are currently using Bing Speech with LUIS, but are looking to move to the Speech service.

Right now we have multiple recorders that operate in the browser: Flash, WebRTC, and HTML5.

Each of these has to connect to Bing Speech to Text to get real-time transcription and LUIS results to drive actions in the application. Additionally, we currently stream the audio to Amazon S3. Ideally we would like to stream the audio only once, have it picked up by Microsoft for Speech to Text, AND be able to retrieve a URL to the recording for later use.

Having to maintain two streams has led to a number of problems: one of them isn't always a consistent recording, so the transcription we get from Bing doesn't match what we have in our recording.
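To make the single-stream idea concrete, here is a minimal sketch of what we have in mind, assuming a relay that receives the audio chunks once and forwards the same bytes to two destinations. This is not an existing Speech service feature. The names `tee_audio`, `recognizer_feed`, and `archive_copy` are hypothetical, and the two in-memory buffers only stand in for a real Speech to Text input and a real storage upload.

```python
import hashlib
import io
from typing import Iterable, Protocol


class AudioSink(Protocol):
    """Anything that accepts raw audio bytes (hypothetical interface)."""

    def write(self, chunk: bytes) -> object: ...


def tee_audio(chunks: Iterable[bytes], *sinks: AudioSink) -> str:
    """Forward every non-empty chunk to every sink, in arrival order.

    Returns the SHA-256 hex digest of the bytes that were forwarded, so the
    stored recording can later be checked against what was transcribed.
    """
    digest = hashlib.sha256()
    for chunk in chunks:
        if not chunk:
            continue  # skip empty keep-alive frames
        digest.update(chunk)
        for sink in sinks:
            sink.write(chunk)
    return digest.hexdigest()


# Stand-ins (assumptions): in a real relay, one sink would feed the speech
# recognizer and the other would upload to blob storage.
recognizer_feed = io.BytesIO()
archive_copy = io.BytesIO()

incoming = [b"RIFF", b"", b"\x00\x01\x02\x03", b"\x04\x05"]
checksum = tee_audio(incoming, recognizer_feed, archive_copy)
```

Because both destinations are written from the same chunk sequence, the recording and the transcribed audio cannot drift apart the way our two separate streams do today.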

We would use this recording for review in the application, and as a possible training data set to improve Speech to Text results using Custom Dictionary and Custom Acoustic Models.

Is this possible currently?

If not, is this on Microsoft's roadmap?

This is a big enough problem for us that we are considering moving compute/AI platforms one way or the other, especially as we have to reimplement against the Speech service by the end of the year.

1 vote

Robert Janssen shared this idea

1 comment

  • Akash commented

    This is quite similar to a use case I am looking at for my app. Any information on this would be really helpful.
