Is there a way to stream audio via WebSocket and get Speech to Text results AND get a copy of the recording on Azure Storage?
We are currently using Bing Speech with LUIS, but are looking to migrate to the Speech service.
Right now we have multiple recorders that operate in the browser: Flash, WebRTC, and HTML5.
Each of these has to connect to Bing Speech to Text to get real-time transcription and LUIS results that drive actions in the application. Additionally, we are currently streaming the audio to Amazon S3. Ideally we would like to stream the audio only once, have Microsoft pick it up for Speech to Text, and be able to retrieve a URL for the recording later.
Maintaining two streams has led to a number of problems: the two copies don't stay consistent, so the transcription we get from Bing doesn't match what we have in our recording.
We would use this recording for review in the application, and as a possible training cohort to improve Speech to Text results using Custom Dictionary and Custom Acoustic Models.
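In case it helps frame the request: the client-side workaround we're trying to avoid maintaining by hand is essentially a "tee", where each audio chunk is captured once and fanned out to two sinks. A minimal sketch is below; in a real implementation the two sinks would presumably be something like the Speech SDK's `PushAudioInputStream.write()` and an Azure append blob's `append_block()` (those names are assumptions about the eventual Azure APIs, not confirmed here), so the sketch uses in-memory stand-ins to stay runnable.

```python
from typing import Callable, List

Chunk = bytes
Sink = Callable[[Chunk], None]


class AudioTee:
    """Fans each incoming audio chunk out to every registered sink."""

    def __init__(self, *sinks: Sink) -> None:
        self._sinks = list(sinks)

    def write(self, chunk: Chunk) -> None:
        # The same bytes reach every sink, so the transcription input and
        # the stored recording can never drift apart.
        for sink in self._sinks:
            sink(chunk)


# In-memory stand-ins for the two real sinks (hypothetical in production:
# the recognizer's push stream and the storage append blob).
recognizer_buffer: List[Chunk] = []
recording: List[Chunk] = []

tee = AudioTee(recognizer_buffer.append, recording.append)

# Simulated capture loop: each chunk is written exactly once.
for chunk in (b"\x00\x01", b"\x02\x03", b"\x04\x05"):
    tee.write(chunk)

assert b"".join(recognizer_buffer) == b"".join(recording)
```

The point of the feature request is that we'd rather the service do this server-side than have every browser recorder implement the fan-out itself.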
Is this possible currently?
If not, is this on Microsoft's roadmap?
This is a big enough problem for us that we are considering switching compute/AI platforms one way or the other, especially since we have to reimplement against the Speech service by the end of the year.
I am looking at a very similar use case for my app. It would be really helpful to know any information about this.