Is there a way to stream audio via WebSocket and get Speech to Text results AND get a copy of the recording on Azure Storage?
Right now we have multiple recorders that operate in the browser: Flash, WebRTC, and HTML5.
Each of these has to connect to Bing Speech to Text to get real-time transcription and LUIS results to drive actions in the application. Additionally, we are currently streaming the audio to Amazon S3. Ideally we would like to stream the audio only once, have it consumed by Microsoft's Speech to Text, AND get back a URL to the recording for later use.
Having to maintain two streams has led to a number of problems: the two copies aren't consistent recordings, and the transcription we get from Bing doesn't match the audio we have stored.
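What we are after is a single capture fanned out to two consumers, so the transcription and the stored recording come from identical bytes. A minimal server-side sketch of that tee pattern, using hypothetical stand-in sinks (the real consumers would be the Speech websocket and an Azure Storage upload; none of these class or function names come from an actual SDK):

```python
class SpeechSink:
    """Hypothetical stand-in for a websocket feed to Speech to Text."""
    def __init__(self):
        self.received = bytearray()

    def write(self, chunk: bytes) -> None:
        self.received.extend(chunk)


class StorageSink:
    """Hypothetical stand-in for an append upload to cloud storage."""
    def __init__(self):
        self.received = bytearray()

    def write(self, chunk: bytes) -> None:
        self.received.extend(chunk)


def fan_out(chunks, sinks) -> None:
    """Push each audio chunk from the single browser stream to every sink."""
    for chunk in chunks:
        for sink in sinks:
            sink.write(chunk)


speech, storage = SpeechSink(), StorageSink()
audio_chunks = [b"\x00\x01" * 160, b"\x02\x03" * 160]  # fake PCM frames
fan_out(audio_chunks, [speech, storage])
# Both consumers now hold byte-identical audio.
```

Because both sinks see the same chunks in the same order, a mismatch between the transcript and the recording can no longer come from divergent uploads.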
We would use this recording for review in the application, and as a possible training cohort to improve Speech to Text results using Custom Dictionary and Custom Acoustic Models.
Is this possible currently?
If not, is this on Microsoft's roadmap?
This is a big enough problem for us that we are considering moving compute/AI platforms one way or the other.
If you use the Custom Speech Service and the matching SDK (https://docs.microsoft.com/en-us/azure/cognitive-services/speech-service/), you can get the audio stream along with the recognition result.
Currently it is not possible to get LUIS, translation, and audio results from a single connection.
We are moving the different services to the same endpoint, so that should become possible in the future, although this feature isn't currently on the 'work list'.