Add support for Speaker Diarization for untrained speakers.
Distinguish between multiple speakers in a conversation without training the system first. IBM Watson currently supports this: https://www.ibm.com/blogs/bluemix/2017/05/whos-speaking-speaker-diarization-watson-speech-text-api/
Given an audio recording of a conversation, the minimum I'm looking for is:
Speaker 1 (0:01-0:03): Hi Ted, how are you today?
Speaker 2 (0:04-0:05): I'm doing well, how about you?
Speaker 1 (0:05-0:10): Good thanks. So the reason I called you today was to discuss your recent sales performance.
Ideally, each word would be timestamped so we could highlight the spoken word when displaying the transcription next to the playing audio. It would also be nice if each word had a confidence score (0.0-1.0) associated with it.
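To make the request concrete, here is a rough sketch of the per-word data I have in mind and how it could be rolled up into the speaker lines shown above. The `Word` type, its field names, and the helper functions are purely illustrative assumptions, not part of any existing API.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Word:
    # Hypothetical shape for one recognized word; field names are assumptions.
    text: str
    start: float       # seconds from the start of the audio
    end: float         # seconds from the start of the audio
    confidence: float  # 0.0-1.0
    speaker: int       # speaker label assigned by diarization


def fmt(seconds: float) -> str:
    """Format a time offset as m:ss, truncating fractional seconds."""
    total = int(seconds)
    return f"{total // 60}:{total % 60:02d}"


def group_by_speaker(words: List[Word]) -> List[Tuple[int, List[Word]]]:
    """Merge consecutive words from the same speaker into one segment."""
    segments: List[Tuple[int, List[Word]]] = []
    for w in words:
        if segments and segments[-1][0] == w.speaker:
            segments[-1][1].append(w)
        else:
            segments.append((w.speaker, [w]))
    return segments


def render(words: List[Word]) -> List[str]:
    """Produce 'Speaker N (start-end): text' lines from word-level results."""
    lines = []
    for speaker, ws in group_by_speaker(words):
        text = " ".join(w.text for w in ws)
        lines.append(
            f"Speaker {speaker} ({fmt(ws[0].start)}-{fmt(ws[-1].end)}): {text}"
        )
    return lines
```

With word-level entries like these, a player could highlight the word whose `start`/`end` range contains the current playback position, and low-confidence words could be styled differently.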

We now have a speaker diarization/separation option available: https://docs.microsoft.com/en-us/azure/cognitive-services/speech-service/batch-transcription
Please let us know if you have any feedback. Thanks!
8 comments
-
luqman.hussain commented
If you come across this page and need diarization support for more than 2 speakers, please vote here: https://cognitive.uservoice.com/forums/555925-speaker-recognition/suggestions/40702504-speaker-diarization-for-more-than-2-speakers
-
luqman.hussain commented
Within one sentence it randomly switches from speaker 1 to speaker 2, even though one is male and the other female.
It's a shame, because the actual transcription is good, but unless we can separate the speakers the output is too messy.
-
luqman.hussain commented
How can you say this is resolved? It only supports two voices and has incredibly bad accuracy, especially considering that the Google alternative supports unlimited speakers: https://cloud.google.com/speech-to-text/docs/multiple-voices
Come on guys!
-
Gurucharan Subramani commented
Hi, when I try the speaker diarization feature with the request body below,
{
  "recordingsUrl": "myurl.mp3",
  "models": [],
  "locale": "en-US",
  "name": "Name",
  "description": "Description",
  "properties": {
    "AddWordLevelTimestamps": "True",
    "AddDiarization": "True"
  }
}
I get a 400 response with the error message below.
{
  "code": "InvalidPayload",
  "message": "This locale does not support diarization."
}
I get the same error on an existing Speech Services instance and on a new one as well. It was in the West US region (if it matters).
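For anyone hitting the same response, a small sketch of how a client might build this body and recognize the rejection so it can retry without diarization. The helper names are hypothetical, no request is actually sent, and dropping the `AddDiarization` property to disable the feature is an assumption rather than documented behavior.

```python
import json


def build_request(recordings_url: str, locale: str, diarization: bool = True) -> dict:
    """Build the request body shown above (field names taken from that body)."""
    properties = {"AddWordLevelTimestamps": "True"}
    if diarization:
        properties["AddDiarization"] = "True"
    # Assumption: omitting AddDiarization leaves diarization off.
    return {
        "recordingsUrl": recordings_url,
        "models": [],
        "locale": locale,
        "name": "Name",
        "description": "Description",
        "properties": properties,
    }


def diarization_rejected(status: int, body: str) -> bool:
    """Return True if the response looks like the locale/diarization error above."""
    if status != 400:
        return False
    try:
        err = json.loads(body)
    except json.JSONDecodeError:
        return False
    if not isinstance(err, dict):
        return False
    message = str(err.get("message", "")).lower()
    return err.get("code") == "InvalidPayload" and "diarization" in message
```

A caller could send `build_request(url, locale)` first and, if `diarization_rejected(...)` is true for the response, fall back to `build_request(url, locale, diarization=False)`.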
-
Andrew Khazanovich commented
Any update on support for this? It is looking like a deal breaker for using Cognitive Services for our transcription needs.