Microsoft

How can we improve Microsoft Cognitive Services?

Add support for Speaker Diarization for untrained speakers.

Distinguish between multiple speakers in a conversation without training the system first. IBM Watson currently supports this: https://www.ibm.com/blogs/bluemix/2017/05/whos-speaking-speaker-diarization-watson-speech-text-api/

Given an audio recording of a conversation, the minimum I'm looking for is:
Speaker 1 (0:01-0:03): Hi Ted, how are you today?
Speaker 2 (0:04-0:05): I'm doing well, how about you?
Speaker 1 (0:05-0:10): Good thanks. So the reason I called you today was to discuss your recent sales performance.

Ideally, each word would also be timestamped so we could highlight it in the transcription as the audio plays. It would also be nice if each word had a confidence score (0.0-1.0) associated with it.
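
To make the request concrete, here is a rough sketch of the result shape I have in mind and how a client might use it. This is not an existing Cognitive Services API: the `Word` type, its field names, and the helpers `fmt`, `to_turns`, and `word_at` are all hypothetical, and the only assumption is that the service returns one entry per word with a speaker label, start/end times in seconds, and a confidence.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Word:
    # Hypothetical per-word result; field names are illustrative only.
    text: str
    start: float       # seconds from the start of the audio
    end: float         # seconds from the start of the audio
    confidence: float  # 0.0-1.0
    speaker: int       # label assigned by the service, no prior training


def fmt(seconds: float) -> str:
    """Render seconds as m:ss, matching the transcript lines above."""
    total = int(seconds)
    return f"{total // 60}:{total % 60:02d}"


def render(turn: List[Word]) -> str:
    """Format one speaker turn as 'Speaker N (start-end): text'."""
    text = " ".join(w.text for w in turn)
    return f"Speaker {turn[0].speaker} ({fmt(turn[0].start)}-{fmt(turn[-1].end)}): {text}"


def to_turns(words: List[Word]) -> List[str]:
    """Merge consecutive words from the same speaker into transcript lines."""
    lines: List[str] = []
    turn: List[Word] = []
    for w in words:
        if turn and w.speaker != turn[-1].speaker:
            lines.append(render(turn))
            turn = []
        turn.append(w)
    if turn:
        lines.append(render(turn))
    return lines


def word_at(words: List[Word], t: float) -> Optional[Word]:
    """Return the word being spoken at playback time t, if any (for highlighting)."""
    for w in words:
        if w.start <= t < w.end:
            return w
    return None
```

With word-level results like this, the speaker-turn transcript is just a grouping step on the client, and `word_at` is enough to drive highlighting next to the audio player.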

12 votes

Andrew Collard shared this idea

3 comments
