For example, take a 2-minute audio file: Speaker A talks for the first 30 seconds, then Speaker B from 0:30 to 1:30, and then Speaker A again from 1:30 to 2:00.
35 votes
You can now do this type of recognition using this public sample application: https://github.com/Microsoft/Cognitive-SpeakerRecognition-Windows/tree/master/Streaming.
In the future, we plan to fully support this scenario from the service side to avoid sending too many requests from the client side, and to include speech recognition results as well. This means that you’ll get a response stating who the speaker is, and what is being said.
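Until the service handles this directly, one client-side approach is to cut the audio into fixed-length windows, identify the speaker in each window, and merge neighbouring windows that share a speaker. The sketch below covers only the merging step and is not taken from the linked sample. `merge_windows` and the 30-second window length are illustrative assumptions, and the per-window labels are hard-coded where the results of real identification requests would go.

```python
from itertools import groupby


def merge_windows(labels, window_seconds):
    """Collapse per-window speaker labels into (start, end, speaker) segments.

    labels: one speaker label per fixed-length window, in time order.
    window_seconds: length of each window (an assumed, illustrative value).
    """
    segments = []
    position = 0  # index of the first window in the current run
    for speaker, run in groupby(labels):
        count = len(list(run))
        start = position * window_seconds
        end = (position + count) * window_seconds
        segments.append((start, end, speaker))
        position += count
    return segments


# Hard-coded labels standing in for per-window identification results
# on the 2-minute file described in the question.
labels = ["A", "B", "B", "A"]
print(merge_windows(labels, 30))
# [(0, 30, 'A'), (30, 90, 'B'), (90, 120, 'A')]
```

Shorter windows give finer segment boundaries but mean more requests from the client, which is the cost the planned service-side support is meant to remove.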
Provided by a fellow developer. Leave a comment below and let us know other customization options/parameters for the verification phrase list you'd like to see introduced.
3 votes
This is not currently supported, but it is on our feature list for a future release.
I would love to create a plug-in for my home automation, which already uses Kinects, that can utilize the speaker identification from Oxford. The main issue is that most statements are short, e.g. "Computer, turn on family room light", so I never generate a 20-second clip. Recognition with at least a 5-second clip or so would be great, even if recognition is only, say, 80% accurate for this case.
7 votes
We have released a new feature that allows you to waive the audio limit. Just add the “ShortAudio” parameter to instruct the service to waive the recommended minimum audio length needed for enrollment. Set its value to “true” to force enrollment using any audio length.
More details can be found here,
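As a rough illustration of the “ShortAudio” setting, the sketch below builds an enrollment URL with the parameter attached. It assumes the parameter is passed as a query-string value, which the response above does not spell out. The base URL is a placeholder and `build_enroll_url` is a hypothetical helper, so check the official API reference for the real endpoint and the exact parameter casing.

```python
from urllib.parse import urlencode

# Placeholder endpoint (an assumption, not the real service URL).
ENROLL_URL = "https://example.invalid/identificationProfiles/{profile_id}/enroll"


def build_enroll_url(profile_id, short_audio=False):
    """Return the enrollment URL, optionally waiving the minimum audio length.

    Assumes "ShortAudio" is sent as a query-string parameter; the name is
    copied from the response above and may differ in the actual API.
    """
    url = ENROLL_URL.format(profile_id=profile_id)
    if short_audio:
        url += "?" + urlencode({"ShortAudio": "true"})
    return url


print(build_enroll_url("my-profile", short_audio=True))
```

For the home automation case above, this would let a short command such as "Computer, turn on family room light" be used for enrollment, at the cost of whatever accuracy the recommended minimum length is there to protect.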