Speaker Recognition
Welcome to the Speaker Recognition Forum
Categories
API – Any ideas or feedback on features or enhancements to the Speaker Recognition API.
Documentation – Any ideas or suggestions for the API reference or documentation.
Language Support – Submit a request to have a particular language supported.
Samples & SDK Request – Let us know if you would like to see a code sample or SDK provided.
Attention!
We have moved our Customer Feedback & Ideas for Azure Cognitive Services portal to the Azure Feedback Forum.
-
Speaker diarization for more than 2 speakers
I don't feel this should be marked as resolved. I would expect support for at least 10 speakers. Additionally, it's currently really poor and switches between speaker 1 and 2 almost randomly. Please make this more intelligent. It's a deal-breaker for us and, I'm sure, for many others, especially considering the Google alternative can handle unlimited speakers and is far more accurate at identifying them.
https://cloud.google.com/speech-to-text/docs/multiple-voices
And no... expecting a sample to train it for each voice is not an option. We literally just need it to assign a number…
3 votes -
Profile Limit
The API reference (https://westus.dev.cognitive.microsoft.com/docs/services/563309b6778daf02acc0a508/operations/5645c068e597ed22ec38f42e)
indicates that you can only create up to 1,000 profiles. Does that mean only 1,000 people can interact with my application? What happens if I need 1 or 2 million people? Is there any update about this?
1 vote -
What languages does this service support?
Honestly, it would be good to add support for Spanish.
4 votes -
Add Portuguese (pt-br) support
Brazilian Portuguese (pt-br) is one of the principal languages in software. Are there plans for this? Is there any date or deadline?
2 votes -
Support Spanish language on Speaker Recognition API
Can you please specify when Spanish language support is expected to be released for the verification profile feature?
2 votes -
Return Confidence Score
Currently, with the GET Operation Status API call, confidence is returned as "High", "Normal", or "Low".
Having the actual confidence score returned (a real number between 0 and 1) would be much more useful than a categorical label.
GitHub Issue connected with this: https://github.com/MicrosoftDocs/azure-docs/issues/30221
2 votes -
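For context, a status poll against the current API only exposes the categorical label the post describes. A minimal sketch of parsing such a response, where the field names (`processingResult`, `identifiedProfileId`, `confidence`) are assumptions based on the v1.0 API reference, not verified output:

```python
import json

# Example operation-status payload; field names are assumptions taken
# from the Speaker Recognition v1.0 API reference.
sample = """{
  "status": "succeeded",
  "processingResult": {
    "identifiedProfileId": "111f427c-3791-468f-b709-fcef7660fff9",
    "confidence": "High"
  }
}"""

def parse_identification(body):
    """Return (profile_id, confidence_label) from an operation-status body."""
    result = json.loads(body)["processingResult"]
    return result["identifiedProfileId"], result["confidence"]

profile_id, confidence = parse_identification(sample)
```

As the post points out, only the label is available today; a numeric score would let callers set their own acceptance thresholds.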
speaker verification demo
The demo for speaker verification, https://azure.microsoft.com/en-us/services/cognitive-services/speaker-recognition/, is great! I love the web-based aspect. The demo has a link that says 'want to build this', but that link takes you to the SDK docs, with no real info about how to build the sample app. I want to see the code for your demo! That is, a web-based client collecting audio in the browser and sending it to a server that calls the speaker verification SDK. All your samples on GitHub seem to be WPF-based. I need a web-based client talking to a (C#) server that calls your C#…
4 votes -
Wrong recognition
Hi, do you recommend a particular way of speaking or of recording the audio? When we tested the Speaker Verification API, it accepted another person's voice (acceptance rate: Normal), which is not supposed to happen, and it is very hard to get an acceptance rate of High. Are there plans to express the confidence level as a percentage? Thank you.
2 votes -
Provide iPhone application to use Speaker Recognition API
It would be helpful if you provided a sample application for iPhone as well.
0 votes -
Korean language support for Speaker recognition
Please support Korean speaker recognition. I would like to see Korean supported.
1 vote -
Add support for Speaker Diarization for untrained speakers.
Distinguish between multiple speakers in a conversation without training the system first. IBM Watson currently supports this: https://www.ibm.com/blogs/bluemix/2017/05/whos-speaking-speaker-diarization-watson-speech-text-api/
Given an audio recording of a conversation, the minimum I'm looking for is:
Speaker 1 (0:01-0:03): Hi Ted, how are you today?
Speaker 2 (0:04-0:05): I'm doing well, how about you?
Speaker 1 (0:05-0:10): Good thanks. So the reason I called you today was to discuss your recent sales performance.
Ideally each word would be timestamped so we could highlight the spoken word when displaying the transcription next to the playing audio. Also it would be nice if each word had a…
13 votes
We now have a speaker diarization/separation option available: https://docs.microsoft.com/en-us/azure/cognitive-services/speech-service/batch-transcription
Please let us know if you have any feedback. Thanks! -
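Following up on the answer above, enabling diarization in a batch transcription request comes down to setting the right properties in the request body. A minimal sketch, assuming the v3.0 batch transcription REST API and its `diarizationEnabled` / `wordLevelTimestampsEnabled` property names (the URL in the code is a placeholder; verify everything against the linked docs):

```python
import json

def build_transcription_request(content_url, display_name):
    """Build a batch-transcription request body with diarization turned on.

    Property names here follow the v3.0 batch transcription API and should
    be treated as assumptions, not a definitive contract.
    """
    body = {
        "contentUrls": [content_url],           # URL of the audio to transcribe
        "displayName": display_name,
        "locale": "en-US",
        "properties": {
            "diarizationEnabled": True,          # label speech turns per speaker
            "wordLevelTimestampsEnabled": True,  # timestamp each recognized word
        },
    }
    return json.dumps(body)

payload = build_transcription_request("https://example.com/audio/call.wav", "sales-call")
```

The resulting JSON would be POSTed to the service's transcriptions endpoint; with diarization enabled, each recognized phrase in the result carries a speaker label alongside its timings.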
Process speaker identification immediately for short audio samples
First off, this is an awesome API that I would love to use in my app. The big problem I have, though, is that it's not really usable for real-time, low latency identification from short samples because:
1. The asynchronous callback method requires me to make constant polls to the operation result endpoint, which takes (from my measurement) about 1200 ms in the ideal case, whereas I would really prefer results within 400-500 ms.
2. Each poll on the operation status costs me QPS, which triggers throttling if I poll too often.
I would propose the following change to the speaker identification…
2 votes -
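The polling cost described above can at least be bounded with exponential backoff. A minimal sketch with an injectable status fetcher; the `fetch_status` callback and its return values are placeholders, not the real API:

```python
import time

def poll_until_done(fetch_status, initial_interval=0.25, max_interval=2.0, timeout=10.0):
    """Poll fetch_status() until it reports a terminal state, backing off
    exponentially so repeated polls don't burn through the QPS quota.

    fetch_status is any zero-argument callable returning a status string;
    in the real API it would be a GET on the operation-status URL.
    """
    deadline = time.monotonic() + timeout
    interval = initial_interval
    while True:
        status = fetch_status()
        if status in ("succeeded", "failed"):
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError("operation did not finish in time")
        time.sleep(interval)
        interval = min(interval * 2, max_interval)

# Fake fetcher that reports "running" twice before succeeding.
responses = iter(["running", "running", "succeeded"])
final = poll_until_done(lambda: next(responses), initial_interval=0.01)
```

Backoff trades a little latency on long operations for far fewer status calls, which matters when each call counts against the quota.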
How to reduce response time for identification requests?
When testing in Python, each identification request takes eight to nine seconds to get a response. Is this due to the network, or does the identification model's processing itself take that long? And is there any way to get a response faster? Thank you.
2 votes -
API support to know who spoke what
I am trying to build a system that should be able to recognize all the speakers and the speech each speaker has spoken.
I was trying to build the solution using the Speaker Recognition API. I am passing the voice/audio stream to the identification API and am able to find out who the speakers are, but I didn't find a way to know who spoke what.
Is there any way to know who spoke what using the Speaker Recognition API, as it is required for my solution?
A reference to any other APIs Microsoft is building would be really helpful.
5 votes -
Speaker Identification APIs
The operation status API always returns status "failed" with message "SpeakerInvalid". Please advise on a solution to this problem; the audio is recorded exactly as the documentation specifies.
{"status":"failed","createdDateTime":"2018-05-25T09:07:19.4685571Z","lastActionDateTime":"2018-05-25T09:07:20.3782489Z","message":"SpeakerInvalid"}
2 votes -
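A common cause of `SpeakerInvalid` is audio that doesn't match the service's expected format. A minimal sketch of a local pre-check, assuming the documented requirement of mono, 16-bit PCM WAV at 16 kHz (verify against the current docs before relying on these numbers):

```python
import io
import wave

EXPECTED_CHANNELS = 1        # mono
EXPECTED_SAMPLE_WIDTH = 2    # 16-bit samples
EXPECTED_FRAME_RATE = 16000  # 16 kHz

def is_valid_enrollment_wav(data):
    """Check a WAV byte stream against the assumed Speaker Recognition format."""
    with wave.open(io.BytesIO(data)) as wav:
        return (wav.getnchannels() == EXPECTED_CHANNELS
                and wav.getsampwidth() == EXPECTED_SAMPLE_WIDTH
                and wav.getframerate() == EXPECTED_FRAME_RATE)

def make_test_wav(channels, sample_width, frame_rate):
    """Build a tiny silent WAV in memory, for demonstration only."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(sample_width)
        wav.setframerate(frame_rate)
        wav.writeframes(b"\x00" * sample_width * frame_rate)  # one second of silence
    return buf.getvalue()

good = is_valid_enrollment_wav(make_test_wav(1, 2, 16000))
bad = is_valid_enrollment_wav(make_test_wav(2, 2, 44100))
```

Running a check like this before enrollment turns an opaque server-side `SpeakerInvalid` into an immediate, local error message.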
Solution to many of your APIs
Rather than offer the APIs at Microsoft, then send the user to GitHub, then hope the user can follow the various installation processes/steps, simply allow the user to download directly from Microsoft and include the newly generated key in the downloadable source code. This way, you control the entire process and don't have to worry about unzipping, npm installs, key issues etc.
2 votes -
Add support for Italian in the Speaker Recognition API
Please add the Italian language. Thanks for your support.
2 votes -
Please add support for Danish language
I know Denmark is a small country, but there is still a need for support in Danish.
1 vote -
Please add "How To" in the documentation
The "How To" section of the Speaker Recognition API documentation is missing. The "How To" documentation for the Face API is well written in a step-by-step fashion; however, it's missing for the Speaker Recognition API.
The second part of my question is: can I send the *.wav file directly to the endpoint URL when using the API, or should it be sent as multipart/form-data or application/octet-stream?
8 votes
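On the second part of the question, the raw WAV bytes can be carried in the request body. A minimal sketch of building (not sending) such a request with `Content-Type: application/octet-stream`; the URL path, profile ID, and key here are placeholders based on the v1.0 identification-profile pattern, so confirm them against the API reference:

```python
import urllib.request

def build_enrollment_request(region, profile_id, wav_bytes, subscription_key):
    """Build a POST request carrying raw WAV audio as application/octet-stream.

    The URL follows the assumed v1.0 identification-profile enrollment
    pattern; treat it as a sketch, not the definitive endpoint.
    """
    url = (f"https://{region}.api.cognitive.microsoft.com/spid/v1.0/"
           f"identificationProfiles/{profile_id}/enroll")
    return urllib.request.Request(
        url,
        data=wav_bytes,  # the .wav file contents, unmodified
        headers={
            "Content-Type": "application/octet-stream",
            "Ocp-Apim-Subscription-Key": subscription_key,
        },
        method="POST",
    )

req = build_enrollment_request("westus", "demo-profile-id", b"RIFF...", "<your-key>")
```

Each operation's entry in the API reference lists which content types it accepts, so it's worth checking there whether multipart/form-data is also allowed for the call you are making.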
- Don't see your idea?