Microsoft

Speaker Recognition

Welcome to the Speaker Recognition Forum

Categories

API – Ideas or feedback on features or enhancements for the Speaker Recognition API.

Documentation – Any ideas or suggestions for the API Reference or Documentation.

Language Support – Submit a request to have a particular language supported.

Samples & SDK Requests – Let us know if you would like to see a code sample or SDK provided.


  1. Recognize multiple speakers in an audio file and when they speak

    For example, take a 2-minute audio file: Speaker A talks for the first 30 seconds, Speaker B from 0:30 to 1:30, and then Speaker A again from 1:30 to 2:00.

    35 votes
    8 comments  ·  Speaker Identification
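
    One way to picture the output requested in idea 1 is a list of time-stamped speaker segments. The sketch below is illustrative only: the tuple layout and the helper names `talk_time` and `speaker_at` are assumptions made for this example, not part of the Speaker Recognition API.

```python
from collections import defaultdict

# (start_seconds, end_seconds, speaker_label) for the 2-minute example in idea 1.
# This layout is a hypothetical representation, not an API response format.
segments = [(0, 30, "A"), (30, 90, "B"), (90, 120, "A")]


def talk_time(segments):
    """Total speaking time in seconds for each speaker label."""
    totals = defaultdict(int)
    for start, end, speaker in segments:
        totals[speaker] += end - start
    return dict(totals)


def speaker_at(segments, t):
    """Speaker label active at time t (seconds), or None if nobody is speaking."""
    for start, end, speaker in segments:
        if start <= t < end:
            return speaker
    return None
```

    With this data, `talk_time(segments)` gives 60 seconds each for A and B, and `speaker_at(segments, 45)` returns "B".
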
  2. Add support for Speaker Diarization for untrained speakers.

    Distinguish between multiple speakers in a conversation without training the system first. IBM Watson currently supports this: https://www.ibm.com/blogs/bluemix/2017/05/whos-speaking-speaker-diarization-watson-speech-text-api/

    Given an audio recording of a conversation, the minimum I'm looking for is:
    Speaker 1 (0:01-0:03): Hi Ted, how are you today?
    Speaker 2 (0:04-0:05): I'm doing well, how about you?
    Speaker 1 (0:05-0:10): Good thanks. So the reason I called you today was to discuss your recent sales performance.

    Ideally each word would be timestamped so we could highlight the spoken word when displaying the transcription next to the playing audio. Also it would be nice if each word had a…

    12 votes
    8 comments  ·  Speaker Identification
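
    The word-highlighting use case in idea 2 amounts to looking up which word is being spoken at the current playback position. A minimal sketch, assuming a hypothetical `Word` record with per-word timestamps; the individual word timings are invented for illustration and only the 0:01 to 0:03 utterance bounds come from the post.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass
class Word:
    start: float   # seconds from the start of the recording
    end: float
    speaker: int
    text: str


# Hypothetical word-level timings for "Speaker 1 (0:01-0:03): Hi Ted, how are you today?"
words = [
    Word(1.0, 1.3, 1, "Hi"),
    Word(1.3, 1.7, 1, "Ted,"),
    Word(1.8, 2.0, 1, "how"),
    Word(2.0, 2.2, 1, "are"),
    Word(2.2, 2.4, 1, "you"),
    Word(2.4, 3.0, 1, "today?"),
]


def word_at(words, t):
    """Return the word being spoken at playback time t, or None during a pause.

    Assumes `words` is sorted by start time and the words do not overlap.
    """
    starts = [w.start for w in words]
    i = bisect_right(starts, t) - 1   # last word that starts at or before t
    if i >= 0 and t < words[i].end:
        return words[i]
    return None
```

    A player would call `word_at(words, current_time)` on each time update and highlight the returned word; `word_at(words, 1.5)` returns the entry for "Ted,", while a time that falls in the gap between two words returns None.
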
  3. Please provide a sample Android app that uses the Speaker Recognition API

    It would be helpful if you provided a sample that uses the Speaker Recognition API, like the Face Verification/Detection samples you provide for Android.

    11 votes
    8 comments  ·  Samples & SDK Requests
  4. Add support for Spanish to the Speaker Recognition API?

    Currently only English is available for speaker recognition. Can you add Spanish for Argentina, Mexico, and Spain?

    10 votes
    4 comments  ·  Language Support
    Planned  ·  Luke Bayler responded

    Hello,

    Spanish for Speaker Identification is planned for a future release.

    Thanks,
    Luke

  5. More than 10 speakers to identify

    Are there any plans to support more than 10 profiles per request when identifying speaker voices?

    I am curious about the technical limitations of the identification service. Are there plans to make it reliable enough to build voice signature profiles on?

    9 votes
    3 comments  ·  API
  6. Please add "How To" in the documentation

    The "How To" part of the Speaker Recognition API documentation is missing. The Face API has well-written, step-by-step "How To" documentation, but there is nothing equivalent for the Speaker Recognition API.
    The second part of my question: can I send the *.wav file directly to the endpoint URL when using the API, or should it be converted to multipart/form-data or application/octet-stream data?

    8 votes
    0 comments  ·  Speaker Verification
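
    The two encodings named in idea 6 differ only in how the file bytes are framed in the HTTP request body. The sketch below builds both bodies with the standard library so the difference is visible; it does not contact any service, the form field name and filename are hypothetical, and which encoding the Speaker Recognition API accepts is not stated in this thread and should be checked in the API reference.

```python
import uuid


def octet_stream_request(wav_bytes: bytes):
    """Raw upload: the request body is the file content itself."""
    headers = {"Content-Type": "application/octet-stream"}
    return headers, wav_bytes


def multipart_request(wav_bytes: bytes, field: str = "file", filename: str = "audio.wav"):
    """Form upload: the file is wrapped in one boundary-delimited part.

    The field name and filename are placeholders chosen for this example.
    """
    boundary = uuid.uuid4().hex
    body = b"".join([
        f"--{boundary}\r\n".encode(),
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'.encode(),
        b"Content-Type: audio/wav\r\n\r\n",
        wav_bytes,
        f"\r\n--{boundary}--\r\n".encode(),
    ])
    headers = {"Content-Type": f"multipart/form-data; boundary={boundary}"}
    return headers, body
```

    In both cases the .wav bytes are sent unchanged; the multipart variant only adds the part headers and boundary markers around them.
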
  7. Give a percentage match instead of categorizing results as High, Medium, or Low

    Instead of categorizing the Speaker Identification API response as High, Medium, or Low, return the percentage match the service computed. The user should decide what the cutoff for a potential match should be.

    7 votes
    2 comments  ·  Speaker Identification
    Planned  ·  Luke Bayler responded

    Hello,

    This is currently planned for a future release.

    Thanks,
    Luke
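
    To illustrate what idea 7 asks for: with a numeric score, the caller can choose the cutoff instead of relying on fixed buckets. This is a sketch under assumptions; the profile names and scores are made up, and the thread does not say what form a numeric score would take in the API response.

```python
# Hypothetical per-profile match scores in the range 0..1 (invented for illustration).
scores = {"profile-a": 0.91, "profile-b": 0.62, "profile-c": 0.18}


def matches(scores, cutoff):
    """Profiles whose score meets the caller-chosen cutoff, best match first."""
    accepted = [profile for profile, score in scores.items() if score >= cutoff]
    return sorted(accepted, key=lambda profile: scores[profile], reverse=True)
```

    A lenient application might call `matches(scores, 0.6)` and get both profile-a and profile-b, while a strict one calling `matches(scores, 0.9)` gets only profile-a.
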

  8. Speaker Recognition with shorter phrase?

    I would love to create a plug-in for my home automation, which already uses Kinects, that can use the speaker identification from Oxford. The main issue is that most statements are short, e.g. "Computer, turn on family room light", so I never generate a 20-second clip. Recognition with a clip of at least 5 seconds or so would be great, even if recognition is only, say, 80% accurate in this case.

    7 votes
    2 comments  ·  Speaker Identification
    Completed  ·  Raymond responded

    We have released a new feature that allows you to waive the audio limit. Add the “ShortAudio” parameter to instruct the service to waive the recommended minimum audio limit needed for enrollment. Set its value to “true” to force enrollment using any audio length.

    More details can be found here:
    - https://dev.projectoxford.ai/docs/services/563309b6778daf02acc0a508/operations/5645c3271984551c84ec6797
    - https://dev.projectoxford.ai/docs/services/563309b6778daf02acc0a508/operations/5645c523778daf217c292592
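
    A minimal sketch of adding the parameter from the reply to an enrollment URL. The base URL is a placeholder, not the real endpoint, and the query-string spelling `shortAudio` is an assumption (the reply writes “ShortAudio”); the exact name and casing should be taken from the linked documentation.

```python
from urllib.parse import urlencode


def enrollment_url(base_url: str, short_audio: bool = False, param_name: str = "shortAudio") -> str:
    """Append the short-audio flag to an enrollment URL when requested.

    `param_name` defaults to an assumed spelling; adjust it to match the API reference.
    """
    if not short_audio:
        return base_url
    return f"{base_url}?{urlencode({param_name: 'true'})}"


# Placeholder URL for illustration only.
url = enrollment_url("https://example.invalid/enroll", short_audio=True)
```

    Here `url` is "https://example.invalid/enroll?shortAudio=true"; leaving `short_audio` at its default returns the base URL unchanged.
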

  9. API support to know who spoke what

    I am trying to build a system that should be able to recognize all the speakers and what each speaker has said.
    I was trying to build the solution using the “Speaker Recognition API”. I am passing the voice/audio stream to the identification API and can find out who the speakers are, but I didn't find a way to know who spoke what.
    Is there any way to know who spoke what using the “Speaker Recognition API”? It is required for my solution.
    A reference to any other APIs Microsoft is building would be really helpful.

    5 votes
    0 comments  ·  API
  10. What languages does this service support?

    Honestly, it would be good to add support for Spanish.

    4 votes
    0 comments  ·  Language Support
  11. speaker verification demo

    The demo for speaker verification, https://azure.microsoft.com/en-us/services/cognitive-services/speaker-recognition/, is great! Love the web-based aspect. The demo has a link that says 'want to build this', and that link takes you to the SDK docs, with no real info about how to build your sample app. I want to see the code for your demo! So a web-based client capturing audio in a browser and sending it to a server that calls the speaker verification SDK. All your samples on GitHub seem to be WPF-based. I need a web-based client talking to a (C#) server that calls your C#…

    4 votes
    1 comment  ·  Samples & SDK Requests
  12. 4 votes
    0 comments  ·  Samples & SDK Requests
    Under Review  ·  Luke Bayler responded

    Hello,

    Xamarin support is not currently being planned for a future release, but it will be considered.

    Thanks,
    Luke

  13. French language support for speaker recognition

    Do you have plans to add support for the French language in the Speaker Recognition API?

    4 votes
    1 comment  ·  Language Support
    Planned  ·  Luke Bayler responded

    Hello,

    French for Speaker Identification is currently planned for a future release.

    Thanks,
    Luke

  14. Include more locales

    Currently the only supported locale is en-US, and my regular accent is Asian. The APIs have a hard time recognizing my accent, probably because of the locale. I would like to see more locales added.

    3 votes
    1 comment  ·  Language Support
    Under Review  ·  Luke Bayler responded

    Hello,

    We are currently focusing on including the 9 fundamental locales, and then will start looking into the others, such as English accents.

    Thanks,
    Luke

  15. Define accuracy level of each request to accept or reject the outcome

    Rigorous testing shows that the accuracy of the Speaker Identification API is not very good. It gives erroneous output in both positive and negative scenarios. In the positive scenario, when different voice samples from the same user are tested against each other, it often does not give the expected result. In the negative scenario, when voice samples from different users are tested against each other, it wrongly identifies them as the same user on many occasions. I feel that adding an accuracy level to the output would help the end user, to an extent, to decide whether to accept the outcome…

    3 votes
    Resolved  ·  1 comment  ·  Speaker Identification
  16. I'd like the ability to customize my own verification phrase list

    Provided by a fellow developer. Leave a comment below and let us know other customization options/parameters for the verification phrase list you'd like to see introduced.

    3 votes
    5 comments  ·  Speaker Verification
    Planned  ·  Luke Bayler responded

    Hello,

    This is not currently supported, but it is on our feature list for a future release.

    Thanks,
    Luke

  17. Speaker diarization for more than 2 speakers

    Speaker diarization for more than 2 speakers.

    See this one: https://cognitive.uservoice.com/forums/555925-speaker-recognition/suggestions/34823824-add-support-for-speaker-diarization-for-untrained

    I don't feel this should be marked as resolved. I would expect support for at least 10 speakers. Additionally, it's currently really poor and switches between speaker 1 and 2 almost randomly. Please make this more intelligent. It's a deal breaker for us and, I'm sure, many others, especially considering that the Google alternative can handle unlimited speakers and is far more accurate at identifying them.

    https://cloud.google.com/speech-to-text/docs/multiple-voices

    And no... expecting a sample to train it for each voice is not an option. We literally just need it to assign a number…

    2 votes
    0 comments  ·  Speaker Identification
  18. Add Portuguese (pt-br) support

    Brazilian Portuguese (pt-br) is one of the principal languages in software. Are there plans for it? Is there a date or deadline?

    2 votes
    0 comments  ·  Language Support
  19. Support Spanish language on Speaker Recognition API

    Can you please specify when Spanish language support is expected to be released for the Verification profile feature?

    2 votes
    1 comment  ·  Language Support
  20. Return Confidence Score

    Currently, with the GET Operations Status API call, confidence is returned as "High", "Normal", or "Low".

    Having the actual confidence score (a real number between 0 and 1) returned would be much more useful than an arbitrary value.

    GitHub Issue connected with this: https://github.com/MicrosoftDocs/azure-docs/issues/30221

    2 votes
    0 comments  ·  API