Microsoft

How can we improve Speech service?

Confidence value per word or per speech fragment

I am building a proof of concept (POC) for speech recognition on long speeches, using the recognition modes described here:
https://docs.microsoft.com/de-de/azure/cognitive-services/speech/concepts#recognition-modes

The recognition mode "conversation" with format "detailed" delivers message responses of type "SpeechPhrase", which include a confidence value.

The recognition mode "dictation" with format "detailed" delivers message responses of type "SpeechFragment" and "SpeechPhrase". The phrases include a confidence value, but the fragments do not contain any confidence information.
With the C# service library and the recognition mode "dictation" you get partial results with a confidence value (an enum). This is not the solution we are looking for, because the confidence value seems to apply to the whole phrase ("Confidence: Indicates the level of confidence of a recognized phrase.", https://cdn.rawgit.com/Microsoft/Cognitive-Speech-STT-ServiceLibrary/master/docs/html/9d706b3a-8d1f-ba71-d628-fff00928c72d.htm).
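
To illustrate the gap, here is a minimal Python sketch of how a client might read the confidence from the two message types. The field names ("NBest", "Confidence", "Text", "Display") are my assumptions about the detailed JSON payloads, and the sample payloads and the helper name phrase_confidence are made up for illustration, so please treat this as a sketch rather than the actual service contract.

```python
import json
from typing import Optional


def phrase_confidence(payload: str) -> Optional[float]:
    """Return the highest N-best confidence in a message, or None if it has none.

    Assumes (not confirmed by the docs quoted above) that phrase messages carry
    an "NBest" list whose entries have a numeric "Confidence" field.
    """
    message = json.loads(payload)
    scores = [
        entry["Confidence"]
        for entry in message.get("NBest", [])
        if "Confidence" in entry
    ]
    return max(scores) if scores else None


# Hypothetical payloads, shaped like the messages described above.
fragment = '{"Text": "hello wor", "Offset": 0, "Duration": 5000000}'
phrase = (
    '{"RecognitionStatus": "Success", "Offset": 0, "Duration": 12300000, '
    '"NBest": [{"Confidence": 0.87, "Display": "Hello world."}, '
    '{"Confidence": 0.52, "Display": "Hello word."}]}'
)

print(phrase_confidence(phrase))    # confidence is available for the phrase
print(phrase_confidence(fragment))  # nothing comparable for the fragment
```

Under these assumptions the phrase yields one score for the whole utterance, while the fragment yields nothing, which is exactly the information we are missing.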

The recognition mode "interactive" is not optimized for long speeches.

A numeric confidence value per word or per speech fragment would be very useful for us. With such a value, the Microsoft Speech service could provide a self-assessment of whether it recognized each word or fragment correctly. Unfortunately, I have not found any way to get this.

Thank you in advance for a short answer on whether the Microsoft Speech service supports any solution for this.

1 vote

Admin Luke Bayler (Community Manager, Microsoft Cognitive Services) shared this idea

0 comments

