Microsoft

Speech Service

  1. Improve workflow for intent recognition training

    I used the following workflow for training my intent recognition:
    1) I have a set of entities, features, and patterns edited
    2) I have a set of example inputs for training
    3) All the samples have their entities marked
    4) I then train on the examples
    5) Finally, I execute a series of batch test cases

    Issues recommended for improvement:
    - The batch test cases require character positions (startPos and endPos). I had no option other than counting these manually, which is error-prone.
    - When loading the batch test cases, the feedback / error log is hard to find and…
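    Counting positions by hand could be avoided with a small helper. A minimal sketch (not a LUIS tool), assuming the batch file format uses inclusive end positions, which is worth verifying against your own files:

```python
def entity_positions(utterance, entity):
    """Return (startPos, endPos) for the first occurrence of `entity`
    in `utterance`, assuming inclusive end positions (verify this
    assumption against your batch test file format)."""
    start = utterance.find(entity)
    if start == -1:
        raise ValueError(f"entity {entity!r} not found in utterance")
    # endPos points at the last character of the entity, not one past it.
    return start, start + len(entity) - 1
```

    For example, `entity_positions("book a flight to Boston", "Boston")` returns (17, 22).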

    1 vote  ·  0 comments  ·  Speech to Text
  2. ccextractor

    Hello,

    May I recommend the closed-caption text extraction tool "CCExtractor" as a way to compare the results of your Speech-to-Text service.

    The URL is:

    https://www.ccextractor.org/

    Thank you.

    Regards,
    William Johnson

    1 vote  ·  0 comments  ·  Speech to Text
  3. Support for specifying an external ID when creating a batch transcription request, which will be part of the web hook response

    It would be nice if you could specify an "external id" when creating a batch transcription request, and have that "external id" also returned in the response of the web hook callback.

    Why? To be able to link a request to the id of a running process/workflow, for example in a durable function. The durable function (using an orchestration) looks like this:
    1. The durable function sends a message to the Speech to Text service (STTS) to create a transcription.
    2. The durable function makes a call to context.WaitForExternalEvent<string>("TranscriptionCompleted");
    3. At some point in time the STTS is finished and calls…
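    Until such an external id exists, the correlation can be kept client-side. A minimal sketch of that workaround, where `create_transcription` and `raise_workflow_event` are hypothetical placeholders for your own service call and orchestration client, not real SDK functions:

```python
# Map each transcription id back to the workflow instance that started it,
# since the service does not echo a caller-supplied external id.
correlation = {}  # transcription id -> workflow/orchestration instance id

def start_transcription(audio_url, workflow_id, create_transcription):
    # `create_transcription` stands in for the batch transcription call
    # and is assumed to return the new transcription's id.
    transcription_id = create_transcription(audio_url)
    correlation[transcription_id] = workflow_id
    return transcription_id

def on_webhook(payload, raise_workflow_event):
    # The webhook payload carries the transcription id; look up the
    # waiting workflow and deliver the "TranscriptionCompleted" event.
    workflow_id = correlation.pop(payload["transcriptionId"])
    raise_workflow_event(workflow_id, "TranscriptionCompleted", payload)
```

    The obvious downside, and the reason for this idea, is that the mapping has to live in durable storage in any real deployment.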

    1 vote  ·  0 comments  ·  Speech to Text
  4. How to let VS Code access the mic on macOS

    macOS versions like Catalina ask for permission to access the microphone via the Privacy settings. I resolved the issue by adding Terminal to the list, and I can run the SDK sample code successfully in Terminal and in a Jupyter Notebook. But how can I allow VS Code to run this code and access the mic normally?

    1 vote  ·  0 comments  ·  Speech to Text
  5. I would like the film in German; it is currently in Polish

    The film, which is now in Polish, in German.

    1 vote  ·  0 comments  ·  Speech to Text
  6. 1 vote  ·  1 comment  ·  Speech to Text
  7. Properties to define: max audio recognition time from microphone, or stop recognition on silence

    Hello,

    I am doing speech recognition using the Android SDK, and I plan to move to containers in the future. Stopping on silence is the default, per the documentation. How do I define the maximum audio recognition time, as below?

    If the user is speaking and has spoken for more than 15 seconds, the SDK should automatically stop the recognizer on the Android end. If the user has spoken for less than 15 seconds and was silent in between, then it should be based on silence detection. The Speech SDK should stop the microphone on Android either on silence detection (when spoken…
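    The stop policy being requested can be written down as a small piece of client logic. This is only an illustration of the desired behavior, not Speech SDK API (the SDK exposes silence-timeout properties, but to my knowledge not a hard cap on total recognition time, which is what this idea asks for):

```python
class RecognitionStopper:
    """Illustration of the requested stop policy: stop after
    `max_seconds` of total recognition time, or earlier once
    `silence_seconds` of continuous silence has been observed."""

    def __init__(self, max_seconds=15.0, silence_seconds=2.0):
        self.max_seconds = max_seconds
        self.silence_seconds = silence_seconds

    def should_stop(self, elapsed, silence_elapsed):
        # Hard cap requested by this idea: total time exceeded.
        if elapsed >= self.max_seconds:
            return "max-duration"
        # Existing default behavior: stop once silence lasts long enough.
        if silence_elapsed >= self.silence_seconds:
            return "silence"
        return None
```

    The caller would poll `should_stop` from its audio loop and call the recognizer's stop method when it returns a reason.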

    1 vote  ·  1 comment  ·  Speech to Text
  8. Segmentation length config for recognized result

    From https://github.com/Azure-Samples/cognitive-services-speech-sdk/issues/610#event-3282436941

    The reason for this request is that the recognized output text is sometimes too long to render in a friendly way, e.g. in a mobile app where recognized text is limited to 2 lines of at most 20 characters each (or fewer).

    E.g.

    Utterance: I will go to the bookstore this afternoon to check if there are any new arrivals. After that, Jack will pick me up there and take me to the gym for practice. We need to prepare for a match in two weeks. Dinner will be taken in the gym to save commute overhead. I will arrive home around 8:30 in the evening.

    Current result from the Speech SDK:
    RECOGNIZED: Text=I…
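    As a client-side workaround under the current behavior, a long final result can be re-wrapped into caption-sized pages. A sketch using Python's standard `textwrap` module, with the 2-line / 20-character limits taken from this idea:

```python
import textwrap

def render_caption(text, width=20, max_lines=2):
    """Wrap a long recognized result into caption 'pages' of at most
    `max_lines` lines of at most `width` characters each, rather than
    relying on the service to segment shorter utterances."""
    lines = textwrap.wrap(text, width=width)
    return [lines[i:i + max_lines] for i in range(0, len(lines), max_lines)]
```

    The app would then show one page at a time, which sidesteps the rendering limit but cannot restore natural phrase boundaries, hence the request for a service-side segmentation length.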

    2 votes  ·  0 comments  ·  Speech to Text
  9. Fluency format of recognized result

    From https://github.com/Azure-Samples/cognitive-services-speech-sdk/issues/598#event-3275944556

    I suggest adding a fluency format for scenarios such as formal meeting transcription, translation, etc., which do not expect spoken-text forms.

    E.g.:

    Utterance: "i want to ah, to book a flight to Denver, i mean, to Boston, the day, the day after, after Monday."

    RECOGNIZED: "I want to are to book a flight to Denver. I mean to Boston the day, the day after after Monday."

    Expected: "I want to book a flight to Boston the day after Monday."

    Thank you.
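    A rough client-side approximation of the requested cleanup is possible with regular expressions. Real disfluency removal needs a trained model (e.g. it cannot resolve the "Denver, I mean, Boston" self-correction above); this sketch only handles simple fillers and immediate repetitions:

```python
import re

# Common fillers, removed together with a trailing comma/period and space.
FILLERS = re.compile(r"\b(?:ah|uh|um|i mean)\b[,.]?\s*", re.IGNORECASE)
# Immediate repetitions of a word or two-word phrase, e.g. "the day, the day".
REPEATS = re.compile(r"\b(\w+(?:\s\w+)?)[,.]?\s+\1\b", re.IGNORECASE)

def defluff(text):
    text = FILLERS.sub("", text)
    # Collapse repetitions until none remain (each pass shortens the text).
    while REPEATS.search(text):
        text = REPEATS.sub(r"\1", text)
    return re.sub(r"\s+", " ", text).strip()
```

    On the example utterance this removes "ah", "i mean", and the "the day, the day" / "after after" repeats, but a service-side fluency format could do much better.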

    1 vote  ·  0 comments  ·  Speech to Text
  10. Adding custom headers to Speech to Text WebSocket requests

    Add the ability to attach custom headers to the Speech to Text SDK's WebSocket requests, so that intermediate servers can verify the headers to authenticate and authorize. This is required for container versions.

    1 vote  ·  0 comments  ·  Speech to Text
  11. Azure AD Authentication

    Support for authenticating to the service with Azure AD, as an alternative to keys.

    1 vote  ·  0 comments  ·  Speech to Text
  12. Language Support for Greek

    Is Greek on the roadmap? Please let me know when it is planned. If not, please add it.

    2 votes  ·  0 comments  ·  Speech to Text
  13. About the display of text obtained by SpeechToText (Speech SDK)

    The results delivered by the SpeechRecognizer's Recognized event are not broken at punctuation marks, and sentences are joined together even when the speaker changes.
    Therefore, it is not possible to know when the speaker changes.
    Please improve this so that an event is raised for each punctuation mark.
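    As a stopgap, a final result can be split at punctuation on the client so each sentence is handled as its own event. Detecting the speaker change itself needs diarization (e.g. conversation transcription) and cannot be recovered this way; this sketch only restores per-sentence granularity:

```python
import re

def split_on_punctuation(recognized_text):
    """Break one Recognized result into sentence chunks, splitting after
    sentence-ending punctuation (Western and CJK full stop)."""
    parts = re.split(r"(?<=[.!?\u3002])\s*", recognized_text)
    return [p.strip() for p in parts if p.strip()]
```

    Each returned chunk can then be dispatched to the same handler the Recognized event would have called.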

    38 votes  ·  2 comments  ·  Speech to Text
  14. Show multilanguage translations on a single screen during a presentation

    Hello,
    I have the following use case (an international wedding): the presenter speaks French. I would like to show the German and Catalan live translations on a single screen for the audience. Is this possible? I know the conversation feature is readily available, but not everybody in the audience has a smartphone.

    1 vote  ·  2 comments  ·  Speech to Text
  15. REST API support for custom phrase lists

    REST API support for custom phrase lists

    3 votes  ·  0 comments  ·  Speech to Text
  16. LUIS Reference Grammar ID fails West Europe when included

    For the West Europe region, the service returns "Specified grammar type is not supported!" when we pass in a LUIS reference grammar ID (a.k.a. IntentRecognizer).

    This causes the Speech service to fail with a "WebSocket is already in CLOSING or CLOSED state." error when the LUIS reference grammar ID is passed in. If it is not included, the service works correctly.

    May be related to this issue in the Cognitive Services Speech SDK repo: https://github.com/Azure-Samples/cognitive-services-speech-sdk/issues/127

    3 votes  ·  1 comment  ·  Speech to Text
  17. Automatic determination of English locales

    At present, we have to specify the locale of the input language in detail for features such as Speech to Text: for example, en-US, en-AU, and so on.
    Users may not know which one to choose, so it would be easier to use if the locale were recognized automatically from the voice.

    Please let me know if you have any plans for this in the future.

    1 vote  ·  1 comment  ·  Speech to Text
  18. Norwegian language needs improvement in grammar

    Norwegian needs a grammatical rethink with regards to compound words.

    Currently in Norwegian we distinguish individual words by putting spaces between them, so for instance when I say "console window" ("konsollvindu" in Norwegian) to the speech-to-text service, it outputs "konsoll vindu", with a space between the words.

    This.

    Must.

    Absolutely.

    Be.

    Fixed.

    I am a linguist by degree. Please hire me to fix this if you need help, because you really do.
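    Until the recognizer handles compounds correctly, results can be patched after the fact with a caller-maintained list. A hypothetical sketch; the compound list is the application's own domain vocabulary (the SDK's phrase list feature may also help bias recognition toward the joined forms):

```python
# Map each incorrectly split form to its proper Norwegian compound.
# "konsollvindu" is the example from this idea; extend with your own terms.
KNOWN_COMPOUNDS = {"konsoll vindu": "konsollvindu"}

def merge_compounds(text, compounds=KNOWN_COMPOUNDS):
    """Re-join compounds the recognizer split apart."""
    for split_form, joined in compounds.items():
        text = text.replace(split_form, joined)
    return text
```

    This obviously does not scale to Norwegian's productive compounding, which is why a model-level fix is being requested.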

    2 votes  ·  5 comments  ·  Speech to Text
  19. Confidence score on word level

    The lack of word-level confidence scores is a show-stopper for my company's project. It would be extremely useful for us to have a confidence score included in the "Words" list, which consists of words and their timestamps.

    According to this answer: https://social.msdn.microsoft.com/Forums/en-US/4979ca92-aa0f-4d09-b010-fc2eeb1bde80/speech-results-confidence-score-on-word-level?forum=AzureCognitiveService#8ae67445-4e23-49ea-b694-a8d877dc2dd0
    the feature is not public, and we suspect it could be provided quickly.

    I'd be grateful for every vote for this idea!
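    The shape being asked for would look roughly like this. The payload below is illustrative only: the utterance-level Confidence exists in today's detailed results, while the per-word Confidence field is the requested addition, not part of the public schema at the time of this idea:

```python
# Illustrative detailed-result payload (not the service's actual schema).
sample = {
    "NBest": [{
        "Confidence": 0.93,          # utterance-level score (exists today)
        "Words": [
            {"Word": "hello", "Offset": 500000, "Duration": 3000000,
             "Confidence": 0.97},    # requested word-level score
            {"Word": "world", "Offset": 3600000, "Duration": 2800000,
             "Confidence": 0.88},
        ],
    }]
}

def low_confidence_words(result, threshold=0.9):
    """Flag words whose (hypothetical) confidence falls below a threshold,
    e.g. to highlight them for human review in a transcription UI."""
    words = result["NBest"][0]["Words"]
    return [w["Word"] for w in words if w["Confidence"] < threshold]
```

    Use cases like selective human review are exactly why an utterance-level score alone is not enough.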

    18 votes  ·  10 comments  ·  Speech to Text
  20. Please support spelling out words

    Let's say I'm building an application where I want to know a user's name or address. Whereas I can find the address online, the name might be something unique.

    In this case, I'd like to let the user spell their name to the speech service. However, the results are currently not very good.

    I'd love an option to tell the cognitive services that I'm spelling something, or that I am only sending letters to it.

    Adding a LUIS model or custom intents to the recognizer didn't improve the results either. Very clear names always lead to some characters…
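    One client-side mitigation while a spelling mode is missing: if the application knows the user is spelling, constrain the interpretation itself. A hypothetical sketch that maps recognized tokens back to letters, accepting bare letters and a deliberately partial NATO alphabet table:

```python
# Partial NATO-alphabet table; a real application would complete it
# (and perhaps add locale-specific spelling alphabets).
NATO = {"alpha": "a", "bravo": "b", "charlie": "c", "delta": "d",
        "echo": "e", "foxtrot": "f"}

def letters_from_speech(recognized_text):
    """Reduce a recognized utterance to the letters being spelled:
    single-letter tokens pass through, NATO words map to their letter,
    everything else is dropped."""
    out = []
    for token in recognized_text.lower().replace(",", " ").split():
        if len(token) == 1 and token.isalpha():
            out.append(token)
        elif token in NATO:
            out.append(NATO[token])
    return "".join(out)
```

    This helps only when the recognizer returns the letters at all, which is precisely what a dedicated spelling mode in the service would make reliable.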

    4 votes  ·  0 comments  ·  Speech to Text