Microsoft

Speech Service

  1. About how text recognized by Speech to Text (Speech SDK) is delivered

    The results returned by the SpeechRecognizer's Recognized event are not split at punctuation marks, and sentences are joined together even when the speaker changes.
    As a result, it is impossible to tell when the speaker changes.
    Please improve this so that an event is raised for each punctuation mark.
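    Until such an event exists, one client-side workaround is to split the text of each Recognized result at sentence-ending punctuation yourself. The sketch below assumes the result text is already available as a plain string; the function name `split_at_punctuation` and the punctuation set are our own choices. It only recovers sentence boundaries, not speaker changes.

```python
import re

# Split after CJK sentence enders unconditionally, and after ASCII . ! ? only
# when whitespace follows, so decimals such as "3.14" are not split.
_SENTENCE_BREAK = re.compile(r"(?<=[。？！])|(?<=[.!?])\s+")

def split_at_punctuation(text: str) -> list[str]:
    """Split the text of one Recognized result into sentences, keeping punctuation."""
    return [part.strip() for part in _SENTENCE_BREAK.split(text) if part.strip()]

if __name__ == "__main__":
    for sentence in split_at_punctuation("Hello. How are you? 今日は晴れです。ありがとう！"):
        print(sentence)
```

    Requiring whitespace after ASCII punctuation is a trade-off: it keeps numbers intact but will miss a boundary if the recognizer ever omits the space after a period.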

    38 votes
    2 comments  ·  Speech to Text
  2. Ability to speak English with different foreign accents

    For example, German, French, Spanish, and Italian speakers each have their own accent when speaking English. This would be ideal for applications such as air traffic control.

    6 votes
    2 comments  ·  Text to Speech
  3. Move Irish text-to-phonetic (IPA) conversion upstream

    Irish (Gaeilge) TTS works if correct IPA syntax is used when the TTS request is sent with the IPA option included.

    Example:

    Ba chuid mhór den togra seo teacht ar ábhar a bheadh chomh maith nó níos fearr ná an fhuinseog agus an t-ábhar a bheith níos inmharthana.

    Translated to IPA:

    "bˠɑː xɪdʲ woːˈr dʲəɴʲ tʲɔɡˈrə ʃoː tʃæxt eˈr ɑːwəˈr ɑː vʲəh xɔv mˠɑːh ɴˠoː ɴʲiːsˠ fʲæˈrr ɴˠɑː en ɪɴʲʃoːɡ əɡʊsˠ en tʲɑːwəˈr ɑː vʲeɪh ɴʲiːsˠ ɪɴʲwəˈrhɑːɴˠɑː"

    The IPA (Irish) text can be read and spoken accurately by a neural en-GB or en-US voice.

    I have attached a file that converts…
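    The request described above can be sketched by wrapping the Irish text in an SSML `<phoneme>` element with `alphabet="ipa"`. The default voice name `en-GB-SoniaNeural` is only a placeholder assumption (any neural en-GB or en-US voice could be substituted), and the IPA in the usage line is a shortened piece of the example above. Standard-library escaping keeps quotes or ampersands in either string from breaking the XML.

```python
from xml.sax.saxutils import escape, quoteattr

def irish_ipa_ssml(irish_text: str, ipa: str, voice: str = "en-GB-SoniaNeural") -> str:
    """Build an SSML document that speaks irish_text using the given IPA transcription."""
    return (
        '<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-GB">'
        f"<voice name={quoteattr(voice)}>"
        # ph carries the pronunciation; the element content is the text it stands for.
        f'<phoneme alphabet="ipa" ph={quoteattr(ipa)}>{escape(irish_text)}</phoneme>'
        "</voice></speak>"
    )

if __name__ == "__main__":
    print(irish_ipa_ssml("Ba chuid mhór", "bˠɑː xɪdʲ woːˈr"))
```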

    3 votes
    0 comments  ·  Text to Speech
  4. Custom Voice Portal does not have an option to add tests

    The portal says "Add a test", but there is no option to do so. Screenshot attached.

    1 vote
    0 comments  ·  Custom Voice
  5. Improve how numbers are read for Switzerland

    Hello,
    The fr-CH voice, French (Switzerland), Male, "fr-CH-Guillaume" does not read numbers the way we say them in Switzerland; it reads them the way they are said in France.
    Specifically, 70 should be said "septante", not "soixante-", and 90 should be said "nonante", not "quatre-vingt-". In addition, depending on the region, 80 is said "huitante". Compared with most other languages, this ought to be the norm.
    Best regards,
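    Until the voice handles this, one workaround is to spell numbers out before sending the text, so the voice reads the Swiss form. The sketch below covers 0 to 99 only and uses the traditional hyphenation rules; the function names and the `use_huitante` switch (for regions where 80 stays "quatre-vingts") are our own assumptions.

```python
import re

_UNITS = ["zéro", "un", "deux", "trois", "quatre", "cinq", "six", "sept", "huit",
          "neuf", "dix", "onze", "douze", "treize", "quatorze", "quinze", "seize"]
_TENS = {2: "vingt", 3: "trente", 4: "quarante", 5: "cinquante",
         6: "soixante", 7: "septante", 8: "huitante", 9: "nonante"}

def swiss_french_number(n: int, use_huitante: bool = True) -> str:
    """Spell out 0 <= n <= 99 as it is said in French-speaking Switzerland."""
    if not 0 <= n <= 99:
        raise ValueError("only 0-99 is handled in this sketch")
    if n <= 16:
        return _UNITS[n]
    if n < 20:
        return "dix-" + _UNITS[n - 10]
    tens, unit = divmod(n, 10)
    if tens == 8 and not use_huitante:
        # Regions without "huitante" keep the France form for 80-89.
        return "quatre-vingts" if unit == 0 else "quatre-vingt-" + _UNITS[unit]
    word = _TENS[tens]
    if unit == 0:
        return word
    if unit == 1:
        return word + " et un"
    return word + "-" + _UNITS[unit]

def spell_out_numbers(text: str, use_huitante: bool = True) -> str:
    """Replace standalone one- or two-digit numbers in text with Swiss French words."""
    return re.sub(r"\b\d{1,2}\b",
                  lambda m: swiss_french_number(int(m.group()), use_huitante), text)
```

    For example, `spell_out_numbers("Il a 75 ans")` yields "Il a septante-cinq ans". Larger numbers are left untouched because the pattern only matches one or two digits standing alone.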

    1 vote
    0 comments  ·  Text to Speech
  6. en-GB neural voice Mia pronounces the number 4 too quickly

    The en-GB Mia neural voice pronounces the number 4 too quickly in sentences, whether it is written as a digit or as a word. I'm able to work around this by adjusting the prosody rate of just that number :)
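    The workaround described above can be sketched as a small pre-processor that wraps every standalone "4" or "four" in an SSML `<prosody>` element. The rate value `slow` and the function name are assumptions to be tuned by ear. The input is plain text, which is XML-escaped first; the returned fragment still has to be placed inside the usual `<speak>` and `<voice>` elements.

```python
import re
from xml.sax.saxutils import escape

# The digit 4 or the word "four" as a whole token, case-insensitive;
# "42" or "4th" are not matched because no word boundary follows the 4.
_FOUR = re.compile(r"\b(4|four)\b", re.IGNORECASE)

def slow_down_four(text: str, rate: str = "slow") -> str:
    """Return an SSML fragment in which each standalone 4/four is spoken at `rate`."""
    escaped = escape(text)
    return _FOUR.sub(lambda m: f'<prosody rate="{rate}">{m.group(0)}</prosody>', escaped)
```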

    2 votes
    1 comment  ·  Text to Speech
  7. Confidence score on word level

    The lack of a word-level confidence score is a showstopper for my company's project. It would be extremely useful for us to have the confidence score included in the "Words" list, which consists of the words and their timestamps.

    According to this answer: https://social.msdn.microsoft.com/Forums/en-US/4979ca92-aa0f-4d09-b010-fc2eeb1bde80/speech-results-confidence-score-on-word-level?forum=AzureCognitiveService#8ae67445-4e23-49ea-b694-a8d877dc2dd0
    the feature is not public, so we suspect it could be provided quickly.

    I'd be grateful for each vote for this idea!
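    To make the request concrete, the sketch below parses a word list into timing records with an optional confidence. The sample JSON is hypothetical and only mirrors the general shape of a detailed recognition result as we understand it: an `NBest` list whose entries carry a `Words` list with `Word`, `Offset` and `Duration`, the latter two assumed to be in 100-nanosecond ticks. A per-word `Confidence` key is the feature being requested, so the code treats it as optional rather than assuming it exists.

```python
import json
from dataclasses import dataclass
from typing import Optional

TICKS_PER_SECOND = 10_000_000  # assumed: offsets and durations in 100-ns ticks

@dataclass
class WordTiming:
    word: str
    start_s: float
    duration_s: float
    confidence: Optional[float]  # None while the service returns no per-word score

def parse_words(detailed_json: str) -> list[WordTiming]:
    """Extract word timings, plus confidence if present, from the top NBest entry."""
    best = json.loads(detailed_json)["NBest"][0]
    return [
        WordTiming(
            word=w["Word"],
            start_s=w["Offset"] / TICKS_PER_SECOND,
            duration_s=w["Duration"] / TICKS_PER_SECOND,
            confidence=w.get("Confidence"),
        )
        for w in best.get("Words", [])
    ]

# Hypothetical sample: the first word carries the requested field, the second does not.
SAMPLE = json.dumps({
    "NBest": [{
        "Confidence": 0.93,
        "Display": "Hello world.",
        "Words": [
            {"Word": "hello", "Offset": 5_000_000, "Duration": 4_000_000, "Confidence": 0.97},
            {"Word": "world", "Offset": 9_000_000, "Duration": 5_000_000},
        ],
    }]
})
```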

    18 votes
    10 comments  ·  Speech to Text
  8. Retrain a previously trained model on custom speech portal

    I previously trained a custom speech model and am trying to retrain it, but I don't see an option to do so. The portal only offers to train the baseline model.

    3 votes
    1 comment  ·  Custom Speech
  9. If sounds could be identified as characters, developers working in other languages could map them to the corresponding words.

    2 votes
    1 comment  ·  Custom Speech
  10. BUG - The TTS engine (in English) does not correctly pronounce numbers that follow words: COVID-19, NASDAQ 100, S&P 500, NIKKEI 225

    This is a bug, not a feature request. "covid-19" sounds like "covid 19". The same happens with NASDAQ 100, S&P500, etc.
    This affects the neural voices.

    2 votes
    0 comments  ·  Text to Speech
  11. Improve workflow for Intent recognition training

    I used the following workflow to train my intent recognition:
    1) Edit a series of entities, features, and patterns
    2) Prepare a series of example inputs for training
    3) Mark the entities in all of the samples
    4) Train on the examples
    5) Run a series of batch test cases

    Issues I recommend improving:
    - The batch test cases require character positions, startPos and endPos. I had no option other than counting these manually, which is error-prone.
    - When loading the batch test cases, the feedback / error log is hard to find and…
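    Counting character positions by hand can be avoided with a small helper that locates each entity in the utterance. The sketch below assumes the batch-test convention implied above: a zero-based `startPos` and an inclusive `endPos` (the index of the entity's last character). That convention, the `text`/`intent`/`entities` field names, and the helper names are assumptions to check against the schema your tool expects.

```python
import json

def entity_span(text: str, entity_text: str, occurrence: int = 1) -> tuple[int, int]:
    """Return (startPos, endPos) of the n-th occurrence of entity_text; endPos inclusive."""
    start = -1
    for _ in range(occurrence):
        start = text.find(entity_text, start + 1)
        if start == -1:
            raise ValueError(f"{entity_text!r} occurrence {occurrence} not found in {text!r}")
    return start, start + len(entity_text) - 1

def batch_case(text: str, intent: str, entities: dict[str, str]) -> dict:
    """Build one batch test case; entities maps entity name -> text (first occurrence)."""
    spans = []
    for name, value in entities.items():
        start, end = entity_span(text, value)
        spans.append({"entity": name, "startPos": start, "endPos": end})
    return {"text": text, "intent": intent, "entities": spans}

if __name__ == "__main__":
    case = batch_case("book a flight to seattle", "BookFlight", {"Destination": "seattle"})
    print(json.dumps(case, indent=2))
```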

    1 vote
    0 comments  ·  Speech to Text
  12. Have scripts ready for voice input when creating a custom voice.

    When creating a new voice, provide ready-made scripts for the user to read. As the computer recognizes the user's voice, the audio is recorded and synthesized into a custom voice for that user.

    2 votes
    0 comments  ·  Custom Voice
  13. ccextractor

    Hello,

    May I recommend using the closed-caption text extractor tool "ccextractor" to compare the results of your Speech-to-Text service.

    The url is:

    https://www.ccextractor.org/

    Thank you.

    Regards,
    William Johnson

    1 vote
    0 comments  ·  Speech to Text
  14. Support for specifying an External ID when creating a batch transcription request, which will be part of the response of the Web Hook

    It would be nice if you could specify an "external ID" when you create a batch transcription request, and if that external ID were also returned in the webhook callback.

    Why? To be able to link a request to the ID of a running process/workflow, for example a durable function. The durable function (using an orchestration) looks like this:
    1. The durable function sends a message to the Speech to Text service (STTS) to create a transcription.
    2. Durable function makes a call to context.WaitForExternalEvent<string>("TranscriptionCompleted");
    3. At some point in time the STTS is finished and calls…
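    Until such a field exists, one client-side workaround is to keep your own mapping from the transcription ID obtained when the request is created to the orchestration instance ID, and look it up when the webhook fires. The sketch below is hypothetical: it assumes the callback lets you identify the transcription, the class and method names are ours, the store is in-memory only (a durable store would be needed across restarts), and the function that raises the `TranscriptionCompleted` event is passed in as a callback rather than tied to any specific Azure API.

```python
from typing import Callable, Dict, Optional

class TranscriptionCorrelator:
    """Hypothetical in-memory map: transcription ID -> orchestration instance ID."""

    def __init__(self) -> None:
        self._instances: Dict[str, str] = {}

    def register(self, transcription_id: str, instance_id: str) -> None:
        # Call right after the batch transcription request has been created.
        self._instances[transcription_id] = instance_id

    def on_webhook(self, transcription_id: str,
                   raise_event: Callable[[str, str, str], None]) -> Optional[str]:
        """Resume the waiting orchestration; return its instance ID, or None if unknown."""
        instance_id = self._instances.pop(transcription_id, None)
        if instance_id is not None:
            # raise_event(instance_id, event_name, payload) stands in for whatever
            # mechanism delivers the external event to the orchestration.
            raise_event(instance_id, "TranscriptionCompleted", transcription_id)
        return instance_id
```

    Popping the entry on first delivery means a repeated callback for the same transcription is ignored instead of raising the event twice.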

    1 vote
    0 comments  ·  Speech to Text
  15. How to allow VS Code to access the microphone on macOS

    macOS versions such as Catalina require permission to access the microphone through the Privacy settings. I resolved the issue by adding Terminal to the list, and I can now run the SDK sample code successfully in Terminal and Jupyter Notebook. But how can I allow VS Code to access the microphone when running this code?

    1 vote
    0 comments  ·  Speech to Text
  16. 1 vote
    1 comment  ·  Speech to Text
  17. Does TTS support the Speech SDK when using containers?

    Hi, can we use the Speech SDK to access the TTS service in a container?
    Why, when using containers, does STT support only the SDK while TTS supports only the REST API?

    In our tests, the REST API seems slower than the SDK. Why? Thanks.

    1 vote
    0 comments  ·  Text to Speech
  18. Mbps is read as Megabytes per second instead of Megabits per second

    MBps and Mbps are very different things. MBps is 8 times larger than Mbps, so Aria needs to know the difference between the two.
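    As a stopgap, the text can be normalized before synthesis so the two units are spelled out. The sketch below is a hypothetical pre-processor with our own function names: it matches the units case-sensitively, which is what keeps `Mbps` and `MBps` apart, and also handles the `Mb/s` and `MB/s` spellings. The conversion helper just encodes the factor of 8 mentioned above.

```python
import re

# Case-sensitive on purpose: lower-case "b" means bits, upper-case "B" means bytes.
_RATE_UNITS = {
    "Mbps": "megabits per second",
    "Mb/s": "megabits per second",
    "MBps": "megabytes per second",
    "MB/s": "megabytes per second",
}
# A unit directly after a number, with or without a space ("2Mb/s", "100 Mbps").
_RATE_RE = re.compile(r"(?<=\d)\s*(Mbps|Mb/s|MBps|MB/s)\b")

def expand_rate_units(text: str) -> str:
    """Spell out Mbps/MBps-style units so the voice cannot confuse bits and bytes."""
    return _RATE_RE.sub(lambda m: " " + _RATE_UNITS[m.group(1)], text)

def megabytes_to_megabits(rate_mbytes_per_s: float) -> float:
    """1 MBps = 8 Mbps, because a byte is 8 bits."""
    return rate_mbytes_per_s * 8
```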

    1 vote
    0 comments  ·  Text to Speech
  19. This chapter has many issues with Aria Neural TTS

    This textbook (Computer Security Handbook by Seymour Bosworth et al., Chapter 33) seems to cause numerous errors with the Aria Neural voice. (The attached TXT and PDF were trimmed to respect the author's copyright.)

    It garbles the section markers: "33.1.1" is read as "January First, Thirty Three", and likewise for all the other section markers.

    802.11 is pronounced "eight hundred and two point one one".

    SSIDs is pronounced "sids".

    BSSIDs is pronounced "bsids".

    2Mb/s (like other Mb/s values) is pronounced "two em bee slash ess", but it should be pronounced "two megabits per second".

    LAN is…
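    For the section-marker problem, one possible workaround is to wrap markers such as "33.1.1" in an SSML `<sub>` element so they are not interpreted as dates. The sketch below is an assumption on several fronts: the alias wording "33 point 1 point 1" is our own choice, only numbers with three or more dot-separated parts are touched (so "3.14" and "802.11" are left alone), and IP addresses would be matched too.

```python
import re
from xml.sax.saxutils import escape

# Three or more dot-separated numbers, e.g. "33.1.1".
_SECTION_RE = re.compile(r"\b\d+(?:\.\d+){2,}\b")

def mark_section_numbers(text: str) -> str:
    """XML-escape text and wrap section markers in <sub alias="..."> elements."""
    def to_sub(m):
        marker = m.group(0)
        alias = " point ".join(marker.split("."))
        return f'<sub alias="{alias}">{marker}</sub>'
    return _SECTION_RE.sub(to_sub, escape(text))
```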

    1 vote
    0 comments  ·  Text to Speech
  20. 802.11 is mispronounced

    802.11 as a wireless standard should be pronounced "eight-o-two-eleven" instead of "eight hundred and two point one one"
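    A client-side workaround is to substitute the desired reading with an SSML `<sub alias="...">` element. The sketch below uses a hypothetical alias table and function name; only the "802.11" reading comes from this request. Terms are matched longest-first, and the lookarounds stop "802.11" from matching inside a longer token such as "802.11ac".

```python
import re
from xml.sax.saxutils import escape, quoteattr

# Hypothetical alias table; extend with other terms the voice misreads.
ALIASES = {"802.11": "eight-o-two-eleven"}

def apply_aliases(text: str, aliases: dict[str, str] = ALIASES) -> str:
    """XML-escape text and wrap each known term in an SSML <sub alias="..."> element."""
    escaped = escape(text)
    if not aliases:
        return escaped
    lookup = {escape(term): alias for term, alias in aliases.items()}
    alternation = "|".join(re.escape(t) for t in sorted(lookup, key=len, reverse=True))
    pattern = re.compile(rf"(?<![\w.])(?:{alternation})(?!\w)")
    return pattern.sub(
        lambda m: f"<sub alias={quoteattr(lookup[m.group(0)])}>{m.group(0)}</sub>", escaped
    )
```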

    1 vote
    0 comments  ·  Text to Speech