Microsoft

Speech Service

  1. Any CPU build

    Currently the .NET project has to be built for either x64 or x86; Any CPU is not supported. Please change this.

    1 vote
    0 comments  ·  Speech to Text
  2. Don't pronounce initials as abbreviations in Dutch

    Various initials are mispronounced as abbreviations in Dutch by text-to-speech. Since Teams uses this service for the voicemail messages, this is very annoying. These are a few examples:

    b. - pronounced as bruto
    c. - pronounced as cent
    e. - pronounced as edidiet
    f. - pronounced as forte
    H. - pronounced as heilige
    i. - pronounced as in
    l. - pronounced as loco
    p. - pronounced as piano
    t. - pronounced as temperatuur
    H.M. - pronounced as Hare Majesteit (Her Majesty)
    V.J. - pronounced as vorig jaar (last year)
    G.B. - pronounced as Gedaan en bieden
    B.G. - pronounced as Bovengenoemde

    We would like…

    3 votes
    0 comments  ·  Text to Speech
  3. Integration in Microsoft Teams

    Integration in Microsoft Teams.

    1 vote
    0 comments  ·  Speech Translation
  4. 1 vote
    0 comments  ·  Custom Voice
  5. 0 votes
    0 comments  ·  Custom Voice
  6. Neural voices in Northern Europe

    Make neural voices available in the Northern Europe region.
    We want to use the Norwegian Neural voice in our product (based in Norway, naturally) but are for some reason locked out of this feature.

    6 votes
    0 comments  ·  Text to Speech
  7. REST API to accept authorization token for RecordingsUrl

    Hi, my company has been using Azure's speech to text service and is happy with the results.

    However, we've hit a snag. As I understand it, the RecordingsUrl parameter expects a public-facing blob URI (no auth required), but our audio files are stored in a third-party service that requires a bearer token in the request header to access the .wav file (request header: { Authorization: Bearer <token> }).

    For the speech to text REST API, would it be possible to have an additional parameter (e.g. RecordingsUrlToken)…
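
To make the request concrete, here is a minimal Python sketch of the header the service would have to send when fetching the audio file. The URL and token values are made up for illustration, and RecordingsUrlToken is only the name proposed in this post, not an existing parameter of the API.

```python
import urllib.request


def build_audio_request(recording_url: str, token: str) -> urllib.request.Request:
    """Build a GET request for an audio file that sits behind bearer-token auth."""
    return urllib.request.Request(
        recording_url,
        headers={"Authorization": f"Bearer {token}"},
        method="GET",
    )


# Hypothetical values, for illustration only. No network call is made here.
request = build_audio_request("https://example.com/audio/call-001.wav", "my-secret-token")
print(request.get_header("Authorization"))
```

The request is only constructed, never sent; passing it to urllib.request.urlopen would perform the actual download.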

    1 vote
    0 comments  ·  Speech to Text
  8. Custom Voice Portal does not have an option to add tests

    The portal says "Add a test", but there is no option to do so. Screenshot attached.

    1 vote
    0 comments  ·  Custom Voice
  9. Improve the reading of numbers for Switzerland

    Hello,
    The fr-CH voice, French (Switzerland), Male, "fr-CH-Guillaume", does not read numbers the way we do in Switzerland; it reads them as in France.
    70 should be read as "septante" and not "soixante-", and 90 should be read as "nonante" and not "quatre-vingt-". In addition, depending on the region, 80 is read as "huitante". This form, which matches the majority of other languages, should be the norm.
    Best regards,
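
As an illustration of the expected readings, the following Python sketch spells out the numbers 0 to 99 using the Swiss forms named in this post. It is an assumption-laden example, not the service's normalisation logic; "huitante" is regional, as noted above, so it is exposed as a parameter.

```python
UNITS = ["zéro", "un", "deux", "trois", "quatre", "cinq", "six", "sept", "huit",
         "neuf", "dix", "onze", "douze", "treize", "quatorze", "quinze", "seize"]


def swiss_french(n: int, eighty: str = "huitante") -> str:
    """Spell out 0..99 with Swiss tens words (septante, huitante, nonante)."""
    if not 0 <= n <= 99:
        raise ValueError("only 0..99 is handled in this sketch")
    if n < 17:
        return UNITS[n]
    if n < 20:
        return "dix-" + UNITS[n - 10]  # dix-sept, dix-huit, dix-neuf
    tens_words = {2: "vingt", 3: "trente", 4: "quarante", 5: "cinquante",
                  6: "soixante", 7: "septante", 8: eighty, 9: "nonante"}
    tens, unit = divmod(n, 10)
    word = tens_words[tens]
    if unit == 0:
        return word
    if unit == 1:
        return word + " et un"  # vingt et un, septante et un, ...
    return word + "-" + UNITS[unit]


for value in (70, 71, 80, 90, 99):
    print(value, swiss_french(value))
```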

    1 vote
    0 comments  ·  Text to Speech
  10. Improve workflow for Intent recognition training

    I used the following workflow for training my intent recognition:
    1) I edit a set of entities, features and patterns
    2) I prepare a set of example inputs for training
    3) I mark the entities in all the samples
    4) I train on the examples
    5) I execute a series of batch test cases

    Issues I recommend improving:
    - The test cases for batch testing require the character positions startPos and endPos. I had no option other than counting these manually, which is error-prone.
    - When loading the batch test cases, the feedback / error log is hard to find and…
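
Until the tooling computes them, the positions can be derived with a few lines of script instead of being counted by hand. The Python sketch below assumes that endPos is the index of the entity's last character (inclusive); if the batch format expects an exclusive end, pass inclusive_end=False. The utterance, intent and entity names are made up for illustration.

```python
from typing import Tuple


def entity_positions(text: str, entity_text: str, inclusive_end: bool = True) -> Tuple[int, int]:
    """Return (startPos, endPos) of the first occurrence of entity_text in text."""
    start = text.find(entity_text)
    if start == -1:
        raise ValueError(f"{entity_text!r} does not occur in {text!r}")
    end = start + len(entity_text)
    return start, (end - 1 if inclusive_end else end)


utterance = "Book a flight to Seattle tomorrow"
start_pos, end_pos = entity_positions(utterance, "Seattle")

# Hypothetical batch test case built from the computed positions.
test_case = {
    "text": utterance,
    "intent": "BookFlight",
    "entities": [{"entity": "Destination", "startPos": start_pos, "endPos": end_pos}],
}
print(test_case)
```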

    1 vote
    0 comments  ·  Speech to Text
  11. ccextractor

    Hello,

    May I recommend that you use the closed-caption text extractor tool "ccextractor" to compare against the results of your Speech-to-Text service.

    The url is:

    https://www.ccextractor.org/

    Thank you.

    Regards,
    William Johnson

    1 vote
    0 comments  ·  Speech to Text
  12. Support for specifying an External ID when creating a batch transcription request, which will be part of the response of the Web Hook

    It would be nice to be able to specify an “external ID” when creating a batch transcription request, and to have that external ID returned in the response of the web hook callback.

    Why? To be able to link a request to the ID of a running process/workflow, for example in a durable function. The durable function (using an orchestration) looks like:
    1. The durable function sends a message to the speech to text service (STTS) to create a transcription.
    2. The durable function makes a call to context.WaitForExternalEvent<string>("TranscriptionCompleted");
    3. At some point in time the STTS is finished and calls…
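
The correlation being asked for can be sketched in a few lines of Python. Everything here is hypothetical: the externalId field does not exist in the web hook payload today, and the in-memory dictionary merely stands in for whatever the workflow engine uses to track waiting instances.

```python
from typing import Dict, Optional

# Maps the caller-chosen external ID to the workflow instance waiting on it.
pending_instances: Dict[str, str] = {}


def register_request(external_id: str, instance_id: str) -> None:
    """Remember which workflow instance submitted the transcription request."""
    pending_instances[external_id] = instance_id


def handle_webhook(payload: dict) -> Optional[str]:
    """Return the instance to resume, or None if the external ID is unknown."""
    external_id = payload.get("externalId")  # hypothetical field name
    if external_id is None:
        return None
    return pending_instances.pop(external_id, None)


register_request("order-42", "orchestration-abc")
resumed = handle_webhook({"externalId": "order-42", "status": "Succeeded"})
print(resumed)
```

With the returned instance ID, the web hook receiver would know which orchestration to raise the "TranscriptionCompleted" event on.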

    1 vote
    0 comments  ·  Speech to Text
  13. How to allow VS Code to access the microphone on macOS

    macOS versions such as Catalina ask for permission to access the microphone under the Privacy settings. I resolved the issue for Terminal by adding it to the list, and I can now run the SDK sample code in Terminal and in Jupyter Notebook successfully. But how can I allow VS Code to access the microphone so that the same code runs normally there?

    1 vote
    0 comments  ·  Speech to Text
  14. Does TTS support the Speech SDK when using containers?

    Hi, can we use the Speech SDK to access the TTS service in a container?
    Why, when using containers, does STT only support the SDK while TTS only supports the REST API?

    In our tests, the REST API seems slower than the SDK. Why? Thanks.

    1 vote
    0 comments  ·  Text to Speech
  15. Mbps is read as Megabytes per second instead of Megabits per second

    MBps and Mbps are very different things. MBps is 8 times larger than Mbps, so Aria needs to know the difference between the two.
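
The distinction comes down to the case of the "b". As a minimal sketch of a case-sensitive expansion (an illustration only, not how the voice normalises text), in Python:

```python
import re

# Lowercase "b" means bits, uppercase "B" means bytes.
UNIT_WORDS = {"Mbps": "megabits per second", "MBps": "megabytes per second"}
_UNIT_PATTERN = re.compile(r"(?<![A-Za-z])(Mbps|MBps)(?![A-Za-z])")


def expand_rate_units(text: str) -> str:
    """Replace Mbps/MBps with their spoken forms, respecting letter case."""
    return _UNIT_PATTERN.sub(lambda match: UNIT_WORDS[match.group(1)], text)


def megabits_to_megabytes(mbps: float) -> float:
    """One byte is eight bits, so 8 Mbps equals 1 MBps."""
    return mbps / 8


print(expand_rate_units("The link runs at 100 Mbps, about 12.5 MBps."))
print(megabits_to_megabytes(100))
```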

    1 vote
    0 comments  ·  Text to Speech
  16. This chapter has many issues with Aria Neural TTS

    This textbook (Computer Security Handbook by Seymour Bosworth et al., Chapter 33) seems to cause countless errors using the Aria Neural voice. (attached txt and pdf were trimmed to respect the copyright of the author)

    It messes up the chapter markers, saying "January First, Thirty Three" where the text reads "33.1.1" (and likewise for all the other section markers).

    802.11 is pronounced "eight hundred and two point one one"

    SSIDs is pronounced "sids"

    BSSIDs is pronounced "bsids"

    2Mb/s (as well as other Mb/s values) is pronounced "two em bee slash ess"; it should be pronounced "2 megabits per second".

    LAN is…

    1 vote
    0 comments  ·  Text to Speech
  17. 802.11 is mispronounced

    802.11 as a wireless standard should be pronounced "eight-o-two-eleven" instead of "eight hundred and two point one one"
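
Until the default reading changes, one possible workaround is the standard SSML sub element, which supplies a spoken alias for a written form. The Python sketch below only builds the markup; whether a given voice then reads the alias exactly as intended is an assumption that has not been verified here, and the fragment still has to be placed inside the usual speak and voice elements.

```python
from xml.sax.saxutils import escape, quoteattr


def with_spoken_alias(text: str, written: str, spoken: str) -> str:
    """Wrap every occurrence of `written` in an SSML <sub alias="..."> element."""
    substitution = f"<sub alias={quoteattr(spoken)}>{escape(written)}</sub>"
    # Escape the surrounding text so characters like & and < stay valid XML.
    return substitution.join(escape(part) for part in text.split(written))


fragment = with_spoken_alias(
    "802.11 is a wireless standard", "802.11", "eight-o-two-eleven"
)
print(fragment)
```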

    1 vote
    0 comments  ·  Text to Speech
  18. Japanese sentence is mispronounced

    The sentence このレストランではタバコを吸ってはいけません。 ("You must not smoke in this restaurant.") is mispronounced. The はいけません part should be pronounced "waikemasen", not "haikemasen". This is with the Nanami Neural voice.

    1 vote
    0 comments  ·  Text to Speech
  19. I would like the film in German; it is currently in Polish

    The film in German, which is currently in Polish.

    1 vote
    0 comments  ·  Speech to Text
  20. No SSML restrictions on creating TTS audio tuning files

    Please remove the restriction of a single piece of SSML per SSML file, so that multiple tunings (breaks, pronunciation, intonation, etc.) can be applied in Audio Content Creation.
    Source: "Improve synthesis with the Audio Content Creation tool" > "Create an audio tuning file":
    "SSML restrictions: Each SSML file can only contain a single piece of SSML."
    (https://docs.microsoft.com/en-US/azure/cognitive-services/speech-service/how-to-audio-content-creation)

    1 vote
    0 comments  ·  Text to Speech