Currently, Azure Media Indexer (the backend Speech-to-Text processor of Video Indexer) can output confidence data in the VTT file line by line. Could you please merge the confidence data into the output VTT file from Video Indexer?
1 vote
I uploaded a video and it's stuck at 65%. Can anyone help here?
1 vote
Please refer to our support via visupport[at]microsoft[dot]com
It would be fantastic to be able to bulk upload and then bulk download selected video clips in the web portal.
I am looking at using Video Indexer as a library for my entire video collection (3 TB), and without this functionality it would take forever!
If there is more than one person speaking at the same time, Video Indexer only resolves the transcription for one (random) speaker. Ideally, it should detect multiple simultaneous speakers and return what each one is saying.
4 votes
For the transcript (audio, OCR), how can I query a summary of the confidence level for each? Same question for content model customization of Brand and Language. Lastly, how can I determine which group of language model text files was most pertinent? I wish to automate a workflow where I 1) submit language model txt file samples, 2) submit sample video files, 3) query the confidence level and, if it is lower than a threshold, 4) submit different language model txt file samples, and so on until the confidence level is acceptable.
2 votes
The confidence level is provided in the insights JSON as part of the attributes.
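To automate the threshold check described above, the per-line confidence values can be aggregated from the insights JSON. This is a minimal sketch; the exact field layout (`videos` → `insights` → `transcript` with a per-line `confidence` attribute) is an assumption based on the public documentation, so adjust the paths to match the JSON your account actually returns.

```python
import json

def average_transcript_confidence(insights_json: str) -> float:
    """Average the per-line confidence values in an insights JSON string."""
    data = json.loads(insights_json)
    confidences = [
        line["confidence"]
        for video in data.get("videos", [])
        for line in video.get("insights", {}).get("transcript", [])
        if "confidence" in line
    ]
    return sum(confidences) / len(confidences) if confidences else 0.0

# Hypothetical sample payload mimicking the assumed insights layout.
sample = json.dumps({
    "videos": [{"insights": {"transcript": [
        {"text": "hello world", "confidence": 0.92},
        {"text": "goodbye", "confidence": 0.78},
    ]}}]
})
print(average_transcript_confidence(sample))  # 0.85
```

Comparing this average against a threshold would drive step 3 of the workflow above.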
Polyglotism: via the API/Portal, please consider supporting multiple languages within one source video or audio input. For example, within one video file the recognizable text and speech are in English and Spanish; in another, French and Italian. Thanks. Today, I run the video twice and semi-automatically merge the results, discarding the mistakes.
2 votes
I’m happy to report that the feature is now available in public preview.
You can read more about it here: https://docs.microsoft.com/en-gb/azure/media-services/video-indexer/multi-language-identification-transcription
Add technical video metadata to the breakdown, e.g. frame rate, format, codecs, per-track information, exact duration, resolution, aspect ratio, bitrate, start timecode, etc.
3 votes
A way to train the face recognition with "known data" (e.g. images of people, as in the Face API).
This could be something like a corporate image database that can be referenced, or the images could be uploaded.
Also, a way to export the face training data would be nice.
16 votes
Done! See https://azure.microsoft.com/en-us/blog/people-recognition-enhancements-video-indexer/ for details.
Hi, please add a callbackUrl parameter to the Create Linguistic Model API, like the one in the POST (upload) API, so that re-indexing breakdowns can be easier. Since waitUntilReady is a true/false waiting state, providing a callback would let us easily re-index breakdowns.
7 votes
Speech starts in the middle of the video, and the transcript before that is missing. Is this a known issue?
1 vote
No. Please contact visupport[at]microsoft[dot]com for support.
In Insights, you're able to add names to people who have been identified in videos.
You should also be able to associate people with speakers within the transcript instead of displaying "Speaker #1" or "Speaker #2".
This should also work for audio-only content.
4 votes
Hi, does the Video Indexer result include face emotion recognition details, such as Happy, Surprise, etc.? I do not see any such attributes in the JSON result, but the documentation says in general that emotion details are analyzed. Please clarify.
2 votes
Thank you for reaching out.
Emotions are identified based on speech and audio cues, and not based on face expressions.
The emotion could be: joy, sadness, anger, or fear.
You can see an example of how it looks in the JSON here:
Ability to export the transcript to a Word doc or plain-text format without the timestamps.
1 vote
Export to plain-text and CSV formats is now supported.
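For anyone who needs a timestamp-free transcript from the downloadable VTT captions in the meantime, a minimal sketch along these lines works. It assumes the basic WebVTT layout (header, optional cue numbers, a timing line per cue) and ignores styling or cue settings.

```python
import re

# Matches a WebVTT cue timing line, e.g. "00:00:01.000 --> 00:00:04.000".
TIMING = re.compile(r"^\d{2}:\d{2}:\d{2}\.\d{3} --> ")

def vtt_to_text(vtt: str) -> str:
    """Strip header, cue numbers, and timing lines; join the caption text."""
    kept = []
    for line in vtt.splitlines():
        line = line.strip()
        if not line or line == "WEBVTT" or line.isdigit() or TIMING.match(line):
            continue
        kept.append(line)
    return " ".join(kept)

sample = """WEBVTT

1
00:00:01.000 --> 00:00:04.000
Hello and welcome.

2
00:00:04.500 --> 00:00:07.000
Thanks for watching."""
print(vtt_to_text(sample))  # Hello and welcome. Thanks for watching.
```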
When I alter the transcript, does this improve the learning ability for future uploads? At the moment, the projects I'm working on are complex and the transcript needs a lot of alteration. Or do I need to add a lot of detail to the content model?
0 votes
The ability to learn from edits was recently added, see https://azure.microsoft.com/en-us/blog/azure-media-services-the-latest-video-indexer-updates-from-nab-show-2019/ for details
When more than one user is accessing the account, there currently doesn't seem to be a way for the second user to save an edited transcript, even when working on a different video source (the save button doesn't exist, only "reload source").
0 votes
Currently, we get a breakdown from Video Indexer with transcript blocks that contain whole sentences or phrases 5-10 seconds long. We need more granular speech-to-text, with the time at which each word is spoken. Media Analytics in Media Services already does that, but it would be useful to have it in the Video Indexer API as well.
1 vote
Hi, I have been trying to upload videos to the indexer, but it keeps hanging at 95% and/or restarting at 0%. Not one of my list of at least 9 videos could be uploaded.
How should I go about this?
*The videos are about 2 hours long.
1 vote
The ability to train the API to detect scenes and objects (faces, logos) according to my needs.
5 votes
I'd like to be able to link to a timecode position in a video, where 123 is the timecode.
1 vote
It is supported, exactly as in your idea.
If you experience an issue with it, please contact our support: visupport[at]microsoft[dot]com