Form Recognizer
-
Recognize Checkbox and Signature sections
We have a number of paper forms that are completed by staff. The forms include checkboxes. We need to confirm that the checkboxes have been checked.
Additionally, we require the form to be signed by the person completing it. We'd like to confirm that a mark has been made in the signature area.
It would be great if there were a way for Form Recognizer to just register that a pen mark has been made in a certain area, even if it can't read it (e.g. a signature or a tick in a box).
37 votes
Checkbox extraction is available in the Form Recognizer 2.1 release.
-
How to label a table in a form?
On the official Azure Form Recognizer website "https://azure.microsoft.com/en-in/services/cognitive-services/form-recognizer/", a few examples have tables in the sample file. The sample JSON output also has an attribute called table. Please guide us in labeling a table in a custom layout form.
30 votes
Form Recognizer discovers and extracts tables automatically. Table results are part of the pageResults section in the JSON output. If the table in the form was not discovered, you can label tables as values by labeling each table cell and training with the maximum number of rows in the tables. Form Recognizer does not yet support labeling tables as tables.
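As a rough illustration of reading the table results mentioned in the reply above, the following minimal sketch rebuilds each table as a grid of cell text. It assumes the response nests tables under `analyzeResult.pageResults[].tables[]` and that each cell carries `rowIndex`, `columnIndex`, and `text`, with `rows` and `columns` on the table; check these names against your own output. The helper name `tables_from_analyze_result` and the sample dictionary are hypothetical.

```python
from typing import Any, Dict, List


def tables_from_analyze_result(result: Dict[str, Any]) -> List[List[List[str]]]:
    """Rebuild every discovered table as a row-major grid of cell text.

    Assumes the (unverified here) layout analyzeResult -> pageResults -> tables -> cells.
    """
    grids: List[List[List[str]]] = []
    for page in result.get("analyzeResult", {}).get("pageResults", []):
        for table in page.get("tables", []):
            # Start with an empty grid so missing cells stay as empty strings.
            grid = [["" for _ in range(table["columns"])] for _ in range(table["rows"])]
            for cell in table.get("cells", []):
                grid[cell["rowIndex"]][cell["columnIndex"]] = cell["text"]
            grids.append(grid)
    return grids


# Hand-made sample for illustration only; not real service output.
sample = {
    "analyzeResult": {
        "pageResults": [
            {
                "page": 1,
                "tables": [
                    {
                        "rows": 2,
                        "columns": 2,
                        "cells": [
                            {"rowIndex": 0, "columnIndex": 0, "text": "Item"},
                            {"rowIndex": 0, "columnIndex": 1, "text": "Qty"},
                            {"rowIndex": 1, "columnIndex": 0, "text": "Pen"},
                            {"rowIndex": 1, "columnIndex": 1, "text": "2"},
                        ],
                    }
                ],
            }
        ]
    }
}

for grid in tables_from_analyze_result(sample):
    for row in grid:
        print(" | ".join(row))
```

If a table was not discovered at all, this returns nothing for it, which is the case where the reply suggests labeling each cell as a value instead.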
-
Manual Training
Make it trainable (maybe with a UI) so users can manually point to or map the fields they need. We have tons of different invoice templates, but we basically need only the main fields on the invoice recognized. Currently we have no way of pointing out what we need, and with the limit of 50 pages / 4 MB per model we feel very limited.
Maybe a way of saying Invoice #, or Invoice No. or Inv# should be mapped to Invoice Number for all of the results for this model; this would be of much help when extracting data from different Invoice…
29 votes
Form Recognizer v2.0 (preview) enables labeling forms and training a model to extract the values of interest. Train a Form Recognizer model with labels using the sample labeling tool.
https://docs.microsoft.com/en-us/azure/cognitive-services/form-recognizer/quickstarts/label-tool
-
When will Form Recognizer support complex forms?
Forms often have complex structures, and that is where automated services are most needed. I have forms that contain merged cells and nested tables. The Form Recognizer documentation states that such forms are not supported. Is Microsoft planning to make Form Recognizer capable of handling such forms as well? If yes, is there an expected time frame?
17 votes -
Form with tables, checkboxes, and a whole lot of square boxes
I have trained my Form Recognizer model following all the requirements and tips provided in the documentation. However, the output is not good at all.
The form that I am trying to train on is attached (empty version). I have also used 2 filled-in forms, as per the requirements.
The form has two small tables at the top, and even these haven't been read correctly. Then there are a whole lot of square boxes (each representing one character) for filling in details of the users. The boundaries of these square boxes are being read either as 1 or as a dash…
12 votes -
Auto-recognise which custom model to use
Hi
Would it be possible to auto-detect which custom model to use? In a normal invoicing flow you would have one big pile of different invoices and send them off to be recognized.
It would then sort them into 2 piles: one where a model was found and one without.
Do you understand what I mean by that?
11 votes
With the v2.1 preview announced last week, we have added support for “Model Compose” to support this functionality
https://docs.microsoft.com/en-us/azure/cognitive-services/form-recognizer/whats-new#august-2020 -
Add the ability to train a model with a blob larger than 4 MB
I am trying to use Microsoft Form Recognizer to get the key-value pairs from medical forms. However, there are a number of different types of form, and I cannot currently train an accurate enough model within the current size limit.
10 votes
Form Recognizer v2.0 (preview) enables training on large data sets and analyzing large documents.
-
What's the best way to train a table without lines? (See Pic)
I know Form Recognizer does not yet support labeling tables as tables. Suggestions please.
Please see pic1 for reference. The number of line items may vary widely.
Also, what can we do if some text is not recognized by the OCR labeling tool (see pic1)?
8 votes -
Add lasso selection functionality to Form Labeling tool
It would be really nice to have a "Lasso" selection functionality in the form labeling tool. I know I can hold the left mouse button down to highlight multiple words, but there are some areas of forms (e.g. remarks) where several hundred words have to be selected. They are usually rectangular, so a lasso selection would work perfectly.
8 votes -
Reinforcement learning
Whenever something is wrong, I want the model to learn from it.
8 votes -
Cognitive Service Container with V2.0 Support
The containers provided use the V1.0 API, with all its limitations (no Layout API, 4 MB dataset for training, ...).
When will the containers be updated for the V2.0 API? Thx
8 votes -
Increase limit of number of samples for custom training
We understand that the number of samples for custom training is currently limited to 50. We hope that by providing more samples, the model would be more accurate in the end on large volumes, so would it be possible to raise that limit?
8 votes
Thank you for your request, this feature has been implemented: “The total size of the training data set must be 500 pages or less.”
https://docs.microsoft.com/en-us/azure/cognitive-services/form-recognizer/build-training-data-set -
Process multiple documents contained in a single file
I’m working on a project to create custom Machine Learning models for the processing of invoices using Azure Form Recognizer and OCR Form Labeling Tool.
In general it’s working really well. However, I have a situation where there is more than one invoice in the same PDF file. In their current form, the OCR Form Labeling Tool and Azure Form Recognizer don’t handle this situation very well. I’m wondering if there are any tips or guidelines for this situation, or whether this use case will be covered in the future?
Any guidance would be greatly appreciated!!
7 votes -
Labeling tool improvements
Add a regex option to the form tag format. Let users specify the expected result and help FR filter out text that does not belong to the expected value. Validate the result with the regex and adjust the confidence score accordingly.
Modify the recognized field in the training form. Sometimes the recognized value is incorrect and contains additional text. I would like to be able to flag and correct this value so FR would better recognize this type of mistake.
Add versioning to trained models. Models are immutable, but they belong to the same project. Show a list of previously trained models, so users could switch between them…
6 votes -
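Until something like the regex option requested above exists in the tool, a similar check can be approximated on the client side after extraction. The sketch below is only an illustration of that idea: `validate_field`, the `penalty` factor, and the invoice-number pattern are all hypothetical and are not part of Form Recognizer.

```python
import re
from typing import Tuple


def validate_field(text: str, confidence: float, pattern: str,
                   penalty: float = 0.5) -> Tuple[str, float]:
    """Keep only the part of `text` that matches `pattern`.

    If nothing matches, return the text unchanged and scale the confidence
    down by `penalty` (an arbitrary, hypothetical factor).
    """
    match = re.search(pattern, text)
    if match is None:
        return text, confidence * penalty
    # Drop surrounding text that does not belong to the expected value.
    return match.group(0), confidence


# Hypothetical invoice-number format used purely for illustration.
INVOICE_PATTERN = r"INV-\d{5}"

print(validate_field("Invoice No. INV-00123 due", 0.9, INVOICE_PATTERN))
print(validate_field("n/a", 0.8, INVOICE_PATTERN))
```

This only filters and re-scores values after the fact; it does not change what the trained model learns, which is what the request itself asks for.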
Limit processing to first X pages
We should be able to use a setting to analyze only the first X pages of forms.
For example, we have forms related to phone bills, where the first 2 pages have the relevant info and then there are 20+ pages of just call records, which we do not want to analyze.
We should be able to tell the analyze forms function to process only the first X pages, especially in the Power Automate function.
6 votes -
Recognizing form regions
At the moment we don't have features like recognizing checkboxes, radio buttons, signatures, etc.
Can Form Recognizer recognize a region of the form, so that we can cut that part out and pass it on for further processing?
For example, I would like to know where on the document the radio button questions are, so that I know in which area I need to do custom processing.
6 votes
Hi! For radio boxes specifically, we’ve added support for selection marks in our v2.1 preview. For region specification more broadly, this is only in “started” status.
https://docs.microsoft.com/en-us/azure/cognitive-services/form-recognizer/whats-new#august-2020
-
Separate hand written text from printed text
Occasionally, OCR combines printed and handwritten text into a single tag. I would like them to be separate.
For example:
Company Handwritten company name
Printed text is usually the label while the handwritten text is what we are looking for.
6 votes -
Roadmap
Are you able to disclose what the roadmap is for the Form Recognizer product? I'm particularly interested in using the Custom Model and Labelling Tool.
6 votes
Form Recognizer v2.0 (preview) is available and includes the ability to label forms and train a custom model. To get started, follow the Train with labels quickstart – https://docs.microsoft.com/en-us/azure/cognitive-services/form-recognizer/quickstarts/label-tool
See what’s new in Form Recognizer v2.0 (preview) –
https://docs.microsoft.com/en-us/azure/cognitive-services/form-recognizer/whats-new -
What is the best way to train a custom model on 1k different vendor invoices, given the limit of 500 pages per model ID?
What is the best way to train a model on 1k vendor invoices, given the limit of 500 pages?
With the limit mentioned, we can fit at most 100 vendor invoices per model ID. Can you please shed some light on this from the implementation perspective?
5 votes -
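One possible way to work within the 500-page limit, offered only as a sketch and not as an official recommendation, is to split vendors into groups that each fit under the limit, train one model per group, and then combine them (for example with the “Model Compose” feature mentioned in an earlier reply on this page). The function `group_vendors` and the page counts below are hypothetical; the grouping is a simple largest-first greedy pass, not an optimal packing.

```python
from typing import Dict, List


def group_vendors(page_counts: Dict[str, int], limit: int = 500) -> List[List[str]]:
    """Greedily place vendors into groups whose total training pages stay <= limit."""
    groups: List[List[str]] = []
    totals: List[int] = []
    # Largest vendors first, so big training sets claim space before small ones.
    for vendor, pages in sorted(page_counts.items(), key=lambda kv: kv[1], reverse=True):
        if pages > limit:
            raise ValueError(f"{vendor} alone exceeds the {limit}-page limit")
        for i, total in enumerate(totals):
            if total + pages <= limit:
                groups[i].append(vendor)
                totals[i] += pages
                break
        else:
            # No existing group has room, so open a new one.
            groups.append([vendor])
            totals.append(pages)
    return groups


# Hypothetical training-page counts per vendor.
page_counts = {"a": 300, "b": 250, "c": 200, "d": 100}
print(group_vendors(page_counts))
```

Each resulting group would then be trained as its own model; how many models can be combined afterwards depends on service limits not covered here.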
OCR validate text detection
Often, OCR detects handwritten text incorrectly.
For example:
"Bridget Sims, MD" was detected as "Bridge+ Sims, MD"
There should be a way to correct this and enter the correct value, "Bridget Sims, MD", after the OCR has done its work.
Is there a way to do this already?
5 votes