I’m working on a project to create custom Machine Learning models for the processing of invoices using Azure Form Recognizer and OCR Form Labeling Tool.
In general it's working really well. However, I have a situation where I have more than one invoice in the same PDF file. In its current form, the OCR Form Labeling Tool and Azure Form Recognizer don't handle this situation very well. I'm wondering if there are any tips or guidelines for this situation, or will this use case be covered in the future?
Any guidance would be greatly appreciated!
6 votes
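One possible client-side workaround, sketched here under assumptions rather than taken from the product: if the text of each page is already available (for example from an OCR or layout pass) and every invoice starts with a page containing a recognizable marker, the pages can be grouped into per-invoice ranges before the pages are sent for analysis. The function name `split_into_invoices` and the marker text `"invoice no"` are hypothetical.

```python
from typing import List


def split_into_invoices(page_texts: List[str], start_marker: str = "invoice no") -> List[List[int]]:
    """Group 1-based page numbers into runs, one run per invoice.

    Assumption: the first page of every invoice contains `start_marker`
    (case-insensitive). Pages without the marker are treated as continuations.
    """
    groups: List[List[int]] = []
    for page_number, text in enumerate(page_texts, start=1):
        starts_new_invoice = start_marker in text.lower()
        if starts_new_invoice or not groups:
            groups.append([page_number])
        else:
            groups[-1].append(page_number)
    return groups


# Hypothetical per-page text for a 5-page PDF holding three invoices.
pages = [
    "INVOICE NO 1001 ...",
    "continued line items",
    "Invoice No 1002 ...",
    "Invoice No 1003 ...",
    "terms and conditions",
]
print(split_into_invoices(pages))
```

Each resulting page group could then be saved as its own file with a PDF library of your choice and analyzed separately.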
Would it be possible to auto-detect which custom model to use? In a normal invoicing flow you would have one big pile of different invoices and send them to be recognized.
Then it would sort them into two piles: one where a model was found and one without.
Do you understand what I mean by that?
11 votes
Thank you for the request. This feature is being planned for the next release.
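Until such a feature exists, the two-pile sorting described in the request could be approximated on the client side. The sketch below assumes you already have, for each document, a confidence score per candidate model (for example obtained by analyzing the document with each model and averaging field confidences). The score values, the threshold, and the function names are all hypothetical.

```python
from typing import Dict, List, Optional, Tuple


def pick_model(scores: Dict[str, float], threshold: float = 0.8) -> Optional[str]:
    """Return the best-scoring model id, or None if no model reaches the threshold."""
    if not scores:
        return None
    best = max(scores, key=scores.get)
    return best if scores[best] >= threshold else None


def sort_into_piles(
    batch: Dict[str, Dict[str, float]], threshold: float = 0.8
) -> Tuple[Dict[str, str], List[str]]:
    """Split documents into (matched: document -> model id, unmatched documents)."""
    matched: Dict[str, str] = {}
    unmatched: List[str] = []
    for document, scores in batch.items():
        model_id = pick_model(scores, threshold)
        if model_id is None:
            unmatched.append(document)
        else:
            matched[document] = model_id
    return matched, unmatched


# Hypothetical scores: document name -> {model id -> confidence}.
batch = {
    "invoice_a.pdf": {"vendor-1": 0.93, "vendor-2": 0.41},
    "invoice_b.pdf": {"vendor-1": 0.35, "vendor-2": 0.52},
}
matched, unmatched = sort_into_piles(batch)
print(matched, unmatched)
```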
On the official Azure Form Recognizer website (https://azure.microsoft.com/en-in/services/cognitive-services/form-recognizer/), a few examples have tables in the sample file. The sample JSON output also has an attribute called table. Please guide us in labeling a table in a custom layout form.
20 votes
Form Recognizer discovers and extracts tables automatically. Table results are part of the pageResults section in the JSON output. If a table in the form was not discovered, you can label tables as values by labeling each table cell and training with the maximum number of rows in the tables. Form Recognizer does not yet support labeling tables as tables.
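As a rough illustration of reading those table results, the sketch below rebuilds each table as a grid of cell texts. Only the pageResults section is confirmed by the reply above. The nested field names (`analyzeResult`, `tables`, `rows`, `columns`, `cells`, `rowIndex`, `columnIndex`, `text`) are assumptions about the v2.0 (preview) output shape and should be checked against your own JSON. The sample payload is made up.

```python
from typing import Any, Dict, List


def tables_from_result(result: Dict[str, Any]) -> List[List[List[str]]]:
    """Rebuild every table under pageResults as a row-major grid of strings."""
    grids: List[List[List[str]]] = []
    for page in result.get("analyzeResult", {}).get("pageResults", []):
        for table in page.get("tables", []):
            # Start with an empty grid so missing cells stay as "".
            grid = [["" for _ in range(table["columns"])] for _ in range(table["rows"])]
            for cell in table["cells"]:
                grid[cell["rowIndex"]][cell["columnIndex"]] = cell["text"]
            grids.append(grid)
    return grids


# Made-up payload in the assumed shape.
sample = {
    "analyzeResult": {
        "pageResults": [
            {
                "page": 1,
                "tables": [
                    {
                        "rows": 2,
                        "columns": 2,
                        "cells": [
                            {"rowIndex": 0, "columnIndex": 0, "text": "Item"},
                            {"rowIndex": 0, "columnIndex": 1, "text": "Amount"},
                            {"rowIndex": 1, "columnIndex": 0, "text": "Widget"},
                            {"rowIndex": 1, "columnIndex": 1, "text": "10.00"},
                        ],
                    }
                ],
            }
        ]
    }
}
for row in tables_from_result(sample)[0]:
    print(row)
```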
I know Form Recognizer does not yet support labeling tables as tables. Suggestions please.
Please see pic1 for reference. The number of line items may vary widely.
Also, what can we do if some text is not recognized by the OCR labeling tool (see pic1)?
6 votes
We should be able to use a setting to only analyze the first X pages of forms.
For example, we have forms related to phone bills, where the first 2 pages have relevant info and then there are 20+ pages of just call records which we do not want to analyze.
We should be able to tell the analyze forms function to only process the first X pages.
Especially in the Power Automate function.
5 votes
We have a number of paper forms that are completed by staff. The forms include checkboxes. We need to confirm that the checkboxes have been checked.
Additionally, we require the form to be signed by the person completing it. We'd like to confirm that a mark has been made in the signature area.
It would be great if there was a way that Form Recognizer could just register that a pen mark has been made in a certain area, even if it can't read it (e.g. a signature or a tick in a box).
36 votes
Thank you for the request. This checkbox feature is being planned for the next release.
When I copied and pasted the sample C# client code snippets from the Quickstart documentation, the code didn't even compile. Please share a simple working C# client program via GitHub so we don't have to copy and paste bits and pieces or struggle with compiler errors. The sample should work as long as the FORMRECOGNIZERENDPOINT and FORMRECOGNIZERKEY environment variables are set to valid values. [Oh, please make this textarea box resizable]
2 votes
It would be nice to be able to name the model when training. That would avoid wasting time saving model IDs, etc.
4 votes
Hello, I followed the tutorial here: https://docs.microsoft.com/en-us/azure/cognitive-services/form-recognizer/quickstarts/label-tool
I set up an Azure web app (ACI) to run the Docker image according to this tutorial: https://docs.microsoft.com/en-us/azure/cognitive-services/form-recognizer/deploy-label-tool#deploy-with-azure-container-instances-aci
I set up the connection. Then, when I went to create the project, it kept giving me this error, and I have no idea what is wrong. I tried removing all spaces and non-alphabetic characters from the project name, but nothing seems to work :(
2 votes
It would be really nice to have a "Lasso" selection functionality in the form labeling tool. I know I can hold the left mouse button down to highlight multiple words, but there are some areas of forms (e.g. remarks) where we have several hundred words that have to be selected. They are usually in a rectangular shape, so a lasso selection would work perfectly.
8 votes
The original forms labeling tool (VoTT) creates vott files and the new one (OCR Form Labeling Tool) uses fott files. I labeled and trained a bunch of forms using the original labeling tool. Is there a way for me to migrate those vott files to the fott file format?
It seems kind of strange that you would switch labeling tools without providing users any guidance on how to migrate their trained forms.
3 votes
Whenever something is wrong, I want the model to learn from it.
8 votes
Can I use my own OCR service to capture text? Many languages are not supported, and there is no schedule for adding them.
2 votes
At the moment we don't have features like recognizing checkboxes, radio buttons, signatures, etc.
Can Form Recognizer recognize a region of the form, so we can cut that part out and pass it on for further processing?
For example, I would like to know where on the document the radio button questions are, so that I know in which area I need to do custom processing.
6 votes
Thank you for the request. Checkbox and radio button recognition is being planned for the next release.
Often, OCR detects handwritten text incorrectly.
"Bridget Sims, MD" was detected as "Bridge+ Sims, MD".
There should be a way to correct this and enter the correct value of the detected text, "Bridget Sims, MD", after the OCR has done its work.
Is there a way to do this already?
2 votes
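Outside the tool itself, one way to handle this kind of error is to post-process the extracted values in your own code. The sketch below assumes you keep a list of known correct values (for example, a staff directory) and snaps an OCR result to the closest entry using Python's standard `difflib` module. The `known_names` list, the cutoff, and the function name are hypothetical, and this does not change what the labeling tool stores.

```python
import difflib
from typing import List


def correct_value(ocr_text: str, known_values: List[str], cutoff: float = 0.8) -> str:
    """Return the closest known value, or the OCR text unchanged if nothing is close enough."""
    matches = difflib.get_close_matches(ocr_text, known_values, n=1, cutoff=cutoff)
    return matches[0] if matches else ocr_text


# Hypothetical list of valid values for this field.
known_names = ["Bridget Sims, MD", "Alan Ortiz, MD"]

print(correct_value("Bridge+ Sims, MD", known_names))
print(correct_value("Unrelated text", known_names))
```

A higher cutoff reduces the risk of snapping to the wrong entry, at the cost of leaving more OCR errors uncorrected.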
I am trying to use Microsoft Form Recognizer to get the key-value pairs from medical forms. However, there are a number of different types of form, and I cannot currently train an accurate enough model within the current size limit.
10 votes
Form Recognizer v2.0 (preview) enables training large data-sets and analyzing large documents.
The containers provided use the v1.0 API, with all its limitations (no Layout API, 4 MB dataset for training, ...).
When will the containers be updated for the v2.0 API?
Thank you Mathieu, container v2.0 support is planned for the next release. New updates and features can be found/tracked on this What’s New page.
Are you able to disclose what the roadmap is for the Form Recognizer product? I'm particularly interested in using the Custom Model and Labelling Tool.
6 votes
Form Recognizer v2.0 (preview) is available and includes the ability to label forms and train a custom model. To get started follow the Train with labels quickstart – https://docs.microsoft.com/en-us/azure/cognitive-services/form-recognizer/quickstarts/label-tool
See what’s new in Form Recognizer v2.0 (preview) –
Occasionally, OCR puts together printed and handwritten text as a single tag. I would like them to be separate.
Company Handwritten company name
Printed text is usually the label, while the handwritten text is what we are looking for.
6 votes
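As a stopgap in client code, a merged tag can sometimes be split after extraction when the printed labels on the form are known in advance. The sketch below assumes exactly that: a hypothetical list of printed labels, with everything after a matching label treated as the handwritten value. It cannot tell printed from handwritten text by itself.

```python
from typing import List, Optional, Tuple


def split_label_value(tag_text: str, known_labels: List[str]) -> Tuple[Optional[str], str]:
    """Split 'Label value' into (label, value); return (None, text) if no label matches."""
    # Try longer labels first so "Company Address" wins over "Company".
    for label in sorted(known_labels, key=len, reverse=True):
        if not tag_text.lower().startswith(label.lower()):
            continue
        rest = tag_text[len(label):]
        # Require a word boundary so "Companywide" does not match "Company".
        if rest and rest[0].isalnum():
            continue
        return label, rest.lstrip(" :")
    return None, tag_text


# Hypothetical printed labels for this form.
labels = ["Company", "Company Address"]
print(split_label_value("Company Contoso Ltd", labels))
print(split_label_value("Company Address: 1 Main St", labels))
```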