How to Create a Document Processing Custom Model

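Are you ready to create your very own document processing model? It's easier than you think! This article walks you through the first steps: signing in to AI Builder, choosing a document type, defining the information to extract, and grouping your sample documents into collections.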

Sign in to AI Builder

To begin, sign in to Power Apps or Power Automate. In the left pane, select AI Builder > Explore, choose Extract custom information from documents, and then select Get Started. A step-by-step wizard guides you through listing all the data you want to extract from your documents. To use your own documents, you need at least five examples that share the same layout; otherwise, you can create your model with the sample data provided.

After you list your data, select Train, and then test your model by selecting Quick Test.

Select the Type of Document

In the Choose document type step, select the type of document your model will extract data from. Two options are available:

  • Structured and semi-structured documents: These documents have a consistent layout, with fields, tables, checkboxes, and other items found in similar places. Examples include invoices, purchase orders, delivery orders, tax documents, and more.

  • Unstructured and free-form documents: Unlike structured documents, unstructured ones lack a set structure and often contain paragraphs of varying lengths. Examples include contracts, statements of work, letters, and more.

A third option, currently in preview, is available for invoices. It lets you extend the prebuilt invoice processing model, either by adding new fields for it to extract or by providing samples of documents that it didn't extract correctly.

Define Information to Extract

On the Choose information to extract screen, define the fields, tables, and checkboxes you want your model to extract. Select +Add and fill in the required information for each item:

  • For each Text field, provide a name for the field in the model.
  • For each Number field (preview), provide a name and define whether the decimal separator should be a dot (.) or a comma (,).
  • For each Date field (preview), provide a name and define the date format (e.g., Year, Month, Day or Day, Month, Year).
  • For each Checkbox, provide a name and define separate checkboxes for each item that can be checked in a document.
  • For each Table, provide a name for the table and define the different columns that the model should extract.
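In AI Builder you define these items in the wizard, not in code. To make the options concrete, here is a minimal Python sketch that models the five item types as plain data classes and shows what the Number and Date settings control. All class names, the "YMD"/"DMY" codes, and the parsing rules (for example, treating the other symbol as a thousands separator) are hypothetical illustrations, not part of any AI Builder API.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Literal

# Hypothetical model of the item types described above (not an AI Builder API).

@dataclass
class TextField:
    name: str

@dataclass
class NumberField:
    name: str
    decimal_separator: Literal[".", ","] = "."

    def parse(self, raw: str) -> float:
        # Assumption: whichever symbol is NOT the decimal separator is a
        # thousands separator, so it is removed before converting.
        thousands = "," if self.decimal_separator == "." else "."
        cleaned = raw.strip().replace(thousands, "")
        return float(cleaned.replace(self.decimal_separator, "."))

@dataclass
class DateField:
    name: str
    order: Literal["YMD", "DMY"] = "YMD"  # Year, Month, Day or Day, Month, Year

    def parse(self, raw: str) -> date:
        # Assumption: date parts are separated by "/", "-" or ".".
        normalized = raw.strip().replace("-", "/").replace(".", "/")
        parts = [int(p) for p in normalized.split("/")]
        if self.order == "YMD":
            year, month, day = parts
        else:
            day, month, year = parts
        return date(year, month, day)

@dataclass
class CheckboxField:
    name: str
    items: List[str]  # one checkbox per item that can be checked

@dataclass
class TableField:
    name: str
    columns: List[str]  # columns the model should extract

# Example schema for an invoice-like document (names are illustrative).
schema = [
    TextField("Vendor name"),
    NumberField("Total", decimal_separator=","),
    DateField("Invoice date", order="DMY"),
    CheckboxField("Payment method", items=["Cash", "Card"]),
    TableField("Line items", columns=["Description", "Quantity", "Price"]),
]

print(NumberField("Total", decimal_separator=",").parse("1.234,56"))  # 1234.56
print(DateField("Invoice date", order="DMY").parse("15/03/2024"))  # 2024-03-15
```

The example shows why the separator and date-format settings matter: the same string "1.234,56" only reads as one thousand two hundred thirty-four when the comma is the decimal separator, and "15/03/2024" only forms a valid date when the day comes first.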

Group Documents by Collections

A collection represents a group of documents that share the same layout. To process multiple layouts, create a collection for each layout. For example, if you’re building a model to process invoices from two different vendors, each with their own invoice template, you should create two collections.

Remember to upload at least five sample documents per collection. Currently, the accepted file formats are JPG, PNG, and PDF.
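Before uploading, you may find it useful to check locally that each collection meets these two requirements. The sketch below is a hypothetical pre-upload check in Python, not an AI Builder feature. It assumes you keep one subfolder per layout (one collection per folder), counts only files with a .jpg, .jpeg, .png, or .pdf extension, and flags any collection with fewer than five such files. The folder convention and the function names are assumptions made for illustration.

```python
from pathlib import Path
from typing import Dict, List

# Hypothetical local check; AI Builder performs its own validation on upload.
ACCEPTED_EXTENSIONS = {".jpg", ".jpeg", ".png", ".pdf"}  # JPG, PNG, PDF
MIN_DOCUMENTS_PER_COLLECTION = 5

def group_into_collections(root: Path) -> Dict[str, List[Path]]:
    """Assumption: each subfolder of root holds documents with one layout."""
    collections: Dict[str, List[Path]] = {}
    for folder in sorted(p for p in root.iterdir() if p.is_dir()):
        # Keep only files whose extension is an accepted format.
        collections[folder.name] = sorted(
            f for f in folder.iterdir()
            if f.is_file() and f.suffix.lower() in ACCEPTED_EXTENSIONS
        )
    return collections

def check_collections(collections: Dict[str, List[Path]]) -> List[str]:
    """Return one message per collection with too few usable documents."""
    problems: List[str] = []
    for name, docs in collections.items():
        if len(docs) < MIN_DOCUMENTS_PER_COLLECTION:
            problems.append(
                f"{name}: {len(docs)} usable document(s), "
                f"need at least {MIN_DOCUMENTS_PER_COLLECTION}"
            )
    return problems
```

For the two-vendor invoice example, you might keep the files in folders such as `invoices/vendor_a` and `invoices/vendor_b` (hypothetical names); `check_collections(group_into_collections(Path("invoices")))` then returns an empty list when both collections are ready to upload.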

Next Step

The next step is to tag documents in your document processing model. Stay tuned for our upcoming article on how to do just that!

See Also

For more information on training and processing custom documents with AI Builder, refer to the training module Process custom documents with AI Builder.