20+ MUST-KNOW TERMS when considering an Intelligent Document Processing solution

Written by Smart Touch | Aug 21, 2023 3:12:34 PM
 
1. IDP stands for Intelligent Document Processing, a technology using Artificial Intelligence and Machine Learning techniques to automate the extraction, classification, and interpretation of data from various documents, such as invoices, contracts, and forms.
 
An IDP solution transforms unstructured and semi-structured data into information fit for machine processing by understanding and extracting insights from this data. Or, simply put, it understands what the document is about, what information it contains, extracts that information, and sends it to the right place, as defined by the customer using it.
 
Our first product, Apollo, is an Intelligent Document Processing platform that serves ALL departments: it is not limited to invoices, and it does not rely on templates.
 
 
 
 “What kind of documents does your company process: structured, semi-structured or unstructured?” is a question you will receive from any IDP solution provider. Let’s discover what they are trying to find out about the documents your company processes.
 
 
2. Structured documents, such as surveys, tests, claim forms, or application forms, use fixed data in fixed locations – this means that, for example, the date is always in the upper right corner, the company name always occupies the same location, and so on. The IDP solution knows to go to that exact location to extract a given piece of data.
 
The problem with structured documents is that you need to create a template for each provider and for each of their document types. The more providers you have, the more variations and templates you need to maintain. These templates also change over time, through a redesign or a change of document-issuing software, which means new templates for the IDP solution to learn.
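To make the "fixed data in fixed locations" idea concrete, here is a minimal sketch of template-based extraction. It is an illustration only, not how any particular IDP product works: the `Word` type, the `TEMPLATE` regions, and the normalised 0 to 1 page coordinates are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Word:
    text: str
    x: float  # horizontal centre, 0.0 (left) to 1.0 (right)
    y: float  # vertical centre, 0.0 (top) to 1.0 (bottom)


# Hypothetical template for ONE provider's form:
# field name -> (x_min, y_min, x_max, y_max) region on the page.
TEMPLATE = {
    "date": (0.70, 0.00, 1.00, 0.10),          # upper right corner
    "company_name": (0.00, 0.00, 0.50, 0.10),  # upper left corner
}


def extract_with_template(words, template):
    """Return the text found inside each fixed region of the template."""
    result = {}
    for field, (x0, y0, x1, y1) in template.items():
        inside = [w for w in words if x0 <= w.x <= x1 and y0 <= w.y <= y1]
        inside.sort(key=lambda w: (w.y, w.x))  # top-to-bottom, left-to-right
        result[field] = " ".join(w.text for w in inside)
    return result


page = [
    Word("Acme", 0.10, 0.05),
    Word("Ltd", 0.18, 0.05),
    Word("2023-08-21", 0.85, 0.05),
    Word("Claim", 0.10, 0.30),
]
fields = extract_with_template(page, TEMPLATE)
```

If the provider redesigns the form and the date moves to the bottom of the page, the same template returns an empty date, which is exactly why every layout change means a new template.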
 
 
 
3. Semi-structured documents, such as invoices, purchase orders, bills of lading, or CVs, use a fixed set of data but not a fixed format. For example, the date can be found in the upper right corner, the upper left corner, or the bottom left corner. A field’s name may also differ even when the field is the same: Purchase Order Number is the same as PO Number, PO No or PON. Due to these variations, you cannot use templates.
 
Intelligently processing these semi-structured documents requires Machine Learning capabilities, as well as Natural Language Processing features to understand the context of each field.
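The label variations above (Purchase Order Number, PO Number, PO No, PON) can be illustrated with a very small sketch. Note that this is a plain hand-written lookup table, not Machine Learning or NLP; a real IDP system would be expected to learn such variations from context rather than have them listed. The `FIELD_ALIASES` table and function names are hypothetical.

```python
import re

# Hypothetical synonym table, written by hand for illustration only.
FIELD_ALIASES = {
    "purchase_order_number": {
        "purchase order number",
        "po number",
        "po no",
        "pon",
    },
}


def normalise_label(label):
    """Lower-case a label and drop punctuation so 'PO No.' matches 'po no'."""
    cleaned = re.sub(r"[^a-z0-9 ]", "", label.lower())
    return " ".join(cleaned.split())


def canonical_field(label):
    """Map a label as printed on the document to one canonical field name."""
    key = normalise_label(label)
    for field, aliases in FIELD_ALIASES.items():
        if key in aliases:
            return field
    return None  # unknown label


labels = ["Purchase Order Number:", "PO Number", "PO No.", "PON", "Invoice Date"]
mapped = [canonical_field(label) for label in labels]
```

The first four labels map to the same field and the last one is unknown. The table breaks as soon as a supplier invents a new label, which is where learned, context-aware models come in.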
 
 
 
4. Unstructured documents, such as contracts, articles, memos, or letters, use no fixed data and no fixed format.
 
Intelligently processing these documents requires advanced configuration and customisation, which allow the IDP platform to learn through Machine Learning training, custom pre-processing pipelines, or computer vision-based recognition for visual components.
 
 
 
5. Machine Learning refers to a field of AI that focuses on the development of algorithms & statistical models that enable computer systems to learn and make predictions or decisions without being explicitly programmed. It involves training a computer or a system to recognise patterns and extract meaningful insights from data, allowing it to improve its performance over time through experience.
 
Machine Learning algorithms can analyse & interpret vast amounts of data, identify patterns, and make predictions or take actions based on the information they have learned.
 
 
 
6. Natural Language Processing or NLP is a field of Artificial Intelligence that focuses on enabling computers to understand and interpret human language, both spoken and written. In Intelligent Document Processing, NLP techniques are used to analyse and extract meaning from text-based documents. 
 
 
 
7. OCR stands for Optical Character Recognition, a technology that converts printed or handwritten text from documents into machine-readable text. Traditional OCRs extract text from a previously scanned document and transform it into a computer-readable structure (text) but cannot understand the data itself. IDP solutions understand data in context using continuous-learning and advanced Artificial Intelligence techniques that enable accurate and rapid extraction of relevant data.
 
 
 
8. Document Ingestion refers to IDP solutions importing documents from various sources. For example, Apollo ingests files via API, email, Google Drive, MS OneDrive, etc., and in various formats, such as PDF, images, DOCX, etc. The IDP software can import files from the client’s resources, or they can be submitted via a gateway, such as our SMART PORTAL, an entry point for documents that are about to enter the IDP system.
 
 
 
9. Data Extraction involves identifying and capturing specific information from the documents, such as company name, address, Fiscal Identification Code, bank account, VAT, or invoice amount. Apollo’s Extractor feature automatically extracts all relevant data from each type of document using AI and Machine Learning algorithms, as well as semantic and spatial criteria.
 
 
 
10. Document Classification is the process of categorising documents into different types or classes based on their content or purpose. IDP systems can automatically classify documents, making it easier to process them efficiently. Apollo categorises incoming documents based on pre-trained sequence-to-sequence classification models.
 
 
 
11. The Splitter feature in an IDP solution refers to a functionality that automatically separates or divides a document into smaller components or sections by identifying and extracting specific sections of interest, such as pages, paragraphs, or individual fields. This feature enables efficient processing by breaking down a document into manageable units, allowing for easier analysis, extraction, and further processing of relevant information.
 
Apollo’s Splitter uses computer vision to automatically identify page structure and split large scanned files into distinct documents at page boundaries.
 
 
 
12. Document Grouping refers to grouping those documents that should be processed together, according to the rules established by the client.
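As a rough sketch of what a client-defined grouping rule could look like, the example below groups documents that share a purchase order number. The document dictionaries, the `po_number` key, and the `group_documents` helper are assumptions made for illustration, not part of any product API.

```python
from collections import defaultdict


def group_documents(documents, rule):
    """Group document ids by the key that the client's rule returns."""
    groups = defaultdict(list)
    for doc in documents:
        groups[rule(doc)].append(doc["id"])
    return dict(groups)


docs = [
    {"id": "inv-1", "type": "invoice", "po_number": "PO-100"},
    {"id": "po-1", "type": "purchase_order", "po_number": "PO-100"},
    {"id": "inv-2", "type": "invoice", "po_number": "PO-200"},
]

# Hypothetical client rule: everything with the same PO number goes together.
by_po = group_documents(docs, rule=lambda d: d["po_number"])
```

Swapping the rule, for example to `lambda d: d["type"]`, regroups the same documents by type without touching the grouping code.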
 
 
 
13. Data Validation is the process of verifying extracted data against predefined rules or databases to ensure accuracy and reliability and to minimise errors. Apollo automatically validates extracted data through its SmartFlow feature.
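Validation against predefined rules can be pictured with a small sketch like the one below. The field names, the ISO date format, and the two-decimal amount rule are assumptions chosen for the example; real rules depend entirely on each company's requirements.

```python
import re
from datetime import datetime


def is_iso_date(value):
    """True if the value is a valid calendar date written as YYYY-MM-DD."""
    try:
        datetime.strptime(value, "%Y-%m-%d")
        return True
    except ValueError:
        return False


# Hypothetical rule set: field name -> function returning True when valid.
RULES = {
    "invoice_date": is_iso_date,
    "invoice_amount": lambda v: re.fullmatch(r"\d+(\.\d{2})?", v) is not None,
    "company_name": lambda v: v.strip() != "",
}


def validate(extracted, rules):
    """Return the names of fields that are missing or fail their rule."""
    errors = []
    for field, check in rules.items():
        value = extracted.get(field)
        if value is None or not check(value):
            errors.append(field)
    return errors


good = {"invoice_date": "2023-08-21", "invoice_amount": "120.50", "company_name": "Acme"}
bad = {"invoice_date": "21/08/2023", "invoice_amount": "120.5"}

good_errors = validate(good, RULES)
bad_errors = validate(bad, RULES)
```

Here `good` passes every rule, while `bad` fails on the date format, the amount format, and the missing company name.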
 
 
 
14. SmartFlows, one of Apollo’s key differentiators, are 100% customisable document flows that fully replicate a company’s processes, internal business flows, or internal procedures, from data extraction to the transfer of that data to third parties. A SmartFlow is built with simple drag-and-drop actions.
 
A user can build as many SmartFlows as they want, use predefined ones depending on each company's unique needs, or choose ones from the marketplace, already built by other users.
 
 
 
15. SmartBites are pieces of a SmartFlow, also known as customisable rule-based flow-bites. Just like Lego bricks, they are used to build SmartFlows, which are Apollo’s 100% customised processes from data extraction to transfer to 3rd party applications, based on the unique needs of every company. SmartFlows will replicate any existing business flow within an organisation.
 
A SmartFlow is built by simply dragging and dropping SmartBites. A company can have multiple SmartFlows, for every document type or every user. Human In The Loop actions are triggered by exceptions within the data processing cycle.
 
 
 
16. Human In The Loop or HITL refers to human involvement or intervention in a system, a decision-making or data processing loop. It combines the capabilities of both humans and machines to achieve more accurate and reliable results. As a result, we will have a continuous feedback loop. With constant feedback, the algorithms learn and produce better results every single time. 
 
Within Apollo, these actions are triggered by exceptions within the data processing cycle. We advise our SMART customers to keep the process seamless, as adding too many HITL steps may affect the speed and fluency of document processing. Either way, Apollo can handle it.
 
 
 
17. Exception Handling in IDP refers to the process of managing documents or data that require manual intervention due to ambiguity, errors, or special cases. IDP systems can flag exceptions for review or trigger HITL when necessary.
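One common way to decide when to flag an exception is a confidence threshold: anything the system is unsure about goes to a person. The sketch below assumes each extracted field carries a confidence score and uses an arbitrary threshold of 0.80; both are assumptions for illustration and do not describe how Apollo or any other product decides.

```python
from dataclasses import dataclass


@dataclass
class ExtractedField:
    name: str
    value: str
    confidence: float  # assumed model confidence, 0.0 to 1.0


REVIEW_THRESHOLD = 0.80  # arbitrary value chosen for this example


def route(fields, threshold=REVIEW_THRESHOLD):
    """Send a document to human review if any field is empty or low-confidence."""
    exceptions = [
        f.name for f in fields if not f.value or f.confidence < threshold
    ]
    if exceptions:
        return "human_review", exceptions
    return "straight_through", []


clean_doc = [ExtractedField("total", "120.50", 0.97)]
messy_doc = [
    ExtractedField("total", "120.50", 0.97),
    ExtractedField("iban", "", 0.99),            # empty value
    ExtractedField("date", "2023-08-21", 0.41),  # low confidence
]

clean_route = route(clean_doc)
messy_route = route(messy_doc)
```

Raising the threshold sends more documents to people and slows processing; lowering it lets more documents through untouched, so the value is a trade-off each team has to tune.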
 
 
 
18. IWR stands for Intelligent Word Recognition. It is a technology that involves the automated recognition & interpretation of handwritten text within documents. IWR utilises advanced Machine Learning algorithms and Optical Character Recognition (OCR) techniques to analyse & convert handwritten words or phrases into digital text. By applying pattern recognition and contextual analysis, IWR enables the extraction and understanding of handwritten content, making it searchable and editable in a digital format. This capability enhances the overall efficiency and accuracy of IDP, particularly when dealing with handwritten documents or forms.
 
 
 
19. Intelligent Character Recognition or ICR technology enables the automated recognition and conversion of handwritten or printed characters into digital text. ICR utilises advanced algorithms and machine learning techniques to analyse the shapes, patterns, and features of characters and accurately recognise and interpret them. It is particularly useful for extracting information from handwritten forms, documents, or other sources where printed or cursive text is present. By converting handwritten or printed text into digital format, ICR enhances the efficiency and accuracy of document processing, enabling further analysis, searchability, and integration with other digital systems.
 
 
The main difference between IWR and ICR lies in the scope of recognition & the level of analysis they perform on handwritten text: IWR focuses on recognising complete words or phrases in handwriting, while ICR is a more comprehensive technology that recognises and converts both handwritten and printed characters, including letters, numbers, and symbols.
 
Apollo, our IDP platform, uses both technologies.
 
 
 
20. Workflow Automation and Systems Integrator Platforms provide a broader set of tools for automating and optimising end-to-end business processes by integrating multiple systems, tasks, and data flows, while IDP focuses on automating document processing and data extraction. They serve different purposes and focus on different aspects of automation.
 
 
 
21. Last but not least, we think it’s useful to know that Deep Tech is a term used for business models based on high tech innovation in engineering or major scientific advances. Game changers that are likely to use Artificial Intelligence or Machine Learning, deep tech innovations are often radical, creating new markets or disrupting existing ones, addressing big societal and environmental challenges.
 
The opposite of Deep Tech is Shallow Tech, which refers to a simple technological advance, e.g. moving from a non-digital to a digital business model.
 
 
 
Wanna know more? Drop a question by email or our LinkedIn account and we’ll be happy to answer it and adopt a smarttitude.
 
 
 
Smarttitude refers to a certain attitude we might have, since we are Smart Touch Technologies and smart questions require smart answers, hence we are… smart?! 🙃