Is there an Art & Science of Picking the Right Gen AI Solution to Handle your Documents?

Baskar Swaminathan

Published on

March 14, 2024

Introduction

Enterprises have invested a lot of time and money trying to automate the highly manual process of document understanding, with subpar results.

The inherent complexity arises from the diverse layouts and formats commonly encountered in documents, presenting a challenge in automating tasks related to document comprehension and subsequent actions based on extracted information. This document delves into the current challenges and how a next generation solution can make a difference.

Additionally, this document outlines how Lazarus has a competitive edge over others in the market and touches upon how seamlessly our technology integrates into your technological ecosystem, encompassing custom-built applications, ERPs, CRMs, RPA, and more. Let us start with certain nuances of documents and how they can be categorized, because each document type requires certain capabilities to be understood and processed.

Documents can be classified into four distinct categories, delineated by two key characteristics: (a) Structured versus Unstructured, and (b) Machine-Printed versus Handwritten. These categories are as follows:

- Structured & Machine-Printed

- Unstructured & Machine-Printed

- Structured & Handwritten

- Unstructured & Handwritten

Document Types

Within the enterprise context, the ongoing imperative to understand documents within each of these four categories remains constant. Beyond these categories, additional parameters such as skewed documents, the presence of signatures, barcodes, and checkboxes add complexity. Often, documents possess characteristics spanning two or more of the mentioned categories. Organizations must ensure that a solution possesses the capability to adeptly handle documents from all categories rather than being limited to a subset.
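To make the categorization concrete, here is a minimal sketch of how the two characteristics and the additional parameters could be modeled. The class and field names are assumptions made for illustration, not a Lazarus data model.

```python
from dataclasses import dataclass
from enum import Enum


class Layout(Enum):
    STRUCTURED = "Structured"
    UNSTRUCTURED = "Unstructured"


class Script(Enum):
    MACHINE_PRINTED = "Machine-Printed"
    HANDWRITTEN = "Handwritten"


@dataclass(frozen=True)
class DocumentProfile:
    """Hypothetical description of one document (illustrative only)."""
    layouts: frozenset   # a document may mix structured and unstructured parts
    scripts: frozenset   # ...and printed and handwritten content
    skewed: bool = False
    has_signature: bool = False
    has_barcode: bool = False
    has_checkboxes: bool = False

    def categories(self):
        """Return every 'Layout & Script' category this document touches."""
        return sorted(
            f"{layout.value} & {script.value}"
            for layout in self.layouts
            for script in self.scripts
        )


# A printed form with handwritten entries spans two of the four categories.
claim_form = DocumentProfile(
    layouts=frozenset({Layout.STRUCTURED}),
    scripts=frozenset({Script.MACHINE_PRINTED, Script.HANDWRITTEN}),
    has_signature=True,
)
print(claim_form.categories())
```

Modeling the two characteristics as sets rather than single values reflects the point above that one document often spans two or more categories.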

Our primary focus is consistently directed towards ensuring that the platform can seamlessly manage any type of document with both ease and high accuracy. This approach eliminates the need for the organization to contend with different platforms or products for handling specific document types and avoids external dependencies.

A Typical Document Understanding Solution Workflow:

A simplified version of the document understanding workflow is outlined below.

STEP 1 (Pre-Processing):

The process initiates with pre-processing steps, which may involve tasks such as retrieving documents from systems like email, CRMs, ERPs, or custom applications.

STEP 2 (Digitize):

In this phase, the document images are cleaned up through actions such as rotating pages, removing noise, and adjusting contrast. The goal is to improve the quality of digitization. Generally, the Document Understanding solution employs either an on-prem OCR engine (which is usually less effective) or cloud OCR engines like Google OCR, Microsoft OCR, etc.
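As a rough illustration of two of these clean-up actions, the sketch below rotates a page and stretches its contrast. It assumes a page is a small row-major grid of grayscale values (0 to 255) held in plain Python lists; a production system would use an image library instead, and the function names are hypothetical.

```python
def rotate_90_clockwise(page):
    """Rotate a row-major grid of pixels by 90 degrees clockwise."""
    return [list(row) for row in zip(*page[::-1])]


def stretch_contrast(page, low=0, high=255):
    """Linearly rescale pixel values so they span the full [low, high] range."""
    flat = [pixel for row in page for pixel in row]
    darkest, brightest = min(flat), max(flat)
    if brightest == darkest:  # uniform image: nothing to stretch
        return [list(row) for row in page]
    span = brightest - darkest
    # Integer arithmetic keeps the result deterministic (values round down).
    return [
        [low + (pixel - darkest) * (high - low) // span for pixel in row]
        for row in page
    ]


faded_scan = [[100, 150], [125, 200]]
print(stretch_contrast(faded_scan))
print(rotate_90_clockwise(faded_scan))
```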

STEP 3 (Classify):

Following the digitization of documents, this step involves document classification. The primary emphasis here is on identifying the document type (e.g., insurance claims, mortgage applications, etc.). This step is very important, as the following steps (like extraction and post-processing) will change based on the document type.

STEP 4 (Human in the Loop):

This is an optional step, but a very important one. If the classification is incorrect, it will lead to disaster, so it is very important to send the documents for human verification if the confidence is low.
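A minimal sketch of this routing decision is shown below. It assumes the classifier returns a confidence score between 0 and 1; the threshold value and all names are illustrative assumptions that would need tuning per document type.

```python
from dataclasses import dataclass

REVIEW_THRESHOLD = 0.85  # assumed value; tune per document type and risk


@dataclass
class Classification:
    doc_id: str
    label: str
    confidence: float  # assumed to be a score in [0, 1]


def route(result, threshold=REVIEW_THRESHOLD):
    """Send low-confidence classifications to a person, the rest onward."""
    return "human_review" if result.confidence < threshold else "auto_process"


print(route(Classification("doc-001", "insurance_claim", 0.97)))
print(route(Classification("doc-002", "mortgage_application", 0.62)))
```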

STEP 5 (Actual Document Task):

This can be either an extraction or summarization task.

STEP 5a (Extraction):

In this stage, the extraction process may range from straightforward extraction of values stated explicitly in the document to more intricate procedures that involve analyzing images and comprehending extensive paragraphs. The objective is to reason out and determine the value to be extracted, which may involve sophisticated data comprehension and image analysis techniques.

STEP 5b (Summarization):

This involves generating a concise and coherent summary of a longer text while retaining its essential information.

STEP 6 (Human in the loop):

This is an optional step, but a very important one. It is introduced to ensure the correct data gets processed downstream. Usually, the data from the previous step is routed here when the confidence is low or when it fails other checks in place (e.g., data format rules, lookup values, etc.).
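The checks mentioned above could be sketched as follows. The field names, format rules, lookup values, and threshold are all hypothetical examples for an insurance claim, not rules taken from any real deployment.

```python
import re

# Hypothetical rules; field names and patterns are illustrative only.
FORMAT_RULES = {
    "policy_number": re.compile(r"[A-Z]{2}\d{6}"),
    "claim_date": re.compile(r"\d{4}-\d{2}-\d{2}"),
}
LOOKUP_VALUES = {"claim_type": {"auto", "home", "health"}}
MIN_CONFIDENCE = 0.90  # assumed threshold


def review_reasons(fields):
    """Return the reasons, if any, to route extracted data to human review.

    `fields` maps a field name to a (value, confidence) pair.
    """
    reasons = []
    for name, (value, confidence) in fields.items():
        if confidence < MIN_CONFIDENCE:
            reasons.append(f"{name}: low confidence")
        rule = FORMAT_RULES.get(name)
        if rule and not rule.fullmatch(value):
            reasons.append(f"{name}: bad format")
        allowed = LOOKUP_VALUES.get(name)
        if allowed is not None and value not in allowed:
            reasons.append(f"{name}: not in lookup list")
    return reasons


extracted = {
    "policy_number": ("AB123456", 0.99),
    "claim_date": ("14/03/2024", 0.95),
    "claim_type": ("boat", 0.80),
}
print(review_reasons(extracted))
```

An empty list means the data can flow to post-processing without review; any entry is a reason to show the document to a person.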

STEP 7 (Post-Processing):

The process involves taking the extracted or summarized data, performing post-processing steps that may include data validation tasks, and entering the data into administration platforms.

A typical simplified view of a document understanding solution will look something like this.
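In code form, the steps above could be wired together roughly as follows. This is a minimal sketch: each step is a callable supplied by the caller, the dictionary keys and the threshold are assumptions made for illustration, and Step 1 (retrieving the document) is assumed to have happened already.

```python
def run_pipeline(document, steps, threshold=0.85):
    """Run one retrieved document through Steps 2 to 7.

    `steps` is a dict of callables; every key name is an assumption
    made for this sketch.
    """
    text = steps["digitize"](document)                    # Step 2
    doc_type, confidence = steps["classify"](text)        # Step 3
    if confidence < threshold:                            # Step 4
        doc_type = steps["review_type"](text, doc_type)
    data, confidence = steps["extract"](text, doc_type)   # Step 5
    if confidence < threshold:                            # Step 6
        data = steps["review_data"](text, data)
    return steps["post_process"](doc_type, data)          # Step 7


# Stub steps standing in for real OCR, models, and review screens.
stub_steps = {
    "digitize": lambda doc: doc.upper(),
    "classify": lambda text: ("insurance_claim", 0.60),
    "review_type": lambda text, guess: "mortgage_application",
    "extract": lambda text, doc_type: ({"applicant": "J. DOE"}, 0.95),
    "review_data": lambda text, data: data,
    "post_process": lambda doc_type, data: {"type": doc_type, **data},
}
result = run_pipeline("scanned page", stub_steps)
print(result)
```

Here the classifier's low confidence triggers the first human-in-the-loop step, which corrects the document type before extraction runs.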

Key Capabilities of a Document Understanding Solution:

When evaluating a Document Understanding solution, an Enterprise might focus on several aspects, including ease of use, data privacy, data security, on-prem deployment capabilities, the company’s responsiveness in addressing your problems, and the research depth to guide your journey. We will elaborate a bit more on the three key aspects below.

1. OCR accuracy (ability to digitize)

Typical Data Processing solutions in the market rely on external OCR engines. Accuracy in this process is paramount, as inaccuracies in input data inevitably result in inaccurate outcomes.
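One common way to put a number on digitization accuracy is the character error rate (CER): the edit distance between the OCR output and a hand-checked reference text, divided by the length of the reference. The article does not prescribe a metric, so treat the sketch below as one possible yardstick rather than the method any particular vendor uses.

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings (insert, delete, substitute)."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute, or match at no cost
            ))
        previous = current
    return previous[-1]


def character_error_rate(reference, ocr_output):
    """Edit distance divided by the length of the ground-truth text."""
    if not reference:
        raise ValueError("reference text must not be empty")
    return edit_distance(reference, ocr_output) / len(reference)


# Three misread characters (l/1, 1/l, 0/O) in a 12-character reference.
print(character_error_rate("Policy 10482", "Po1icy lO482"))
```

Even a small CER matters downstream: a single misread digit in a policy number makes the extracted value wrong regardless of how good the extraction model is.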

2. Proficiency in Contextualization (ability to classify, extract, reason & summarize)

This is the core capability: classification, extraction, and summarization. Enterprises need to ensure that the technologies used keep pace with the latest advancements in this space, to avoid getting stuck in the ways of old. Creating a targeted model for each task is a previous-generation approach. While such solutions can be developed and deployed successfully, scaling them becomes very challenging.

A few other questions to ponder: what about your data privacy? Who owns your data? What if there are partnership challenges between the platform providers and model providers?

3. Easy Integration (ability to integrate)

The business needs to ensure it gets the right tool for the right job, and integration is crucial because it ensures a seamless and efficient implementation and user experience. A product’s ability to seamlessly integrate with other tools, applications, or services enhances its functionality, offering users a comprehensive solution that goes beyond standalone features. Moreover, well-integrated products gain a competitive advantage by providing users with a customizable and personalized experience, ultimately leading to higher user satisfaction and market success.

Art and Science of Picking the Right Solution:

At the end of the day, it is paramount to ensure a solution is chosen which addresses the organization’s needs. There are a lot of solutions out there in the market, and they all come with lots of technical jargon that many people use without understanding.

Please do not fall for it. A cool demo or the size of a product company is not going to automatically make the product the right fit for you. Product demonstration is an important aspect, but please ensure that you perform holistic due diligence before writing a big check.

Even in this digital era, there is value in the saying “Always touch and feel the fabric before buying”. Please never make the mistake of evaluating solutions with unique datasets, because certain solutions are designed to handle only certain types of documents and the sales team will nudge you towards their areas of strength.
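One way to keep an evaluation honest is to break trial results down by the four document categories and flag any category that was never tested. The sketch below assumes you have recorded, for each trial document, its category and whether the solution handled it correctly; the function name and data shape are illustrative.

```python
from collections import defaultdict

ALL_CATEGORIES = {
    "Structured & Machine-Printed",
    "Unstructured & Machine-Printed",
    "Structured & Handwritten",
    "Unstructured & Handwritten",
}


def accuracy_by_category(results):
    """`results` is a list of (category, was_correct) pairs from a trial run."""
    totals, correct = defaultdict(int), defaultdict(int)
    for category, was_correct in results:
        totals[category] += 1
        correct[category] += bool(was_correct)
    accuracy = {c: correct[c] / totals[c] for c in totals}
    untested = sorted(ALL_CATEGORIES - set(totals))
    return accuracy, untested


trial = [
    ("Structured & Machine-Printed", True),
    ("Structured & Machine-Printed", True),
    ("Unstructured & Handwritten", False),
    ("Unstructured & Handwritten", True),
]
accuracy, untested = accuracy_by_category(trial)
print(accuracy)
print("Not yet tested:", untested)
```

A strong overall score can hide a weak category, and an untested category is a risk you have not measured yet.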

Lazarus AI

Let’s see how Lazarus AI, a Next Generation Gen AI solution, stands out concerning digitization accuracy, integration prowess, and its ability to classify, extract, reason & summarize.

Digitization:

Typical Data Processing solutions in the market rely on external OCR engines. Accuracy in this process is paramount, as inaccuracies in input data inevitably result in inaccurate outcomes. Platforms that are dependent on external OCR engines lack essential control, encountering significant challenges, particularly when confronted with documents that are skewed, smudged, or contain unclear handwritten content. Many organizations have spent a lot of money and effort in building solutions but never went live because this factor was overlooked.

Proficiency in contextualization:

Compared to the models available in the market, our proprietary foundation models are trained specifically to deal with documents in the enterprise. In addition to our highly accurate OCR engine, our proprietary approach to training the models focuses on not just text, but also the location of the text. This makes our solutions much more accurate compared to other models in the market.

The high levels of accuracy in classification and extraction are typically achieved through a combination of the use of LLMs for data extraction, summarization, analysis of images and videos, and audio and other signal data, all with various RAG implementations such as VKGs. With those resources aligned, the final step is to engineer a prompt or a series of prompts that produce repeatable and predictable results. There are no templates in this process. However, many combinations of resources and styles of prompt architecture are available, and the best combination is use-case-specific.

Integration prowess:

Integrating Lazarus is as simple as calling the API with the necessary parameters, such as the document and the prompts to classify, extract, or summarize the document. Lazarus APIs are used in customer production deployments for all of these tasks. Customers can integrate API calls into their current applications or workflows. All communications between the platforms are highly secured. The key point to note is that this entire platform can be hosted in your data center or any chosen private cloud.

You have full control over your data and privacy, with the ability to integrate with any platform that you want. Any system that can create custom forms and make an API call (for example, UiPath can create custom forms to implement the human-in-the-loop capability) can be integrated easily with our Lazarus platform.
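To give a feel for what such an integration involves, here is a minimal sketch that builds (but does not send) an HTTP request carrying a document and a prompt. The URL, the JSON field names, and the bearer-token header are placeholders invented for this illustration; the actual endpoint, parameters, and authentication scheme must be taken from the Lazarus API documentation.

```python
import base64
import json
import urllib.request

# Placeholder URL: the real endpoint, auth scheme, and field names come from
# the vendor's API documentation, not from this sketch.
API_URL = "https://example.invalid/document-tasks"


def build_request(document_bytes, prompt, api_key):
    """Build (but do not send) a JSON POST carrying a document and a prompt."""
    payload = {
        "document": base64.b64encode(document_bytes).decode("ascii"),
        "prompt": prompt,
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


request = build_request(b"%PDF-1.7 ...", "Extract the policy number.", "MY_KEY")
print(request.get_method(), request.full_url)
```

Sending the request would be a single call to `urllib.request.urlopen(request)` from whichever application, workflow, or RPA tool hosts the integration.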

Should you get a Point Solution or a Platform for your Document Understanding Needs?

Choosing between a point solution and a platform for your document understanding needs depends on various factors, including the specific requirements of your organization, the scale of your operations, and your long-term goals. Here are some considerations to help you make an informed decision:

If the organization’s Document Understanding needs are very narrow and static, choosing a point solution may not be a bad idea.

There is a wide variety of point solutions on the market that can address a specific problem or set of related problems, and it may be very tempting to choose one over a platform. However, that comes with many constraints around scaling the solution to handle documents outside its scope, and very soon you may end up having to choose another point solution to address that situation, or choose a platform.

Point solutions are usually quicker to implement, but do not overlook the fact that they come with less flexibility and customization options than the platform offers. If your document understanding needs are diverse or likely to evolve, a platform may provide more adaptability.

Point solutions tend to be cheaper (mostly, but not always) in the initial stages, but choosing a platform may be a better investment if you anticipate expanding your use of document understanding technologies over time.

As your organization learns the platform, the speed and efficacy with which you can implement solutions will increase rapidly, as opposed to getting stuck with a point solution.

Summary

In summary, the decision between a point solution and a platform depends on your organization’s specific needs, budget constraints, and long-term vision. Assess the trade-offs in terms of functionality, ease of implementation, scalability, and integration capabilities to make the most appropriate choice for your document understanding needs.

About Lazarus AI

Lazarus AI develops enterprise-grade foundation models for the insurance industry and beyond. Lazarus AI’s advanced APIs enable organizations to eliminate their processing bottlenecks and provide rapid time to value.
