As innovators, we are always looking at how emerging technologies can improve operational efficiency. The following is a brief look at how Artificial Intelligence (AI) can be applied to records management systems, specifically TRIM.
Micro Focus Content Manager (TRIM) is likely to be around for a long time yet! Why? Because migrating the huge volume of TRIM documents to another system is an enormous task and it is hard to justify the huge cost versus the benefits. Rather than seeking to replace TRIM, this post looks at how AI might modernize TRIM for the majority of users.
The problem we need to solve
First, a quick look at common complaints about records management systems:
- Users find them overly complex and prefer to drop files into their local file system
- Users find them time-consuming, requiring manual effort to manage recordkeeping
- Paper-based processes are often required (still)
- Processes often require two to three systems to complete common work tasks
- Systems, like email, case management, and financial systems, have limited interoperability
- Users are frustrated by a disjointed manual approval process
These frustrations result in the loss of evidentiary material and increased costs due to the discoverability of poorly managed information. We need a way to solve these issues through automation and make the records management system “disappear” for everyone except administrators.
Before jumping into AI, remember that TRIM does more than save documents. For a record to be complete, TRIM also needs to store the document's content, context, and provenance.
Content is the document itself, with the associated metadata, version histories, renditions, embedded objects, and attachments required to make the information complete. Content can include different versions of the same document, usually to track changes or to make a redacted version available. Documents can also have many renditions, i.e. copies in another format such as Word or PDF.
Context is information about the creation of the document. Context can be provided by links to other documents, the location in the classification scheme, management metadata such as retention schedules, and when and by whom the information was created, modified, and used.
Provenance is the document's origins, custody, and ownership, which validate the document's source and authenticity. This is usually exposed as metadata generated by the records management system.
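To make the three aspects concrete, here is a minimal sketch of how content, context, and provenance might hang together in one record. Every class and field name below is hypothetical and chosen for illustration only; this is not TRIM's actual data model.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List


@dataclass
class Rendition:
    """A copy of a version in a particular format, e.g. Word or PDF."""
    file_format: str
    data: bytes


@dataclass
class Version:
    """One version of the document, possibly redacted."""
    number: int
    renditions: List[Rendition] = field(default_factory=list)
    redacted: bool = False


@dataclass
class Context:
    """Where the document sits and who created it (hypothetical fields)."""
    classification_path: str
    retention_schedule: str
    created_by: str
    created_at: datetime
    related_record_ids: List[str] = field(default_factory=list)


@dataclass
class Provenance:
    """Origin, ownership, and custody history (hypothetical fields)."""
    origin: str
    owner: str
    custody_chain: List[str] = field(default_factory=list)


@dataclass
class Record:
    """A complete record: content (versions) plus context and provenance."""
    title: str
    versions: List[Version]
    context: Context
    provenance: Provenance


record = Record(
    title="Supplier invoice",
    versions=[Version(1, [Rendition("docx", b"..."), Rendition("pdf", b"...")])],
    context=Context("Finance/Invoices", "7 years", "jsmith", datetime(2020, 1, 15)),
    provenance=Provenance("email attachment", "Finance", ["jsmith", "records team"]),
)
print(len(record.versions[0].renditions), "renditions of version 1")
```

Auto-filing, discussed next, only fills in one of these fields (the classification path), which is why the other aspects make the full problem harder.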
It is clear that using AI to automatically capture, categorize, index, manage and dispose of documents without user interaction is going to get complicated. It gets simpler if we break it into stages, and auto-classification, or filing, seems the logical place to start.
Traditionally, we would try to auto-file documents by looking at a document's metadata (filename, title, author, etc.) and then saving it to a mapped location in the records management system. AI does not work that way. Instead, AI looks at the actual content of the document, achieving greater accuracy using a two-step process:
Training an algorithm
- Examine all the documents and metadata already in a TRIM location
- Examine the documents in other locations
- Determine what characteristics are shared between documents in the same location and how they differ from documents in other locations
- Create (train) an algorithm that encapsulates these learnings
Executing the algorithm
- When a new document requires filing, input the document data into the algorithm
- The algorithm output will be the determined location
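The two steps above can be sketched with a small naive Bayes text classifier written in plain Python. This is only an illustration of the train-then-execute idea, not how TRIM or any particular product does it; the location names and sample texts are made up, and a real system would train on far more documents.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def train(labelled_docs):
    """Step 1: learn word counts per location from (text, location) pairs."""
    word_counts = defaultdict(Counter)
    doc_counts = Counter()
    vocabulary = set()
    for text, location in labelled_docs:
        tokens = tokenize(text)
        word_counts[location].update(tokens)
        doc_counts[location] += 1
        vocabulary.update(tokens)
    return {"word_counts": word_counts, "doc_counts": doc_counts,
            "vocabulary": vocabulary}


def log_scores(model, text):
    """Naive Bayes log score of the text for every known location."""
    total_docs = sum(model["doc_counts"].values())
    vocab_size = len(model["vocabulary"])
    tokens = tokenize(text)
    scores = {}
    for location, counts in model["word_counts"].items():
        total_words = sum(counts.values())
        # Prior: how common this location is among the training documents.
        score = math.log(model["doc_counts"][location] / total_docs)
        for token in tokens:
            # Laplace smoothing so unseen words do not zero out the score.
            score += math.log((counts[token] + 1) / (total_words + vocab_size))
        scores[location] = score
    return scores


def classify(model, text):
    """Step 2: return the location with the highest score."""
    scores = log_scores(model, text)
    return max(scores, key=scores.get)


# Hypothetical documents already filed in two hypothetical locations.
existing = [
    ("Invoice number 1001 amount due payment terms 30 days", "Finance/Invoices"),
    ("Invoice 1002 total amount payable by bank transfer", "Finance/Invoices"),
    ("Curriculum vitae work experience education skills referees", "HR/Resumes"),
    ("Resume software engineer experience education references", "HR/Resumes"),
]
model = train(existing)
print(classify(model, "Please find attached invoice 1003, amount due in 14 days"))
```

Retraining after a change to the classification scheme amounts to calling `train` again on the re-filed documents.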
An algorithm can be retrained any time changes are made to the classification scheme. Different candidate algorithms can be tested to see whether they produce confident results. When an algorithm proves it can classify documents with acceptable accuracy, it can be published. When auto-filing, the system can raise a flag if the calculated classification does not reach a threshold level of confidence.
Unfortunately, it's not that simple. Using document content to determine where to classify documents only works when the requirement is to classify similar documents together, e.g. invoices, specifications, or resumes. If, for example, the classification scheme required all the documents belonging to a customer to be located together, AI would struggle, because those documents may have little content in common.
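The confidence check could look something like the sketch below. It assumes the classifier exposes a log score per candidate location, turns those scores into probabilities, and flags the document for human review when the best probability falls below a threshold. The function names and the 0.8 default are assumptions for illustration, not values taken from any product.

```python
import math


def to_probabilities(log_scores):
    """Convert per-location log scores into probabilities that sum to 1."""
    highest = max(log_scores.values())
    # Subtract the highest score first to keep the exponentials stable.
    exp_scores = {loc: math.exp(s - highest) for loc, s in log_scores.items()}
    total = sum(exp_scores.values())
    return {loc: value / total for loc, value in exp_scores.items()}


def file_or_flag(log_scores, threshold=0.8):
    """Pick the best location and flag the result if confidence is too low."""
    probabilities = to_probabilities(log_scores)
    location = max(probabilities, key=probabilities.get)
    confidence = probabilities[location]
    return {
        "location": location,
        "confidence": confidence,
        "needs_review": confidence < threshold,
    }


# A clear winner is filed automatically; a near tie is flagged for review.
print(file_or_flag({"Finance/Invoices": -10.0, "HR/Resumes": -14.0}))
print(file_or_flag({"Finance/Invoices": -10.0, "HR/Resumes": -10.2}))
```

Choosing the threshold is a trade-off: a higher value sends more documents to a person, a lower value risks more misfiled records.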
Micro Focus IDOL (Intelligent Data Operating Layer) attempts to overcome some of these challenges. IDOL runs on an independent server and associated database. IDOL uses AI to automatically gather and process information from multiple repositories using connectors and a global relational index. Among its capabilities:
- Executes automatic record classification and folder creation based on categories that exist on the IDOL server. When IDOL creates a category, it analyzes the training documents and produces a list of terms with weights to indicate the importance of each term. Users can adjust the terms to fine-tune the auto-classification
- The IDOL Image Server detects new images, extracts text and saves it as an OCR rendition of the original
- Content can stay in authoring applications. TRIM has a manage-in-place framework to apply holds to content in external repositories
- IDOL indexes metadata to speed up searches
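The idea of a category being a list of weighted terms that a user can adjust can be illustrated with a short sketch. This is not IDOL's API or its actual weighting algorithm; the weighting formula, function names, and sample categories below are all assumptions chosen to show the principle.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def build_term_weights(training_docs):
    """Map each category to {term: weight} from its training texts.

    A term weighs more when it is frequent in the category and appears
    in few other categories (an assumed, TF-IDF-like formula).
    """
    term_counts = {
        category: Counter(t for text in texts for t in tokenize(text))
        for category, texts in training_docs.items()
    }
    n_categories = len(term_counts)
    weights = {}
    for category, counts in term_counts.items():
        total = sum(counts.values())
        weights[category] = {}
        for term, count in counts.items():
            seen_in = sum(1 for c in term_counts.values() if term in c)
            weights[category][term] = (count / total) * math.log(1 + n_categories / seen_in)
    return weights


def score(weights, text):
    """Score a document per category by summing the weights of its terms."""
    tokens = tokenize(text)
    return {cat: sum(tw.get(t, 0.0) for t in tokens) for cat, tw in weights.items()}


# Hypothetical categories and training texts.
training_docs = {
    "Invoices": ["invoice amount due", "invoice payment due"],
    "Contracts": ["contract parties agreement", "agreement term payment"],
}
weights = build_term_weights(training_docs)
print(score(weights, "payment due on invoice"))

# A user fine-tunes the category by boosting one term's weight by hand.
weights["Contracts"]["payment"] = 2.0
print(score(weights, "payment"))
```

Manual adjustments like the last step are how a content-based model can be nudged toward a classification scheme that does not follow the natural grouping of the documents.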
The number of classification scheme variants challenges traditional forms of AI. Algorithms are trained on documents that can be grouped by their content, but this natural grouping (e.g. invoices together, statements together, contracts together) is not always reflected in the classification scheme.
IDOL helps to overcome the mismatches between the AI-generated output and the in-situ classification scheme by adjusting term weightings to provide the required outcome. This can still be problematic with many classification schemes.
In this post, we've only looked at simple auto-classification, where we've seen that AI's success depends on the classification scheme used. The challenge of making TRIM disappear for users is further compounded when considering the context and provenance aspects of each document.