Tackling Data Compatibility and Unstructured Data Challenges in Travel Insurance - Ancileo

Building an enterprise-level AI module for travel insurance claims is complex. Claims processing requires handling diverse data formats, interpreting detailed information, and applying judgment beyond simple automation.

When developing Lea’s AI claims module, we faced challenges like outdated legacy systems, inconsistent data formats, and evolving fraud tactics. These hurdles demanded not only technical skill but also adaptability and problem-solving.

In this article series, we’ll share the in-depth journey of building Lea’s AI eligibility assessment module: the challenges, key insights, and technical solutions we applied to create an enterprise-ready system for travel insurance claims processing.


Challenge: Data Compatibility and Handling Unstructured Data

Travel insurance claims arrive with unpredictable data, so the processing system must be built to handle unexpected data patterns.

Key Learnings:

  1. Document Classification Reduces Errors: Advanced AI categorizes unstructured documents (receipts, medical bills, boarding passes) into predefined types, allowing only relevant data to be processed.
  2. Dual-Layer Verification Enhances Accuracy: A two-layer protocol dynamically flags anomalies (e.g., unexpected currencies or incomplete data) to ensure claims are reviewed with precision.
  3. Real-Time Event Mapping Speeds Decision-Making: Mapping extracted data to specific events, such as flight delays or stolen luggage, accelerates processing and minimizes manual intervention.

At Ancileo, our journey to build a real-time AI claims module was anything but straightforward. Traditional AI solutions, like the classic LLM-ETL (Large Language Model – Extract, Transform, Load) pipeline, rely on batch processing and offline databases. Our challenge was clear: design a system capable of parsing, normalizing, and transforming unstructured, multi-format claim documents in real time.


Why Real-Time Matters in Travel Insurance ⏱️

Consider a traveler whose return flight was canceled, compelling them to purchase an alternative ticket at a significantly higher cost. They promptly submit a claim for reimbursement, but delays in claim processing prolong their financial burden and test their trust in their insurance provider.

In such scenarios, speed and accuracy are essential for ensuring a seamless recovery from travel disruptions and maintaining customer satisfaction.

The Challenge: Handling Diverse and Messy Data

Travel insurance claims data comes in a wide range of formats—structured data such as JSON files from digital claims forms and unstructured data such as scanned receipts, PDFs, or images of handwritten documents, including medical bills, flight itineraries, police reports, or compensation notes.

Each document type serves a specific purpose. For instance:

  • A delayed baggage claim may require receipts for essential items purchased during the delay.
  • A trip cancellation claim may necessitate proof of the incident, such as a medical certificate or official cancellation notice.

It is common for a single claim to include multiple incidents, each requiring a distinct set of supporting documents. For example, a traveler may simultaneously file for reimbursement of delayed baggage essentials and a trip cancellation due to a family emergency. Each claim component must be correctly categorized and assessed individually to ensure accurate processing.

The Reality: Disorganized and Inconsistent Submissions

Claim submissions often arrive in disarray:

  • Document formats and quality vary widely, from high-resolution PDFs to blurry, misaligned photos.
  • Claimants provide inconsistent or incomplete labels and tags, making automated categorization unreliable.
  • Supporting documents may be irrelevant or incomplete, requiring additional validation steps.

These challenges cause bottlenecks in the processing pipeline, impacting the speed and precision needed for real-time claims assessment. Resolving these issues requires a system that can classify, analyze, and extract actionable data from both structured and unstructured sources, regardless of format or quality.


Our approach to addressing this 🚀

To address these challenges, Lea’s AI module includes a dedicated sub-module, Agentic Graph, which analyzes, classifies, and extracts data from claim documents. It uses a type- and key-value-based mapping system to link relevant documents with specific events under assessment.

Agentic Graph functions as the document-handling core of Lea’s AI claims module, processing both structured and unstructured data in real time. This system enables flexibility and iterative refinement in document categorization and assessment.

Our solution is specifically developed to address the complexities of travel insurance claims processing.

1. Understanding and Classifying Unstructured Data 📁

Unstructured data in travel insurance is notoriously complex. We’re not just talking about extracting text from a PDF; we’re dealing with documents that arrive in unpredictable formats, including scanned images with hard-to-read handwriting, such as:

  • Medical certificates with handwritten diagnoses, which may include non-standard abbreviations or difficult-to-read text.
  • Receipts from foreign hospitals written in different languages or scripts.
  • Scanned boarding passes that are misaligned, blurred, or include annotations.

 


 

Our tailor-made workflow dynamically processes unstructured data by identifying, classifying, and extracting relevant information from a variety of document types. It begins with:

Global Understanding: Using NLP (Natural Language Processing), computer vision, and GenAI, the system identifies document types and recognizes key patterns. Whether the document is a high-quality PDF or a blurry smartphone photo, the system adapts in real time.


2. Document Category Analyzer: Dynamic and Intelligent 🧠

Once a document is processed for initial understanding, it is classified using our Document Category Analyzer. This module leverages deep learning to assign each document to a specific category, such as:

  • Policy documentation for verifying coverage details.
  • Claim forms containing essential data fields.
  • Police reports for validating incidents.
  • Medical receipts for reimbursing expenses.
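The routing idea behind these categories can be sketched in a few lines of Python. This is only an illustration: the production analyzer is described as a deep learning model, whereas the sketch below swaps in simple keyword scoring as a stand-in. The keyword lists, the `classify_document` name, and the `min_hits` threshold are all assumptions made for the example.

```python
# Keyword scoring as a stand-in for a trained classifier (illustrative only).
# Category names follow the list above; the keywords are hypothetical.
CATEGORY_KEYWORDS = {
    "policy_documentation": ("policy schedule", "coverage", "insured"),
    "claim_form": ("claim form", "claimant", "declaration"),
    "police_report": ("police", "officer", "incident report"),
    "medical_receipt": ("hospital", "clinic", "diagnosis", "invoice"),
}


def classify_document(text, min_hits=2):
    """Return the best-matching category, or "unknown" when evidence is weak."""
    lowered = text.lower()
    scores = {
        category: sum(keyword in lowered for keyword in keywords)
        for category, keywords in CATEGORY_KEYWORDS.items()
    }
    best = max(scores, key=scores.get)
    # Documents with too few matches are not forced into a category.
    return best if scores[best] >= min_hits else "unknown"


print(classify_document("City Hospital invoice. Diagnosis: sprained ankle."))
print(classify_document("Thank you for shopping with us!"))
```

Returning "unknown" instead of the nearest category is the part worth keeping from this sketch: it lets irrelevant or unreadable uploads be set aside rather than processed as if they were valid evidence.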

To enhance accuracy and ensure contextual awareness, we’ve developed and implemented an Analyzer and Challenger Protocol:

 

🔄 Analyzer and Challenger Protocol

  • When a document contains inconsistencies or unusual data points, our Challenger module automatically flags it. For example, if a bill from Malaysia lists expenses in USD instead of the local currency, the Challenger questions this anomaly.
  • The flagged document is then re-evaluated by the Analyzer, which uses advanced validation algorithms to recheck the data, cross-reference additional sources if needed, and confirm the authenticity or correctness of the information.
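A minimal sketch of this two-step flow, using the Malaysian bill example, might look like the following. The function names `challenge` and `analyze`, the country-to-currency table, and the outcome labels are hypothetical; the real Analyzer is described as cross-referencing additional sources, which is reduced here to a simple routing decision.

```python
# Illustrative subset of expected local currencies (ISO 4217 codes).
LOCAL_CURRENCY = {"MY": "MYR", "FR": "EUR", "US": "USD"}


def challenge(doc):
    """Challenger step: return a list of anomaly flags for an extracted document."""
    flags = []
    expected = LOCAL_CURRENCY.get(doc.get("country"))
    if expected and doc.get("currency") != expected:
        flags.append(f"currency {doc.get('currency')} differs from expected {expected}")
    for key in ("amount", "date"):
        if not doc.get(key):
            flags.append(f"missing field: {key}")
    return flags


def analyze(flags):
    """Analyzer step (simplified): decide what happens to a flagged document."""
    if not flags:
        return "accepted"
    if any(flag.startswith("missing field") for flag in flags):
        return "request_more_information"
    return "manual_review"


bill = {"country": "MY", "currency": "USD", "amount": "120.00", "date": "2024-05-10"}
flags = challenge(bill)
print(flags, "->", analyze(flags))
```

Keeping the flagging and the decision in separate functions mirrors the dual-layer idea: the first pass only raises questions, and the second pass is the only place where an outcome is chosen.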

Our dynamic protocol continuously learns and updates, making real-time adjustments as new data patterns emerge.

Challenge Overcome: By developing this dual-layer verification system, we ensure that even unexpected or anomalous data gets a second look, significantly enhancing classification accuracy.

Travel insurance claims span a wide range of variations. Our Analyzer and Challenger Protocol ensures that discrepancies are identified and resolved, making our system robust and reliable across different regions and data formats.


3. Advanced Data Extraction and Error Handling 🔍

To ensure data quality, our pipeline includes validation checks that safeguard data integrity. This involves confirming that numeric fields, such as claim amounts, are formatted correctly in the appropriate currency, and that dates adhere to a standardized format such as ISO 8601. Any discrepancies trigger automated alerts or error-correction mechanisms, improving the accuracy and reliability of the extracted data.
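As a rough illustration of such checks, the sketch below validates an amount, a currency code, and a date using only the Python standard library. The field names, the `validate_fields` function, and the short currency list are assumptions for the example, not the module's actual schema.

```python
from datetime import date
from decimal import Decimal, InvalidOperation

# Illustrative subset of accepted ISO 4217 currency codes.
KNOWN_CURRENCIES = {"USD", "EUR", "MYR", "SGD"}


def validate_fields(record):
    """Return a list of validation errors; an empty list means the record passed."""
    errors = []

    # Amount must parse as a finite, positive decimal number.
    try:
        amount = Decimal(str(record.get("claim_amount")))
        if not amount.is_finite() or amount <= 0:
            errors.append("claim_amount must be a positive number")
    except InvalidOperation:
        errors.append("claim_amount is not numeric")

    if record.get("currency") not in KNOWN_CURRENCIES:
        errors.append("currency is missing or not recognized")

    # Date must be an ISO 8601 calendar date such as 2023-12-01.
    try:
        date.fromisoformat(str(record.get("service_date")))
    except ValueError:
        errors.append("service_date is not an ISO 8601 date")

    return errors


print(validate_fields({"claim_amount": "five hundred", "currency": "EUR",
                       "service_date": "01/12/2023"}))
```

Collecting every error instead of stopping at the first one means a single alert can tell a reviewer, or an automated correction step, everything that is wrong with the record.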

Building on this, once unstructured data is processed and categorized, our system moves to a dynamic Data Extraction phase. By classifying each document into a relevant category, our system can load a tailored data extraction schema for that specific category, ensuring highly relevant and precise data extraction instead of generic or incomplete results.
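One simple way to picture category-specific schemas is a lookup table keyed by document category. The schema contents and the `select_fields` helper below are hypothetical and only show the shape of the idea: the category decides which fields are kept, and missing fields are made explicit.

```python
# Hypothetical extraction schemas: the fields expected for each category.
EXTRACTION_SCHEMAS = {
    "medical_receipt": ["provider_name", "claim_amount", "currency", "service_date"],
    "boarding_pass": ["passenger_name", "flight_number", "departure_date"],
    "police_report": ["report_number", "incident_date", "incident_location"],
}


def select_fields(category, raw_extraction):
    """Keep only the fields defined for this category; absent ones become None."""
    if category not in EXTRACTION_SCHEMAS:
        raise ValueError(f"no extraction schema for category: {category}")
    return {name: raw_extraction.get(name) for name in EXTRACTION_SCHEMAS[category]}


raw = {"flight_number": "AF123", "seat": "12A"}
print(select_fields("boarding_pass", raw))
```

Fields outside the schema (the seat number here) are dropped, while fields the schema expects but the extraction missed show up as `None`, which makes gaps easy to detect downstream.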

Our AI-driven approach enables the module to:

  • Extract claim amounts from handwritten receipts, even when embedded in complex, unstructured text.
  • Identify dates and times from poorly scanned documents.
  • Pull out policy numbers from documents written in multiple languages.

This adaptive method ensures that data extraction is accurate and contextually appropriate, tailored to each document type for optimal efficiency.


4. Event Mapping or Peril Mapping 🔍

Travel insurance claims often involve multiple perils—distinct risks or incidents that activate coverage, such as trip disruptions, lost baggage, or medical emergencies. Each peril introduces unique processing requirements, including documentation, eligibility criteria, and contextual analysis, making claims inherently complex.

For example:

  • Losses and Personal Losses: A case of stolen luggage may require documentation from local authorities, proof of ownership, and a detailed event timeline.
  • Trip Disruptions: Missed connections or emergency cancellations might need several supporting documents to ensure the correct benefits are applied.
  • Flight Delays: Even a simple flight delay claim can involve variables such as the duration, reasons provided by the airline, and subsequent changes to the travel itinerary.

Each peril must be evaluated through a detailed framework that considers the claim type (what happened), benefit (what is covered), and event (under what circumstances the coverage applies). Accurate data extraction and mapping become crucial for efficient claim processing.

This is where Event Mapping, or Peril Mapping, comes into play. It connects the extracted data to structured fields in our system, ensuring every piece of information is aligned correctly for assessment. Once data is processed, it is transformed into a standardized format, like JSON, with clearly defined fields (e.g., {"patient_id": "1234", "claim_amount": 500, "service_date": "2023-12-01"}), making it ready for immediate use by the module.
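The standardization step can be sketched as a mapping from whatever labels appear on a document to the fixed field names shown above. The alias table and the `to_standard_json` function are assumptions made for illustration; only the three output field names come from the example in the text.

```python
import json

# Hypothetical mapping from labels found on documents to standardized field names.
FIELD_ALIASES = {
    "patient no": "patient_id",
    "total": "claim_amount",
    "date of service": "service_date",
}


def to_standard_json(raw):
    """Rename recognized labels to standard fields and serialize as JSON."""
    record = {}
    for label, value in raw.items():
        key = FIELD_ALIASES.get(label.strip().lower())
        if key is not None:  # unrecognized labels are left out of the record
            record[key] = value
    return json.dumps(record, sort_keys=True)


print(to_standard_json({"Patient No": "1234", "TOTAL": 500,
                        "Date of Service": "2023-12-01"}))
```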

The Complexity of Peril Mapping

The real challenge lies in dynamically mapping data to one or multiple perils in real time. A single document, such as a hospital bill in a foreign language, might relate to a medical emergency but also serve as evidence for trip interruption coverage. A boarding pass showing a flight delay may need to be cross-referenced with hotel and meal receipts to assess a broader trip disruption claim.

Our event mapping handles this complexity by understanding and organizing vast amounts of information in a structured, actionable format. 

For example:

  • A “€500 medical expense” extracted from a French hospital receipt is mapped precisely to the corresponding peril, so the AI module uses only the relevant data for assessment.
  • A “flight delay” extracted from a boarding pass image is linked to the appropriate peril, ensuring accuracy in the evaluation process.
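The many-to-many nature of this mapping, where one document can support several perils, can be illustrated with a small lookup. The peril names, the evidence table, and the `map_to_perils` function are hypothetical; a real mapping would also consider the extracted values, not only the document type.

```python
# Hypothetical table: which document types count as evidence for each peril.
PERIL_EVIDENCE = {
    "medical_emergency": {"hospital_bill", "medical_certificate"},
    "trip_interruption": {"hospital_bill", "boarding_pass", "hotel_receipt", "meal_receipt"},
    "flight_delay": {"boarding_pass", "airline_delay_notice"},
}


def map_to_perils(documents, claimed_perils):
    """Group document ids under every claimed peril they can support."""
    mapping = {peril: [] for peril in claimed_perils}
    for doc in documents:
        for peril in claimed_perils:
            if doc["type"] in PERIL_EVIDENCE.get(peril, set()):
                mapping[peril].append(doc["id"])
    return mapping


docs = [
    {"id": "d1", "type": "hospital_bill"},
    {"id": "d2", "type": "boarding_pass"},
    {"id": "d3", "type": "hotel_receipt"},
]
print(map_to_perils(docs, ["medical_emergency", "trip_interruption"]))
```

In this example the hospital bill appears under both perils, while the boarding pass and hotel receipt are attached only to the trip interruption, so each assessment sees just the evidence relevant to it.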

By automating this intricate mapping, we ensure claims are processed swiftly and with a high degree of precision. This advanced, real-time Event Mapping approach not only reduces errors but also enhances the overall customer experience.


5. Transforming Data for Seamless AI Integration 🔄

By the end of our pipeline, every piece of data, structured or unstructured, has been transformed into a consistent, AI-ready format. This makes it ready for real-time claims assessment, reducing delays and improving decision-making.

Why Our Approach is Beneficial:

  • Speed: Real-time data processing accelerates claim resolution, minimizing delays and improving customer experience.
  • Accuracy: Advanced AI models reduce errors associated with manual data entry.
  • Scalability: Our system learns and adapts, meaning it’s future-proof and ready to handle emerging challenges in travel insurance.


The Bottom Line: Practical Solutions for Travel Insurance Claims Processing


Our AI claims module tackles the practical challenges of unstructured data and complex claim scenarios by enabling precise data handling, event mapping, and real-time processing. Built with adaptability and accuracy in mind, it’s designed to navigate the unique requirements of travel insurance claims.

If optimizing claim processing and improving operational efficiency are priorities for your business, we’re here to help. Let’s connect to explore how our solution can support your needs.
