In the fast-evolving world of AI-driven automation, innovation doesn’t come from a single model or algorithm — it comes from the synergy between different technologies. At our company, we’ve built a computer vision platform designed to automatically assess vehicle damage from images. But as our solution matured, we realized something: while our visual models could see damage very well, they couldn’t explain it, contextualize it, or communicate it effectively to users and stakeholders.
That’s when we decided to bring Generative AI (Gen AI) into the equation.
Our existing computer vision pipeline was highly capable of identifying and localizing damage — scratches, dents, broken parts, paint issues, and other imperfections. However, the process stopped at structured outputs and visual highlights, which didn’t tell a complete story on their own. For insurers, repair shops, and fleet managers, that wasn’t enough. They wanted:
- The ability to handle unstructured data such as low-quality images.
- Human-like summaries that can be shared with non-technical clients.
- Consistent documentation that aligns with their existing claims or maintenance systems.
That’s where generative AI models, especially large language models (LLMs), became the missing piece.
We approached the integration as a layered system — each model focusing on its strength, with Gen AI acting as the “interpreter.” Here’s how it works:
Our computer vision models (CNN- and transformer-based architectures such as YOLOv8 and ViT) process vehicle images to detect damaged areas, classify the type of damage, and estimate severity. This layer outputs structured metadata such as:
{"part": "front bumper",
"damage_type": "scratch",
"severity": "moderate",
"confidence": 0.94 }
Next, we introduced Gen AI — specifically, large language models fine-tuned on automotive and insurance data. These models interpret the structured output and produce natural language insights such as:
“The front bumper shows moderate scratches likely caused by a low-speed frontal impact. The estimated repair involves repainting and minor surface restoration.”
This turns technical outputs into actionable intelligence.
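As a sketch of this step, assuming an OpenAI-compatible chat endpoint (the model name below is a placeholder for our fine-tuned model), the structured findings are passed to the LLM with a tightly scoped system prompt:

```python
import json
from openai import OpenAI

client = OpenAI()  # assumes an OpenAI-compatible endpoint

SYSTEM_PROMPT = (
    "You are an automotive damage assessor. Given structured detection "
    "results, write a concise, factual summary for an insurance report. "
    "Do not speculate beyond the provided fields."
)

def summarize(findings: list[dict]) -> str:
    """Turn structured detections into a natural-language damage summary."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder for the fine-tuned model
        temperature=0.2,      # low temperature keeps wording consistent
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": json.dumps(findings)},
        ],
    )
    return response.choices[0].message.content
```

The low temperature and the “do not speculate” instruction are what keep generated summaries consistent from claim to claim.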
Finally, we use generative models to tailor the results to each audience (see the sketch after this list):
- For insurers: concise damage summaries and repair cost estimates.
- For customers: plain-language explanations with annotated images.
- For internal teams: structured JSON reports for system integration.
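A minimal sketch of that tailoring, building on the `summarize()` pattern above; the audience names and instructions are illustrative:

```python
import json

# Illustrative audience-specific instructions layered on the summarizer above.
AUDIENCE_STYLES = {
    "insurer": "Write a concise damage summary followed by a repair-scope note.",
    "customer": "Explain the damage in plain language a non-expert can follow.",
    "internal": "Return only valid JSON with keys: part, damage_type, severity, action.",
}

def build_prompt(findings: list[dict], audience: str) -> str:
    """Build the audience-specific user message that is sent to the LLM."""
    instruction = AUDIENCE_STYLES[audience]
    return f"{instruction}\n\nDetections:\n{json.dumps(findings, indent=2)}"
```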
By combining computer vision with generative AI, we created a full-cycle solution — from detection to decision.
The integration delivered results beyond our initial expectations:
- Faster Assessments: Reduced manual review time by over 60%.
- Improved Consistency: Natural language outputs are standardized and error-checked.
- Enhanced Customer Experience: Reports now read like they were written by an expert, not a machine.
- Smarter Decision-Making: The system explains why certain classifications were made, improving transparency and trust.
Implementing Gen AI into an existing computer vision pipeline wasn’t plug-and-play. Some lessons we learned:
- Data alignment is key — language models need structured, well-annotated data from vision outputs to perform well.
- Prompt engineering matters — we designed specialized prompts that help the Gen AI model stay consistent and accurate (an example follows this list).
- Human oversight remains critical — even the best AI pair needs quality assurance for edge cases and ambiguous results.
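For a flavor of what those prompts enforce, here is a simplified, illustrative guardrail prompt; the production versions are longer and tied to our schema:

```python
# Illustrative guardrail prompt; production prompts are longer and schema-specific.
ASSESSOR_PROMPT = """\
Role: certified vehicle damage assessor.
Input: JSON detections with fields part, damage_type, severity, confidence.
Rules:
- Mention every detected part exactly once.
- If confidence is below 0.5, flag the finding as "needs human review".
- Never describe damage that is not present in the input.
Output: two to four sentences of plain English.
"""
```

Rules like the low-confidence flag are also where the human-oversight point above gets wired in: ambiguous findings are routed to a reviewer rather than summarized outright.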
The fusion of visual AI and generative AI marks a new era for automation in the automotive and insurance industries. Our next steps include:
- Expanding to video-based assessments for dynamic inspections.
- Using Gen AI to simulate repair outcomes or predict future damage risks.
- Integrating with claims management systems for full-cycle automation.
What started as a project to improve image understanding has evolved into a system that can reason, explain, and communicate — just like a human assessor, only faster and more consistently.