Comprehensive screenshot analysis with real-time recommendations
The client is a leading player in the high-tech industry.
The client asked XPG Factor to create an intelligent LLM-based multimodal solution that could efficiently handle various system interfaces used within the company. The idea was to enable the solution to analyze screenshots uploaded by employees, identify interface elements such as fields, buttons and menus, and interpret the visual context. This would allow it to provide clear and accurate instructions for new employees and accelerate the onboarding process.
The main challenge for us was the high variability of interfaces. The system had to be flexible enough to recognize and interpret dozens of different screens and interface elements.
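To make the intended flow more concrete, here is a minimal sketch of how detected interface elements might be turned into step-by-step guidance. It is an illustration only, not the delivered implementation: the `UIElement` schema, the `ACTION_BY_KIND` templates and the `build_guidance` function are hypothetical names, and the detection step itself (done by the multimodal model in the real system) is replaced with hand-written sample data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UIElement:
    """One interface element detected in a screenshot (hypothetical schema)."""
    kind: str    # e.g. "field", "button", "menu"
    label: str   # visible text, as read from the screenshot
    x: int       # left edge of the bounding box, in pixels
    y: int       # top edge of the bounding box, in pixels


# Hypothetical phrasing per element kind; in practice the LLM would word this.
ACTION_BY_KIND = {
    "field": "Fill in the '{label}' field.",
    "menu": "Open the '{label}' menu.",
    "button": "Click the '{label}' button.",
}


def build_guidance(elements: list[UIElement]) -> list[str]:
    """Turn detected elements into numbered steps in top-to-bottom reading order."""
    ordered = sorted(elements, key=lambda e: (e.y, e.x))
    steps: list[str] = []
    for element in ordered:
        template = ACTION_BY_KIND.get(element.kind)
        if template is None:
            continue  # skip element kinds we have no instruction for
        steps.append(f"{len(steps) + 1}. " + template.format(label=element.label))
    return steps


# Sample data standing in for the model's output on one screenshot.
detected = [
    UIElement("button", "Submit", x=400, y=300),
    UIElement("field", "Employee ID", x=40, y=120),
    UIElement("menu", "Department", x=40, y=200),
]
for step in build_guidance(detected):
    print(step)
```

Because the guidance is derived from whatever elements are found rather than from a fixed list of known screens, the same code path can serve interfaces it has not seen before, which is the kind of flexibility the variability challenge calls for.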

Our team developed a custom AI solution with an architecture specifically designed for scaling and multimodal data processing. At its core is a FastAPI microservice running on a GPU-accelerated Azure virtual machine, which allows the system to process images and generate responses almost instantaneously. As a result, the system doesn’t just provide standardized prompts, but creates informed and contextually relevant recommendations that actually help employees make the right decisions in real time.
During the project, it was important for us to ensure that the system complied with strict rules on data transfer and storage, especially for screenshots, which could contain confidential information. Our solution is therefore designed to meet all data protection requirements, including environment isolation, encryption and access control.
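As a rough illustration of two of these principles, access control and keeping screenshots out of persistent storage, the sketch below checks the caller's role and records only a SHA-256 digest of the upload instead of the image itself. The role names, the `AuditRecord` type and the `admit_screenshot` function are assumptions made for this example; encryption and environment isolation are handled at the infrastructure level and are not shown.

```python
import hashlib
from dataclasses import dataclass

ALLOWED_ROLES = {"employee", "trainer"}  # hypothetical role names


@dataclass(frozen=True)
class AuditRecord:
    """What is kept after a request: metadata only, never the pixels."""
    user: str
    sha256: str       # digest of the screenshot bytes
    size_bytes: int


def admit_screenshot(user: str, role: str, image: bytes) -> AuditRecord:
    """Check access, then return an audit record that holds no image data."""
    if role not in ALLOWED_ROLES:
        raise PermissionError(f"role {role!r} may not submit screenshots")
    if not image:
        raise ValueError("empty upload")
    digest = hashlib.sha256(image).hexdigest()
    return AuditRecord(user=user, sha256=digest, size_bytes=len(image))


record = admit_screenshot("alice", "employee", b"fake-image-bytes")
print(record.user, record.size_bytes, record.sha256[:12])
```

Storing a digest rather than the image lets a request be traced later without retaining content that may be confidential.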
The testing process consisted of three main stages: validating all components, running controlled tests on real system screenshots, and conducting a pilot deployment. Each stage was followed by targeted improvements, enabling us to deliver a fully operational and effective solution.
Our system helped the client to optimize and improve key business operations.
Quick staff training
Thanks to step-by-step guidance, new employees adapted much faster. According to the client, the onboarding time was reduced by 40–50%.
No-code updates
The system can instantly adapt to new interfaces thanks to its self-learning architecture. This eliminated the need for the client to hire developers for each update of third-party solutions.
This project is under NDA, so specific client details and proprietary information cannot be disclosed.