UI interpretation system

Comprehensive screenshot analysis with real-time recommendations

Self-learning, no manual work
Training time cut by up to 50%

About the client

A leading player in the high-tech industry.

How it all started

The client asked XPG Factor to create an intelligent LLM-based multimodal solution that could efficiently handle various system interfaces used within the company. The idea was to enable the solution to analyze screenshots uploaded by employees, identify interface elements such as fields, buttons and menus, and interpret the visual context. This would allow it to provide clear and accurate instructions for new employees and accelerate the onboarding process.

Challenges we faced

The main challenge was the high variability of interfaces: the system had to be flexible enough to recognize and interpret dozens of different screens and interface elements.

Implemented solution

Our team developed a custom AI solution with an architecture designed for scaling and multimodal data processing. At its core is a FastAPI microservice running on a GPU-accelerated Azure virtual machine, which lets the system process images and generate responses in near real time. As a result, the system does more than return standardized prompts: it produces informed, contextually relevant recommendations that help employees make the right decisions as they work.
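
The request flow described above can be sketched as follows. This is only an illustration under stated assumptions: the names UIElement, Recommendation, analyze_screenshot, fake_detect and fake_advise are hypothetical, the multimodal model is replaced by stand-in functions, and the sketch uses only the Python standard library rather than the actual FastAPI service.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class UIElement:
    kind: str   # hypothetical element type, e.g. "field", "button", "menu"
    label: str  # visible text of the element


@dataclass
class Recommendation:
    screen_summary: str  # what was recognized on the screen
    steps: List[str]     # instructions returned to the employee


def analyze_screenshot(
    image_bytes: bytes,
    detect: Callable[[bytes], List[UIElement]],
    advise: Callable[[List[UIElement]], List[str]],
) -> Recommendation:
    """Run one screenshot through detection and guidance generation."""
    if not image_bytes:
        raise ValueError("empty screenshot upload")
    elements = detect(image_bytes)  # stand-in for the multimodal model
    summary = ", ".join(f"{e.kind} '{e.label}'" for e in elements)
    return Recommendation(screen_summary=summary, steps=advise(elements))


# Stand-ins (assumptions) so the sketch runs without a model or a GPU.
def fake_detect(image_bytes: bytes) -> List[UIElement]:
    return [UIElement("field", "Customer ID"), UIElement("button", "Save")]


def fake_advise(elements: List[UIElement]) -> List[str]:
    return [f"Use the {e.kind} '{e.label}'." for e in elements]


if __name__ == "__main__":
    result = analyze_screenshot(b"fake-png-bytes", fake_detect, fake_advise)
    print(result.screen_summary)
    for step in result.steps:
        print(step)
```

Passing the detection and guidance steps in as functions keeps the flow testable without the model; in a real service they would presumably wrap the model call and the knowledge lookup.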

Solution’s key attributes

  • Robust operation
    A vector knowledge database combined with Retrieval-Augmented Generation (RAG) lets the system adapt quickly to new interfaces. It also remains stable during peak loads and simultaneous multi-user access.
  • Logically structured guidance
    Rather than isolated facts, the system provides context-aware instructions that match the employee’s workflow.
  • Multiple roles and scenarios
    The system adapts to different roles: its functionality can be tailored to supervisors and technical administrators, or used for training purposes.
  • Live knowledge control
    The system supports real-time updates from admins: they can input new guidance, revise existing content or refine decision logic.
  • User activity monitoring
    With the query history function, the system helps to identify typical user problems, refine recommendations and optimize training procedures.
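
The retrieval, live-update and query-history ideas in the list above can be illustrated with a small in-memory sketch. It rests on several assumptions: the KnowledgeBase class, its methods and build_prompt are hypothetical names, a toy bag-of-words vector stands in for a real embedding model, and a dictionary stands in for the vector database.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def embed(text: str) -> Counter:
    """Toy embedding (assumption): word counts instead of a learned vector."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(count * b[token] for token, count in a.items() if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class KnowledgeBase:
    """In-memory stand-in for a vector knowledge database."""

    def __init__(self) -> None:
        self._entries: Dict[str, Tuple[str, Counter]] = {}
        self.history: List[str] = []  # query log for later analysis

    def upsert(self, entry_id: str, text: str) -> None:
        """Add or revise guidance; takes effect on the next search."""
        self._entries[entry_id] = (text, embed(text))

    def search(self, query: str, top_k: int = 2) -> List[str]:
        """Return the top_k guidance texts most similar to the query."""
        self.history.append(query)
        query_vec = embed(query)
        ranked = sorted(
            self._entries.values(),
            key=lambda entry: cosine(query_vec, entry[1]),
            reverse=True,
        )
        return [text for text, _vec in ranked[:top_k]]


def build_prompt(query: str, kb: KnowledgeBase) -> str:
    """Assemble retrieved guidance and the question into one model prompt."""
    context = "\n".join(f"- {text}" for text in kb.search(query))
    return f"Guidance:\n{context}\n\nQuestion: {query}"
```

In this sketch an admin edit is just another upsert call, so revised guidance is picked up by the next search without retraining, and the history list gives a simple record of what users ask most often.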

Security and privacy

Throughout the project, the system had to comply with strict restrictions on data transfer and storage, especially for screenshots, which could contain confidential information. Our solution is therefore designed to meet all data protection requirements through environment isolation, encryption and access control.

Testing

The testing process consisted of three main stages: validating all components, running controlled tests on real system screenshots and carrying out a pilot deployment. Each stage was followed by relevant improvements, enabling us to deliver a fully operational and effective solution.

Results & business value

Our system helped the client speed up onboarding and simplify the handling of interface updates.

Quick staff training

Thanks to step-by-step guidance, new employees adapted much faster. According to the client, onboarding time was reduced by 40–50%.

No-code updates

Thanks to its self-learning architecture, the system adapts quickly to new interfaces without code changes. The client no longer needs to hire developers each time a third-party solution is updated.

Team composition

Scrum master
Project manager
Product owner
2 backend developers
2 data scientists
DevOps engineer
Frontend developer
2 QA engineers

This project is under NDA, so specific client details and proprietary information cannot be disclosed.
