
Yen-Hua Lu, Ching-Hung Lee

Abstract

With the rapid advancement of large language models (LLMs), intelligent chatbots are increasingly being adopted for maintenance documentation, fault diagnosis, and personnel training. This study introduces a multimodal Retrieval-Augmented Generation (RAG) chatbot designed to provide accurate, natural-language support for robotic arm maintenance tasks. The system separates textual and visual content from maintenance manuals and processes them through two complementary pipelines. Caption RAG employs a vision-language model (VLM) to generate contextual captions for images, improving the retrieval of relevant documents. VLM RAG then integrates the retrieved text and associated images, using GPT-4o to deliver more precise and context-aware answers. To address industrial data-privacy concerns, the system also supports local deployment with open-source LLaMA models and Taiwan's TAIDE model. The evaluation dataset was curated and validated by senior experts from an industrial robotic arm manufacturer, ensuring strong domain alignment. Experimental results show high accuracy: 96% with GPT-4o, 92% with LLaMA 8B, and 74.67% with TAIDE 8B. Incorporating visual context via VLM RAG further improved performance to 96.67%, highlighting the benefit of multimodal integration. In summary, the proposed chatbot enhances maintenance efficiency and fault resolution while preserving data privacy, making it a practical solution for real-world industrial deployment.
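
The two-stage flow described in the abstract can be sketched in a few functions. The snippet below is a minimal illustration under stated assumptions, not the authors' implementation: it uses the OpenAI Python SDK for both the captioning VLM and the answering model, represents manual chunks as plain dictionaries, and substitutes a toy keyword-overlap ranking for a real vector-store retriever. All function names and the data layout are hypothetical.

```python
# Sketch of the Caption RAG / VLM RAG flow described in the abstract (assumptions noted above).
import base64
from openai import OpenAI

client = OpenAI()  # requires OPENAI_API_KEY in the environment


def _encode_image(image_path: str) -> str:
    """Read an image file and return a base64 data URL payload."""
    with open(image_path, "rb") as f:
        return base64.b64encode(f.read()).decode()


def caption_image(image_path: str) -> str:
    """Caption RAG step: ask a vision-language model to describe a manual figure for indexing."""
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{
            "role": "user",
            "content": [
                {"type": "text",
                 "text": "Describe this robotic-arm maintenance figure for retrieval indexing."},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{_encode_image(image_path)}"}},
            ],
        }],
    )
    return response.choices[0].message.content


def retrieve(query: str, docs: list[dict], k: int = 3) -> list[dict]:
    """Toy retriever: rank manual chunks (text plus image caption) by keyword overlap."""
    q = set(query.lower().split())
    def score(d: dict) -> int:
        return len(q & set((d["text"] + " " + d.get("caption", "")).lower().split()))
    return sorted(docs, key=score, reverse=True)[:k]


def answer(query: str, retrieved: list[dict]) -> str:
    """VLM RAG step: pass retrieved text and the associated images to GPT-4o for the final answer."""
    content = [{"type": "text",
                "text": f"Question: {query}\n\nManual excerpts:\n"
                        + "\n".join(d["text"] for d in retrieved)}]
    for d in retrieved:
        if "image_path" in d:
            content.append({"type": "image_url",
                            "image_url": {"url": f"data:image/png;base64,{_encode_image(d['image_path'])}"}})
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": content}],
    )
    return response.choices[0].message.content
```

In this sketch, local deployment with LLaMA or TAIDE would replace the GPT-4o calls with a locally hosted model endpoint; the retrieval and prompting structure stays the same.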


Keywords

Multimodal, RAG, Large language model, Vision-language model, Chatbot, Maintenance manuals, Robotic arm
