Chapter 1

Usage Trends and Performance of the “Generative AI Chat” and “Advanced Search” on RIKEN’s “Fugaku” Support Site

This chapter analyzes, both quantitatively and qualitatively, how the generative AI chat 'AskDona' and the 'Advanced Search', integrated into the support site of the supercomputer 'Fugaku', have influenced user behavior and the support framework, and reports the results and current status.

The background for adopting RAG technology in this project is that LLMs are trained primarily on publicly available information such as web pages and books, and have no knowledge of non-public information held by specific organizations. Since the manuals and user guides provided on the 'Fugaku Website' and 'Fugaku Support Site' are not publicly available, an LLM cannot appropriately answer specialized questions about 'Fugaku' from its training data alone. Furthermore, because the 'Fugaku Website' and 'Fugaku Support Site' are accessible only to members, LLM applications equipped with a 'web search' feature also cannot provide appropriate answers. As a result, there is a risk of generating answers based only on general information, or of generating unsupported information (hallucination).

One technology that addresses this challenge is RAG. In RAG, the system searches in real time for relevant information in data sources the LLM has not been trained on or cannot access, such as organization-specific documents, and passes that content to the LLM as reference material when generating a response. This enables both accurate answers grounded in an organization's specific, non-public information and a reduced risk of hallucination.

Our generative AI chat 'AskDona', built on our proprietary RAG solution, demonstrated compliance with the technical requirements set by R-CCS in the 'Pre-Implementation Technical Evaluation' conducted in May 2024 in preparation for joining this project. The initial version of AskDona (dona-rag-1.0) achieved a perfect score (100% answer accuracy) on all prepared evaluation questions, against the R-CCS requirement of over 80% answer accuracy, and its fundamental RAG mechanism for achieving high answer accuracy over large-volume data was thereby validated.

2.1 Technical Overview of AskDona Initial Version (dona-rag-1.0)

This section details the RAG architecture, the core technology supporting AskDona, focusing on the implementation of the initial version 'dona-rag-1.0' adopted when it was introduced to the Fugaku Support Site. At the time of the 'Pre-Implementation Technical Evaluation' conducted in May 2024, practical, verified examples of RAG were still scarce, and reliably generating correct answers from a RAG data source was an important challenge. This project therefore took the approach of ensuring response accuracy and quality by combining our proprietary technologies with the general RAG architecture.

Generally, a RAG system comprises two main processes. One is 'Pre-processing', in which information is extracted from the documents (files) that serve as the response data source to build a searchable data source; the other is 'Real-time Processing', in which information related to a user's query is retrieved from the data source and handed over to the LLM to generate an answer. In 'dona-rag-1.0', these processes are implemented as follows.

First, during 'Pre-processing', high-precision information extraction that goes beyond simple text detection is performed on documents of various formats used as data sources. Specifically, it builds on OCR (Optical Character Recognition) technology to accurately recognize and extract complex content from within the documents, such as tables with their structure preserved and specialized content such as mathematical or chemical formulas expressed in LaTeX notation.

Next, the extracted text is 'chunked' for storage in the data source, not as fixed-length splits but as dynamically determined units based on 'meaningful bundles' identified from the context of the documents. This proprietary method prevents fragmentation of information and improves search accuracy. It also has the advantage of inherently avoiding the information loss (omitted passages) that can occur when the splitting is delegated to an LLM.
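AskDona's actual chunking logic is proprietary and not described here, but the general idea of splitting on meaningful boundaries rather than at fixed offsets can be sketched as follows. This minimal sketch uses paragraph breaks as the "meaningful bundle" boundary and a character budget; both are illustrative assumptions, not the product's method.

```python
import re

def chunk_by_paragraphs(text: str, max_chars: int = 800) -> list[str]:
    """Split text along paragraph boundaries rather than at fixed character
    offsets, so each stored chunk remains a coherent unit of meaning."""
    # Paragraphs are taken as blocks separated by one or more blank lines
    # (an assumption for this sketch; real systems may use headings, layout, etc.).
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    chunks: list[str] = []
    current = ""
    for para in paragraphs:
        # Start a new chunk only when adding this paragraph would exceed the budget,
        # so no paragraph is ever cut in half.
        if current and len(current) + len(para) + 1 > max_chars:
            chunks.append(current)
            current = para
        else:
            current = f"{current}\n{para}" if current else para
    if current:
        chunks.append(current)
    return chunks
```

Because splits never fall inside a paragraph, a retrieved chunk carries its full local context, which is the property the dynamic splitting described above is after.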

Then, 'Metadata Tagging' is performed. Under conditions like those of the Fugaku Support Site, where tens of thousands of diverse manuals are handled, this process is extremely significant. In general, RAG systems face a trade-off: as more documents are added to the data source, the search space widens, increasing the chance of retrieving information that is semantically similar but actually irrelevant (noise), which lowers answer accuracy. AskDona overcomes this challenge specific to large-scale data sources with proprietary measures such as metadata tagging.

Specifically, structural information (metadata) such as the filename, page number, and section/chapter headings is attached to each text chunk. During the search step of real-time processing, this allows filtering not only by semantic similarity but also by concrete context, such as 'from a particular manual'. This metadata-based filtering effectively eliminates search noise and significantly improves the reliability of the final answers.
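The tagging-and-filtering step above can be sketched minimally as follows. The `Chunk` type and the exact metadata keys (`filename`, `page`, `section`) are assumptions for illustration; the point is that filtering on metadata shrinks the candidate set before any similarity scoring happens.

```python
from dataclasses import dataclass

@dataclass
class Chunk:
    text: str
    metadata: dict  # e.g. {"filename": "...", "page": 12, "section": "..."}

def filter_chunks(chunks: list[Chunk], **conditions) -> list[Chunk]:
    """Keep only chunks whose metadata matches every given condition,
    narrowing the search space before semantic similarity is computed."""
    return [
        c for c in chunks
        if all(c.metadata.get(key) == value for key, value in conditions.items())
    ]
```

For example, `filter_chunks(all_chunks, filename="job_scheduler_manual.pdf")` (a hypothetical filename) restricts retrieval to a single manual, so semantically similar passages from unrelated manuals can no longer surface as noise.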

The effectiveness of this approach was later advocated under the name 'Contextual Retrieval' in an article published by Anthropic (September 2024), and it is recognized as a useful method for improving RAG accuracy when handling large and diverse document collections.

Subsequently, the metadata-tagged text chunks are 'vectorized'. Here, OpenAI's text-embedding-3-large is adopted as the embedding model, converting each chunk into a 3072-dimensional vector, the maximum dimensionality available at the time, which is then stored in the data source.

In 'Real-time Processing', a question received from a user is vectorized with the same text-embedding-3-large model used during pre-processing. Using this question vector, a cosine-similarity search is performed against the data source (Retrieval), obtaining the top 10 text chunks judged most relevant (Top-K=10).
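The cosine-similarity Top-K retrieval described above can be sketched as follows, assuming the chunk embeddings have already been computed and stacked into a matrix (in the real system these would be the 3072-dimensional text-embedding-3-large vectors; the toy 2-D vectors in the usage note are purely illustrative).

```python
import numpy as np

def top_k(query_vec: np.ndarray, doc_vecs: np.ndarray, k: int = 10) -> list[int]:
    """Return the indices of the k document vectors most similar to the query,
    ranked by cosine similarity.

    Cosine similarity between two vectors equals the dot product of their
    L2-normalized forms, so we normalize once and use a matrix product.
    """
    q = query_vec / np.linalg.norm(query_vec)
    d = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    scores = d @ q                      # one similarity score per document chunk
    order = np.argsort(-scores)         # indices sorted by descending similarity
    return order[:k].tolist()
```

With Top-K=10 as in dona-rag-1.0, `top_k(question_vec, chunk_matrix, k=10)` yields the ten chunk indices whose texts are handed to the LLM. A brute-force scan like this is exact; at larger scale, approximate nearest-neighbor indexes are a common substitute, though the source does not state which index structure is used.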

Finally, the retrieved text is combined with the user's original question and a system prompt directing the form and tone of the response, and handed over to the generative model (LLM) to produce the answer.
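The final assembly step can be sketched as follows. The exact prompt layout AskDona uses is not public; the `[Source n]` labeling and the system/user message split here are assumptions, shown in the chat-message format common to current LLM APIs.

```python
def build_prompt(question: str, retrieved: list[str], system_prompt: str) -> list[dict]:
    """Assemble the messages handed to the LLM: a system prompt fixing the
    form and tone of the response, followed by the retrieved chunks and the
    user's original question."""
    # Label each retrieved chunk so the model (and the answer) can refer to it.
    context = "\n\n".join(f"[Source {i + 1}]\n{text}" for i, text in enumerate(retrieved))
    user_message = f"Context:\n{context}\n\nQuestion: {question}"
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_message},
    ]
```

Keeping the retrieved context inside the user turn, with the behavioral instructions isolated in the system prompt, is one common way to ground the answer in the retrieved chunks rather than in the model's training data.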
