Junaid Ahmed Khan

and 2 more

Data centers increasingly depend on Operational Data Analytics (ODA) for real-time insights from the vast streams of telemetry data generated by IoT devices. Effective ODA enables organizations to make informed decisions, optimize operations, and enhance system performance. Data centers typically use NoSQL databases for scalability and data diversity, but this often results in unstructured data representation, which poses significant challenges for querying. The lack of standardization, combined with schema flexibility and complex data structures, makes it difficult for system administrators to write and execute queries, which in turn complicates the automation of data retrieval tasks. Although large language models (LLMs) offer opportunities to simplify data retrieval through natural language input, they frequently generate inaccurate or hallucinated query code. This manuscript presents EXASAGE, the first ODA co-pilot that uses a knowledge graph-based approach to address these limitations. EXASAGE employs an LLM agent to convert natural language into SPARQL queries (the query language native to knowledge graphs), which are executed at a graph database endpoint. In evaluations on 1,000 prompts, EXASAGE achieved a 92.5% accuracy rate in generating correct SPARQL code, significantly outperforming the 25% accuracy of NoSQL/SQLite queries, which exhibited frequent hallucinations. Furthermore, the SPARQL queries were more concise, executed faster, and required shorter inference times, underscoring their suitability for managing real-time IoT data in data centers.
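
To make the last step of that pipeline concrete, the following is a minimal sketch of how a generated SPARQL query might be submitted to a graph database endpoint using the standard SPARQL 1.1 Protocol over HTTP. It is not EXASAGE's implementation: the endpoint URL, the `oda:` vocabulary (`fromNode`, `metric`, `timestamp`, `value`), and the helper names are all hypothetical, and the query is written by hand where an LLM agent would produce it.

```python
import json
from urllib.parse import urlencode, urlparse, parse_qs
from urllib.request import Request

# Hypothetical endpoint and vocabulary; neither is taken from the paper.
ENDPOINT = "http://localhost:7200/repositories/oda"
PREFIX = "http://example.org/oda#"


def build_sparql(node_id, metric, limit=10):
    """Return a SPARQL SELECT for the newest readings of one metric on one node.

    node_id and metric must be valid SPARQL local names (e.g. "node42");
    this sketch does no escaping or validation.
    """
    return (
        f"PREFIX oda: <{PREFIX}>\n"
        "SELECT ?time ?value WHERE {\n"
        f"  ?reading oda:fromNode oda:{node_id} ;\n"
        f"           oda:metric oda:{metric} ;\n"
        "           oda:timestamp ?time ;\n"
        "           oda:value ?value .\n"
        "}\n"
        "ORDER BY DESC(?time)\n"
        f"LIMIT {limit}"
    )


def build_request(query, endpoint=ENDPOINT):
    """Wrap the query in a SPARQL 1.1 Protocol GET request (built, not sent)."""
    url = f"{endpoint}?{urlencode({'query': query})}"
    return Request(url, headers={"Accept": "application/sparql-results+json"})


def parse_bindings(payload):
    """Flatten a SPARQL JSON results document into a list of plain dicts."""
    doc = json.loads(payload)
    return [
        {name: cell["value"] for name, cell in row.items()}
        for row in doc["results"]["bindings"]
    ]
```

Passing the request to `urllib.request.urlopen` and feeding the response body to `parse_bindings` would complete the round trip against a live endpoint. In a co-pilot setting, `build_sparql` would be replaced by the LLM agent's output for a prompt such as "show the latest power readings for node 42".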