This paper presents a cost-effective and scalable methodology for evaluating the generator in retrieval-augmented generation (RAG) systems, tailored for offline use in critical fields such as healthcare and finance. Current generator evaluations rely on online LLMs, which can hallucinate and bias scores, leading to cost-inefficient, non-reproducible, and non-scalable outcomes. Traditional NLP approaches, meanwhile, lack a multi-faceted view and depend heavily on entity- or phrase-matching mechanisms, missing deeper semantic understanding. Our approach leverages advanced semantic metrics to assess query relevance, factual accuracy, context consistency, coherence, semantic relevance, and answer correctness, and to detect hallucinations, employing specialized tools across a range of NLP tasks. The methodology outperforms LLM-based judges and RAGAS, providing stable, scalable, and cost-effective offline assessment for both RAG systems and general natural language generation tasks. We implement a bounded (0-1) scoring scheme based on harmonic means combined with PCA-based, adaptive, and entropy-based weighting for trustworthy scoring. The framework's modular architecture enables continuous metric adaptation and domain-specific customization while maintaining evaluation consistency, making it particularly valuable for high-stakes applications that require rigorous quality assessment without external API dependencies.
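A minimal sketch of the kind of aggregation the abstract describes, an entropy-weighted harmonic mean of bounded per-metric scores; the function names, the specific entropy-weighting variant, and the normalization details are illustrative assumptions, not the paper's exact implementation:

```python
import numpy as np

def entropy_weights(score_matrix: np.ndarray) -> np.ndarray:
    """Derive per-metric weights from the entropy of scores across samples.

    score_matrix: shape (n_samples, n_metrics), values in (0, 1].
    Metrics whose scores vary more across samples (lower normalized entropy)
    receive higher weight.
    """
    # Normalize each metric column into a probability distribution over samples.
    p = score_matrix / score_matrix.sum(axis=0, keepdims=True)
    # Shannon entropy per metric, normalized to [0, 1] by log(n_samples).
    h = -(p * np.log(p + 1e-12)).sum(axis=0) / np.log(score_matrix.shape[0])
    d = 1.0 - h                # degree of divergence per metric
    return d / d.sum()         # weights sum to 1

def weighted_harmonic_mean(scores: np.ndarray, weights: np.ndarray) -> float:
    """Aggregate bounded per-metric scores in (0, 1] into one score in (0, 1]."""
    return 1.0 / np.sum(weights / np.clip(scores, 1e-12, 1.0))
```

Because the weights sum to one and every metric score lies in (0, 1], the weighted harmonic mean also stays within (0, 1] and is dominated by the weakest metric, a property suited to high-stakes quality gating where a single failing dimension should pull the composite score down.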