Multi-View Stereo (MVS) reconstructs detailed 3D structures from multi-view images by establishing spatial correspondences. Although learning-based methods have significantly advanced MVS, ambiguous matching caused by textureless surfaces and lighting variations remains a challenge. To address this, we propose GAP-MVSNet, a framework that uses surface normals predicted by a monocular normal foundation model as priors to enhance geometric awareness of the reconstruction targets. These surface normal priors are integrated into the MVS pipeline at several stages to improve the robustness and accuracy of depth prediction. Specifically, we introduce a structure-aware feature pyramid network that incorporates surface normal information and applies uncertainty-aware feature resampling to extract robust image features. We further present a spatial-geometry-enhanced regularization that combines the sampled depth hypotheses with surface normals to form a spatial geometric prior, which guides cost regularization and enforces strong spatial coherence, particularly in textureless regions. Finally, we design a local consistency depth refinement module that uses surface normals to establish depth relationships as a local geometric prior, refining classification-based depth predictions so that they align more closely with the ground-truth depth. Extensive experiments on the DTU and Tanks & Temples datasets show that our method achieves state-of-the-art performance.
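The abstract does not specify how the refinement module turns surface normals into depth relationships. One standard way to relate depths through a normal, shown here as an assumption and not as GAP-MVSNet's actual module, is the local-planarity (tangent-plane) constraint. If pixel p has depth d_p and normal n, and a neighboring pixel q lies on the same local plane, then d_q = d_p (n^T K^{-1} p) / (n^T K^{-1} q), where K is the camera intrinsic matrix and p, q are homogeneous pixel coordinates. The function names `backproject_ray`, `propagate_depth` and `neighbor_depth_proposals` below are hypothetical and exist only for illustration.

```python
import numpy as np


def backproject_ray(K, u, v):
    """Return K^{-1} [u, v, 1]^T. Its z-component is 1, so X = d * ray for depth d."""
    return np.linalg.solve(K, np.array([u, v, 1.0]))


def propagate_depth(K, depth_p, normal_p, pix_p, pix_q, eps=1e-8):
    """Depth at pixel q implied by the tangent plane through pixel p.

    Assumes p and q lie on one local plane with normal `normal_p`
    (a hypothetical local-planarity prior, not the paper's exact module).
    """
    r_p = backproject_ray(K, *pix_p)
    r_q = backproject_ray(K, *pix_q)
    denom = float(np.dot(normal_p, r_q))
    if abs(denom) < eps:
        return float("nan")  # ray through q is (nearly) parallel to the plane
    return depth_p * float(np.dot(normal_p, r_p)) / denom


def neighbor_depth_proposals(K, depth, normals, u, v,
                             offsets=((-1, 0), (1, 0), (0, -1), (0, 1))):
    """Depths proposed at (u, v) by each in-bounds neighbor's tangent plane.

    depth: (H, W) depth map; normals: (H, W, 3) normal map, indexed [row=v, col=u].
    """
    H, W = depth.shape
    proposals = []
    for du, dv in offsets:
        nu, nv = u + du, v + dv
        if 0 <= nu < W and 0 <= nv < H:
            proposals.append(
                propagate_depth(K, depth[nv, nu], normals[nv, nu], (nu, nv), (u, v))
            )
    return np.array(proposals)
```

Under this assumption, the spread between a pixel's predicted depth and its neighbors' proposals can serve as a local consistency signal: on a fronto-parallel plane with normal (0, 0, 1), every proposal equals the neighbor's depth, while on a tilted plane the proposals follow the slope instead of averaging it away.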