Fusarium head blight (FHB) is an economically important disease of wheat that can cause yield losses exceeding 50%. Breeding for host resistance is the most effective control method; however, time, labor, and human subjectivity limit phenotyping efforts. A novel, high-throughput phenotyping rover was used to collect in-field RGB images of inoculated wheat spikes at multiple time points in 2021 and 2022. A deep neural network pipeline was developed to classify wheat spikes, segment healthy and diseased tissue, and quantify FHB severity as the region of intersection between the spike and disease masks. To validate the pipeline, model inferences at the plot and spike scales were compared with scores from five raters who performed disease scoring in the field and on images. The precision and throughput of the phenotyping rover and FHB quantification pipeline exceeded those of conventional rating methods. Plot-aggregate disease scores based on pipeline outputs correlated strongly with plot-level disease scores assigned by raters in the field and on imagery. When disease annotations on spike images were compared, pipeline-to-human correlations were equivalent to correlations between raters; however, location tended to influence disease assessment. The pipeline generalized well, performing well on images taken across environments, with different camera orientations, and throughout disease progression. These results demonstrate a breakthrough in FHB phenotyping and facilitate precise and efficient disease quantification on spikes and plot aggregates, across time and imaging conditions, at a level unachievable with conventional methods.
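As an illustration of the severity measure described above (the region of intersection between spike and disease masks), the following is a minimal sketch and not the authors' implementation. It assumes that masks are binary 2D grids, that severity is expressed as the intersection area divided by the spike area, and that a plot aggregate is the mean of per-spike severities. The function names `fhb_severity` and `plot_severity` are hypothetical.

```python
from math import isnan, nan
from typing import Iterable, Sequence

# Assumption: a mask is a 2D grid of 0/1 values, where 1 marks a pixel of the class.
Mask = Sequence[Sequence[int]]


def fhb_severity(spike_mask: Mask, disease_mask: Mask) -> float:
    """Fraction of spike pixels that are also disease pixels (hypothetical helper).

    Normalizing the intersection by the spike area is an assumption of this sketch.
    """
    same_shape = len(spike_mask) == len(disease_mask) and all(
        len(s_row) == len(d_row) for s_row, d_row in zip(spike_mask, disease_mask)
    )
    if not same_shape:
        raise ValueError("spike and disease masks must have the same shape")

    spike_area = 0
    diseased_area = 0
    for s_row, d_row in zip(spike_mask, disease_mask):
        for s, d in zip(s_row, d_row):
            if s:
                spike_area += 1
                if d:  # pixel lies in both masks: part of the intersection
                    diseased_area += 1

    if spike_area == 0:
        return nan  # no spike detected, so severity is undefined
    return diseased_area / spike_area


def plot_severity(spike_severities: Iterable[float]) -> float:
    """Mean per-spike severity for one plot, skipping undefined values (assumed aggregate)."""
    valid = [v for v in spike_severities if not isnan(v)]
    return sum(valid) / len(valid) if valid else nan


# Toy example: a 6-pixel spike, 3 of which overlap the disease mask.
spike = [[0, 1, 1, 0],
         [0, 1, 1, 0],
         [0, 1, 1, 0]]
disease = [[0, 1, 1, 1],   # the rightmost disease pixel is off-spike and is ignored
           [0, 0, 1, 0],
           [0, 0, 0, 0]]
print(fhb_severity(spike, disease))  # 0.5
```

Restricting the count to the intersection means that disease-like pixels outside the spike (for example on leaves or background) do not inflate the score.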