The ability of current global models to simulate the transport of CO2 by mid-latitude, synoptic-scale weather systems (i.e., CO2 weather) is important for inverse estimates of regional and global carbon budgets but remains unclear without comparisons to targeted measurements. Here, we evaluate ten models that participated in the Orbiting Carbon Observatory-2 model intercomparison project (OCO-2 MIP version 9) against intensive aircraft measurements collected during the Atmospheric Carbon Transport (ACT)-America mission. We quantify model-data differences in the spatial variability of CO2 mole fractions, mean winds, and boundary layer depths in 27 mid-latitude cyclones spanning four seasons over the central and eastern United States. We find that the OCO-2 MIP models simulate observed CO2 frontal differences with varying degrees of success in summer and spring, but most underestimate frontal differences in winter and autumn. The models may underestimate the observed boundary layer-to-free troposphere CO2 differences in spring and autumn due to model errors in boundary layer height. Attribution of the causes of model biases in other seasons remains elusive. Transport errors, prior fluxes, and/or inversion algorithms appear to be the primary causes of these biases, since model performance is not highly sensitive to the CO2 data used in the inversion. The metrics presented here provide new benchmarks for the ability of atmospheric inversion systems to reproduce the CO2 structure of mid-latitude weather systems. Controlled experiments are needed to link these metrics more directly to the accuracy of regional or global flux estimates.