The ocean is a major carbon sink and takes up 25-30% of anthropogenically emitted CO2. Global ocean biogeochemistry models (GOBMs) are a state-of-the-art method to quantify this sink, but their simulated CO2 uptake differs between models and is systematically lower than estimates based on statistical methods using surface ocean pCO2 and interior ocean measurements. Here, we provide an in-depth evaluation of ocean carbon sink estimates from 1980 to 2018 from a GOBM ensemble. Our study identifies three sources of inter-model differences and ensemble-mean biases: (i) the model set-up, such as the length of the spin-up, the starting date of the simulation, and carbon fluxes from rivers and into sediments, (ii) the ocean circulation, such as the Atlantic Meridional Overturning Circulation and Southern Ocean mode and intermediate water formation, and (iii) the oceanic buffer capacity. Our analysis suggests that the late starting date and biases in the ocean circulation cause anthropogenic CO2 uptake to be too low across the GOBM ensemble. Biases in surface ocean biogeochemistry might also cause simulated anthropogenic fluxes to be too low, but the current set-up prevents a robust assessment. For simulations of the ocean carbon sink, we recommend in the short term to (1) start simulations in 1765, when atmospheric CO2 started to increase, (2) conduct a sufficiently long spin-up such that the GOBMs reach steady state, and (3) provide key metrics for circulation, biogeochemistry, and the land-ocean interface. In the long term, we recommend improving the representation of these metrics in the GOBMs.