Sea surface temperature (SST) observations made from ships are distributed irregularly in space and time and are affected by systematic biases and random errors. Such observations are often “binned”: they are split into samples contained within “bins”, the grid boxes of a space-time grid (1° × 1° monthly bins are used here), and statistics of these samples are computed. Bin averages often serve as gridded representations of such data and therefore require reliable uncertainty estimates, which are particularly important for ship observations because they dominate the early observational record. Here ship SST observations for 1992–2010 are compared with an independent high-resolution satellite-based SST data set. To remove systematic biases, seasonal means were subtracted from the difference between the bin-averaged data sets. In more than 66% (50%) of locations with binned temporal coverage exceeding 50% (66%), the magnitude of the remaining anomalies agreed within 20% (10%) with random error model estimates. Separate estimates for the sampling and measurement error components were obtained.
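The binning and bias-removal steps described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `bin_observations` and `remove_seasonal_mean`, the array layout, and the reading of "seasonal means" as a calendar-month climatology of a series that starts in January are all assumptions made for illustration.

```python
import warnings

import numpy as np


def bin_observations(lat, lon, month_index, sst, n_months):
    """Average point observations in 1-degree x 1-degree monthly bins.

    Hypothetical helper: lat in degrees north, lon in degrees east,
    month_index counts months from the start of the record (0-based).
    Returns (mean, count), each of shape (n_months, 180, 360);
    bins without observations have mean NaN and count 0.
    """
    lat = np.asarray(lat, dtype=float)
    lon = np.asarray(lon, dtype=float)
    sst = np.asarray(sst, dtype=float)
    t = np.asarray(month_index, dtype=int)

    # Grid-box indices; clip/wrap so lat = 90 and rounding at lon = 360 stay in range.
    i = np.clip(np.floor(lat + 90.0).astype(int), 0, 179)
    j = np.floor(lon % 360.0).astype(int) % 360

    shape = (n_months, 180, 360)
    size = n_months * 180 * 360
    flat = np.ravel_multi_index((t, i, j), shape)

    # Per-bin observation counts and sums, accumulated in one pass.
    count = np.bincount(flat, minlength=size).reshape(shape)
    total = np.bincount(flat, weights=sst, minlength=size).reshape(shape)

    mean = np.full(shape, np.nan)
    np.divide(total, count, out=mean, where=count > 0)
    return mean, count


def remove_seasonal_mean(diff):
    """Subtract each calendar month's mean along axis 0.

    Assumption: axis 0 is time in months and the series starts in January.
    NaN entries (empty bins) are ignored when forming the monthly means.
    """
    diff = np.asarray(diff, dtype=float)
    anomalies = diff.copy()
    for m in range(12):
        with warnings.catch_warnings():
            # Bins that are empty in every year produce an all-NaN slice.
            warnings.simplefilter("ignore", category=RuntimeWarning)
            climatology = np.nanmean(diff[m::12], axis=0)
        anomalies[m::12] -= climatology
    return anomalies
```

With ship and satellite bin averages on the same grid, `remove_seasonal_mean(ship_mean - satellite_mean)` would give the remaining anomalies whose magnitude is compared with the random error model estimates; the per-bin `count` is the sample size that such an error model would typically depend on.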