Assessing space weather modeling capability is a key element in improving existing models and developing new ones. To track model improvement and to investigate how forcing from the lower atmosphere below and from the magnetosphere above affects the performance of ionosphere-thermosphere models, we expand our previous assessment of the March 2013 storm event [Shim et al., 2018]. In this study, we evaluate new simulations from upgraded models (the Coupled Thermosphere Ionosphere Plasmasphere Electrodynamics (CTIPe) model version 4.1 and the Global Ionosphere Thermosphere Model (GITM) version 21.11) and from the NCAR Whole Atmosphere Community Climate Model with thermosphere and ionosphere extension (WACCM-X) version 2.2, in addition to 8 simulations from the previous study. A simulation from the NCAR Thermosphere-Ionosphere-Electrodynamics General Circulation Model version 2 (TIE-GCM 2) is also included for comparison with WACCM-X. Changes in total electron content (TEC) and F2-layer critical frequency (foF2) relative to the quiet-time background are used to evaluate how well the models capture storm impacts. For the evaluation, we employ 4 skill scores: correlation coefficient (CC), root-mean-square error (RMSE), ratio of the modeled to observed maximum percentage changes (Yield), and timing error (TE). We find that the models tend to underestimate the storm-time enhancements of foF2 and TEC, and that they predict foF2 and/or TEC better in North America and worse in the Southern Hemisphere. The ensemble simulation for TEC is comparable to results from a data assimilation model (Utah State University-Global Assimilation of Ionospheric Measurements (USU-GAIM)), with differences in skill score of less than 3% for CC and less than 6% for RMSE.
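As an illustration of the four skill scores named above, the following is a minimal sketch, not the implementation used in the study. It assumes the inputs are already percentage changes from the quiet-time background, sampled at common times. The function name `skill_scores`, the use of the series maximum as the peak, and the sign convention for TE (modeled peak time minus observed peak time) are assumptions made here for illustration, and the sample values are invented.

```python
import math


def skill_scores(times, obs, mod):
    """Return (CC, RMSE, Yield, TE) for paired series of percentage changes.

    times: sample times in hours (hypothetical unit for this sketch)
    obs, mod: observed and modeled percentage changes at those times
    """
    n = len(obs)
    if n < 2 or len(mod) != n or len(times) != n:
        raise ValueError("need equal-length series with at least two samples")

    mean_o = sum(obs) / n
    mean_m = sum(mod) / n

    # Pearson correlation coefficient (undefined if either series is constant)
    cov = sum((o - mean_o) * (m - mean_m) for o, m in zip(obs, mod))
    var_o = sum((o - mean_o) ** 2 for o in obs)
    var_m = sum((m - mean_m) ** 2 for m in mod)
    cc = cov / math.sqrt(var_o * var_m)

    # Root-mean-square error between modeled and observed values
    rmse = math.sqrt(sum((m - o) ** 2 for o, m in zip(obs, mod)) / n)

    # Indices of the observed and modeled peaks (assumption: peak = maximum)
    i_obs = max(range(n), key=lambda i: obs[i])
    i_mod = max(range(n), key=lambda i: mod[i])

    # Yield: ratio of modeled to observed maximum percentage change
    yield_ratio = mod[i_mod] / obs[i_obs]

    # Timing error (assumed sign: positive when the modeled peak is late)
    te = times[i_mod] - times[i_obs]

    return cc, rmse, yield_ratio, te


# Invented sample data, for illustration only
times = [0, 3, 6, 9, 12]
obs = [0.0, 20.0, 40.0, 20.0, 0.0]
mod = [0.0, 10.0, 20.0, 30.0, 10.0]

cc, rmse, yield_ratio, te = skill_scores(times, obs, mod)
print(f"CC={cc:.3f} RMSE={rmse:.3f} Yield={yield_ratio:.2f} TE={te} h")
```

With this invented input the modeled peak is 30% against an observed 40%, giving a Yield of 0.75 (an underestimate), and the modeled peak arrives 3 hours after the observed one. A Yield below 1 corresponds to the underestimation of storm-time enhancements described in the text.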