Statistical analyses
The hazards of all-cause mortality associated with HRT were initially
estimated by Cox proportional hazards regression model.
The outcome was time from study
entry to death from any cause in years. The model included second order
interaction effects of all variables with main exposure HRT, and
interactions of all medical conditions with the lifestyle variables.
Backward elimination was applied to select the variables at 5%
significance level for the main exposures, and 1% significance level
for the interactions. The contribution of the covariates in explaining
the variation of the hazard in the Cox regression model was assessed by
ANOVA. Grambsch and Therneau’s test33 was performed to
check the non-proportionality of hazards at 5% level of significance
and was found to be significant. Also, the underlying baseline hazards
of this study population were found to follow the Weibull distribution.
Consequently, a model34 which we refer to as Weibull
Double-Cox model was fitted to estimate the shape parameters of the
variables with time-variant hazards, and the scale effects. In
principle, this model replaces the unspecified baseline hazard in the
Cox model by Weibull hazard function and incorporates an additional Cox
regression term for shape. General practices were included in the model
as a random effect or frailty to account for unobserved heterogeneity of
patients between practices. Four separate survival models were also
fitted to assess the impact of HRT by 5-year age group at initiation on
all-cause mortality. The same sets of explanatory variables were
adjusted for in the full-case (all age combined) model and in age
subgroup analyses.
There were missing values for smoking, BMI, deprivation, and
hypertension status (Table S1). Multilevel multiple imputation (MI) was
used to deal with missing data. Ten imputed datasets were generated and
analyzed independently for the full-case model as well as for each
subgroup model. It is widely accepted that when missingness varies from
10-50%, MI can be used to deal with missing data, and 5-10 imputations
are sufficient as having more imputations does not affect the
results.35,36 The distributions of the variables with
missing records in complete and imputed datasets were similar (Table
S2). Rubin’s rules37 were applied to pool the
estimated parameters. Complete
case analyses were performed to validate the imputation models (Figure
S1). The overall performance of the models was assessed by the
concordance, and its values of 0.7 in full model, and 0.75−0.81 in the
subgroup models indicate a good fit.38Kaplan-Meier (KM) survival
analysis techniques were used to analyze the time to diagnosis of some
selected medical conditions at follow-up. All analyses were performed in
statistical software R version 3.6.1 using the packages “survival”,
“MASS”, “rms”, and “jomo”.