Results

Despite the considerable variation in citation metrics among researchers and disciplines, the strength of the relationship between citation mass (A_rel) and the natural logarithm of years publishing (logₑ t) was broadly consistent across disciplines (Fig. 1), although the geology (GEO) sample had the poorest fit (R² = 0.43). The distributions of the residuals (ε) differed substantially among disciplines in general form and central tendency (Fig. 2), but after scaling, the distributions of the scaled residuals (ε′) became aligned among disciplines and were approximately Gaussian (Shapiro-Wilk normality tests; see Fig. 2 for test values).
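The residual step above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `scaled_residuals` is hypothetical, the fit is ordinary least squares of A_rel on logₑ t within one discipline, and the scaling is ASSUMED to be a z-score (mean-centred, divided by the sample standard deviation), because the text here only says the residuals were "scaled".

```python
import numpy as np

def scaled_residuals(a_rel, years):
    """Fit A_rel = alpha + beta * ln(t) by OLS for one discipline and
    return (eps, eps_prime).

    ASSUMPTION: eps_prime is the z-score of the residuals; the exact
    scaling used in the study may differ.
    """
    a_rel = np.asarray(a_rel, dtype=float)
    log_t = np.log(np.asarray(years, dtype=float))  # natural log of years publishing
    beta, alpha = np.polyfit(log_t, a_rel, deg=1)   # polyfit returns slope, then intercept
    eps = a_rel - (alpha + beta * log_t)            # raw residuals
    eps_prime = (eps - eps.mean()) / eps.std(ddof=1)
    return eps, eps_prime

# Toy values invented for illustration only (not data from this study).
eps, eps_prime = scaled_residuals([2.0, 3.5, 3.0, 5.0, 4.2, 6.1, 5.5, 7.9],
                                  [1, 2, 3, 5, 8, 13, 21, 34])
print(np.round(eps_prime, 2))
```

Normality of `eps_prime` could then be checked per discipline with `scipy.stats.shapiro`, which returns the W statistic and its p-value.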
After scaling (Fig. 3a), the relationship between ε′ and the m-quotient is non-linear and highly variable (Fig. 3b), meaning that the m-quotient often poorly reflects actual relative performance (and although the m-quotient is already ‘corrected’ for t, it still increases with t; Supplementary Material Fig. S1). For example, many researchers have an m-quotient < 1 yet perform above expectation (ε′ > 0). Conversely, many researchers with an m-quotient of up to 2 or even 3 perform below expectation (ε′ < 0). Once the m-quotient exceeds 3, ε′ indicates above-expectation performance for all researchers in the example sample (Fig. 3b). The ε′ values are also spread more uniformly by gender and career stage (Fig. 3c) than are the m-quotients (Fig. 3d). A further advantage of ε′ over the m-quotient is that ε′ has a natural threshold (ε′ = 0) separating above-expectation from below-expectation performance, whereas the m-quotient has no equivalent threshold. Moreover, the m-quotient tends to increase over a researcher's career, whereas ε′ is more stable: ε′ still increases somewhat in late career relative to mid-career, but less so than the m-quotient (Fig. 4).
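To make the threshold argument concrete, the sketch below compares classification by the sign of ε′ with classification by an arbitrary m-quotient cutoff. The m-quotient is taken as the h-index divided by years publishing; the `Researcher` class, the `disagreements` helper and all numbers are hypothetical and chosen only to mirror the patterns described above (m < 1 with ε′ > 0; m between 2 and 3 with ε′ < 0).

```python
from dataclasses import dataclass

@dataclass
class Researcher:
    name: str          # hypothetical label, for illustration only
    h_index: int
    years: float       # years publishing, t
    eps_prime: float   # scaled residual from the A_rel ~ ln(t) fit

    @property
    def m_quotient(self) -> float:
        # m-quotient: h-index divided by years publishing
        return self.h_index / self.years

    @property
    def above_expectation(self) -> bool:
        # epsilon' = 0 is the natural threshold between above and below expectation
        return self.eps_prime > 0

def disagreements(researchers, m_cutoff):
    """Names of researchers whose class under an arbitrary m-quotient
    cutoff differs from the class given by the sign of epsilon'."""
    return [r.name for r in researchers
            if (r.m_quotient > m_cutoff) != r.above_expectation]

# Invented values (not from the study sample).
toy = [
    Researcher("A", h_index=8, years=10, eps_prime=0.6),    # m = 0.8, above expectation
    Researcher("B", h_index=25, years=10, eps_prime=-0.3),  # m = 2.5, below expectation
    Researcher("C", h_index=40, years=10, eps_prime=1.2),   # m = 4.0, above expectation
]
print(disagreements(toy, 1.0))  # ['A', 'B']
print(disagreements(toy, 3.0))  # ['A']
```

Whichever cutoff is chosen for the m-quotient, some researchers are misclassified relative to ε′, which is the practical consequence of the m-quotient lacking a natural threshold.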
Among the ranks derived from ε′ across disciplines, genders and career stages (Fig. 5), the bootstrapped median ranks overlap for all disciplines (Fig. 5a), but there are some notable divergences between the genders across career stages (Fig. 5b). In general, women ranked slightly below men at all career stages, although the bootstrapped median ranks of women and men overlap among early- and mid-career researchers. However, the median ranks for late-career women and men do not overlap (Fig. 5b), possibly reflecting that senior academic positions in many disciplines are dominated by men (24-26), and that women tend to receive fewer citations than men, at least in some disciplines, a disparity that often compounds over time (27-30). The ranking based on the m-quotient reveals disparities among disciplines (Fig. 5c), but is perhaps somewhat more equal between the genders (Fig. 5d) than the ε′ rank (Fig. 5b), despite the higher variability of the m-quotient bootstrapped median ranks.
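The bootstrapped median ranks can be illustrated with a simple percentile bootstrap. This is a sketch under stated assumptions: rank 1 is the highest ε′, ties are broken by input order, each group's ranks are resampled with replacement `n_boot` times, and the 2.5th and 97.5th percentiles of the resampled medians are used as the interval for judging overlap. The function names and these settings are hypothetical and are not taken from the study.

```python
import numpy as np

def rank_desc(scores):
    # Rank 1 = highest epsilon'; ties broken by input order (ASSUMPTION).
    scores = np.asarray(scores, dtype=float)
    order = np.argsort(-scores, kind="stable")
    ranks = np.empty(scores.size, dtype=int)
    ranks[order] = np.arange(1, scores.size + 1)
    return ranks

def bootstrap_median_rank(ranks, groups, n_boot=1000, seed=0):
    """Percentile-bootstrap the median rank of each group (ASSUMED design)."""
    rng = np.random.default_rng(seed)
    ranks, groups = np.asarray(ranks), np.asarray(groups)
    out = {}
    for g in np.unique(groups):
        r = ranks[groups == g]
        # each row is one bootstrap resample of this group's ranks
        meds = np.median(rng.choice(r, size=(n_boot, r.size), replace=True), axis=1)
        lo, hi = np.percentile(meds, [2.5, 97.5])
        out[str(g)] = {"median": float(np.median(meds)), "lo": float(lo), "hi": float(hi)}
    return out

def intervals_overlap(a, b):
    # True if the two bootstrap intervals share any values
    return a["lo"] <= b["hi"] and b["lo"] <= a["hi"]

# Toy example: group "x" holds the top five ranks, group "y" the bottom five.
res = bootstrap_median_rank(list(range(1, 11)), ["x"] * 5 + ["y"] * 5, n_boot=500)
print(res, intervals_overlap(res["x"], res["y"]))
```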
However, calculating the scaled residuals across all disciplines for each gender separately, then combining the two datasets and recalculating the rank (producing a gender-‘debiased’ rank), effectively removed the gender differences (Fig. 6).
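The debiasing procedure can be sketched as follows, again as an illustration rather than the authors' implementation. It ASSUMES one OLS fit of A_rel on logₑ t per gender (pooling disciplines within each gender) and z-score scaling of the residuals; the study may instead stratify further, for example by discipline within gender. The function names are hypothetical.

```python
import numpy as np

def _scaled_residuals(a_rel, log_t):
    # OLS fit of A_rel on ln(t); residuals z-scored (ASSUMED scaling)
    beta, alpha = np.polyfit(log_t, a_rel, deg=1)
    eps = a_rel - (alpha + beta * log_t)
    return (eps - eps.mean()) / eps.std(ddof=1)

def debiased_rank(a_rel, years, gender):
    """Scale residuals within each gender, pool them, then rank (1 = highest)."""
    a_rel = np.asarray(a_rel, dtype=float)
    log_t = np.log(np.asarray(years, dtype=float))
    gender = np.asarray(gender)
    eps_prime = np.empty_like(a_rel)
    for g in np.unique(gender):
        mask = gender == g
        eps_prime[mask] = _scaled_residuals(a_rel[mask], log_t[mask])
    # rank the pooled epsilon' values; ties broken by input order
    order = np.argsort(-eps_prime, kind="stable")
    ranks = np.empty(a_rel.size, dtype=int)
    ranks[order] = np.arange(1, a_rel.size + 1)
    return eps_prime, ranks

# Toy data (invented): group "W" has the same careers as "M" but A_rel lower by 1.
a_m = [2.0, 3.5, 3.0, 5.0, 4.2]
t = [1, 2, 3, 5, 8]
eps_prime, ranks = debiased_rank(a_m + [a - 1.0 for a in a_m], t + t, ["M"] * 5 + ["W"] * 5)
print(ranks)
```

In this toy case the constant offset between the two groups is absorbed by each group's own fit, so matching researchers receive the same ε′ and adjacent ranks, which is the intended effect of the gender-‘debiased’ rank.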