Materials and Methods
Researcher samples
Each co-author assembled an example set of
researchers from within her/his field, which we broadly defined as archaeology (S.A.C.), chemistry (J.M.C.), ecology (C.J.A.B.), evolution/development (V.W.), geology (K.T.), microbiology (B.A.E.), ophthalmology (J.R.S.), and palaeontology (J.A.L.). Our basic assembly rules for each of these
discipline samples were: (i) 20 researchers from each stage of
career, defined here arbitrarily as early career (0–10 years
since first peer-reviewed article published in a recognised scientific
journal), mid-career (11–20 years since first publication), and late career (> 20 years since first publication);
each discipline therefore had a total of 60 researchers, for a total
sample of 8 × 60 = 480 researchers across all disciplines. (ii)
Each sample had to include an equal number of women and men from each
career stage. (iii) Each researcher had to have a unique,
publicly accessible Google Scholar profile with no obvious errors,
inappropriate additions, obvious omissions, or duplications. The entire
approach we present here assumes that each researcher’s Google Scholar
profile is accurate, up-to-date, and complete.
We did not impose any other rules for sample assembly, but encouraged
each compiler to include only a few previous co-authors. Our goal was to
have as much ‘inside knowledge’ as possible with respect to each
discipline, but also to include a wide array of researchers who were
predominantly independent of each of us. The composition of each sample
is not critical for the purposes of our example dataset; we merely attempted gender and career-stage balance to demonstrate the properties of the ranking system (i.e., we did not intend the sampling to be a definitive comment on the performance of particular researchers, nor did we mean for each sample to represent an entire discipline). Finally,
we completely anonymised the sample data for publication.
Citation data
Our overall aim was to provide a meaningful and
objective method for ranking researchers by citation history without
requiring extensive online searching or information that is not easily obtainable from a publicly available online profile. We also
wanted to avoid an index that was overly influenced by outlier
citations, while still keeping valuable performance information
regarding high-citation outputs and total productivity (number of
outputs).
For each researcher, the algorithm requires the following information
collected from Google Scholar: (i) i10-index (the
number of publications in the researcher’s profile with at least 10
citations, which we denoted i10); one condition
is that a researcher must have i10 ≥ 1 for the
algorithm to function correctly; (ii) h-index, the researcher's Hirsch number (4): the largest number of publications, h, that have each received at least h citations; (iii) the number of citations for the researcher's most highly cited paper (denoted cm); and (iv) the year the researcher published her/his first peer-reviewed article in a recognised scientific journal (denoted Y1). For the designation of Y1, we excluded any reports, chapters, books,
theses or other forms of publication that preceded the year of the first
peer-reviewed article; however, we included citations from the former
sources in the researcher’s i10, h, and cm.
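To make these inputs concrete, here is a minimal sketch in R of the per-researcher data the algorithm consumes; all values are invented for illustration, and only the four fields themselves follow from the definitions above.
\begin{verbatim}
# hypothetical inputs for three researchers (all values invented)
dat <- data.frame(
  id  = c("R1", "R2", "R3"),
  i10 = c(25, 8, 60),       # publications with >= 10 citations (must be >= 1)
  h   = c(18, 6, 41),       # Hirsch number
  cm  = c(320, 95, 1450),   # citations of the most highly cited paper
  Y1  = c(2009, 2016, 1998) # year of first peer-reviewed article
)
dat$t <- as.numeric(format(Sys.Date(), "%Y")) - dat$Y1  # years since Y1
\end{verbatim}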
Ranking algorithm
The algorithm first computes a power-law-like relationship between the vector of frequencies (as measured from Google Scholar), i10, h, and 1, and the vector of their corresponding values, 10, h, and cm, respectively. Thus, h is, by definition, both a frequency (y-axis) and a value (x-axis).
We then calculated a simple linear model of the form \(y \sim \alpha + \beta x\), where
\begin{equation}
y=\mathrm{\log}_{e}\begin{bmatrix}i_{10}\\ h\\ 1\end{bmatrix}\quad\text{and}\quad x=\mathrm{\log}_{e}\begin{bmatrix}10\\ h\\ c_{m}\end{bmatrix}\nonumber
\end{equation}
(y is the citation frequency, and x is the citation value)
for each researcher (Supplementary Material Fig. S2). The corresponding \(\hat{\alpha}\) and \(\hat{\beta}\) for each relationship allowed us to calculate a standardised integral (the area under the power-law relationship, Arel) relative to the researcher in the sample with the highest cm; i.e., all areas were scaled to the sample maximum.
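To illustrate this step, the following R sketch fits the log-log model and computes the area for one researcher. The integration bounds (1 to the sample-wide maximum cm) are our assumption for illustration; the function published at github.com/cjabradshaw/EpsilonIndex remains the authoritative implementation.
\begin{verbatim}
# fit the power-law-like model for one researcher and return the area
# i10, h, cm as defined above; cm.max = highest cm in the sample
A.rel <- function(i10, h, cm, cm.max) {
  y <- log(c(i10, h, 1))    # citation frequencies (log_e)
  x <- log(c(10, h, cm))    # citation values (log_e)
  fit <- lm(y ~ x)          # y ~ alpha + beta * x
  a.hat <- coef(fit)[1]
  b.hat <- coef(fit)[2]
  # area under frequency = exp(a.hat) * value^b.hat;
  # bounds 1 to cm.max (sample maximum) assumed here
  integrate(function(v) exp(a.hat) * v^b.hat,
            lower = 1, upper = cm.max)$value
}
\end{verbatim}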
A researcher’s Arel therefore represents her/his citation mass, but this value still requires correction for individual opportunity (time since first publication, t = current year − Y1) to compare researchers at different stages of their career. This is where career gaps can be taken into account explicitly for any researcher in the sample by subtracting ai, the total cumulative time absent from research for individual i (e.g., maternity or paternity leave, sick leave, secondment), from t, such that an individual's career gap-corrected time is \(t' = t - a_{i}\).
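For example (values invented), the correction is a simple subtraction:
\begin{verbatim}
# hypothetical: first paper in 2010, 2 years of cumulative leave
Y1 <- 2010; a.i <- 2
t <- as.numeric(format(Sys.Date(), "%Y")) - Y1  # opportunity (years)
t.prime <- t - a.i                              # gap-corrected t'
\end{verbatim}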
We therefore constructed another linear model of the form \(A_{\mathrm{rel}} \sim \gamma + \theta\log_{e}t\) across all researchers in the sample, and took the residual (ε) of an individual researcher's Arel from the predicted relationship as a metric of citation performance relative to the rest of the researchers in that sample (Supplementary Material Fig. S3). This residual ε allows us to rank all individuals in the sample from highest (highest citation performance relative to opportunity and the entire sample) to lowest (lowest citation performance relative to opportunity and the entire sample). Any researcher in the sample with a positive ε is considered to be performing above expectation (relative to the group and the time since first publication), and those with a negative ε fall below expectation. This approach also allows fitting separate linear models to subcategories within a sample to rank researchers within their respective groupings (e.g., by gender; Supplementary Material Fig. S4). An R function to produce the index and its variants using a sample dataset is available from github.com/cjabradshaw/EpsilonIndex; a sketch of the core residual step follows.
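A minimal sketch of that residual-ranking step in R, assuming per-researcher vectors of Arel and gap-corrected time t′ (function and variable names here are ours, not necessarily those of the published code):
\begin{verbatim}
# Arel: citation-mass areas; tprime: gap-corrected years since Y1 (t - a_i)
epsilon.rank <- function(id, Arel, tprime) {
  fit <- lm(Arel ~ log(tprime))  # Arel ~ gamma + theta * log_e(t')
  eps <- residuals(fit)          # epsilon: > 0 = above expectation
  out <- data.frame(id = id, epsilon = eps, rank = rank(-eps))
  out[order(out$rank), ]         # highest performer first
}
\end{verbatim}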
Discipline standardisation
Each discipline has its own
citation characteristics and trends (16), so we expect the distribution of residuals (ε) within each discipline to be meaningful only for that discipline's sample. We therefore endeavoured to
scale (‘normalise’) the results such that researchers in different
disciplines could be compared objectively and fairly.
We first scaled the Arel within each discipline by dividing each researcher i's Arel by the sample's root mean square:
\begin{equation}
A_{\mathrm{\text{rel}}_{i}}^{\prime}=\frac{A_{\mathrm{\text{rel}}_{i}}}{\sqrt{\frac{\sum_{i=1}^{n}A_{\mathrm{\text{rel}}_{i}}^{2}}{n-1}}}\nonumber
\end{equation}
where n = the total number of researchers in the sample (n = 60). We then regressed these discipline-scaled \(A_{\mathrm{\text{rel}}}^{\prime}\) against the \(\log_{e}\) number of years since first publication, pooling all disciplines
together, and then ranked these scaled residuals (ε′) as
described above.
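A minimal sketch of this standardisation in R, assuming vectors of Arel, discipline labels, and years since first publication (the n − 1 denominator follows the root-mean-square equation above; all object names are ours):
\begin{verbatim}
# discipline-wise RMS scaling of Arel, then one pooled regression
epsilon.pooled <- function(Arel, discipline, t) {
  rms <- ave(Arel, discipline,
             FUN = function(a) sqrt(sum(a^2) / (length(a) - 1)))
  Arel.s <- Arel / rms        # discipline-scaled A'_rel
  fit <- lm(Arel.s ~ log(t))  # one regression pooled across disciplines
  eps.s <- residuals(fit)     # scaled residuals (epsilon')
  rank(-eps.s)                # 1 = top cross-discipline rank
}
\end{verbatim}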