Screening of titles, abstracts, and main text
Titles and abstracts were screened for suitability. Suitable abstracts mentioned at least one group of soil fauna measured in at least one reference, undisturbed, or control site and in at least one site impacted by a GC. To aid screening of titles and abstracts, we used the machine-learning algorithm in the program Abstrackr alongside human screening. Whilst the abstracts and titles were being manually screened, Abstrackr dynamically assigned a confidence score to every paper. After the manual screening of 9,535 abstracts (of which 6,143 were irrelevant and 3,389 were included), the Abstrackr confidence score was 0.58 or under for the remaining 15,444 articles, a value low enough to indicate that these articles were not relevant for the meta-analysis. This cut-off value of 0.58 was chosen based on a quality control procedure in which we randomly sampled 5% of the records within each 0.1 band of confidence scores and screened their titles to check whether they ‘may be’ suitable or were ‘definitely not’ suitable. The cut-off confidence score was then set at the point where ‘definitely not’ suitable papers made up the majority of the titles within a 0.01 band. Thus, the 15,444 articles were not considered further.
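The quality control procedure described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the code used in the study: the function names, the data layout (a dictionary of record IDs to confidence scores, and a list of score and label pairs), the rounding up of the 5% sample, and the choice to scan bands from the highest score downwards and report the lower edge of the first majority band are all hypothetical. Only the 5% sampling fraction and the band widths of 0.1 and 0.01 come from the text.

```python
import math
import random
from collections import defaultdict


def band_index(score, width):
    # Index of the band containing `score`; the small offset guards
    # against floating-point error at band edges (e.g. 0.58 / 0.01).
    return math.floor(score / width + 1e-9)


def sample_per_band(scores, width=0.1, fraction=0.05, seed=0):
    """Randomly sample a fraction of record IDs within each score band.

    `scores` maps record ID -> confidence score (assumed layout).
    """
    rng = random.Random(seed)
    bands = defaultdict(list)
    for record_id, score in scores.items():
        bands[band_index(score, width)].append(record_id)
    sample = []
    for idx in sorted(bands):
        ids = sorted(bands[idx])
        k = math.ceil(fraction * len(ids))  # assumption: round up
        sample.extend(rng.sample(ids, k))
    return sample


def find_cutoff(checked, width=0.01):
    """Return the lower edge of the highest band in which titles labelled
    'definitely not' form the majority, or None if there is no such band.

    `checked` is a list of (score, label) pairs (assumed layout), with
    label either 'may be' or 'definitely not'.
    """
    counts = defaultdict(lambda: [0, 0])  # band -> [definitely not, total]
    for score, label in checked:
        entry = counts[band_index(score, width)]
        entry[0] += label == "definitely not"
        entry[1] += 1
    for idx in sorted(counts, reverse=True):  # assumption: scan downwards
        not_suitable, total = counts[idx]
        if not_suitable > total / 2:
            return round(idx * width, 10)
    return None
```

With hand-labelled titles as input, `find_cutoff` returns a single score that can then be compared against the confidence scores of the unscreened records.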
The full texts of the 3,389 papers with relevant abstracts and titles were then manually screened. To be suitable for the analysis, an article needed to have (1) measured at least one soil fauna group (e.g., earthworms, macro-fauna, oribatid mites), (2) captured the impact of one or several GCs according to our GC-specific inclusion criteria (see supplementary materials), and (3) presented the necessary data (mean values, variance, sample sizes) to allow us to calculate an effect size for the meta-analysis.
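To illustrate why criterion (3) requires means, a measure of variance, and sample sizes, the sketch below computes one widely used effect size, the log response ratio, together with its sampling variance. This section does not state which effect size was used, so the choice of metric here is an assumption made purely for illustration, as are the function name and parameter names.

```python
import math


def log_response_ratio(mean_t, sd_t, n_t, mean_c, sd_c, n_c):
    """Log response ratio and its sampling variance.

    Assumption: an illustrative effect size only. `_t` denotes the site
    impacted by a GC and `_c` the control site. Both means must be positive.
    """
    if mean_t <= 0 or mean_c <= 0:
        raise ValueError("log response ratio needs positive means")
    ln_rr = math.log(mean_t / mean_c)
    # Sampling variance from the standard deviations and sample sizes
    variance = sd_t**2 / (n_t * mean_t**2) + sd_c**2 / (n_c * mean_c**2)
    return ln_rr, variance
```

If any of the six inputs is missing from an article, neither the effect size nor its variance can be computed, which is why such articles could not be included.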
As no definition, catalogue, or list exists of organisms considered ‘soil biodiversity’, soil fauna was determined based on sampling protocol. Suitable sampling methods included soil cores, hand-sorting excavated soil blocks, or mustard extraction. Pitfall traps on their own were not considered suitable, as these data are more representative of activity densities of ground-dwelling invertebrates. However, if the pitfall traps were associated with another method targeting the soil, they were considered suitable.