John Tazare

and 12 more

Purpose: There is increasing recognition of the importance of transparency and reproducibility in scientific research. This study aimed to quantify the extent to which programming code is publicly shared in pharmacoepidemiology, and to develop a set of recommendations on this topic. Methods: We conducted a literature review identifying all studies published in “Pharmacoepidemiology and Drug Safety” (PDS) between 2017 and 2022. Data were extracted on the frequency and types of programming code shared, and on other key open science practices (clinical codelist sharing, data sharing, study pre-registration, and use of reporting guidelines). We developed six recommendations for investigators who choose to share programming code and gathered feedback from members of the International Society for Pharmacoepidemiology (ISPE). Results: The proportion of articles published in PDS that shared programming code ranged from 2.4% in 2017 to 13.4% in 2022. Code sharing was more prevalent among articles with a methodological focus, simulation studies, and papers that also shared record-level data. We recommend that reporting of open science practices, including code sharing, is standardised to enable continued monitoring. When sharing programming code, we recommend the use of permanent digital identifiers, appropriate licenses, and, where possible, adherence to good software practices around the provision of metadata and documentation, computational reproducibility, and data privacy. Conclusion: Programming code sharing is rare but increasing in pharmacoepidemiology studies published in Pharmacoepidemiology and Drug Safety. We recommend improved and consistent reporting of code sharing, and adherence to good programming practices in order to maximize the utility of code when this is shared.
Purpose: The generation of representative disease phenotypes is important for ensuring the reliability of the findings of observational studies. The aim of this manuscript is to outline a reproducible framework for reliable and traceable phenotype generation based on real-world data for use in the Data Analysis and Real-World Interrogation Network (DARWIN EU®). We illustrate the use of this framework by generating phenotypes for two complex diseases: pancreatic cancer and systemic lupus erythematosus (SLE). Methods: Phenotyping follows a 14-step process based on a standard operating procedure co-created by the DARWIN EU® Coordination Centre in collaboration with the European Medicines Agency. A number of bespoke R packages were used to generate and review codelists for the two phenotypes based on real-world data mapped to the OMOP Common Data Model. Results: Phenotypes were generated for both pancreatic cancer and SLE, and cohorts were generated using the Clinical Practice Research Datalink (UK primary care records) and Pharmetrics (US health claims data). Diagnostic checks showed that these cohorts had incidence and prevalence figures broadly similar to those in previously published literature. Additionally, co-occurring symptoms, conditions, and medication use were in keeping with pre-specified clinical descriptions based on previous knowledge. Conclusions: Our detailed phenotyping process makes use of bespoke tools and allows for comprehensive codelist generation and review, as well as large-scale exploration of the characteristics of the generated cohorts. Wider use of structured phenotyping methods will be important in ensuring the reliability of observational studies for regulatory purposes.