DISCUSSION
Across a variety of validation methods, several studies found that the CPs with the highest PPV for identifying TG patients in their study populations combined diagnosis codes with other information, such as procedure or prescription codes in claims data and key text-strings in EHR data. When key text-strings were not available, most researchers were still able to identify most TG patients in their electronic healthcare databases through diagnosis codes alone.
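Because the comparison above rests on PPV, it may help to recall what the metric does and does not capture. The sketch below is a minimal illustration, not code from any reviewed study: `ppv` is the share of CP-flagged patients confirmed as TG by a reference standard such as chart review, and `sensitivity` is the share of truly TG patients that the CP flags. The function names and the example counts are hypothetical.

```python
def ppv(true_positives: int, false_positives: int) -> float:
    """Positive predictive value: share of CP-flagged patients confirmed as TG.

    true_positives:  flagged patients confirmed by the reference standard
    false_positives: flagged patients NOT confirmed by the reference standard
    """
    if true_positives < 0 or false_positives < 0:
        raise ValueError("counts must be non-negative")
    flagged = true_positives + false_positives
    if flagged == 0:
        raise ValueError("PPV is undefined when the CP flags no patients")
    return true_positives / flagged


def sensitivity(true_positives: int, false_negatives: int) -> float:
    """Share of truly TG patients (per the reference standard) that the CP flags.

    Requires a reference standard over the whole population (e.g. self-reported
    gender identity), not only over flagged patients.
    """
    if true_positives < 0 or false_negatives < 0:
        raise ValueError("counts must be non-negative")
    truly_tg = true_positives + false_negatives
    if truly_tg == 0:
        raise ValueError("sensitivity is undefined with no reference-positive patients")
    return true_positives / truly_tg


if __name__ == "__main__":
    # Hypothetical counts for illustration only.
    print(ppv(true_positives=8, false_positives=2))
    print(sensitivity(true_positives=8, false_negatives=8))
```

Note that a high PPV says nothing about how many TG patients a CP misses: chart review of flagged patients yields PPV, while sensitivity needs a reference standard for unflagged patients too.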
The reviewed articles had several strengths. CPs provide a low-cost, rapid approach to identifying TG people who are missed by traditional data structures. Roblin and colleagues provide a replicable SAS program that can be applied in other healthcare systems with similar structures.13 Proctor and colleagues offer a hierarchical framework for enhancing a CP with additional criteria that confirm TG status, and Blosnich and colleagues validated their diagnosis-code CP against clinician text notes within the VHA, which is particularly useful when no gold standard of self-reported gender identity exists.16,17 Yee and colleagues developed a method that uses differences between sex assigned at birth (SAB) and self-reported gender to determine TG status in claims data.18 To improve PPV, Jasuja and colleagues convened a technical panel of experts in the clinical management of TG patients to decide which procedure codes to include and to conduct chart reviews.19 These additional approaches added 31% of patients beyond those identified through diagnosis codes, and the authors found systematic differences between patients identified with and without diagnosis codes. This matters for health disparities research: older patients were more likely to be identified without gender identity-specific diagnosis codes. Such approaches can also improve the diversity and generalizability of study samples, as they did for Jasuja and colleagues.19
Quinn and colleagues used key text-strings in data systems with access to provider notes to assemble one of the largest cohorts of TG people to date (N=6,500), a population that can be hard to recruit into cohort studies because of stigmatization and marginalization.14 The key text-strings were developed by study stakeholders who were built into the study design and were themselves members of the TG community. Stakeholders also created a comprehensive list of hormone medications and gender-affirming procedures, which may have increased the sensitivity of the CP. Guo and colleagues extended Quinn and colleagues' CP by using both structured and unstructured data, which allowed them to use self-reported gender identity data when available; they also added ICD-10 diagnosis codes, expanded the list of key text-strings to improve sensitivity, and created an automated algorithm that does not require extensive manual chart review.6 Their final reported CP was simple (at least one diagnosis code and one key text-string) and generalizable to other health systems with similar structures.
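A rule of the form "at least one diagnosis code and at least one key text-string" can be sketched as below. This is a minimal illustration under stated assumptions, not Guo and colleagues' implementation: the code set (ICD-10-CM F64 category codes plus Z87.890) and the text-strings are hypothetical, abbreviated placeholders, and `PatientRecord` and `cp_flag` are names invented here.

```python
import re
from dataclasses import dataclass, field

# Hypothetical, abbreviated code list for illustration; not a reviewed study's list.
TG_DIAGNOSIS_CODES = {"F64.0", "F64.1", "F64.2", "F64.8", "F64.9", "Z87.890"}

# Hypothetical, abbreviated key text-strings; real lists are longer and, as in
# Quinn and colleagues' work, benefit from development with community stakeholders.
KEY_TEXT_STRINGS = ["transgender", "gender dysphoria", "nonbinary"]
_KEY_PATTERN = re.compile(
    r"\b(?:" + "|".join(re.escape(s) for s in KEY_TEXT_STRINGS) + r")\b",
    re.IGNORECASE,
)


@dataclass
class PatientRecord:
    patient_id: str
    diagnosis_codes: set[str] = field(default_factory=set)
    notes: list[str] = field(default_factory=list)


def has_tg_diagnosis(record: PatientRecord) -> bool:
    """True if any of the patient's codes is in the illustrative code list."""
    return bool(record.diagnosis_codes & TG_DIAGNOSIS_CODES)


def has_key_text_string(record: PatientRecord) -> bool:
    """True if any clinical note contains an illustrative key text-string."""
    return any(_KEY_PATTERN.search(note) for note in record.notes)


def cp_flag(record: PatientRecord) -> bool:
    """Flag the patient only when BOTH criteria are met."""
    return has_tg_diagnosis(record) and has_key_text_string(record)


if __name__ == "__main__":
    both = PatientRecord("p1", {"F64.9"}, ["Patient is transgender; discussed hormones."])
    code_only = PatientRecord("p2", {"F64.9"}, ["Routine follow-up visit."])
    print(cp_flag(both), cp_flag(code_only))
```

Requiring both criteria trades some sensitivity for PPV; an "either criterion" variant would simply replace `and` with `or`.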
In addition, the data systems used were rich and highly detailed; for example, Wolfe and colleagues used data from the nation's largest integrated health care system.21 Chyten-Brennan and colleagues used HIV-funded data that, unusually, collected information on gender identity, which further strengthened their ability to identify TG patients.23
However, the included studies also have limitations. CPs require ongoing refinement to keep pace with terminology, because TG-related terms change over time. The most common source of false positives was key text-strings in provider notes that referred to a relative or loved one of the patient rather than to the patient. CP developers must therefore assess validity carefully. Validation is also difficult because access to self-reported gender identity data is limited; although these studies used alternative methods, self-reported gender identity should serve as the gold standard.5,27,28 Even self-reported data can introduce bias, particularly when response options do not allow non-binary people to report their gender identity accurately. The findings are also not generalizable to the entire TG population, since not all TG patients have a TG-related diagnosis, especially those who do not seek gender-affirming care. Many studies are further limited to a single health care system or data source (e.g., Kaiser data, Medicaid data), which also limits generalizability. In addition, protections for access to gender-affirming health care are lacking, and not all TG people disclose their gender identity to providers or on surveys. As a result, the true prevalence of TG patients within samples may be higher than observed because TG identities are underreported.
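The relative-mention false positives described above can be reduced with context rules that check who a mention refers to, as rule-based tools such as ConText do with their "experiencer" attribute. The sketch below swaps in a much simpler heuristic and is not a method from the reviewed studies: it drops a key text-string match when a relative or loved-one term appears within a few preceding words. `RELATIVE_TERMS`, `WINDOW`, and `patient_level_mentions` are hypothetical choices made here for illustration.

```python
import re

# Hypothetical relative/loved-one terms; a real list would need clinical review.
RELATIVE_TERMS = {
    "son", "daughter", "child", "grandchild", "mother", "father", "parent",
    "sister", "brother", "sibling", "partner", "spouse", "wife", "husband",
    "friend",
}
# Illustrative single key text-string; a real CP would use a full list.
KEY_TEXT = re.compile(r"\btransgender\b", re.IGNORECASE)
WINDOW = 5  # hypothetical: number of preceding words inspected for a relative term


def patient_level_mentions(note: str) -> list[str]:
    """Return key text-string matches that do not appear to refer to a relative."""
    kept = []
    for match in KEY_TEXT.finditer(note):
        # Lower-cased alphabetic tokens before the match, e.g. "daughter's" -> "daughter", "s".
        preceding = re.findall(r"[a-z]+", note[: match.start()].lower())[-WINDOW:]
        if not RELATIVE_TERMS.intersection(preceding):
            kept.append(match.group(0))
    return kept


if __name__ == "__main__":
    print(patient_level_mentions("Patient's daughter is transgender."))
    print(patient_level_mentions("Patient is transgender and started estradiol."))
```

A fixed word window is crude: it can also discard true patient-level mentions that happen to sit near a family reference, so any such rule should itself be checked against chart review.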
Findings from the included studies point to avenues for future research. More natural language processing methods are needed to capture nuance in CP performance, especially in studies that use key text-strings. Papers also call for standardized CPs for population-level data collection and for accessible software to apply CPs in other healthcare systems with similar data structures. All studies advocated incorporating the recommended two-step method of self-reported gender identification (sex assigned at birth and current gender identity) in both EHR and claims data; this method is still lacking in many data systems and has also been recommended by other reviews of TG health research.5,29 It would also allow researchers to distinguish transmasculine (TM) and transfeminine (TF) patients, whose health needs differ, for example in recommended preventive screenings such as cervical cancer screening.6,19 Future work should also examine community-level differences in nomenclature and terminology among TG people of color, and larger ongoing longitudinal studies that aggregate data over time and across places are needed to assess differences among TM, TF, and TG people of color. Authors also emphasize that the medical community must advocate on behalf of TG patients to ensure that gender-affirming medical and surgical care is protected by federal law. Finally, additional mixed-methods studies are needed to give a more holistic account of the data, as evidence gaps remain for contextual factors specific to the TG experience.
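To show why two-step data make TM/TF distinctions possible, the sketch below classifies a patient from the two answers. It is a minimal illustration, not a validated instrument: the response categories, the `GenderGroup` labels, and `classify_two_step` are hypothetical, and whether non-binary respondents are grouped with TM or TF patients is a study-design decision that this sketch deliberately leaves open by keeping them separate.

```python
from enum import Enum
from typing import Optional


class GenderGroup(Enum):
    TRANSMASCULINE = "TM"
    TRANSFEMININE = "TF"
    NONBINARY_OR_OTHER = "nonbinary/other"
    CISGENDER = "cisgender"
    UNKNOWN = "unknown"


def classify_two_step(sab: Optional[str], gender_identity: Optional[str]) -> GenderGroup:
    """Classify from two-step responses: sex assigned at birth + current gender identity.

    Response categories are hypothetical; real instruments use varied wording.
    """
    if not sab or not gender_identity:
        return GenderGroup.UNKNOWN  # one or both steps missing
    sab = sab.strip().lower()
    gi = gender_identity.strip().lower()
    if sab not in {"female", "male"}:
        return GenderGroup.UNKNOWN  # unrecognized SAB response
    if gi in {"transgender man", "trans man"} or (gi == "man" and sab == "female"):
        return GenderGroup.TRANSMASCULINE
    if gi in {"transgender woman", "trans woman"} or (gi == "woman" and sab == "male"):
        return GenderGroup.TRANSFEMININE
    if (gi == "man" and sab == "male") or (gi == "woman" and sab == "female"):
        return GenderGroup.CISGENDER
    return GenderGroup.NONBINARY_OR_OTHER


if __name__ == "__main__":
    print(classify_two_step("Female", "Man"))      # SAB female, identifies as a man
    print(classify_two_step("male", "non-binary"))
```

Keeping SAB alongside gender identity also preserves the anatomical information that screening guidance, such as cervical cancer screening for TM patients with a cervix, depends on.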
Quantitative evidence indicates that CPs can identify TG patients with high validity when self-reported gender identity is not available. While TG-related diagnosis codes are the primary input, other sources such as key text-strings, hormone prescriptions, and non-specific endocrine disorder codes are useful additions for researchers planning to use CPs in TG health research.