DISCUSSION
Across a variety of validation methods, several studies have found that
the CPs with the highest PPV for identifying TG patients within their study
population contained diagnosis codes accompanied by other information,
such as procedural or prescription codes in claims data, and key
text-strings in EHR data. If key text-strings were not available, most
researchers have been able to find most TG patients within their
electronic healthcare databases through diagnosis codes alone.
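For reference, PPV is the share of patients flagged by a CP who are confirmed as TG by the reference standard (for example, chart review or self-report). A minimal sketch of that calculation follows; the function name and the counts are hypothetical and are not drawn from any of the reviewed studies.

```python
def positive_predictive_value(true_positives: int, false_positives: int) -> float:
    """PPV = TP / (TP + FP): fraction of CP-flagged patients confirmed as TG."""
    flagged = true_positives + false_positives
    if flagged == 0:
        # No patients were flagged, so the ratio cannot be computed.
        raise ValueError("PPV is undefined when the CP flags no patients")
    return true_positives / flagged


# Hypothetical counts for illustration only (not from any reviewed study).
example_ppv = positive_predictive_value(true_positives=90, false_positives=10)
print(f"PPV = {example_ppv:.2f}")
```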
The articles reviewed had several strengths. The use of CPs
provides a low-cost, rapid approach to identify TG people who are missed
by traditional structures. Roblin and colleagues provide a replicable
SAS program to be applied in other healthcare systems with similar
structures.13 Proctor and colleagues offer a
hierarchical framework for enhancing a CP by using additional criteria to
confirm TG status, and Blosnich and colleagues validated their CP of
diagnosis codes through clinician text notes within the VHA health
system, which is particularly useful when there is no gold standard of
self-reported gender identity.16,17 Yee and colleagues
developed a way to use differential SAB and self-reported gender to
determine TG status in claims data.18 To help improve
their PPV, Jasuja and colleagues had a technical panel of experts in
clinical management of TG patients decide on which procedural codes to
include, and also conduct chart reviews.19 These
additional approaches added 31% of patients outside of diagnosis codes,
and they found systematic differences between patients found through
diagnosis codes and those found without them. This is important for health
disparities research, as older patients were more likely to be
found without gender identity-specific diagnosis codes. These approaches
can also improve the diversity and generalizability of study samples, as
they did for Jasuja and colleagues.19
Quinn and colleagues were able to use key text-strings within data
structures that had access to provider notes to create one of the
largest cohorts (N=6500) to date of TG people, who can be hard to
recruit into cohort studies because of stigmatization or marginalization
in society.14 These key text-strings were carefully
developed with study stakeholders, who were built into the study design and
were also members of the TG community. Stakeholders also created a comprehensive
list of hormone medications and procedures for gender affirmation, which
may have helped increase the sensitivity of their CP. Guo and colleagues
extended Quinn and colleagues' CP by using both structured and
unstructured data, which gave them the opportunity to use self-reported
gender identity data when available, added ICD-10 diagnosis codes,
expanded the list of key text-strings to improve sensitivity, and
created an automated algorithm that does not require extensive manual
chart reviews.6 Their final reported CP was simple (at
least one diagnosis code and one key text-string) and generalizable to
other health systems with similar structures.
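The "at least one diagnosis code and one key text-string" rule can be sketched as follows. This is an illustration under stated assumptions, not the algorithm from any reviewed study: the code prefix list, the key text-strings, and all function names are hypothetical placeholders, and a real CP would use the validated lists from the cited papers.

```python
import re

# Illustrative assumptions only; not the validated lists from any reviewed study.
DIAGNOSIS_CODE_PREFIXES = ("F64",)  # ICD-10 gender identity disorder code family
KEY_TEXT_STRINGS = ("transgender", "gender dysphoria")

# One case-insensitive pattern that matches any of the key text-strings.
_KEYWORD_PATTERN = re.compile(
    "|".join(re.escape(s) for s in KEY_TEXT_STRINGS), re.IGNORECASE
)


def has_diagnosis_code(codes):
    """True if any code on the record starts with a listed prefix."""
    return any(code.upper().startswith(DIAGNOSIS_CODE_PREFIXES) for code in codes)


def has_key_text_string(notes):
    """True if any clinical note contains a key text-string."""
    return any(_KEYWORD_PATTERN.search(note) is not None for note in notes)


def meets_phenotype(codes, notes):
    """Require at least one diagnosis code AND at least one key text-string."""
    return has_diagnosis_code(codes) and has_key_text_string(notes)


# Hypothetical records for illustration.
print(meets_phenotype(["F64.0"], ["Patient identifies as transgender."]))
print(meets_phenotype(["E11.9"], ["Discussed gender dysphoria resources."]))
```

Plain string matching of this kind cannot tell whether a note refers to the patient or to someone else, so any such rule still needs validation against chart review or self-report.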
In addition, the data systems used were rich and highly detailed, such
as in Wolfe and colleagues' paper, which used the nation's largest integrated
health care system.21 Chyten-Brennan and colleagues
were also able to use HIV-funded data that uniquely collected
information on gender identity, further strengthening their ability to
identify TG patients.23
However, the included studies also have limitations. Refinements of CPs
are required to stay up to date on current terminology as TG terms will
change over time. The most common source of false positives in CPs was
key text-strings that providers used in reference to a relative
or loved one of the patient rather than the patient themselves. Therefore,
CPs must be carefully assessed for validity. Further, it is difficult
to validate CPs due to limited access to self-reported gender identity
data, and although alternative methods were used in these studies,
self-reported gender identity should be utilized as the gold
standard.5,27,28 Even so, self-reported gender identity data
can itself introduce bias, particularly if there are no options for those who are
non-binary to accurately report their gender identity. The findings of
these studies are also not generalizable to the entire TG population
since not all TG patients have a TG-related diagnosis, especially those
who do not seek gender-affirming care. Many studies are limited to the
health care system whose data they use (e.g., Kaiser data, Medicaid
data), which also limits the generalizability of their findings. In
addition, there is a lack of protections for access to gender-affirming health
care, and not all TG people disclose their gender identity to their
providers or on surveys. This means the true prevalence of TG patients
within samples may be higher than observed because of underreporting of
transgender identities.
Findings from the included studies have provided avenues for future
research. The use of more natural language processing methods to
identify nuance in CP performance is needed, especially within studies
that apply key text-string methods. Papers also call for the
standardization of CPs for collection at the population level, and the
utilization of accessible software to apply the CP to other healthcare
systems with similar data structures. All studies advocated for the
incorporation of the recommended two-step method of self-reported gender
identification in both EHR and claims data sources, which is still
lacking in many data structures and is also advocated for by other reviews of TG
health research.5,29 This would also allow researchers
to identify transmasculine (TM) and transfeminine (TF)
patients, between whom there are many health differences, such as
the need for recommended preventive screenings like cervical
cancer screening.6,19 The studies also call for more analysis
on community level differences in nomenclature and terminology related
to TG people of color, and there is a need for larger ongoing
longitudinal studies where data are aggregated over time and across locations
to assess differences between TM, TF, and TG people of color. Authors
also emphasize that it is imperative for the medical community to
advocate on behalf of TG patients to ensure gender-affirming medical and
surgical care is protected by federal law. To ensure more holistic
stories of the data are depicted, additional mixed-methods studies are
needed as evidence gaps remain for contextual factors specific to the TG
experience.
Quantitative evidence indicates that CPs used to identify TG patients can have high
validity when self-reported gender identity is not available. While
diagnosis codes relevant to TG status are primarily used, other forms of
identification, such as key text-strings, hormone prescriptions, and
non-specific endocrine disorder codes, are useful additions to consider
for researchers planning to use CPs in their TG health research.