2.3 Identification and analysis of structural genes in the GSH metabolic pathway
Based on previous studies (Moreno-Sanchez et al., 2018; Dorion et al.,
2021; Cui et al., 2023), we selected 12 key structural enzymes
(G6PDH, GCL, GGCT, GGT, GPX, GR, GS, GST, IDH, LAP, OXP,
and PGDC) in the GSH metabolic pathway, all of which are encoded
by gene families. We then retrieved and downloaded all CDS and protein
sequences encoding the 12 structural enzymes in G. hirsutum through
BLAST sequence similarity searches and keyword searches on the CottonFGD
website (https://cottonfgd.org/). Given that the GST gene
family is the largest and most complex family in the GSH pathway, we
additionally used the hidden Markov model (HMM) profiles of the GST
N-terminal (GST_N, PF02798) and C-terminal (GST_C, PF00043) domains from
Pfam (http://pfam.xfam.org/) as queries to search the protein data of the
G. hirsutum genomes (E-value ≤ 1e−5), in order to identify all members of
this family as comprehensively as possible. The results of the different
searches were then merged and redundant genes were removed. In addition,
the protein sequences of the
retrieved genes were further verified with the NCBI Conserved Domain Search
(https://www.ncbi.nlm.nih.gov/cdd/) and SMART
(http://smart.embl-heidelberg.de/) tools and were checked against the
domain annotations in the Pfam database (http://pfam.xfam.org).
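The merging, E-value filtering, and redundancy removal described above can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the input layout (one list of (protein ID, E-value) pairs per search, plus a set of annotated domain names per protein), the rule that a verified GST must carry both GST_N and GST_C, and identifiers such as cand_01 are hypothetical assumptions made for illustration.

```python
# Minimal sketch (not the authors' pipeline): merge candidate GST hits from
# several searches, apply the E-value cutoff, remove redundant IDs, and keep
# proteins whose domain annotation contains both GST domains.
# Input layouts and all IDs are hypothetical.
from typing import Dict, Iterable, List, Set, Tuple

EVALUE_CUTOFF = 1e-5  # threshold stated in the Methods
REQUIRED_DOMAINS = {"GST_N", "GST_C"}  # assumption: both domains must be present

Hit = Tuple[str, float]  # (protein ID, E-value) reported by one search


def merge_hits(searches: Iterable[List[Hit]], cutoff: float = EVALUE_CUTOFF) -> Dict[str, float]:
    """Union of hits with E-value <= cutoff, keeping the best E-value per ID."""
    merged: Dict[str, float] = {}
    for hits in searches:
        for protein_id, evalue in hits:
            if evalue > cutoff:
                continue  # hit does not pass the E-value threshold
            # keying by protein ID collapses hits found by more than one search
            if protein_id not in merged or evalue < merged[protein_id]:
                merged[protein_id] = evalue
    return merged


def verify_domains(candidates: Dict[str, float], domains: Dict[str, Set[str]]) -> List[str]:
    """Return candidate IDs whose annotated domains include every required domain."""
    return sorted(
        pid for pid in candidates if REQUIRED_DOMAINS <= domains.get(pid, set())
    )


if __name__ == "__main__":
    blast_hits = [("cand_01", 1e-40), ("cand_02", 3e-3)]
    hmm_hits = [("cand_01", 1e-30), ("cand_03", 2e-12)]
    domain_table = {
        "cand_01": {"GST_N", "GST_C"},
        "cand_03": {"GST_N"},  # C-terminal domain not detected
    }
    merged = merge_hits([blast_hits, hmm_hits])
    print(sorted(merged))                        # ['cand_01', 'cand_03']
    print(verify_domains(merged, domain_table))  # ['cand_01']
```

Keying the merged result by protein ID is what removes redundancy here; whether the original study kept partial-domain proteins is not stated, so the two-domain requirement should be treated as adjustable.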
The CDS and protein lengths of all structural genes were analyzed using
TBtools (v. 2.083) software (Chen et al., 2023). Protein sequences of the 12
gene families were aligned using the ClustalW program in MEGA (v. 11)
software (Tamura et al., 2021). Pairwise sequence similarity within each
gene family was calculated using the MatGAT software (Campanella et
al., 2003). The numbers of introns and exons in each gene were summarized
using the exon-intron structure function on the CottonFGD website.
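As a rough illustration of the pairwise comparison step, the sketch below computes percent identity for every pair of sequences in an existing multiple alignment. It is a simplification under stated assumptions, not MatGAT's procedure: MatGAT performs its own pairwise global alignments and also reports similarity based on a substitution matrix, which is not reproduced here. Identity is defined here, by assumption, as identical residues divided by the alignment columns in which at least one of the two sequences has a residue; the sequence names and toy alignment are hypothetical.

```python
# Simplified sketch (not MatGAT): percent identity for every pair of
# sequences taken from an existing multiple alignment.
from itertools import combinations
from typing import Dict, Tuple


def percent_identity(a: str, b: str) -> float:
    """Identical residues / columns where at least one sequence has a residue."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")
    # drop columns that are gaps in both sequences (common when pairs are
    # extracted from a multiple alignment)
    columns = [(x, y) for x, y in zip(a, b) if not (x == "-" and y == "-")]
    if not columns:
        return 0.0
    identical = sum(1 for x, y in columns if x == y and x != "-")
    return 100.0 * identical / len(columns)


def identity_matrix(alignment: Dict[str, str]) -> Dict[Tuple[str, str], float]:
    """Percent identity for each unordered pair of sequence names."""
    return {
        (p, q): percent_identity(alignment[p], alignment[q])
        for p, q in combinations(sorted(alignment), 2)
    }


if __name__ == "__main__":
    toy_alignment = {  # hypothetical aligned protein fragments
        "p1": "MKV-LA",
        "p2": "MKVILS",
        "p3": "MR--LA",
    }
    for pair, pid in identity_matrix(toy_alignment).items():
        print(pair, round(pid, 1))
    # ('p1', 'p2') 66.7
    # ('p1', 'p3') 60.0
    # ('p2', 'p3') 33.3
```

Identity values depend strongly on how gap columns are counted, so figures from this sketch are not expected to match MatGAT output exactly.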
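Exon and intron counts of the kind reported by the CottonFGD exon-intron structure function can also be derived from a gene annotation file. The sketch below assumes a GFF3 file in which each transcript's exon and CDS features carry a Parent attribute; it counts exons per transcript, takes the intron count as the exon count minus one, and sums the CDS segment lengths. The transcript ID tx1 and all coordinates are hypothetical, and this is not the CottonFGD implementation.

```python
# Sketch (not the CottonFGD implementation): count exons and introns per
# transcript and sum CDS length from GFF3 lines. Assumes exon and CDS
# features reference their transcript through the Parent attribute.
from collections import defaultdict
from typing import Dict, Iterable


def gene_structure(gff3_lines: Iterable[str]) -> Dict[str, Dict[str, int]]:
    exon_count: Dict[str, int] = defaultdict(int)
    cds_length: Dict[str, int] = defaultdict(int)
    for line in gff3_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment/directive lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 9:
            continue  # not a feature line (e.g. embedded FASTA)
        feature_type, attributes = fields[2], fields[8]
        if feature_type not in ("exon", "CDS"):
            continue
        start, end = int(fields[3]), int(fields[4])  # 1-based, inclusive
        attrs = dict(kv.split("=", 1) for kv in attributes.split(";") if "=" in kv)
        for parent in attrs.get("Parent", "").split(","):
            if not parent:
                continue
            if feature_type == "exon":
                exon_count[parent] += 1
            else:
                cds_length[parent] += end - start + 1
    return {
        tid: {"exons": n, "introns": n - 1, "cds_length": cds_length.get(tid, 0)}
        for tid, n in exon_count.items()
    }


if __name__ == "__main__":
    toy_gff3 = [  # hypothetical three-exon transcript
        "##gff-version 3",
        "chr1\tsrc\tmRNA\t100\t900\t.\t+\t.\tID=tx1;Parent=g1",
        "chr1\tsrc\texon\t100\t300\t.\t+\t.\tParent=tx1",
        "chr1\tsrc\texon\t400\t600\t.\t+\t.\tParent=tx1",
        "chr1\tsrc\texon\t700\t900\t.\t+\t.\tParent=tx1",
        "chr1\tsrc\tCDS\t151\t300\t.\t+\t0\tParent=tx1",
        "chr1\tsrc\tCDS\t400\t600\t.\t+\t0\tParent=tx1",
        "chr1\tsrc\tCDS\t700\t849\t.\t+\t0\tParent=tx1",
    ]
    print(gene_structure(toy_gff3))
    # {'tx1': {'exons': 3, 'introns': 2, 'cds_length': 501}}
```

Annotations that omit exon features and list only CDS segments would need a different rule, since introns within untranslated regions would then be missed.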