2.3 Identification and analysis of structural genes in the GSH metabolic pathway
Based on previous studies (Moreno-Sanchez et al., 2018; Dorion et al., 2021; Cui et al., 2023), we selected 12 key structural enzymes of the GSH metabolic pathway (G6PDH, GCL, GGCT, GGT, GPX, GR, GS, GST, IDH, LAP, OXP, and PGDC), all of which are encoded by gene families. We then retrieved all CDS and protein sequences encoding these 12 enzymes in G. hirsutum through sequence-similarity (BLAST) and keyword searches on the CottonFGD website (https://cottonfgd.org/). Because GST is the largest and most complex gene family in the GSH pathway, we additionally used the hidden Markov model (HMM) profiles of the GST domains (GST_N, PF02798; GST_C, PF00043) from Pfam (http://pfam.xfam.org/) as queries for a further BLAST search against the protein data of the G. hirsutum genomes, with a threshold of E-value ≤ 1e−5, in order to identify all family members comprehensively. The results of the different searches were then merged, and redundant genes were removed. The protein sequences of the retrieved genes were further verified using the Conserved Domain Search (https://www.ncbi.nlm.nih.gov/cdd/) and SMART (http://smart.embl-heidelberg.de/), and were compared with the annotated proteins in the Pfam database (http://pfam.xfam.org).
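The filtering and merging step described above can be illustrated with a minimal Python sketch. It is not the pipeline used in this study: the function names and the gene identifiers in the usage example are hypothetical, and the input is assumed to be simple (gene ID, E-value) pairs exported from a similarity search plus plain lists of gene IDs from keyword searches. Only the E-value threshold (≤ 1e−5) is taken from the text.

```python
E_VALUE_CUTOFF = 1e-5  # threshold stated in the text


def filter_by_evalue(hits, cutoff=E_VALUE_CUTOFF):
    """Return the set of gene IDs whose E-value is at or below the cutoff.

    `hits` is an iterable of (gene_id, evalue) pairs (assumed input format).
    """
    return {gene_id for gene_id, evalue in hits if evalue <= cutoff}


def merge_candidates(*id_collections):
    """Merge gene IDs from several searches and drop redundant entries.

    Each argument is any iterable of gene IDs; the result is a sorted,
    duplicate-free list.
    """
    merged = set()
    for ids in id_collections:
        merged.update(ids)
    return sorted(merged)


if __name__ == "__main__":
    # Hypothetical identifiers, for illustration only.
    similarity_hits = [("GeneA", 1e-30), ("GeneB", 1e-3), ("GeneC", 1e-5)]
    keyword_hits = ["GeneC", "GeneD"]
    candidates = merge_candidates(filter_by_evalue(similarity_hits), keyword_hits)
    print(candidates)
```

Genes found by more than one search appear only once in the merged list, which corresponds to the removal of redundant genes before domain verification.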
The CDS and protein lengths of all structural genes were analyzed using TBtools (v. 2.083) software (Chen et al., 2023). Protein sequences of the 12 gene families were aligned using the ClustalW program in MEGA (v. 11) software (Tamura et al., 2021). Pairwise sequence similarity within each gene family was calculated using the MatGAT software (Campanella et al., 2003). The numbers of introns and exons of the genes were summarized using the exon-intron structure function on the CottonFGD website.
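To make the notion of a pairwise comparison within a gene family concrete, the following Python sketch computes percent identity between sequences that have already been aligned. It is a simplified illustration and not the MatGAT algorithm, which performs its own pairwise alignments and also reports similarity based on a substitution matrix. The function names, the toy sequences, and the choice of denominator (all alignment columns except those where both sequences have a gap) are assumptions made for this example.

```python
from itertools import combinations


def percent_identity(seq_a, seq_b, gap="-"):
    """Percent identity between two aligned sequences of equal length.

    Columns where both sequences carry a gap are ignored; every other
    column counts toward the denominator (an assumption of this sketch).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    matches = 0
    compared = 0
    for x, y in zip(seq_a, seq_b):
        if x == gap and y == gap:
            continue  # gap-gap column carries no information
        compared += 1
        if x == y:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def identity_matrix(aligned):
    """Percent identity for every unordered pair in a {name: sequence} dict."""
    return {
        (n1, n2): percent_identity(aligned[n1], aligned[n2])
        for n1, n2 in combinations(sorted(aligned), 2)
    }


if __name__ == "__main__":
    # Toy alignment with hypothetical protein names.
    toy = {"p1": "MKT-A", "p2": "MKTLA", "p3": "MR--A"}
    for pair, value in identity_matrix(toy).items():
        print(pair, round(value, 1))
```

Because the denominator convention affects the reported percentages, values from this sketch are not expected to match MatGAT output exactly.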