RESULTS
Regions appear to be under both positive and negative selection based upon pedigree analysis of high yielding PI416937 derived lines
Using the 52 trios we assembled, we identified several regions under both positive and negative selection that can be traced back to PI416937. No regions were statistically significant for selection after a multiple testing correction (Li 2005). Eickholt et al. (unpublished) identified a region on chromosome 8 significantly associated with a positive yield gain in a segregating RIL population after replicated yield testing. This region was referred to as YLD1 in Eickholt et al. (unpublished) and is referred to likewise in this publication. YLD1 was our eighth most significant region in terms of positive selection: it was tested in 41 trios and inherited by 28 high-yielding progeny (p-value: 0.028). Any markers that met or exceeded this level of statistical significance were considered regions with evidence of selection for or against. In total, we identified 9 genomic regions across 3 chromosomes under positive selection and 17 genomic regions across 8 chromosomes under negative selection from PI416937 (Figure for all chromosomes).
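The transmission count reported for YLD1 can be checked with a short calculation. The sketch below is only an illustration: this section does not restate the test that was used, so it assumes that each trio is an independent trial in which the PI416937 allele is inherited half of the time when there is no selection, and it applies a two-sided exact binomial test. The function name is hypothetical.

```python
from math import comb


def binomial_two_sided_p(successes: int, trials: int) -> float:
    """Two-sided exact binomial p-value against a null transmission rate of 0.5.

    Assumption: with a null rate of 0.5 the distribution is symmetric, so the
    two-sided p-value is twice the smaller tail, capped at 1.
    """
    upper = sum(comb(trials, k) for k in range(successes, trials + 1))
    lower = sum(comb(trials, k) for k in range(0, successes + 1))
    tail = min(upper, lower) / 2 ** trials
    return min(1.0, 2 * tail)


# YLD1: tested in 41 trios, inherited by 28 high-yielding progeny
p_yld1 = binomial_two_sided_p(28, 41)
print(round(p_yld1, 3))
```

Under these assumptions the result rounds to 0.028, which agrees with the p-value reported for YLD1.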
Regions under positive selection ranged from a single marker to 76 markers. They were found on chromosomes 8 (2 regions), 13 (3 regions), and 17 (4 regions). The physical distance of the largest region under positive selection was 1,450,940 bp on chromosome 17. This region was located between 796,471 bp and 1,309,414 bp. Two separate genomic regions on chromosome 17 had the greatest evidence of positive selection (p-value: 0.002). The first was a 9,223 bp peak, located between 2,409,261 bp and 2,418,484 bp, within a larger significant region of 206,570 bp. The second was 985,307 bp long and located between 2,510,699 bp and 3,496,006 bp. A genomic region with equal evidence of positive selection (p-value: 0.002) was located on chromosome 13. This was a 99,528 bp region between 26,986,028 bp and 27,085,556 bp.
Regions under negative selection ranged from a single marker to 137 markers. They were found on chromosomes 5 (2 regions), 8 (2 regions), 9 (2 regions), 12 (2 regions), 13 (6 regions), 16 (1 region), 17 (1 region), and 19 (1 region). The largest region under negative selection spanned 4,011,395 bp on chromosome 12, between 17,662,053 bp and 21,673,448 bp. Within this larger region there was a peak region with higher statistical significance, 2,748,460 bp long and located between 17,662,053 bp and 20,410,513 bp. The genomic region with the greatest evidence of negative selection (p-value: 0.002) was 82,263 bp long and located on chromosome 13 between 30,683,322 bp and 30,765,585 bp.
Overlap between QTL mapping studies involving PI416937 and our pedigree analysis
We looked for QTL that had been discovered in mapping studies involving PI416937 to see whether favorable alleles from these studies overlapped with regions we found to be under positive selection. One such overlap involved a QTL referred to as canopy wilt 2-6, which is located on chromosome 17 (Abdel-Haleem 2012). Abdel-Haleem et al. (2012) conducted the original QTL mapping study using RILs derived from a cross between PI416937 and Benning in order to map genetic loci associated with the canopy-wilting trait. PI416937 has been shown in previous literature to exhibit slow wilting under drought stress (Sloane 1990). This is thought to be a beneficial trait for surviving prolonged droughts through limited transpiration at high vapor pressure deficit (VPD), allowing for conservation of soil moisture (Tanaka 2010). Canopy wilt 2-6 was mapped between 2,201,427 bp and 5,892,120 bp. Our pedigree analysis revealed three favorable genomic regions from PI416937 that overlapped with this QTL. The first region is a single marker located at 2,202,411 bp. The second region is located between 2,246,668 bp and 2,453,238 bp. The third region is located between 2,510,699 bp and 3,496,006 bp. The peak region with highest significance was located between 2,409,261 bp and 2,418,484 bp. It is important to note that for canopy wilt 2-6, Abdel-Haleem et al. (2012) found the favorable allele for slow wilting to be inherited from Benning, not PI416937. This may indicate that there is favorable genetic material from PI416937 in this genomic region, but that these favorable alleles do not confer slow wilting under drought stress. Canopy wilt 2-6 was larger than the regions we discovered under positive selection, so the unfavorable PI416937 allele for slow wilting may be located within the portion of the QTL interval that was not under significant positive selection in our pedigree analysis. Alternatively, although the PI416937 allele was unfavorable relative to Benning for canopy wilt 2-6, this may not be the case relative to the other parental lines in our pedigree analysis, which would explain the evidence for positive selection of this region in our analysis.
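The overlap described above can be verified directly from the reported coordinates. The following is a minimal sketch, assuming all positions refer to the same chromosome 17 assembly and treating each region as a closed interval; the variable and function names are illustrative and not part of the original analysis.

```python
# Coordinates (bp) on chromosome 17 as reported in the text.
QTL = (2_201_427, 5_892_120)  # canopy wilt 2-6
REGIONS = {
    "region 1 (single marker)": (2_202_411, 2_202_411),
    "region 2": (2_246_668, 2_453_238),
    "region 3": (2_510_699, 3_496_006),
    "peak": (2_409_261, 2_418_484),
}


def overlaps(a, b):
    """True if closed intervals a and b share at least one position."""
    return a[0] <= b[1] and b[0] <= a[1]


def contains(outer, inner):
    """True if inner lies entirely within outer."""
    return outer[0] <= inner[0] and inner[1] <= outer[1]


for name, region in REGIONS.items():
    print(name, overlaps(QTL, region), contains(QTL, region))

# Share of the QTL interval covered by the three selected regions.
# The peak is excluded because it lies inside region 2.
covered = sum(end - start for name, (start, end) in REGIONS.items() if name != "peak")
qtl_fraction = covered / (QTL[1] - QTL[0])
print(covered, round(qtl_fraction, 2))
```

With these coordinates, every region falls entirely inside the QTL interval, and the three regions together cover roughly a third of it, leaving most of canopy wilt 2-6 outside the regions under significant positive selection.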
Advanced yield trials conducted by public breeders tend to be managed more intensively to reduce stressors such as drought. Many of the QTL mapping studies involving PI416937 were conducted to examine tolerance to drought-related conditions, so it may not be surprising that QTL mapped from crosses involving PI416937 do not heavily overlap with regions we found associated with yield. In addition, we sought regions under selection across diverse environments, predominantly in North Carolina and Georgia, and across the diverse genetic backgrounds of these two breeding programs. The mapping studies may be identifying QTL that are more environment or population specific.
Pedigree analysis of chromosome 8 and subsequent yield analysis of NILs for YLD1
For the mixed model of the combined environments, there was a statistically significant effect of YLD1 on yield (p-value: 0.0418), but there also appeared to be significant GxE (p-value: 0.0027). Because of this, the analysis was broken up by location to see the effect of YLD1 in each environment. For Athens, we saw a statistically significant difference in yield for YLD1. The NILs containing the PI416937 allele had a least squares mean estimate of 4361 kg/ha, compared to 4057 kg/ha for NILs containing the Boggs allele. This difference in yield was statistically significant at an alpha of 0.05. The NILs heterozygous at the YLD1 locus had a least squares mean estimate of 4372 kg/ha. This was also significantly greater than the NILs containing the Boggs allele, but not significantly different from the NILs containing the PI416937 allele. NILs homozygous for the PI416937 allele matured four days later than lines homozygous for the 'Boggs' allele, which was statistically significant. The heterozygous lines matured three days later than the lines homozygous for the PI416937 allele, which was also statistically significant. This difference in maturity date may explain some of the differences in yield, as a relationship between maturity and seed yield has been shown in the literature for soybean (Curtis 2000). We saw no overlap between mapped maturity-related genes (E1-E4, E7) and the YLD1 locus (Molnar 2003). We also scanned SoyBase () for maturity QTL that have been mapped in this region in other studies. We found three maturity-related QTL mapped to chromosome 8. These QTL were pod maturity 13-1 (Specht 2001), pod maturity beginning 1-3 (Tasma 2001), and pod maturity 22-1 (Reinprecht 2006). None of these three QTL overlapped with the YLD1 locus, but pod maturity 13-1 and pod maturity beginning 1-3 were approximately 5.6 Mb and 4.3 Mb downstream, respectively, according to the nearest sequence-based genetic markers associated with these QTL. When analyzing the data from Plains, we no longer saw a statistical difference in yield among the three genotypic classes for YLD1. No maturity data were taken in Plains, so we could not compare maturity dates. This indicates a possible genotype by environment interaction in which the PI416937 allele exhibited a yield benefit in Athens that was not seen in Plains. There was no significant GxE for plant height, so we combined locations in this analysis. NILs homozygous for the PI416937 allele were 4 cm taller than lines homozygous for the 'Boggs' allele and 1 cm shorter than heterozygous lines. These differences in height were not statistically significant, nor did they seem biologically relevant.
The YLD1 region was so finely defined (3.7 kb) in our pedigree analysis that we examined which gene models were present using SoyBase. There was a single gene model present, Glyma.08g299800, which is a homolog of ATG24090.1, a chitinase A found in Arabidopsis. Glyma.08g299800 is located from 41,795,912 bp to 41,796,546 bp, which partially overlaps with the YLD1 region located from 41,792,467 bp to 41,796,167 bp. Chitinases are commonly associated with plant defense against fungal pathogens or insects, as chitin is a common component of fungal cell walls and insect exoskeletons (Sharma 2011). Several QTL for various traits have been mapped to this region; one of interest for fungal resistance is sclero 9-2, which is located from 39,910,959 bp to 44,689,972 bp according to the nearest sequence-based genetic markers in SoyBase (Guo 2008). Sclero 9-2 was mapped from a cross of PI 391589B by IA2053. The goal of this cross was to map resistance to Sclerotinia sclerotiorum. The favorable allele for this QTL was inherited from IA2053, which was the moderately susceptible parent in the cross. It would be useful in the future to compare the YLD1 haplotypes of PI416937 and IA2053 for homology, which could indicate potential for shared fungal resistance due to this genomic region. We have thus far been unsuccessful in obtaining genomic data on IA2053 to perform this analysis.
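To make the partial overlap between the gene model and YLD1 concrete, the shared span can be computed from the coordinates given above. This is a minimal sketch, assuming all coordinates are on the same chromosome 8 assembly and measuring length as end minus start, as the reported region sizes do; the helper name is hypothetical.

```python
def overlap_length(a, b):
    """Overlap in bp between two (start, end) intervals, measured as end - start.

    Returns 0 when the intervals do not overlap.
    """
    start = max(a[0], b[0])
    end = min(a[1], b[1])
    return max(0, end - start)


# Coordinates (bp) on chromosome 8 as reported in the text.
YLD1 = (41_792_467, 41_796_167)
GENE = (41_795_912, 41_796_546)        # Glyma.08g299800
SCLERO_9_2 = (39_910_959, 44_689_972)  # sclero 9-2

shared = overlap_length(YLD1, GENE)
gene_length = GENE[1] - GENE[0]
yld1_length = YLD1[1] - YLD1[0]
print(shared, gene_length, yld1_length)
print(overlap_length(YLD1, SCLERO_9_2) == yld1_length)
```

With these coordinates, 255 bp of the 634 bp gene model fall inside the 3,700 bp YLD1 region, and YLD1 lies entirely within the sclero 9-2 interval.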
Given the climate of the southeastern U.S., it is reasonable to assume that fungal pressure is a constant concern. In Georgia, from 2005-2013, an average of $2.2 million was spent annually on plant disease control from biotic stressors (). If the PI416937 allele at YLD1 provides moderate resistance to fungal pathogens, this would explain its association with yield in our pedigree analysis of southern U.S. breeding lines. Due to an unusually cool and wet growing season, farmers spent $4.35 million on plant disease control in 2013, nearly double the 2005-2013 average. Because high disease pressure was expected, the NIL yield trial received two fungicide applications in both Athens and Plains. Domark 230 ME (active ingredient: tetraconazole) was applied at 5 fl. oz. per acre the weeks of July 29th, 2013 and September 2nd, 2013. The original purpose of yield testing the NILs was to observe whether there were yield differences under ideal conditions, so allowing these trials to experience disease pressure was not a consideration. That being said, we saw a significant difference in yield in Athens but not in Plains. It is possible that, in a year especially conducive to disease, pressure in Athens was high enough to have an impact even with two fungicide applications, and that the two locations experienced differing levels of disease pressure. The association of YLD1 with fungal pathogen resistance needs to be verified in further experiments.
Comparison of PI416937 regions under selection with regions of low diversity in North American elite breeding lines
Regions of the soybean genome have suffered reduced diversity