The genetic diversity present in cultivated plant varieties generally represents only a small fraction of the total diversity present in the species from which the cultivar was derived (Kovach 2008). This reduction in diversity is dramatically exemplified by soybean, in which ~85% of North American breeding germplasm is derived from 18 landraces (Cornelious 2002). Because of the lack of variation in cultivars, the global diversity of a crop is commonly mined for beneficial alleles, such as novel alleles for pest resistance. These alleles can then be introgressed into cultivars via marker-assisted selection. Historically, the introgression of traits from wild or semi-wild germplasm has generally been limited to simple, Mendelian traits. Such traits are easier to identify with confidence, are less dependent on genetic background, and are simpler to track during introgression. Although wild yield alleles have been successfully identified using near-isogenic lines, these methods are highly resource-intensive and often miss relevant alleles ().
            PI416937 is a Japanese landrace found in the pedigree of many major cultivars in the Southeastern US, most notably Woodruff (Boerma 2012). Woodruff is 25% PI416937 by pedigree and was found to yield % of elite checks in the Southern USDA Regional trials in year. PI416937 has many traits that distinguish it from more common elite soybean varieties, including slow wilting (Fletcher 2007; King 2009; Abdel-Haleem 2012), expansive fibrous roots (Hudak 1996; Pantalone 1996; Busscher 2000; Purcell 2007; Abdel-Haleem 2010), aluminum tolerance (Goldman 1989; Bianchi-Hall 2000), large leaf surface area, and overall drought tolerance (Goldman 1989; Sloane 1990). Many other lines derived from PI416937 have exhibited substantial vigor relative to checks and a significant yield advantage (). Thus, unlike the more common case in which exotic germplasm serves as a donor of a specific Mendelian trait, PI416937 and its derived lines are a model for the effective use of exotic germplasm both to produce immediate yield increases and to provide diversity for long-term genetic gain.
            In this study, we used genome-wide marker data and known pedigree information for PI416937-derived lines to track exotic genomic regions that were selected for and against over the last 30 years. Breeding pedigrees have previously been exploited to detect selected, agronomically important loci in soybean (Shoemaker 1992; Lorenzen 1995; Sebastian 1995; Grainger 2013). The approach is analogous to the transmission disequilibrium tests pioneered in animal genetics. Released varieties or breeds are assumed to be products of selection, so alleles conferring superior fitness are expected to deviate from random (50%) transmission (Bink 2000). Although the original versions of the test depend on a heterozygous parent, the test is easily adapted to selfing crops, in which the entire F1 population of a cross between two inbred parents is assumed to be heterozygous for all segregating alleles (Jannink 2001). Though the approach is theoretically powerful, previous studies employing it suffered from low marker density (Shoemaker 1992; Lorenzen 1995) or from gapped pedigrees that made rigorous statistical inference problematic (Grainger 2013). Higher marker density allows shared haplotypes to be inferred with confidence, and thus allows the crosses that truly test a locus for the influence of selection to be accurately defined and counted.
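Stated formally, as a minimal sketch in our own notation (n, k, and K are labels introduced here for illustration, not taken from the cited studies): if a locus is tested by n informative crosses and the PI416937 allele is recovered in k of them, the no-selection null model and its exact two-sided p-value are as follows. Because the null transmission probability is 1/2, the distribution is symmetric, so doubling the smaller tail gives the exact two-sided value.

```latex
K \sim \mathrm{Binomial}\!\left(n, \tfrac{1}{2}\right), \qquad
P(K = j) = \binom{n}{j}\left(\tfrac{1}{2}\right)^{n}

p = \min\left\{ 1,\; 2 \sum_{j=0}^{\min(k,\, n-k)} \binom{n}{j} \left(\tfrac{1}{2}\right)^{n} \right\}
```

For example, with a hypothetical n = 10 and k = 9, p = 2(1 + 10)/2^10 = 22/1024 ≈ 0.021.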
            Haplotype consolidation based on identity-by-descent (IBD) should, in theory, benefit any genotyping strategy that does not yield complete knowledge of the allelic states of all polymorphisms in the population under study (Jordan 2005). This follows because regions that are IBD share not only identical typed alleles but also the intervening untyped alleles (excluding novel mutations that have arisen since divergence from the last common ancestor). Because typed markers capture only a subset of all polymorphisms, the causal mutation underlying a phenotype is generally more likely to be among the untyped alleles, so IBD status reflects the phenotypic effect of carrying a genomic region more accurately than the typed alleles alone.
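One common way to approximate this kind of IBD inference, offered here only as a minimal sketch, is to treat long runs of consecutive markers at which a descendant line matches PI416937 as putatively IBD. The function name `ibs_segments`, the `min_run` threshold, and the coding of missing calls as `"N"` are hypothetical choices for illustration; identity-by-state runs are only a proxy for IBD, and this is not necessarily the method used in this study.

```python
from typing import List, Sequence, Tuple


def ibs_segments(line: Sequence[str], donor: Sequence[str],
                 min_run: int) -> List[Tuple[int, int]]:
    """Return half-open [start, end) marker-index ranges where `line`
    matches `donor` at >= min_run consecutive markers.

    Long identical-by-state runs are treated as putatively IBD; this is a
    heuristic assumption, not a formal IBD model.
    """
    if len(line) != len(donor):
        raise ValueError("genotype vectors must have equal length")
    segments: List[Tuple[int, int]] = []
    run_start = None
    for i, (a, b) in enumerate(zip(line, donor)):
        # "N" is a hypothetical code for a missing call; it breaks a run.
        if a == b and a != "N":
            if run_start is None:
                run_start = i
        else:
            if run_start is not None and i - run_start >= min_run:
                segments.append((run_start, i))
            run_start = None
    # Close a run that extends to the last marker.
    if run_start is not None and len(line) - run_start >= min_run:
        segments.append((run_start, len(line)))
    return segments


# Hypothetical example: one mismatch at marker 5 splits the shared haplotype.
print(ibs_segments("AAGGTCCCAA", "AAGGTTCCAA", min_run=3))
```

Raising `min_run` trades sensitivity for confidence: short matching runs are more likely to be shared by chance rather than by descent.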
            We used a two-step process that first infers which genomic regions of each line were inherited from each of its two parents, and then which regions in those parents were derived from PI416937. For every marker in the study, any cross in which one parent carried a PI416937 allele and the other parent carried a non-PI416937 allele was counted as a single test of that locus. If the PI416937 allele was inherited in these tests more or less frequently than a binomial model with a transmission probability of 0.5 would predict, we considered this evidence of selection. The alleles identified in this way were then tested independently across a range of environments and in recombinant inbred populations in which at least one parent had PI416937 in its pedigree.
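The per-locus test described above can be sketched in Python as follows. This is a hedged illustration, not the study's implementation: the names `transmission_test` and `binomial_two_sided_p` and the tuple encoding of each cross are hypothetical, and the sketch treats every informative cross as an independent trial, an assumption that related crosses in a real pedigree may violate.

```python
from math import comb
from typing import Iterable, Tuple


def binomial_two_sided_p(k: int, n: int) -> float:
    """Exact two-sided binomial p-value for k successes in n trials, p = 0.5.

    With p = 0.5 the distribution is symmetric, so the two-sided value is
    twice the smaller tail, capped at 1.
    """
    if not 0 <= k <= n:
        raise ValueError("require 0 <= k <= n")
    if n == 0:
        return 1.0
    m = min(k, n - k)
    tail = sum(comb(n, j) for j in range(m + 1)) / 2 ** n
    return min(1.0, 2 * tail)


def transmission_test(
        crosses: Iterable[Tuple[bool, bool, bool]]) -> Tuple[int, int, float]:
    """Test one locus for non-random transmission of the PI416937 allele.

    Each cross is (parent1_is_PI, parent2_is_PI, progeny_is_PI) at the locus.
    Only crosses in which exactly one parent carries the PI416937 allele are
    informative. Returns (k transmissions, n informative crosses, p-value).
    """
    n = k = 0
    for p1, p2, progeny in crosses:
        if p1 != p2:  # informative: exactly one parent carries the PI allele
            n += 1
            k += int(progeny)
    return k, n, binomial_two_sided_p(k, n)
```

For instance, `binomial_two_sided_p(9, 10)` returns 22/1024, or about 0.021, for a locus tested by ten informative crosses with nine PI416937 transmissions.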

As potential validation for regions that appeared to be under positive selection from PI416937, we searched the literature for quantitative trait loci (QTL) previously mapped in studies involving PI416937. We also searched the peak regions under selection for candidate genes that might confer a yield advantage. Regions under negative selection were investigated for QTL conferring traits considered agronomically undesirable in soybean production. To examine the relationship between regions under selection within recombinant inbred line (RIL) populations and those identified by our pedigree-based analysis, we studied five breeding populations composed of F5-derived RILs with PI416937 in their pedigrees. These RIL populations had undergone phenotypic selection based on visual agronomic traits (i.e., lodging, height, and visual estimation of yield). Selection was applied to individual F5 plants as well as to F5-derived plant rows from the selected plants. Finally, we examined the highest-yielding PI416937-derived lines in our analysis to determine whether their phenotypic superiority was associated with carrying regions under positive selection from PI416937 while lacking regions under negative selection from PI416937.
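The comparison of the highest-yielding lines can be sketched as simple set arithmetic. In this hypothetical example, each line is summarized by the set of region identifiers at which it carries a PI416937-derived haplotype; the names `selection_profiles`, `positive`, and `negative` and the region labels are illustrative assumptions, not part of the study's pipeline.

```python
from typing import Dict, List, Set, Tuple


def selection_profiles(line_pi_regions: Dict[str, Set[str]],
                       positive: Set[str],
                       negative: Set[str]) -> List[Tuple[str, int, int]]:
    """For each line, count the positively and negatively selected regions
    at which it carries a PI416937-derived haplotype.

    Returns (line, n_positive_carried, n_negative_carried), sorted so that
    lines carrying the most positive regions come first, with ties broken by
    fewer negative regions and then by line name.
    """
    profiles = [
        (line, len(regions & positive), len(regions & negative))
        for line, regions in line_pi_regions.items()
    ]
    profiles.sort(key=lambda t: (-t[1], t[2], t[0]))
    return profiles


# Hypothetical lines and region identifiers, for illustration only.
lines = {"L1": {"r1", "r2", "r5"}, "L2": {"r1", "r3"}, "L3": {"r2"}}
print(selection_profiles(lines, positive={"r1", "r2"}, negative={"r3", "r5"}))
```

The resulting ordering can then be set beside observed yield ranks to ask whether the top performers share this profile.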

After identifying the PI416937 regions under breeding selection, we examined the potential to introgress these exotic alleles into genomic regions that are low in diversity in North American germplasm. Several regions throughout the soybean genome have been identified that either lost diversity through selection or, over the last several decades of North American breeding, have never harbored diversity on which selection could act (). We examined whether the regions we found under selection from PI416937 overlapped with these regions of low diversity. If so, breeders could pursue targeted introgression of beneficial PI416937 alleles into low-diversity regions, especially those that have historically carried little to no known diversity.
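Testing whether selected regions coincide with low-diversity regions is an interval-overlap problem. A minimal sketch follows, assuming each region is given as a hypothetical (chromosome, start, end) tuple with half-open base-pair coordinates; the function name `overlaps`, the `Gm01`-style chromosome labels, and the example coordinates are illustrative only.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Interval = Tuple[str, int, int]  # (chromosome, start, end), half-open, in bp


def overlaps(selected: List[Interval],
             low_div: List[Interval]) -> List[Tuple[Interval, Interval, int]]:
    """Return (selected_region, low_diversity_region, overlap_bp) for every
    pair of regions that overlap by at least one base pair."""
    # Group low-diversity regions by chromosome and sort each group by start.
    by_chrom: Dict[str, List[Interval]] = defaultdict(list)
    for iv in low_div:
        by_chrom[iv[0]].append(iv)
    for ivs in by_chrom.values():
        ivs.sort(key=lambda iv: iv[1])

    hits: List[Tuple[Interval, Interval, int]] = []
    for sel in selected:
        chrom, start, end = sel
        for ld in by_chrom.get(chrom, []):
            if ld[1] >= end:
                break  # sorted by start, so no later region can overlap
            ov = min(end, ld[2]) - max(start, ld[1])
            if ov > 0:
                hits.append((sel, ld, ov))
    return hits


# Hypothetical regions, for illustration only.
selected = [("Gm01", 100, 200), ("Gm02", 0, 50)]
low_div = [("Gm01", 150, 300), ("Gm01", 0, 90), ("Gm03", 0, 1000)]
print(overlaps(selected, low_div))
```

The same comparison could be applied to previously mapped QTL intervals in place of the low-diversity regions.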