Introduction

Leaf shape is a defining feature of how we recognise and classify plant species and, as such, is an important component of our relationship with nature. Floral morphology and features are largely invariant within species and therefore served as the basis for taxonomic classification. In contrast, leaf shape varies distinctly between and, often extensively, within species. There are some identifiable global trends, indicating convergent evolution, such as a narrowing and more defined serration of leaves toward latitudinal extremes (Peppe et al., 2011; Royer, McElwain, Adams, & Wilf, 2008; Royer, Wilf, Janesko, Kowalski, & Dilcher, 2005; Traiser, Klotz, Uhl, & Mosbrugger, 2005). Despite the vast diversity of leaf shapes represented across plant species, our knowledge of the genetic architecture of natural variation in leaf shape, the evolutionary drivers and adaptive significance of that variation or the molecular control of leaf development remains relatively limited (D. H. H. Chitwood & Sinha, 2016; Ichihashi et al., 2014).
Leaf organogenesis is initiated within an apical meristem from a group of dividing, undifferentiated initial (meristematic) cells (Clark, 1997). As the meristematic cells divide, the daughter cells reach the meristem peripheral zone and enter determinate growth. Once this initiation phase is complete, a boundary region between the meristem and the outgrowing lateral organ is established (Sluis & Hake, 2015), followed by expansion of leaf primordia (Cleland, 2001). Leaf growth is a tightly coordinated process involving both cell proliferation (division) and expansion (Czesnick & Lenhard, 2015), both of which are coordinated by polar hormone distributions (Moon & Hake, 2011; Sluis & Hake, 2015). Leaf morphogenesis continues with growth in the proximo-distal, adaxial-abaxial, and medial-lateral axes, with polarised cell division and expansion along each axis creating a flat leaf lamina and with spatially varying rates of expansion and division determining basic leaf shape (Moon & Hake, 2011; Tsukaya, 2005).
A number of genes with a central role in the control of leaf primordia initiation and subsequent leaf development and pattern formation have been identified from forward genetic screens, largely inArabidopsis thaliana (Tsukaya, 2005). For example, genes including the NAC transcription factors NO APICAL MERISTEM andCUP SHAPED COTYLEDONS (CUC ), are expressed at the boundary region to delineate outgrowing leaf primordia from the meristem (Aida, Ishida, Fukaki, Fujisawa, & Tasaka, 1997; Cheng et al., 2012; Souer, van Houwelingen, Kloos, Mol, & Koes, 1996; Vroemen, Mordhorst, Albrecht, Kwaaitaal, & de Vries, 2003). Most classically,ANGUSTIFOLIA (AN ) and ROTUNDIFOLIA (ROT ) act independently to control polar length and width expansion, respectively (Tsuge, Tsukaya, & Uchimiya, 1996). There are also a number of notable examples of large-effect genes underlying variation in leaf complexity (dissection), such as the A. thaliana HD-zip transcription factorLATE MERISTEM IDENTITY1 (LMI1) homolog underlying leaf morphs in cotton (Andres et al., 2017).
Although forward genetic screens have identified genes acting during leaf organogenesis, most studies have been conducted using A. thaliana mutants that target single genes. Although those genes can be shown to be essential for, or to contribute to, the control of leaf development, they are not necessarily those underlying natural variation. Leaf shape is a complex, multigenic trait (D. H. Chitwood, Kumar, et al., 2013; D. H. Chitwood, Ranjan, Martinez, et al., 2014; Gupta, Rosenthal, Stinchcombe, & Baucom, 2019), meaning that causal variation in protein coding sequence or expression of each contributing gene will be of small effect size in relation to the total variation in the population. As such, alternative approaches are needed to identify loci underlying natural variation. One such approach is to use genetic screens to identify genomic regions involved in the control of complex traits. For example, Genome Wide Association Studies (GWAS) or Quantitative Trait Locus (QTL) mapping can be used, with the results being integrated with targeted expression studies or used to select candidates for forward genetics validation. A number of studies have used such integrative approaches to study leaf shape. Xiao et al.(2014) used a systems genetics approach to study the genetic architecture of leaf development in a Brassica rapa double haploid population, identifying a cohort of candidate genes. Those candidates included well-characterised examples such asAINTIGUMENTA (ANT ), CUC2 and GIBBERELIN 20-OXIDASE 3 (GA20OX3 ) in addition to numerous genes with no currently assigned function during leaf development. Tight genetic regulation of leaf traits has also been reported in Vitis vinifera, with grape leaves displaying large-scale morphological variation among cultivars. Chitwood et al. (2014) and Gupta et al. (2019) showed that many leaf shape traits, including venation patterning, are highly heritable and that genes displaying differential expression can be identified between cultivars with contrasting leaf shapes. A number of studies of leaf shape have similarly been performed using tomato introgression lines and backcross inbred line populations of crosses between domesticated tomato (Solanum lycopersicum ) and wild Solanum spp. (D. H. Chitwood, Kumar, et al., 2013; D. H. Chitwood, Ranjan, Kumar, et al., 2014; Fulop et al., 2016), including transcriptional studies (D. H. Chitwood, Maloof, & Sinha, 2013; Ichihashi et al., 2014; Ranjan et al., 2016). Phenotypic and expression GWAS in A. thaliana and transcription profiling in recombinant inbred lines of Zea mays have shown that the balance of cell division and expansion determining leaf growth differs among genetic backgrounds (Clauw et al., 2016).
Populus is an important model system for genomics, ecological and evo-devo studies. Populus species were appropriately described by Stettler & Bradshaw (1996) as being ”replete with variation”, a statement that is particularly relevant to their extensive variation in leaf shape. Within the genus, the aspen species P. tremula andP. tremuloides , for example, contain extensive intra-specific natural variation in leaf shape (Barnes, 1969, 1975; Bylesjö et al., 2008), heterophylly along extending long shoots (Cox, 2005; Curtis & Lersten, 1978) and striking heteroblasty between leaves produced at the extending shoot apical meristem and those of short shoot buds (Critchfield, 1960). The pre-formed leaves of most aspen species have flattened petioles that cause the characteristic trembling or quaking of the leaf lamina in very light wind. A number of QTL studies have been performed using hybrid Populus crosses, which have highlighted the polygenic control and heritability of leaf physiognomy traits (Wuet al. , 1997; Rae et al. , 2006; Lindtke et al. , 2013; Drost et al. , 2015). Using a pseudo-backcross pedigree of narrow-leaf P. trichocarpa (section Tacamahaca ) and broad-leaved P. deltoides (section Aigeiros ) Drostet al. (2015) identified a major QTL for leaf lamina width and length:width ratio. The mapped locus contains an ADP-ribosylation factor (ARF), which is a strong candidate gene for the regulation of leaf morphology. Street et al. (2008, 2011) performed systems biology analyses to link available microarray expression data to leaf physiognomy QTLs, identifying GRF s (Growth-Regulating Factors ) as candidate genes controlling leaf development. There are also a small number of reports of aberrant or altered leaf phenotypes resulting from genetic transformation studies (Du, Mansfield, & Groover, 2009; Ratke et al., 2018; Rottmann et al., 2000).
We used a GWAS of physiognomy traits in aspen to demonstrate the complex genetic architecture underlying the observed phenotypic variation. Gene expression data were used to identify genes differentially expressed between groups of genotypes with contrasting leaf shapes and these were examined for the presence of SNPs associated with expression variation. To examine whether genes underlying natural variation in these complex traits were central within the developmental transcriptional program, we generated co-expression networks from developmental series of terminal and pre-formed leaves. We examined the developmental profiles of GWAS candidate genes and explored their co-expression network centrality within a leaf development co-expression network. Furthermore, we used a population gene co-expression network to characterise the centrality of candidate genes associated with shape variation in this network and to investigate signatures of selection. Taken together, the results from our analysis indicate that leaf shape variation in aspen is highly polygenic, and is associated with numerous small effect-size variants affecting genes expressed during leaf development.

Materials and Methods

Leaf shape phenotyping in the Swedish Aspen collection

Leaf size and shape parameters were measured in a natural population ofPopulus tremula , the Swedish Aspen (SwAsp) collection, growing in common gardens at Sävar, northern Sweden (63.9°N, 20.5°E) and Ekebo, southern Sweden (55.9°N, 13.1°E). The common garden trials comprised of natural (wild-growing) aspen genotypes collected in 2003 across ten latitudinal degrees, which were cloned and planted in 2004 in a randomised block design in each garden (Luquez et al., 2008). Leaf samples were harvested in Sävar on 14 July 2008 and 28 June 2011 and in Ekebo on 18 July 2008 and 4 August 2011, when leaves were fully expanded and mature, but before the occurrence of substantial damage due to herbivory or the presence of fungal rust infection. Ten undamaged leaves per replicate tree were sampled randomly across the canopy, avoiding leaves from the first or last leaf in a leaf cohort originating from a single bud. In total, in 430, 444, 326 and 393 trees were sampled in Ekebo 2008, Ekebo 2011, Sävar 2008 and Sävar 2011 respectively, comprising between 1 and 8 (median = 3) clonal replicates. One hundred and thirteen genotypes were sampled in both years in Ekebo and in 2011 in Sävar, and 111 genotypes were sampled in 2008 in Sävar. Leaves were stored at 4° - 8°C immediately after harvest. Petioles were removed at the leaf base and the sample of ten leaves per tree was scanned in colour at 300 dpi using a CanoScan 4400F. A 5 x 4 cm Post-it note was scanned as a scale image. The resulting images were analysed using LAMINA (Bylesjö et al., 2008) to obtain leaf size and shape metrics. Median values of the ten leaves per tree were calculated for each leaf size and shape metric and the median value per individual was used for subsequent analyses, such as heritability and QST.

Statistical analyses

All statistical analyses of the SwAsp collection physiognomy data were conducted in R. Phenotypic data were examined for homogeneity of variance. No data transformations were required to meet the assumptions of a normal distribution. Pearson correlations were used for all phenotypic correlations calculated.
We calculated clonal repeatability (R) and used this to provide an upper-bound estimate of broad sense heritability (H 2). We refer to this trait asH2 rather than R as this probably allows a more intuitive interpretation for readers, however we note that the two are not the same (see Dohm, 2002 for discussion). Estimates of broad-sense heritability (H 2) and their 95% confidence intervals, including all clonal replicates, were calculated as
H2 = VG / (VG + VE)
where VG and VE are genetic and environmental variance components, using the heritability function in the R package ‘Heritability’. To estimate population differentiation,Q ST, the following formula was used:
QST=Vpop/(Vpop+(2*Vgeno))
where Vpop and Vgeno are the inter-population and genotype (i.e. inter-individual) genetic variance components, respectively.
Genetic correlations between phenotypes were calculated as
rG(AB) = VG(AB)/ √(VG(A)x VG(B))
where rG(AB), the genetic correlation of phenotype A and phenotype B, was calculated from the VG(AB), the genetic covariance in phenotype A and phenotype B, VG(A) and VG(B) were the genetic variances of phenotypes A and B respectively.
Genetic (clonal) variation for each phenotype between years and common gardens was investigated using analyses of variance (ANOVA) where phenotype was the dependent variable and environment, comprising year and garden, the independent variables using the following model:
Phenotype ~ Environment + Genotype + Environment x Genotype
ANOVA models were implemented in the aov function in R. All effects were considered significant at P<0.05.

Swedish Aspen collection RNA-Sequencing data

The RNA-Seq data used in this study has been described previously in Mähler et al. (2017). It consists of 219 samples distributed among 86 distinct genotypes. The same type of gene expression filtering and adjustment were used in this paper as in Mähler et al.(2017). Genes were required to have an expression variance above 0.05, and the first nine gene expression principal components were regressed out from the data. This left 22,306 genes for further analysis. The data have been uploaded to the European Nucleotide Archive (ENA) with accession number ERP014886.

Genome wide association mapping

A total of 4.5 million SNPs were considered for the GWAS, as detailed in Mähler et al. (2017). Best linear unbiased predictors (BLUPs) of the three leaf traits considered (leaf area, indent depth, and circularity) were calculated using the lmer function in the lme4 R package. The model used was specified in R as
y ~ garden + year + block + (1|genotype)
Where y is the phenotype, garden is the common garden where the phenotype was measured, year is the year the phenotype was measured, block is the location of the tree in the common garden, and genotype is the genotype of the sampled tree. A univariate linear mixed model was applied to the data using GEMMA v0.94 (Zhou & Stephens, 2014) and included days to bud set (Wang et al., 2018) as a covariate in order to account for population structure as well as the built-in estimation of a centred relatedness matrix to control for population structure and relatedness. GEMMA produces different statistics for significance, and in this study, we used P -values based on a likelihood ratio test. These P -values were consequently Benjamini-Hochberg adjusted for multiple testing for each garden and year separately using the p.adjust function in R. To estimate the proportion of variance explained (PVE) by all SNPs for each phenotype, a Bayesian sparse linear mixed model (BSLMM) was applied in GEMMA using the Markov chain Monte Carlo (MCMC) method (Zhou, Carbonetto, & Stephens, 2013). For each of the three traits, MCMC chain length was set to 1 000 000 steps with the first 250 000 discarded as burn-in and thinned to every 100thsample resulting in 10 000 independent draws from the posterior distribution. The median of the posterior distribution of the pve parameter was taken as a point estimate of the proportion of the phenotypic variance explained by all SNPs. To associate genes with SNPs, the v1.0 Populus tremula annotations from the PopGenIE.org web resource was used (Sundell et al., 2015), and any gene within 2 kbp of a SNP were said to be associated with that SNP.
The top 1,000 genes for each of the three traits were selected by ordering the associations by significance and then walking down the list until 1,000 unique genes had been selected.

GWAS gene set enrichment analysis

The gene set enrichment analysis performed was inspired by Subramanianet al. (2005). The most significant SNP within 2 Kbp of a gene was used to rank all genes found in the GWA results. For each gene set a running sum was made where the value was increased proportionally to theP -value of the most significant association within the gene, if the gene was in the gene set being tested, otherwise the value was decreased. The maximum value in this running sum acted as a test statistic, and 10,000 permutations of the gene ranking was performed, and empirical P -values were calculated. The genes in the gene set tested that contributed to the maximum score in the running sum,i.e. genes that occurred before or at the maximum in the running sum, were considered part of the leading-edge subset. The algorithm was implemented in C++ using Rcpp (Eddelbuettel & Balamuta, 2017) and is available athttps://gist.github.com/maehler/239e18e7d9f2b53c792c05f2aca5cebd. Multiple testing correction was performed using the qvalue function in the qvalue R package. Only gene sets containing more than five genes were considered for the enrichment test. The exponent parameter of the GSEA test was set to 1 for all tests.

Genotype extreme group differential expression

The top and bottom quartiles of the BLUPs for leaf area, indent depth, and circularity were contrasted in the gene expression data from the SwAsp population (Mähler et al., 2017) using DESeq2 (Love, Huber, & Anders, 2014). The data were not subset before creating the model, but rather contrasted on the top and bottom quartiles in order to not lose information from samples with intermediate phenotypic values. Genes with an adjusted P -value < 0.05 were considered differentially expressed. Functional enrichment tests were performed as detailed above.
When testing for differences in network centrality among differentially expressed genes, a randomisation scheme was employed in order to assert that there was no circular reasoning behind the connection between DE and network centrality. The sample labels within the top and bottom quartiles for leaf area were shuffled, and differential expression analysis was performed on this new dataset. This was repeated 100 times for leaf area in order to get an indication as to whether DE between random subsets of samples was inherently connected with network centrality.

Plant Material

We collected root cuttings of diameter 5-10 mm from a wild-growing clonal stand of Populus tremula in northern Sweden (as detailed in Sundell et al. (2017)) on 29th July 2013. We divided the root cuttings into 25 cm lengths and placed these onto pre-watered potting compost (K-jord, Hasselfors, Örebro, Sweden) in trays and then covered the root sections with 2 cm of additional compost. The trays were kept damp and placed in a greenhouse with mean day/night temperature of 20/15 °C, humidity 50-70 % and an 18-hour photoperiod. After three weeks, vegetative shoots of approximately 5 cm height were separated from the root sections and planted into two-litre pots containing potting compost (K-jord, Hasselfors, Örebro, Sweden) and grown in a greenhouse for 12 weeks (24h light, 22 degrees, 50-70 % humidity).
Leaf Plastochron Index (LPI; (Erickson & Michelini, 1957; Larson & Isebrands, 1971; Meicenheimer, 2014) has been extensively used forPopulus research as a means to sample leaves of equivalent developmental age from replicate plants. We first established that the production of terminal leaves in aspen obeys the assumptions of LPI.
We collected a developmental series of terminal leaves to perform morphological and transcriptional assays. The first fully unfurled leaf was defined as a reference point and was labelled leaf T0. Three leaves above the reference leaf (labelled as T-1, T-2, T-3) and the apical region, containing the shoot apical meristem and the very youngest leaf primordia (labelled T-4) and two leaves below the reference leaf (labelled T1, T2) were sampled from five replicate trees for RNA extraction and from an additional five replicate trees for morphological analyses. Samples collected for RNA extraction were immediately flash frozen in liquid nitrogen and stored at -80 °C.

RNA extraction

We extracted total RNA from terminal leaves above the reference leaf (samples T-1 through T-4) in the terminal leaf developmental series using the RNAqueous Micro Kit (Life Technologies, Carlsbad, CA, USA) according to the manufacturer’s guidelines. For all other terminal leaves (samples T0 through T2) and for all pre-formed leaf samples (first described by Mähler et al. 2017), RNA was isolated using the mirVana Kit (Life Technologies, Carlsbad, CA, USA) according to the manufacturer’s guidelines. We could not extract RNA from the smallest terminal leaf samples (samples T-1 through T-4) using the mirVana Kit (Life Technologies, Carlsbad, CA, USA) and we therefore used the RNAqueous Micro Kit (Life Technologies, Carlsbad, CA, USA) for those samples. DNA was removed using the DNA-freeTM DNA removal Kit (Life Technologies, Carlsbad, CA, USA) according to the manufacturer’s instructions. RNA purity was measured using a NanoDrop 2000 (Thermo Scientific) and RNA integrity assessed with a Plant RNA Nano Kit using a Bioanalyzer 2000 (Agilent Technologies) using the plant total RNA setting.

Leaf physiognomy

To enable analysis of leaf size and shape (physiognomy) parameters we sampled a terminal leaf developmental series for which a reference leaf was defined as above. We sampled one leaf younger than the reference leaf (leaf T-1), the reference leaf itself (leaf T0) and five subsequent leaves older than the reference leaf (leaf T1 to leaf T5). Leaves younger than leaf T-1 were not suitable for use in this analysis. As such, the RNA set of samples profiled leaves younger than were represented in this physiognomy developmental series. This developmental series was sampled from five clonally replicated trees.
To obtain images for all leaves we first removed the petioles and subsequently scanned the leaves on a flatbed scanner (Canon LiDE210, New York, USA). Scanned images were saved as 300 dpi colour JPEG files. Leaf shape parameters were then calculated using LAMINA (Bylesjö et al., 2008). Quantified traits included leaf area, length, width, serration number and dimensions in addition to a number of calculated traits such as aspect ratio (length:width) and circularity. Principal components were calculated for use in Table 1.

RNA sequencing data pre-processing

RNA-Seq of the long-fraction of the extracted total RNA was performed by the Beijing Genome Institute using the Illumina sequencing platform with mRNA assayed using > 20 million 2 x100 bp paired-end reads per sample. The data are deposited in the European Nucleotide Archive (ENA) as accession PRJEB31491. Protocol details were as presented in (Nystedt et al., 2013).
An in-house pipeline that combines a number of existing tools was used for data processing and expression value calculation (Delhomme et al., 2015): Sequence data quality was assessed using FastQC/0.10.1 (Andrews, n.d.). Sequence reads originating from ribosomal RNAs (rRNA) were identified and removed using SortMeRNA/1.9 (Kopylova, Noé, & Touzet, 2012) for which libraries rfam-5s, rfam-5.8s, silva-arc-16s, silva-bac-16s, silva-euk-18s, silva-arc-23s, silva-bac-23s and silva-euk-28s were used. The sequence data were then processed to remove low quality bases or entire reads and adapter contamination was removed using Trimmomatic/0.32 (Bolger, Lohse, & Usadel, 2014). The trimming parameters used were: SLIDINGWINDOW:5:20 MINLEN:50 while trimming the adapter TruSeq3-PE. After each of rRNA removal and quality filtering the remaining sequence data were assessed again using FastQC/0.10.1. The reads were mapped to the de novo Populus tremula v1.0 genome (pre-release genome available at PopGenIE.org) using STAR/2.4.0f1 (Dobin et al., 2013) with the settings –runThreadN 16, –readFilesCommand zcat, –limitBAMsortRAM, –outQSconversionAdd -31, –outSAMtype BAM SortedByCoordinate, –outSAMstrandField intronMotif, –outSAMmapqUnique 254, –outWigType bedGraph, –outFilterMultimapNmax 100, –alignIntronMax 11000, –chimSegmentMin 1, –sjdbGTFfile Potra01-gene-wo-intron.gtf –quantMode TranscriptomeSAM. Alignments to features were counted to enable gene loci expression quantification using HTSeq/0.6.1 (Anders, Pyl, & Huber, 2014), for which a GFF3 file containing a representative transcript per coding loci was used.

Differential expression inference and functional enrichment analysis

The gene expression data were first visually assessed by performing a Principal Component Analysis (PCA) and clustered heatmaps using blind variance stabilized data (Lin, Du, Huber, & Kibbe, 2008). This identified one sample of leaf 4 from the terminal leaf development series as a clear outlier. We therefore excluded this sample from subsequent analysis. Differentially expressed genes (DEGs) between developmental stages were inferred within the R framework (R Core Team, 2018) using DESeq2 (Love et al., 2014) with a formula including the replicate and developmental stage as a term (~ Replicate + Time).
Gene Ontology (Ashburner et al., 2000) functional enrichment (over-representation) of DEGs at P < 0.05 was analysed using an in-house implementation of the parent-child test (Grossmann, Bauer, Robinson, & Vingron, 2007) and PFAM domain (Finn et al., 2010) enrichments were calculated using a hypergeometric test.

Gene network inference

We inferred a gene co-expression network for the terminal leaf RNA-Seq dataset. The expression data were first filtered to only include genes with nonzero expression in at least\(\lceil\sqrt{N}\ \rceil\) samples, which were transformed to homoscedastic, asymptotically log2 counts using the regularized log transformation as implemented in DESeq2. Ten network inference methods were computed using the Seidr 0.9 toolkit (Schiffthaler, Serrano, Delhomme, & Street, 2018) – ARACNE, CLR, GENIE3, Narromi, PCor, Pearson, PLSNET, Spearman, SVM and Tigress. For each symmetric edge pair, the one with the higher score was kept in case of non-symmetrical scoring by the algorithm. The networks were aggregated using the inverse rank product method (Zhong, Allen, Xiao, & Xie, 2014) and edges were filtered according to the noise corrected backbone (Coscia & Neffke, 2017) at a sigma of 2.32 (which roughly corresponds to a P -value of 1%). Network partitions were identified using InfoMap (Rosvall & Bergstrom, 2008) with default settings. Network comparisons were performed using functions included in Seidr. Graph layout and images were created using Gephi 0.9.2. Node centrality statistics were calculated using Seidr 0.9, except for Kleinberg’s Hub and Authority, which was calculated using Gephi 0.9.2.