Data analysis:
Table 1 presents key characteristics of all capture sites. Most site
characteristics (e.g., latitude, longitude, altitude) were obtained from
Google Earth based on coordinates determined on-site at the time of
capture. Other factors were known a priori (e.g., native versus
non-native status) or were obtained from peer-reviewed literature (e.g.,
Colwell’s predictability indices). Yet other factors (e.g., genetic
group membership) were determined from ongoing projects (Ravinet et al.
2018), and one factor, urbanization at the capture site, was quantified
by us using Google Earth. For this factor, we first found capture sites
in Google Earth using latitude and longitude data. A screen shot of the
site was then taken such that a 10 km radius transect from the capture
site was identifiable. ImageJ was then used to quantify the area within
this 10 km circle that was urbanized (e.g., obvious human-built
structures, which could be confirmed by higher-resolution images
assessable in Goole Earth).
Table 1 also lists dates of birds capture from each location. Whereas
time of year (i.e., phase of the breeding season) could have affected
gene expression, we could not design our study to avoid any such effect
for two reasons. First, field work had to coincide with the availability
of our collaborators at each site; alternative timing of sampling was
not possible. Second, breeding phenology of many sparrow populations is
unknown (e.g., Senegal, Vietnam). We did our best to sample birds
outside what we expected to be the breeding season for each population,
and except for Israel, rarely did we capture obviously immature
individuals.
As gene expression data were non-normal, all were log10transformed before analyses. Our first directive was to evaluate whether
there were interactions between gene and tissue shaping gene expression
across the populations in our study, which would determine whether we
should analyze gene expression for each tissue separately. First, we
constructed a univariate model using the lmer function from the lme4
package in r (Bates 2014) with log10 gene expression as
a response variable, and gene (factor, 3 levels), tissue (factor, 3
levels), and their interaction as fixed effects. Individual and country
were included as random effects to account for non-independence of
tissue samples from the same individuals and individuals sampled in the
same countries, respectively. We also calculated the adjusted
repeatability for random effects (individual and country) following
published methods (Nakagawa and Schielzeth 2010). We then used the ‘sim’
function from the ‘arm’ package (Gelman et al. 2007) to generate a
posterior distribution of estimates and report the posterior mode and
95% CI for repeatability estimates. These analyses revealed significant
gene:tissue interactions, therefore, we analyzed gene expression
separately in subsequent analyses to simplify interpretation.
Next, we determined the set of variables that best-explained among
country variation in gene expression for each gene by model selection
using the “dredge” function from the MuMIn package in R (Barton and
Barton 2015). First, we constructed 3 separate global models (i.e., one
for each gene) with gene expression as the response variable. All global
models included the same explanatory variables: tissue, population type
(native or non-native), temperature predictability (high or low),
precipitation predictability (high or low), genetic group (1 or 2),
urbanization, latitude, altitude, sex, and body mass at capture.
Individual was fit as a random effect to account for repeated measures
(i.e., tissues) within the same individual. We did not include country
as a random effect since this led to model singularity given that the
combination of values for fixed effects was unique to each country. The
“dredge” function runs all subsets of the global model. We then
determined the best-fit models based on AICc values. When alternative
models were within Δ2AICc of the top model, we retained these models and
used the ‘model.ave’ function from the MuMIn package and report the
average model results here.
Finally, given that our initial models revealed significant among
country and among individual repeatability, we assessed correlations in
gene expression across different levels (i.e., within individuals, among
individuals, among countries). Expression levels for each gene were
treated as separate response variables, and we included tissue as a
fixed effect to account for tissue specific differences in gene
expression. Individual ID (band) and study population (country) were
also included as random effects, which allowed us to estimate
covariation between response variables at the among-country,
among-individual, and within-individual (i.e., residual) levels. Models
were constructed using the MCMCglmm function in R (Hadfield 2010).
Models were run for 106,000 iterations with a burn-in of 6000 and
thinning of 100. Thus, 1000 estimates were retained for estimating the
posterior distributions. We extracted among country, among individual,
and within individual correlations between each pairwise combination of
genes following published methods (Houslay and Wilson 2017). Results
presented here used a non-informative inverse-Wishart prior, however, we
verified that results were robust across different prior specifications,
which they were (results not shown). Correlations presented are
posterior modes and 95% confidence intervals.