Code
<- c("tidyverse", "ape", "geiger", "nlme", "phytools", "rr2")
packages # Package vector names
You will need two datasets, that will be provided for you:
A data.frame with species traits – furnariides_traits.csv
A phylogenetic tree – furnariides_tree.nex
The clade we will work on today is the Furnariides (Aves: Passeriformes), also known as the largest continental endemic radiation (Pinto-Ledezma et al., 2019). We will use the phylogenetic tree reconstructed by Jesús. The trait data correspond to several morphological measurements of birds from AVONET (Tobias et al., 2022).
You will need to have a set of R packages to do this lab. Install the following packages:
<- c("tidyverse", "ape", "geiger", "nlme", "phytools", "rr2")
packages # Package vector names
If you don’t want to install the packages one by one, you can use the next command.
# Install packages not yet installed
<- packages %in% rownames(installed.packages())
installed_packages
if (any(installed_packages == FALSE)) {
install.packages(packages[!installed_packages], dependencies = TRUE)
}
This command, will, first, check if you already the packages installed, then if a package is not installed in your computer, will install it.
Load installed packages:
library(tidyverse)
── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
✔ ggplot2 3.4.0 ✔ purrr 1.0.1
✔ tibble 3.1.8 ✔ dplyr 1.0.10
✔ tidyr 1.2.1 ✔ stringr 1.5.0
✔ readr 2.1.3 ✔ forcats 0.5.1
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
library(ape)
library(geiger)
library(nlme)
Attaching package: 'nlme'
The following object is masked from 'package:dplyr':
collapse
library(phytools)
Loading required package: maps
Attaching package: 'maps'
The following object is masked from 'package:purrr':
map
library(rr2)
Attaching package: 'rr2'
The following object is masked from 'package:ape':
binaryPGLMM
Set up a working directory. Tell R that this is the directory you will be using, and read in your data:
setwd("path/for/your/directory")
Load the data. Instead of reading files from the computer we will pull the required data directly from the internet.
## Trait data
<- read_csv("https://raw.githubusercontent.com/jesusNPL/BiodiversityScience/master/Spring2023/Lab_1/Data/furnariides_traits.csv") %>%
furnaData column_to_rownames("Sciname") %>% # we are using the column "Sciname" as rownames
drop_na(Range.Size)
Rows: 652 Columns: 38
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (15): Sciname, Species3, Family3, Order3, Mass.Source, Mass.Refs.Other, ...
dbl (23): Total.individuals, Female, Male, Unknown, Complete.measures, Beak....
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## Phylogenetic data
<- read.nexus("https://raw.githubusercontent.com/jesusNPL/BiodiversityScience/master/Spring2023/Lab_1/Data/furnariides_tree.nex") furnaTree
Another option is downloading the data and storing it on your computer. You can use the following lines to do that. These lines will do: 1) check your working directory, 2) download the lab data in a zip file, and 3) unzip the downloaded data.
<- getwd() # Will get the working directory
main.dir
<- "https://www.dropbox.com/s/z80ba99eyzmo6p3/Lab_1.zip?dl=1" # Name of the file to download
urls
download.file(url = urls, file.path(main.dir, "Data/Lab_1.zip"), mode = "wb") # download the file in a specific folder
unzip("Data/Lab_1.zip")
OK. You should be ready to go.
Let’s inspect the data first, to do that we will use the function “glimpse()” of the R package {dplyr}
glimpse(furnaData)
Rows: 651
Columns: 37
$ Species3 <chr> "Conopophaga ardesiaca", "Conopophaga aurita", "Con…
$ Family3 <chr> "Conopophagidae", "Conopophagidae", "Conopophagidae…
$ Order3 <chr> "Passeriformes", "Passeriformes", "Passeriformes", …
$ Total.individuals <dbl> 13, 6, 6, 9, 9, 4, 18, 12, 6, 9, 12, 8, 15, 42, 19,…
$ Female <dbl> 4, 2, 3, 4, 3, 1, 10, 3, 2, 1, 2, 2, 4, 8, 5, 3, 4,…
$ Male <dbl> 9, 4, 3, 4, 6, 2, 8, 5, 4, 4, 9, 3, 8, 27, 11, 8, 5…
$ Unknown <dbl> 0, 0, 0, 1, 0, 1, 0, 4, 0, 4, 1, 3, 3, 7, 3, 0, 1, …
$ Complete.measures <dbl> 6, 4, 4, 4, 4, 3, 4, 3, 4, 4, 4, 3, 4, 10, 5, 4, 4,…
$ Beak.Length_Culmen <dbl> 17.5, 15.6, 20.9, 15.2, 16.4, 20.4, 16.0, 17.7, 16.…
$ Beak.Length_Nares <dbl> 9.0, 9.3, 9.6, 9.6, 8.9, 11.4, 9.9, 9.9, 9.4, 63.2,…
$ Beak.Width <dbl> 5.7, 5.8, 5.9, 5.2, 4.9, 6.2, 5.6, 6.2, 5.7, 4.3, 4…
$ Beak.Depth <dbl> 4.3, 4.4, 4.4, 4.6, 4.0, 5.2, 4.9, 4.5, 4.2, 5.8, 5…
$ Tarsus.Length <dbl> 30.3, 27.2, 29.1, 26.9, 28.1, 32.7, 25.5, 25.6, 26.…
$ Wing.Length <dbl> 70.8, 67.7, 74.4, 69.6, 70.0, 79.2, 65.2, 65.6, 69.…
$ Kipps.Distance <dbl> 9.8, 9.6, 8.8, 6.3, 7.8, 9.3, 8.6, 11.2, 7.1, 13.9,…
$ Secondary1 <dbl> 61.4, 58.0, 65.4, 65.5, 62.6, 70.4, 55.4, 54.7, 62.…
$ `Hand-Wing.Index` <dbl> 13.9, 14.2, 11.8, 8.8, 11.0, 11.8, 13.4, 17.1, 10.2…
$ Tail.Length <dbl> 45.2, 32.5, 42.1, 43.0, 45.3, 41.2, 28.3, 30.8, 35.…
$ Mass <dbl> 26.30, 26.30, 27.00, 22.00, 25.21, 42.50, 20.10, 23…
$ Mass.Source <chr> "Dunning", "Dunning", "Dunning", "EltonTraits_Model…
$ Mass.Refs.Other <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ Inference <chr> "NO", "NO", "NO", "NO", "NO", "NO", "NO", "NO", "NO…
$ Traits.inferred <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ Reference.species <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ Habitat <chr> "Forest", "Forest", "Forest", "Forest", "Forest", "…
$ Habitat.Density <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, …
$ Habitat_recode <chr> "Forest", "Forest", "Forest", "Forest", "Forest", "…
$ Migration <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, …
$ Trophic.Level <chr> "Carnivore", "Carnivore", "Carnivore", "Carnivore",…
$ Trophic.Niche <chr> "Invertivore", "Invertivore", "Invertivore", "Inver…
$ Primary.Lifestyle <chr> "Insessorial", "Insessorial", "Insessorial", "Inses…
$ Min.Latitude <dbl> -22.34, -11.46, -12.58, -9.55, -32.41, -15.14, -27.…
$ Max.Latitude <dbl> -12.44, 7.43, 8.04, -4.00, -12.65, -2.02, -6.60, -0…
$ Centroid.Latitude <dbl> -16.59, -3.08, -2.03, -7.65, -21.94, -7.84, -18.32,…
$ Centroid.Longitude <dbl> -66.99435, -62.26265, -76.24056, -35.89943, -48.840…
$ Range.Size <dbl> 100334.39, 3387968.70, 160640.32, 58582.69, 1896972…
$ Species.Status <chr> "Extant", "Extant", "Extant", "Extant", "Extant", "…
Let’s check if our trait data contain the same species as in the phylogeny
<- name.check(phy = furnaTree, data = furnaData)
tmp tmp
$tree_not_data
[1] "Philydor_novaesi"
$data_not_tree
character(0)
It indicates that the species Philydor novaesi is not present in the trait data, so let’s drop this species from the phylogeny.
<- drop.tip(phy = furnaTree, tip = tmp$tree_not_data) furnaTree
We can double check if our data match after dropping the missing species
name.check(phy = furnaTree, data = furnaData)
[1] "OK"
Now it seems that we are ready to go!
Let’s start by looking at the phylogeny of these birds and learning a bit about how to work with trees in R.
What does your tree look like?
plot(furnaTree)
Whoa. That’s ugly. Let’s clean it up.
plot.phylo(furnaTree, no.margin = TRUE, cex = 0.5)
Better. You can mess around with tree plotting functions in plot.phylo() as much as you’d like. Try this for example:
plot.phylo(furnaTree, type = "fan", no.margin = TRUE, cex = 0.3)
Pretty.
It may be useful to understand how trees are encoded in R. Typing in just the name of the tree file like this:
furnaTree
will give you basic information about the phylogeny: the number of tips and nodes; what the tips are called; whether the tree is rooted; and if it has branch lengths.
str(furnaTree)
will tell you more about tree structure. Trees consist of tips connected by edges (AKA branches)
$tip.label furnaTree
gives you a list of all your terminal taxa, which are by default numbered 1-n, where n is the number of taxa.
$Nnode furnaTree
gives you the number of nodes. This is a fully bifurcating rooted tree, so it has 1 fewer node than the number of taxa.
$edge furnaTree
This tells you the beginning and ending node for all edges.
Put that all together with the following
plot.phylo(furnaTree, type = "fan", no.margin = TRUE, cex = 0.7, label.offset = 0.1)
nodelabels(cex = 0.5)
tiplabels(cex = 0.5)
There are many ways to manipulate trees in R using {Ape}, {Phytools}, and other packages. This just gives you a bare-bones introduction.
Let’s ask some questions using the trait data that were measured for these birds. First, explore the data in the “furnaData” matrix. Here are some options for visualizing data matrices:
%>%
furnaData head() # this will show you the first few rows of your data matrix and its header
%>%
furnaData dimnames() # this will show you the row and column headers for your matrix
%>%
furnaData View() # this will let you visualize the entire matrix
After looking at the data, please answer the next questions
What variables were measured for each of these species of Furnariides? How many species of Furnariides were used?
Is that the same number that were in your phylogeny?
Hand-wing index is one of your variables. Let’s isolate it so we can work with it easily:
<- furnaData[, "Hand-Wing.Index"]
hwi names(hwi) <- rownames(furnaData)
# data vectors have to be labelled with tip names for the associated tree.
# This is how to do that.
hist(hwi)
In lecture, Brie talked about one model of character evolution, called a Brownian Motion model. This model assumes that a trait evolves from a starting state (z0) according to a random walk with the variance specified by the rate parameter \(\sigma^{2}\) (sigma-squared). In short, Brownian motion describes a process in which tip states are modeled under the assumption of a multivariate normal distribution. On a phylogeny, the multivariate mean of tip states is equal to the root state estimate, and variance accumulates linearly through time.
What does Brownian Motion evolution of hand-wing index in Furnariides look like?
<- fitContinuous(furnaTree, hwi)
brownianModel # this will show you the fit statistics and parameter values brownianModel
Here, you can see the estimates for ancestral state (z0), and the rate parameter (\(\sigma^{2}\)), as well as some measures of model fit. The fit of the model is determined using maximum likelihood, and expressed as a log likelihood (lnL). The higher the lnL, the more probable the data given the model. However, when comparing different models, we can’t use the lnL, because it does not account for the difference in the number of parameters among models. Models with more parameters will always fit better, but do they fit significantly better? For example an OU model has 4 parameters (alpha [\(\alpha\)], theta [\(\theta\)], z0, and sigma-squared [\(\sigma^{2}\)]), so it should fit better than a BM model, which includes only z0 and sigsq. To account for this, statisticians have developed another measure of fit called the AIC (Akaike Information Criterion): AIC = (2xN)-2xlnL, where N is the number of parameters. This penalizes the likelihood score for adding parameters. When selecting among a set of models, the one with the lowest AIC is preferred. We will use this information later on in this lab.
In addition to assessing model fit, we can use the Brownian Motion model to reconstruct ancestral states of a character on a tree. To visualize what BM evolution of this trait looks like on a tree. The contMap() command in phytools estimates ancestral states and plots them on a tree.
contMap(furnaTree, hwi, fsize = 0.5, lwd = 3)
Describe the evolution of Hand-wing index on this tree. How many times have exteremely high and extremely low Hand-wing index evolved on this tree?
What does this say about our ability to test hypotheses about the evolution of Hand-wing index?
Let’s go ahead and test some hypotheses. Range Size is another trait in your data matrix. Let’s assess whether there is a correlation between HWI and body mass? We will extract the column “Range Size” from the datamatrix and assign it species names, just as we did for “HWI” above.
<- furnaData[, "Range.Size"]
rangeSize names(rangeSize) <- rownames(furnaData)
Let’s look at a plot of range size as a function of Hand-wing index.
%>%
furnaData drop_na(Range.Size) %>%
ggplot(aes(x = `Hand-Wing.Index`, y = log(Range.Size))) +
geom_point(alpha = 0.5, color = "darkgray") +
labs(x = "Hand-Wing Index", y = "log(Range Size)")
Hm. looks promising. How would you describe the relationship between these two variables?
Why did we log scale range size?
Let’s be more quantitative in describing that relationship with a linear model.
<- lm(log(rangeSize) ~ hwi)
lm_hwi_rangesize summary(lm_hwi_rangesize)
%>%
furnaData drop_na(Range.Size) %>%
ggplot(aes(x = `Hand-Wing.Index`, y = log(Range.Size))) +
geom_point(alpha = 0.5, color = "darkgray") +
labs(x = "Hand-Wing Index", y = "log(Range Size)") +
geom_smooth(method = "lm")
`geom_smooth()` using formula = 'y ~ x'
The coefficients table from the summary() command shows the slope and intercept for the linear model describing range size as a function of Hand-wing index. Each line shows the estimated coefficient (Estimate), the standard error (Std. Error) of that estimate, as well as a t-statistic and associated p-value, testing whether those parameters are equal to 0. The Multiple R-squared is an estimate of how much variance in the response variable can be explained by the predictor variable.
Write the linear model for this relationship. Are the parameters significantly different from 0?
What is the R^2 value for this data?
How do you feel about that?
Nice. But. We have not considered the fact that these birds are related to each other, in fact, all this birds are monophyletic–i.e., the clade includes an ancestral taxon and all of its descendants. As such, they may share their hand-wing index and range size simply due to the fact that their ancestors had large HWI and range size or the reverse. In other words, we need to account for non-independence of residuals due to phylogeny. One way to do that is to use phylogenetic-generalized-least-squares regression (PGLS).
<- gls(log(rangeSize) ~ hwi,
pglsModel correlation = corBrownian(phy = furnaTree, form=~names(rangeSize)),
method = "ML")
Let’s break this command down. This command infers a linear model for Range Size as a function of HWI (gls(rangeSize ~ hwi), but it specifies existing correlation structure in the data (correlation =) as the covariance of these traits assuming a Brownian motion model (corBrowinan()) based on the Furnariides tree (phy = furnaTree) and a correctly ordered list of taxon names (form=~names(rangeSize)). The model is fit using maximum likelihood (method = “ML”). To see the results:
summary(pglsModel)
coef(pglsModel)
R2(pglsModel)
%>%
furnaData drop_na(Range.Size) %>%
ggplot(aes(x = `Hand-Wing.Index`, y = log(Range.Size))) +
geom_point(alpha = 0.5, color = "darkgray") +
labs(x = "Hand-Wing Index", y = "log(Range Size)") +
#geom_smooth(method = "lm") +
geom_abline(intercept = coef(pglsModel)[1], slope = coef(pglsModel)[2],
color = "red", linewidth = 1.5)
# will plot the pgls regression line on your biplot.
Write the linear model for this relationship. Are the parameters significantly different from 0?
What is the R^2 value for this data? (use the likelihood-based R^2 value)
How do you feel about that?
Compare results from the pgls analysis with those that you got from the regular linear model you ran earlier.
Brownian Motion is only one model of evolution for a continuous variable. Another model is the Ornstein-Uhlenbeck (OU) model, which allows the trait mean to evolve towards a new state (theta), with a selective force (alpha). These two new parameters, plus the starting state (z0) and the rate of evolution (sigsq) parameters from the BM model, make for a 4-parameter model. The Early Burst (EB) model allows the rate of evolution to change across the tree, where the early rate of evolution is high and declines over time (presumably as niches are filled during an adaptive radiation. The rate of evolution changes exponentially over time and is specified under the model r[t] = r[0] x exp(a x t), where r[0] is the initial rate, a is the rate change parameter, and t is time. The maximum bound is set to -0.000001, representing a decelerating rate of evolution. The minimum bound is set to \(log(10^{-5})\)/depth of the tree.
Let’s evaluate the relative fit of these three models to the Hand-wing index trait.
<- fitContinuous(furnaTree, hwi) brownianModel
<- fitContinuous(furnaTree, hwi, model = "OU") OUModel
<- fitContinuous(furnaTree, hwi, model = "EB") EBModel
And recover the parameter values and fit estimates.
brownianModel
OUModel EBModel
Compare all models and select the best fitting model.
aicw(c(brownianModel$opt$aicc, OUModel$opt$aicc, EBModel$opt$aicc))
Make a table with the AIC and lnL values for each model. Which model provides the best fit for Hand-wing index?
Now, add the results for a model fitting analysis of the range size trait to this table.
So, we were wrong. An OU model fits HWI better (and you should be able to explain how we know that). Unfortunately, a PGLS analysis with an OU model specified is currently computationally difficult. The best we can do is report the results from our model fitting analysis, and realize that the parameters from BM might not be the best fit.
However, we can still test our hypothesis that species with large HWI also present large range size, and account for phylogeny when we do. First, we should compare the uncorrected linear model of range size as a function of HWI vs the PGLS that uses the covariance structure of the residuals under a Brownian Motion model.
%>%
furnaData drop_na(Range.Size) %>%
ggplot(aes(x = `Hand-Wing.Index`, y = log(Range.Size))) +
geom_point(alpha = 0.5, color = "darkgray") +
labs(x = "Hand-Wing Index", y = "log(Range Size)") +
geom_smooth(method = "lm") +
geom_abline(intercept = coef(pglsModel)[1], slope = coef(pglsModel)[2],
color = "red", linewidth = 1.5)
You might want to know if these regressions really differ in their ability to predict range size from HWI. Asked in another way, are the slopes from these two regressions significantly different from each other? You need to know that a 95% confidence interval for the slope parameter is b (the slope) plus/minus 1.96 standard errors (this is derived from a normal distribution). To calculate your 95% confidence intervals:
<- summary(lm_hwi_rangesize)
rshwi.sum
#for the uncorrected linear model
$coef[2, 1] + c(-1.96, 1.96)*rshwi.sum$coef[2, 2]
rshwi.sum
#for Brownian Motion, the 95% CI
coef(pglsModel)[2] + c(-1.96, 1.96)*sqrt(pglsModel$varBeta[2, 2])
Did phylogenetic correction make a difference in this case?
What do you conclude about the evolution of range size as a function of hand-wing index?
Phylogenetic signal is the tendency of related species to resemble each other more than species drawn at random from the same tree.
Blomberg’s K compares the variance of PICs to what we would expect under a Brownian motion (BM) model of evolution. K = 1 means that close relatives resemble each other as much as we should expect under BM. K < 1 that there is less phylogenetic signal than expected under BM and that K > 1 means that there is more. In addition, a significant p-value returned from a randomization test tells us that the phylogenetic signal is significant, in other words, close relatives are more similar than random pairs of taxa in the dataset.
<- phylosig(tree = furnaTree, # Phylogeny
K_hwi x = hwi, # trait
method = "K", # method
test = TRUE)
print(K_hwi)
plot(K_hwi)
Pagel’s \(\lambda\) is a tree transformation that stretches the tip branches relative to internal branches, making the tree more and more like a complete polytomy of a star phylogeny. If \(\lambda = 0\) there is no phylogenetic signal, while \(\lambda = 1\) correspond to BM and \(0 < \lambda < 1\) in between.
<- phylosig(tree = furnaTree,
LB_hwi x = hwi,
method = "lambda",
test = TRUE)
print(LB_hwi)
plot(LB_hwi)
Describe the results of phylogenetic signal. Does Hand-wing index present phylogenetic signal?
So far, we have been dealing with continuous characters, those that take values along some continuum. Things like height, weight, length, temperature, humidity, etc. are continuous variables. There is another type of variable called a discrete variable, that takes, well, discrete values. Color (e.g. red, blue, green); Locomotory type (e.g. scansoial, terrestrial, fossorial) are examples of discrete variables.
Look at the data in the furnaData matrix. Which of these variables are discrete and which are continuous?
In your data matrix, Habitat has been coded for each of these species. Let’s examine some biogeography for these birds by reconstructing the ancestral habitat (i.e. habitat of origin) and dispersal history. First, isolate your variable.
<- furnaData$Habitat_recode
habitat names(habitat) <- rownames(furnaData)
table(habitat)
We can simultaneously fit a model of discrete character evolution and create a set of plausible character histories using a method called stochastic character mapping or simply SIMMAP:
<- make.simmap(furnaTree, habitat, model = "SYM", nsim = 100) habitat_anc
This analysis results in a “Q” matrix showing the relative probabilities of change from state to state. For this character, this would represent dispersal events between habitats. The higher the value, the higher the probability of that type of change.
Which pair of habitats shows the highest probability of interchange?
Which pair shows the least?
Does this make sense geographically?
Now, you can plot a random simulation of change in this character, that is based on the values inferred above.
plotSimmap(habitat_anc[[1]], fsize = 0.5) # this plots the first of 100 simulations
How many transitions are there between black and red?
Are there any reversals?
Are there any branches with more than one change?
Using the command above, but changing the index from 1 to other values, look at a number of reconstructions. How much variation do you see?
We can summarize these simulations and estimate the relative probability of each habitat as an ancestor for each node on our tree:
<- summary(habitat_anc)
habitat_summary
# print the results habitat_summary
Plot the resulting reconstruction
plot(habitat_summary, cex = c(0.5, 0.2), fsize = 0.5, offset = 90)
legend("bottomleft", fill = c("black", "red", "green"),
legend = c("Forest", "Savanna", "Transition"))
We can also use the plot function “plotSimmap” to change the colors of the habitats
<- setNames(c("darkgreen", "khaki", "gray"),
cols c("Forest", "Savanna", "Transition"))
plotSimmap(habitat_anc[[1]], cols)
add.simmap.legend(colors = cols, prompt = FALSE, x = 0, y = -0.5,
vertical = FALSE)
Where did Furnariides likely originate?
How many transitions from Forest to Savanna?
Are there any reversals?
Does a Savanna ancestor ever move back to Forest?