Note

The main goal of this tutorial is to present the basics of R so that anyone can get past the initial fear and start using it for data analysis. Learning is more effective when theory is combined with practice, so we strongly recommend that you run the commands on your own computer as you follow the exercises in this short tutorial, rather than just reading them passively.

1 Why R?

R is a programming language and an environment for statistical computing and graphics. It is also described as object-oriented, which means that using R involves creating and manipulating objects, and the user has to say exactly what they want to do rather than simply press a button (the black-box problem). So, the main advantage of R is that the user has control over what is happening and needs a full understanding of what they want before performing any analysis.

With R, it is possible to manipulate and analyze data, make graphics, and write anything from short commands to entire programs. Basically, R is an open-source implementation of the S language, which was created at Bell Labs in the 1970s. The S language also became the base for the commercial product S-PLUS. Thus, if we have to add another advantage, it is that R is open and free!

There are many websites with information about R. Most of them are super useful; good starting points are DataCamp, CRAN, and R Tutorial.

Also, when we present our results in a report, scientific paper, or any other kind of document, we need to cite the software we used. The easiest way to cite R is to use the built-in function citation().

Code

citation()


To cite R in publications use:

  R Core Team (2022). R: A language and environment for statistical
  computing. R Foundation for Statistical Computing, Vienna, Austria.
  URL https://www.R-project.org/.

A BibTeX entry for LaTeX users is

  @Manual{,
    title = {R: A Language and Environment for Statistical Computing},
    author = {{R Core Team}},
    organization = {R Foundation for Statistical Computing},
    address = {Vienna, Austria},
    year = {2022},
    url = {https://www.R-project.org/},
  }

We have invested a lot of time and effort in creating R, please cite it
when using it for data analysis. See also 'citation("pkgname")' for
citing R packages.

2 First steps

First of all, we need to know WHERE we are working, in other words, our working directory. To get that information we just need to type getwd() in the script or the console.

Code

getwd()

[1] "/Users/jesusnpl/Dropbox/My Mac (Jesús’s MacBook Pro)/Documents/GitHub/BiodiversityScience/Spring2023/Lab_0"

If the working directory is not the correct one, we just need to tell R to SET the correct path.

Code

setwd("Your path or directory")

There is an R package called {here} that is super convenient when the path is really long: it builds file paths relative to the top-level folder of your project, so you rarely need to set the working directory by hand. You didn’t hear that from me ;p!

Ok, we are now in the correct place, so we can continue with the practice.

2.1 Directory structure

For training purposes, we will create a directory structure where the main folder is our current working directory. Inside it we will create a series of subfolders to store the data, the scripts, and whatever else we want… To do that we will use the function dir.create(). Let’s practice!

In every class you will need to check your working directory in order to follow the labs without issues.

Code

?dir.create

Code

dir.create("BioSci") # this can be your main folder; you can change the name

dir.create("Data") # folder that stores the data

dir.create("R-scripts") # folder that stores the scripts used in the course

dir.create("Figures") # folder that stores the figures created in the course

dir.create("Results") # folder that stores the results

dir.create("Temp") # folder for temporary files

To check that the subfolders were created within the main folder, just use the function dir(). This simple function prints in the console the names of the files and folders that are currently in your working directory.
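If you want to try the folder creation without touching your real project, the steps above can be sketched in a temporary directory. The folder name BioSci_demo is made up for this illustration; tempdir(), file.path(), and dir.exists() are base R functions.

```r
# Work inside a throwaway directory so nothing is written to your project
main_dir <- file.path(tempdir(), "BioSci_demo")  # hypothetical folder name
dir.create(main_dir, showWarnings = FALSE)

subfolders <- c("Data", "R-scripts", "Figures", "Results", "Temp")

# Create each subfolder inside the main folder
for (folder in subfolders) {
  dir.create(file.path(main_dir, folder), showWarnings = FALSE)
}

# List what is now inside the main folder
created <- dir(main_dir)
print(created)

# dir.exists() returns TRUE or FALSE for each path we ask about
all_created <- all(dir.exists(file.path(main_dir, subfolders)))
print(all_created)
```

Building the paths with file.path() means you do not have to worry about the path separator of your operating system.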

We can SET our working directory into one of the subfolders that we just created using the function setwd()

Code

setwd("Results")

However, for practicality it is super-ultra-mega recommended to work in the MAIN FOLDER. To go back to the main folder, use setwd() again, but instead of a folder name use simply two dots, yes, two dots “..”. This simple operation moves one level up, to the parent folder.

Code

setwd("..")

2.2 The importance of the question mark “?” or the help function

Maybe the most important function of R (at least for Jesús) is help() or ?. Using help() or the question mark, we can ask R about almost anything (sadly we can’t order pizza, yet)… so, let’s practice!

Code

help("log") # help() needs the exact name of a function or help topic

Code

?log

Code

??log # searches all installed help pages for the word "log"

Other important and useful functions in R are head(), tail(), dim(), str(), summary(), names(), class(), rm(), save.image(), saveRDS() and readRDS(), load(), and source(). All these simple functions help us to understand and manage our data.
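As a quick illustration of several of these functions, here is a minimal sketch that uses iris, a small data set that ships with R. The temporary file is only for demonstration; in the labs you would save into one of your own folders.

```r
# iris is a small data set that ships with R
head(iris, 3)      # first 3 rows
tail(iris, 3)      # last 3 rows
dim(iris)          # number of rows and columns
names(iris)        # column names
class(iris)        # kind of object
str(iris)          # compact description of the structure
summary(iris)      # summary statistics per column

# Save a single object to disk and read it back
rds_file <- tempfile(fileext = ".rds")  # temporary file, for illustration only
saveRDS(iris, rds_file)
iris_copy <- readRDS(rds_file)
identical(iris, iris_copy)

# rm() deletes an object from the workspace
rm(iris_copy)
exists("iris_copy")
```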

3 Objects: creation and manipulation

In R you can create and manipulate different kinds of data, from a simple numeric vector to complex spatial and/or phylogenetic data frames. The six main kinds of objects that you can create and manipulate in R are: vector, factor, matrix, data frame, list, and function.

So, let’s start with the first object, the Vector.

3.1 Vector

Vectors are the basic object in R and contain elements of the same type (e.g., numbers, characters). There are three main types of vector: numeric, character, and logical.
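The "same type" rule has a practical consequence: if you mix types in one vector, R converts everything to a single type instead of raising an error. A minimal sketch (the object names are just for illustration):

```r
num_vec <- c(1, 2, 3)
chr_vec <- c("a", "b", "c")
lgl_vec <- c(TRUE, FALSE, NA)

class(num_vec)
class(chr_vec)
class(lgl_vec)

# Mixing types does not raise an error: R silently converts all the
# elements to the most flexible type, here character
mixed <- c(1, "a", TRUE)
class(mixed)
mixed
```

This is worth remembering when a column of numbers is unexpectedly imported as text: a single non-numeric entry is enough to turn the whole vector into characters.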

3.1.0.1 Numeric vector

IMPORTANT: R is case sensitive, so you need to pay attention when you name objects.

Code

a <- 10 # numeric value 

b <- c(1, 2, 3, 4, 5) # numeric vector

class(b) # ask R which type of object b is

[1] "numeric"

Code

seq_test <- seq(from = 1, to = 20, by = 2) # A sequence of numbers from 1 to 20, in steps of 2

x <- seq(10, 30) # This is a sequence from 10 to 30. What is the difference with the previous numeric vector?

sample(seq_test, 2, replace = TRUE) # Draw two numbers at random from the object seq_test

[1] 13 11

Code

rep_test <- rep(1:2, c(10, 3)) # Repeat the number one, ten times and the number 2 three times

ex <- c(1:10) # Create a sequence of 1 to 10

length(ex) # Length of the object ex

[1] 10

Code

aa <- length(ex) # What are we doing here?

str(seq_test) # Look at the structure of the data

 num [1:10] 1 3 5 7 9 11 13 15 17 19

3.1.0.2 Character vector

We can also create vectors of characters, which means that instead of storing numbers we store text.

Code

research_groups <- c(Jeannine = "Plants", Jesus = "Birds and plants", Laura = "Plants")

research_groups

          Jeannine              Jesus              Laura 
          "Plants" "Birds and plants"           "Plants"

Explore the character vector using the function str()

Code

str(research_groups)

 Named chr [1:3] "Plants" "Birds and plants" "Plants"
 - attr(*, "names")= chr [1:3] "Jeannine" "Jesus" "Laura"

You can try to create a different character vector, for example, using the names of your peers.

3.1.0.3 Logical vector

This kind of vector is super useful when the purpose is to create or build functions. The elements of a logical vector are TRUE, FALSE, and NA (not available).

Code

is.factor(ex) # Is it a factor? (FALSE)

[1] FALSE

Code

is.matrix(ex) # Is it a matrix? (FALSE)

[1] FALSE

Code

is.vector(ex) # Is it a vector? (TRUE)

[1] TRUE

Code

a < 1   # 'a' is lower than 1? (FALSE)

[1] FALSE

Code

a == 1   # is 'a' equal to 1? (FALSE, because a is 10)

[1] FALSE

Code

a >= 1   # 'a' is higher or equal to 1? (TRUE)

[1] TRUE

Code

a != 2   # is the object 'a' different from two? (TRUE) (!= means "not equal")

[1] TRUE
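Comparisons also work on whole vectors, and the resulting logical vector can be used to count or select elements. The sketch below uses a made-up vector of measurements (sizes) with one missing value; none of these objects come from the tutorial data.

```r
sizes <- c(4, 12, 7, NA, 15)   # hypothetical measurements, NA = missing

is_large <- sizes > 10         # the comparison returns a logical vector
is_large

# TRUE counts as 1 and FALSE as 0, so sum() counts the TRUE values
n_large <- sum(is_large, na.rm = TRUE)
n_large

# which() returns the positions that are TRUE (NA is skipped)
large_positions <- which(is_large)
large_positions

# Use those positions to keep only the matching elements
large_values <- sizes[large_positions]
large_values
```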

3.2 Factor

A factor is useful for creating categorical variables, which are very common in statistical analyses such as ANOVA.

Code

data <- factor(c("small", "medium", "large"))

Code

is.factor(data) # Check if the object is correct.

[1] TRUE
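One detail that often surprises new users is the order of the levels: by default R sorts them alphabetically, which is rarely the order you want for categories such as sizes. A minimal sketch, with object names chosen only for this example:

```r
sizes <- c("small", "medium", "large", "small")

# Default: levels are sorted alphabetically
default_factor <- factor(sizes)
levels(default_factor)

# Explicit levels keep the order we care about
ordered_factor <- factor(sizes, levels = c("small", "medium", "large"))
levels(ordered_factor)

# table() counts how many observations fall in each level
table(ordered_factor)
```

The order of the levels matters later on, for example for the order of the groups in a plot or of the coefficients in a model.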

3.3 Matrix

A matrix is a two-dimensional arrangement of values of the same type: all numeric, or all character, and so on. You can think of it as two or more vectors of the same type and length placed side by side.

Code

matx <- matrix(1:45, nrow = 15)
rownames(matx) <-  LETTERS[1:15] # names of the rows
colnames(matx) <- c("Sample01", "Sample02", "Sample03") # names of the columns or headers

Code

matx # Inspect the matrix

  Sample01 Sample02 Sample03
A        1       16       31
B        2       17       32
C        3       18       33
D        4       19       34
E        5       20       35
F        6       21       36
G        7       22       37
H        8       23       38
I        9       24       39
J       10       25       40
K       11       26       41
L       12       27       42
M       13       28       43
N       14       29       44
O       15       30       45

Code

class(matx) # Ask which kind of object it is

[1] "matrix" "array"

Code

matx[, 1] # We can use brackets to select a specific column

 A  B  C  D  E  F  G  H  I  J  K  L  M  N  O 
 1  2  3  4  5  6  7  8  9 10 11 12 13 14 15

Code

matx[1, ] # We can use brackets to select a specific row

Sample01 Sample02 Sample03 
       1       16       31

Code

head(matx)

  Sample01 Sample02 Sample03
A        1       16       31
B        2       17       32
C        3       18       33
D        4       19       34
E        5       20       35
F        6       21       36

Code

tail(matx)

  Sample01 Sample02 Sample03
J       10       25       40
K       11       26       41
L       12       27       42
M       13       28       43
N       14       29       44
O       15       30       45

Code

str(matx)

 int [1:15, 1:3] 1 2 3 4 5 6 7 8 9 10 ...
 - attr(*, "dimnames")=List of 2
  ..$ : chr [1:15] "A" "B" "C" "D" ...
  ..$ : chr [1:3] "Sample01" "Sample02" "Sample03"

Code

summary(matx) # summary statistics of the data in the matrix

    Sample01       Sample02       Sample03   
 Min.   : 1.0   Min.   :16.0   Min.   :31.0  
 1st Qu.: 4.5   1st Qu.:19.5   1st Qu.:34.5  
 Median : 8.0   Median :23.0   Median :38.0  
 Mean   : 8.0   Mean   :23.0   Mean   :38.0  
 3rd Qu.:11.5   3rd Qu.:26.5   3rd Qu.:41.5  
 Max.   :15.0   Max.   :30.0   Max.   :45.0

In general, when we explore our data using head(), the function returns only the first 6 rows of our matrix. However, we can add another argument to the function. For example, in head(matx, 10), the number 10 after the comma makes it show the first 10 rows. This simple operation is especially useful when our matrix is large (>100 rows).

Function tail

You can use the function tail() to check the last rows of your data.

3.4 Data frame

The difference between a matrix and a data frame is that a data frame can hold different types of vectors, one per column. You can explore more about data frames by asking R: ?data.frame. Let’s create a data frame and explore its properties.

Code

df <- data.frame(species = c("rufus", "cristatus", "albogularis", "paraguayae"), 
                 habitat = factor(c("forest", "savanna", "urban", "transition")), 
                 high = c(10, 2, 7, 4), distance = c(3, 9, 5, 6))

Code

class(df)

[1] "data.frame"

Code

matx2 <- as.data.frame(matx) # We can also transform our matrix to a data frame
class(matx2)

[1] "data.frame"

Code

str(df)

'data.frame':   4 obs. of  4 variables:
 $ species : chr  "rufus" "cristatus" "albogularis" "paraguayae"
 $ habitat : Factor w/ 4 levels "forest","savanna",..: 1 2 4 3
 $ high    : num  10 2 7 4
 $ distance: num  3 9 5 6
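Once a data frame exists, the usual next step is to pull out columns and rows. The sketch below rebuilds the same data frame under a different name (birds, chosen for this example so that it does not overwrite df) and shows the most common ways to access it; the ratio column is a hypothetical addition.

```r
birds <- data.frame(
  species  = c("rufus", "cristatus", "albogularis", "paraguayae"),
  habitat  = factor(c("forest", "savanna", "urban", "transition")),
  high     = c(10, 2, 7, 4),
  distance = c(3, 9, 5, 6)
)

birds$species            # one column, returned as a vector
birds[2, ]               # second row, all columns
birds[birds$high > 5, ]  # rows where 'high' is greater than 5

# Add a new column computed from existing ones (hypothetical column name)
birds$ratio <- birds$high / birds$distance
names(birds)
```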

3.5 List

A list is an object that holds a collection of other objects, which can be of different kinds (even other lists). Here we will use the objects created previously.

Code

lst <- list(data, df, matx)

We can now go ahead and inspect the list.

Code

str(lst)

List of 3
 $ : Factor w/ 3 levels "large","medium",..: 3 2 1
 $ :'data.frame':   4 obs. of  4 variables:
  ..$ species : chr [1:4] "rufus" "cristatus" "albogularis" "paraguayae"
  ..$ habitat : Factor w/ 4 levels "forest","savanna",..: 1 2 4 3
  ..$ high    : num [1:4] 10 2 7 4
  ..$ distance: num [1:4] 3 9 5 6
 $ : int [1:15, 1:3] 1 2 3 4 5 6 7 8 9 10 ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : chr [1:15] "A" "B" "C" "D" ...
  .. ..$ : chr [1:3] "Sample01" "Sample02" "Sample03"

And also check if the object created is, in fact, a list.

Code

class(lst)

[1] "list"

Now, inspect the objects that are stored in our object lst. To do this, we just need to use double brackets [[ ]].

Code

lst[[1]]

[1] small  medium large 
Levels: large medium small

Code

lst[[2]]

      species    habitat high distance
1       rufus     forest   10        3
2   cristatus    savanna    2        9
3 albogularis      urban    7        5
4  paraguayae transition    4        6

Code

lst[[3]]

  Sample01 Sample02 Sample03
A        1       16       31
B        2       17       32
C        3       18       33
D        4       19       34
E        5       20       35
F        6       21       36
G        7       22       37
H        8       23       38
I        9       24       39
J       10       25       40
K       11       26       41
L       12       27       42
M       13       28       43
N       14       29       44
O       15       30       45
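Accessing list elements by position works, but it is easy to forget what sits in position 2 or 3. Giving names to the elements is a common alternative. The following is a small sketch with made-up contents, not the lst object created above:

```r
lst_named <- list(
  sizes  = factor(c("small", "medium", "large")),
  counts = c(10, 2, 7, 4),
  mat    = matrix(1:6, nrow = 2)
)

names(lst_named)        # names of the elements
lst_named$counts        # access by name with $
lst_named[["mat"]]      # access by name with double brackets

# Single brackets return a (shorter) list; double brackets return the element
class(lst_named["counts"])
class(lst_named[["counts"]])
```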

At this point, we have explored the most common objects in R. Understanding the structure of each class of objects (from vectors to lists) is maybe the most critical step in learning R.

4 Install and load packages

Although R is a programming language on its own, we can extend it with auxiliary packages that are free to download and install on our computers. Installing new packages in R is easy: it just needs the simple function install.packages() and, of course, an Internet connection. For more information on how to install new packages, you just need to ask R using ?install.packages.

Code

install.packages("PACKAGE NAME")

The reverse function is remove.packages().

Most of the time we do not remember whether a package is already installed on our computer. If we are tired and do not want to go to our R packages folder to check, we can use the following command.

Code

# The row names of installed.packages() are the names of the installed packages
if (!("PACKAGE NAME" %in% rownames(installed.packages()))) {
  install.packages("PACKAGE NAME", dependencies = TRUE)
}

To load an installed package you can just use library() or require().

Code

library("PACKAGE NAME")
require("PACKAGE NAME")

Sometimes we need to install a lot of packages, and installing them one by one requires time and patience, which, most of the time, we don’t have, lol. To solve that issue, we can create a vector with the names of the packages and write a few lines of code that install all the missing packages with just one click!

Code

# Package vector names
packages <- c("ggplot2", "phytools", "picante", "tidyr", "dplyr")

Code

# Install packages not yet installed
installed_packages <- packages %in% rownames(installed.packages())
if (any(installed_packages == FALSE)) {
  install.packages(packages[!installed_packages])
}

Now we can load all the packages listed in the vector.

Code

# Packages loading
invisible(lapply(packages, library, character.only = TRUE))
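If you only want to know which packages are missing, without installing or attaching anything, one option is to wrap the check in a small function. The sketch below is one possible way to do it: missing_packages() is a hypothetical helper written for this example, and the last package name is deliberately made up. It relies on the base function requireNamespace(), which returns TRUE or FALSE and loads the package's namespace without attaching it.

```r
# Hypothetical helper: return the requested packages that are not installed
missing_packages <- function(pkgs) {
  is_installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!is_installed]
}

# "stats" and "utils" ship with R; the last name is made up for illustration
wanted <- c("stats", "utils", "notARealPackage123")
to_install <- missing_packages(wanted)
to_install

# To actually install them, uncomment the next line (needs internet access)
# install.packages(to_install)
```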

R Package {pacman}

{pacman} is an R package that allows you to install and load several packages at once with a single command, for example:

pacman::p_load(char = packages) # packages is the character vector we created before; char = tells p_load to use its contents

5 R as a calculator

R can be used as a calculator. For example, we can use the objects created before to do some arithmetic.

Code

# b and seq_test are numeric vectors created before; here we are just
# obtaining positions 4 and 10 from those vectors and summing the values
b[4]+seq_test[10]

[1] 23

Code

b[4]*seq_test[10] # same but multiplying the two values

[1] 76

Code

# df is a data frame; here we are extracting the value in
# row 3 and column 3 and using it as the divisor of the value in position 5 of the
# vector seq_test.
seq_test[5]/df[3, 3]

[1] 1.285714

Code

matx[, 3][4]-df[4, 4] # How does this expression differ from the previous one?

 D 
28

Code

seq_test^7 # power: each value is raised to the seventh power.

 [1]         1      2187     78125    823543   4782969  19487171  62748517
 [8] 170859375 410338673 893871739

Code

seq_test*7 # what happens here?

 [1]   7  21  35  49  63  77  91 105 119 133

Code

seq_test+7

 [1]  8 10 12 14 16 18 20 22 24 26

Code

seq_test-7

 [1] -6 -4 -2  0  2  4  6  8 10 12

Code

mean(seq_test) # mean value of the numeric vector seq_test

[1] 10

Code

max(seq_test)

[1] 19

Code

min(seq_test)

[1] 1

Code

sum(seq_test)

[1] 100

Code

log(seq_test)

 [1] 0.000000 1.098612 1.609438 1.945910 2.197225 2.397895 2.564949 2.708050
 [9] 2.833213 2.944439

Code

sqrt(seq_test)

 [1] 1.000000 1.732051 2.236068 2.645751 3.000000 3.316625 3.605551 3.872983
 [9] 4.123106 4.358899

Code

cor(matx[, 1], matx[, 2])

[1] 1
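Most of the operations above are vectorized: R applies them to every element of a vector at once, so you do not need to write a loop. The sketch below, using a small made-up vector v, shows the element-by-element behavior, the recycling of a shorter vector, and the equivalent explicit loop for comparison.

```r
v <- c(1, 3, 5, 7)

v * 2            # each element is multiplied by 2
v + c(10, 20)    # the shorter vector is recycled: 10, 20, 10, 20
v * v            # element-by-element product, same as v^2

# The same multiplication written as an explicit loop
doubled <- numeric(length(v))
for (k in seq_along(v)) {
  doubled[k] <- v[k] * 2
}
identical(doubled, v * 2)
```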

6 Data import/export

As indicated before, R can handle different kinds of information (from vectors to data frames). Most of our data is usually stored in an Excel spreadsheet or in files with the extension .csv (comma-separated values) or .txt (plain, unformatted text).

Most of these files are imported into R as data frames but, as we have practiced, we now have the tools to handle or transform the information into different objects.

The functions to import data into R are simply read.table() or read.csv(). Using these simple functions, you can import the data and then transform it into other kinds of objects. So, let’s practice!

Code

dat <- read.table("Data/lab_0/Sample.txt")

dat2 <- read.table("Data/lab_0/Sample.txt", row.names = 1, header = TRUE)

dat3 <- read.csv("Data/lab_0/Sample.csv")

Code

class(dat)

[1] "data.frame"

Code

class(dat2)

[1] "data.frame"

Code

class(dat3)

[1] "data.frame"

We can also extract a sample of our data frame.

Code

dat3Sample <- dat3[1:50, 1:4]

dim(dat3Sample)

[1] 50  4

We can also import the data frame as a matrix.

Code

dat4 <- na.omit(as.matrix(read.csv("Data/lab_0/Sample.csv", row.names = 1, header = TRUE)))
class(dat4)

[1] "matrix" "array"

Code

head(dat4, 10)

    Longitude.x. Latitude.y.        PD SR      PSVs SR.1        vars     aMRD
192       -108.5        26.5 0.3789908  3 0.7965072    3 0.004459021 17.66667
222        -78.5        25.5 0.2167810  2 0.3536695    2 0.012361281 12.50000
229       -107.5        24.5 0.2684503  2 0.7393375    2 0.012361281 18.50000
230       -106.5        24.5 0.2684503  2 0.7393375    2 0.012361281 18.50000
235       -101.5        24.5 0.2789230  2 0.8175072    2 0.012361281 19.50000
245       -106.5        23.5 0.2684503  2 0.7393375    2 0.012361281 18.50000
246       -105.5        23.5 0.2684503  2 0.7393375    2 0.012361281 18.50000
247       -104.5        23.5 0.2684503  2 0.7393375    2 0.012361281 18.50000
248       -103.5        23.5 0.2684503  2 0.7393375    2 0.012361281 18.50000
250       -101.5        23.5 0.2789230  2 0.8175072    2 0.012361281 19.50000
        aMDR      aAGES
192 14.27362 0.05628246
222 26.88631 0.01707297
229 14.49822 0.05489227
230 14.49822 0.05489227
235 17.25644 0.05057945
245 14.49822 0.05489227
246 14.49822 0.05489227
247 14.49822 0.05489227
248 14.49822 0.05489227
250 17.25644 0.05057945

Code

dat4[1:20, 1:4] # Show the first 20 rows and 4 columns.

    Longitude.x. Latitude.y.        PD SR
192       -108.5        26.5 0.3789908  3
222        -78.5        25.5 0.2167810  2
229       -107.5        24.5 0.2684503  2
230       -106.5        24.5 0.2684503  2
235       -101.5        24.5 0.2789230  2
245       -106.5        23.5 0.2684503  2
246       -105.5        23.5 0.2684503  2
247       -104.5        23.5 0.2684503  2
248       -103.5        23.5 0.2684503  2
250       -101.5        23.5 0.2789230  2
251       -100.5        23.5 0.3817156  3
252        -99.5        23.5 0.3817156  3
255        -82.5        23.5 1.2950881 14
256        -81.5        23.5 0.5936688  5
257        -80.5        23.5 0.2961977  2
261       -106.5        22.5 0.2684503  2
262       -105.5        22.5 0.2684503  2
263       -104.5        22.5 0.2684503  2
264       -103.5        22.5 0.2684503  2
265       -102.5        22.5 0.2684503  2

You can also import your data using the same functions without specifying the path: file.choose() opens a dialog where you pick the file interactively. Notice that we do not recommend this procedure, as you lose control over the directory structure, but it is useful when you are just exploring data.

Code

dat5 <- na.omit(read.csv(file.choose()))

You can also save your data from R using the function write.table() or write.csv(). Let’s save dat3Sample. Notice that we always need to specify the correct path; in our case we will save the data in the subfolder Data.

Code

is.na(dat3Sample) # TRUE marks the cells with missing values

write.csv(dat3Sample, file = "Data/Lab_0/dat3Sample.csv")
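One thing to keep in mind when exporting: by default write.csv() also writes the row names, so the file gains an extra first column when you read it back. The sketch below shows the difference using a tiny made-up data frame (toy) and a temporary file, so it does not depend on the course data.

```r
toy <- data.frame(site = c("A", "B", "C"), richness = c(3, 2, 14))

csv_file <- tempfile(fileext = ".csv")  # temporary path, for illustration only

# Default: row names are written as an extra first column
write.csv(toy, file = csv_file)
with_rownames <- read.csv(csv_file)
names(with_rownames)

# row.names = FALSE writes only the data columns
write.csv(toy, file = csv_file, row.names = FALSE)
without_rownames <- read.csv(csv_file)
names(without_rownames)
```

Alternatively, you can keep the row names in the file and recover them on import with row.names = 1, as done earlier for dat2 and dat4.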

7 Phylogenetic data

To study biodiversity, it is important to first understand the data we are using. One common kind of data is the phylogenetic tree, which describes the evolutionary relationships between and among lineages. From here until the end of this short tutorial we will try to explain the basics of how to import/export and handle phylogenetic information. You can find extra information in the second chapter of the MPCM Book.

7.1 Formats

The two most common formats in which phylogenies are stored are Newick and Nexus (Maddison et al., 1997). A tree in Newick format looks like this:

Code

"((A:10,B:9)D:5,C:15)F;"

[1] "((A:10,B:9)D:5,C:15)F;"

Using this notation, the parentheses link the lineages to a specific node of the tree and the comma “,” separates the lineages that descend from that node. The colon “:” can be used after the name of a tip or node, and the numeric value that follows represents the branch length. Finally, the semicolon “;” indicates the end of the phylogenetic tree.
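To make the notation concrete, the sketch below pulls the "name:length" pairs out of the example string using only base R text functions. This is just an illustration of how to read the format and not a general Newick parser; for real work, use a dedicated reader such as read.tree() from ape.

```r
newick <- "((A:10,B:9)D:5,C:15)F;"

# Find every "label:number" pair in the string
pairs <- regmatches(newick, gregexpr("[A-Za-z]+:[0-9.]+", newick))[[1]]
pairs

# Split each pair at the colon into a label and a branch length
parts <- strsplit(pairs, ":", fixed = TRUE)
branches <- data.frame(
  label  = vapply(parts, `[`, character(1), 1),
  length = as.numeric(vapply(parts, `[`, character(1), 2))
)
branches
```

The root label F does not appear in the result because it has no branch length attached to it in the string.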

Now we can see how this format works, but first, let’s check that we have the R package needed for this purpose. Here we will use the R package Analyses of Phylogenetics and Evolution, AKA ape.

Code

if (!("ape" %in% rownames(installed.packages()))) {
  install.packages("ape", dependencies = TRUE)
}

Code

require(ape)

Loading required package: ape

Now we can read the phylogenetic tree we just created above in Newick format.

Code

## Here we will create a phylogenetic tree in Newick format
newick_tree <- "((A:10,B:9)D:5,C:15)F;"

## Read the tree
newick_tree <- read.tree(text = newick_tree)

And now we can plot the phylogenetic tree.

Code

plot(newick_tree, show.node.label = TRUE)

The other format is Nexus and, after some time using it, we can say that the Nexus format offers more flexibility. An example of the Nexus format is as follows:

Code

"#NEXUS
BEGIN TAXA;
DIMENSIONS NTAXA=3;
TaxLabels A B C;
END;
BEGIN TREES;
TREE=((A:10,B:9)D:5,C:15)F;
END;"

[1] "#NEXUS\nBEGIN TAXA;\nDIMENSIONS NTAXA=3;\nTaxLabels A B C;\nEND;\nBEGIN TREES;\nTREE=((A:10,B:9)D:5,C:15)F;\nEND;"

We can create and save a Nexus file from scratch using the following code.

Code

## First create a Nexus file in the working directory 
cat(
 "#NEXUS
 BEGIN TAXA;
 DIMENSIONS NTAXA=3;
 TaxLabels A B C;
 END;
 BEGIN TREES;
 TREE=((A:10,B:9)D:5,C:15)F;
 END;",
file = "../Data/Lab_0/Nexus_tree.nex"
)

Now, using the function read.nexus() we can read the nexus file.

Code

## Now read the phylogenetic tree; notice that instead of read.tree() we are using read.nexus()
nexus_tree <- read.nexus("../Data/Lab_0/Nexus_tree.nex")

And also plot the imported nexus file.

Code

## lets plot the example
plot(nexus_tree, show.node.label = TRUE)

Now, let’s inspect our phylogenetic trees.

Code

str(nexus_tree)

List of 5
 $ edge       : int [1:4, 1:2] 4 5 5 4 5 1 2 3
 $ edge.length: num [1:4] 5 10 9 15
 $ Nnode      : int 2
 $ node.label : chr [1:2] "F" "D"
 $ tip.label  : chr [1:3] "A" "B" "C"
 - attr(*, "class")= chr "phylo"
 - attr(*, "order")= chr "cladewise"

Code

nexus_tree$tip.label

[1] "A" "B" "C"

If we want to know the branch lengths of the tree, we just need to select edge.length.

Code

nexus_tree$edge.length

[1]  5 10  9 15

An important component of a phylo object is the matrix called edge. In this matrix, each row represents a branch in the tree: the first column shows the index of the ancestral node of the branch and the second column shows the index of the descendant node. Let’s inspect!

Code

nexus_tree$edge

     [,1] [,2]
[1,]    4    5
[2,]    5    1
[3,]    5    2
[4,]    4    3
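One way to read the edge matrix is to translate the node numbers into labels. The sketch below does this with base R only, copying the values from the str(nexus_tree) output shown earlier, so it runs without ape. It assumes the numbering used in that output: tips come first (1 to 3) and the internal nodes follow, starting with the root (4).

```r
# Values copied from the str(nexus_tree) output shown earlier
edge        <- matrix(c(4, 5, 5, 4, 5, 1, 2, 3), ncol = 2)
edge_length <- c(5, 10, 9, 15)
tip_label   <- c("A", "B", "C")
node_label  <- c("F", "D")

# Tips are numbered 1 to 3; internal nodes follow, starting at the root (4)
all_labels <- c(tip_label, node_label)

branches <- data.frame(
  ancestor   = all_labels[edge[, 1]],
  descendant = all_labels[edge[, 2]],
  length     = edge_length
)
branches
```

Each row of the result corresponds to one "name:length" pair in the Newick string: for example, the branch from D to A has length 10.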

We know it is a little hard to follow, even with small trees like this example, but if we plot the phylogenetic tree, the information is easier to understand.

Code

# Lets plot the tree
plot(nexus_tree, show.tip.label = FALSE)
# Add the internal nodes
nodelabels()
# Add the tips or lineages
tiplabels()

Finally, phylogenies can also be imported as a list of trees. In phylogenetic comparative methods, this list of phylogenies is called a multiPhylo, and we can import/export multiPhylo objects in both formats.

Code

# Simulate 10 phylogenies, each one with 5 species
multitree <- replicate(10, rcoal(5), simplify = FALSE)
# Store the list of trees as a multiPhylo object
class(multitree) <- "multiPhylo"

Code

# Plot a single tree from the 10
plot(multitree[[10]])

Code

par(mfrow = c(2, 2))
plot(multitree[[1]])
plot(multitree[[3]])
plot(multitree[[7]])
plot(multitree[[10]])

Code

# Exporting the phylogenies as a single Newick file. 
write.tree(phy = multitree, file = "../Data/Lab_0/multitree_example_newick.txt")
multitree_example_newick <- read.tree("../Data/Lab_0/multitree_example_newick.txt")
multitree_example_newick

10 phylogenetic trees

Code

# Exporting the phylogenies as a single Nexus file. 
write.nexus(phy = multitree, file = "Data/Lab_0/multitree_example_nexus.nex")
multitree_example_nexus <- read.nexus("Data/Lab_0/multitree_example_nexus.nex")
multitree_example_nexus

10 phylogenetic trees

The :: operator

If you know exactly which package contains the function you want to use you can reference it directly using the :: operator. Simply place the {package name} before the operator and the name of the function after the operator to retrieve it.

In simple words, if you just want to use a specific function of an R package without loading the entire package, the :: operator can do it for you. For example:

multitree_example_nexus <- ape::read.nexus("Data/Lab_0/multitree_example_nexus.nex")
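The same operator works with any installed package, including the ones that come with R. A minimal sketch using only the stats and utils packages, so it runs without installing anything:

```r
# Call functions through their package name, without library()
stats::median(c(1, 5, 9))
utils::head(letters, 3)

# Useful when two packages define a function with the same name:
# the prefix makes it explicit which one is meant
middle <- stats::median(c(2, 4, 6, 8))
middle
```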

8 Gentle intro to loops

In programming, one of the most important tools is the loop, AKA for. Basically, a loop repeats a previously defined set of statements for n steps.

The basic syntax structure of a loop is:

Code

for (variable in vector) {
  execute defined statements
}

When we write code, it is common to use i as the loop variable that keeps track of the current step. Why not another letter? Well, i is the first letter of the word iteration, duh! Anyway, you can use any letter or word as a loop variable.

So, let’s take a look.

Code

for (i in 1:10){
  cat(i, sep = '')
}

12345678910

Notice that the number of steps is determined by the vector, the second element of the for loop; in this example it is the sequence from 1 to 10. At each step, the loop variable i takes the next value of that vector.

You can modify the previous statement to obtain different results, for example:

Code

for (i in 1:10){
  cat(i, sep = '\n')
}

Or using a previous object:

Code

for (i in 5:length(ex)){
  cat(i, sep = '\n')
}

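Or to make calculations. Notice that in this example the statements do not use i, so every step repeats exactly the same calculation: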

Code

for (i in 5:length(ex)){
  b2 <- b^2
  b3 <- b*2
  b4 <- b+10
}
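Loops become more useful when the loop variable is used inside the statements, typically to pick one element at a time and store one result at a time. The following sketch squares each element of ex and stores the results in a vector created beforehand; squares is a name chosen for this example.

```r
ex <- 1:10                      # same object as earlier in the tutorial
squares <- numeric(length(ex))  # pre-allocate a vector to hold the results

for (i in seq_along(ex)) {
  squares[i] <- ex[i]^2         # store the result of step i in position i
}
squares

# The vectorized version gives the same values without a loop
identical(squares, ex^2)
```

Here seq_along(ex) produces the positions 1, 2, ..., length(ex); unlike 1:length(ex), it also behaves correctly when the vector is empty.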

To finish this short tutorial, we will welcome all of the members of the Biodiversity Science cohort 2023.

Code

BioSciNames <- read.csv("Data/lab_0/BioSci_2023.csv")[, 1]

Code

for (i in seq_along(BioSciNames)){ 
  
  print(paste0("Hi ", BioSciNames[i], ", welcome to the first practice of Biodiversity Science 2023!"))
  
  Sys.sleep(2) # wait two seconds before the next iteration or name
}

[1] "Hi Landon Aufderhar, welcome to the first practice of Biodiversity Science 2023!"
[1] "Hi Sara Berger, welcome to the first practice of Biodiversity Science 2023!"
[1] "Hi Jeannine Cavender-Bares, welcome to the first practice of Biodiversity Science 2023!"
[1] "Hi Jaron Cook, welcome to the first practice of Biodiversity Science 2023!"
[1] "Hi Betsy Custis (she/her), welcome to the first practice of Biodiversity Science 2023!"
[1] "Hi Ashley Darst, welcome to the first practice of Biodiversity Science 2023!"
[1] "Hi Tiana De Grande, welcome to the first practice of Biodiversity Science 2023!"
[1] "Hi Megan DeCook (she/her), welcome to the first practice of Biodiversity Science 2023!"
[1] "Hi Victoria Deitschman, welcome to the first practice of Biodiversity Science 2023!"
[1] "Hi Sally Donovan, welcome to the first practice of Biodiversity Science 2023!"
[1] "Hi Shelby Erickson, welcome to the first practice of Biodiversity Science 2023!"
[1] "Hi Gwynneth Foley, welcome to the first practice of Biodiversity Science 2023!"
[1] "Hi Ashley Halverson, welcome to the first practice of Biodiversity Science 2023!"
[1] "Hi Mackenna Kaufer (she/her), welcome to the first practice of Biodiversity Science 2023!"
[1] "Hi Faith Kelly (she/her), welcome to the first practice of Biodiversity Science 2023!"
[1] "Hi Emma Klubberud (she/her), welcome to the first practice of Biodiversity Science 2023!"
[1] "Hi Laura Ostrowsky, welcome to the first practice of Biodiversity Science 2023!"
[1] "Hi Abha Panda, welcome to the first practice of Biodiversity Science 2023!"
[1] "Hi Nguyen Thanh Vy Phan, welcome to the first practice of Biodiversity Science 2023!"
[1] "Hi Nathaniel Pierce, welcome to the first practice of Biodiversity Science 2023!"
[1] "Hi Jesus Pinto Ledezma, welcome to the first practice of Biodiversity Science 2023!"
[1] "Hi Leah Ray, welcome to the first practice of Biodiversity Science 2023!"
[1] "Hi Ayden Reed, welcome to the first practice of Biodiversity Science 2023!"
[1] "Hi Henry Rosato, welcome to the first practice of Biodiversity Science 2023!"
[1] "Hi Lisa Russell (she/her), welcome to the first practice of Biodiversity Science 2023!"
[1] "Hi Ethan Schindler, welcome to the first practice of Biodiversity Science 2023!"
[1] "Hi Nathan Schneider, welcome to the first practice of Biodiversity Science 2023!"
[1] "Hi Erja Smith, welcome to the first practice of Biodiversity Science 2023!"
[1] "Hi Yiyang Wang, welcome to the first practice of Biodiversity Science 2023!"
[1] "Hi Cathy Wiegand (she/her), welcome to the first practice of Biodiversity Science 2023!"
[1] "Hi Eliana Wilson (she/her), welcome to the first practice of Biodiversity Science 2023!"

We have covered basic aspects of R, from exploring and managing objects to importing/exporting data and the basics of loops. We hope that this short tutorial is helpful not only for the Biodiversity Science course but also for your own projects. Remember: practice, practice, practice!

References

Maddison, D. R., Swofford, D. L., & Maddison, W. P. (1997). Nexus: An Extensible File Format for Systematic Information. Systematic Biology, 46(4), 590–621. https://doi.org/10.1093/sysbio/46.4.590