In this lab we will explore some aspects of hyperspectral remote sensing obtained from the National Ecological Observatory Network (NEON) with the goal of predicting biodiversity from the sky. Specifically, we will use hyperspectral remote sensing data from the NEON Airborne Observation Platform (AOP) (more information here and here). As you will see, working with hyperspectral information/data is similar to work with any other information/data (e.g., species abundance, presence-absence) and consequently it can be used to calculate any metric of biodiversity. In this sense, spectral diversity can be considered as a dimension of biodiversity.

Note. Part of the text used in this tutorial was extracted from here with some modifications.

Set up a working directory and store the data files in that directory. Tell R that this is the directory you will be using, and read in your data:

setwd("path/for/your/directory")

Install and load the following packages. When installing the package {rhdf5} R will ask you if you want to “Update all/some/none? [a/s/n]:” please in your console type n.

packages <- c("maptools", "rgdal", "raster", "neonUtilities", "rasterdiv",
"BiocManager", "dplyr", "tidyr", "ggplot2", "plyr", "reshape2")

# Install packages not yet installed
installed_packages <- packages %in% rownames(installed.packages())

if (any(installed_packages == FALSE)) {
install.packages(packages[!installed_packages], dependencies = TRUE)
}

BiocManager::install("rhdf5")

sapply(packages, require, character.only = TRUE)
library(rhdf5)

# Data preparation

dir.create("Data/NEON")

The first step is to prepare the information required to download the hyperspectral data from NEON-AOP. To do that we will first download the Terrestrial Observation System Sampling Locations dataset. You can download it from NEON Spatial Data & Maps, specifically under the Terrestrial Observation System Sampling Locations tab or by clicking HERE. Once you have downloaded the data, please store it within the folder NEON that is located within the folder Data

NEON_plots <- readOGR(dsn = "Data/NEON/All_NEON_TOS_Plots_v8",
layer = "All_NEON_TOS_Plot_Polygons_V8")

From the spatial distribution of the NEON sites/plots we loaded into R, we will select the NEON site Harvard Forest & Quabbin Watershed (HARV).

HARV_plots <- NEON_plots@data %>%
filter(siteID == "HARV" & plotType == "distributed" & subtype == "basePlot") %>%
arrange(plotID)

View(HARV_plots)

Now from the NEON site HARV we will select the plot number one (plotID = HARV_001) and extract the coordinates in UTM (Universal Transverse Mercator) system as spatial reference to download the hyperspectral data from the NEON-AOP.

east <- HARV_plots$easting names(east) <- HARV_plots$plotID

north <- HARV_plots$northing names(north) <- HARV_plots$plotID

east
north

coords_HARV_001 <- c(east[1], north[1])
coords_HARV_001 

Now we have the UTM coordinates from the plot HARV_001 we can download the Hyperspectral Remote Sensing Data in HDF5 Format for that site.

byTileAOP(dpID = "DP3.30006.001", # NEON-AOP product
site = "HARV", # Site code
year = "2018", # Year
check.size = TRUE,
easting = coords_HARV_001[1], northing = coords_HARV_001[2], # Coordinates UTM
savepath = "Data/NEON", # Path
token = NA) 

# Hyperspectral remote sensing data exploration

We are now ready for start using hyperspectral data. Note that the downloaded data is in a HDF5 format, this format natively compresses data stored within it (i.e., makes it smaller) and supports data slicing, in other words, you can extract only the portions of the data that you need to work with rather than reading the entire dataset into memory.

The downloaded hyperspectral data is stored within a folder DP3.30006.001 that in turn is stored within the folder scheme Data/Neon. Please pay attention to the full folder scheme, this scheme is composed of several folders with a file NEON_D01_HARV_DP3_725000_4700000_reflectance.h5 in HDF5 format. In order to load and work with this data you might need to specify the full path to that file.

f <- "Data/NEON/DP3.30006.001/2018/FullSite/D01/2018_HARV_5/L3/Spectrometer/Reflectance/NEON_D01_HARV_DP3_725000_4700000_reflectance.h5"

Note that you did not loaded all hyperspectral data into R but the path to the file, so using the path we can just call the metadata of the downloaded data.

### HDF5 file structure

When we are exploring the data structure in a HDF5 file, we really need to pay attention to the first two columns, these two columns is informing us about the location of the data (group) and the name of the data (name) stored into the file. For example the Map_info dataset is located in /HARV/Reflectance/Metadata/Coordinate_System and the Reflectance dataset under /HARV.

The wavelength dataset contains the middle wavelength values for each band in the dataset and the Reflectance dataset contains the image data, these two datasets are going to be used for both data processing and visualization.

View(h5ls(f, all = TRUE))

### On bands and wavelengths

A band represents a group of wavelengths. For example, the wavelength values between 695 nm and 700 nm might be one band as captured by an imaging spectrometer. The imaging spectrometer collects reflected light energy in a pixel for light in that band. When we are working with a multispectral (e.g., Landsat, MODIS) or hyperspectral (e.g., NEON-AOP - NASA/JPL AVIRIS-NG) dataset, the band information is reported as the center wavelength value. This value represents the center point value of the wavelengths represented in that band. Thus in a band spanning 695-700 nm, the center would be 697.5 nm.

# Get information about the wavelengths of HARV plot 001

wlInfo
# Read wavelengths from the HDF5 file

tail(WL)
# Extract reflectance metadata

reflInfo

For example, the center wavelength value associated with the band 34 is 546.8372

WL[34]

The HDF5 read function reads data in the order: Bands, Cols, Rows. Let’s get these information from the reflectance metadata.

# Read dimensions of the hyperspectral data
nRows <- reflInfo$Dimensions[1] nCols <- reflInfo$Dimensions[2]
nBands <- reflInfo$Dimensions[3] nRows nCols nBands Now using the the information obtained from the reflectance metadata let’s extract the band 34. # Extract or "slice" data for band 34 from the HDF5 file b34 <- h5read(f, "/HARV/Reflectance/Reflectance_Data", index = list(34, 1:nCols, 1:nRows)) # get band 34 # what type of object is b34? class(b34) ## [1] "array" The returned object is an array, the arrays are matrices with more than 2 dimensions, i.e., are matrices stacked or piled in a single object. # convert from array to matrix by selecting only the first band b34 <- b34[1,,] # check it class(b34) # plot the image image(b34) The previous image is hard to visually interpret, let’s log the data and see what happens. image(log(b34)) ### Data cleaning An image data in raster format will often contain a data ignore value and a scale factor. The data ignore value represents pixels where there are no data. Usually, no data values may be attributed to the sensor not collecting data in the area of the image or to processing results which yield null values. From the reflectance metadata we can define the ignore value as -9999. Thus, let’s set all pixels with a value == -9999 to NA (no value). # there is NO data value in our raster - let's define it myNoDataValue <- as.numeric(reflInfo$Data_Ignore_Value)
myNoDataValue

# set all values equal to -9999 to NA
b34[b34 == myNoDataValue] <- NA

# plot the image now
image(b34)
# We need to transpose x and y values in order for our final image to plot properly
b34 <- t(b34)
image(log(b34), main = "Transposed Image")

# Creating a georeferenced raster

In order to get a raster file suitable for further analysis, we first need to define the Coordinate reference system (CRS) of the raster. Again, we obtian the necessary onformation form the HDF5 file.

# Extract the EPSG from the h5 dataset

# convert the EPSG code to a CRS string
myCRS <- crs(paste0("+init=epsg:", myEPSG))

Define final raster with projection info.

b34ras <- raster(b34, crs = myCRS)

# view the raster attributes
b34ras

Let’s take a look at the georeferenced raster. Take note of the coordinates on the x and y axis.

image(log(b34ras),
xlab = "UTM Easting",
ylab = "UTM Northing",
main = "Properly Oriented Raster")

Next we define the extents of our raster. The extents will be used to calculate the raster’s resolution. We get this information from the reflectance information obtained in a previous step.

# Grab the UTM coordinates of the spatial extent
xMin <- reflInfo$Spatial_Extent_meters[1] xMax <- reflInfo$Spatial_Extent_meters[2]
yMin <- reflInfo$Spatial_Extent_meters[3] yMax <- reflInfo$Spatial_Extent_meters[4]
# define the extent (left, right, top, bottom)
rasExt <- extent(xMin, xMax, yMin, yMax)

# view the extent to make sure that it looks right
rasExt 
# assign the spatial extent to the raster
extent(b34ras) <- rasExt

# look at raster attributes
b34ras
# let's change the colors of our raster and adjust the zlims
col <- terrain.colors(25)

image(b34ras,
xlab = "UTM Easting",
ylab = "UTM Northing",
main = "Raster with Custom Colors",
col = col,
zlim = c(0, 1000))

We can now save the created raster.

# write out the raster as a geotiff
writeRaster(b34ras,
file = "Data/NEON/DP3.30006.001/HARV_plot_001_band_34.tif",
format = "GTiff",
overwrite = TRUE)

## Creating a RGB raster

In the previous step we created a raster file of a single band, but each hyperspectral data from the NEON-AOP contains 426 bands. In this step we will construct a raster stack file, i.e., a raster with N bands, in other words, a raster of rasters. You can do it manually as we did for the band 34, but let’s take advantage of R and write a function that do the work for us.

# file: the hdf file
# band: the band you want to process
# noDataValue: values to be omitted
# extent: raster extent
# CRS: coordinates system
# returns: a matrix containing the reflectance data for the specific band

band2Raster <- function(file, band, noDataValue, extent, CRS){
# first, read in the raster
index = list(band, NULL, NULL)) # path to the HDF5 file
# Convert from array to matrix
out <- (out[1,,]) # output
# transpose data to fix flipped row and column order
# depending upon how your data are formatted you might not have to perform this
# step.
out <- t(out)
# assign data ignore values to NA
# note, you might chose to assign values of 15000 to NA
out[out == myNoDataValue] <- NA

# turn the out object into a raster
outr <- raster(out, crs = CRS)

# assign the extents to the raster
extent(outr) <- extent

# return the raster object
return(outr)
}

Now apply the function to create a RGB raster file, to do this, we will use the bands 58, 34 and 19, respectively.

# create a list of the bands we want in our stack
rgb <- list(58, 34, 19)

# lapply tells R to apply the function to each element in the list
rgb_harv <- lapply(rgb, FUN = band2Raster, file = f,
noDataValue = myNoDataValue,
extent = rasExt,
CRS = myCRS)

# check out the properties or rgb_rast
# note that it displays properties of 3 rasters.
rgb_harv

Success!!! we created a list of three rasters. Now in order to get the raster stack just apply the function stack() from the package {raster}.

# Create a raster stack from our list of rasters
rgb_harv_stack <- stack(rgb_harv)
rgb_harv_stack

As a final step, let’s assign the names of the bands to the raster stack object.

# Create a list of band names
bandNames <- paste("Band_", unlist(rgb), sep = "")

# set the rasterStack's names equal to the list of bandNames created above
names(rgb_harv_stack) <- bandNames

# check properties of the raster list - note the band names
rgb_harv_stack

# scale the data as specified in the reflInfo$Scale Factor rgb_harv_stack <- rgb_harv_stack/as.integer(reflInfo$Scale_Factor)

# plot one raster in the stack to make sure things look OK.
plot(rgb_harv_stack$Band_58, main = "Band 58") And plot resulting RGB raster. # create a 3 band RGB image plotRGB(rgb_harv_stack, r = 1, g = 2, b = 3, stretch = "lin") Cool, right? As we did with the band 34, let’s save the RGB raster in a GeoTIFF format. # write out final raster writeRaster(rgb_harv_stack, file = "Data/NEON/DP3.30006.001/HARV_plot_001_RGB.tif", format = "GTiff", overwrite = TRUE) ## Vegetation indices calculation Now, we will calculate vegetation indices, specifically, we will calculate the Normalized Difference Vegetation Index (NDVI). The $$NDVI$$ is computed as the difference between near-infrared ($$NIR$$) and red ($$RED$$) reflectance divided by their sum.: $NDVI = \frac{NIR - R}{NIR + R}$ To calculate the NDVI from the NEON-AOP hyperspectral data, first select the bands 58 (red) and 90 (NIR) and then create a raster stack as we did for the RGB raster. # Calculate NDVI # select bands to use in calculation (red, NIR) ndvi_bands <- c(58, 90) #bands c(58, 90) in full NEON hyperspectral dataset # create raster list and then a stack using those two bands ndvi_harv <- lapply(ndvi_bands, FUN = band2Raster, file = f, noDataValue = myNoDataValue, extent = rasExt, CRS = myCRS) ndvi_harv <- stack(ndvi_harv) # make the names pretty bandNDVINames <- paste("Band_", unlist(ndvi_bands), sep = "") names(ndvi_harv) <- bandNDVINames # view the properties of the new raster stack ndvi_harv Write a function for NDVI calculation. #calculate NDVI NDVI_func <- function(ras) { (ras[,2] - ras[,1])/(ras[,2]+ras[,1]) } Apply the function and plot the result. ndvi_calc <- calc(ndvi_harv, fun = NDVI_func) plot(ndvi_calc, main = "NDVI for the NEON HARV Field Site") Now, play with breaks and colors to create a meaningful map add a color map with 4 colors. myCol <- rev(terrain.colors(4)) # use the 'rev()' function to put green as the highest NDVI value # add breaks to the colormap, including lowest and highest values (4 breaks = 3 segments) brk <- c(0, .25, .5, .75, 1) # plot the image using breaks plot(ndvi_calc, main = "NDVI for the NEON HARV Field Site", col = myCol, breaks = brk) We can save the resulting NDVI raster. writeRaster(ndvi_calc, file = "Data/NEON/DP3.30006.001/HARV_plot_001_NDVI.tif", format = "GTiff", overwrite = TRUE) # Plot spectral signatures derived from hyperspectral remote sensing data Now we will extract all reflectance values for a selected pixel in the hyperspectral data from HARV and use this values to plot its spectral signatures. For practice purpose we will select a pixel at the position (100, 35), in other words, we are selecting the pixel at the row 100 and column 35. In addition to get the the reflectance for all bands we need to inform an empty space (as NULL). # extract all bands from a single pixel aPixel <- h5read(f, "/HARV/Reflectance/Reflectance_Data", index = list(NULL, 100, 35)) class(aPixel) # The line above generates a vector of reflectance values. # Next, we reshape the data and turn them into a dataframe b <- adply(aPixel, c(1)) class(b) # create clean data frame aPixeldf <- b[2] # add wavelength data to matrix aPixeldf$Wavelength <- WL

head(aPixeldf)

Now select the scale factor from the reflectance object and scale the reflectance values for all bands.

scaleFact <- reflInfo$Scale_Factor # add scaled data column to the data frame aPixeldf$scaled <- (aPixeldf\$V1/as.vector(scaleFact))

# make nice column names
names(aPixeldf) <- c('Reflectance', 'Wavelength', 'ScaledReflectance')

tail(aPixeldf)
ggplot(data = aPixeldf) +
ylab("Reflectance")