Background

In a recent publication in PNAS, “Memory and resource tracking drive blue whale migrations” (Abrahms et al. 2019), we compared 10 years (1999-2008) of blue whale movements with the timing of the spring phytoplankton bloom (which leads to increased whale prey) in the California Current Ecosystem to show how blue whales track resource waves during the feeding season. The complete tracking data set, which includes 143 animals tagged between 1993 and 2008, has previously been used in a number of publications, as listed in the Movebank Repository README.txt file. The paper by Bailey et al. (2009) provides a good entry point to the data set, and we continue to use it in new studies. However, with the publication of the PNAS paper we felt it was timely to formally publish the data set and make it more widely available, and we chose to do so in a Movebank Data Repository (Mate et al. 2019). The data can also be viewed on Movebank (Movebank Study ID: 650188969, “Blue whales Eastern North Pacific 1993-2008 - Argos Data”).

Both the PNAS paper and the Movebank Study received some attention on social media. On February 28, 2019, a Twitter post by @MovebankTeam advertising the availability of the data elicited a response from @AniMove encouraging us to post an animation of the movements and corresponding ocean conditions using the fantastic moveVis R package by @schwalbwillmann:

[Embedded Twitter exchange between @MovebankTeam and @AniMove]

Eager to see our data visualized in this manner, I decided to take on the animation challenge. The steps implemented here largely follow the moveVis online documentation as well as the move and moveVis package vignettes available on CRAN. It should be mentioned that for the PNAS paper we used a derived version of the tracks that had been regularized with a state-space model (Bailey et al. 2009) and further annotated with a suite of environmental variables (e.g., bathymetry, sea surface temperature, sea surface height, phytoplankton chlorophyll-a concentration), whereas the data set deposited on Movebank corresponds to the raw Argos locations, with no additional processing or annotation (although it is possible to do so in Movebank with the Env-DATA System). So for this post I chose to animate the raw tracks to illustrate the process of accessing the publicly available data directly from the Movebank Data Repository (you will need an internet connection in order to replicate some of the steps in this post), taking advantage of the utilities available in the move R package (see here for additional details). The animation implemented here shows the movement patterns of the whales over a single “common year”, with the purpose of visualizing movement over the course of the seasons. In a future post I will create a different animation of the whale tracks encoded by the underlying ocean conditions (e.g., phytoplankton chlorophyll-a concentration), using the derived (i.e., regularized and annotated) version of the tracks, which is not publicly available at this time.

Accessing the blue whale Argos tracks on Movebank

Load the R packages we will require:

library("tidyverse")
library("lubridate") # masks base::date()
library("RColorBrewer")
library("move") # masks dplyr::select() and tidyr::extract()
library("moveVis")
library("argosfilter") # masks move::distance, etc.

Import the data as a MoveStack object. Note that the data can be accessed via the Movebank Study using the function move::getMovebankData(), which provides greater control over how the data get imported, but requires having a user account on Movebank:

study.name <- "Blue whales Eastern North Pacific 1993-2008 - Argos Data"
loginStored <- movebankLogin(username = "my_username", 
                             password = "my_password")
dat.trks <- getMovebankData(study = study.name, login = loginStored)

Alternatively, we can circumvent the login credentials requirement because the data have also been published as a Movebank Data Repository, which is publicly available:

dat.trks <- getDataRepositoryData("doi:10.5441/001/1.5ph88fk2")
## Duplicated locations found on 1 occassions, those are omited, consider using a more informed approach then using the first one.
# A Large MoveStack (15777 elements, 2.9 Mb)

After the data are read in, we get a message about duplicated locations. Duplicates are common in raw Argos data but can be problematic for analysis. Despite our extensive quality control, it looks like one duplicated location remained in the data set, but move::getDataRepositoryData() was able to detect and omit it as part of its internal consistency checks. Note that for more pervasive duplication problems, the function move::getMovebankData() includes the argument removeDuplicatedTimestamps, which can be set to FALSE; the function getDuplicatedTimestamps() can then be used to find the offending entries and decide which ones to remove.
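
For such cases, here is a minimal sketch (left commented, not run) of how one might flag duplicates in a raw location data frame before building a Move object. The data frame dat.raw and its column names are hypothetical, and the exact getDuplicatedTimestamps() arguments may differ across move versions (see ?getDuplicatedTimestamps):

# dat.raw is a hypothetical data frame of raw records (one row per location)
# with columns for the animal ID, timestamp, and sensor type:
#dup.idx <- getDuplicatedTimestamps(x = as.factor(dat.raw$individual.local.identifier), 
#                                   timestamps = as.POSIXct(dat.raw$timestamp, tz = "UTC"), 
#                                   sensorType = dat.raw$sensor.type)
# dup.idx is a named list of row indices sharing the same ID/timestamp combination;
# inspect each element and keep the preferred record (e.g., the better Argos LC)
# before creating the MoveStack.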

Also note that, for display purposes on the Movebank Study map page, we manually identified the most egregious outliers in the raw Argos data prior to submitting the data to Movebank. During the import process, these outliers are recognized and omitted from the main data set (they are kept in the slot “manually.marked.outlier”). However, raw Argos data are noisy, and further cleaning will be necessary to remove additional outliers prior to animating the tracks.

Preparing the movement data

Inspect the MoveStack object using move functions (these lines are left commented to avoid excessive output here):

# Inspect the contents of the data file:
#show(dat.trks)

# Number of individual animals:
#n.indiv(dat.trks) # 143

# Longitude and latitude extent of the tracks:
#extent(dat.trks)
#xmin: -151.57, xmax: -87.547, ymin: 6.194, ymax: 53.148

# Temporal extent of the tracks:
#range(dat.trks@timestamps)
# "1993-08-28 18:20:00 UTC" "2009-04-10 12:34:34 UTC"

# Temporal extent of the tag deployments:
#range(as.POSIXct(dat.trks@idData$deploy.on.date, tz = "UTC"))
# "1993-08-28 18:20:00 UTC" "2008-07-28 23:00:00 UTC"

# Number of locations per track:
trks.len <- n.locs(dat.trks)
#range(trks.len) # 1-456

# Duration (in days) of each track:
trks.dur <- as.Date(dat.trks@idData$deploy.off.date) - 
  as.Date(dat.trks@idData$deploy.on.date)
#range(trks.dur) # 0-504 days
#median(trks.dur) # 58 days

The above indicates there are 143 tracks from tags deployed from 1993 to 2008, with between 1 and 456 Argos locations and durations ranging from 0 to 504 days (median of 58 days).

Also make a quick plot of the tracks:

plot(dat.trks, type = "b", xlab = "location_long", ylab = "location_lat")
Figure 1: A quick plot of the raw Argos tracks.

Although they are not easy to see, several outliers remain in the tracking data, evident as spikes in the plot. These should be excluded as well, but we’ll take care of them after the next step.

The first step is to exclude some very short tracks (< 7 locations or < 7 days) from the data, as they don’t add much and they can be problematic for other computations. For this purpose, it is more convenient to convert the MoveStack object to a data frame (we’ll convert it back to a MoveStack object at the animation step).

# Convert the MoveStack object to a data frame:
dat.trks.df <- as(dat.trks, "data.frame")
#glimpse(dat.trks.df) # 15777x37

# Identify the very short tracks:
trks.short <- dat.trks.df %>% 
  group_by(trackId) %>% 
  summarise(nlocs = n(),
            duration = as.Date(first(deploy.off.date)) - 
              as.Date(first(deploy.on.date))) %>% 
  filter(nlocs < 7 | duration < 7) # 37x3
#glimpse(trks.short) # there are 37 short tracks

# Subset data to the long tracks:
dat.trks.long.df <- dat.trks.df %>% 
  filter(!trackId %in% trks.short$trackId) %>% 
  droplevels() # 15585x37

Now we can proceed to exclude the location outliers that still remain in the tracking data. For this purpose, we’ll use the filtering protocol implemented in the function argosfilter::sdafilter(), which uses a combination of speed, distance, and angle criteria between consecutive locations to identify outliers (Freitas et al. 2008). The sdafilter is particularly well suited for Argos tracks from marine animals, whose locations are predominantly of low quality (the Douglas filter available on Movebank is another good alternative). The sdafilter allows specification of the speed, distance, and angle criteria, but we’ll use the default values. Note that the sdafilter also considers the Argos location class (LC), and it appears that LC 3 values were incorrectly imported as blank (“”) in the tracking data, so we first need to fix these using the convenient function forcats::fct_recode().

# Inspect the LCs:
#levels(dat.trks.long.df$argos.lc)
# ""  "0" "1" "2" "A" "B" "Z"
#table(dat.trks.long.df$argos.lc)
#   ""    0    1    2    A    B    Z 
# 269 4266 2080  653 2781 4690 1038 

# Fix the missing LC 3 values and convert the argos.lc column from factor to
# character:
dat.trks.long.df <- dat.trks.long.df %>% 
  mutate(argos.lc = fct_recode(argos.lc, "3" = ""), 
         argos.lc = as.character(levels(argos.lc))[argos.lc])
#glimpse(dat.trks.long.df)

# Run the sdafilter on each track and exclude locations flagged for removal:
# (sdafilter defaults are: vmax = 2, ang = c(15, 25), distlim = c(2500, 5000))
dat.trks.long.df <- dat.trks.long.df %>% 
  group_by(trackId) %>% 
  mutate(cfilter = sdafilter(lat = location.lat, 
                             lon = location.long, 
                             dtime = timestamps, 
                             lc = argos.lc)) %>% 
  filter(cfilter != "removed") %>%  # == "not"
  ungroup() %>% 
  droplevels() # just in case full tracks are removed in the process
#glimpse(dat.trks.long.df) # NOW 10312x38
#15585-10312 # 5273 locs were removed by the sdafilter

Now we can generate an inventory of the clean tracking data.

# Create a summary for the long tracks, including track ID, the number of
# locations (i.e., rows) per track, the tracking duration, the deployment on and
# off timestamps, and the deployment on and off year. Also add a column with the
# difference between year on and year off, which will come in handy later.
trks.long.info <- dat.trks.long.df %>% 
  group_by(trackId) %>% 
  summarise(nlocs = n(),
            duration = as.Date(first(deploy.off.date)) - 
              as.Date(first(deploy.on.date)), 
            date.on = as.POSIXct(first(deploy.on.date)), 
            date.off = as.POSIXct(first(deploy.off.date)), 
            year.on = year(first(deploy.on.date)), 
            year.off = year(first(deploy.off.date))) %>% 
  mutate(year.diff = year.off - year.on) # 106x8
#glimpse(trks.long.info)
#unique(trks.long.info$year.diff) # 0 1 2

# Create another data frame summarizing the number of tags deployed in each year:
n.trks.yr <- trks.long.info %>% 
  count(year.on) # 12x2
#range(n.trks.yr$year.on) # on 12 years between 1994 and 2008...
#range(n.trks.yr$n) # between 1 and 16 tags were deployed

After excluding the short tracks and the remaining outliers, the data set now has 106 tracks spanning the years 1994 to 2008. In each tagging year, between 1 and 16 tags were deployed:

Table 1: Summary of blue whale tracks available by year

Year deployed   No. tags
1994             2
1995             8
1998             7
1999            15
2000             6
2001             1
2002             2
2004            16
2005            14
2006             9
2007            14
2008            12
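
As an aside, Table 1 can be reproduced directly from the n.trks.yr summary created above; a minimal sketch (left commented), assuming the knitr package is installed for the table formatting:

#n.trks.yr %>% 
#  rename(`Year deployed` = year.on, `No. tags` = n) %>% 
#  knitr::kable(caption = "Summary of blue whale tracks available by year")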

Next we deal with the timestamps. An animation showing individual colors for such a high number of animals (106) over such a long period of time (1994-2008) would be rather tedious and impractical. Instead, it is more useful to represent all the tracks simultaneously over the course of a “common year” to visualize the seasonal pattern in the movements. For this purpose, several manipulations are necessary to generate a date field spanning a common year for all the tracks (e.g., using 1994, the earliest deployment year, as the common year). We’ll use the convenient function lubridate::update(), illustrated briefly below.
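
As a quick illustration (not part of the processing pipeline), update() replaces the requested date component while leaving the rest of the timestamp intact:

# For example, shift an arbitrary timestamp into the common year:
update(ymd_hms("2005-07-21 08:30:00", tz = "UTC"), year = 1994)
# "1994-07-21 08:30:00 UTC"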

# Create additional columns in the data frame containing the necessary pieces
# for date manipulation. Put all years into a common one while accounting for
# tracks that cross into the next year (or two, in one case). For tracks that
# span across one or more years, add a year according to trks.long.info$year.diff
# using lubridate::update().
dat.trks.long.df <- dat.trks.long.df %>% 
  mutate(year.diff.full = rep.int(trks.long.info$year.diff, 
                                  trks.long.info$nlocs), 
         year.diff.inst = rep.int(trks.long.info$year.off, 
                                  trks.long.info$nlocs) - year(timestamps), 
         year.shift = year.diff.full - year.diff.inst, 
         timestamps.new = update(timestamps, 
                                 year = min(trks.long.info$year.on) + year.shift))
#glimpse(dat.trks.long.df) # NOW 10312x42
#range(dat.trks.long.df$timestamps)
# "1994-09-14 21:22:00 UTC" "2009-04-10 12:34:34 UTC"
#range(dat.trks.long.df$timestamps.new)
# "1994-01-14 05:00:00 UTC" "1996-01-06 16:08:02 UTC"
#plot(dat.trks.long.df$timestamps, dat.trks.long.df$timestamps.new)

Note that lubridate::update() handles the leap years in the data set (1996, 2000, 2004, 2008) by rolling the date of locations on February 29 over to March 1 of the (non-leap) common year. However, this can produce duplicate timestamps when a rolled-over February 29 location and an original March 1 location coincide in the common year, so it is best to remove all locations occurring on February 29 of leap years from the data.
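
A quick check of this roll-over behaviour (again, just an illustration):

# February 29 of a leap year rolls over to March 1 in the non-leap common year:
update(ymd_hms("1996-02-29 12:00:00", tz = "UTC"), year = 1994)
# "1994-03-01 12:00:00 UTC"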

# Create a column in dat.trks.long.df identifying locations occurring on
# February 29 of a leap year:
dat.trks.long.df <- dat.trks.long.df %>% 
  mutate(ind.bad.date = leap_year(timestamps) & yday(timestamps) == 60) # NOW 10312x43
#which(dat.trks.long.df$ind.bad.date) # 14

# Exclude the 14 offending locations from the data frame version of the data set:
dat.trks.long.df <- dat.trks.long.df %>% 
  filter(!ind.bad.date)
#glimpse(dat.trks.long.df) # NOW 10298x43

The last issue to deal with is one very long track (“X2004CA.Bmu.00840”) lasting 504 days and crossing two year boundaries, which is inconvenient for the animation, as we want it to span a single year. The simplest solution is to exclude the portion of this track extending past the end of the next longest track, which lasts 326 days (see top_n(trks.long.info, 2, duration)):

# Determine the last date of the tracking data set without the long track:
#trk.vlong <- dat.trks.long.df %>% 
#  filter(trackId == "X2004CA.Bmu.00840") # 226x43
#range(trk.vlong$timestamps.new)
# "1994-08-20 17:06:00 UTC" "1996-01-06 16:08:02 UTC"
trk.rest <- dat.trks.long.df %>% 
  filter(trackId != "X2004CA.Bmu.00840") # 10072x43
#range(trk.rest$timestamps.new)
# "1994-01-14 05:00:00 UTC" "1995-08-05 23:51:58 UTC"

# Exclude the locations from the long track occurring after "1995-08-05 23:51:58
# UTC" from the data set:
dat.trks.long.df <- dat.trks.long.df %>% 
  filter(timestamps.new <= max(trk.rest$timestamps.new)) # NOW 10262x43
#10298-10262 # 36 locations from the very long track excluded

Animating the data over the course of a common year

Now the data are ready for animation, but first we must convert the final movement data back to a MoveStack object. This can be done with move::move() or moveVis::df2move():

# Convert the final data frame back to a MoveStack object:
dat.trks.final <- df2move(dat.trks.long.df, 
                          proj = "+init=epsg:4326 +proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0", 
                          track_id = "trackId", 
                          x = "location.long", 
                          y = "location.lat", 
                          time = "timestamps.new")
# A Large MoveStack (10262 elements, 1.4 Mb)
#glimpse(dat.trks.final)
#plot(dat.trks.final, type = "b", pch = 20)

# We can also add the original timestamps to keep it together with the new
# MoveStack:
dat.trks.final@data$timestamps.orig <- dat.trks.long.df$timestamps
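
As mentioned above, move::move() offers an equivalent route back to a MoveStack. A minimal sketch (left commented), assuming the data frame is already sorted by track and timestamp, which df2move() otherwise handles for us:

#dat.trks.final <- move(x = dat.trks.long.df$location.long, 
#                       y = dat.trks.long.df$location.lat, 
#                       time = dat.trks.long.df$timestamps.new, 
#                       data = as.data.frame(dat.trks.long.df), 
#                       proj = CRS("+proj=longlat +datum=WGS84"), 
#                       animal = dat.trks.long.df$trackId)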

Align the movement data to a uniform time scale with a consistent temporal resolution:

#  Use 1-day intervals:
dat.trks.int <- align_move(dat.trks.final, res = 1, digit = 0, unit = "days", 
                          spaceMethod = "greatcircle")
# A Large MoveStack (11373 elements, 4.5 Mb)
#show(dat.trks.int) # 11373 timestamps

# Extract the unique timestamps from the aligned data frame, indicating the
# number of individual frames that will be contained in the animation. Use this
# timestamp to derive the month and year day in the common year for labeling the
# maps.
#n_distinct(timestamps(dat.trks.int)) # 564 unique timestamps
frames.ts <- sort(unique(timestamps(dat.trks.int))) # 564x1
lbl.tstamp <- paste("Month:", toupper(month(frames.ts, label = TRUE)), 
                    "| Year day:", yday(frames.ts), sep = " ") # 564x1

Animate the aligned tracks (colored by year) over the course of a common year with the magic of moveVis:

# Select 12 colors from the ColorBrewer palette "Set3":
#col12CB <- rev(brewer.pal(12, "Set3"))
#display.brewer.pal(12, "Set3")
# Or join two smaller palettes with bolder colors to obtain 12 colors (avoid
# light yellow, which offers poor contrast):
col12CB <- rev(c(brewer.pal(8, "Dark2"), 
             brewer.pal(5, "Accent")[c(1, 2, 3, 5)])) # 12 colors

# Replicate each color according to the number of tracks in each year:
col12CBx106 <- rep.int(col12CB, n.trks.yr$n) # 106

# Generate frames for the animation, adding a custom month/year-day label in
# place of the (artificial common-year) timestamp, plus a custom year legend:
frames <- frames_spatial(m = dat.trks.int, 
                         path_colours = col12CBx106, 
                         path_size = 2, # default is 3
                         map_service = "carto", 
                         map_type = "voyager_no_labels", 
                         path_legend = FALSE) %>% 
  add_labels(x = "Longitude", y = "Latitude", 
             title = "Blue whale Argos tracks, 1994-2008", 
             subtitle = "Oregon State University, Marine Mammal Institute") %>% 
  add_text(labels = lbl.tstamp, x = -117, y = 57, colour = "black", 
           size = 6, type = "label") %>% 
  add_text(labels = "D.Palacios, OSU", x = -90, y = 1.5, colour = "black", 
           size = 3, type = "text") %>% 
  add_gg(gg = expr(annotate(geom = "label", x = -90, 
                            y = seq(from = 54, by = -2.25, len = 12), 
                            label = n.trks.yr$year.on, colour = "white", 
                            fill = col12CB, size = 5, fontface = "bold")))
# A Large list (564 elements, 40 Mb)
# Preview one of the frames (total = 564 @ 1-day resolution):
#frames[[320]]

Render the animation as a gif for export:

animate_frames(frames, 
               out_file = "bw_moveVis_yr.gif", 
               overwrite = TRUE, 
               fps = 5) # 15 Mb
Figure 2: The final result – an animation of 106 blue whale Argos tracks in the Movebank Data Repository.
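
Note that animate_frames() infers the output format from the out_file extension, so other formats (e.g., mp4) can be rendered simply by changing the file name, provided the necessary renderers are installed on your system; moveVis::suggest_formats() lists what is available. A sketch:

# Check which output formats are available, then render an mp4 instead:
#suggest_formats()
#animate_frames(frames, out_file = "bw_moveVis_yr.mp4", overwrite = TRUE, fps = 5)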

And voilà! The animation has 564 daily frames spanning the common period from January 14 (day 14) of the first year to August 5 (day 217) of the following year. At 5 frames per second it takes about 113 seconds to play. The tracks are colored by year of tagging according to the key shown on the map, with the number of tracks in each year as indicated in the table above. A total of 106 animals are represented in the animation, although the number of animals shown at any given time varies greatly, reflecting the timing and location of tagging, as well as the variable duration of the individual tags. Tagging was primarily conducted off the California coast in summer-fall during the feeding season, but in three years (2001, 2002, 2008) a few animals were also tagged in winter in the breeding grounds at the Costa Rica Dome and the Gulf of California (see Bailey et al. 2009).

Thus, the animation starts out in January, showing the movements of the few animals tagged in the breeding grounds in winter and their northward migration in spring to the Pacific coast of Baja California. Around day 200 we start seeing the bulk of the animals tagged in the summer off California, as they spend the feeding season off the western coast of North America (including offshore waters). By late fall (day 300) we see the start of the southward migration back to the breeding grounds in the Gulf of California and the Costa Rica Dome, which continues into the winter. Finally, we see once again the northward movement toward California for the few tags that lasted into the spring and summer of the following year. While most animals conform to this seasonal pattern, it is interesting to observe a few animals moving contrary to it, especially during the winter months toward offshore waters at high latitudes.

Much more can be said about these movements, and we are still mining new information out of this data set. Stay tuned for new findings… Meanwhile, enjoy the fireworks!

References

Abrahms B, Hazen EL, Aikens EO, Savoca MS, Goldbogen JA, Bograd SJ, Jacox MG, Irvine LM, Palacios DM, Mate BR (2019) Memory and resource tracking drive blue whale migrations. Proceedings of the National Academy of Sciences 116(12):5582-5587. doi:10.1073/pnas.1819031116

Bailey H, Mate BR, Palacios DM, Irvine L, Bograd SJ, Costa DP (2009) Behavioural estimation of blue whale movements in the Northeast Pacific from state-space model analysis of satellite tracks. Endangered Species Research 10:93-106. doi:10.3354/esr00239

Freitas C, Lydersen C, Ims RA, Fedak MA, Kovacs KM (2008) A simple new algorithm to filter marine mammal Argos locations. Marine Mammal Science 24:315-325. doi:10.1111/j.1748-7692.2007.00180.x

Mate BR, Palacios DM, Irvine LM, Follett TM (2019) Data from: Behavioural estimation of blue whale movements in the Northeast Pacific from state-space model analysis of satellite tracks. Movebank Data Repository. doi:10.5441/001/1.5ph88fk2