Divergence analysis: Part 1

Part 1/2 on how to do a divergence analysis in R.

A common question in pupillometry is whether two different kinds of stimuli produce a different pupillary response. For example, we may be interested in seeing whether negative feedback (i.e., being informed that you are mistaken) produces a larger pupillary response compared to positive feedback. A typical experimental task for this question consists of multiple trials in which the participant gives a response. The participant is then presented with the feedback, and their pupillary response is measured for a specific duration. We want to test at which time points during the feedback presentation the pupillary response to negative feedback is different from the pupillary response to positive feedback. In two posts, I will show two methods to do this.

The methods I will discuss are based on a cool R package for analyzing eye tracking data, called eyetrackingR. The authors of eyetrackingR have also created an amazing website that explains how to use the package, which can be found here. They have a section on estimating divergences that covers multiple methods for figuring out when exactly two responses are different from each other. Two of these methods I will cover in these blog posts.

However, instead of using their package to analyze the results, I am going to repeat the steps that they describe in estimating divergences using standard tidyverse functions. So, I won’t actually be using their package. The reason for that is that when it comes to data analysis, I want to know what’s happening behind the scenes (to some degree, at least). This means I want to always deal with raw data (as opposed to pre-processed data) and also try and write the code for each data-processing step myself. The end result is that I feel more confident about what it is that I’m doing. In the case of estimating divergences, it’s actually quite easy to start with raw data and process it to get what we want, so let’s get started.


In the vignette on estimating divergences, the authors of eyetrackingR use data from a 2-alternative forced choice (2AFC) word recognition task administered to 19- and 24-month-olds. On each trial, infants were shown a picture of an animate object (e.g., a horse) and an inanimate object (e.g., a spoon). After inspecting the images, the images disappeared and they heard a label referring to one of them (e.g., “The horse is nearby!”). Finally, the objects re-appeared on the screen and they were prompted to look at the target (e.g., “Look at the horse!”). The measure of interest is the proportion that the infants looked at the target, at every time point during the target presentation. Although this is not a pupillometry study, the methods below can also be applied to pupillometry data.

Let’s begin by loading the relevant packages, reading in the data, and a bit of data cleaning.

# Load packages

# Load data
word_recognition <- read_csv("../../static/data/word_recognition.csv", 
  col_types = list(Sex = "c"))

# Set a default ggplot theme

# Rename some of the variables to have a lower_case style, rather than camelback
word_recognition <- word_recognition %>%
    time_from_trial_onset = TimeFromTrialOnset,
    participant = ParticipantName,
    trial = Trial,
    track_loss = TrackLoss,
    animate = Animate

# Check out the result

Observations: 195,912
Variables: 15
$ participant           <chr> "ANCAT139", "ANCAT139", "ANCAT139",...
$ Sex                   <chr> "F", "F", "F", "F", "F", "F", "F", ...
$ Age                   <dbl> 21.86, 21.86, 21.86, 21.86, 21.86, ...
$ TrialNum              <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,...
$ trial                 <chr> "FamiliarBottle", "FamiliarBottle",...
$ time_from_trial_onset <int> 0, 17, 33, 50, 67, 83, 100, 117, 13...
$ Subphase              <chr> "Preview", "Preview", "Preview", "P...
$ TimeFromSubphaseOnset <int> 0, 17, 33, 50, 67, 83, 100, 117, 13...
$ AOI                   <chr> "TrackLoss", "TrackLoss", "TrackLos...
$ animate               <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA,...
$ Inanimate             <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA,...
$ track_loss            <lgl> TRUE, TRUE, TRUE, TRUE, TRUE, TRUE,...
$ MCDI_Total            <int> 47, 47, 47, 47, 47, 47, 47, 47, 47,...
$ MCDI_Nouns            <int> 29, 29, 29, 29, 29, 29, 29, 29, 29,...
$ MCDI_Verbs            <int> 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8,...

We load in the tidyverse package and the broom package. The broom package is a great package that can be used to turn the output of statistical tests into easy to work with data frames. When reading in the data, we have to specify the column type for Sex because the values in Sex start with a bunch of F’s, which read_csv() thinks represent logical values rather than string values, so we explicitly tell it to read in the data as a string.

The next step is to filter out some of the data. Specifically, we want to only have the observations in the period between 15500 and 21000ms (as per the vignette in eyetrackingR).

response_window <- filter(word_recognition, time_from_trial_onset >= 15500 & 
    time_from_trial_onset < 21000)

Next, we want to calculate the amount of missing data we have. First, we group the data by participant and trial. We then use summarize() to count the number of observations and the number of observations that are missing. The percentage of missing data can then easily be calculated by dividing the amount of missing observations by the total number of observations.

track_loss <- response_window %>%
  group_by(participant, trial) %>%
    n = n(), 
    missing = sum(track_loss)
    ) %>%
  mutate(missing_by_trial = missing / n)

The next step is removing trials with too much missing data. We add the track loss information to the original data frame, so we can perform a simple filter. We remove all observations of trials with more than 25% missing data, by selecting all the rows with less than or equal to 25% missing data.

response_window_clean <- left_join(response_window, track_loss, 
  by = c("participant", "trial"))
response_window_clean <- filter(response_window_clean, missing_by_trial <= .25)

The question we want to answer with the data is whether infants are more likely to look at the target stimulus when the target reflects an animate object or an inanimate object. The data does not yet contain a variable indicating whether a trial displayed an animate or inanimate target, so we create one.

response_window_clean <- mutate(response_window_clean, target = 
    if_else(str_detect(trial, "(Spoon|Bottle)"), "Inanimate", "Animate"))

Following this, we group the data into time bins and create a new variable indicating whether the infant looked at the target. We calculate the proportion of samples that the infant looked at the target, relative to the total number of samples.

response_time <- response_window_clean %>%
    time_bin = floor(time_from_trial_onset / 100),
    hit = if_else(AOI == "Animate", 1, 0)
  ) %>%
  group_by(participant, target, time_bin) %>%
    samples_in_AOI = sum(hit, na.rm = TRUE),
    samples_total = n() - sum(track_loss, na.rm = TRUE),
    proportion = samples_in_AOI / samples_total

We are now ready to produce the first graph.

ggplot(response_time, aes(x = time_bin, y = proportion)) +
  stat_summary(fun.data = mean_se, geom = "ribbon", alpha = .25, 
    aes(fill = target)) +
  stat_summary(fun.data = mean_se, geom = "line", aes(color = target)) +
  coord_cartesian(ylim = c(0, 1)) +
  labs(x = "Time in trial", y = "Looking at AOI: Animate (proportion)", 
    fill = "Target", color = "Target")

The graph shows that there is an effect of Target. Yet, we also see that at the start of the trial there is no effect of Target, and it also seems to diminish later on in the trial. The goal is now to figure out when exactly there is a difference between the two lines.

Many many t-tests

In the divergence vignette, the authors of eyetrackingR show a simple but crude method to estimate divergences. It consists of performing a statistical test on each time bin separately. For example, we could perform a t-test on each time bin. If the result is significant, we can conclude that there’s an effect, at least for that time bin. In the eyetrackingR package, there is a function called analyze_time_bins() to do that. However, we can do it ourselves using a collection of tidyverse functions and the broom package.

We split up the data by each time bin, using the split() function. This creates a list in which each element consists of the data belonging to a particular time bin. It is on this data that we perform a t-test. Using map_df(), we apply the tidy() function from the broom package to the result of each t-test, which turns the output of the t-test into a data frame. map_df() then automatically merges all data frames together into one data frame. On this data frame, we convert the time_bin variable to a numeric variable, we calculate an adjusted p-value to correct for multiple comparisons, and we calculate the standard error of the estimate. Finally, we make sure the data frame is a tibble data frame (because it has better printing functions; see here for more information on tibbles). The code looks like this:

tb_analysis <- response_time %>%
  split(response_time$time_bin) %>%
  map(~ t.test(proportion ~ target, data = .)) %>%
  map_df(tidy, .id = "time_bin") %>%
    time_bin = as.numeric(time_bin),
    p.value.adjusted = p.adjust(p.value, method = "holm"),
    se = (conf.high - conf.low) / (1.96 * 2)
    ) %>%

This code results in the following data frame:

# A tibble: 55 x 13
   time_bin estimate estimate1 estimate2 statistic p.value parameter
      <dbl>    <dbl>     <dbl>     <dbl>     <dbl>   <dbl>     <dbl>
 1      155 -0.00618     0.782     0.788   -0.0695 9.45e-1      48.1
 2      156  0.00518     0.788     0.783    0.0575 9.54e-1      47.9
 3      157  0.0270      0.790     0.763    0.290  7.73e-1      48.8
 4      158  0.122       0.853     0.731    1.24   2.20e-1      45.3
 5      159  0.118       0.843     0.725    1.18   2.43e-1      47.8
 6      160  0.193       0.874     0.680    1.92   6.20e-2      43.9
 7      161  0.289       0.846     0.556    2.70   9.63e-3      46.3
 8      162  0.347       0.882     0.535    3.34   1.72e-3      43.4
 9      163  0.403       0.916     0.513    4.26   1.27e-4      38.7
10      164  0.493       0.918     0.426    5.13   9.31e-6      37.3
# ... with 45 more rows, and 6 more variables: conf.low <dbl>,
#   conf.high <dbl>, method <chr>, alternative <chr>,
#   p.value.adjusted <dbl>, se <dbl>

In eyetrackingR, the result of analyze_time_bins() can be plotted with a cool plot that shows exactly which time bin passes a certain threshold (e.g., significance). We can make a similar plot, but for that we first need to perform a few calculations. First, we need to decide what counts as a pass. In the vignette, they use a t-statistic threshold of 2. They then calculate in which windows of time the t-statistic exceeds this threshold. We can do the same in the following way:

clusters <- tb_analysis %>%
    pass = if_else(statistic >= 2, 1, 0),
    cluster = if_else(pass == 1 & lag(pass) == 0, 1, 0),
    cluster = if_else(pass == 1, cumsum(cluster), NA_real_)
    ) %>%
  group_by(cluster) %>%
    start = min(time_bin),
    end = max(time_bin)
  ) %>%

We first create a pass variable that indicates whether the statistic exceeded the threshold. We then figure out clusters of significance. That is, we figure out when a statistic first exceeds the threshold and see how many of the subsequent bins also exceed the threshold. A sequence of time bins in which the statistic exceeds the threshold is considered a cluster. There could be 0 clusters (not a single statistic exceeds the threshold) or there could be multiple. In our case, we find two clusters:

# A tibble: 2 x 3
  cluster start   end
    <dbl> <dbl> <dbl>
1       1   161   192
2       2   194   209

The first cluster starts in time bin 161 and ends in 192. The second cluster starts in 194 and ends in 209. We can visualize this with the following code:

ggplot() +
  geom_line(data = tb_analysis, aes(x = time_bin, y = estimate, group = 1)) +
  geom_ribbon(data = tb_analysis, aes(x = time_bin, ymax = estimate + se, 
    ymin = estimate - se), alpha = .20) +
  geom_rect(data = clusters, aes(xmin = start, xmax = end, ymin = -Inf, 
    ymax = Inf), alpha = .20) +
  labs(x = "Time", y = "Parameter estimate (+/- Std. Error")

However, as the authors of eyetrackingR correctly note, these results are not corrected for multiple comparisons. So, instead we can base the threshold on a corrected p-value. In our case, we applied a Holm correction to the p-values. We then repeat our cluster code, changing the threshold to .05 for the adjusted p-value.

The adjusted results can be found below:

# A tibble: 2 x 3
  cluster start   end
    <dbl> <dbl> <dbl>
1       1   163   184
2       2   206   207

So, by performing a sequence of t-tests we can figure out which time bins show a significant effect. We can even produce nice plots that show which clusters of time bins are significant. It’s an easy method, but it does come at a cost. This method is not particularly powerful, so we may conclude that some time bins do not show a significant effect, even though there is an effect. Thankfully there are a few alternative methods. In the estimating divergences vignette of eyetrackingR, two alternative methods are discussed. The first is a bootstrapped smoothed divergence analysis and the second is a bootstrapped cluster-based permutation analysis. I will cover this last analysis in Part 2 of this blog post.