Creating a trial-based time variable

Unprocessed eye tracker data does not come with a time variable that starts at 0 when a trial begins. In this post I show how to create this variable.

One of the first issues I ran into when I started analyzing eye tracker data was that the raw data does not contain a trial-based time variable. I expected the timestamp variable to start at 0 when a trial begins and to increase by one sampling interval with every sample until the end of the trial. Instead, you are likely to get a timestamp variable that looks like an arbitrary set of large numbers, as shown here:


# A tibble: 10 x 4
   subject timestamp trial pupil
     <int>     <int> <int> <dbl>
 1       1 212275472     1  3.73
 2       1 212292222     1  3.74
 3       1 212308845     1  3.74
 4       1 212325470     1  3.76
 5       1 212342094     1  3.75
 6       1 212358844     2  3.76
 7       1 212375469     2  3.77
 8       1 212392094     2  3.76
 9       1 212408718     2  3.77
10       1 212425344     2  3.76

The timestamp variable actually reflects the internal clock of the hardware used to record the data. This means that each sample is stamped with the clock time at which it was recorded, rather than with a time relative to an event in the experiment.
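A quick way to convince yourself that these numbers are clock readings rather than noise is to look at the gaps between consecutive samples. The sketch below uses the first five timestamps from the table above; the names intervals and sampling_rate_hz are my own, and the conversion to Hz assumes the clock counts in microseconds, which is the case for the data in this post but may differ for other devices.

```r
# First five timestamps from the example data (raw device clock values)
timestamp <- c(212275472, 212292222, 212308845, 212325470, 212342094)

# Gap between consecutive samples, in raw clock units
intervals <- diff(timestamp)
intervals
# 16750 16623 16625 16624

# Assumption: the clock counts microseconds, so 1e6 units make one second
sampling_rate_hz <- 1e6 / mean(intervals)
round(sampling_rate_hz)
# 60
```

The gaps are nearly constant, which is what you would expect from a device that samples at a fixed rate.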

Fortunately, it’s relatively easy to turn this variable into a more useful variable. What we want is a variable, say time, that starts at 0 when a trial begins. The subsequent measures, in the same trial, should then be timed relative to the start of the trial. We want this for every trial.

The required steps to get this variable are as follows:

  1. For each trial, get the minimum of the timestamp variable (e.g., 212275472 in trial 1)
  2. Repeat this value across the entire trial
  3. Subtract this value from the timestamp variable
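To make the three steps concrete, here is a minimal sketch of the arithmetic for trial 1 only, in base R. The names trial_start and elapsed are just illustrative. In practice R recycles a single value automatically, so the explicit rep() in step 2 is there only to mirror the list above.

```r
# Timestamps of trial 1 from the example data
timestamp <- c(212275472, 212292222, 212308845, 212325470, 212342094)

# Step 1: the minimum timestamp of the trial
trial_start <- min(timestamp)

# Step 2: repeat this value across the entire trial
trial_start <- rep(trial_start, length(timestamp))

# Step 3: subtract it from the timestamp variable
elapsed <- timestamp - trial_start
elapsed
# 0 16750 33373 49998 66622
```

The same three steps then need to be applied separately to every trial of every subject.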

In R, using the tidyverse, this is done like this:


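library(tidyverse)  # loads dplyr, which provides %>%, group_by(), and mutate()

data <- data %>%
  group_by(subject, trial) %>%
  # time since the first sample of the trial, converted to milliseconds
  mutate(time = (timestamp - min(timestamp)) / 1000)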

We take our data frame, group the data by subject and trial (because we want the minimum for each individual trial), and create a new variable called time that is equal to the value in timestamp minus the minimum of the timestamp for that trial. Additionally, we divide the result by 1000 because the internal clock is in microseconds, and I prefer milliseconds. The result is this:


# A tibble: 10 x 5
# Groups:   subject, trial [2]
   subject timestamp trial pupil  time
     <int>     <int> <int> <dbl> <dbl>
 1       1 212275472     1  3.73   0  
 2       1 212292222     1  3.74  16.8
 3       1 212308845     1  3.74  33.4
 4       1 212325470     1  3.76  50.0
 5       1 212342094     1  3.75  66.6
 6       1 212358844     2  3.76   0  
 7       1 212375469     2  3.77  16.6
 8       1 212392094     2  3.76  33.2
 9       1 212408718     2  3.77  49.9
10       1 212425344     2  3.76  66.5

We see that our new variable time indeed starts at 0 and increases until the next trial, where it starts at 0 again. Excellent!
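With real data you cannot check every trial by eye, so it can help to let R do the checking. The following is a sketch under a few assumptions: it rebuilds the ten example rows by hand as a plain data frame, it adds an ungroup() at the end of the pipeline so that later steps are not accidentally computed per trial, and the checks data frame with its two columns is something I made up for illustration.

```r
library(dplyr)

# The ten example rows from this post, rebuilt by hand
data <- data.frame(
  subject   = 1L,
  timestamp = c(212275472L, 212292222L, 212308845L, 212325470L, 212342094L,
                212358844L, 212375469L, 212392094L, 212408718L, 212425344L),
  trial     = rep(1:2, each = 5),
  pupil     = c(3.73, 3.74, 3.74, 3.76, 3.75, 3.76, 3.77, 3.76, 3.77, 3.76)
)

data <- data %>%
  group_by(subject, trial) %>%
  mutate(time = (timestamp - min(timestamp)) / 1000) %>%
  ungroup()  # drop the grouping once the new variable exists

# One row per trial: does time start at 0, and does it strictly increase?
checks <- data %>%
  group_by(subject, trial) %>%
  summarise(
    starts_at_zero = min(time) == 0,
    increasing     = all(diff(time) > 0),
    .groups = "drop"
  )

# Stops with an error if any trial fails either check
stopifnot(all(checks$starts_at_zero), all(checks$increasing))
```

If the second check fails, the samples within a trial are probably not sorted by timestamp. The time variable itself is still correct in that case, because min() does not depend on row order.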