In this post, I will illustrate how to use
tidystats by analyzing data of the
Many Labs 1 (Klein et al. 2014) replication of Lorge & Curtiss (1936).
Analyzing the data
Lorge and Curtiss (1936) examined how a quotation is perceived when it is attributed to a liked or disliked individual. In one condition the quotation was attributed to Thomas Jefferson and in the other it was attributed to Vladimir Lenin. They found that people agreed more with the quotation when it was attributed to Jefferson than when it was attributed to Lenin. In the Many Labs replication study, the quotation was attributed to either George Washington, the liked individual, or Osama Bin Laden, the disliked individual. They also used a different quotation, which was:
I have sworn to only live free, even if I find bitter the taste of death.
We are again interested in testing whether the source of the quotation affects how it is evaluated. The evaluation was assessed on a 9-point Likert scale ranging from 1 (strongly agree) to 9 (strongly disagree). I reverse coded this in the data that we’ll use. You can follow along by copy-pasting the code from this example.
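On a 9-point scale, reverse coding amounts to subtracting each response from 10. Here is a minimal sketch of that step, assuming the raw responses are whole numbers from 1 to 9. The vector raw_response and the function reverse_code are hypothetical names used for illustration; they are not part of the data set or of tidystats.

```r
# Hypothetical raw responses on the original scale
# (1 = strongly agree, 9 = strongly disagree)
raw_response <- c(1, 3, 5, 8, 9)

# Reverse code so that higher values mean stronger agreement
reverse_code <- function(x, scale_min = 1, scale_max = 9) {
  (scale_min + scale_max) - x
}

response <- reverse_code(raw_response)
response
```

After this step a 1 (strongly agree) becomes a 9, so higher scores indicate more agreement with the quotation.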
Before getting into how
tidystats should be used, let’s first simply
analyze the data. I have designed
tidystats to be minimally invasive.
In other words, to use
tidystats, you do not need to substantially
change your data analysis workflow.
We’ll start with a basic setup where we load some packages and the data.
# Load necessary packages
library(tidyverse)
library(tidystats)

# Load example data (this makes the quote_source data frame available)
data(quote_source)

# Create color variables for the plots
background_color <- "#1a1a1a"
green <- "#00B88D"
yellow <- "#f39c12"
Our main effect of interest is the difference in responses to the quote between the two conditions. Here I visualize this difference with a violin plot.
ggplot(quote_source, aes(x = source, y = response, fill = source)) +
  geom_violin(alpha = 0.95, color = "transparent") +
  stat_summary(fun = "mean", geom = "point", color = background_color) +
  scale_y_continuous(breaks = c(1, 3, 5, 7, 9)) +
  labs(x = "Quote source", y = "Quote agreement") +
  scale_fill_manual(values = c(green, yellow)) +
  # Hide the fill legend; "none" replaces the deprecated FALSE
  guides(fill = "none")
This looks like the effect is in the expected direction. Participants agreed more with the quotation when they believed the quote to be from George Washington compared to Osama Bin Laden.
tidystats comes with its own functions to
calculate descriptives. One of them is the
describe_data function, inspired by the
describe function from the
psych package. You can
use it together with
group_by from the
dplyr package to calculate a
set of descriptives for multiple groups.
quote_source %>%
  group_by(source) %>%
  describe_data(response)
To test whether the difference in agreement between the two sources is statistically significant, we perform a t-test. Normally, we would just run the t-test like so:
t.test(response ~ source, data = quote_source)
However, since we want to use
tidystats to later save the statistics
from this test, we will store the output of the t-test in a variable.
This, and the final section of R code, will be the only thing you need
to change in order to incorporate
tidystats in your workflow.
Once you’ve stored the result of the t-test in a variable, you can look at the output by sending it to the console, which will print the output.
main_test <- t.test(response ~ source, data = quote_source)
main_test
|t||p||df||difference||95% CI lower||95% CI upper|
This shows us that there is a statistically significant effect of the quote source, consistent with the hypothesis.
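Storing the test in a variable also means the individual statistics can be pulled out by name, since t.test returns a list-like object of class htest. The sketch below uses simulated stand-in data (sim_data and sim_test are hypothetical names, and the numbers are not the Many Labs results), so it runs without any extra packages.

```r
# Simulated stand-in data; NOT the Many Labs data
set.seed(1)
sim_data <- data.frame(
  source = rep(c("Bin Laden", "Washington"), each = 50),
  response = c(rnorm(50, mean = 5.2, sd = 2), rnorm(50, mean = 5.9, sd = 2))
)

# Store the test in a variable, as in the post
sim_test <- t.test(response ~ source, data = sim_data)

# Pull out the individual pieces by name
sim_test$statistic  # t value
sim_test$parameter  # degrees of freedom (Welch-corrected by default)
sim_test$p.value    # p value
sim_test$estimate   # the two group means
sim_test$conf.int   # 95% confidence interval of the difference
```

These are the same pieces of information that a results table for a t-test reports.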
Next, let’s run some additional analyses. One thing we can test is
whether the effect is stronger in the US compared to non-US countries.
To test this, we perform a regression analysis. Here we also store the
result in a variable, but this is actually quite common in regression
analyses because you want to apply the
summary function to this
variable in order to obtain the inferential statistics.
us_moderation_test <- lm(response ~ source * us_or_international, data = quote_source)
summary(us_moderation_test)
There appears to be a significant interaction. Let’s inspect the interaction with a graph:
ggplot(quote_source, aes(x = us_or_international, y = response, fill = source)) +
  geom_violin(alpha = .95, color = "transparent") +
  stat_summary(fun.data = "mean_cl_boot", position = position_dodge(.9),
    color = background_color) +
  scale_fill_manual(values = c(green, yellow)) +
  scale_y_continuous(breaks = c(1, 3, 5, 7, 9)) +
  labs(x = "Region", y = "Quote agreement", fill = "Source")
We see that the effect of the source appears to be larger in the US. Given that the positive source was George Washington, this makes sense.
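If you want to look at the interaction term on its own rather than reading it off the full summary, the coefficient table can be indexed by row name. This is a minimal sketch on simulated data with a built-in interaction; sim_data, region, and sim_model are hypothetical names and the values are not the Many Labs results.

```r
# Simulated stand-in data; NOT the Many Labs data
set.seed(2)
n <- 200
sim_data <- data.frame(
  source = factor(rep(c("Bin Laden", "Washington"), times = n / 2)),
  region = factor(rep(c("International", "US"), each = n / 2))
)

# Responses with a larger source effect in the US (the interaction)
sim_data$response <- 5 +
  0.5 * (sim_data$source == "Washington") +
  1.0 * (sim_data$source == "Washington" & sim_data$region == "US") +
  rnorm(n)

sim_model <- lm(response ~ source * region, data = sim_data)

# The coefficient table is a matrix with one row per term
coefs <- summary(sim_model)$coefficients
coefs["sourceWashington:regionUS", ]
```

The row name combines the two dummy-coded terms with a colon, which is how lm labels interaction terms.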
Let’s do one more analysis to see whether the effect is stronger in a lab setting compared to an online setting.
lab_moderation_test <- lm(response ~ source * lab_or_online, data = quote_source)
summary(lab_moderation_test)
We see no significant interaction in this case. This means we do not find evidence that running the study in an online setting significantly weakens the effect; good to know!
Now let’s get to
tidystats. We have three analyses we want to save: a
t-test and two regression analyses. We stored each of these analyses
in separate variables, called
main_test, us_moderation_test, and lab_moderation_test.
The main idea is that we will add these three variables to a list
and then save the list as a file on our computer. You create an empty
list using the
list function. Once you have an empty list, you can add
statistics to this list using the
add_stats function. This function
accepts a list as its first argument, followed by a variable containing
a statistics model. In our case, this means we need to use the
add_stats function three times, as we have three different analyses we
want to save. Since this can get pretty repetitive, we will use the
piping operator to pipe the three steps together and save some typing.
Before we do so, however, note that we can take this opportunity to add
some meta-information to each test. For the sake of this example, let’s
say that the t-test was our primary test. We also had a suspicion that
the location (US vs. international) would matter, but it wasn’t our main
interest. Nevertheless, we preregistered these two analyses. During data
analysis, we figured that it might also matter whether the study was
conducted in the lab or online, so we tested it. This means that this is
an exploratory analysis. With
add_stats, we can add this information
when we add the test to our empty list.
In the end, the code looks like this:
results <- list()

results <- results %>%
  add_stats(main_test, type = "primary", preregistered = TRUE) %>%
  add_stats(us_moderation_test, type = "secondary", preregistered = TRUE) %>%
  add_stats(lab_moderation_test, type = "secondary", preregistered = FALSE)
I recommend doing this at the end of the data analysis script in a
section called ‘tidystats.’ This confines most of the tidystats code
to a single section, keeping it organized, and it will keep most of your
script readable to those unfamiliar with tidystats.
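For readers who prefer not to use the pipe, the same list can be built with three plain calls. This sketch assumes the tidystats package and its bundled quote_source data are installed; it refits the three models so that it runs on its own.

```r
library(tidystats)

# The same three models as in the post
main_test <- t.test(response ~ source, data = quote_source)
us_moderation_test <- lm(response ~ source * us_or_international,
  data = quote_source)
lab_moderation_test <- lm(response ~ source * lab_or_online,
  data = quote_source)

# Without the pipe: each call takes the list and returns the updated list
results <- list()
results <- add_stats(results, main_test,
  type = "primary", preregistered = TRUE)
results <- add_stats(results, us_moderation_test,
  type = "secondary", preregistered = TRUE)
results <- add_stats(results, lab_moderation_test,
  type = "secondary", preregistered = FALSE)

# One entry per analysis
length(results)
```

The piped version does exactly this, except that the pipe passes the list along as the first argument for you.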
After all the analyses are added to the list, the list can be saved as a
.json file to your computer’s disk. This is done with the write_stats
function. The function requires the list as its first argument, followed
by a file path. I’m a big fan of using RStudio Project files so that you
can define relative file paths. In this case, I create the .json file in
the ‘Data’ folder of my project folder.
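As a rough sketch of the saving step, the code below writes a list with a single test to a .json file and reads it back with read_stats. It assumes tidystats is installed. The file name results.json and the use of a temporary folder are my own choices for illustration; in an RStudio project you would more likely pass a relative path into your ‘Data’ folder.

```r
library(tidystats)

# A small list with one analysis, built as in the post
main_test <- t.test(response ~ source, data = quote_source)

results <- list()
results <- add_stats(results, main_test, type = "primary", preregistered = TRUE)

# Hypothetical location, chosen so the example does not touch your project;
# replace with a relative path inside your own project folder
path <- file.path(tempdir(), "results.json")
write_stats(results, path)

# Read the file back in to check what was stored
results_from_file <- read_stats(path)
names(results_from_file)
```

Reading the file back is optional, but it is a quick way to confirm that the statistics were written before you share the file.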
If you want to see what this file looks like, you can inspect it here. Open the file in a text editor to see how the statistics are structured. As you will see, it is not easy for our human eyes to quickly see the results, but it’s easy for computers.
Once you’ve saved the file, you can share the file with others or use it
to report the results in your manuscript. Below I show a video of
how you can upload the file in the
tidystats Word add-in and report the results.
This post was last updated on 2021-07-09.