1/26/2018

Part 1: The why

Science 2.0

In the last several years we have seen many improvements to how we do science. We now:

  • Replicate more
  • Pre-register more
  • Share more

But I think we can do better.

PDF prisons

Although we share more, we mostly share our data and the scripts needed to analyze that data.

What we do not share is the output of our analyses. That output can only be found in our papers.

The result is what I call PDF prisons.

Why is the output important?

Statistical output is important for:

  • Hypothesis testing (duh)
  • Effect size estimation
    • e.g., practical application, power analyses
  • Assessing evidential value
    • e.g., p-curve analysis

In short, it's super important.

Yet…

We see the Results section not just as a collection of statistics, but also as a story aimed at a reader with limited time and motivation.

  • Creating a conflict between being comprehensive (reporting everything) and providing a streamlined story

And we do not have a good workflow solution that allows us to easily organize and communicate the output of statistical analyses.

  • Meaning we have to manually copy over statistical results into or out of a manuscript

Two problems in statistics reporting

I think these observations are responsible for two problems in statistics reporting:

  1. Not all statistics are reported
  2. Not all statistics are correctly reported

Insufficient reporting example #1

Summarizing statistics

"We found no other main or interaction effects (all Fs < 0.66, all ps > .42). The model examining assessments of men’s warmth revealed no main or interaction effects (all Fs < 0.3, all ps > .58)."

Insufficient reporting example #2

Arbitrary cut-off points

"In contrast, participants’ support for group-based dominance did not differ between the low (M = 2.21, SD = 1.23) and moderate (M = 2.20, SD = 1.24) social mobility conditions, F(1, 193) = 0.001, p > .250, d = .01."

Insufficient reporting example #3

Regression tables without exact p-values, confidence intervals, etc.

Incorrect reporting

Inaccuracies in statistics reporting are prevalent (Nuijten et al., 2017).

In psychology, about half of all articles contain at least one inconsistent result in which the reported p-value does not correspond with the accompanying test statistic and degrees of freedom (Nuijten, Hartgerink, van Assen, Epskamp, & Wicherts, 2015).

Incorrect reporting causes

Probable reasons for these errors are (Lakens, 2015):

  • Typos
  • Copy-paste errors
  • Incorrect rounding
  • Incorrect use of '<' instead of '='
  • (Not mentioning the test was one-sided)


statcheck

statcheck was created to detect inconsistencies in statistics reporting, and can also be used to correct mistakes.

Simply go to http://www.statcheck.io and upload your manuscript. statcheck will then extract a portion of the statistics and check whether they are consistent.

statcheck limitations

However, statcheck cannot prevent all mistakes:

  • it cannot extract all statistics (e.g., from tables)
    • extracts about 61% of the statistics
  • it is good (really good!), but not perfect

"We stress that statcheck is an algorithm that will, like any automated procedure exposed to real-world data, sometimes lead to false positives or false negatives. These limitations should be taken into account, preferably by manually double-checking inconsistencies detected by statcheck." (Nuijten, Assen, Hartgerink, Epskamp, & Wicherts, 2017)

Silliness

I think it is silly to have to go through the effort of text extraction to get at the output of statistical analyses, whether that is to check for mistakes or to extract the statistics needed for your research.

What is a better solution?

Add an extra step in the data analysis workflow:

  • Before writing up the results, first organize the output of the statistical analyses in a separate text file
    • Storing statistics outside of the manuscript frees up the manuscript to be more streamlined
  • Then use this text file to write up the results
    • An organized output file can be used to automate statistics reporting

New data analysis workflow

  1. Data preparation
    • Reading in data
    • Preparing data for data analysis
    • Saving prepared data
  2. Data analysis
    • Test pre-registered hypotheses
    • Exploration
  3. Create statistics output file
  4. Write up results

Interim conclusion

There are two problems related to statistics output:

  1. Not all statistics are reported
  2. Not all statistics are correctly reported

I believe we should work on structuring the output of statistical analyses in such a manner that we can share all output, outside of our manuscript, and use it to write up the results in our manuscript without error.

Part 2: The how

Challenge

How to combine the output of different analyses in a structured, sensible manner?

Solution: Make it tidy (Wickham, 2014)

Tidy datasets are easy to manipulate, model and visualize, and have a specific structure: each variable is a column, each observation is a row, and each type of observational unit is a table.

Tidy statistics output

Each row is a statistic

Each column is a useful variable (e.g., the kind of statistic)

Each table is a statistical analysis

Example:
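
A minimal sketch of what one such table could look like, as rows of a comma-separated file for a hypothetical t-test (the exact column names are illustrative):

identifier,statistic,value,method
t_test_condition,t,-1.86,Two Sample t-test
t_test_condition,df,18,Two Sample t-test
t_test_condition,p,.079,Two Sample t-test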

Solution

A tidy data structure enables us to combine the output of different kinds of analyses into a single file.

So, what we need is a way to grab the output of a statistical analysis, make it tidy, and combine it with other analyses.

…this is what tidystats does.

tidystats

tidystats is an R package to easily create a text file containing the output of statistical models.

This enables researchers to:

  1. Easily share all statistics
  2. Easily report statistics

How does it work?

The idea is that you start with an empty list, and whenever you conduct an analysis that you want to share the results of, you add it to the list.

At the end of data analysis, you take this list and simply save it as a text file.

This text file is a comma-separated values (CSV) file, which can be shared and read by anyone.
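
Because it is plain CSV, anyone can inspect the shared file without tidystats, for example with base R (a minimal sketch; the file name is illustrative):

# Read the shared results file with base R; no tidystats required
results_data <- read.csv("results.csv")
head(results_data)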

Installing tidystats

# Install from CRAN
install.packages("tidystats")

# Install the latest development version from GitHub
library(devtools)
install_github("willemsleegers/tidystats")

Setup

Start by loading the package, followed by creating an empty list to store the results in.

# Load package
library(tidystats)

# Create empty list
results <- list()

Example: t-test

Let's analyse Student's sleep data.

Example: t-test

t.test(extra ~ group, data = sleep, var.equal = TRUE)
## 
##  Two Sample t-test
## 
## data:  extra by group
## t = -1.8608, df = 18, p-value = 0.07919
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -3.363874  0.203874
## sample estimates:
## mean in group 1 mean in group 2 
##            0.75            2.33

Example: t-test

# Run the t-test and store the results in a variable
model1 <- t.test(extra ~ group, data = sleep, var.equal = TRUE)

# Add the output to the list
results <- add_stats(model1, results, identifier = "sleep_t_test")

More examples

# Correlation
results <- cor.test(sleep$extra, as.numeric(sleep$group)) %>%
  add_stats(results, identifier = "sleep_correlation")

# ANOVA
results <- aov(extra ~ group, data = sleep) %>%
  add_stats(results, identifier = "sleep_anova")

# Regression
results <- lm(extra ~ group, data = sleep) %>%
  add_stats(results, identifier = "sleep_regression")

I use the pipe operator (%>%) to pass the output directly into add_stats().

Save results

write_stats(results, "data/results.csv")

Writing up the results

Start by reading in the text file containing the results. This recreates the list (just as it was when you saved it).

Optionally, you can set this list as the default tidystats list in options() so that you do not need to supply the list to each reporting function call.

# Read in the results
results <- read_stats("data/results.csv")

# Optionally: Set the list as the default list in options
options(tidystats_list = results)

Reporting statistics

Because all statistics are structured in a known manner, it is very easy to write functions that return APA-styled output.

In some cases, this means you only need to provide the identifier in order to produce an APA line of output.

For example, report("sleep_t_test") results in: t(18) = -1.86, p = .079
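
To show why, here is a minimal sketch of such a function, written against a tidy data frame of statistics (an illustration with a hypothetical helper, not the actual tidystats implementation; it assumes the statistic and value columns from the earlier sketch):

# Hypothetical helper: format a tidy set of t-test statistics
# as an APA-styled line, e.g., "t(18) = -1.86, p = .079"
report_t_test <- function(stats) {
  t_value <- stats$value[stats$statistic == "t"]
  df      <- stats$value[stats$statistic == "df"]
  p       <- stats$value[stats$statistic == "p"]
  paste0("t(", df, ") = ", round(t_value, 2),
         ", p = ", sub("^0", "", format(round(p, 3))))
}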

Reporting statistics

We can combine this with R Markdown to write a results section.

For example:

We found no significant difference in hours of sleep between the two drugs, 
`report("sleep_t_test", results = results)`.

Result:

We found no significant difference in hours of sleep between the two drugs, t(18) = -1.86, p = .079.

Real-life example

Bastian and I applied tidystats to our Airbnb paper.

This illustrates a real-life application of tidystats, as well as additional features, such as:

  • Adding notes to document the analysis
  • Indicating whether the analysis was confirmatory or exploratory
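
In code, this could look something like the following sketch (the model formula and data frame name are hypothetical, and the exact argument names are assumptions based on the features listed above):

# Add a documented, confirmatory analysis to the results list
# (formula, data frame, and argument names are assumed for illustration)
results <- lm(review_score ~ trustworthiness_median_z, data = airbnb) %>%
  add_stats(results, identifier = "M1",
            confirmatory = TRUE,
            notes = "Replication attempt of Ert et al. (2016)")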

Real-life example

Results (including code)

We did not find an effect of host trustworthiness, `report("M1", term = "trustworthiness_median_z")`, thereby failing to replicate the findings by Ert et al. (2016). 

Additionally, and also inconsistent with the findings of Ert et al. (2016), we did find a significant effect of host attractiveness, `report("M2", term = "attractiveness_median_z")`.

Real-life example

Results (without code)

We did not find an effect of host trustworthiness, b = -0.0036, SE = 0.0047, t(1010) = -0.76, p = .45, 95% CI [-0.013, 0.0057], thereby failing to replicate the findings by Ert et al. (2016).

Additionally, and also inconsistent with the findings of Ert et al. (2016), we did find a significant effect of host attractiveness, b = 0.011, SE = 0.0048, t(1010) = 2.34, p = .020, 95% CI [0.0018, 0.021].

Limitations of tidystats

  • R-only
  • A bit of a learning curve
  • Does not yet support many types of analyses

Why R?

  • I know R
  • R is rapidly increasing in popularity (together with Python)
  • R does not cost the university thousands of euros
  • R is used by GUI-based data analysis software such as JASP and jamovi
  • R empowers researchers by motivating a programming mindset

Conclusion

I think we should have an explicit step in our data analysis workflow that is focused on the output of statistical analyses, with the aim of sharing all statistical output and preventing errors in statistical reporting.

References

Lakens, D. (2015). Checking your stats, and some errors we make. Retrieved from daniellakens.blogspot.nl/2015/10/checking-your-stats-and-some-errors-we.html

Nuijten, M. B., Assen, M. van, Hartgerink, C., Epskamp, S., & Wicherts, J. (2017). The validity of the tool “statcheck” in discovering statistical reporting inconsistencies. https://doi.org/10.17605/osf.io/tcxaj

Nuijten, M. B., Borghuis, J., Veldkamp, C. L. S., Dominguez-Alvarez, L., Assen, M. A. L. M. van, & Wicherts, J. M. (2017). Journal data sharing policies and statistical reporting inconsistencies in psychology. Collabra: Psychology, 3(1), 31. https://doi.org/10.1525/collabra.102

Nuijten, M. B., Hartgerink, C. H. J., Assen, M. A. L. M. van, Epskamp, S., & Wicherts, J. M. (2015). The prevalence of statistical reporting errors in psychology (1985-2013). Behavior Research Methods, 48(4), 1205-1226. https://doi.org/10.3758/s13428-015-0664-2

Wickham, H. (2014). Tidy data. Journal of Statistical Software, 59(10). https://doi.org/10.18637/jss.v059.i10