1/26/2018

In the last several years we have seen many improvements to how we do science. We now:

- Replicate more
- Pre-register more
- Share more

But I think we can do better.

Although we share more, we mostly share our **data** and the **scripts** needed to analyze that data.

What we do not share is the **output** of our analyses. This can only be found in our papers.

Resulting in what I call PDF prisons.

Statistical output is important because:

- Hypothesis testing (duh)
- Effect size estimation
  - e.g., practical application, power analyses
- Assessing evidential value
  - e.g., p-curve analysis

In short, it's **super** important.

We see the Results section not just as a collection of statistics, but also a story, aimed at a reader with limited time and motivation.

- Creating a conflict between being comprehensive (reporting everything) and providing a streamlined story

And we do not have a good workflow solution that allows us to easily organize the output of statistical analyses and communicate this.

- Meaning we have to manually copy over statistical results into or out of a manuscript

I think these observations are responsible for two problems in statistics reporting:

- Not all statistics are reported
- Not all statistics are *correctly* reported

Summarizing statistics

"We found no other main or interaction effects (all *F*s < 0.66, all *p*s > .42). The model examining assessments of men’s warmth revealed no main or interaction effects (all *F*s < 0.3, all *p*s > .58)."

Arbitrary cut-off points

"In contrast, participants’ support for group-based dominance did not differ between the low (*M* = 2.21, *SD* = 1.23) and moderate (*M* = 2.20, *SD* = 1.24) social mobility conditions, *F*(1, 193) = 0.001, *p* > .250, *d* = .01."

Regression tables without exact *p*-values, confidence intervals, etc.

Inaccuracies in statistics reporting are prevalent (Nuijten et al., 2017).

In psychology, about **half of all articles** contain at least one inconsistent result in which the reported *p*-value does not correspond with the accompanying test statistic and degrees of freedom (Nuijten, Hartgerink, Assen, Epskamp, & Wicherts, 2015).

Probable reasons for these errors are (Lakens, 2015):

- Typos
- Copy-paste errors
- Incorrect rounding
- Incorrect use of '<' instead of '='
- (Not mentioning the test was one-sided)
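To make the idea of an inconsistent result concrete, here is a minimal sketch in base R of the kind of check involved: recompute the *p*-value from the reported test statistic and degrees of freedom, and compare it with the reported *p*-value. The function `check_t_test()` and its arguments are hypothetical names used for illustration, not the implementation of any existing tool. The sketch assumes a two-sided *t*-test and ignores that the reported test statistic is itself rounded. The second reported *p*-value is made up to show a typo.

```r
# Hypothetical helper: checks whether a reported two-sided p-value matches
# the p-value recomputed from the reported t statistic and degrees of freedom.
check_t_test <- function(t_value, df, reported_p, digits = 3) {
  # Two-sided p-value implied by the reported t and df
  computed_p <- 2 * pt(-abs(t_value), df)

  # Consistent if the difference is within rounding error at `digits` decimals
  consistent <- abs(computed_p - reported_p) < 0.5 * 10^(-digits)

  list(computed_p = computed_p, consistent = consistent)
}

# A result reported as t(18) = -1.86, p = .079
ok <- check_t_test(t_value = -1.86, df = 18, reported_p = .079)

# The same test statistic with a mistyped p-value (made up for illustration)
typo <- check_t_test(t_value = -1.86, df = 18, reported_p = .04)

ok$consistent
typo$consistent
```

A more careful check would also consider the range of test statistics that round to the reported value, and one-sided tests.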

`statcheck` was created to detect inconsistencies in statistics reporting, and can also be used to correct mistakes.

Simply go to http://www.statcheck.io and upload your manuscript. `statcheck` will then extract a portion of the statistics and check whether they are consistent.

However, `statcheck` cannot prevent all mistakes:

- it cannot extract all statistics (e.g., from tables)
  - extracts about 61% of the statistics
- it is good (really good!), but not perfect

"We stress that statcheck is an algorithm that will, like any automated procedure exposed to real-world data, sometimes lead to false positives or false negatives. These limitations should be taken into account, preferably by manually double-checking inconsistencies detected by statcheck." (Nuijten, Assen, Hartgerink, Epskamp, & Wicherts, 2017)

I think it is silly to have to go through the effort of *text extraction* to get at the output of statistical analyses, whether it is to check for mistakes or to extract the statistics needed for your research.

Add an extra step in the data analysis workflow:

- *Before* writing up the results, first organize the output of the statistical analyses in a separate text file
  - Storing statistics outside of the manuscript frees up the manuscript to be more streamlined
- Then *use* this text file to write up the results
  - An organized output file can be used to automate statistics reporting

- Data preparation
  - Reading in data
  - Preparing data for data analysis
  - Saving prepared data
- Data analysis
  - Test pre-registered hypotheses
  - Exploration
- **Create statistics output file**
- Write up results

There are two statistics-output related problems:

- Not all statistics are reported
- Not all statistics are *correctly* reported

I believe we should work on structuring the output of statistical analyses in such a manner that we can share *all* output, outside of our manuscript, and *use* it to write up the results in our manuscript without error.

How to combine the output of different analyses in a structured, sensible manner?

**Solution**: Make it tidy (Wickham, 2014)

Tidy datasets are easy to manipulate, model and visualize, and have a specific structure: each variable is a column, each observation is a row, and each type of observational unit is a table.

Each row is a statistic

Each column is a useful variable (e.g., the kind of statistic)

Each table is a statistical analysis

**Example**:

A tidy data structure enables us to combine the output of different kinds of analyses into a single file.
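As a rough illustration of this structure, the sketch below tidies two different analyses by hand in base R and stacks them into one table. The helper `tidy_htest()` and the column names (`identifier`, `method`, `statistic`, `value`) are assumptions made for this example. They are not necessarily the format that tidystats produces.

```r
# Hypothetical helper: turn an "htest" object (as returned by t.test() or
# cor.test()) into a tidy data frame with one row per statistic.
tidy_htest <- function(model, identifier) {
  data.frame(
    identifier = identifier,
    method     = trimws(model$method),
    statistic  = c(names(model$statistic), names(model$parameter), "p"),
    value      = c(unname(model$statistic), unname(model$parameter),
                   model$p.value),
    stringsAsFactors = FALSE
  )
}

# Two different kinds of analyses on Student's sleep data
t_out <- tidy_htest(
  t.test(extra ~ group, data = sleep, var.equal = TRUE),
  identifier = "sleep_t_test"
)
r_out <- tidy_htest(
  cor.test(sleep$extra, as.numeric(sleep$group)),
  identifier = "sleep_correlation"
)

# Because both tables share the same columns, they can simply be stacked
all_stats <- rbind(t_out, r_out)
all_stats
```

Every row is one statistic, every column describes it, and the `identifier` column keeps track of which analysis a row belongs to.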

So, what we need is a way to grab the output of a statistical analysis, make it tidy, and combine it with other analyses.

…this is what **tidystats** does.

*tidystats* is an R package to easily create a text file containing the output of statistical models.

This enables researchers to:

- Easily share *all* statistics
- Easily report statistics

The idea is that you start with an empty list, and whenever you conduct an analysis that you want to share the results of, you add it to the list.

At the end of data analysis, you take this list and simply save it as a text file.

This text file is a comma-separated data file, which can be shared and read by anyone.
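To show why a plain comma-separated file is convenient for sharing, here is a small sketch that uses only base R: it writes a hand-made table of statistics to a CSV file and reads it back without any special package. The table layout and the temporary file path are assumptions for illustration and do not reflect the exact file that tidystats writes.

```r
# Build a small table of statistics by hand (layout assumed for this sketch)
model <- t.test(extra ~ group, data = sleep, var.equal = TRUE)
stats <- data.frame(
  identifier = "sleep_t_test",
  statistic  = c("t", "df", "p"),
  value      = c(unname(model$statistic), unname(model$parameter),
                 model$p.value),
  stringsAsFactors = FALSE
)

# Write the table to a CSV file (a temporary file, to keep the sketch tidy)
path <- tempfile(fileext = ".csv")
write.csv(stats, path, row.names = FALSE)

# Anyone can read the file back in, in R or in any other program
shared <- read.csv(path, stringsAsFactors = FALSE)
shared
```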

    # Install from CRAN
    install.packages("tidystats")

    # Install latest version
    library(devtools)
    install_github("willemsleegers/tidystats")

Start by loading the package, followed by creating an empty list to store results into.

    # Load package
    library(tidystats)

    # Create empty list
    results <- list()

Let's analyse Student's sleep data.

    t.test(extra ~ group, data = sleep, var.equal = TRUE)

    ## 
    ##  Two Sample t-test
    ## 
    ## data:  extra by group
    ## t = -1.8608, df = 18, p-value = 0.07919
    ## alternative hypothesis: true difference in means is not equal to 0
    ## 95 percent confidence interval:
    ##  -3.363874  0.203874
    ## sample estimates:
    ## mean in group 1 mean in group 2 
    ##            0.75            2.33

    # Run the t-test and store the results in a variable
    model1 <- t.test(extra ~ group, data = sleep, var.equal = TRUE)

    # Add the output to the list
    results <- add_stats(model1, results, identifier = "sleep_t_test")

    # Correlation
    results <- cor.test(sleep$extra, as.numeric(sleep$group)) %>%
      add_stats(results, identifier = "sleep_correlation")

    # ANOVA
    results <- aov(extra ~ group, data = sleep) %>%
      add_stats(results, identifier = "sleep_anova")

    # Regression
    results <- lm(extra ~ group, data = sleep) %>%
      add_stats(results, identifier = "sleep_regression")

I use the pipe operator (`%>%`, from the magrittr package) to pass the output directly to `add_stats()`.

    write_stats(results, "data/results.csv")

Start by reading in the text file containing the results. This creates a list (like it was when you created it).

Optionally, you can set this list as the default tidystats list in `options()` so that you do not need to specify the list each time you report a statistic.

    # Read in the results
    results <- read_stats("data/results.csv")

    # Optionally: Set the list as the default list in options
    options(tidystats_list = results)

Because all statistics are structured in a known manner, it is very easy to write functions that return APA-styled output.

In some cases, this means you only need to provide the identifier in order to produce an APA line of output.

For example, `report("sleep_t_test")` results in: *t*(18) = -1.86, *p* = .079
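To give a feel for why a known structure makes this easy, below is a minimal sketch of such a reporting function in base R. It is not the implementation of `report()` in tidystats: the helper `report_t()`, the table layout, and the column names are all assumptions for illustration. The sketch only handles *t*-tests and does not treat very small *p*-values (e.g., *p* < .001) specially.

```r
# A tidy table of statistics, built by hand for this sketch
model <- t.test(extra ~ group, data = sleep, var.equal = TRUE)
stats <- data.frame(
  identifier = "sleep_t_test",
  statistic  = c("t", "df", "p"),
  value      = c(unname(model$statistic), unname(model$parameter),
                 model$p.value),
  stringsAsFactors = FALSE
)

# Hypothetical helper: look up a t-test by its identifier and format it
report_t <- function(stats, identifier) {
  rows  <- stats[stats$identifier == identifier, ]
  value <- setNames(rows$value, rows$statistic)

  t_value <- formatC(value[["t"]], format = "f", digits = 2)
  df      <- as.character(round(value[["df"]], 2))
  # APA style drops the leading zero of a p-value
  p_value <- sub("^0", "", formatC(value[["p"]], format = "f", digits = 3))

  paste0("*t*(", df, ") = ", t_value, ", *p* = ", p_value)
}

apa_line <- report_t(stats, "sleep_t_test")
apa_line
```

Because the numbers are looked up and formatted by code, there is no manual copying step in which a typo or rounding error can slip in.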

We can combine this with R Markdown to write a results section.

For example:

We found a significant positive effect of drugs on hours of sleep, `report("sleep_t_test", results = results)`.

Result:

We found a significant positive effect of drugs on hours of sleep, *t*(18) = -1.86, *p* = .079.

Bastian and I applied `tidystats` to our Airbnb paper.

This illustrates a real-life application of `tidystats`, and additional features such as:

- Adding notes to document the analysis
- Indicating whether the analysis was confirmatory or exploratory

Results (including code)

We did not find an effect of host trustworthiness, `report("M1", term = "trustworthiness_median_z")`, thereby failing to replicate the findings by Ert et al. (2016). Additionally, and also inconsistent with the findings of Ert et al. (2016), we did find a significant effect of host attractiveness, `report("M2", term = "attractiveness_median_z")`

Results (without code)

We did not find an effect of host trustworthiness, *b* = -0.0036, *SE* = 0.0047, *t*(1010) = -0.76, *p* = .45, 95% CI [-0.013, 0.0057], thereby failing to replicate the findings by Ert et al. (2016). Additionally, and also inconsistent with the findings of Ert et al. (2016), we did find a significant effect of host attractiveness, *b* = 0.011, *SE* = 0.0048, *t*(1010) = 2.34, *p* = .020, 95% CI [0.0018, 0.021]

- R-only
- A bit of a learning curve
- Does not yet support most analyses

I think we should have an explicit step in our data analysis workflow that is focused on the output of statistical analyses.

With the aim of sharing **all** statistical output.

And preventing errors in statistical reporting.

Lakens, D. (2015). Checking your stats, and some errors we make. Retrieved from daniellakens.blogspot.nl/2015/10/checking-your-stats-and-some-errors-we.html

Nuijten, M. B., Assen, M. van, Hartgerink, C., Epskamp, S., & Wicherts, J. (2017). The validity of the tool "statcheck" in discovering statistical reporting inconsistencies. https://doi.org/10.17605/osf.io/tcxaj

Nuijten, M. B., Borghuis, J., Veldkamp, C. L. S., Dominguez-Alvarez, L., Assen, M. A. L. M. V., & Wicherts, J. M. (2017). Journal data sharing policies and statistical reporting inconsistencies in psychology. *Collabra: Psychology*, *3*(1), 31. https://doi.org/10.1525/collabra.102

Nuijten, M. B., Hartgerink, C. H. J., Assen, M. A. L. M. van, Epskamp, S., & Wicherts, J. M. (2015). The prevalence of statistical reporting errors in psychology (1985–2013). *Behavior Research Methods*, *48*(4), 1205–1226. https://doi.org/10.3758/s13428-015-0664-2

Wickham, H. (2014). Tidy data. *Journal of Statistical Software*, *59*(10). https://doi.org/10.18637/jss.v059.i10