Just one of those projects we do to try out R code, visualization techniques (mostly ggplot2 code), and efficient collaborative reporting via RMarkdown.

But then Lukas couldn’t sleep and instead decided to work on the project a little, creating a particular plot he thought was interesting. He posted the plot on /r/dataisbeautiful, a subreddit for data visualizations, and apparently people were more interested than anticipated.

The issue with this plot going slightly viral is that many people didn’t bother to check the full analysis over at the project page, which led to misinterpretations of what a histogram is, false assumptions about the structure of our data and, of course, people being really defensive about their junk. Well, who’da thunk.

Bottom line is: A single plot cannot represent everything about an analysis.

Read the stuff around the plot.

In that regard, we learned that posting individual plots on reddit, assuming people would actually bother to read the linked analysis, was naive. Nobody reads things on the internet. That would be silly.

At this point we’d like to mention that at least someone from *I Fucking Love Science* actually bothered to send us a few questions instead of writing a misinterpreting post about the analysis. And here it is.

Well, so much for that. At least we now have had our 15 minutes of fame.

It was fun.

And it felt slightly wrong.

- Indy100, from The Independent
- This Latvian site, apparently glad about the data for their country, if Google Translate can be believed.
- The Greek Huffington Post, which so far did the best job describing what we did, as far as we can tell.
- breakingnews.ie, which also got the gist of it.
- metro.co.uk, which didn’t believe we’re not obsessed with penises.

As part of our latest initiative to evaluate the local population’s general fitness for the apocalypse, we are running a comprehensive survey of skills relevant to a zombie apocalypse.

Since yesterday, October 17, a first pilot study on the topic has been live on our new study platform, `survey.tadaa-data.de`.

That is also where you will find the link to the current study, or, because most people only skim this short text anyway and look for the big colorful button, here is the direct link once more, in large:

The results of the pilot study will of course be published in blog post form once the study has concluded.

The goal is simple: There are certain things we tend to always do one after another, like performing effect size calculations after a t-Test.

The convenience tadaatoolbox aims to provide is exactly this: Do the usual stuff and leave me alone.

As an example, take one of the first functions I wrote for the package, `tadaa_t.test`:

```
tadaa_t.test(data = ngo, response = stunzahl, group = geschl)
```

Männlich | Weiblich | t | p | df | conf_low | conf_high | method | alternative | d | power |
---|---|---|---|---|---|---|---|---|---|---|
33.616 | 33.664 | -0.108 | 0.91 | 248 | -0.92 | 0.824 | Two Sample t-test | two.sided | 0.014 | 0.051 |

What happened here?

Let’s take a look step by step:

- We took the values you provided: A dataset (the infamous `ngo` data), the *response* or dependent variable and the *group* or independent variable
- We performed a regular ol’ t-Test via the common R function `t.test`
- We calculated the effect size using an internal function that’s also available in the package, see `?effect_size_t`
- We calculated the power of the test via the `pwr` package
- We tidied it up a bit using the `pixiedust` package (no, seriously) to make everything a little nicer
- And finally, we returned a neat table to the console.
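To make those steps a bit more tangible, here is a rough base R sketch of the same pipeline. It is not the package's actual implementation: `t_test_sketch` is a made-up name, the built-in `ToothGrowth` data stands in for `ngo`, equal variances are simply assumed, and `stats::power.t.test` stands in for the `pwr` package.

```r
# Hypothetical helper for illustration only; not part of tadaatoolbox.
t_test_sketch <- function(data, response, group) {
  g <- droplevels(factor(data[[group]]))
  stopifnot(nlevels(g) == 2)
  x <- data[[response]][g == levels(g)[1]]
  y <- data[[response]][g == levels(g)[2]]

  # Step 1: the regular t-test (equal variances assumed in this sketch)
  tt <- t.test(x, y, var.equal = TRUE)

  # Step 2: Cohen's d based on the pooled standard deviation
  n1 <- length(x)
  n2 <- length(y)
  sd_pooled <- sqrt(((n1 - 1) * var(x) + (n2 - 1) * var(y)) / (n1 + n2 - 2))
  d <- (mean(x) - mean(y)) / sd_pooled

  # Step 3: power of the test; power.t.test() expects one n per group,
  # so the smaller group size is used as a conservative choice
  pw <- power.t.test(n = min(n1, n2), delta = abs(d), sd = 1,
                     sig.level = 0.05)$power

  # Step 4: everything in one tidy row
  data.frame(
    mean_1 = mean(x),
    mean_2 = mean(y),
    t      = unname(tt$statistic),
    p      = tt$p.value,
    df     = unname(tt$parameter),
    d      = d,
    power  = pw
  )
}

res <- t_test_sketch(ToothGrowth, response = "len", group = "supp")
print(round(res, 3))
```

The pretty-printing step is left out here; the sketch just returns a one-row `data.frame`.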

Notable bonus features:

- Remember how we didn’t bother to check for heteroskedasticity / homogeneity of variance? That’s because the function does that under the hood and uses the appropriate setting for `var.equal`. MIND = BLOWN
- The print method is customizable, and if you use the function in an RMarkdown document, you can specify `print = "markdown"` to return a markdown table so knitr can render it to a neat table, just like in this blogpost
- The power calculation notices which type of t-Test is called and calculates power for the specific test
- The effect size is also aware of the test type, and calculated via the bonus feature function `effect_size_t`
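The `var.equal` trick from the first bullet could look roughly like the sketch below. Which variance test the package actually runs is not stated here, so base R's `var.test` (an F test) is used as a stand-in, and `t_test_auto` is a hypothetical name.

```r
# Hypothetical helper: pick var.equal based on a preliminary variance test.
# var.test() is only a stand-in; the package may well use a different test.
t_test_auto <- function(x, y, alpha = 0.05) {
  equal_var <- var.test(x, y)$p.value >= alpha
  t.test(x, y, var.equal = equal_var)
}

set.seed(42)
a <- rnorm(40, mean = 10, sd = 1)
b <- rnorm(40, mean = 10, sd = 5)  # clearly larger spread than a

# The method label tells you which variant was chosen
print(t_test_auto(a, b)$method)
print(t_test_auto(a, a + 1)$method)
```

With clearly unequal variances the call falls back to the Welch test, otherwise it runs the classic pooled-variance version.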

Pretty neat, hm? Yeah.

Next up in the convenience department we have our old friend, the *ANOVA*.

We’re not digging too deep into the post-hoc area as we did with the t-Test, and we also don’t bother testing for the prerequisites, but we do at least give you effect sizes.

```
tadaa_aov(stunzahl ~ geschl, data = ngo)
```

term | df | sumsq | meansq | F | p.value | part.eta.sq | cohens.f |
---|---|---|---|---|---|---|---|
geschl | 1 | 0.144 | 0.144 | 0.012 | 0.91 | 0 | 0.007 |
Residuals | 248 | 3037.456 | 12.248 | NA | NA | NA | NA |

Or for two predictors:

```
tadaa_aov(stunzahl ~ geschl * jahrgang, data = ngo)
```

term | df | sumsq | meansq | F | p.value | part.eta.sq | cohens.f |
---|---|---|---|---|---|---|---|
geschl | 1 | 0.144 | 0.144 | 0.015 | 0.9 | 0 | 0.008 |
jahrgang | 2 | 536.28 | 268.14 | 27.203 | < 0.001 | 0.182 | 0.472 |
geschl:jahrgang | 2 | 96.056 | 48.028 | 4.872 | < 0.01 | 0.038 | 0.2 |
Residuals | 244 | 2405.12 | 9.857 | NA | NA | NA | NA |

Notice how we give you both the partial eta^2 and Cohen’s f. The latter is used for power calculations in software like *G\*Power* as well as the `pwr` package in R, while the former is generally used as an interpretable effect size, at least according to my stats class.
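Both effect sizes can be derived from the sums of squares in a plain `aov` table. The following is a sketch under the usual textbook definitions (partial eta^2 = SS_effect / (SS_effect + SS_residual), f = sqrt(eta^2 / (1 - eta^2))), using the built-in `ToothGrowth` data as a stand-in for `ngo`. It is not necessarily how the package computes them internally.

```r
# Two-way ANOVA on a built-in dataset (stand-in for the ngo data)
tg  <- transform(ToothGrowth, dose = factor(dose))
fit <- aov(len ~ supp * dose, data = tg)
tab <- summary(fit)[[1]]

# Row names of the summary table are padded with spaces, hence trimws()
terms     <- trimws(rownames(tab))
ss        <- tab[["Sum Sq"]]
ss_resid  <- ss[terms == "Residuals"]
ss_effect <- ss[terms != "Residuals"]

# Partial eta squared per term, and Cohen's f derived from it
part_eta_sq <- ss_effect / (ss_effect + ss_resid)
cohens_f    <- sqrt(part_eta_sq / (1 - part_eta_sq))

effects <- data.frame(
  term        = terms[terms != "Residuals"],
  part_eta_sq = round(part_eta_sq, 3),
  cohens_f    = round(cohens_f, 3)
)
print(effects)
```

As a sanity check against the table above: for `jahrgang`, sqrt(0.182 / (1 - 0.182)) is roughly 0.472, which matches the reported `cohens.f`.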

And lastly, we give you a simple template to create interaction plots with `tadaa_int`.

Building your own interaction plots with `ggplot2` is kind of annoying, since you have to group/summarize your data beforehand and then write two relatively complex ggplots. `tadaa_int` does the work for you, and if you choose `grid = FALSE`, it returns a list of two `ggplot2` objects which you can save and modify as you wish with custom `scale_*` or `theme` components. If you choose `grid = TRUE`, the plots are arranged horizontally and printed as one image, which should probably be sufficient for most use cases, especially in interactive use for exploratory purposes.

```
tadaa_int(data = ngo, response = stunzahl, group1 = jahrgang, group2 = geschl, grid = TRUE)
```
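For comparison, the manual route mentioned above looks roughly like this. It is only a sketch with the built-in `ToothGrowth` data standing in for `ngo`: the summarizing step uses base R's `aggregate`, and the plot is only built if `ggplot2` happens to be installed.

```r
# Step 1: summarize the response for every combination of the two factors
tg <- transform(ToothGrowth, dose = factor(dose))
cell_means <- aggregate(len ~ supp + dose, data = tg, FUN = mean)
print(cell_means)

# Step 2: one of the two interaction plots; the second one would simply
# swap the roles of `dose` and `supp`
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(cell_means,
              aes(x = dose, y = len, colour = supp, group = supp)) +
    geom_line() +
    geom_point(size = 3) +
    labs(y = "Mean of len")
  # print(p) draws the plot
}
```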

I’m considering exposing more arguments to the user, e.g. the arrangement (horizontal vs. vertical), or the `shape` of the `geom_point` used for the `response` means, but if you’re into that much customization, you’re probably more than comfortable with building the plot yourself anyway.

An additional plotting bonus is Tobi’s `tadaa_heatmap`, a simple template for heatmaps:

```
tadaa_heatmap(ngo, stunzahl, leistung, jahrgang)
```

In the “minor conveniences” department, we have a bunch of wrappers for common statistics. The statistics themselves are usually calculated by base R or the packages `vcd` or `ryouready`, but they’re tweaked so they’re comfortable for use with `dplyr` and other tidy data functions in that they only ever return a single (usually numeric) value, which makes it easy to use them in `summarize` or `mutate`.

The functions are listed below:

- `modus`: A simple function to extract the mode of a frequency table.
  - This *will return a character string* denoting multiple values, if applicable!
- `nom_chisqu`: Simple wrapper for `chisq.test` that produces a single value.
- `nom_phi`: Simple wrapper for `vcd::assocstats` to extract phi.
- `nom_v`: Simple wrapper for `vcd::assocstats` to extract Cramer’s V.
- `nom_c`: Simple wrapper for `vcd::assocstats` to extract the contingency coefficient c.
- `nom_lambda`: Simple wrapper for `ryouready::nom.lambda` to extract appropriate lambda.
- `ord_gamma`: Simple wrapper for `ryouready::ord.gamma`.
- `ord_somers_d`: Simple wrapper for `ryouready::ord.somers.d`.
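To illustrate the "single value" idea behind these wrappers, here is a minimal base R sketch of a Cramer's V function. It is not the package's `nom_v` (which wraps `vcd::assocstats`); `cramers_v_sketch` is a hypothetical name, and it computes V directly from the chi^2 statistic as sqrt(chi^2 / (n * (min(rows, cols) - 1))).

```r
# Hypothetical stand-in for a nom_v-style wrapper: always returns one number
cramers_v_sketch <- function(x, y) {
  tab  <- table(x, y)
  # Warnings about small expected counts are silenced for this illustration
  chi2 <- suppressWarnings(chisq.test(tab, correct = FALSE))$statistic
  sqrt(unname(chi2) / (sum(tab) * (min(dim(tab)) - 1)))
}

# A single value for the whole dataset ...
v_all <- cramers_v_sketch(mtcars$cyl, mtcars$am)
print(v_all)

# ... or one value per subgroup, which is the kind of thing a
# summarize() call over grouped data would do
v_by_vs <- sapply(split(mtcars, mtcars$vs),
                  function(d) cramers_v_sketch(d$cyl, d$am))
print(v_by_vs)
```

Because the function returns a bare number instead of a test object, it slots into grouped summaries without any unpacking.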

A side effect of having written all these wrappers is that we can now also provide easy functions to calculate all the stats relevant for a specific scale (nominal & ordinal):

```
tadaa_nom(ngo$abschalt, ngo$geschl)
```

Chi^2 | Cramer’s V | c | Lambda (x dep.) | Lambda (y dep.) | Lambda (sym.) |
---|---|---|---|---|---|
5.35 | 0.15 | 0.15 | 0.03 | 0.15 | 0.09 |

```
tadaa_ord(ngo$abschalt, ngo$geschl)
```

Gamma | Somers’ D (x dep.) | Somers’ D (y dep.) | Somers’ D (sym.) |
---|---|---|---|
-0.29 | -0.15 | -0.15 | -0.15 |

Like previous `tadaa_*`-functions, these take a `print` argument so you can easily include them in RMarkdown documents by setting `print = "markdown"`.

Please note that I’m aware it’s suboptimal to just calculate all the stats, presumably to pick and choose which fits your needs best, but keep in mind that the intention of this package is to make teaching easier and provide convenient tools to communicate stats, so yes, if you’re currently working on a *real science* thing, this is all just fun and games.

And at last, there are a couple of little functions I wrote primarily because I found myself writing the same few lines multiple times and thought “there should be an easier way to do this”… which is, coincidentally, pretty much the story behind everything in this package. Well.

- `generate_recodes`: To produce recode assignments for `car::recode` for evenly sequenced clusters.
- `interval_labels`: To produce labels for clusters created by `cut`.
- `tadaa_likertize`: Reduce a range of values to `n` classes (methodologically wonky).
- `delet_na`: Customizable way to drop `NA` observations from a dataset.
- `labels_to_factor`: If you mix and match `sjPlot`, `haven` and `ggplot2`, you might need to translate `labels` to `factors`, which is precisely what this function does. Drop in `data.frame` with `label`, receive `data.frame` with `factors`.
- `drop_labels`: If you subset a `labelled` dataset, you might end up with labels that have no values with them. This function will drop the now unused `labels`.
- `pval_string`: Shamelessly adapted from `pixiedust::pvalString`, this will format a p-value as a character string in common `p < 0.001` notation and so on. The difference from the `pixiedust` version is that this function will also print `p < 0.05`.
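As an illustration of the kind of formatting `pval_string` does, here is a small sketch. The exact thresholds and rounding of the real function are assumptions on my part, and `pval_sketch` is a hypothetical name.

```r
# Hypothetical p-value formatter; thresholds and rounding are assumptions
pval_sketch <- function(p) {
  stopifnot(is.numeric(p), all(p >= 0 & p <= 1, na.rm = TRUE))
  # Default: plain value with two decimals
  out <- formatC(p, format = "f", digits = 2)
  # Overwrite from the largest to the smallest threshold so that the
  # strictest applicable label wins; which() skips missing values
  out[which(p < 0.05)]  <- "< 0.05"
  out[which(p < 0.01)]  <- "< 0.01"
  out[which(p < 0.001)] <- "< 0.001"
  out
}

print(pval_sketch(c(0.0004, 0.003, 0.03, 0.91)))
```

The function is vectorized, so it can be applied to a whole column of p-values at once.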

Also, since I really like the `rmdformats::readthedown` RMarkdown template, I made a few tweaks to a `ggplot2` theme to match the template. You can use it by adding `+ theme_readthedown()` to your ggplots.

It’s a little brighter and lets you choose which axis (x, y, both) to emphasize visually.

```
tadaa_int(ngo, stunzahl, jahrgang, geschl, grid = FALSE)[[1]] +
  theme_readthedown(axis_emph = "y")
```

For everything I missed, there’s our vignette.

This is it. The upcoming version (`0.10`) is going to be ready for CRAN soon, while `0.9` is already available.

Try it and submit issues and feature requests as much as you want.

The next neat feature is probably going to be a `tadaa_normtest` function that gives you an easy way to perform tests for normality over subgroups.

¯\_(ツ)_/¯

As of last night, v0.10.0 is live on CRAN, and it brought the promised `tadaa_normtest` with options for our favorite tests for normality: Anderson-Darling, Shapiro-Wilk, Pearson’s χ² and even that Kolmogorov-Smirnov one you shouldn’t really use.
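The basic idea of testing normality over subgroups can be sketched in a few lines of base R. This is not the interface of `tadaa_normtest`: `normtest_sketch` is a hypothetical name, it only covers Shapiro-Wilk, and the built-in `ToothGrowth` data stands in for your own.

```r
# Hypothetical helper: Shapiro-Wilk test of `response` within each `group`
normtest_sketch <- function(data, response, group) {
  parts <- split(data[[response]], data[[group]])
  tests <- lapply(parts, shapiro.test)
  data.frame(
    group   = names(parts),
    W       = vapply(tests, function(tst) unname(tst$statistic), numeric(1)),
    p.value = vapply(tests, function(tst) tst$p.value, numeric(1)),
    row.names = NULL
  )
}

nt <- normtest_sketch(ToothGrowth, response = "len", group = "supp")
print(nt)
```

Swapping in another test should only require replacing `shapiro.test`, for example with `nortest::ad.test` for Anderson-Darling if the `nortest` package is installed.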

See the full release notes on GitHub.