Using R to analyze the redacted Mueller Report

Updated 4/22/2019 @ 11:08 CST

Since the beginning of the year, I have become increasingly active in the #rstats community to learn more about R programming and to share my excitement for computational media linguistics. One of my favorite aspects of this community is that there are tons of people who are equally excited about doing comp ling/NLP/text-as-data work.

When the redacted Mueller Report came out, I was not surprised to see many people get to work analyzing the text however they could. Below, I’ve curated a handful of tweets analyzing the Mueller Report using R. These tweets were selected by the very informal process of digging through my feed and searching for #rstats tweets about the Mueller Report (top and latest). If you have an analysis of the Mueller Report using R, please let me know and I’ll add it to this collection!

I am somewhat hesitant to share all the tweets in this way. I have often critiqued “string of tweets” posts as Twitter curation pretending to be journalism (it is also a major reason why Internet Research Agency (IRA) tweets ended up in news stories, a point that our UW-Madison Disinformation Research Group makes in our report). But I think this is the best way to show all the awesome analyses done on the Mueller Report using R (at least, those shared on Twitter). So let’s get to it!

Collecting the Data

Garrick Aden-Buie has done a lot of great work on the Mueller Report. In a blog post, for example, he talks about using {pdftools} and Emil Hvitfeldt’s {ggpage} to “highlight the most-often referenced people in the report” (you can check out his full blog post here).

Because these individuals provided code and scraped material of the Mueller Report, even more R programmers and data scientists were able to do text and linguistic analysis on the data (including me)! In addition to my list, Stas Kolenikov has a great ongoing Twitter thread of text analyses of the Mueller Report.

Based on the tweets, {tidytext} seemed to be the most popular package used (although there are others, including {cleanNLP} and {textnets}). Below are some of these analyses.
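
For readers who have not used {tidytext} before, here is a minimal sketch of the kind of word-frequency count many of these analyses start from. It is not taken from any of the tweets. The two sample "pages" are made-up stand-ins for the report text, and the file name in the comment is hypothetical; with a local copy of the PDF, the page text could instead come from pdftools::pdf_text().

```r
library(dplyr)
library(tidytext)

# Made-up stand-in for the report text: one string per page.
# With a local copy of the PDF (hypothetical file name), this could be:
# pages <- pdftools::pdf_text("mueller_report.pdf")
pages <- c(
  "The investigation examined contacts with the campaign.",
  "The campaign received the report about the investigation."
)

report <- tibble(page = seq_along(pages), text = pages)

word_counts <- report %>%
  unnest_tokens(word, text) %>%           # one lowercased word per row
  anti_join(stop_words, by = "word") %>%  # drop common English stop words
  count(word, sort = TRUE)                # most frequent words first

print(word_counts)
```

Keeping the page number as a column means the same table can later be grouped by page, which is handy for page-level plots or sentiment scores.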

Analyzing the Data

My code for this analysis can be found here.

Christopher Yee and Christopher Lortie both have a couple of other great tweets analyzing the Mueller Report, so I encourage you to check their feeds! Yee also has his code available on his blog.

Here’s an analysis by Chris Bail using his textnets package.