This is the third iteration of my Jealousy List, which is a list of articles so good I wish I had been the one to write them. My first two lists were posted about a year ago (see the list of lists here) and this one is long overdue, so I apologize for some of the posts being a little less recent. Regardless, here is a list of posts from the past few weeks and months that I found exceptional in some way: entertaining, informative, or just plain cool.
Saskia Freytag. “Workshop: Dimension reduction with R”.
So I wrote a tutorial on dimension reductions in #rstats. It has actually turned out to be fairly comprehensive. It uses a fun example dataset on cereals (🎉 - not looking for at iris) I would love some feedback: https://t.co/VsiHX2KYdT
— Saskia Freytag (@trashystats) August 16, 2019
You may have heard of PCA (Principal Component Analysis) as a way to reduce a bunch of variables down to a more manageable number. As it turns out, this is just one way to do a dimension reduction on your data. Freytag’s workshop does a really nice job of explaining several other dimension reduction techniques, with helpful plots for the visual learners. It also goes into detail about the pros and cons of each method, and gives some sample R code showing you how to run the analysis yourself. I wish there were more workshops like this floating around! Plus, it uses data from a bunch of cereals, and I’m pretty sure I’ve used that dataset before in my workshops…
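For anyone who has not run a PCA before, here is a minimal sketch in base R. It is not taken from Freytag’s workshop: the built-in `mtcars` data is an assumed stand-in for the cereal dataset, and `prcomp()` is only one of several ways to do this.

```r
# Minimal PCA sketch with base R.
# mtcars is a stand-in dataset (an assumption), not the workshop's cereal data.
pca <- prcomp(mtcars, center = TRUE, scale. = TRUE)

# Proportion of total variance explained by each principal component
var_explained <- pca$sdev^2 / sum(pca$sdev^2)
print(round(var_explained, 2))

# Scores on the first two components: 11 original variables reduced to 2
scores <- pca$x[, 1:2]
print(head(scores))
```

Scaling the variables first (`scale. = TRUE`) keeps columns measured in large units from dominating the components, which is usually what you want when the variables are on different scales.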
Austin Wehrwein. “Burden of roof: Revisiting housing costs with tidycensus”.
In this episode I use the A+ tidycensus #rstats package to examine housing costs along with income data AND stretch wordplay as far as it will go. Burden of roof: revisiting housing costs with tidycensus. https://t.co/gDVlOb9N9H pic.twitter.com/i3gxzz6C8G
— Austin Wehrwein (@awhstin) August 2, 2019
This is a short blog post, but I really like it because it succinctly shows how to quickly produce a really compelling story with some data and a nice visual. It uses the tidycensus package by Kyle Walker to extract some information about median income and housing prices per US county, and then creates a stunning map to display the data. I also learned that the county I live in now, Clarke County, Georgia, was among the 25 worst counties in the country in this regard. I guess that’s where all my money is going!
Garrick Aden-Buie. “Custom Discrete Color Scales for ggplot2”.
I wrote up a short blog post on creating custom ggplot2 color scales. I focused on discrete color scales to demo a setup that makes binary colors easy, but I hope the post is helpful if you're working on a #ggplot2 theme for your org or brand. #rstats https://t.co/jQDxE61K3W
— Garrick Aden-Buie (@grrrck) August 16, 2019
I’ve become a bit of a color snob, so I appreciate a good post on colors in data visualization. This one is less about the colors themselves, and more about how to more easily implement your own custom color scheme in ggplot2. Aden-Buie even goes so far as to provide helpful tips for when you compile all these custom commands into an R package. I’ll definitely be using this whenever I get my package off the ground.
Rafael Irizarry. “Dynamite Plots must Die”. From Simply Statistics.
Open letter to journal editors: dynamite plots must die. Dynamite plots, also known as bar and line graphs, hide important information. Editors should require authors to show readers the data and avoid these plots. https://t.co/0GNKEIUCJL pic.twitter.com/OS9ytEFRZN
— Rafael Irizarry (@rafalab) February 22, 2019
Data visualization must have been on my mind for a while now, because five months ago I bookmarked this blog post so that it’d make it on my next Jealousy List. This is a nice critique of dynamite plots, which are bar plots with those little error bars at the top. In short, they obscure the underlying distribution and can be replaced by a very small table. Fortunately, the complaint comes with a few recommendations for alternative visuals.
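To make the point about hidden distributions concrete, here is a small sketch in base R. The numbers are made up for illustration and do not come from Irizarry’s post: two groups share the same mean and standard deviation, so their bars and error bars would look identical, even though the raw values are quite different.

```r
# Hypothetical data: two groups constructed to share a mean and an SD
group_a <- c(2, 4, 5, 6, 8)   # spread evenly around 5
group_b <- c(4, 4, 4, 4, 9)   # four identical values plus one outlier

# The summaries a dynamite plot would draw (bar height and error bar)
summary_table <- data.frame(
  group = c("A", "B"),
  mean  = c(mean(group_a), mean(group_b)),
  sd    = c(sd(group_a), sd(group_b))
)
print(summary_table)

# Showing every point reveals what the bars hide
stripchart(list(A = group_a, B = group_b), method = "stack", pch = 16)
```

The summary table has identical rows for both groups, while the strip chart makes the outlier in group B obvious.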
Julia Silge. “Introducing Tidylo”.
There will probably always be at least one Julia Silge post on my Jealousy Lists. This one introduces a new R package, tidylo, which calculates weighted log odds within the framework of the tidyverse. The post itself is, as always, a fun read and there are some great visuals. This’ll make it really easy to choose a baby name characteristic of, like, the 1920s for my next kid or something.
So that’s it for my long-overdue Jealousy List: statistical procedures, succinct tutorials, color, data visualization, and more statistics. Again, a decent representation of what I’ve been reading recently.