Jealousy List 3

This is the third iteration of my Jealousy List, which is a list of articles so good I wish I had been the one to write them. My first two lists were posted about a year ago (see the list of lists here) and this one is long overdue, so I apologize for some of the posts being a little less recent. Regardless, here is a list of posts from the past few weeks and months that I found exceptional in some way: entertaining, informative, or just plain cool.

  1. Saskia Freytag. “Workshop: Dimension reduction with R”.

    You may have heard of PCA (Principal Component Analysis) as a way to reduce a bunch of variables down to a more manageable number. As it turns out, this is just one way to do a dimension reduction on your data. Freytag’s workshop does a really nice job of explaining some of the different dimension reduction techniques that are available, including helpful plots for the visual learners out there. It also goes into detail about the pros and cons of each method, and gives some sample R code showing you how to run the analysis yourself. I wish there were more workshops like this floating around! Plus, it uses data from a bunch of cereals, and I’m pretty sure I’ve used that dataset before in my workshops…


  2. Austin Wehrwein. “Burden of roof: Revisiting housing costs with tidycensus”.

    This is a short blog post, but I really like it because it succinctly shows how to quickly produce a really compelling story with some data and a nice visual. It uses the tidycensus package by Kyle Walker to extract some information about median income and housing prices per US county, and then creates a stunning map to display the data. I also learned that the county I live in now, Clarke County, Georgia, was among the 25 worst counties in the country in this regard. I guess that’s where all my money is going!


  3. Garrick Aden-Buie. “Custom Discrete Color Scales for ggplot2”.

    I’ve become a bit of a color snob, so I appreciate a good post on colors in data visualization. This one is less about the colors themselves and more about how to implement your own custom color scheme in ggplot2 more easily. Aden-Buie even goes so far as to provide helpful tips for when you compile all these custom commands into an R package. I’ll definitely be using this whenever I get my package off the ground.


  4. Rafael Irizarry. “Dynamite Plots must Die”. From Simply Statistics.

    Data visualization must have been on my mind for a while now, because five months ago I bookmarked this blog post so that it’d make it onto my next Jealousy List. This is a nice critique of dynamite plots, which are bar plots with those little error bars at the top. In short, they obscure the underlying distribution and can be replaced by a very small table. Fortunately, the complaint comes with a few recommendations for alternative visuals.


  5. Julia Silge. “Introducing Tidylo”.

    There will probably always be at least one Julia Silge post on my Jealousy Lists. This one introduces a new R package, tidylo, which calculates weighted log odds within the framework of the tidyverse. The post itself is, as always, a fun read and there are some great visuals. This’ll make it really easy to choose a baby name characteristic of like the 1920s for my next kid or something.


So that’s it for my long-overdue Jealousy List: statistical procedures, succinct tutorials, color, data visualization, and more statistics. Again, it’s a decent representation of what I’ve been reading recently.