This handout accompanies the workshop given on September 4, 2019 at UGA’s DigiLab in the Main Library. There is some overlap with a blog post I did a couple years ago, but this is the first time this material has been presented in a workshop format. There is also a supplemental handout, which goes into more detail about ggplot2::theme. As always, please visit joeystanley.com/r for the latest materials.


1 Unnecessary background story

A few years ago, I found myself developing a look and feel that I liked for my term papers, PowerPoint slides, conference posters, and my website (including this handout!). I found a font and color scheme that I thought looked nice, and because it was unique, I thought it might turn into a branding of some sort.

The problem was that the visuals I made in ggplot2 didn’t match at all. They didn’t use the same colors or font or anything like that. Even after I made some of the changes I’ve talked about in previous workshops, it was quite obvious that I had just copied and pasted the plots right into the middle of my presentation or website, and they didn’t match. A plot, for example, might have looked like this:

You may find yourself in the same situation, whether it’s for academic output or because you want your plots to match your business’ colors and general look.

Because ggplot2 has nearly infinite flexibility, you can customize your plot so that it matches whatever PowerPoint theme you have, for example. But the 80-20 rule definitely applies here: 20% of my ggplot code produced the majority of the plot, and 80% of the code was all the nitty-gritty changes I made to get it to match my aesthetic.

kelloggs_plot + 
    theme_bw(base_size = 12, base_family = "Iowan Old Style") + 
    theme(panel.background = element_blank(), 
          plot.background = element_rect(fill = "gray99"), 
          legend.background = element_rect(fill = "transparent"), 
          legend.key = element_rect(fill = "transparent"), 
          legend.justification = c(1, 1), 
          legend.position = c(1, 1), 
          plot.title = element_text(hjust = 0.5),
          plot.subtitle = element_text(hjust = 0.5))

Assuming the plot is rendering for you the way it is on my screen, the color and font should now match this handout and create a much more seamless integration. When it’s just one plot, it’s not a big deal, but when I found myself copying and pasting that code over and over, I decided I had to find a solution.

I did some digging and fortunately there is a solution: create a custom theme in ggplot2! I can wrap all these functions up into a single new function and tack that onto the end of my plots. Now, my code is much shorter, and I can feel confident that my plots are consistent every time. Furthermore, if I want to change the look of my plots, I just modify that one function and the changes propagate to all the plots when I rerun them. In other words, that large block of code above would be reduced down to just this:

kelloggs_plot + 
    theme_joey()
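
One way that wrapping could look, as a minimal sketch: the function below simply bundles the theme calls from the longer block above. The argument names and their defaults are assumptions made for illustration, and the real theme_joey() may well be defined differently.

```r
library(ggplot2)

# Hypothetical definition: wraps the theme_bw() + theme() calls from the
# longer block above so they can be reused with a single call.
theme_joey <- function(base_size = 12, base_family = "Iowan Old Style") {
    theme_bw(base_size = base_size, base_family = base_family) +
        theme(panel.background = element_blank(),
              plot.background = element_rect(fill = "gray99"),
              legend.background = element_rect(fill = "transparent"),
              legend.key = element_rect(fill = "transparent"),
              legend.justification = c(1, 1),
              # Kept as in the block above. ggplot2 3.5.0 and later prefer
              # legend.position = "inside" with legend.position.inside = c(1, 1).
              legend.position = c(1, 1),
              plot.title = element_text(hjust = 0.5),
              plot.subtitle = element_text(hjust = 0.5))
}
```

Because the function returns an ordinary theme object, it can be added to any plot with +, and editing the function body changes every plot that uses it the next time the code is run.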

I hadn’t seen any other tutorial out there, so I decided to write a quick blog post on the topic, and it has consistently been the most visited page on my website ever since. By a long shot. So there seems to be a need for a good explanation of how to create a custom theme in ggplot2.

That tutorial is almost three years old and I think it could use a little bit of a facelift. So this workshop serves as a new and improved version of that.

2 Data prep and sample plots

First off, let’s load ggplot2 before we get carried away.

library(ggplot2)

For this workshop, I’m going to use similar plots to the ones I used in previous workshops. I’ve got four different plots because when creating a theme, it’s good to try it out on several kinds of plots to make sure they all integrate well.

2.1 Amount of sugar per McDonald’s category

For some of the workshop, I’ll work with the McDonald’s menu items dataset, which you can access from my website. To simplify things a little bit, I’ll take a subset: just four of the nine categories. To make that subset, I’ll use the subset function.

menu <- read.csv("http://joeystanley.com/downloads/menu.csv")
menu_subset <- subset(menu, Category %in% c("Smoothies & Shakes", "Desserts", "Beverages", "Snacks & Sides"))
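
If subset combined with %in% is unfamiliar, here is a small sketch of what it does. The toy_menu data frame and its values are made up purely for illustration and are not taken from the real menu file.

```r
# Made-up stand-in for the menu data (illustration only)
toy_menu <- data.frame(
    Category = c("Desserts", "Breakfast", "Beverages", "Salads"),
    Sugars = c(45, 3, 30, 4),
    stringsAsFactors = FALSE
)

# Categories to keep
keep <- c("Desserts", "Beverages")

# Keep only the rows whose Category appears in `keep`
toy_subset <- subset(toy_menu, Category %in% keep)

# The same filter written with plain bracket indexing
toy_subset_base <- toy_menu[toy_menu$Category %in% keep, ]

toy_subset
```

Either form works here; subset just saves retyping the data frame name inside the condition.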

The default plot will just be the distribution of the amount of sugar in each of the four remaining categories. For the color, I’ll use one of Paul Tol’s color schemes, which I access using the package ggthemes.

m <- ggplot(menu_subset, aes(Category, Sugars, fill = Category)) +
    geom_boxplot(size = 0.75) +
    geom_jitter(color = "gray15") + 
    ggthemes::scale_fill_ptol()

So now, to plot it, all I need to do is call m.

m

Conveniently for us, if I want to make changes to the plot, I can just add additional lines of ggplot2 code to m and it’ll work out like normal. In other words…

m + ggtitle("A Default Plot")

…is shorthand for…

ggplot(menu_subset, aes(Category, Sugars, fill = Category)) +
    geom_boxplot(size = 0.75) +
    geom_jitter(color = "gray15") + 
    ggthemes::scale_fill_ptol() + 
    ggtitle("A Default Plot")

Both will produce this plot:

2.2 Stranger Things ratings

Another dataset I’ll use is the Stranger Things dataset that was used in a previous workshop. Like before, I’ll change the season column to a factor too. For the default plot here, s, I’ll make the same scatterplot of the number of votes the episode got on IMDB by the average rating. I’ll make the dots bigger so they’re easier to see. And for fun, I’ll use a Wes Anderson color palette inspired by the movie Fantastic Mr. Fox.

stranger <- read.csv("http://joeystanley.com/data/stranger.csv")
stranger$season <- factor(stranger$season)
s <- ggplot(stranger, aes(votes, rating, color = season)) + 
    geom_point(size = 4) + 
    scale_color_manual(values = wesanderson::wes_palette("FantasticFox1")) + 
    labs(title = "Stranger Things episodes",
         subtitle = "Average rating by number of votes on IMDB")
s
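
The factor conversion matters because scale_color_manual is a discrete scale: a numeric season column would be treated as continuous and would not work with it. The sketch below shows the conversion on a made-up data frame; the values and the two color names are placeholders, not the real Stranger Things data or the Wes Anderson palette.

```r
library(ggplot2)

# Made-up stand-in for the stranger data (illustration only)
toy <- data.frame(
    votes = c(100, 200, 300, 400),
    rating = c(8.1, 8.5, 8.8, 9.0),
    season = c(1, 1, 2, 2)
)

# Read in as numbers, season would map to a continuous color gradient
season_was_numeric <- is.numeric(toy$season)

# Converting to a factor makes each season its own discrete group
toy$season <- factor(toy$season)

# A manual color scale needs one color per factor level
toy_plot <- ggplot(toy, aes(votes, rating, color = season)) +
    geom_point(size = 4) +
    scale_color_manual(values = c("1" = "tomato", "2" = "steelblue"))
toy_plot
```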

2.3 Girlnames

The third is a dataset of the top 25 most common baby girl names in the US in 2017. I’ll read it in and prep it as I did in the intro workshop, except I’ll reverse the order of the girl names and make a horizontal bar chart. I’ll also keep only the top 15, just so there aren’t so many bars, and color it using Color Brewer.

girlnames <- read.delim("http://joeystanley.com/data/girlnames.txt", sep = "\t")
girlnames <- girlnames[1:15,]
girlnames$name <- factor(girlnames$name, levels = rev(girlnames$name))
g <- ggplot(girlnames, aes(name, n, fill = n)) + 
    geom_bar(stat = "identity") + 
    scale_fill_distiller(type = "seq", palette = "Purples") + 
    coord_flip() + 
    labs(title = "Top 15 baby girl names in the US in 2017",
         caption = "Data Source: Social Security data via the babynames package")
g
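
The reason for reversing the levels is that, after coord_flip, the first factor level is drawn at the bottom of the chart. Reversing puts the top-ranked name at the top. Here is a small sketch with placeholder names and made-up counts (not the real baby name data):

```r
library(ggplot2)

# Made-up stand-in, already sorted from most to least common
toy <- data.frame(
    name = c("rank1", "rank2", "rank3"),
    n = c(30, 20, 10),
    stringsAsFactors = FALSE
)

# Reverse the level order so that "rank1" is the last level,
# which coord_flip() then draws at the top of the chart
toy$name <- factor(toy$name, levels = rev(toy$name))
levels(toy$name)

toy_bars <- ggplot(toy, aes(name, n, fill = n)) +
    geom_bar(stat = "identity") +
    coord_flip()
toy_bars
```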

2.4 Cereal nutritional facts

Finally, the last dataset that we’ll use, which was also made available through Kaggle.com, contains nutritional information from about 80 different kinds of cereal. We’ll load it in, make a few changes as before, and make a faceted plot showing the amount of fiber per cereal and its rating, split up by brand.

cereal <- read.csv("http://joeystanley.com/data/cereal.csv")
cereal$mfr <- forcats::fct_recode(cereal$mfr,
                                  "Kellogg's" = "K",
                                  "General Mills" = "G",
                                  "Post" = "P",
                                  "Quaker Oats" = "Q",
                                  "Ralston Purina" = "R",
                                  "Nabisco" = "N")
c <- ggplot(cereal, aes(fiber, rating)) + 
    geom_text(aes(label = name),
              check_overlap = TRUE, vjust = "inward", hjust = "inward") + 
    facet_wrap(~mfr)
c

Note that within geom_text, I’ve added a few extra arguments (check_overlap, vjust, and hjust). I recently learned about these from Hadley Wickham’s book-in-progress, ggplot2 (version 3) which you can view here. The check_overlap argument ensures that labels don’t overlap (removing some if necessary), and setting vjust and hjust to "inward" makes sure the labels don’t spill off over the edges of the plot. Very handy.

3 Pre-existing ggplot2 themes

Before we get started on creating custom themes, it’s important to understand a little bit about the built-in themes that ggplot2 provides for you.

3.1 theme_gray()

By default, ggplot2 will display using a theme that is generally pretty good. The creators have taken some liberties to decide a few things, some founded upon principles of good data visualization and others that are more opinionated. In Hadley Wickham’s words:

The theme is designed to put the data forward while supporting comparisons, following the advice of (Tufte 2006; Brewer 1994; Carr 2002, 1994; Carr and Sun 1999). We can still see the gridlines to aid in the judgement of position (Cleveland 1993a), but they have little visual impact and we can easily ‘tune’ them out. The grey background gives the plot a similar typographic colour to the text, ensuring that the graphics fit in with the flow of a document without jumping out with a bright white background. Finally, the grey background creates a continuous field of colour which ensures that the plot is perceived as a single visual entity.

Using the default theme will produce beautiful plots. Here’s what that would look like:

m
s
g
c
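
Since theme_gray() is the default, adding it explicitly should change nothing, assuming the session’s default theme has not been altered. The sketch below uses a made-up toy data frame to keep it self-contained.

```r
library(ggplot2)

# Made-up data, just to have something to draw
toy <- data.frame(x = 1:5, y = c(2, 4, 3, 5, 4))

# No theme specified: ggplot2 falls back to its default
p_default <- ggplot(toy, aes(x, y)) + geom_point()

# Spelling the default out explicitly should render the same plot
p_gray <- p_default + theme_gray()

p_default
p_gray
```

The British spelling, theme_grey(), is also available and refers to the same theme.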

3.2 Other themes in ggplot2

Fortunately, if you don’t like the look of the gray background, you can easily switch to one of the other seven themes that ggplot2 has built in.

My go-to theme is called theme_bw() for just “black and white.” The biggest difference is that now the background is white instead of gray. But it also adds a thin black border around the whole thing and has faint grey grid lines instead of white ones.

m + theme_bw()
g + theme_bw()
s + theme_bw()
c + theme_bw()

The linedraw theme uses only pure black lines, rather than any gray ones, which gives it a slightly vintage look. The biggest change shows up in facets: the boxes at the top (called strips) are black.

m + theme_linedraw()
g + theme_linedraw()
s + theme_linedraw()
c + theme_linedraw()

The light theme is very similar to bw. The biggest difference is that the outside box is lighter. In facets, the strips are darker and the text is colored white.

m + theme_light()
g + theme_light()
s + theme_light()
c + theme_light()

The classic theme has no grid and the top and right parts of the box are gone too, just leaving the x and y axes. The facet strips are empty white boxes now.

m + theme_classic()
g + theme_classic()
s + theme_classic()
c + theme_classic()

If you really like the gray, you can go for the dark theme, which has a darker background and a darker gray for the grid lines. In the girl names plot, you can see that the lighter colors pop out more effectively, but the middle ones blend in with the background a little bit, which is something to keep in mind when you add color to your plots.

m + theme_dark()
g + theme_dark()
s + theme_dark()
c + theme_dark()

The minimal theme is even simpler than light. It doesn’t have the outside border but it does retain the inside grid. The strips are gone entirely, leaving just the text behind, which is a little confusing with this plot.

m + theme_minimal()
g + theme_minimal()
s + theme_minimal()
c + theme_minimal()

Finally, you can go completely blank with void. This may seem a little weird, but it does have useful purposes, like when creating maps.

m + theme_void()
g + theme_void()
s + theme_void()
c + theme_void()
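
To try all seven themes on a single plot without retyping each line, one option is to loop over the theme functions. This is only a sketch: the toy data frame and the object names (toy_plot, themes, themed_plots) are made up for illustration.

```r
library(ggplot2)

# Made-up data, just to have a plot to restyle
toy <- data.frame(x = 1:5, y = c(2, 4, 3, 5, 4))
toy_plot <- ggplot(toy, aes(x, y)) + geom_point()

# The seven built-in alternatives to theme_gray(), stored as functions
themes <- list(
    bw = theme_bw,
    linedraw = theme_linedraw,
    light = theme_light,
    classic = theme_classic,
    dark = theme_dark,
    minimal = theme_minimal,
    void = theme_void
)

# Apply each theme to the same plot and keep the results in a named list
themed_plots <- lapply(themes, function(theme_fn) toy_plot + theme_fn())

# Print one of them by name
themed_plots$minimal
```

Swapping toy_plot for m, s, g, or c would give the same comparison on the workshop plots.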