This handout is a supplement to the Data Visualization with ggplot2 series of workshops given August–September 2019 at UGA’s DigiLab in the Main Library. Due to time constraints, the topics discussed here did not make it into the workshops, so I’ve moved them to this handout. As always, please visit joeystanley.com/r for the latest materials.
The previous workshop was called “Customizing your plots to make that perfect visual”, and after reviewing the material, it occurred to me that you may not have all the tools you need to make that perfect visual. So, I wanted to include this supplement and show a few more ggplot functions. These are little things that can really help you make your plot your own.
Fair warning, this document likely won’t be as proofread as some of my other ones. The topics don’t really go together to form a cohesive theme for this workshop, other than being a bunch of miscellaneous ggplot2 topics. And this document will likely be updated as I learn about new things and want to throw them in somewhere. Just a hodge-podge of topics that I could potentially turn into workshops in the future.
library(ggplot2)
## Warning: package 'ggplot2' was built under R version 3.5.2
In this workshop, I’ll jump around from the various datasets used in previous workshops, so it’ll be a good idea to download them if you’d like to follow along.
Also, since the purpose of this workshop is to highlight lots of different nitty-gritty things, I’m going to start with some base plots, all based on ones you saw in previous workshops. As a shorthand, I’ll save each entire plot into an R object, like p or s. That way, I can call those objects and add to them and it’ll be as if I typed those several lines of code.
For some of the workshop, I’ll work with the McDonald’s menu items dataset, which you can access from my website. To simplify things a little bit, I’ll take a subset: just four of the nine categories. To make that subset, I’ll use the subset function.
menu <- read.csv("http://joeystanley.com/downloads/menu.csv")
menu_subset <- subset(menu, Category %in% c("Smoothies & Shakes", "Desserts", "Beverages", "Snacks & Sides"))
The default plot will just be the distribution of the amount of sugar in each of the four remaining categories. For the color, I’ll use Paul Tol’s color palette, which I access using the package ggthemes.
p <- ggplot(menu_subset, aes(Category, Sugars, fill = Category)) +
geom_boxplot(size = 0.75) +
geom_jitter(color = "gray15") +
ggthemes::scale_fill_ptol()
So now, to plot it, all I need to do is call p.
p
Conveniently for us, if I want to make changes to the plot, I can just add additional lines of ggplot2 code to p and it’ll work out like normal. In other words…
p + ggtitle("A Default Plot")
…is shorthand for…
ggplot(menu_subset, aes(Category, Sugars, fill = Category)) +
geom_boxplot(size = 0.75) +
geom_jitter(color = "gray15") +
ggthemes::scale_fill_ptol() +
ggtitle("A Default Plot")
Both will produce this plot:
Another dataset I’ll use is the Stranger Things dataset that was used in a previous workshop. Like before, I’ll change the season column to a factor too.
stranger <- read.csv("http://joeystanley.com/data/stranger.csv")
stranger$season = factor(stranger$season)
summary(stranger)
## title season episode rating
## Dig Dug : 1 1:8 Min. :1.00 Min. :6.1
## E Pluribus Unum : 1 2:9 1st Qu.:3.00 1st Qu.:8.5
## Holly, Jolly : 1 3:8 Median :5.00 Median :8.8
## MADMAX : 1 Mean :4.68 Mean :8.7
## Suzie, Do You Copy?: 1 3rd Qu.:7.00 3rd Qu.:9.0
## The Bathtub : 1 Max. :9.00 Max. :9.4
## (Other) :19
## votes minutes
## Min. :10309 Min. :41.00
## 1st Qu.:11693 1st Qu.:48.00
## Median :13148 Median :51.00
## Mean :13485 Mean :52.08
## 3rd Qu.:14909 3rd Qu.:55.00
## Max. :19185 Max. :77.00
##
For the default plot here, s, I’ll make the same scatterplot of the number of votes the episode got on IMDb by the average rating. I’ll make the dots bigger so they’re easier to see. And for fun, I’ll use a Wes Anderson palette inspired by the movie Fantastic Mr. Fox.
s <- ggplot(stranger, aes(votes, rating, color = season)) +
geom_point(size = 4) +
scale_color_manual(values = wesanderson::wes_palette("FantasticFox1"))
s
The last is a dataset of the top 25 most common baby girl names in the US in 2017. I’ll read it in and prep it as I did in the intro workshop.
girlnames <- read.delim("http://joeystanley.com/data/girlnames.txt", sep = "\t")
girlnames$name = factor(girlnames$name, levels = girlnames$name)
And I’ll use the same plot we created as well.
g <- ggplot(girlnames, aes(name, n)) +
geom_bar(stat = "identity") +
theme(axis.text.x = element_text(angle = 90, hjust = 1))
g
So now, let’s see a whole bunch of ways we can modify these plots!
In the main workshop, I showed that you can change the title and axis labels of your plot with ggtitle, xlab, and ylab.
p + ggtitle("Sugars per serving") +
xlab("Category") +
ylab("Sugars (g)")
As it turns out, ggtitle, xlab, and ylab are all actually shortcut functions. We can do the same thing by putting this same information inside of a function called labs, with x, y, and title being arguments to the function.
p + labs(title = "Sugars per serving",
x = "Category",
y = "Sugars (g)")
I personally like this approach better because it visually keeps all the labels together in my block of ggplot code. But the real benefit is that there are additional things we can add to the plot with the labs function, like subtitles, captions, and tags.
p + labs(title = "Sugars per serving",
subtitle = "Smoothies have a lot of sugar",
x = "Category",
y = "Sugars (g)",
caption = "Data from Kaggle.com",
tag = "A")
Let’s go through each of these briefly:
The subtitle adds text below the title in a smaller font. I always have a hard time coming up with a decent subtitle, but the option is there if you want to give a longer description of the plot or maybe describe the point you’re trying to make.
The caption is a nice touch. I often use it to acknowledge a data source. Alternatively, for some of the images I put on my blog, I use caption to add the name of my blog, so I more easily get credit if the image gets shared.
For the tag, this is most useful when you know you’re going to have multiple plots in a single figure. You can add numbers to the top left corner to allow for things like “Figure 1a” or something. I used this pretty effectively in my dissertation when combining lots of plots into a single figure.
In addition to these extra elements, the labs function can actually be used to rename elements in your legend. For example, the default plot here, p, has a legend with the title Category. If I wanted to, I could change that to something else in labs.
p + labs(fill = "Type of Food")
Yes, we can change the name of the legend with scale_fill_manual, but if we don’t need to bother with anything else (like order, names, or manual colors), then it’s probably easier (and more legible) to just change the name in labs. Pretty slick.
You’ll notice in all the plots we’ve done so far, we’ve hardly mentioned the actual numbers along the x- and y-axes. There are times where we might want to make some changes. Take the Stranger Things plot, for example:
s
You might want finer gradations along the axes, like including the ratings 7.5 and 8.5. We can do that with scale_y_continuous.
s + scale_y_continuous(breaks = c(6, 6.5, 7, 7.5, 8, 8.5, 9))
So here, notice that the major grid lines (the thicker of the faint white lines) adjust so that there’s one for each of the ticks you just specified. Furthermore, the minor gridlines continue to be midway between the major ones, so there are more of them.
Since we’re manually adding these, we can set whatever breaks we like, even if they don’t make a lot of sense:
s + scale_y_continuous(breaks = c(6, 6.1, 6.7, 7.2, 7.6, 8.1, 8.539, 9.2))
But usually you’ll want sequential ticks. In fact, a more elegant solution would be to use the seq function to generate those tick marks. So for example, we can create a sequence of numbers going from 6 to 9, incrementing by 0.5 each time.
seq(6, 9, 0.5)
## [1] 6.0 6.5 7.0 7.5 8.0 8.5 9.0
And we can incorporate that into the code:
s + scale_y_continuous(breaks = seq(6, 9, 0.5))
In fact, for safety, I often make the sequence much longer for good measure. Any numbers that are outside of the plotting area are quietly ignored, so it doesn’t hurt to add some extra numbers to the ends to be safe. So in this case, I’d go from 0 to 10 just to cover the full range of possible values:
s + scale_y_continuous(breaks = seq(0, 10, 0.5))
Note that by doing so, 9.5 is now on the plot, whereas in the previous one it was inadvertently left off.
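If you’d rather not hard-code the endpoints at all, one option is to derive the sequence from the data. The sketch below is only an illustration: the ratings data frame and the object names are made up for this example (it is not the Stranger Things data), and rounding outward with floor and ceiling is just one reasonable choice.

```r
library(ggplot2)

# Hypothetical stand-in data, not the real Stranger Things ratings
ratings <- data.frame(votes  = c(10000, 12000, 14000, 16000, 19000),
                      rating = c(6.1, 8.5, 8.8, 9.0, 9.4))

# Round the data range outward to whole numbers, then step by 0.5
y_breaks <- seq(floor(min(ratings$rating)),
                ceiling(max(ratings$rating)),
                by = 0.5)
y_breaks

# Breaks that fall outside the plotting area are simply not drawn
ratings_plot <- ggplot(ratings, aes(votes, rating)) +
  geom_point(size = 4) +
  scale_y_continuous(breaks = y_breaks)
ratings_plot
```

The advantage is that the breaks keep up if the data changes, at the cost of one extra line of code.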
And yes, there is an analogous scale_x_continuous, and you can add both to the plot:
s + scale_y_continuous(breaks = seq(0, 10, 0.5)) +
scale_x_continuous(breaks = seq(0, 20000, 1000))
Is there a way to add commas to those numbers along the x-axis? Yes! With a little help from the scales package, which has a bunch of convenient functions for reformatting numbers. Here, we’ll add the labels = comma argument (comma being the function from scales), and it adds the separator for us just as we expected.
library(scales)
s + scale_y_continuous(breaks = seq(0, 10, 0.5)) +
scale_x_continuous(breaks = seq(0, 20000, 1000),
labels = comma)
So if you’re unhappy with the default appearance of your tick marks, don’t worry because you can make whatever changes you want to them in ggplot2. For more information on all sorts of additional things (changing the color/rotation/size of axes, reversing/log/transforming the scales, etc.), go to the Axes page in the Cookbook for R.
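The labels argument accepts any function that takes the break values and returns text, so you aren’t limited to what scales provides. As a rough sketch, here is what comma returns on its own, followed by a small custom labeller. Note that label_thousands is a hypothetical helper written for this example (it is not part of scales), and the episodes data frame is made up.

```r
library(ggplot2)
library(scales)

# comma() turns numbers into text with thousands separators
comma(c(10309, 19185))

# A hypothetical helper (not part of scales) that abbreviates thousands
label_thousands <- function(x) paste0(x / 1000, "k")
label_thousands(c(10000, 15000))

# Made-up data for illustration only
episodes <- data.frame(votes  = c(10000, 13000, 19000),
                       rating = c(8.5, 8.8, 9.4))

votes_plot <- ggplot(episodes, aes(votes, rating)) +
  geom_point(size = 4) +
  scale_x_continuous(breaks = seq(0, 20000, 2000),
                     labels = label_thousands)
votes_plot
```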
So we’ve seen how to modify the ticks, which in turn control the placement of the grid that underlies the entire plot. To change how that grid looks, the key is the theme function. This function allows you to modify basically anything else that this workshop doesn’t cover. In fact, we’ll spend a lot of time with the various arguments to theme in the next workshop when we look at making custom themes in ggplot2.
We can remove that grid entirely if you don’t like it or find it distracting. Within the theme function, you’ll want the panel.grid argument, and to remove it we’ll just make it blank with element_blank().
s + theme(panel.grid = element_blank())
However, if you’re a fan of the grid, but want to make some changes, you can control pretty much whatever you want. Instead of element_blank we can change the function to element_line, which itself takes several arguments like color, size, and linetype. You can even add things like arrows to the end, or change how the end of the line looks. The sky is the limit really. Here are a couple of whimsical examples for illustration.
s + theme(panel.grid = element_line(color = "lightpink",
size = 0.75,
linetype = "dashed"))
s + theme(panel.grid = element_line(color = "gray85",
size = 3,
arrow = arrow()))
Now if that wasn’t enough control, it turns out you can modify the major and minor grids independently with panel.grid.major and panel.grid.minor. Both of these arguments inherit properties from panel.grid if it’s there. So in this example, I use panel.grid to make everything thicker with size = 2. But for the major gridlines (the ones that line up with the ticks), I’m turning them into dotted lines. For the minor gridlines, I’m coloring them green:
s + theme(panel.grid = element_line(size = 2),
panel.grid.major = element_line(linetype = "dotted"),
panel.grid.minor = element_line(color = "lightgreen"))
And it gets better! You can actually control the x- and y-axes independently. So here, I’m making all lines thicker with panel.grid, and then layering more specific changes on top of that:
s + theme(
# All lines are bigger
panel.grid = element_line(size = 2),
# Major lines are also dotted
panel.grid.major = element_line(linetype = "dotted"),
# Vertical major lines are also light sky blue with an arrow
panel.grid.major.x = element_line(color = "lightskyblue", arrow = arrow()),
# Horizontal major lines are also a khaki/yellow color.
panel.grid.major.y = element_line(color = "khaki"),
# Minor lines are also light green
panel.grid.minor = element_line(color = "lightgreen"),
# Vertical minor lines have a two-dash pattern
panel.grid.minor.x = element_line(linetype = "twodash"),
# Horizontal minor lines are solid and have an arrow at the end.
panel.grid.minor.y = element_line(linetype = "solid", arrow = arrow()))
The result for this particular plot looks awful, but it does show how much control you have. You’ll most likely never need all these commands for your grid lines, but it’s useful to be aware that these options exist because one day you’ll find yourself wishing you could control one little aspect of your plot and you’ll find you’ve already learned how!
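In practice, a more restrained combination of the same arguments tends to be more useful. As a sketch (using R’s built-in mtcars data rather than any of the workshop datasets, with arbitrary colors), this styles every grid line once and then removes everything except the horizontal major lines:

```r
library(ggplot2)

grid_demo <- ggplot(mtcars, aes(wt, mpg)) +
  geom_point() +
  theme(
    # Start by styling every grid line the same way
    panel.grid = element_line(color = "gray60", linetype = "dotted"),
    # Then drop all minor lines
    panel.grid.minor = element_blank(),
    # And drop only the vertical major lines
    panel.grid.major.x = element_blank())
grid_demo
```

Because the more specific arguments inherit from panel.grid, the horizontal major lines pick up the color and line type without being mentioned explicitly.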
By default, the title is left-aligned with the plotting area. Older versions of ggplot2 had it centered by default, base R’s plot function has it centered, and a lot of other statistical software has centered titles. Some people prefer to have the title centered, which is totally fine. ggplot2 has made some aesthetic choices and its creator, Hadley Wickham, often describes it as “opinionated.” So if you want to center your title, then by golly center your title.
The key to centering is also in this theme function with the plot.title argument. Just as we did with the grid above, that argument should take a function; this time it’s element_text, which itself contains a lot of controls over what text should look like. The key argument to element_text is hjust (“horizontal justification”), which is a number ranging from 0 (left-aligned) to 1 (right-aligned).
s + labs(title = "Stranger Things Episode Ratings",
caption = "Left-aligned title")
s + labs(title = "Stranger Things Episode Ratings",
caption = "Centered title") +
theme(plot.title = element_text(hjust = 0.5))
s + labs(title = "Stranger Things Episode Ratings",
caption = "Right-aligned title") +
theme(plot.title = element_text(hjust = 1))
s + labs(title = "Stranger Things Episode Ratings",
caption = "A title positioned uncomfortably somewhere towards the right") +
theme(plot.title = element_text(hjust = 0.81))
Of course, if you want to realign the subtitle or caption, you can do so with plot.subtitle and plot.caption, which work the same way. And just as with the grid arguments, they all act independently of each other.
Note that the horizontal space the title can occupy is the same as the plotting area itself, not counting the space on the left for the vertical axis or the space on the right for the legend. In an upcoming version of ggplot2, you’ll be able to left-align it with the actual edge of the plot if you’d like, which will be pretty handy. For now, you can fake it by putting numbers less than 0 or greater than 1 for the hjust argument, but the upcoming version will have true alignment with the edge.
By now, you’ve seen a lot of things that theme can do. (And there are so many more things it can do!) The problem is that it’s a bit verbose. If you wanted those crazy gridlines, a centered title, and custom ticks every time you made a plot, it would get quite repetitive. Fortunately, you can wrap all these things up into a single line of code.
Exactly how to wrap this up into a custom function is what the next workshop is about. For now, I’ll show you some of the preinstalled themes that come with ggplot2. It’s easy to switch between them, and most remove the gray background that a lot of people dislike.
My go-to theme is called theme_bw(), for just “black and white.” The biggest difference is that now the background is white instead of gray, but it also adds a thin black border around the whole thing and has faint gray grid lines instead of white ones.
p + theme_bw()
The light theme is very similar to bw. The biggest difference is the outside box is lighter.
p + theme_light()
The classic theme has no grid and the top and right parts of the box are gone too, just leaving the x and y axes.
p + theme_classic()
If you really like the gray, you can go for the dark theme, which has a darker background and a darker gray for the grid lines.
p + theme_dark()
In fact, if you really like the default background, you can set it specifically. It’s the default for a reason: that specific shade of gray has been chosen to make colors stand out more.
p + theme_gray()
The minimal theme is even simpler than light. It doesn’t have the outside border but it does retain the inside grid.
p + theme_minimal()
Finally, you can go completely blank with void. This may seem a little weird, but it does have useful purposes, like when creating maps.
p + theme_void()
Something to keep in mind with themes is that they override any other theme layer you might have included. For example, if you remove the legend and then use the classic theme, the legend is still there:
p + theme(legend.position = "none") +
theme_classic()
No, this isn’t a bug. The reason is that theme_* is actually a shortcut for a whole bunch of other theme elements. As a result, there are two theme functions, so the second one overrides the first one. Fortunately, we can fix this by just reversing the order of theme and theme_classic:
p + theme_classic() +
theme(legend.position = "none")
If you don’t like any of these themes, or you want to change just one aspect of them (the width of the lines, the color of the background, the gridlines, etc.), you can! The next workshop dives deep into the theme function and shows what kinds of things you can do to change your plot, and how to wrap all these changes up into a custom theme.
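To see the ordering issue without drawing anything, you can add the theme pieces to each other directly and look at the resulting legend.position setting. This is only a quick way of peeking under the hood, not something you’d normally need; the object names theme_then_tweak and tweak_then_theme are invented for this sketch.

```r
library(ggplot2)

# A tweak added after a built-in theme keeps its setting
theme_then_tweak <- theme_classic() + theme(legend.position = "none")
theme_then_tweak$legend.position

# A built-in theme added last replaces the earlier tweak,
# so the legend position is no longer "none"
tweak_then_theme <- theme(legend.position = "none") + theme_classic()
tweak_then_theme$legend.position
```

The rule of thumb is to put the theme_* call first and your own theme adjustments after it.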
Let’s say we make a plot and it gets a little cumbersome because there are lots of points being shown. This is particularly true when you have text being displayed instead of points. Sometimes it’s nice to split the plot up into meaningful groups and look at each group individually. In fact, this is the basic principle behind small multiples, a concept popularized by Edward Tufte’s 1983 book, The Visual Display of Quantitative Information.
To illustrate this, I’ll switch back to the cereal data that was used in previous chapters. And I’ll rename the manufacturers to something humans understand better with fct_recode in the forcats package.
cereal <- read.csv("http://www.joeystanley.com/data/cereal.csv")
cereal$mfr <- forcats::fct_recode(cereal$mfr,
"Kellogg's" = "K",
"General Mills" = "G",
"Post" = "P",
"Quaker Oats" = "Q",
"Ralston Purina" = "R",
"Nabisco" = "N")
For example, let’s say we want to see how much fiber a cereal has and compare it to its rating. But we want to be able to see the name of the cereal itself, so we use geom_text instead of geom_point.
ggplot(cereal, aes(fiber, rating)) +
geom_text(aes(label = name))
Ugh. That’s a little messy. It might help if we could look at each manufacturer individually. The long way to do this is to create six different datasets by subsetting the original, do six different plots, and then find a way to arrange them all in a grid-like layout. But that’s way too much work. Imagine if we had 30 companies we wanted to look at! Fortunately, ggplot2 has a way to do this with just one simple line of code: facet_wrap. Here, we can put the name of the column we want to split it up by, but it has to come after a tilde “~” (that squiggle thing left of the number “1” key). The reasons behind this are beyond the scope of this workshop, but just know you have to put it there. The result is exactly what we need.
ggplot(cereal, aes(fiber, rating)) +
geom_text(aes(label = name)) +
facet_wrap(~mfr)
Here we can see that each manufacturer has its own plot. The underlying order of the factor levels determines the order the facets are displayed, which goes left-to-right, top-to-bottom like a book. Here it’s alphabetical, but you can change that the same way the girl names were ordered above, by setting the factor levels.
If you look closely, you’ll see that the x- and y-axes are the same for each plot. In other words, it’s easy to see that Nabisco cereals have a generally higher rating than most of the other ones and that Kellogg’s All Bran with Extra Fiber is true to its name and easily has the most fiber (and the highest rating). This is good because it makes comparisons across manufacturers easy.
But if we don’t care about differences between manufacturers, and just want to look at differences within them, we can have each plot zoom in to just the data that’s being shown. We do this by adding the scales = "free" argument to facet_wrap.
ggplot(cereal, aes(fiber, rating)) +
geom_text(aes(label = name)) +
facet_wrap(~mfr, scales = "free")
This plot shows the exact same data, but it highlights different things. For example, we can clearly see the differences between Quaker Oats cereals, which were harder to see when they were all squished together before.
We can also change the number of rows and columns. By default, it’ll do roughly a square layout with approximately equal numbers of rows and columns. If you want them all to be side-by-side (perhaps you’re making a poster), you can set nrow = 1, or if you want them all vertical, you can set ncol = 1.
ggplot(cereal, aes(fiber, rating)) +
geom_text(aes(label = name)) +
facet_wrap(~mfr, scales = "free", nrow = 1)
ggplot(cereal, aes(fiber, rating)) +
geom_text(aes(label = name)) +
facet_wrap(~mfr, scales = "free", ncol = 1)
More commonly, you’ll want to specify something a little less extreme than one very wide or one very tall plot, but it’s a good trick to be aware of.
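There is also a middle ground between fully shared and fully free axes: facet_wrap accepts scales = "free_x" and scales = "free_y" to free just one axis. The sketch below uses R’s built-in mtcars data as a stand-in so that it runs on its own; the column choices are arbitrary.

```r
library(ggplot2)

# Built-in data as a stand-in for the cereal dataset
free_y_demo <- ggplot(mtcars, aes(wt, mpg)) +
  geom_point() +
  # Each panel gets its own y-axis, but all panels share one x-axis range
  facet_wrap(~cyl, scales = "free_y", ncol = 3)
free_y_demo
```

This keeps one dimension comparable across panels while still letting each panel zoom in on the other.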
What happens if you use facet_wrap on some variable that’s already used in your plot? For example, what if you modify the above plot to have the manufacturers each with their own color?
What happens if you do a bar plot with fiber and manufacturer, but facet it by manufacturer? Is this a useful thing?
If you use the same variable for color and faceting, it does what you expect. The colors just don’t add much to the plot.
ggplot(cereal, aes(fiber, rating, color = mfr)) +
geom_text(aes(label = name)) +
facet_wrap(~mfr, scales = "free")
Instead it might be more useful to add another variable for color.
ggplot(cereal, aes(fiber, rating, color = protein)) +
geom_text(aes(label = name)) +
scale_color_distiller(type = "seq", palette = "Reds", direction = 1) +
facet_wrap(~mfr, scales = "free")
If you use a facet wrap on the same variable as the columns, it doesn’t really help much and it turns into a pretty lame graph.
ggplot(cereal, aes(mfr)) +
geom_bar() +
facet_wrap(~mfr)
Looking back at the intermediate workshop, the way that I highlighted one data point there wasn’t the best way. It would be better, I think, to add a new column that simply contains TRUE/FALSE values based on whether the text in the mfr column is “Kellogg’s”. When I plot the data and color the bars based on that new is_kelloggs column, it colors just the one bar in a much more concise way.
cereal$is_kelloggs <- cereal$mfr == "Kellogg's"
ggplot(cereal, aes(mfr, fill = is_kelloggs)) +
geom_bar()
Since the legend is useless here, I’ll drop it. And I’ll change the colors to really highlight what I want to show.
ggplot(cereal, aes(mfr, fill = is_kelloggs)) +
geom_bar() +
scale_fill_manual(values = c("gray75", "darkred")) +
theme(legend.position = "none")
This is a much more elegant solution. With six manufacturers, it might not be obvious why it’s more elegant, but let’s look at an example where there are many more categories. Let’s look at the plot made with the girlnames dataset.
g
Now what if I wanted to highlight the name Abigail? Using the manual technique, I’d have a lot of repetitive code. (Even putting this code together I messed up a few times making sure I had the pink in the right spot and the right number of gray ones.)
ggplot(girlnames, aes(name, n, fill = name)) +
geom_bar(stat = "identity") +
scale_fill_manual(values = c("gray75", "gray75", "gray75", "gray75", "gray75", "gray75", "gray75", "gray75", "gray75", "pink", "gray75", "gray75", "gray75", "gray75", "gray75", "gray75", "gray75", "gray75", "gray75", "gray75", "gray75", "gray75", "gray75", "gray75", "gray75")) +
theme(axis.text.x = element_text(angle = 90, hjust = 1),
legend.position = "none")
If we add the new column though, it’s literally the same amount of code as we did with the cereal data. Whether there are 6 categories or 600, it’s the same code.
girlnames$is_abigail <- girlnames$name == "Abigail"
ggplot(girlnames, aes(name, n, fill = is_abigail)) +
geom_bar(stat = "identity") +
scale_fill_manual(values = c("gray75", "pink")) +
theme(axis.text.x = element_text(angle = 90, hjust = 1),
legend.position = "none")
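The same idea extends to highlighting more than one category: swap == for %in% and give it a vector of names. The sketch below uses a small made-up data frame (names_demo, with invented counts) so that it runs without downloading anything. Naming the color values "FALSE" and "TRUE" is optional, but it makes it explicit which color goes with which group.

```r
library(ggplot2)

# A small made-up data frame standing in for girlnames (counts are invented)
names_demo <- data.frame(name = c("Emma", "Olivia", "Ava", "Abigail", "Mia"),
                         n    = c(500, 450, 400, 300, 350))
names_demo$name <- factor(names_demo$name, levels = names_demo$name)

# TRUE for any name in the set we want to highlight
names_demo$is_highlighted <- names_demo$name %in% c("Abigail", "Emma")

highlight_plot <- ggplot(names_demo, aes(name, n, fill = is_highlighted)) +
  geom_bar(stat = "identity") +
  # Naming the values ties each color to TRUE or FALSE explicitly
  scale_fill_manual(values = c("FALSE" = "gray75", "TRUE" = "pink")) +
  theme(legend.position = "none")
highlight_plot
```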
If you’re using Color Brewer for your colors, you have a couple additional changes you can make to your legend. If you’re plotting a continuous variable, you normally get a continuous scale in your legend:
ggplot(cereal, aes(sugars, rating)) +
geom_point(aes(color = rating)) +
scale_color_distiller(type = "seq", palette = "PuRd")
You can actually add the guide = "legend" argument, and it’ll turn it into a categorical-looking legend. To be clear, the dots on the graph still use a continuous color scheme, but the legend is at least a little bit cleaner.
ggplot(cereal, aes(sugars, rating)) +
geom_point(aes(color = rating)) +
scale_color_distiller(type = "seq", palette = "PuRd", guide = "legend")
For some reason though, this has the effect of reversing the order so that high numbers are at the bottom. We can flip the way they appear in the legend by taking out guide = "legend" and using guide = guide_legend(reverse = TRUE) instead. Slightly cumbersome, but it gets the job done.
ggplot(cereal, aes(sugars, rating)) +
geom_point(aes(color = rating)) +
scale_color_distiller(type = "seq", palette = "PuRd",
guide = guide_legend(reverse=TRUE))
Of course now if we want to change it so that high numbers get the darker color, we have to add direction = 1 to reverse the order.
ggplot(cereal, aes(sugars, rating)) +
geom_point(aes(color = rating)) +
scale_color_distiller(type = "seq", palette = "PuRd",
guide = guide_legend(reverse=TRUE), direction = 1)
By the way, if direction = 1 doesn’t make any changes in other plots, try direction = -1 instead. I can’t figure out which one to use.
Anyway, because Color Brewer does some cool things, it takes a little more work to get things done, but the result is a pretty good looking plot.
Everything we’ve done so far is just a temporary image that will disappear when you close R. Crucially, it’s not going to show up in your powerpoints or papers. There is a way to save plots by clicking things in RStudio, but the whole purpose of this workshop is to do things via code because it’s a lot easier to reproduce it.
The way to save things is to use the function ggsave immediately after creating a plot. You can specify where you want it saved by typing a full or relative path. Note that Windows users will need a double backslash (\\) while Mac users need a single forward slash (/). Also be sure to specify the name of the file itself and the filetype (".png", ".jpg", etc.). You can specify the width and height in inches to control the size, which is super useful for making comparison charts. And you can specify the resolution using dpi (“dots per inch”). The default, which is the standard for many publications, is 300.
p + theme_classic() +
theme(legend.position = "none")
# For Macs
ggsave("/Users/joeystanley/Desktop/plots/barplot.png",
dpi = 300, height = 7, width = 7)
# For Windows
ggsave("C:\\Users\\joeystanley\\Desktop\\plots\\barplot.png",
dpi = 300, height = 7, width = 7)
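If you’d rather not think about slashes at all, base R’s file.path function will assemble the path for you, and R accepts the forward slashes it produces on Windows too. You can also hand ggsave the plot explicitly with its plot argument instead of relying on whichever plot was drawn last. In this sketch, the temporary folder, the filename, and the use of mtcars are placeholders so that the example runs anywhere; swap in your own folder and plot.

```r
library(ggplot2)

save_demo <- ggplot(mtcars, aes(wt, mpg)) +
  geom_point()

# tempdir() is a placeholder location; replace it with your own folder
out_file <- file.path(tempdir(), "scatterplot.png")

# Passing the plot explicitly means it doesn't matter which plot was drawn last
ggsave(out_file, plot = save_demo, dpi = 300, height = 7, width = 7)

# Confirm that the file was written
file.exists(out_file)
```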
Finally, it’s useful to know that your plot will automatically stretch depending on the size you give it. Usually, that’s fine and there’s no harm done. However, sometimes the aspect ratio of the plot is actually important and you want to control for that. You can do this with the coord_fixed function and its ratio argument. So if I want to ensure that the x- and y-axes of my scatterplot are scaled at a fixed ratio (here, five units on the y-axis take up the same length as one unit on the x-axis), I can do so:
# No aspect ratio control
ggplot(cereal, aes(sugars, rating)) +
geom_point(aes(color = rating)) +
scale_color_distiller(type = "seq", palette = "PuRd")
# Enforce a 1:5 ratio for the x and y axes.
ggplot(cereal, aes(sugars, rating)) +
geom_point(aes(color = rating)) +
scale_color_distiller(type = "seq", palette = "PuRd") +
coord_fixed(ratio = 1/5)
So now, when you save your plot, you can have greater control over how the final product looks. Note that if your plot is relatively narrow, but you specify like a 30-inch-wide plot, you’re going to have a lot of white space on the sides.
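Since the ratio in coord_fixed is expressed as y divided by x, you can also compute it from the data instead of picking a number by hand. As a sketch with made-up data (square_demo and its values are invented for this example), dividing the x range by the y range gives a plotting area that comes out square:

```r
library(ggplot2)

# Made-up data for illustration
square_demo <- data.frame(x = c(0, 5, 10, 15),
                          y = c(20, 40, 60, 80))

# ratio is y / x, so dividing the x range by the y range
# makes both axes span the same physical length
square_ratio <- diff(range(square_demo$x)) / diff(range(square_demo$y))
square_ratio

square_plot <- ggplot(square_demo, aes(x, y)) +
  geom_point() +
  coord_fixed(ratio = square_ratio)
square_plot
```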
Yes, ggplot2 takes a bit of time to get used to, but it is worth it. It is the best way that I know of to create clean, professional visualizations. I can’t possibly show you all the ways to visualize your data. Not only is that a lot of ggplot2 but it gets into a lot of statistical background regarding data types and the pros and cons of each visualization type. Data visualization is a substantial field of its own, but it’s important to at least know the basics as well as some additional tips for customizing your plots.