Simulating Candyland – Joey Stanley

About six years ago, I did a blog post about how I simulated Chutes and Ladders after playing it countless times with my child. Recently, I’ve been playing tons of Candyland with my next child. The games are kinda similar and they take zero skill, which means they’re perfect for little kiddos. It also means I can simulate them pretty easily.

I’m going to base the game off the board that I’ve been playing with, which we got at a thrift store. Here’s what that looks like:

Tip

If you’re not interested in how I did the simulation and just want to see the results, just jump to Section 3.

1 The simulation

1.1 Setting up the game

The first step in simulating Candyland is to set up the board. There’s no real easy way to do that, so I just went through and hard-coded the tiles and colors. I’ve got a simple dataframe with a column for the tile number, the color of the tile, and if there’s anything special about it. In my original spreadsheet, if there was nothing special about the tile, I left it blank, but because of issues with NAs down the road, I’ve replaced those blank cells with "none". Here’s what that looks like:

library(tidyverse)
board <- read_csv("candyland_data.csv", show_col_types = FALSE) |> 
  mutate(special = replace_na(special, "none")) |> 
  print()

# A tibble: 132 × 3
    tile color  special              
   <dbl> <chr>  <chr>                
 1     1 red    none                 
 2     2 purple none                 
 3     3 yellow none                 
 4     4 blue   peppermint pass start
 5     5 orange none                 
 6     6 green  none                 
 7     7 red    none                 
 8     8 purple none                 
 9     9 pink   cupcake              
10    10 yellow none                 
# ℹ 122 more rows

Now that we’ve got the board, we need to get the cards. This one is actually straightforward enough that I can create it on the fly. First, I’ll create the seven candy cards. I’m not sure if these are the official names, but that’s what we call them in my house.

candy_cards <- c("cupcake", "ice cream cone", "gummy star", 
                 "gingerbread man", "lollipop", "popsicle", 
                 "chocolate truffle")

There are six “single-color” cards and four “double-color” cards for each color. So, I’ve first created a function that takes in a color name and creates those ten cards as a vector.

create_one_color_card <- function(.color) {
  c(rep(paste0("double ", .color), 4), 
    rep(paste0("single ", .color), 6))
}
create_one_color_card("red")

 [1] "double red" "double red" "double red" "double red" "single red"
 [6] "single red" "single red" "single red" "single red" "single red"

So now, I can use that function to generate the full deck of 67 cards.

cards <- c(create_one_color_card("red"),
           create_one_color_card("purple"),
           create_one_color_card("yellow"),
           create_one_color_card("blue"),
           create_one_color_card("orange"),
           create_one_color_card("green"),
           candy_cards)
sample(cards)

 [1] "single red"        "popsicle"          "single purple"    
 [4] "single purple"     "single green"      "single yellow"    
 [7] "gingerbread man"   "double blue"       "single orange"    
[10] "single blue"       "single red"        "double orange"    
[13] "single yellow"     "double yellow"     "single green"     
[16] "single green"      "single red"        "single orange"    
[19] "single red"        "single green"      "double blue"      
[22] "single green"      "double orange"     "single purple"    
[25] "double yellow"     "single blue"       "single orange"    
[28] "double purple"     "double yellow"     "single green"     
[31] "double red"        "lollipop"          "single purple"    
[34] "single purple"     "single red"        "ice cream cone"   
[37] "double purple"     "single purple"     "double green"     
[40] "double yellow"     "double red"        "single blue"      
[43] "single yellow"     "double red"        "single blue"      
[46] "single blue"       "single orange"     "single red"       
[49] "double blue"       "double red"        "single yellow"    
[52] "double orange"     "cupcake"           "gummy star"       
[55] "single orange"     "single yellow"     "chocolate truffle"
[58] "double blue"       "single orange"     "double green"     
[61] "double green"      "double purple"     "double purple"    
[64] "double orange"     "double green"      "single blue"      
[67] "single yellow"
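
As a quick sanity check on the deck, here is a minimal sketch that builds the same 67 cards by looping over a vector of color names instead of calling create_one_color_card() six times, and then tallies the card types. The colors vector and the card_type tally are additions for illustration only; everything else comes from the code above.

```r
# Candy cards and the per-color helper, as defined above
candy_cards <- c("cupcake", "ice cream cone", "gummy star",
                 "gingerbread man", "lollipop", "popsicle",
                 "chocolate truffle")
create_one_color_card <- function(.color) {
  c(rep(paste0("double ", .color), 4),
    rep(paste0("single ", .color), 6))
}

# Illustrative addition: loop over the colors instead of six separate calls
colors <- c("red", "purple", "yellow", "blue", "orange", "green")
cards <- c(unlist(lapply(colors, create_one_color_card)), candy_cards)

# Label every card as single, double, or candy, then count each type
card_type <- ifelse(cards %in% candy_cards, "candy", sub(" .*$", "", cards))
table(card_type)
length(cards)
```

With six colors, that should come to 36 singles, 24 doubles, and 7 candy cards, for 67 cards in total.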

So, with the board and cards set up, let’s start simulating some games. Note that because there is virtually zero interaction between the players, I’m just going to simulate single-player games.

2 Explaining the simulation

In this section, I’ll explain the mechanics of the simulation. If you’re not interested in that and just want to get to the results, feel free to skip to Section 3. The mechanics are similar to my Chutes and Ladders simulation.

2.1 Preliminaries

First, I want to allocate space for a full game. I’ll create a tibble called turns that has twice as many rows as there are cards in the deck. After doing some simulations, I’ve found that, very rarely, a single player can get through all the cards and will need to reshuffle them. I haven’t yet run into a simulation where a second reshuffle is needed, though I suppose a game could theoretically go on forever.

I’ll then simulate shuffling the cards by randomly sorting them with sample(cards, replace = FALSE). And I’ll string two shuffled decks together (rather than shuffling both into one big deck) in case we have a really long game. I’ll do that by calling sample(...) twice and stringing the results together with c().

As additional columns in this tibble, I’ll create empty columns for the start tile number, the end tile number, and whether that turn involved a shortcut. Those will be populated as the game happens.

n_cards <- length(cards)
turns <- tibble(turn_num = 1:(n_cards*2),
                start    = NA,
                card  = c(sample(cards, replace = FALSE), 
                          sample(cards, replace = FALSE)),
                shortcut = NA,
                end      = NA)

Since I’ll need to do this a lot in the explanation of the simulation, I’ll save it as a function so it takes up less room.

setup_turns <- function() {
  n_cards <- length(cards)
  tibble(turn_num = 1:(n_cards*2),
         start    = NA,
         card  = c(sample(cards, replace = FALSE), 
                   sample(cards, replace = FALSE)),
         shortcut = NA,
         end      = NA)
}
turns <- setup_turns()
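
One small thing to be aware of: a bare NA is a logical value, so the start, shortcut, and end columns begin as logical columns and only change type the first time a number or a string is written into them (you can see the column types change in the printed output throughout this post). A hedged variant, which is only a suggestion and not something the rest of this post relies on, is to use typed NAs so the column types are fixed from the start. The sketch below uses a base R data frame and rebuilds the deck so it runs on its own; setup_turns_typed is a hypothetical name.

```r
# Rebuild the 67-card deck so this block runs on its own
colors <- c("red", "purple", "yellow", "blue", "orange", "green")
candy_cards <- c("cupcake", "ice cream cone", "gummy star",
                 "gingerbread man", "lollipop", "popsicle",
                 "chocolate truffle")
cards <- c(unlist(lapply(colors, function(.color) {
  c(rep(paste0("double ", .color), 4), rep(paste0("single ", .color), 6))
})), candy_cards)

# Hypothetical variant of setup_turns() that uses typed NAs
setup_turns_typed <- function(deck = cards) {
  n_cards <- length(deck)
  data.frame(turn_num = seq_len(n_cards * 2),
             start    = NA_real_,
             # two separately shuffled decks, laid end to end
             card     = c(sample(deck), sample(deck)),
             shortcut = NA_character_,
             end      = NA_real_,
             stringsAsFactors = FALSE)
}

turns <- setup_turns_typed()
sapply(turns, class)

# Each half of the card column is one complete deck
n_cards <- length(cards)
first_deck  <- turns$card[seq_len(n_cards)]
second_deck <- turns$card[n_cards + seq_len(n_cards)]
identical(sort(first_deck), sort(cards))
identical(sort(second_deck), sort(cards))
```

The last two checks confirm that stringing two shuffles together behaves like a real reshuffle: no card can show up in the second deck until the first deck is used up.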

2.2 Setting up the loop

Now that we’ve got the cards shuffled and ready to go for this game, let’s start the turns. Since I don’t know how many iterations I’ll need, I’ll set up a while loop that goes until I tell it to stop.¹ Here’s what the barebones loop looks like:

¹ There might be a more elegant way using a for loop. Maybe my R looping skills aren’t where they need to be, but I just couldn’t figure out how to get it to exit the loop the way I wanted to and return the dataframe. I’ve hacked a bit of a solution by incrementing i each iteration. What I wanted was for the loop to exit at the end of the current iteration, not necessarily right at that moment like R’s break does. The keep_playing variable simulates that.

turns <- setup_turns()
# Loop until the game is over
i <- 1
keep_playing <- TRUE
while(keep_playing) {
  
  # put the game here
  # don't run this code though; it'll go on forever
  
}

Of course, this is going to run forever because we haven’t put in any way for it to stop. So I’ll increment i each iteration and end the loop after n_cards*2 iterations.

turns <- setup_turns()

# Loop until the game is over
i <- 1
keep_playing <- TRUE
while(keep_playing) {
  
  if (i >= (n_cards*2)) {
    keep_playing <- FALSE
  } else {
    i <- i + 1
  }
  
}
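
For comparison, here is a sketch of the same skeleton written as a for loop with break. It is only an alternative, not what the rest of this post uses. Putting the break check as the very last statement of the loop body means the current turn always finishes before the loop exits, and the index afterwards records the final turn. The game_over rule here is a made-up placeholder.

```r
n_cards <- 67               # size of the deck, as above
max_turns <- n_cards * 2    # two decks' worth of turns

last_turn <- NA
for (i in seq_len(max_turns)) {

  # ... the body of the turn would go here ...

  # Placeholder stopping rule (hypothetical): pretend the game ends on turn 25
  game_over <- i == 25

  # Record the turn, then check at the very end so the turn finishes first
  last_turn <- i
  if (game_over) break
}
last_turn
```

If the stopping rule never fires, the loop simply runs out after max_turns iterations, which plays the same role as the i >= n_cards*2 check.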

Okay, so we’re now looping through code enough times to go through the game. Now let’s start to add some content to that game.

2.3 Start at tile zero

First, I need to get the start tile. On the first turn, the start tile is zero. Let’s add that. Here’s what that code looks like:

if (i == 1) {
  turns$start[[i]] <- 0
}

Here, I’ve added a conditional that checks what iteration number we’re on. If we’re on the first one, then go ahead and declare the start tile for this iteration to be zero. Here’s that code in context:

turns <- setup_turns()

i <- 1
keep_playing <- TRUE
while(keep_playing) {

  # Get the start tile
  if (i == 1) {
    turns$start[[i]] <- 0
  }
  
  # End the game
  if (i >= (n_cards*2)) {
    keep_playing <- FALSE
  } else {
    i <- i + 1
  }
  
}
turns

# A tibble: 134 × 5
   turn_num start card          shortcut end  
      <int> <dbl> <chr>         <lgl>    <lgl>
 1        1     0 single red    NA       NA   
 2        2    NA double orange NA       NA   
 3        3    NA double red    NA       NA   
 4        4    NA lollipop      NA       NA   
 5        5    NA double purple NA       NA   
 6        6    NA single green  NA       NA   
 7        7    NA double green  NA       NA   
 8        8    NA double orange NA       NA   
 9        9    NA single yellow NA       NA   
10       10    NA single yellow NA       NA   
# ℹ 124 more rows

Later on, we’ll add code saying that the start tile for all other turns is the end tile of the previous turn, but because we haven’t added any code yet for the end tiles, it won’t do us any good yet. So, to avoid issues with NAs, I’ll skip that for now.

2.4 Adding candy cards

What we do need to do though is “draw a card” and figure out how many tiles we need to advance. Keep in mind that the cards are already there for us, so we don’t need to randomly sample from the deck or anything. We just need to take the info that’s already there and use it to figure out where to end up.

First, let’s account for the candy cards. These are easy: regardless of what your start tile is, your end tile will always be the same. The code that I’ll add looks like this:

if (turns$card[[i]] %in% candy_cards) {
  turns$end[[i]] <- board[board$special == turns$card[[i]],]$tile
}

So here, I’m checking to see if the name of the card that has been assigned for this turn matches one of the candy cards saved in the candy_cards vector I declared earlier. If there’s a match, then I’ll look up the tile of that candy card in my board dataframe (the one I manually created in a separate file). It searches for the name of the candy and fetches the tile number for it. That number is then assigned to this turn’s end tile. Here’s that code in context:

turns <- setup_turns()
i <- 1
keep_playing <- TRUE
while(keep_playing) {
  
  # Get the start tile
  if (i == 1) {
    turns$start[[i]] <- 0
  }
  
  # If it's a candy card, go straight there.
  if (turns$card[[i]] %in% candy_cards) {
    turns$end[[i]] <- board[board$special == turns$card[[i]],]$tile
  }

  # End the game
  if (i >= (n_cards*2)) {
    keep_playing <- FALSE
  } else {
    i <- i + 1
  }
  
}
turns

# A tibble: 134 × 5
   turn_num start card           shortcut   end
      <int> <dbl> <chr>          <lgl>    <dbl>
 1        1     0 double yellow  NA          NA
 2        2    NA double red     NA          NA
 3        3    NA ice cream cone NA          20
 4        4    NA single yellow  NA          NA
 5        5    NA double yellow  NA          NA
 6        6    NA double purple  NA          NA
 7        7    NA single green   NA          NA
 8        8    NA single red     NA          NA
 9        9    NA single orange  NA          NA
10       10    NA double yellow  NA          NA
# ℹ 124 more rows

So far, the game does nothing unless you run into a candy card. If you do, it’ll note where the end position should be.
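
The lookup above assumes that each candy name appears exactly once in the special column. As a hedged sketch of a slightly more defensive version, the helper below (candy_tile is a hypothetical name) uses match() and stops with a clear message if the name isn’t found. The toy board only contains the five candy tiles whose positions show up in the printed output in this post; it is not the real 132-tile board.

```r
# Toy board (an assumption): only the candy tiles visible in this post's output
toy_board <- data.frame(
  tile    = c(9, 20, 42, 69, 117),
  special = c("cupcake", "ice cream cone", "gummy star",
              "gingerbread man", "chocolate truffle"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: look up the tile for a candy card, failing loudly
candy_tile <- function(card, board) {
  row <- match(card, board$special)
  if (is.na(row)) {
    stop("No tile found for candy card: ", card)
  }
  board$tile[[row]]
}

candy_tile("ice cream cone", toy_board)
```

A typo in the spreadsheet (say, "icecream cone") would then show up as an informative error rather than as a zero-length assignment somewhere inside the loop.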

At this point, I might as well add some code that makes the start tile of each turn the end tile of the previous turn.

turns <- setup_turns()
i <- 1
keep_playing <- TRUE
while(keep_playing) {
  
  # Get the start tile
  if (i == 1) {
    turns$start[[i]] <- 0
    # Otherwise, start where the last turn ended.
  } else {
    turns$start[[i]] <- turns$end[[i - 1]]
  }
  
  # If it's a candy card, go straight there.
  if (turns$card[[i]] %in% candy_cards) {
    turns$end[[i]] <- board[board$special == turns$card[[i]],]$tile
  }

  if (i >= (n_cards*2)) {
    keep_playing <- FALSE
  } else {
    i <- i + 1
  }
  
}
turns

# A tibble: 134 × 5
   turn_num start card              shortcut   end
      <int> <dbl> <chr>             <lgl>    <dbl>
 1        1     0 double green      NA          NA
 2        2    NA double red        NA          NA
 3        3    NA double orange     NA          NA
 4        4    NA chocolate truffle NA         117
 5        5   117 double green      NA          NA
 6        6    NA double green      NA          NA
 7        7    NA single green      NA          NA
 8        8    NA gingerbread man   NA          69
 9        9    69 single green      NA          NA
10       10    NA single yellow     NA          NA
# ℹ 124 more rows

So in the above game, we see that we drew candy cards in two of the first ten turns. The end position is saved and the start position of the next turn is the same.

2.5 Adding other cards

Adding the colored cards is slightly less straightforward than the candy cards, but still isn’t too bad. There are two kinds of cards for each color: single cards, which mean you advance to the next tile of that color, and double cards, which mean you advance to the second tile of that color ahead of you. Keep in mind that the cards have already been shuffled and assigned to turns, so all I need to do is take the card and figure out how far I need to go.

So, right now, the cards take the form of a character vector with values like "single red" or "double blue". To extract the color, I’ll use str_extract and pull out the last word. And to get whether it’s a single or double, I’ll do the same thing and pull out the first word.

card_color  <- str_extract(turns$card[[i]], "\\w+\\Z")
card_amount <- str_extract(turns$card[[i]], "\\A\\w+")
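
To make the two patterns concrete, here is what they pull out of a couple of example cards. So that the snippet runs without loading stringr, this sketch swaps str_extract() for base R’s regexpr() and regmatches() with perl = TRUE, which understands the same \A and \Z anchors; for inputs like these the result should be the same.

```r
example_cards <- c("single red", "double blue")

# Last word of each card: the color
card_color <- regmatches(example_cards,
                         regexpr("\\w+\\Z", example_cards, perl = TRUE))

# First word of each card: single or double
card_amount <- regmatches(example_cards,
                          regexpr("\\A\\w+", example_cards, perl = TRUE))

card_color
card_amount
```

The first vector comes back as "red" and "blue", and the second as "single" and "double".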

Now, I need to figure out where to go. I’m less concerned about figuring out the exact number of tiles I need to advance. I think it’ll be easier to select all upcoming tiles of that color and find the tile number of the first or second one. So, first, I’ll create eligible_spots, which takes the board, filters it down to only the tiles that are past the tile I’m on and whose color matches the card I drew, and pulls out just the tile numbers. So if my starting position were 0 and I drew a single red, here are the eligible spots.

start_position <- 0
card_color <- "red"
eligible_spots <- board |>
  filter(tile > start_position,
         color == card_color) |>
  pull(tile)
eligible_spots

 [1]   1   7  14  21  27  33  39  46  52  58  64  71  77  83  89  96 103 109 115
[20] 121 127

Now I need to just take the first one if it’s a single card and then take the second one if it’s a double.

if (card_amount == "single") {
  turns$end[[i]] <- eligible_spots[[1]]
} else {
  turns$end[[i]] <- eligible_spots[[2]]
}
turns$end[[i]]

Great! Now if we just incorporate that bit of code into the main loop, we should be well on our way to a functioning game.

turns <- setup_turns()
i <- 1
keep_playing <- TRUE
while(keep_playing) {
  
  # Get the start tile
  if (i == 1) {
    turns$start[[i]] <- 0
  # Otherwise, start where the last turn ended.
  } else {
    turns$start[[i]] <- turns$end[[i - 1]]
  }
  
  # If it's a candy card, go straight there.
  if (turns$card[[i]] %in% candy_cards) {
    turns$end[[i]] <- board[board$special == turns$card[[i]],]$tile
  # If it's not, find the next colors.
  } else {
    card_color  <- str_extract(turns$card[[i]], "\\w+\\Z")
    card_amount <- str_extract(turns$card[[i]], "\\A\\w+")
    
    # move to the next spot
    eligible_spots <- board |>
      filter(tile > turns$start[[i]],
             color == card_color) |>
      pull(tile)
    
    if (card_amount == "single") {
      turns$end[[i]] <- eligible_spots[[1]]
    } else {
      turns$end[[i]] <- eligible_spots[[2]]
    }
    
  }
  
  # End the game
  if (i >= (n_cards*2)) {
    keep_playing <- FALSE
  } else {
    i <- i + 1
  }
}

Except, if you’re like me, then the loop breaks. After digging around, I found out that the game works perfectly until the end. So let’s add some better code to account for the end.
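
To see why it crashes, it helps to look at what double-bracket indexing does when there aren’t enough eligible tiles. The snippet below is a small illustration with made-up vectors rather than the real board: [[ ]] throws an error when the requested position doesn’t exist, whereas single-bracket indexing quietly returns NA.

```r
# Made-up examples of what eligible_spots can look like near the end
no_spots <- integer(0)   # no tiles of that color left
one_spot <- c(127L)      # only one tile of that color left

# [[ ]] is strict: asking for a position that doesn't exist is an error
first_of_none <- tryCatch(no_spots[[1]], error = function(e) "error")
second_of_one <- tryCatch(one_spot[[2]], error = function(e) "error")
first_of_none
second_of_one

# [ ] is lenient: it returns NA instead of stopping
no_spots[1]
one_spot[2]
```

Both tryCatch() calls fall through to the error handler, which is exactly the situation a single card with no tiles left, or a double card with only one tile left, runs into.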

2.6 Ending the game

Candyland ends when someone reaches the end of the board. The problem with how our loop works now is that if you’re close to the end and there are no more eligible tiles left, it crashes because it doesn’t know where to go. So, what we need to do is add a conditional that says if there are no more eligible tiles, to move it to “tile 133”, which is just one space after the last tile. We also need to account for the possibility of getting a double when there’s just one eligible tile left, which causes the person to win. So let’s program that in. That updated block of code looks like this:

# find the number of eligible spots
n_eligible_spots <- length(eligible_spots)

# regular single card
if (n_eligible_spots >= 1 & card_amount == "single") {
  turns$end[[i]] <- eligible_spots[[1]]
# regular double card
} else if (n_eligible_spots >= 2 & card_amount == "double") {
  turns$end[[i]] <- eligible_spots[[2]]
# no more eligible spots
} else {
  turns$end[[i]] <- 133
}
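
Here is that logic pulled out into a small standalone function so the edge cases can be tried directly. This is a sketch under a couple of assumptions: next_tile is a hypothetical helper name, and the toy board simply cycles through the six colors for 132 tiles, so its tile numbers won’t match the real board (which also has pink candy tiles).

```r
# Toy board (an assumption): 132 tiles cycling through the six colors
colors <- c("red", "purple", "yellow", "blue", "orange", "green")
toy_board <- data.frame(tile  = 1:132,
                        color = rep(colors, length.out = 132),
                        stringsAsFactors = FALSE)

# Hypothetical helper: where does a color card take you from `start`?
next_tile <- function(start, card_color, card_amount, board, finish = 133) {
  eligible_spots <- board$tile[board$tile > start & board$color == card_color]
  # a single needs one eligible tile, a double needs two
  needed <- if (card_amount == "single") 1 else 2
  if (length(eligible_spots) >= needed) {
    eligible_spots[[needed]]
  } else {
    # not enough tiles of that color left: move to the finish
    finish
  }
}

next_tile(0,   "red", "single", toy_board)  # first red tile
next_tile(0,   "red", "double", toy_board)  # second red tile
next_tile(121, "red", "double", toy_board)  # only one red left, so the finish
next_tile(127, "red", "single", toy_board)  # no reds left, so the finish
```

On this toy board the red tiles are 1, 7, 13, and so on up to 127, so the four calls give 1, 7, 133, and 133.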

Once we hit the end of the board, we should end the game. As is, the loop will continue drawing cards until we’ve gone through the deck twice. This may seem like innocent extra iterations of the loop, but the problem is that when we draw another candy card we get pulled back into the game.

So, let’s add some code at the end that says to end the game once we’ve reached the end of the board. That means we’ll extend that if-else statement at the bottom of the loop:

# run out of cards
if (i >= c(n_cards*2)) {
  keep_playing <- FALSE
# win
} else if (turns$end[[i]] >= max(board$tile)) {
  keep_playing <- FALSE
# keep going
} else {
  i <- i + 1
}

Now, the game ends when we run out of cards (programmed as going through the deck twice) or we hit the end of the board. We tell the program that the game is over by setting keep_playing to FALSE. That’ll stop the while loop from doing any more iterations. Here’s the game so far:

turns <- setup_turns()
i <- 1
keep_playing <- TRUE
while(keep_playing) {
  
  # Get the start tile
  if (i == 1) {
    turns$start[[i]] <- 0
  # Otherwise, start where the last turn ended.
  } else {
    turns$start[[i]] <- turns$end[[i - 1]]
  }
  
  # If it's a candy card, go straight there.
  if (turns$card[[i]] %in% candy_cards) {
    turns$end[[i]] <- board[board$special == turns$card[[i]],]$tile
  # If it's not, find the next colors.
  } else {
    card_color  <- str_extract(turns$card[[i]], "\\w+\\Z")
    card_amount <- str_extract(turns$card[[i]], "\\A\\w+")
    
    # move to the next spot
    eligible_spots <- board |>
      filter(tile > turns$start[[i]],
             color == card_color) |>
      pull(tile)
    
    # find the number of eligible spots
    n_eligible_spots <- length(eligible_spots)
    
    # regular single card
    if (n_eligible_spots >= 1 & card_amount == "single") {
      turns$end[[i]] <- eligible_spots[[1]]
    # regular double card
    } else if (n_eligible_spots >= 2 & card_amount == "double") {
      turns$end[[i]] <- eligible_spots[[2]]
    # no more eligible spots
    } else {
      turns$end[[i]] <- 133
    }
    
  }
  
  # run out of cards
  if (i >= c(n_cards*2)) {
    keep_playing <- FALSE
  # win
  } else if (turns$end[[i]] >= max(board$tile)) {
    keep_playing <- FALSE
  } else {
    i <- i + 1
  }
  
}
turns

# A tibble: 134 × 5
   turn_num start card          shortcut   end
      <int> <dbl> <chr>         <lgl>    <dbl>
 1        1     0 single orange NA           5
 2        2     5 double yellow NA          16
 3        3    16 single green  NA          19
 4        4    19 double green  NA          32
 5        5    32 double yellow NA          41
 6        6    41 single green  NA          45
 7        7    45 single blue   NA          49
 8        8    49 double green  NA          57
 9        9    57 single blue   NA          61
10       10    61 single green  NA          63
# ℹ 124 more rows

Because we’ve allocated room in our turns dataframe for many more turns, what you’ll see if you scroll down to the bottom is a bunch of NAs in the start and end tiles. We’ll filter those out later. But first, we have to add one more component to the game: the shortcuts!

2.7 Adding shortcuts

The shortcuts are pretty easy to program. In fact, they’re the same as what I did for Chutes and Ladders. We add a conditional that says if the end tile is a certain number, change it to another number. Since there are only two, I’ll just hard-code it in. Here’s what that bit of code looks like.

# Do the shortcuts.
if (turns$end[[i]] == 4) {
  turns$end[[i]] <- 60
  turns$shortcut[[i]] <- "peppermint pass"
} else if (turns$end[[i]] == 29) {
  turns$end[[i]] <- 41
  turns$shortcut[[i]] <- "gummy pass"
}

In order to keep track of which shortcuts were encountered, I added a small bit of code that saves the name of the shortcut to the shortcut column. That’ll come in handy later when we do lots of games and query the simulations.
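
With only two shortcuts, hard-coding is perfectly fine. If a board had more of them, one alternative would be a small lookup table. The sketch below is just that idea written out: the from, to, and name values are the two shortcuts from the code above, while the shortcuts table and the apply_shortcut helper are hypothetical names.

```r
# Shortcut table, filled in with the two shortcuts hard-coded above
shortcuts <- data.frame(from = c(4, 29),
                        to   = c(60, 41),
                        name = c("peppermint pass", "gummy pass"),
                        stringsAsFactors = FALSE)

# Hypothetical helper: returns the final tile and the shortcut name (or NA)
apply_shortcut <- function(tile, shortcuts) {
  row <- match(tile, shortcuts$from)
  if (is.na(row)) {
    # not a shortcut tile: stay put
    list(end = tile, shortcut = NA_character_)
  } else {
    list(end = shortcuts$to[[row]], shortcut = shortcuts$name[[row]])
  }
}

apply_shortcut(29, shortcuts)  # gummy pass: 29 becomes 41
apply_shortcut(30, shortcuts)  # ordinary tile: unchanged
```

Inside the loop, the two list elements would be written to turns$end[[i]] and turns$shortcut[[i]].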

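Now our code is nearly complete. Note that the loop shown below doesn’t yet include the shortcut block, which is why landing on tile 29 in the output doesn’t trigger the Gummy Pass; the block gets slotted in right before the game-over check in the full function in the next section: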

turns <- setup_turns()
i <- 1
keep_playing <- TRUE
while(keep_playing) {
  
  # Get the start tile
  if (i == 1) {
    turns$start[[i]] <- 0
  # Otherwise, start where the last turn ended.
  } else {
    turns$start[[i]] <- turns$end[[i - 1]]
  }
  
  # If it's a candy card, go straight there.
  if (turns$card[[i]] %in% candy_cards) {
    turns$end[[i]] <- board[board$special == turns$card[[i]],]$tile
    
  # If it's not, find the next colors.
  } else {
    card_color  <- str_extract(turns$card[[i]], "\\w+\\Z")
    card_amount <- str_extract(turns$card[[i]], "\\A\\w+")
    
    # move to the next spot
    eligible_spots <- board |>
      filter(tile > turns$start[[i]],
             color == card_color) |>
      pull(tile)
    
    # find the number of eligible spots
    n_eligible_spots <- length(eligible_spots)
    
    # regular single card
    if (n_eligible_spots >= 1 & card_amount == "single") {
      turns$end[[i]] <- eligible_spots[[1]]
    # regular double card
    } else if (n_eligible_spots >= 2 & card_amount == "double") {
      turns$end[[i]] <- eligible_spots[[2]]
    # no more eligible spots
    } else {
      turns$end[[i]] <- 133
    }
    
  }
  
  # run out of cards
  if (i >= c(n_cards*2)) {
    keep_playing <- FALSE
  # win
  } else if (turns$end[[i]] >= max(board$tile)) {
    keep_playing <- FALSE
  } else {
    i <- i + 1
  }
  
}
turns

# A tibble: 134 × 5
   turn_num start card          shortcut   end
      <int> <dbl> <chr>         <lgl>    <dbl>
 1        1     0 double orange NA          12
 2        2    12 double yellow NA          23
 3        3    23 single yellow NA          29
 4        4    29 single green  NA          32
 5        5    32 gummy star    NA          42
 6        6    42 single yellow NA          48
 7        7    48 single red    NA          52
 8        8    52 single yellow NA          54
 9        9    54 single purple NA          59
10       10    59 double green  NA          70
# ℹ 124 more rows

So, we now have a loop that does a full simulation of Candyland!

But, if you’re like me, just one simulation isn’t enough. I want to run lots of simulations! And since we’ve got this nice and tidy, might as well do just a little bit more and wrap it up into a function! Let’s do that now.

2.8 Wrap it up into a function

Turning this into a function is actually pretty straightforward from here. All we need to do is put this entire chunk of code into a function and then make sure we’re exporting the final dataset. Here’s where I’ll filter out the NA turns at the end of the game. I’ve also added the argument game_num = 0 to the function call, mostly to help with the map function later on, since I can’t figure out how to map through a list without sending an argument.

simulate_game <- function(game_num = 0) {
  
  # put everything we've done so far here
  
  turns %>%
    filter(turn_num <= i) %>%
    return()
}
simulate_game()

So, to see that in context, here is the entire function from start to finish.

simulate_game <- function(game_num = 0) {
  
  # Declare space for the full game.
  turns <- setup_turns()
  
  # Loop until the game is over
  i <- 1
  keep_playing <- TRUE
  while(keep_playing) {

    # Step 1: Start at zero
    if (i == 1) {
      turns$start[[i]] <- 0

    # Otherwise, start where the last turn ended.
    } else {
      turns$start[[i]] <- turns$end[[i - 1]]
    }

    # Step 2: This is where the game actually happens.
    # If it's a candy card, go straight there.
    if (turns$card[[i]] %in% candy_cards) {
      turns$end[[i]] <- board[board$special == turns$card[[i]],]$tile
      
    # If it's not, find the next colors.
    } else {
      card_color  <- str_extract(turns$card[[i]], "\\w+\\Z")
      card_amount <- str_extract(turns$card[[i]], "\\A\\w+")

      # move to the next spot
      eligible_spots <- board |>
        filter(tile > turns$start[[i]],
               color == card_color) |>
        pull(tile)
      # find the number of eligible spots
      n_eligible_spots <- length(eligible_spots)
      
      # regular single card
      if (n_eligible_spots >= 1 & card_amount == "single") {
        turns$end[[i]] <- eligible_spots[[1]]
      # regular double card
      } else if (n_eligible_spots >= 2 & card_amount == "double") {
        turns$end[[i]] <- eligible_spots[[2]]
      # no more eligible spots
      } else {
        turns$end[[i]] <- 133
      }
    }
    
    # Step 3: Do the shortcuts.
    if (turns$end[[i]] == 4) {
      turns$end[[i]] <- 60
      turns$shortcut[[i]] <- "peppermint pass"
    } else if (turns$end[[i]] == 29) {
      turns$end[[i]] <- 41
      turns$shortcut[[i]] <- "gummy pass"
    }
  
    # Step 4: Check if it's game over.
    # run out of cards
    if (i >= c(n_cards*2)) {
      keep_playing <- FALSE
    # win
    } else if (turns$end[[i]] >= max(board$tile)) {
      keep_playing <- FALSE
    } else {
      i <- i + 1
    }
  }

  turns %>%
    filter(turn_num <= i) %>%
    return()
}
simulate_game()

# A tibble: 42 × 5
   turn_num start card          shortcut   end
      <int> <dbl> <chr>         <lgl>    <dbl>
 1        1     0 double red    NA           7
 2        2     7 double purple NA          15
 3        3    15 double green  NA          26
 4        4    26 single purple NA          28
 5        5    28 double yellow NA          35
 6        6    35 double blue   NA          43
 7        7    43 double yellow NA          54
 8        8    54 single red    NA          58
 9        9    58 single red    NA          64
10       10    64 single blue   NA          67
# ℹ 32 more rows

2.9 Simulate lots of games!

We can now call simulate_game() as many times as we want, and it’ll do a new simulation each time. Let’s set up a dataframe so that we can save the output of each game. Here, I’ll create a dataframe that has one column called game_num that is just the numbers 1 to 10,000. I’ll then use purrr::map to run the simulation once for each row. This takes about 1.8 minutes on my computer, so be patient if it’s a little slow for you. (Or just decrease the number of games.)

games <- tibble(game_num = 1:10000) |>
  mutate(game = map(game_num, simulate_game)) |>
  unnest(cols = c(game))
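
If the unused game_num argument seems odd, one alternative (a sketch only, not what is used for the results in this post) is to wrap the call in an anonymous function that ignores its input, or to use base R’s replicate(). The toy_game() function below is a made-up stand-in for simulate_game() so that the snippet runs on its own.

```r
# Made-up stand-in for simulate_game(): takes no arguments and
# returns a small data frame of "turns"
toy_game <- function() {
  n_turns <- sample(3:6, 1)
  data.frame(turn_num = seq_len(n_turns))
}

set.seed(1)

# Option 1: an anonymous function that ignores the game number
games_list <- lapply(1:3, function(game_num) toy_game())

# Option 2: replicate() re-evaluates the expression each time
games_list2 <- replicate(3, toy_game(), simplify = FALSE)

# Stack the games from option 1, adding the game number as a column
games <- do.call(rbind, Map(function(g, n) cbind(game_num = n, g),
                            games_list, seq_along(games_list)))
head(games)
```

With purrr, the first option would look like map(game_num, \(x) simulate_game()), which lets the function itself drop the placeholder argument.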

For what it’s worth, when I have code blocks that I know will take a long time to run, I wrap them up in some code to track the time, so I can remember that next time I run it. I also add beepr::beep() to the bottom so that my computer makes a satisfying sound once the block is completed. I’ll also set the seed to today’s date to make all this replicable. You can do this if you want, but you obviously don’t need to.

# Takes 1.7ish minutes for 10K simulations.
start_time <- Sys.time()
set.seed(250312)
games <- tibble(game_num = 1:10000) |>
  mutate(game = map(game_num, simulate_game)) |>
  unnest(cols = c(game))
beepr::beep()
Sys.time() - start_time

Time difference of 1.639684 mins

Hooray! We now have a function that simulates an entire game of Candyland, and with just a few additional lines of code and a little bit of patience, we were able to run that simulation thousands of times. Let’s take a look at the output, just to see what we’re working with:

games

# A tibble: 224,150 × 6
   game_num turn_num start card          shortcut   end
      <int>    <int> <dbl> <chr>         <chr>    <dbl>
 1        1        1     0 single purple <NA>         2
 2        1        2     2 double orange <NA>        12
 3        1        3    12 single blue   <NA>        17
 4        1        4    17 single yellow <NA>        23
 5        1        5    23 single green  <NA>        26
 6        1        6    26 double red    <NA>        33
 7        1        7    33 single red    <NA>        39
 8        1        8    39 single yellow <NA>        41
 9        1        9    41 double green  <NA>        51
10        1       10    51 double orange <NA>        62
# ℹ 224,140 more rows

We’ve got the same five columns that we’ve been used to working with, with the addition of one more, showing the game number. In total, there were 224,150 turns taken across these 10,000 games (one row per turn). So, you might be able to see where this is going: we can now query this giant spreadsheet to see patterns across the games.

3 Results

So, now that we have the simulation done, let’s take a look at the results! First, I’ll take the giant games object and summarize it. I don’t necessarily need to keep information about every single turn, but I can start to count some things that we can think about analyzing later. For now, I’ll count the number of turns, candies, singles, doubles, and shortcuts, and I’ll do so by game.

games_summary <- games %>%
  summarize(turns = max(turn_num),
            n_candies = sum(card %in% candy_cards),
            n_singles = sum(str_detect(card, "single")),
            n_doubles = sum(str_detect(card, "double")),
            n_shortcuts = sum(!is.na(shortcut)),
            .by = game_num) |> 
  print()

# A tibble: 10,000 × 6
   game_num turns n_candies n_singles n_doubles n_shortcuts
      <int> <int>     <int>     <int>     <int>       <int>
 1        1    20         0        10        10           0
 2        2    16         4         8         4           0
 3        3    25         2        16         7           1
 4        4     7         1         5         1           0
 5        5    23         0        16         7           1
 6        6    29         4        18         7           0
 7        7    13         1         8         4           0
 8        8    18         1        11         6           0
 9        9    27         2        15        10           0
10       10    17         3         9         5           0
# ℹ 9,990 more rows

We now have a spreadsheet with one row per game and a count of how many times each of those things happened in it. We’ll do some additional queries of the games dataset later, but for now let’s work with that.

3.1 Number of turns

First, let’s look at the number of turns. This plot shows how many turns it took to complete each game.

ggplot(games_summary, aes(turns)) + 
  geom_histogram(binwidth = 1, fill = "#6e6cff") + 
  scale_x_continuous(breaks = seq(0, 200, 10)) + 
  scale_y_continuous(expand = expansion(0, 5)) + 
  labs(title = "Number of turns required to finish a solo game of Candyland",
       subtitle = "Based on 10,000 simulated games") + 
  theme_minimal(base_size = 14, base_family = "Avenir")

This is an interesting distribution. I’ll bet a statistician could identify what kind of probability distribution it looks like but it’s not one that I recognize. Broadly, we see that the typical number of turns is about 12–22. There’s an interesting double peak at the top there, one at around 13–14 turns and another centered around 20. I’m not sure what’s going on there, but it seems to show up every time I run these simulations, so there must be something real that is causing that distribution. (Edit: I found out what’s happening! See Section 4.2!)

Beyond about 22 turns, the number of games drops off steadily as the turn count goes up. It looks like a nice exponential decay. Keep in mind that there are 67 cards in the deck, which appears to be about the limit. A small number of games (just 28 out of 10,000) took more than 67 turns. So, if you’re playing a solo game and have to reshuffle the deck, that’s a pretty rare thing!

The highest number of turns I saw in this simulation was a whopping 114. It was an unfortunate sequence of going for a long time without getting any candy cards (or getting ones that were close to where they already were), nearly making it to the end, and then getting yanked back to close to the beginning. It happened over and over. There was one exciting set of turns (about 60 turns in) where they got four candies in just 10 turns. They were right near the end, got the beloved Chocolate Truffle and moved ahead six tiles, then immediately got pulled back to the Lollipop, made it a few turns, and then when the deck reshuffled they got pulled back by the Gummy Star, and three turns later pulled back again to the Cupcake.

The fewest number of turns was three, which happened 38 times in my 10,000 games. My five-year-old had this happen to him just the other day—I didn’t realize there was a 0.38% chance of that happening!² In all cases, these three-turn games got the Chocolate Truffle on their first turn, and at least one double-color card after that.

² Again, a statistician could probably figure out the precise probability, but I figure these numbers based on the simulation are close.
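For a rough sense of how much to trust that 0.38%, here’s a minimal sketch that puts an exact binomial confidence interval around it. It only uses the counts reported above (38 three-turn games out of 10,000) and base R’s binom.test(). It assumes each simulated game is an independent trial, and it says nothing about the true analytic probability.

```r
# Counts taken from the text above: 38 three-turn games in 10,000 simulations.
three_turn_games <- 38
total_games <- 10000

# Exact (Clopper-Pearson) 95% confidence interval for the underlying proportion.
ci <- binom.test(x = three_turn_games, n = total_games)$conf.int

# Lower and upper bounds, expressed as percentages.
round(100 * ci, 2)
```

If the interval is narrow relative to 0.38%, then the simulated estimate is probably a decent stand-in for the exact probability.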

3.2 Single-color cards

Let’s take a look at the single-color cards. What is the distribution of those cards across these simulations?

See code

games_summary |> 
  count(n_singles) |> 
  mutate(prop = n/sum(n)) |> 
  ggplot(aes(n_singles, prop)) + 
  geom_col(width = 1, fill = "#6e6cff") + 
  scale_x_continuous(breaks = seq(0, 200, 10)) + 
  scale_y_continuous(expand = expansion(0, 0),
                     breaks = seq(0, 0.1, 0.02),
                     labels = scales::percent) + 
  labs(title = "Number of single-color cards drawn in a game of Candyland",
       subtitle = "Based on 10,000 simulated games",
       x = "number of single-color cards drawn",
       y = "percent of games") + 
  theme_minimal(base_size = 14, base_family = "Avenir")

Because they’re the most common kind of card, it should come as no surprise that the distribution of the number of single-color cards you draw parallels the total number of cards drawn in the game. We see a hint of that mysterious double peak like we saw in the previous plot, but the first peak is more prominent instead of the second. Overall though, the general shape is the same as the number of turns in the game, only this time it peaks at around 9 rather than 20.

47 games saw zero single-color cards. Most of these involved drawing the Chocolate Truffle or Popsicle on the first turn and then getting two or three double-color cards until the end. A few got some double-color cards, then drew one of those candies, and then continued to get doubles. Three games got some other candy card (or two) and otherwise “legitimately” made it across the board with just the double-color cards.

3.3 Double-color cards

Now, let’s focus on the double-color cards.

See code

games_summary |> 
  count(n_doubles) |> 
  mutate(prop = n/sum(n)) |> 
  ggplot(aes(n_doubles, prop)) + 
  geom_col(fill = "#6e6cff", width = 1) + 
  scale_x_continuous(breaks = seq(0, 200, 5)) + 
  scale_y_continuous(expand = expansion(0, 0),
                     breaks = seq(0, 0.1, 0.02),
                     labels = scales::percent) + 
  labs(title = "Number of double-color cards drawn in a game of Candyland",
       subtitle = "Based on 10,000 simulated games",
       x = "number of double-color cards drawn",
       y = "percent of games") + 
  theme_minimal(base_size = 14, base_family = "Avenir")

This distribution is starting to look familiar now. The most likely outcome is that you’ll see roughly 5–10 double-color cards per game. A few games saw far more, more than 20 or even 25, but the odds of that are quite low. Like the single-color cards, there were a few games that saw no double-color cards. Most of those got a candy pretty early on and then finished the game without seeing any doubles.

3.4 Shortcuts

There are two shortcuts in the game: the peppermint pass, which is available right at the start of the game and takes you ahead about a third of the way, and the gummy pass, which is a short pass again towards the beginning of the game. How many shortcuts are typically seen in a game?

See code

games_summary |> 
  count(n_shortcuts) |> 
  mutate(prop = n/sum(n)) |> 
  ggplot(aes(n_shortcuts, prop)) + 
  geom_col(fill = "#6e6cff") + 
  scale_y_continuous(expand = expansion(0, 0),
                     labels = scales::percent) +
  labs(title = "Number of shortcuts in a solo game of Candyland",
       subtitle = "Based on 10,000 simulated games",
       x = "shortcuts",
       y = "percentage of games") + 
  theme_minimal(base_size = 14, base_family = "Avenir")

This figure shows that most of the time, about 73.4%, you’re not going to see a shortcut. About 25.2% of games saw one shortcut.

Interestingly, you can’t take both shortcuts without a candy card taking you back close to the beginning because the Peppermint Pass starts before and ends after the Gummy Pass. So, you need to get the Cupcake or Ice Cream Cone to bring you back. And even then, you’d only be able to take the Gummy Pass because once you’ve passed the beginning of the Peppermint Pass, there’s no chance of it again. Nevertheless, 1.35% of games took two shortcuts. In about 60.4% of those cases, the first shortcut was the Gummy Pass.

In two cases (just 0.02% of the time!), there were three shortcuts taken in a single game! In one case, they took the Gummy Pass three times. They got it their first time passing through, again after getting the Cupcake, and again after getting the Ice Cream Cone. In the other case, they took the Peppermint Pass the first time, and then again hit the Gummy Pass after getting the Cupcake and again after getting the Ice Cream Cone.

Three is the maximum unless you go through the deck a second time. But given how rarely that happens, the odds of getting three shortcuts, going through the entire deck, and then getting either the Cupcake or Ice Cream Cone and getting the Gummy Pass yet again are very slim. But not impossible!

Finally, what are the relative odds of the two shortcuts in relation to each other? Surprisingly, the Peppermint Pass is not as rare as I thought. Of all the shortcuts taken, it made up 41% of them. So, the Gummy Pass is about 1.4 times as likely. (I thought that number would be a lot higher!)

3.5 Candy Cards

Now let’s look at the number of candy cards that a person might encounter in a typical game of Candyland. Right now, we’re not so much concerned about which card but rather just the total number they see. This plot shows the distribution across the simulated games. I’ve added percentages at the top because we had some very small bars towards the right of the plot.

See code

games_summary |> 
  count(n_candies) |> 
  mutate(prop = n/sum(n)) |> 
  ggplot(aes(n_candies, prop)) + 
  geom_col(fill = "#6e6cff") + 
  geom_text(aes(label = scales::percent(prop)),
            nudge_y = 0.01) + 
  scale_x_continuous(breaks = seq(0, 200, 2),
                     minor_breaks = 1:100,
                     expand = expansion(0, 0.1)) + 
  scale_y_continuous(expand = expansion(0, 0.01),
                     labels = scales::percent) +
  labs(title = "Number of candy cards drawn in a game of Candyland",
       subtitle = "Based on 10,000 simulated games",
       x = "Number of candies") + 
  theme_minimal(base_size = 14, base_family = "Avenir")

So, it looks like in about 91–92% of games, you’ll draw at least one candy card. The most typical outcome was just one card, but drawing up to four was not too uncommon. The most I saw in my simulations was 13 candy cards, but that happened exactly one time. Keep in mind that there are only seven candy cards in the deck, so drawing that many would involve getting through the entire deck without finishing, and then drawing six of the seven cards again before finishing. In fact, just 0.1% of games drew eight or more, meaning they went through the entire deck.

But that’s if we’re considering all the candy cards collectively. Let’s take a look at the candy cards individually to see if they have their own patterns.

Note

This next bit of discussion gets into probability distributions and statistics a little bit.

This plot shows the percentage of times each candy card was drawn.

See code

candycard_distributions <- games |> 
  filter(card %in% candy_cards) |> 
  count(card) |> 
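  mutate(prop = n/sum(n),
         # Reorder the existing card column. count() sorts its rows by card,
         # so overwriting the column with candy_cards could mislabel the bars.
         card = factor(card, 
                       levels = c("cupcake", "ice cream cone",
                                  "gummy star", "gingerbread man",
                                  "lollipop", "popsicle",
                                  "chocolate truffle")))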
ggplot(candycard_distributions, aes(card, prop)) + 
  geom_col(fill = "#6e6cff") + 
  scale_y_continuous(expand = expansion(0, 0),
                     limits = c(0, 0.1675),
                     labels = scales::percent) +
  geom_hline(yintercept = 1/7, linetype = "dashed", color = "gray40") + 
  labs(title = "Distribution of candy cards across 10,000 simulated games",
       x = "candy card",
       y = "percentage") + 
  theme_minimal(base_size = 14, base_family = "Avenir")

You would expect the candies to be pretty evenly distributed, and they are pretty close. However, there is more variation between the cards than I expected. I’ve overlaid a gray dashed line at 14.28% (one-seventh), showing what the expected height of each bar would be if they were all drawn evenly. Some of these are a bit higher and lower than I expected. The Cupcake, for example, was drawn 16.34% of the time (about two percentage points more than expected), and the Ice Cream Cone and Lollipop were each drawn 13.01% of the time, more than one percentage point lower than expected.

You might be thinking, “Who cares? A percent or two is basically nothing. It’s just random chance.” I’m not so sure. Here’s what the distribution would be if it were truly random, plotted on the same scale as the above plot. I’ve randomly chosen a number between one and seven 10,000 times and plotted the results:

See code

set.seed(250313)
tibble(num = sample(1:7, 10000, replace = TRUE)) |> 
  count(num) |> 
  mutate(prop = n/sum(n),
         diff = prop - 1/7) |> 
  arrange(prop) |> 
  ggplot(aes(as.factor(num), prop)) + 
  geom_col(fill = "#6e6cff") + 
  scale_y_continuous(expand = expansion(0, 0),
                     limits = c(0, 0.1675),
                     breaks = seq(0, 1, 0.04),
                     labels = scales::percent) +
  geom_hline(yintercept = 1/7, linetype = "dashed", color = "gray40") + 
  labs(title = "Distribution of \"Pick a number between one and seven\" 10,000 times",

       x = "number", 
       y = "percentage") + 
  theme_minimal(base_size = 14, base_family = "Avenir")

These numbers are all much closer to the predicted value of 14.28%. None are further than 0.6% away. You might be thinking, “Okay, so you got lucky one time. The deviation observed in the candy cards is well within the range of possible values.” But still, I don’t think it is! I repeated the above experiment (pick a number between one and seven 10,000 times and then get the proportions of each number) 10,000 times. So I have 100 million data points. Below, I’ve plotted the distribution of those proportions from the 10,000 experiments (where each experiment had 10,000 random draws).

See code

set.seed(250313)
tenthousand_draws <- tibble(iteration = 1:10000) |> 
  rowwise() |> 
  mutate(data = list(tibble(num = sample(1:7, 10000, replace = TRUE)))) |> 
  unnest(data) |> 
  count(iteration, num) |> 
  mutate(prop = n/sum(n), .by = iteration)
  
ggplot(tenthousand_draws, aes(prop)) + 
  geom_histogram(binwidth = 0.001, fill = "#6e6cff") + 
  geom_vline(xintercept = 1/7) +
  scale_x_continuous(labels = scales::percent) + 
  facet_wrap(~num) + 
  labs(title = "What happens if you pick a number between one and seven 10,000 times, tally up how many\nof each number you get, and repeat that 10,000 times? This plot shows the distribution of\nthose 10,000 tallies.",
       subtitle = "They basically all look like the same bell curve.") +
  theme_bw(base_size = 14, base_family = "Avenir") + 
  theme(plot.title = element_text(size = 14),
        plot.subtitle = element_text(size = 12))

tenthousand_draws |> 
  summarize(mean = mean(prop),
            sd   = sd(prop),
            .by = num) |> 
  mutate(mean_mean = mean(mean),
         mean_sd   = mean(sd)) |> 
  mutate(mean_diff = mean - mean(mean),
         sd_diff   = sd - mean(sd)) |> 
  arrange(mean_diff)

# A tibble: 7 × 7
    num  mean      sd mean_mean mean_sd   mean_diff    sd_diff
  <int> <dbl>   <dbl>     <dbl>   <dbl>       <dbl>      <dbl>
1     4 0.143 0.00345     0.143 0.00349 -0.0000305  -0.0000343
2     7 0.143 0.00346     0.143 0.00349 -0.0000281  -0.0000244
3     6 0.143 0.00353     0.143 0.00349 -0.00000965  0.0000461
4     3 0.143 0.00345     0.143 0.00349 -0.00000472 -0.0000344
5     5 0.143 0.00350     0.143 0.00349  0.00000650  0.0000146
6     2 0.143 0.00354     0.143 0.00349  0.0000250   0.0000485
7     1 0.143 0.00347     0.143 0.00349  0.0000415  -0.0000160

Unsurprisingly, those seven distributions are basically identical. Their means are 0.14285 ± 0.000041 and their standard deviations are 0.003489 ± 0.000049. I’ll therefore collapse the seven numbers together and treat them as a single distribution. That’s visualized below, along with the observed proportion for each of the candy cards.
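As a sanity check on that standard deviation, here’s a minimal sketch comparing it with the textbook formula for the standard deviation of a sample proportion, sqrt(p(1 − p)/n), with p = 1/7 and n = 10,000. The second half re-runs a smaller version of the experiment with base R’s rmultinom(). The seed and the 2,000 repetitions are arbitrary choices for illustration, not what I used above.

```r
p <- 1 / 7     # probability of any one of seven equally likely outcomes
n <- 10000     # draws per experiment

# Theoretical standard deviation of a sample proportion.
sd_theory <- sqrt(p * (1 - p) / n)
sd_theory      # about 0.0035

# Smaller re-run of the experiment: each column is one experiment of n draws,
# each row is the tally for one of the seven numbers.
set.seed(1)    # arbitrary seed, for reproducibility only
counts <- rmultinom(2000, size = n, prob = rep(p, 7))

# Empirical standard deviation of the proportion for the first number.
sd_sim <- sd(counts[1, ] / n)
sd_sim
```

The formula and the simulation should land very close to the 0.00349 in the table, which suggests the bell curves above are just what the binomial distribution predicts.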

See code

tenthousand_draws |> 
  mutate(prop_rounded = round(prop, 3)) |> 
  count(prop_rounded) |> 
  mutate(prop = n/sum(n)) |>
  ggplot(aes(prop_rounded, prop)) + 
  geom_col(fill = "#6e6cff") +
  geom_vline(xintercept = 1/7) +
  geom_vline(data = candycard_distributions,
             aes(xintercept = prop), color = "#ff7575") +
  ggrepel::geom_text_repel(data = candycard_distributions,
            aes(x = prop, label = card),
            seed = 250318,
            angle = 90, direction = "y") +
  scale_x_continuous(labels = scales::percent) +
  labs(title = "Distribution of seven candy cards compared to their expected distributions",
       subtitle = "Based on 10,000 simulated games",
       x = "expected proportion of draws",
       y = "probability of expected draws") +
  theme_minimal(base_size = 14, base_family = "Avenir")

One candy card (the Gummy Star) is well within the expected range. The Popsicle is on the edge of the distribution, but still a plausible value. But the Cupcake was very far from the expected value. The z-score was 5.688, which corresponds to a p-value of about 0.00000001. So the odds of drawing these cards as frequently (or as infrequently) as we did, assuming they’re equally probable, are quite small. I have no explanation for why that is, but I think it’s an intriguing finding here. It’s also intriguing that the Ice Cream Cone and the Lollipop are consistently drawn at around the same rate, and far lower than expected.
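For anyone who wants to go from a z-score to a p-value themselves, here’s a minimal sketch using base R’s pnorm(). It assumes a normal approximation and a two-sided test, so treat it as one reasonable way to do the conversion.

```r
# z-score reported above for the Cupcake.
z <- 5.688

# Two-sided p-value under a standard normal distribution:
# the probability of landing at least this far from the mean in either direction.
p_two_sided <- 2 * pnorm(-abs(z))
p_two_sided
```

Whichever flavor of test you prefer, a z-score that large puts the result far outside what random shuffling alone would produce.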

Finally, when I did this whole thing again based on a different random seed, I got strikingly similar results. There’s something here and I’m not sure what it is!

3.6 Gains and Losses

One of the biggest questions I have still has to do with the candy cards. It’s often the case that when you get a candy card, it’s an exciting thing because you jump ahead a lot. But, it also seems like a lot of the time it’s a bad thing because you fall back quite a bit.³ So, is it overall better to get a candy card, or is it on average worse for your game?

³ Fortunately, my five-year-old doesn’t care either way and is just excited to get a candy card!

See code

diffs <- games |> 
  filter(card %in% candy_cards) |> 
  mutate(diff = end - start)

ggplot(diffs, aes(diff)) + 
  geom_histogram(binwidth = 1, fill = "#6e6cff") + 
  scale_x_continuous(breaks = seq(-200, 200, 20),
                     minor_breaks = seq(-200, 200, 10)) +
  scale_y_continuous(expand = expansion(0, 5)) + 
  labs(title = "How far ahead or behind does a candy card take you?",
       subtitle = "Based on 10,000 simulated games of Candyland",
       x = "tiles advanced") + 
  theme_minimal(base_size = 14, base_family = "Avenir")

Okay, so there’s a lot to unpack here. First off, we see that it’s centered right around zero with similar-looking distributions on either side. The biggest jumps ahead were when a Chocolate Truffle was drawn right at the beginning of the game. The biggest fallbacks were when the Cupcake was drawn just before the end of the game. The dreaded scenario of drawing a Cupcake while sitting on the last tile happened 24 times!

The next thing I see in this plot is a few random spikes on the right side of the plot. Those are cases where you draw a candy right at the start of the game.

Overall, the average number of tiles advanced across all these candy card draws was 1.26, with a median of 5. If we ignore the first turn of each game (since that seems to have caused those spikes), the average drops to -1.62 and the median is just 2 tiles. So, candy cards are sometimes good and sometimes bad, but overall it really is just a wash.

3.7 Most likely tiles

Finally, the last thing I’ll explore with these simulated games is the tiles on the board. Which ones are landed on the most? This figure below shows how many times each tile was landed on across the 10,000 games.

See code

games |> 
  filter(end != 133) |> 
  ggplot(aes(end)) + 
  geom_histogram(binwidth = 1, fill = "#6e6cff") + 
  scale_x_continuous(breaks = seq(0, 200, 10),
                     expand = expansion(0, 0)) +
  scale_y_continuous(expand = expansion(0, 5)) + 
  labs(title = "Most likely landed-on tiles in a game of Candyland",
       subtitle = "Based on 10,000 simulated games",
       x = "tile number",
       y = "times landed on") + 
  theme_minimal(base_size = 14, base_family = "Avenir")

The patterns here make a lot of sense. First, there are 11 tiles that are about twice as high. Seven of those are candy tiles and two of them are the ends of the shortcuts. It makes sense why the ends of the shortcuts are twice as high because you’ve got twice the odds of landing on them: once from taking the shortcut and once by approaching the end point the long way. The candy tiles are twice as high because they’re about as likely as taking the shortcuts.

You’ll also notice two tiles that were never landed on. Those are the beginnings of the shortcuts. So, technically you do land on them, but you don’t end your turn on them.

The other two tiles that are landed on the most are two yellow tiles right at the end of the game. Why would those be landed on more than other tiles? As it turns out, I think it’s an error in how the game is laid out. Normally, when a candy tile is placed, it interrupts the sequence of colors across the board. However, in the case of the Chocolate Truffle, it actually takes the place of a yellow tile. Normally, if you’re within six tiles of a yellow tile (or any color), and draw a single color, you’ll land on that. But because that yellow was skipped, now if you’re within 12 tiles of that penultimate yellow, you’ll land there. And if you’re further back and get a double yellow, you’ll land there. So, the odds are about twice as great because a yellow tile was skipped. That very last spike towards the end is the subsequent yellow, which you’d get if you drew a double yellow within five tiles of the Chocolate Truffle (on either side).
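To make the skipped-yellow effect concrete, here’s a minimal sketch on a toy board. This is not the real Candyland board: it’s just the six-color cycle repeated four times, with one yellow swapped out for a made-up "pink" candy tile. The helpers next_tile() and count_landings() are hypothetical names used only for this illustration.

```r
# Toy board: the six-color cycle repeated four times (24 tiles).
cycle <- c("red", "purple", "yellow", "blue", "orange", "green")
normal_board  <- rep(cycle, 4)     # yellow on tiles 3, 9, 15, and 21
skipped_board <- normal_board
skipped_board[15] <- "pink"        # a candy tile takes the place of a yellow

# Hypothetical helper: the first tile after `start` with the given color.
next_tile <- function(board, start, color) {
  hits <- which(board == color & seq_along(board) > start)
  if (length(hits) == 0) NA_integer_ else hits[[1]]
}

# How many starting positions send a single yellow card to the target tile?
count_landings <- function(board, target = 21) {
  starts <- 0:(length(board) - 1)
  ends <- vapply(starts, function(s) next_tile(board, s, "yellow"), integer(1))
  sum(ends == target, na.rm = TRUE)
}

count_landings(normal_board)   # 6
count_landings(skipped_board)  # 12
```

On the toy board, replacing one yellow doubles the number of starting positions that feed into the next yellow, which is the same mechanism that I think produces the tall yellow bars near the end of the real board.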

See the board again if you’d like

The other kind of intriguing pattern is that the number of times a tile is landed on decreases the further away one gets from a candy and shortcut tile. Since those tiles are the most likely, it makes sense then that the tiles immediately following them are more common. The further away you get from them though, the odds go up that you’ll see another candy card or something, pulling you away from that part of the board. The gap between tile 20 (Ice Cream Cone) and 41 (the end of the Gummy Pass followed by the Gummy Star) and the gap between tile 69 (Gingerbread Man) and 92 (Lollipop) show this especially well.

3.8 Summary

So far, we’ve seen a lot of patterns in Candyland based on 10,000 simulated solo games. Most of the results are not too surprising, but it’s nice to put some numbers to things, like the number of turns it takes, how many candy cards you’re likely to draw, whether candy cards are ultimately good or bad for your game, and how much more common one shortcut is compared to the other.

4 Changes

Since we’ve got this all simulated, I can actually pretty easily change a few things about the game to see what kind of effect it might have on the results. I did this last time with my simulation of Chutes and Ladders and found that if you remove the longest chute, sure enough the average game length was shorter. What kinds of changes can we make to Candyland?

4.1 Triple cards

I think it would be fun to add one triple-color card for each color in the deck. It’d be a rare but really cool thing to encounter. Let’s add those six cards to the deck and see what happens to the stats.

Click here to see the details and code

First, I’ll need to modify my create_one_color_card function to create a triple.

create_one_color_card <- function(.color) {
  c(rep(paste0("double ", .color), 4), 
    rep(paste0("single ", .color), 6),
    rep(paste0("triple ",  .color), 1))
}
cards_with_triples <- c(create_one_color_card("red"),
           create_one_color_card("purple"),
           create_one_color_card("yellow"),
           create_one_color_card("blue"),
           create_one_color_card("orange"),
           create_one_color_card("green"),
           candy_cards)

This will increase the overall size of the deck, which might have some consequences by itself, but I don’t think it’ll matter too much.
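To see how much the deck actually grows, here’s a minimal, self-contained sketch of a parameterized deck builder. build_deck() is a hypothetical helper written just for this check (it’s not the function used in the simulation), and the candy card names are spelled out again so the snippet runs on its own.

```r
candy_cards <- c("cupcake", "ice cream cone", "gummy star", "gingerbread man",
                 "lollipop", "popsicle", "chocolate truffle")
card_colors <- c("red", "purple", "yellow", "blue", "orange", "green")

# Hypothetical helper: build a deck with a chosen number of single, double,
# and triple cards per color, plus the seven candy cards.
build_deck <- function(n_single = 6, n_double = 4, n_triple = 0) {
  color_cards <- unlist(lapply(card_colors, function(.color) {
    c(rep(paste("single", .color), n_single),
      rep(paste("double", .color), n_double),
      rep(paste("triple", .color), n_triple))
  }))
  c(color_cards, candy_cards)
}

original_deck <- build_deck()
triple_deck   <- build_deck(n_triple = 1)

length(original_deck)               # 67
length(triple_deck)                 # 73
mean(grepl("^triple", triple_deck)) # share of the deck that is a triple card
```

So the deck goes from 67 to 73 cards, and a triple makes up 6 of every 73 draws. Changing the arguments is an easy way to try other mixes of singles, doubles, and triples.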

I now have to modify the code I use for the simulation to account for triple cards.

simulate_game_with_triples <- function(game_num = 0) {
  
  # Declare space for the full game.
  n_cards <- length(cards_with_triples)
  turns <- tibble(turn_num = 1:(n_cards*2),
         start    = NA,
         card  = c(sample(cards_with_triples, replace = FALSE), 
                   sample(cards_with_triples, replace = FALSE)),
         shortcut = NA,
         end      = NA)
  
  # Loop until the game is over
  i <- 1
  keep_playing <- TRUE
  while(keep_playing) {

    # Step 1: Start at zero
    if (i == 1) {
      turns$start[[i]] <- 0

    # Otherwise, start where the last turn ended.
    } else {
      turns$start[[i]] <- turns$end[[i - 1]]
    }

    # Step 2: This is where the game actually happens.
    # If it's a candy card, go straight there.
    if (turns$card[[i]] %in% candy_cards) {
      turns$end[[i]] <- board[board$special == turns$card[[i]],]$tile
      
    # If it's not, find the next colors.
    } else {
      card_color  <- str_extract(turns$card[[i]], "\\w+\\Z")
      card_amount <- str_extract(turns$card[[i]], "\\A\\w+")

      # move to the next spot
      eligible_spots <- board |>
        filter(tile > turns$start[[i]],
               color == card_color) |>
        pull(tile)
      # find the number of eligible spots
      n_eligible_spots <- length(eligible_spots)
      
      # regular single card
      if (n_eligible_spots >= 1 & card_amount == "single") {
        turns$end[[i]] <- eligible_spots[[1]]
      # regular double card
      } else if (n_eligible_spots >= 2 & card_amount == "double") {
        turns$end[[i]] <- eligible_spots[[2]]
      # triple cards
      } else if (n_eligible_spots >= 3 & card_amount == "triple") {
        turns$end[[i]] <- eligible_spots[[3]]
      # no more eligible spots
      } else {
        turns$end[[i]] <- 133
      }
    }
    
    # Step 3: Do the shortcuts.
    if (turns$end[[i]] == 4) {
      turns$end[[i]] <- 60
      turns$shortcut[[i]] <- "peppermint pass"
    } else if (turns$end[[i]] == 29) {
      turns$end[[i]] <- 41
      turns$shortcut[[i]] <- "gummy pass"
    }
  
    # Step 4: Check if it's game over.
    # run out of cards
    if (i >= c(n_cards*2)) {
      keep_playing <- FALSE
    # win
    } else if (turns$end[[i]] >= max(board$tile)) {
      keep_playing <- FALSE
    } else {
      i <- i + 1
    }
  }

  turns %>%
    filter(turn_num <= i) %>%
    return()
}

Now, let’s actually run the 10,000 simulated games.

set.seed(250320)
games_triple <- tibble(game_num = 1:10000) |>
  mutate(game = map(game_num, simulate_game_with_triples)) |>
  unnest(cols = c(game))
games_triple_summary <- games_triple %>%
  summarize(turns = max(turn_num),
            n_candies = sum(card %in% candy_cards),
            n_singles = sum(str_detect(card, "single")),
            n_doubles = sum(str_detect(card, "double")),
            n_triples = sum(str_detect(card, "triple")),
            n_shortcuts = sum(!is.na(shortcut)),
            .by = game_num) |> 
  print()

# A tibble: 10,000 × 7
   game_num turns n_candies n_singles n_doubles n_triples n_shortcuts
      <int> <int>     <int>     <int>     <int>     <int>       <int>
 1        1    30         1        17         9         3           1
 2        2    16         3         6         4         3           0
 3        3     7         1         2         4         0           0
 4        4    24         1        12        10         1           0
 5        5    16         1         6         7         2           0
 6        6    18         0         8         8         2           0
 7        7    25         1        13        11         0           0
 8        8    13         2         7         3         1           0
 9        9    22         1        10         9         2           0
10       10    18         2        10         4         2           0
# ℹ 9,990 more rows

Now we can take a look at the results!

First, we’ll see how many triple cards were drawn in a typical game.

See code

games_triple_summary |> 
  count(n_triples) |> 
  mutate(prop = n/sum(n)) |> 
  ggplot(aes(n_triples, prop)) + 
  geom_col(fill = "#6e6cff") + 
  scale_y_continuous(expand = expansion(0, 0),
                     labels = scales::percent) +
  labs(title = "Number of triple-color cards drawn in a game of Candyland",
       subtitle = "Based on 10,000 simulated games with triple-color cards added",
       x = "number of triple cards drawn",
       y = "percent of games") + 
  theme_minimal(base_size = 14, base_family = "Avenir")

There are only six triples in the deck, so it’s unsurprising that there are relatively few drawn in a game. But most games encountered at least one, just like the candy cards. Let’s see what overall effect this had on the number of turns in the game.

See code

bind_rows(`normal` = games_summary, `with triples added` = games_triple_summary, .id = "game_type") |> 
  ggplot(aes(turns, color = game_type, fill = game_type)) + 
  geom_density(alpha = 0.45) + 
  scale_x_continuous(breaks = seq(0, 200, 10)) + 
  scale_y_continuous(expand = expansion(0, 0),
                     labels = scales::percent) + 
  ggthemes::scale_color_ptol() + 
  ggthemes::scale_fill_ptol() + 
  labs(title = "Number of turns to finish a solo game of Candyland",
       subtitle = "Based on 10,000 simulated games",
       y = "percent of games",
       color = NULL,
       fill = NULL) + 
  theme_minimal(base_size = 14, base_family = "Avenir") + 
  theme(legend.position = "bottom")

So, unsurprisingly, the overall number of turns it takes to complete a game of Candyland goes down if you add some triple-color cards. Interestingly, the most likely outcome is still to finish in about 12–20 turns. But you’re a little more likely to finish in that range, more likely to finish in fewer turns than that, and less likely to finish in more turns.

So, if the creators wanted to change the game a little bit, they could tweak the deck and run some simulations. If they feel the game is too long, add some more doubles or triples. If it’s too short, add more singles.

4.2 Removing Candies

We’ve established in Section 3.6 that the candy cards ultimately don’t benefit the player if the only objective is to finish the game. They are, of course, the most fun part of the game. (My family’s Chocolate Truffle card is starting to fall apart because my kids like to hold it so much!) But, what would happen if we took those cards out entirely? How would that affect the length of the game?

Click here to see the details and code

First, I’ll need to modify my deck of cards to not have those candy cards.

create_one_color_card <- function(.color) {
  c(rep(paste0("double ", .color), 4), 
    rep(paste0("single ", .color), 6))
}
cards_without_candies <- c(create_one_color_card("red"),
           create_one_color_card("purple"),
           create_one_color_card("yellow"),
           create_one_color_card("blue"),
           create_one_color_card("orange"),
           create_one_color_card("green"))

This will decrease the overall size of the deck, so I’ll allocate room for going through the deck a third time. I’ll now modify the simulation.

simulate_game_without_candies <- function(game_num = 0) {
  
  # Declare space for the full game.
  n_cards <- length(cards_without_candies)
  turns <- tibble(turn_num = 1:(n_cards*3),
         start    = NA,
         card  = c(sample(cards_without_candies, replace = FALSE), 
                   sample(cards_without_candies, replace = FALSE), 
                   sample(cards_without_candies, replace = FALSE)),
         shortcut = NA,
         end      = NA)
  
  # Loop until the game is over
  i <- 1
  keep_playing <- TRUE
  while(keep_playing) {

    # Step 1: Start at zero
    if (i == 1) {
      turns$start[[i]] <- 0

    # Otherwise, start where the last turn ended.
    } else {
      turns$start[[i]] <- turns$end[[i - 1]]
    }

    # Step 2: This is where the game actually happens.
    card_color  <- str_extract(turns$card[[i]], "\\w+\\Z")
    card_amount <- str_extract(turns$card[[i]], "\\A\\w+")
    
    # move to the next spot
    eligible_spots <- board |>
      filter(tile > turns$start[[i]],
             color == card_color) |>
      pull(tile)
    # find the number of eligible spots
    n_eligible_spots <- length(eligible_spots)
    
    # regular single card
    if (n_eligible_spots >= 1 & card_amount == "single") {
      turns$end[[i]] <- eligible_spots[[1]]
    # regular double card
    } else if (n_eligible_spots >= 2 & card_amount == "double") {
      turns$end[[i]] <- eligible_spots[[2]]
    # no more eligible spots
    } else {
      turns$end[[i]] <- 133
    }

    
    # Step 3: Do the shortcuts.
    if (turns$end[[i]] == 4) {
      turns$end[[i]] <- 60
      turns$shortcut[[i]] <- "peppermint pass"
    } else if (turns$end[[i]] == 29) {
      turns$end[[i]] <- 41
      turns$shortcut[[i]] <- "gummy pass"
    }
  
    # Step 4: Check if it's game over.
    # run out of cards (three shuffled decks were allocated above)
    if (i >= n_cards * 3) {
      keep_playing <- FALSE
    # win
    } else if (turns$end[[i]] >= max(board$tile)) {
      keep_playing <- FALSE
    } else {
      i <- i + 1
    }
  }

  turns %>%
    filter(turn_num <= i) %>%
    return()
}

Now, let’s actually run the 10,000 simulated games.

set.seed(250320)
games_without_candies <- tibble(game_num = 1:10000) |>
  mutate(game = map(game_num, simulate_game_without_candies)) |>
  unnest(cols = c(game))
games_without_candies_summary <- games_without_candies %>%
  summarize(turns = max(turn_num),
            n_singles = sum(str_detect(card, "single")),
            n_doubles = sum(str_detect(card, "double")),
            n_shortcuts = sum(!is.na(shortcut)),
            .by = game_num) |> 
  print()

# A tibble: 10,000 × 5
   game_num turns n_singles n_doubles n_shortcuts
      <int> <int>     <int>     <int>       <int>
 1        1    23        13        10           0
 2        2    24        18         6           1
 3        3    24        15         9           0
 4        4    26        17         9           0
 5        5    25        17         8           0
 6        6    13         8         5           1
 7        7    28        18        10           0
 8        8    22        15         7           1
 9        9    18         7        11           0
10       10    10         4         6           1
# ℹ 9,990 more rows

Now let’s take a look at the results.

In this plot, I’ve shown the distribution of the number of turns it takes to finish a game of Candyland. In blue is a normal game, while in red is the game if we took all the candy cards out.

See code

bind_rows(`normal` = games_summary, `without candies` = games_without_candies_summary, .id = "game_type") |> 
  ggplot(aes(turns, color = game_type, fill = game_type)) + 
  geom_density(alpha = 0.45, adjust = 2) + 
  scale_x_continuous(breaks = seq(0, 200, 10)) + 
  scale_y_continuous(expand = expansion(0, 0),
                     labels = scales::percent) + 
  ggthemes::scale_color_ptol() + 
  ggthemes::scale_fill_ptol() + 
  labs(title = "Number of turns to finish a solo game of Candyland",
       subtitle = "Based on 10,000 simulated games",
       y = "percent of games",
       color = NULL,
       fill = NULL) + 
  theme_minimal(base_size = 14, base_family = "Avenir") + 
  theme(legend.position = "bottom")

These results surprised me! I thought that because candy cards, on average, don’t help the player advance that much, the number of turns required to finish a game would be basically the same. I was prepared to conclude that while they don’t help advance the game, they certainly make it more fun. I was wrong! Well, kinda. They’re still fun regardless, but taking out the candy cards has several effects.

First, the minimum number of turns it takes to finish the game goes up. This makes sense because you have to work your way across the entire board “the long way.” The shortest game took just eight turns. In all four games where this happened, the simulation drew six double-color cards and took the Peppermint Pass.

Second, the longest game is considerably shorter. This also makes sense because every turn brings you closer to the end and there’s nothing pulling you back. You might think the longest game in theory would be 133 turns, since there are 132 tiles plus the finish, but you’d have to start drawing double cards at some point, so it’d have to be shorter. In these 10,000 simulations, the longest game took 32 turns. There’s nothing special about it other than plain crummy luck. Many of the single cards advanced the player just two or three tiles, and several of the doubles advanced them just seven or eight tiles.

The combined effects of lengthening the shortest game and shortening the longest game mean that the variability in the number of turns needed to finish the game goes down. This is visually apparent in the plot: there’s a tighter cluster centered around 21–22 turns, rather than the much more spread out distribution in a regular game.
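
The extremes and the spread can also be checked numerically rather than by eye. Here's a minimal sketch in base R, assuming a hypothetical helper called `spread_summary()` (a made-up name, not part of the original code). It uses only the ten no-candy games printed in the summary table earlier, so the values will not match the full 10,000-game results.

```r
# Turn counts for the first ten no-candy games, copied from the printed summary.
first_ten_turns <- c(23, 24, 24, 26, 25, 13, 28, 22, 18, 10)

# Hypothetical helper: extremes and spread of a vector of turn counts.
spread_summary <- function(turns) {
  c(min = min(turns),
    max = max(turns),
    sd  = sd(turns),    # standard deviation
    iqr = IQR(turns))   # interquartile range
}

spread_summary(first_ten_turns)
```

With the full objects, running `spread_summary(games_summary$turns)` and `spread_summary(games_without_candies_summary$turns)` side by side would put numbers on the comparison.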

The final effect that I see is a greater exaggeration of the bimodal distribution. In Section 3.1, I pointed out there was a spike around 13–14 turns and another around 20. We see those same numbers, maybe shifted slightly, but in a much starker way. It’s hard to tell from this plot alone what’s going on, but I think the obvious answer is that the games that took a shortcut took fewer turns. I mean, of course taking a shortcut will make the whole game shorter, but is the effect as strong as it seems?

It seems so. I classified the games based on whether they finished in 15 turns or fewer and whether they took the Peppermint Pass (the really good shortcut at the beginning of the game). Sure enough, 95% of games that took that shortcut also finished in 15 turns or fewer, and 98% of games that did not take the shortcut finished in more than 15 turns.⁴ When I plot the data split up by whether the Peppermint Pass was taken, the pattern is crystal clear.

⁴ I ran a chi-squared test on these data and it supports the idea of a relationship between these two categories (\(\chi^2\) = 159,941.1, df = 1, p < 0.0001). I’m not really sure if chi-squared tests work with such large numbers, but the results are so clear in just the summary that stats aren’t needed to show the pattern.
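
For anyone who wants to reproduce that kind of check, here's a minimal sketch in base R. The `toy_games` data frame is invented (70 games, one row per game, with counts chosen to echo the 95% and 98% figures), so the test statistic it produces is not the one reported in the footnote. One thing worth keeping in mind: for a 2×2 table the chi-squared statistic can't be larger than the number of observations, so it matters whether the table is built from one row per game or one row per turn.

```r
# Invented game-level data: one row per game (NOT the real simulation output).
toy_games <- data.frame(
  turns           = c(rep(12, 19), 18, 14, rep(22, 49)),
  took_peppermint = c(rep(TRUE, 20), rep(FALSE, 50))
)

# Classify each game by whether it finished in 15 turns or fewer.
toy_games$quick <- toy_games$turns <= 15

# 2x2 table of counts: rows = took the shortcut, columns = finished quickly.
counts <- table(took_peppermint = toy_games$took_peppermint,
                quick           = toy_games$quick)

# Row percentages: within each shortcut group, the share of quick vs. slow games.
row_pct <- prop.table(counts, margin = 1)
round(row_pct, 2)

# Chi-squared test of independence on the game-level counts.
result <- chisq.test(counts)
result$statistic <= nrow(toy_games)  # TRUE: bounded by the number of games
```

The same steps should carry over to the real data once it's summarized to one row per game, the way the plotting code does with `.by = game_num`.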

See code

games_without_candies |> 
  summarize(turns = n(),
            took_peppermint = if_else(sum(shortcut == "peppermint pass", na.rm = TRUE) > 0,
                                      "took the peppermint pass",
                                      "did not take the peppermint pass"),
            .by = game_num) |>
  ggplot(aes(turns, color = took_peppermint, fill = took_peppermint)) + 
  geom_density(alpha = 0.45, adjust = 2) + 
  scale_x_continuous(breaks = seq(0, 200, 2)) +
  scale_y_continuous(expand = expansion(0, c(0, 0.02)),
                     labels = scales::percent) +
  ggthemes::scale_color_ptol(breaks = c("took the peppermint pass", "did not take the peppermint pass")) +
  ggthemes::scale_fill_ptol(breaks = c("took the peppermint pass", "did not take the peppermint pass")) +
  labs(title = "Number of turns to finish a solo game of Candyland without candy cards",
       subtitle = "Based on 10,000 simulated games",
       y = "percent of games",
       color = NULL,
       fill = NULL) +
  theme_minimal(base_size = 14, base_family = "Avenir") + 
  theme(legend.position = "bottom")

So, going back to the original data plotted in Section 3.1, I’ll bet that’s what’s going on there too. It’s not as clear of a pattern in a real game because it mostly gets washed out by the candies. Let me plot that original data again just to make sure.

See code

games |> 
  summarize(turns = n(),
            took_peppermint = if_else(sum(shortcut == "peppermint pass", na.rm = TRUE) > 0,
                                      "took the peppermint pass",
                                      "did not take the peppermint pass"),
            .by = game_num) |>
  ggplot(aes(turns, color = took_peppermint, fill = took_peppermint)) + 
  geom_density(alpha = 0.45, adjust = 2) + 
  scale_x_continuous(breaks = seq(0, 200, 10)) + 
  scale_y_continuous(expand = expansion(0, c(0, 0.02)),
                     labels = scales::percent) +
  ggthemes::scale_color_ptol(breaks = c("took the peppermint pass", "did not take the peppermint pass")) +
  ggthemes::scale_fill_ptol(breaks = c("took the peppermint pass", "did not take the peppermint pass")) +
  labs(title = "Number of turns to finish a solo game of Candyland",
       subtitle = "Based on 10,000 simulated games",
       y = "percent of games",
       color = NULL,
       fill = NULL) +
  theme_minimal(base_size = 14, base_family = "Avenir") + 
  theme(legend.position = "bottom")

Yep, there it is. So, that bimodal distribution is because of the peppermint pass. Pretty cool.

5 Conclusion

Okay, so I had some fun with this. Simulating Candyland is pretty straightforward, and once you’ve got it going, it’s easy to query and see patterns. And it’s easy to make some adjustments and see what kind of effect they have on the gameplay.

Here’s a summary of the main findings:

The typical number of turns to finish a game is 10–25, more on the lower end if you take the Peppermint Pass and more on the upper end if you don’t.
It’s typical to see 5–15 single-color cards and 3–10 double cards.
Only about a third of games take one of the shortcuts, and the Gummy Pass is 1.4 times as likely as the Peppermint Pass.
You’ll see at least one candy card in 90% of games. For some inexplicable reason, the Cupcake and Chocolate Truffle are far more common than you’d expect from random chance alone, and the Lollipop, Ice Cream Cone, and Gingerbread Man are far less common.
The candy cards are, overall, neither an advantage nor a disadvantage. If you remove candy cards drawn on the first turn, the average number of tiles advanced because of a candy card is basically zero.