Visualizing Jonathan Dowse’s Vowels

Phonetics
Side Projects
Author

Joey Stanley

Published

December 10, 2023

I’ve seen interactive IPA charts before where a single person produces all the sounds. But a couple of years ago, while I was teaching a Phonetics & Phonology course, I stumbled upon Jonathan Dowse’s extended IPA chart with audio. In this post, I take his vowels and map their formants.

Note that I tweeted about this on February 7, 2022. I don’t know what the future of X holds, so I thought I’d make this a more permanent home for this plot.

Dowse’s IPA charts

Dowse’s IPA chart is the most detailed one I’ve ever seen. The consonants include many places of articulation not normally found on the IPA chart, like linguolabial, alveolo-palatal, rounded velar, low uvular, and aryepiglottal. For each place, he uses all the manners that are physically possible, including ones like aspirated stops, affricates, taps, and trills. And for each manner and place, he has voiced and voiceless variants. Not only is the chart itself interesting to look through with all the diacritics and stuff, but he’s got recordings of each consonant in different contexts: [C], [Ca], [aC], and [aCa]. Pretty cool. Further down, he has a whole nother table with “rarer” manners, including lateral fricatives, lateral flaps, fricative trills, implosives, and different kinds of ejectives. It’s pretty interesting to listen to them. He also has a whole table of clicks at eight places of articulation and six manners, with velar and uvular variants of each.

Look, I don’t know enough about phonetics to say whether these are all accurately produced, but it’s impressive that Dowse, who does not appear to have much formal training in linguistics, can produce all these sounds.

Today’s post is not about the consonants though; it’s about the vowels. His vowel chart is equally extensive. He distinguishes five degrees of backness and seven degrees of height, with rounded and unrounded versions of each combination, for 70 vowels in all. He also has an entirely separate chart showing nasalized versions of all of these.

Jonathan Dowse’s vowel chart

I do know enough about phonetics to be able to look at these vowels. I was curious about how these 70 vowel qualities mapped to the acoustic space. I wanted to see whether these distinctions were all equidistant and whether their distribution in the acoustic space matched this rectangular tabular layout in the chart.

Data Processing

The first step was to download the audio, which I did by just clicking on each one and downloading them one at a time. I then processed them using FastTrack. This produces a spreadsheet for each vowel, with the frequencies and bandwidths of the first three formants (plus some other measurements) every few milliseconds (every 2 ms in the example below). Here’s an example from the high front vowel.

library(tidyverse)
library(santoku)

high_front <- read_csv("./csvs/high_front.csv")
knitr::kable(head(high_front))
time f1 b1 f2 b2 f3 b3 f1p f2p f3p f0 intensity
0.025211 243.0 35.3 2289.2 168.1 3353.6 229.5 219.1 2264.4 3322.6 0 53.7
0.027211 245.9 34.9 2201.6 322.6 3356.8 380.5 219.1 2264.9 3322.4 0 51.5
0.029211 247.0 44.1 1989.7 553.4 3282.2 640.5 219.1 2265.5 3322.1 0 49.4
0.031211 235.4 72.1 1889.2 597.8 3150.0 773.0 219.1 2266.5 3321.5 0 48.0
0.033211 208.8 99.9 2035.3 678.2 3253.7 883.0 219.1 2267.7 3320.9 0 49.0
0.035211 191.6 96.2 2208.9 525.9 3338.3 574.3 219.2 2269.2 3320.1 0 50.1

This is more information than I need, especially since Dowse tries to say the vowels as monophthongally as he can. But we can take a look at the trajectories in just a sec. 

So, I want to plot the midpoints of all vowels at once. Since each is stored in a separate spreadsheet, I’ll use Sys.glob to get the paths to all those spreadsheets and map the read_csv function onto all of those paths. Since the vowel quality is stored in the filename itself (e.g., “high_front.csv”), I’ll strip away the path and the extension to leave just the filename and use it as the name of the vowel. Finally, I’ll combine all those spreadsheets into one big one with unnest.

vowels_raw <- tibble(vowel = Sys.glob("./csvs/*.csv")) %>%
  # Read each file into a list-column of tibbles
  mutate(data = map(vowel, read_csv, show_col_types = FALSE),
         # fixed() matches literally, so "." isn't treated as a regex wildcard
         vowel = str_remove_all(vowel, fixed("./csvs/")),
         vowel = str_remove_all(vowel, fixed(".csv"))) %>%
  # Expand the list-column into one long tibble
  unnest(data) %>%
  print()
# A tibble: 14,772 × 13
   vowel        time    f1    b1    f2    b2    f3    b3   f1p   f2p   f3p    f0
   <chr>       <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
 1 high_back… 0.0255  313.  130.  538.  407. 2325.  286.  278   527. 2422.     0
 2 high_back… 0.0275  325.  227.  492   390. 2321   307.  278   527. 2422.     0
 3 high_back… 0.0295  275.  384.  485.  260. 2318.  343.  278   527. 2423.     0
 4 high_back… 0.0315  228.  377   506.  223. 2303.  389.  278   527  2423.     0
 5 high_back… 0.0335  236   289.  517.  219. 2302.  455.  278   527. 2424.     0
 6 high_back… 0.0355  255.  191.  529.  197. 2345.  574   278   528. 2425.     0
 7 high_back… 0.0375  264.  134.  537   164. 2379.  726.  278.  528  2426.     0
 8 high_back… 0.0395  270.  113.  540.  153. 2405.  781.  278.  528. 2427.     0
 9 high_back… 0.0415  275   116.  541.  165. 2468.  687.  278.  529. 2429.     0
10 high_back… 0.0435  279   129   539.  178. 2494.  537.  278.  529. 2430.     0
# ℹ 14,762 more rows
# ℹ 1 more variable: intensity <dbl>

This results in a spreadsheet with 14,772 rows, each representing a set of formant measurements at a particular point in time across all 70 recordings. Kind of a lot of data, considering it’s only 70 vowels, but that’s the kind of resolution FastTrack can give you.
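A somewhat more compact way to get the same long tibble, assuming readr 2.0 or later, is to hand read_csv() the whole vector of paths and let its id argument record which file each row came from; basename() and tools::file_path_sans_ext() then strip the folder and extension without any regular expressions. This is just an alternative sketch, not what produced the numbers above. To make it run on its own, it writes two tiny made-up CSVs into a temporary folder (csvs_demo is a hypothetical name).

```r
library(readr)
library(dplyr)

# Two tiny made-up CSVs standing in for the FastTrack output
csv_dir <- file.path(tempdir(), "csvs_demo")
dir.create(csv_dir, showWarnings = FALSE)
write_csv(tibble(time = c(0.025, 0.027), f1 = c(243.0, 245.9)),
          file.path(csv_dir, "high_front.csv"))
write_csv(tibble(time = c(0.025, 0.027), f1 = c(313.0, 325.0)),
          file.path(csv_dir, "high_back_rounded.csv"))

paths <- Sys.glob(file.path(csv_dir, "*.csv"))

# read_csv() reads all files at once; `id` stores each row's source path
demo_raw <- read_csv(paths, id = "path", show_col_types = FALSE) %>%
  # "path/to/high_front.csv" -> "high_front"
  mutate(vowel = tools::file_path_sans_ext(basename(path)), .before = 1) %>%
  select(-path)

demo_raw
```

This only works when every file has the same columns, which should be the case for FastTrack output.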

Okay, so for the purposes of the plot, I need to create a spreadsheet that is just the metadata about the vowel itself. In Step 1, I take that monster dataframe, keep just the name of the vowel (e.g., “high-mid_back_rounded”), and keep only the unique values. I then split that name into its three parts (height, backness, and rounded) using separate. Unrounded vowels have no third part in their names (e.g., “high-mid_back”), so fill = "right" leaves rounded as NA for them. In Step 2, I modify each of those columns a little bit. I turn rounded into a boolean instead of a string. Then I turn height and backness into factors and set the order their levels should be in. Finally, I create a marked column that flags rounded front, near-front, and central vowels and unrounded near-back and back vowels. I felt like this might be handy for creating a visual of just the unmarked vowels. For Step 3, I sort the rows by height, backness, and rounding and add the IPA symbols in that order. There’s no shortcut: I just had to put the symbols in one at a time based on what the chart showed.

vowels_meta <- vowels_raw %>%
  
  # Step 1: split the vowel name up
  distinct(vowel) %>%
  separate(vowel, c("height", "backness", "rounded"), sep = "_", fill = "right", remove = FALSE) %>%
  
  # Step 2: modify those attributes
  mutate(rounded  = case_when(rounded == "rounded" ~ TRUE,
                             is.na(rounded) ~ FALSE),
         height   = factor(height,   levels = c("high", "near-high", "high-mid", "mid", "low-mid", "near-low", "low")),
         backness = factor(backness, levels = c("front", "near-front", "central", "near-back", "back")),
         marked = case_when(backness %in% c("near-back", "back") & !rounded ~ TRUE,
                            backness %in% c("front", "near-front", "central") & rounded ~  TRUE,
                            TRUE ~ FALSE)) %>%
  
  # Step 3: add IPA
  arrange(height, backness, rounded) %>%
    mutate(ipa = c("i", "y", "ï", "ÿ", "ɨ", "u̶", "ɯ̈", "ü", "ɯ", "u",
                   "i̞", "y̙̞", "ɪ", "ʏ", "ɪ̈", "ʊ̈", "ɯ̽", "ʊ", "ɯ̞", "u̞",
                   "e", "ø", "ë", "ø̈", "ɘ", "ɵ", "ɤ̈", "ö", "ɤ", "o",
                   "e̞", "ø̞", "ë̞", "ø̞̈", "ə", "ɵ̘", "ɤ̞̈", "ö̞", "ɤ̞", "o̞",
                   "ɛ", "œ", "ɛ̈", "œ̈", "ɜ", "ɞ", "ʌ̈", "ɔ̈", "ʌ", "ɔ",
                   "æ", "œ̞", "æ̈", "ɶ̽", "ɐ", "ɞ̞", "ɑ̽", "ɒ̽", "ʌ̞", "ɔ̞",
                   "a", "ɶ", "ä", "ɶ̈", "ɐ̞", "ɐ̞̹", "ɑ̈", "ɒ̈", "ɑ", "ɒ")) %>%

  print()
# A tibble: 70 × 6
   vowel                   height backness   rounded marked ipa  
   <chr>                   <fct>  <fct>      <lgl>   <lgl>  <chr>
 1 high_front              high   front      FALSE   FALSE  i    
 2 high_front_rounded      high   front      TRUE    TRUE   y    
 3 high_near-front         high   near-front FALSE   FALSE  ï    
 4 high_near-front_rounded high   near-front TRUE    TRUE   ÿ    
 5 high_central            high   central    FALSE   FALSE  ɨ    
 6 high_central_rounded    high   central    TRUE    TRUE   u̶    
 7 high_near-back          high   near-back  FALSE   TRUE   ɯ̈    
 8 high_near-back_rounded  high   near-back  TRUE    FALSE  ü    
 9 high_back               high   back       FALSE   TRUE   ɯ    
10 high_back_rounded       high   back       TRUE    FALSE  u    
# ℹ 60 more rows

Okay, so now I have a dataframe that has the name of the vowel from the filename, and then some metadata about that vowel.
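To see the parsing in Steps 1 and 2 in isolation, here’s a minimal sketch on three made-up names. It also computes marked with a single if_else(): a near-back or back vowel is marked when it’s unrounded, and any other vowel is marked when it’s rounded. For these five backness values, that should be equivalent to the case_when() above.

```r
library(dplyr)
library(tidyr)

demo_meta <- tibble(vowel = c("high_front", "high_front_rounded", "low_back")) %>%
  # Unrounded names have only two parts, so fill = "right" pads `rounded` with NA
  separate(vowel, c("height", "backness", "rounded"),
           sep = "_", fill = "right", remove = FALSE) %>%
  mutate(rounded = !is.na(rounded) & rounded == "rounded",
         # Marked: unrounded (near-)back vowels, or rounded vowels elsewhere
         marked  = if_else(backness %in% c("near-back", "back"), !rounded, rounded))

demo_meta
```

In tidyr 1.3 and later, separate() is superseded by separate_wider_delim(), where too_few = "align_start" plays the role of fill = "right".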

Now let’s go back and process that acoustic data. I start by taking the raw data and just keeping the formants. Having around 200 timepoints per vowel on average (14,772 rows across 70 recordings) is fine, but when visualizing such data, the odds of getting a wonky measurement are higher, and one wonky measurement can ruin the whole plot. Here’s [i]. You can see that most of the formant measurements are pretty stable, but towards the beginning and end, things get weird and distract from the good data.

ggplot(high_front, aes(f2, f1, color = time)) + 
  geom_path() + 
  scale_x_reverse() + 
  scale_y_reverse() +
  theme_minimal()

What I’ll do then is take all this high-resolution temporal data and make it lower resolution. What I found works is to bin the times into 10 bins and then take the median within each one. I’ll do this by first shifting the time so that the onset is at t = 0; that shifted version of time is time_diff. I’ll then normalize it by converting it into a proportion of the vowel’s duration, so that the onset is at 0, the offset is at 1, and the midpoint is at 0.5; that is percent. This makes it easy to slice the data into 10 equal parts using the santoku::kiru function. For each of those 10 chunks, I can then get the median F1, F2, and F3 measurements.

high_front_summarized <- high_front %>%
  mutate(time_diff = time - min(time),
         percent = time_diff / max(time_diff),
         time_cut = kiru(percent,
                         breaks = seq(0, 1, 0.1),
                         labels = 1:10)) %>%
  summarize(across(c(f1, f2, f3), median), .by = c(time_cut)) %>%
  print()
# A tibble: 10 × 4
   time_cut    f1    f2    f3
   <fct>    <dbl> <dbl> <dbl>
 1 1         231. 2343. 3324.
 2 2         225. 2383. 3275.
 3 3         216. 2371. 3239.
 4 4         214. 2385  3227.
 5 5         216. 2389. 3230.
 6 6         216. 2371  3204.
 7 7         217. 2376. 3215.
 8 8         217. 2371. 3218.
 9 9         220. 2364  3241.
10 10        222. 2344. 3191.
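To make the shifting, rescaling, and binning concrete, here’s a base-R sketch on made-up, evenly spaced timestamps. It uses cut() with right = FALSE and include.lowest = TRUE, which gives left-closed bins with the final bin closed, so the very last frame (percent = 1) lands in bin 10 instead of becoming NA. I believe that matches kiru()’s defaults in recent santoku versions (kiru() is santoku’s alias for chop(), handy when tidyr’s own chop() is loaded), but treat cut() here as a stand-in, not a drop-in replacement.

```r
# Made-up timestamps: 101 frames, 2 ms apart, starting at 25 ms
time <- 0.025 + 0.002 * (0:100)

time_diff <- time - min(time)            # shift so the onset is at 0
percent   <- time_diff / max(time_diff)  # rescale so the offset is at 1

# Ten bins: [0, 0.1), [0.1, 0.2), ..., [0.9, 1]
time_cut <- cut(percent,
                breaks = seq(0, 1, 0.1),
                labels = 1:10,
                right = FALSE,
                include.lowest = TRUE)

table(time_cut)
```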

I can plot this new lower-resolution version of the data, and you can see it’s much tighter because the medians within each bin aren’t pulled around by the extreme outliers. I’ll add the original data in gray to provide some context.

ggplot(high_front_summarized, aes(f2, f1)) + 
  geom_path(data = high_front, color = "gray80") +
  geom_path() + 
  scale_x_reverse() + 
  scale_y_reverse() +
  theme_minimal()

Okay great. So, that worked for one vowel. Let’s do that for all vowels. The code is the same, except I’m grouping things by vowel. This results in a trajs dataframe (for “trajectories”). We’ll look at that in just a second.

trajs <- vowels_raw %>%
  select(vowel, time, f1, f2, f3) %>%
  mutate(time_diff = time - min(time),
         percent = time_diff / max(time_diff),
         time_cut = kiru(percent,
                         breaks = seq(0, 1, 0.1),
                         labels = 1:10),
         .by = vowel) %>%
  summarize(across(c(f1, f2, f3), median), .by = c(vowel, time_cut)) %>%
  print()
# A tibble: 700 × 5
   vowel             time_cut    f1    f2    f3
   <chr>             <fct>    <dbl> <dbl> <dbl>
 1 high_back_rounded 1         280   542. 2460.
 2 high_back_rounded 2         277.  532. 2484.
 3 high_back_rounded 3         264.  546. 2472.
 4 high_back_rounded 4         254.  545. 2402.
 5 high_back_rounded 5         252.  546. 2371.
 6 high_back_rounded 6         253.  560. 2434.
 7 high_back_rounded 7         254.  571. 2444.
 8 high_back_rounded 8         250   583  2489.
 9 high_back_rounded 9         257.  595. 2323 
10 high_back_rounded 10        270.  699. 2347.
# ℹ 690 more rows
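The .by = vowel in that mutate() call is doing real work: min(time) and max(time_diff) are computed separately for each recording, so every vowel’s percent runs from 0 to 1 on its own timeline. Here’s a quick check on two made-up recordings with different start times and lengths.

```r
library(dplyr)

# Two made-up recordings with different start times and lengths
toy <- tibble(vowel = rep(c("a", "b"), times = c(5, 3)),
              time  = c(0.02, 0.04, 0.06, 0.08, 0.10,
                        0.03, 0.05, 0.07))

toy_norm <- toy %>%
  mutate(time_diff = time - min(time),
         percent   = time_diff / max(time_diff),
         .by = vowel)

# Each vowel's percent should span exactly 0 to 1
ranges <- toy_norm %>%
  summarize(lo = min(percent), hi = max(percent), .by = vowel)

ranges
```

Without .by, the onset and duration would come from the whole dataset, and only the earliest-starting and latest-ending recordings would reach 0 and 1.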

For now, let’s look at the midpoints. You may have noticed in the high front plot above that the middle 50% or so of the vowel was indeed quite monophthongal with very little formant change. I’ll assume that’s the case for all the vowels. So I’ll take the middle few bins (bins 3 through 7, covering roughly 20% to 70% of the duration) and take the median across them for each vowel. That’ll give me a new midpoints dataset.

midpoints <- trajs %>%
  
  # Get the middle few bins and find the median
  filter(time_cut %in% 3:7) %>%
  summarize(across(c(f1, f2, f3), median), .by = vowel) %>%
  
  # Add the vowel metadata back in.
  left_join(vowels_meta, by = "vowel") %>%
  print()
# A tibble: 70 × 9
   vowel                     f1    f2    f3 height backness rounded marked ipa  
   <chr>                  <dbl> <dbl> <dbl> <fct>  <fct>    <lgl>   <lgl>  <chr>
 1 high_back_rounded       254.  546. 2434. high   back     TRUE    FALSE  u    
 2 high_back               288. 1137. 2467. high   back     FALSE   TRUE   ɯ    
 3 high_central_rounded    269. 1642. 2128. high   central  TRUE    TRUE   u̶    
 4 high_central            312. 1754. 2229. high   central  FALSE   FALSE  ɨ    
 5 high_front_rounded      215. 2232. 2865. high   front    TRUE    TRUE   y    
 6 high_front              216. 2376. 3227. high   front    FALSE   FALSE  i    
 7 high_near-back_rounded  272.  784. 2233. high   near-ba… TRUE    FALSE  ü    
 8 high_near-back          283  1342. 2234. high   near-ba… FALSE   TRUE   ɯ̈    
 9 high_near-front_round…  249  1909. 2153. high   near-fr… TRUE    TRUE   ÿ    
10 high_near-front         276. 2136. 2460. high   near-fr… FALSE   FALSE  ï    
# ℹ 60 more rows

Okay, we now have a spreadsheet with reasonably good midpoint measurements for each vowel.
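One detail in that filter: time_cut is a factor, and filter(time_cut %in% 3:7) works because %in% (via match()) compares factors by their labels, so the numbers 3 to 7 match the labels "3" to "7". Here it is on a made-up ten-bin F1 track, where the wonky edges don’t affect the result.

```r
library(dplyr)

# Made-up F1 medians for one vowel across ten bins
toy_traj <- tibble(vowel    = "toy",
                   time_cut = factor(1:10),
                   f1       = c(300, 260, 250, 248, 252, 251, 249, 255, 270, 320))

toy_mid <- toy_traj %>%
  filter(time_cut %in% 3:7) %>%            # keeps bins "3" through "7"
  summarize(f1 = median(f1), .by = vowel)

toy_mid
```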

Plotting midpoints

It’s now time to plot it! Here’s just a raw look at the data.

ggplot(midpoints, aes(f2, f1)) + 
    geom_text(aes(label = ipa), size = 5) +
    scale_x_reverse() + 
    scale_y_reverse() + 
    ggthemes::scale_color_ptol() + 
    labs(x = "F2", y = "F1") + 
    theme_minimal()

Okay, so this is interesting already, because we can see that the overall shape is still a trapezoid and not a rectangle like the chart. We can see that the lower back portion of the vowel space is a bit denser than, say, the high front. And there’s a bit of a gap in the mid-to-high central portion.

Let’s zhuzh this plot up a bit. I’ll color the vowels by height. Within each height, I’ll connect rounded vowels with a dotted line and unrounded vowels with a solid line. To do that, I’ll create a new column, line_id, with a unique value for each combination of height and rounding before I pass it into ggplot. I’ll give the labels for rounded vowels a gray fill. Finally, I’ll add some annotations.

midpoints %>%
    unite(line_id, height, rounded, remove = FALSE) %>%
    ggplot(aes(f2, f1, color = height, shape = backness)) + 
    geom_line(aes(group = line_id, linetype = rounded)) + 
    geom_label(aes(label = ipa, fill = rounded), size = 5) +
    scale_fill_manual(values = c("white", "gray90")) +
    scale_x_reverse() + 
    scale_y_reverse() + 
    ggthemes::scale_color_ptol() + 
    labs(title = "Acoustic measurements from jbdowse.com/ipa/",
         x = "F2", y = "F1",
         caption = 'Along the front-to-back dimension, the vowels are "front", "near-front", "central", "near-back", and "back."\nLines connect vowels of the same height and rounding. Dotted lines connect rounded vowels.',) + 
    theme_minimal()

Okay, now we’re starting to see some things! So, it looks like rounded vowels are pretty consistently further back (lower F2) than their unrounded counterparts. In some cases, drastically so (see [ɯ] compared to [u]). F1 is pretty level across most of the higher vowels. Among the low vowels, the further back they were, the higher they were. Here we can better see the clustering in the low back portion of the vowel space. This also gives some nice context for Moulton’s (1968:464) remark that the fieldworkers for the Linguistic Atlas of New England were “hopelessly and humanly incompetent at transcribing phonetically the low and back vowels they heard from their informants” (cited in Johnson 2010:32). Given a spot in the low back portion of the vowel space, there are lots of ways to transcribe it that would come pretty darn close.

Let’s pause and just make sure we’re on the same page when it comes to mapping acoustics to perception. I’m not saying that Dowse was wrong. I didn’t make this plot just to point and laugh and say, “wow, he sure did a terrible job!” I haven’t formally measured my perception or anything, but with only a few exceptions the vowels sound more or less equidistant from each other to me. So, what this really shows is that there’s a pretty stark difference between what is perceptually equidistant and what is acoustically equidistant. Perhaps it shows that we can hear small acoustic differences between vowels in the low back space better than the same differences in the high front space. Or perhaps Dowse was a little too ambitious in creating an artificially inflated number of low back distinctions, and we should stick with the trapezoidal shape that the IPA chart has. I don’t know. But I’m sure there’s some interesting paper from the 70s or something that has been written about this.
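One rough way to put numbers on the gap between acoustic and perceptual distance is to convert Hz to a psychoacoustic scale such as Bark. The sketch below uses Traunmüller’s (1990) approximation, z = 26.81f / (1960 + f) − 0.53, leaving out his corrections at the extremes of the scale; hz_to_bark() is just a helper name made up for this example. The same 100 Hz step comes out a good deal larger in Bark down around 600 Hz than up around 2200 Hz, which is at least consistent with small differences being easier to hear in the low back corner, where F2 is low.

```r
# Traunmüller (1990) Hz-to-Bark approximation, without the end corrections
hz_to_bark <- function(hz) {
  26.81 * hz / (1960 + hz) - 0.53
}

# The same 100 Hz step at a low and at a high frequency
step_low  <- hz_to_bark(700)  - hz_to_bark(600)    # about 0.77 Bark
step_high <- hz_to_bark(2300) - hz_to_bark(2200)   # about 0.30 Bark

c(low = step_low, high = step_high)
```

Applying it to the midpoints before plotting, with something like mutate(across(c(f1, f2), hz_to_bark)), is one way to check whether the crowded low back corner spreads out.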

Let’s clean the vowel chart up a little bit by removing the marked vowels: the rounded front, near-front, and central vowels and the unrounded near-back and back vowels.

midpoints %>%
    unite(line_id, height, rounded, remove = FALSE) %>%
    filter(!marked) %>%
    ggplot(aes(f2, f1, color = height, shape = backness)) + 
    geom_line(aes(group = line_id, linetype = rounded)) + 
    geom_label(aes(label = ipa), size = 5) +
    scale_x_reverse() + 
    scale_y_reverse() + 
    ggthemes::scale_color_ptol() + 
    theme_minimal()

Okay, so this is a little bit sparser. I don’t know if there’s any new insight here, other than that the middle of the vowel space really opens up quite a bit.

Plotting trajectories

Since we have trajectory data, let’s plot some of those. I’ll have to reshape the data so that all the formant measurements are in a single column. I’ll use pivot_longer to do that, which is really handy for vowel data like this. If we just look at the high front data, we can see what that looks like.

high_front %>%
  pivot_longer(cols = c(f1, f2, f3), names_to = "formant", values_to = "hz") %>% 
  ggplot(aes(time, hz, color = formant, group = formant)) + 
  geom_point() + 
  geom_path() + 
  theme_minimal() + 
  theme(legend.position = "none")

That wonky data we saw in the F1-F2 plot makes sense now. Let’s look at the summarized data.

high_front_summarized %>%
  pivot_longer(cols = c(f1, f2, f3), names_to = "formant", values_to = "hz") %>% 
  ggplot(aes(time_cut, hz, color = formant, group = formant)) + 
  geom_point() + 
  geom_path() + 
  theme_minimal() + 
  theme(legend.position = "none")

Okay, so that’s cleaner.

Now let’s plot all the vowels in this spectrogram-like plot.

trajs %>%
  pivot_longer(cols = c(f1, f2, f3), names_to = "formant", values_to = "hz") %>% 
  unite(traj_id, vowel, formant, remove = FALSE) %>%
  ggplot(aes(time_cut, hz, color = formant, group = traj_id)) + 
  geom_point() + 
  geom_path() + 
  theme_minimal() + 
  theme(legend.position = "none")

Nothing too surprising here. We’ve got a lot of lines that are relatively stable. Towards the ends, most formants shift a little bit. Not sure why. A few others have some movement in other places. But, you’ve got to appreciate Dowse’s ability to hold a monophthong.
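In that all-vowels plot, the unite() step matters because each path has to be grouped by a vowel-and-formant combination; grouping by formant alone would draw each formant as one long line snaking through all 70 vowels. Here’s the reshaping on two rows adapted (rounded) from the [i] summary above.

```r
library(dplyr)
library(tidyr)

# Two rows in the same wide layout as trajs
toy_wide <- tibble(vowel    = c("high_front", "high_front"),
                   time_cut = factor(1:2),
                   f1 = c(231, 225), f2 = c(2343, 2383), f3 = c(3324, 3275))

toy_long <- toy_wide %>%
  # One row per bin x formant, with the frequency in `hz`
  pivot_longer(cols = c(f1, f2, f3), names_to = "formant", values_to = "hz") %>%
  # One id per vowel x formant, e.g. "high_front_f2", used as the ggplot group
  unite(traj_id, vowel, formant, remove = FALSE)

toy_long
```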

We can view this in a traditional F1-F2 plot. Here I’ve filtered out the edges because they had a lot of really wonky measurements, so this shows bins 2 through 7, which span from 10% to 70% of the way through each vowel.

trajs %>%
  left_join(vowels_meta, by = "vowel") %>%
  filter(time_cut %in% c(2:7)) %>%
  ggplot(aes(f2, f1, color = height)) + 
  # geom_path() joins the points in time order; geom_line() would sort them by F2
  geom_path(aes(group = vowel), arrow = joeyr::joey_arrow()) + 
  scale_x_reverse() +
  scale_y_reverse() +
  ggthemes::scale_color_ptol() +
  labs(title = "Trajectories from jbdowse.com/ipa/",
       x = "F2", y = "F1") +
  theme_minimal()

Overall, you can see that Dowse does a good job at holding a monophthong. The higher vowels are generally pretty monophthongal. The lower the vowel, the more back-gliding it is. Low vowels appear to be less stable in height.
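A note on geom_line() versus geom_path() for trajectory plots like this one: geom_line() sorts each group’s points by their x value before connecting them, while geom_path() connects them in the order they appear in the data. When row order encodes time, only geom_path() keeps the arrowhead on the last time point. Here’s a quick check with ggplot2’s layer_data() on three made-up points whose F2 falls and then rises.

```r
library(ggplot2)

# Made-up trajectory, rows in time order: F2 falls, then rises a bit
toy <- data.frame(f2 = c(1500, 1200, 1300), f1 = c(500, 520, 540))

p_line <- ggplot(toy, aes(f2, f1)) + geom_line()
p_path <- ggplot(toy, aes(f2, f1)) + geom_path()

# geom_line() reorders the points by x (F2); geom_path() keeps the time order
layer_data(p_line)$x
layer_data(p_path)$x
```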

Conclusion

When I found Dowse’s vowel chart, I wanted to see what the acoustics were. I think it’s pretty enlightening to see how acoustic differences map onto perceptual distances and vice versa.