Resources

Modified

March 26, 2024

On this page you’ll find links to all sorts of things that I have found useful, including tutorials, books, and general reading on R and Praat, statistics, software, corpora, design, and more.

Note

I haven’t really updated this page since about 2019, so it may not include the latest resources. Some links may be dead. I’ve taken this page off my website’s main navigation bar, but I’ll keep it around in case others find it useful.


My handouts, tutorials, and workshops

R Workshops

This is a series of workshops on how to use R, covering a variety of topics. I have included PDFs and additional information on each installment of this series.

Formant extraction tutorial

This tutorial walks you through writing a Praat script that extracts formant measurements from vowels. If you’ve never worked with Praat scripting but want to work with vowels, this might be a good starting point.

Vowel plots in R tutorials (Part 1 and Part 2)

This is a multi-part tutorial on how to make the typical sorts of vowel plots in R. Part 1 shows plotting single-point measurements as scatter plots and serves as a mild introduction to ggplot2. Part 2 shows how to plot trajectories, both in the F1-F2 space and in a Praat-like time-Hz space, and is a bit of an introduction to tidyverse as well.

Measuring vowel overlap in R (Part 1 and Part 2)

This is a two-part tutorial on calculating Pillai scores and Bhattacharyya’s Affinity in R. The first covers what I consider the bare necessities, culminating in custom R functions for each. The second is a bit more in-depth as it looks at ways to make the functions more robust, but it also shows some simple visualizations you can make with the output.
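
For a rough sense of what a Pillai score measures, here is a minimal sketch in plain Python. It is not the R functions from the tutorials. It assumes two vowel classes measured as (F1, F2) pairs and computes Pillai's trace as trace(H(H+E)^-1), where H and E are the between-group and within-group sums of squares and cross-products. The function name and the toy formant values are made up for illustration.

```python
def pillai_score(group_a, group_b):
    """Pillai's trace for two groups of (F1, F2) points (illustrative sketch)."""

    def mean(points):
        n = len(points)
        return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)

    def sscp(points, center):
        # 2x2 matrix of sums of squares and cross-products around `center`
        sxx = sum((x - center[0]) ** 2 for x, _ in points)
        syy = sum((y - center[1]) ** 2 for _, y in points)
        sxy = sum((x - center[0]) * (y - center[1]) for x, y in points)
        return [[sxx, sxy], [sxy, syy]]

    everything = list(group_a) + list(group_b)
    total = sscp(everything, mean(everything))  # T = H + E
    wa = sscp(group_a, mean(group_a))
    wb = sscp(group_b, mean(group_b))
    within = [[wa[i][j] + wb[i][j] for j in range(2)] for i in range(2)]  # E
    between = [[total[i][j] - within[i][j] for j in range(2)] for i in range(2)]  # H

    # Invert the 2x2 total matrix by hand
    det = total[0][0] * total[1][1] - total[0][1] * total[1][0]
    inv = [[total[1][1] / det, -total[0][1] / det],
           [-total[1][0] / det, total[0][0] / det]]

    # trace(H @ inv(T))
    return sum(between[i][k] * inv[k][i] for i in range(2) for k in range(2))


# Hypothetical (F1, F2) measurements in Hz for two vowel classes
vowel_a = [(300, 2200), (320, 2300), (310, 2150), (290, 2250)]
vowel_b = [(700, 1200), (720, 1300), (710, 1150), (690, 1250)]
print(pillai_score(vowel_a, vowel_b))
```

With two groups, the score runs from 0 (the clouds share a center) toward 1 (the clouds are fully separated).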

Make yourself googleable

I’m no expert, but I have given a workshop on how grad students can increase their online presence and make themselves more googleable, based in large part on ImpactStory’s fantastic 30-day challenge, which you can read here.

Academic Poster Workshop

In response to the need for a “How to Make an Academic Poster” workshop, I put one together last minute. Poster-making is more of an art than a science and this is a very opinionated view on the dos and don’ts of making an academic poster.

Excel Workshop

I once gave a workshop on Excel and ended up producing a long handout that goes from the very basics to relatively tricky techniques. The link above will take you to a blog post that summarizes the workshop, and you can also find the handout itself.


R Resources

Here is a list of resources I’ve found for R. I’ve gone through some of them and others are on my to-do list. These are in no particular order.

General R Coding

  • The website for Tidyverse is a great go-to place for learning how to use dplyr, tidyr, and many other packages.

  • R for Data Science by Garrett Grolemund & Hadley Wickham is a fantastic overview of tidyverse functions.

  • Advanced R by Hadley Wickham with the solutions by Malte Grosser, Henning Bumann, Peter Hurford & Robert Krzyzanowski.

  • R Packages by Hadley Wickham. Also try Shannon Pileggi’s tutorial called Your first R package in 1 hour to see some of these tools in action.

  • Hands-On Programming with R by Garrett Grolemund & Hadley Wickham for writing functions and simulations. Haven’t read it, but it looks good.

  • r-statistics.co by Selva Prabhakaran which has great tutorials on R itself, ggplot2, and advanced statistical modeling.

  • Tidymodels is like the Tidyverse suite of packages, but it’s meant for better handling of many statistical models. Also see its GitHub page.

  • Learn to purrr by Rebecca Barter is the tutorial on purrr that I wish I had.

  • Modern R with the Tidyverse by Bruno Rodrigues is a work in progress (as of June 2022), but it’s another free eBook that shows R and the Tidyverse.

  • Easystats “is a collection of R packages, which aims to provide a unifying and consistent framework to tame, discipline, and harness the scary R statistics and their pesky models.”

  • Oscar Baruffa’s monstrous Big Book of R is your one-stop resource for open-source R books on pretty much any topic. There are hundreds of books!

Working with Text

  • Text Mining with R by Julia Silge & David Robinson. Haven’t read it, but it looks great.

  • Handling Strings with R by Gaston Sanchez.

  • If you use the CMU Pronouncing Dictionary, you should look at the phon package. It makes the whole thing searchable and makes it easy to find rhymes. Personally, this’ll make it a lot easier to find potential words for a word list.

  • The ggtext package by Claus O. Wilke makes it a lot easier to work with text if you want to add a little bit of rich text to your plots.

RMarkdown, Bookdown, and Blogdown

Note: Now that Quarto is available, some of this material may be out of date.

GIS and Spatial Stuff

Working with Census Data

Working with audio in R

This category includes anything that deals with audio. These are things that I mostly do in Praat or some other software, but someone has figured out how to do it in R.

  • praatpicture by Rasmus Puggaard-Rode lets you make Praat Picture style plots of acoustic data.
  • audio.whisper is an R package that lets you interact with OpenAI’s Whisper.

Miscellany

  • gt, or the “Grammar of Tables,” is basically ggplot2 but for tables.

  • tidymodels is a collection of packages harmonious with the tidyverse that makes it really easy to run models on your data.

  • Self-explanatory tweets:


Data Visualization

Courses

Books

Colors

I’ve given a workshop on colors in data visualization, which you can view here. In it, I list the following resources, plus a whole bunch of other ones.

Using colors in data visualization

Prepackaged color palettes

  • A monster compilation of color palettes in R can be found at Emil Hvitfeldt’s Github.

  • The scico package has a bunch of colorblind-safe, perceptually uniform, ggplot2-friendly color palettes for use in visuals. Very cool.

  • The color brewer website, while best for maps, offers great color palettes that are colorblind-safe and sometimes also printer-safe. They have native integration with ggplot2 through the scale_[color|fill]_[brewer|distiller] functions.

  • Paul Tol has come up with some additional color themes, which you can access with scale_color_ptol in the ggthemes package.

  • oklch-smooth, by Stephen Hutchings, is “a smooth, full spectrum sRGB color palette for data visualization.”

  • There is no shortage of color palettes. Here are a handful of ones I’ve seen and liked for one reason or another:

    • dutchmasters: Instead of coming up with your own colors, why not use ones created by Dutch painters? This is an R package by Edwin Thoen.

    • PrettyCols by Nicola Rennie.

  • Colors.css: A nicer color palette for the web looks like a set of nice, customizable colors that work great for websites.

Creating your own color palettes

  • If you want to make your own discrete color scale in R, definitely check out Garrick Aden-Buie’s tutorial, Custom Discrete Color Scales for ggplot2.

  • Check out the simplecolors package, by Jake Riley, to find hex codes for consistently-named colors.

  • Definitely check out Adobe’s Color app for some inspiration on color palettes.

  • Also, check out Coolors for more inspiration on color palettes.

  • And if you have a start and end point, this Colorpicker app can get colors in between those points.

  • I’ve needed to do a bivariate choropleth before, so Timo Grossenbacher’s blog post was helpful because it illustrates what this is and how you can do it in R.
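
To illustrate what getting the colors between a start and end point involves, here is a minimal sketch in plain Python. It assumes the simplest approach, straight-line interpolation of each RGB channel. I don't know how the Colorpicker app itself does it, and the function names here are made up. RGB interpolation is not perceptually uniform, so treat the results as a starting point.

```python
def hex_to_rgb(hex_color):
    """Convert '#rrggbb' to an (r, g, b) tuple of ints."""
    h = hex_color.lstrip("#")
    return tuple(int(h[i:i + 2], 16) for i in (0, 2, 4))


def rgb_to_hex(rgb):
    """Convert an (r, g, b) tuple of ints to '#rrggbb'."""
    return "#{:02x}{:02x}{:02x}".format(*rgb)


def interpolate_colors(start, end, n):
    """Return n evenly spaced colors from start to end, inclusive (n >= 2)."""
    a, b = hex_to_rgb(start), hex_to_rgb(end)
    colors = []
    for i in range(n):
        t = i / (n - 1)  # 0.0 at the start color, 1.0 at the end color
        mixed = tuple(round(x + (y - x) * t) for x, y in zip(a, b))
        colors.append(rgb_to_hex(mixed))
    return colors


# Five steps from a dark blue to a light yellow (arbitrary example colors)
print(interpolate_colors("#08306b", "#ffffcc", 5))
```

The resulting hex codes can be passed to any plotting tool that accepts a manual list of colors.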

Animation

  • Thomas Lin Pedersen’s gganimate package has now made it possible to make really cool animations in R. Sometimes you want to add a bit of pizzazz to your presentation, but other times animation really is the best way to visualize something. Either way, this package will help you out a lot.

Rayshader

  • Definitely check out Tyler Morgan-Wall’s rayshader package. It makes it pretty simple to make absolutely stunning 3D images of your data in R. You can make 3D maps if you have spatial data, but you can also turn any boring ggplot2 plot into a 3D work of art. Seriously, go try it out.

  • Lego World Map - Rayshader Walkthrough by Arthur Welle is an awesome walkthrough on rayshader and maps made out of virtual Legos. It’s a lot of fun.

Making better plots

Miscellany


Statistics Resources

General Statistics Knowledge

  • The American Statistical Association, which is essentially the statistics equivalent of the Linguistic Society of America in scope and prestige, put out a statement on p-values in 2016. In March of 2019, they followed up with a monster 43-article special issue, Statistical Inference in the 21st Century: A World Beyond p < 0.05, wherein they recommend that the expression “statistically significant” be abandoned. This has potential to be a pivot point in the field of statistics. Why should a linguist care? Well, the first article in that issue says “If you use statistics in research, business, or policymaking but are not a statistician, these articles were indeed written with YOU in mind.” If you use statistics in your research, it might be worth reading through at least the first article of this issue.

  • The book Modern Dive: An Introduction to Statistical and Data Sciences via R by Chester Ismay and Albert Y. Kim is a free eBook that teaches the basics of R and statistics. See Andrew Heiss’s post about this book for more information.

  • Same Stats, Different Graphs: Generating Datasets with Varied Appearance and Identical Statistics through Simulated Annealing by Justin Matejka and George Fitzmaurice. This went viral in some circles and shows that you can get the exact same summary statistics with wildly different distributions. Very cool.

  • Here’s a BuzzFeed article by Stephanie M. Lee about a researcher who made the news because of his unbelievable amount of p-hacking and using “statistics” to lie about his data.

  • Have you learned about tests like t-tests, ANOVA, chi-squared tests? Did you know they’re all just regression under the hood? Check out this explanation by Jonas Kristoffer Lindeløv called Common statistical tests are linear models. It’s mathy and based in R.
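
As a small illustration of the "it's all regression" idea, here is a minimal sketch in plain Python. It is not taken from Lindeløv's page. It assumes the equal-variance (Student's) two-sample t-test and compares its t statistic with the slope's t statistic from a regression of the outcome on a 0/1 group dummy. The function names and numbers are made up.

```python
import math


def students_t(a, b):
    """t statistic from an equal-variance two-sample t-test (b minus a)."""
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    ss = sum((x - ma) ** 2 for x in a) + sum((x - mb) ** 2 for x in b)
    pooled_var = ss / (na + nb - 2)
    return (mb - ma) / math.sqrt(pooled_var * (1 / na + 1 / nb))


def dummy_regression_t(a, b):
    """t statistic for the slope in y ~ group, with group coded 0 (a) or 1 (b)."""
    x = [0] * len(a) + [1] * len(b)
    y = list(a) + list(b)
    n = len(y)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx                      # equals the difference in group means
    intercept = my - slope * mx            # equals the mean of group a
    rss = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    slope_se = math.sqrt(rss / (n - 2) / sxx)
    return slope / slope_se


# Hypothetical measurements for two groups
group_a = [4.1, 5.0, 5.5, 6.2]
group_b = [6.8, 7.4, 7.9, 8.5, 9.0]
print(students_t(group_a, group_b), dummy_regression_t(group_a, group_b))
```

The two printed values agree up to floating-point error, because the dummy-coded slope is the difference in group means and its standard error reduces to the pooled one.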

Linear mixed-effects models

GAM(M)s

My dissertation makes heavy use of generalized additive mixed-effects models (GAMMs). Here are some resources that I used to help learn about these.

Other Models

I know there are other types of models out there but I haven’t had the opportunity to use them. Here are some resources I’ve found that might be good for me down the road.

Bayesian Statistics

I have not yet learned about Bayesian stats, but here are some resources I’ve come across that I may use later.

Statistics for Linguists

Miscellany

  • This workshop, Dimension reduction with R, by Saskia Freytag shows different methods for dimension reduction, weighs their pros and cons, and includes examples and visuals of their applications. Pretty useful.

  • If you use statistical modeling in your research, the report package is a useful tool to convert your model into human-readable prose.

  • Here’s an open source course on data science by Danielle Navarro.

  • Here’s Michael Franke’s Introduction to Data Analysis.

  • This blog post by Alex Cookson does a cool job at explaining PCA while also including some super cool visuals.

  • This blog post by Joshua Loftus visualizes least squares as springs. Makes a lot of sense to me!

  • If you’ve come up with an outlier detection algorithm, try following Sevvandi Kandanaarachchi’s Testing an Outlier Detection Method to see if it works.

  • Easystats “is a collection of R packages, which aims to provide a unifying and consistent framework to tame, discipline, and harness the scary R statistics and their pesky models.”


Praat Resources

  • Michelle Cohn has written and posted a bunch of very useful Praat scripts that you can download and use.

  • A YouTube channel called ListenLab by Matt Winn that has a bunch of video tutorials on how to do stuff in Praat.

  • Another YouTube channel called Intro to Speech Acoustics that may be useful to students of acoustics, phonetics, etc.

  • And I’ve written a tutorial on writing a script for basic automatic formant extraction.


Working with audio

There are three main steps for processing audio: transcription, forced alignment, and formant extraction.

Automatic Transcription

There is software you can use to transcribe manually, like Praat, Transcriber, and ELAN. But here are some tools I’ve seen that do automatic transcription.

  • CLOx is a new automatic transcriber available from the University of Washington. It’s a web-based service that uses Microsoft Bing’s Speech Recognition system to transcribe your audio. It’s estimated that a sociolinguistic interview can be transcribed in a fifth of the time a manual transcription takes. The great news is that it’s available for several languages!

  • DARLA is actually a whole collection of tools available through a web interface from Dartmouth College. It can transcribe, align, and extract formants from your (English) audio files all in one go. For automatic transcription, you can use their own in-house system by using the “Completely Automated” method. They admit the transcriptions won’t be perfect, but they provide a handy tool for manual correction.

  • OH-Portal is by the Institute of Phonetics and Speech Processing. It works on several languages, and on clean lab data, it’s a little faster to run this and correct the transcription than it is to do a transcription from scratch. Runs entirely through the web browser, so you don’t have to download anything.

Forced Aligners

I’ve got a lot of audio that I need to process, so a crucial part of all that is force aligning the text to the audio. Smart people have come up with free software to do this. Here’s a list of the ones I’ve seen.

  • DARLA, available from Dartmouth College, is the one I’ve used the most. It can transcribe, align, and extract formants from your (English) audio files all in one go. Previously, its forced aligner was built on Prosodylab-Aligner, but it now uses the Montreal Forced Aligner (see below).

  • The Montreal Forced Aligner is a relatively new one that I heard about for the first time at the 2017 LSA conference. It is fundamentally different from the others in that it uses a toolkit called Kaldi. It’s easy to set up and install and I’ve used it on my own data. The benefit of this over DARLA is that it’s on your own computer so you don’t have to wait for files to upload. And you can process files in bulk. Be sure to check out Michael McAuliffe’s blog for updates.

  • FAVE is probably the most well-known forced aligner. It’s open source and you can download it on your own computer from Joe Fruehwald’s GitHub page. Or if you’d prefer, you can use UPenn’s web interface instead.

  • Prosodylab-Aligner is, according to their website, “a set of Python and shell scripts for performing automated alignment of text to audio of speech using Hidden Markov Models.” This is software available through McGill University that actually allows you to train your own acoustic model (e.g. on a non-English audio corpus). I haven’t used it yet, but if I ever need to process non-English audio, this’ll be my go-to.

  • SPPAS is a software package with several functions including forced alignment in several languages. Of the aligners you can download to your computer, this might be one of the easier ones to use.

  • WebMAUS is another web interface with multiple functions including a forced aligner for several languages.

  • Gentle advertises itself as a “robust yet lenient forced aligner built on Kaldi.” It’s easy to download and use and produces what appear to be very good word-level alignments of a provided transcript. It even ignored the interviewer’s voice in the file I tried. The output is a .csv file, so I’m not sure how to turn that into a TextGrid, and if you need phoneme-level acoustic measurements, a word-level transcription isn’t going to work.
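
On the question of getting word-level alignments into a TextGrid, here is a minimal sketch in plain Python. It assumes you have already pulled (word, start, end) triples out of the aligner's output. How Gentle lays out its .csv columns is not covered here, so reading the file is left out. The function name is made up, and the sketch assumes the words do not overlap in time. It writes a single interval tier in Praat's long text format and fills the gaps between words with empty intervals.

```python
def words_to_textgrid(words, duration, tier_name="words"):
    """Build Praat long-format TextGrid text from (label, start, end) tuples."""
    # Praat interval tiers must cover the whole time span without gaps,
    # so pad the silences between words with empty-labeled intervals.
    intervals = []
    cursor = 0.0
    for label, start, end in sorted(words, key=lambda w: w[1]):
        if start > cursor:
            intervals.append(("", cursor, start))
        intervals.append((label, start, end))
        cursor = end
    if cursor < duration:
        intervals.append(("", cursor, duration))

    lines = [
        'File type = "ooTextFile"',
        'Object class = "TextGrid"',
        '',
        'xmin = 0',
        f'xmax = {duration}',
        'tiers? <exists>',
        'size = 1',
        'item []:',
        '    item [1]:',
        '        class = "IntervalTier"',
        f'        name = "{tier_name}"',
        '        xmin = 0',
        f'        xmax = {duration}',
        f'        intervals: size = {len(intervals)}',
    ]
    for i, (label, start, end) in enumerate(intervals, start=1):
        text = label.replace('"', '""')  # Praat escapes quotes by doubling them
        lines += [
            f'        intervals [{i}]:',
            f'            xmin = {start}',
            f'            xmax = {end}',
            f'            text = "{text}"',
        ]
    return "\n".join(lines) + "\n"


# Hypothetical alignment for a two-second recording
example_words = [("hello", 0.5, 0.9), ("world", 1.0, 1.4)]
textgrid_text = words_to_textgrid(example_words, duration=2.0)
print(textgrid_text)
```

Saving the returned string to a .TextGrid file should give something Praat can open alongside the audio. It is still only word-level, so phoneme-level measurements would need a different aligner.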

Formant Extractors

  • Santiago Barreda’s Fast Track is my current go-to tool for automated formant extraction. It’s a Praat plug-in, but it works really well with the accompanying R package, FastTrackR. Give them both a try!

  • FAVE-Extract is the standard that tons of people use.

  • PolyglotDB works well with large, force-aligned corpora.

  • If you want to write a script yourself, I’ve written a tutorial on writing a script for basic automatic formant extraction.


Phonetics Resources

  • The rtMRI IPA chart has MRI videos of all the sounds on the IPA chart.

  • Jonathan Dowse’s IPA Charts with Audio includes basically any possible combination of co-articulations, regardless of whether they’re actually attested in human language.

  • Pink Trombone is an interesting site that has an interactive simulator of the vocal tract. You can click around and make different vowels and consonants. Pretty fun resource for teaching how speech works.


Typography, Web Design, and CSS

I enjoy reading and attempting to implement good typography into my website. Here are some resources that I have found helpful for that.

Beautiful Websites

I designed this website more or less from scratch, so I can appreciate the work others put into their own academic sites. Here are some examples of beautiful websites that I have found that I really like.

  • Kieran Healy has one of the most beautiful academic websites I’ve ever seen. I created this category on this page just so I could include his page on here. Wow.

  • Practical Typography by Matthew Butterick was my gateway into typography. My font selection and many other little details on my site (slides, posters, CV, etc.) were influenced by this book.

CSS

  • If you enjoy the work of Edward Tufte and would like to incorporate some of his design principles into your website, you’ll be interested in Tufte CSS by Dave Liepmann. If you’re interested in your RMarkdown files rendering in a Tufte-style (like this), there are ways to do that too, which you can read in chapter 3 of bookdown by Yihui Xie or chapter 6 of R Markdown, by Yihui Xie, J. J. Allaire, and Garrett Grolemund.


Academic Life

Occasionally, I’ll see posts with really good and insightful tips on how to be an academic. For the ones I saw on Twitter, I’ve put the first post here: click on them to go directly to that tweet where you can read the rest.

Miscellaneous

Just random stuff that doesn’t fit elsewhere.

  • The great American word mapper is an interactive tool put together by Diansheng Guo, Jack Grieve, and Andrea Nini that lets you see regional trends in how words are used on Twitter.

  • Collecting, organizing, and citing scientific literature: an intro to Zotero is a great tutorial on how to use Zotero by Mark Dingemanse. Zotero is a fantastic tool for, well, collecting, organizing, and citing scientific literature and I’m not exaggerating when I say that I could not be in academics without it.

  • Vulgar: A Language Generator is a site that automatically creates a new conlang, based on parameters that you specify. The free web version allows you to add whatever vowels and consonants you’d like to include, and it’ll create a full language: a language name; IPA chart for vowels and consonants; phonotactics; phonological rules; and paradigms for nominal morphology, definite and indefinite articles, personal pronouns, and verb conjugations; derivational morphology; and a lexicon of over 200 words. For $19 you can download the software and get a lexicon of 2000 words, derivational words, random semantic overlaps with natural languages, and the ability to customize orthography, syllable structure, and phonological rules. In addition to just being kinda fun, this is a super useful resource for creating homework assignments for students.

  • The EMU-webApp “is a fully fledged browser-based labeling and correction tool that offers a multitude of labeling and visualization features.” I haven’t given this enough time to learn to use it properly, but it seems very helpful.

  • Johannes Haushofer’s CV of Failures. Other people have written this more elegantly than I could, but sometimes it’s nice to see that other academics fail too. You’re not going to get into all the conferences you apply for, your papers are sometimes going to be rejected, and you’re definitely not getting all the funding you apply for. I find it therapeutic to put together a CV of failures like this researcher did and to keep it updated and formatted just as I would a regular CV. Don’t let impostor syndrome get in the way by thinking others haven’t failed too.

  • Kieran Healy’s The Plain Person’s Guide to Plain Text Social Science is an entire book on an aspect of productivity that I’ve only thought about occasionally: what kind of software should you do your work in? Before you get too entrenched in your workflow, it’s good to consider what your options are.

  • ThisWordDoesNotExist.com is a fun site created by Thomas Dimson.

  • Niche for fellow Mormons, but this post by “Ziff” called “Church President Probabilities, Changes with the Death of One Q15 Member” is a really in-depth analysis that predicts who the next president of the church will be.

  • XKCD’s color survey is always fascinating to me. He displayed a random color and asked people to name it. People could retake the survey as much as they wanted. Hundreds of thousands of responses later, and he came up with a really cool crowd-sourced visualization of how English speakers categorize colors.

  • FiveThirtyEight’s “The Ultimate Halloween Candy Power Ranking”. They took a couple dozen Halloween candies, displayed images of two of them at random, and asked people which they’d rather have. Many, many responses later, and they have a nice ranking of people’s favorite candy.