Resources

On this page you’ll find links to all sorts of stuff that I have found useful, including tutorials, books, and general reading on R and Praat, statistics, software, corpora, design, and more.


My handouts, tutorials, and workshops

R Workshops

This is a series of workshops on how to use R which includes a variety of topics. I have included PDFs and additional information on each installment of this series.

Formant extraction tutorial

This tutorial walks you through writing a Praat script that extracts formant measurements from vowels. If you’ve never worked with Praat scripting but want to work with vowels, this might be a good starting point.

Vowel plots in R tutorials (Part 1 and Part 2)

This is a multi-part tutorial on how to make the typical sorts of vowel plots in R. Part 1 shows plotting single-point measurements as scatter plots and serves as a mild introduction to ggplot2. Part 2 shows how to plot trajectories, both in the F1-F2 space and in a Praat-like time-Hz space, and is a bit of an introduction to tidyverse as well.

Measuring vowel overlap in R (Part 1 and Part 2)

This is a two-part tutorial on calculating Pillai scores and Bhattacharyya’s Affinity in R. The first covers what I consider the bare necessities, culminating in custom R functions for each. The second is a bit more in-depth as it looks at ways to make the functions more robust, but it also shows some simple visualizations you can make with the output.
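As a quick illustration of the second measure, outside the tutorial: if you model two vowel classes as multivariate normal distributions in F1-F2 space, the Bhattacharyya distance has a closed form, and the affinity (coefficient) is exp(-distance). This sketch is a Python translation of that textbook formula, not the tutorial’s R code, and the Gaussian assumption is a simplification:

```python
import numpy as np

def bhattacharyya_affinity(mu1, cov1, mu2, cov2):
    """Bhattacharyya coefficient between two Gaussians (1.0 = identical)."""
    mu1, mu2 = np.asarray(mu1, float), np.asarray(mu2, float)
    cov1, cov2 = np.asarray(cov1, float), np.asarray(cov2, float)
    cov = (cov1 + cov2) / 2              # pooled covariance
    diff = mu1 - mu2
    # Closed-form Bhattacharyya distance for two multivariate normals
    d = (diff @ np.linalg.solve(cov, diff)) / 8 + 0.5 * np.log(
        np.linalg.det(cov) / np.sqrt(np.linalg.det(cov1) * np.linalg.det(cov2))
    )
    return np.exp(-d)                    # affinity = exp(-distance)

# Two identical vowel distributions overlap completely:
mu = [500.0, 1500.0]                     # F1, F2 means in Hz (made-up numbers)
cov = [[1000.0, 0.0], [0.0, 4000.0]]
print(bhattacharyya_affinity(mu, cov, mu, cov))  # → 1.0
```

Values near 1 mean near-total overlap; values near 0 mean well-separated distributions.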

Make yourself googleable

I’m no expert, but I have given a workshop on how grad students can increase their online presence and make themselves more googleable, based in large part on ImpactStory’s fantastic 30-day challenge, which you can read here.

Academic Poster Workshop

In response to the need for a “How to Make an Academic Poster” workshop, I put one together at the last minute. Poster-making is more of an art than a science, and this is a very opinionated view on the dos and don’ts of making an academic poster.

Excel Workshop

Last year I gave a workshop on Excel and ended up producing a long handout that goes from the very basics to relatively tricky techniques. The link above will take you to a blog post that summarizes the workshop, and you can also find the handout itself.


R Resources

Here is a list of resources I’ve found for R. I’ve gone through some of them and others are on my to-do list. These are in no particular order.

General R Coding

  • The website for Tidyverse is a great go-to place for learning how to use dplyr, tidyr, and many other packages.

  • R for Data Science by Garrett Grolemund & Hadley Wickham is a fantastic overview of tidyverse functions.

  • Intro to Tidyverse by David Robinson.

  • Advanced R by Hadley Wickham with the solutions by Malte Grosser, Henning Bumann, Peter Hurford & Robert Krzyzanowski.

  • R Packages by Hadley Wickham.

  • Hands-On Programming with R by Garrett Grolemund & Hadley Wickham for writing functions and simulations. Haven’t read it, but it looks good.

  • r-statistics.co by Selva Prabhakaran which has great tutorials on R itself, ggplot2, and advanced statistical modeling.

  • Tidymodels is like the Tidyverse suite of packages, but it’s meant for better handling of many statistical models. Also see its GitHub page.

  • Learn to purrr by Rebecca Barter is the tutorial on purrr that I wish I had.

  • Modern R with the Tidyverse by Bruno Rodriguez is a work in progress, but it’s another free eBook that shows R and the Tidyverse.

Working with Text

  • Text Mining with R by Julia Silge & David Robinson. Haven’t read it, but it looks great.

  • Handling Strings with R by Gaston Sanchez.

  • Visualizing text data with ggplot2 by Colin Fay.

  • If you use the CMU Pronouncing Dictionary, you should look at the new phon package. It makes the whole thing searchable and makes it easy to find rhymes. Personally, this’ll make it a lot easier to find potential words for a word list.
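phon itself is an R package; purely to illustrate the idea behind rhyme-finding, here is a toy Python sketch with a made-up three-word mini-dictionary (the real CMU dictionary has over 100,000 entries): treat two words as rhyming when their pronunciations match from the last stressed vowel to the end.

```python
# Toy CMU-style entries: word -> ARPABET phones with stress digits.
# This mini-dictionary is invented for the example, not phon's API.
CMU = {
    "CAT": ["K", "AE1", "T"],
    "HAT": ["HH", "AE1", "T"],
    "DOG": ["D", "AO1", "G"],
}

def rhyme_part(phones):
    """Phones from the last stressed vowel (stress digit 1 or 2) to the end."""
    for i in range(len(phones) - 1, -1, -1):
        if phones[i][-1] in "12":        # e.g. "AE1" ends in a stress digit
            return tuple(phones[i:])
    return tuple(phones)                 # no stressed vowel found

def rhymes(word):
    target = rhyme_part(CMU[word])
    return [w for w in CMU if w != word and rhyme_part(CMU[w]) == target]

print(rhymes("CAT"))  # → ['HAT']
```

The same matching logic scales straightforwardly to the full dictionary.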

  • The ggtext package by Claus O. Wilke makes it a lot easier to work with text if you want to add a little bit of rich text to your plots.

Working with Twitter

  • 21 Recipes for Mining Twitter Data with rtweet by Bob Rudis is a tutorial that illustrates how to extract and do a whole bunch of stuff with Twitter data in R.

  • R Ready to Map is a tutorial by Dorris Scott that starts off using the rtweet package to extract some Twitter data, shows you how to map it, and then walks you through creating an interactive RMarkdown document that integrates leaflet maps and plots.

RMarkdown, Bookdown, and Blogdown

GIS and Spatial Stuff

Working with Census Data


Data Visualization

Books

Colors

I’ve given a workshop on colors in data visualization, which you can view here. In it, I list the following resources, plus a whole bunch of other ones.

  • The scico package has a bunch of colorblind-safe, perceptually uniform, ggplot2-friendly color palettes for use in visuals. Very cool.

  • The color brewer website, while best for maps, offers great color palettes that are colorblind-safe and sometimes also printer-safe. They have native integration with ggplot2 through the scale_[color|fill]_[brewer|distiller] functions.

  • Paul Tol has come up with some additional color themes, which you can access with scale_color_ptol in the ggthemes package.

  • If you want to make your own discrete color scale in R, definitely check out Garrick Aden-Buie’s tutorial, Custom Discrete Color Scales for ggplot2.

  • There is no shortage of color palettes. Here are a handful of ones I’ve seen and liked for one reason or another:

    • nationalparkcolors: An R package by Katie Jolly with color palettes based on vintage-looking national parks posters.

    • earthtones: An R package by Will Cornwell where you give it GPS coordinates and it’ll go to that location in Google Maps and create a color palette based on satellite images. Pretty cool.

    • RSkittleBrewer: An R package by Alyssa Frazee that includes color palettes based on Skittles!

    • pokepalettes.com: A simple webpage that takes a Pokemon name and generates a color palette.

  • Of course, a monster compilation of color palettes in R can be found at Emil Hvitfeldt’s Github.

Animation

  • Thomas Lin Pedersen’s gganimate package has now made it possible to make really cool animations in R. Sometimes you want to add a bit of pizzazz to your presentation, but other times animation really is the best way to visualize something. Either way, this package will help you out a lot.

Rayshader

  • Definitely check out Tyler Morgan-Wall’s rayshader package. It makes it pretty simple to make absolutely stunning 3D images of your data in R. You can make 3D maps if you have spatial data, but you can also turn any boring ggplot2 plot into a 3D work of art. Seriously, go try it out.

  • Lego World Map - Rayshader Walkthrough by Arthur Welle is an awesome walkthrough on rayshader and maps made out of virtual Legos. It’s a lot of fun.

Making better plots

Miscellany


Statistics Resources

General Statistics Knowledge

  • The American Statistical Association, which is essentially the statistics equivalent in scope and prestige to the Linguistic Society of America, put out a statement on p-values in 2016. In March of 2019, they followed up with a monster 43-article special issue, Statistical Inference in the 21st Century: A World Beyond p < 0.05, wherein they recommend that the expression “statistically significant” be abandoned. This has potential to be a pivot point in the field of statistics. Why should a linguist care? Well, the first article in that issue says “If you use statistics in research, business, or policymaking but are not a statistician, these articles were indeed written with YOU in mind.” If you use statistics in your research, it might be worth reading through at least the first article of this issue.

  • The book Modern Dive: An Introduction to Statistical and Data Sciences via R by Chester Ismay and Albert Y. Kim is a free eBook that teaches the basics of R and statistics. See Andrew Heiss’s post about this book for more information.

  • Same Stats, Different Graphs: Generating Datasets with Varied Appearance and Identical Statistics through Simulated Annealing by Justin Matejka and George Fitzmaurice. This went viral in some circles and shows that you can get the exact same summary statistics with wildly different distributions. Very cool.
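The core trick is easy to demonstrate at a small scale. Here’s a quick sketch (with values picked by hand for this page) of two differently shaped samples that nonetheless share a mean and sample variance:

```python
import statistics

# Two differently shaped samples...
a = [-2, -1, 0, 1, 2]              # symmetric, evenly spaced
b = [-2.5, -0.5, 0.5, 1.0, 1.5]    # skewed, with one low outlier

# ...with identical summary statistics:
print(statistics.mean(a) == statistics.mean(b))          # → True
print(statistics.variance(a) == statistics.variance(b))  # → True
```

The paper’s simulated-annealing approach does the same thing automatically, nudging points around while holding the summary statistics fixed.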

  • Here’s a BuzzFeed article by Stephanie M. Lee about a researcher who made the news because of his unbelievable amount of p-hacking and using “statistics” to lie about his data.

Linear mixed-effects models

GAM(M)s

My dissertation makes heavy use of generalized additive mixed-effects models (GAMMs). Here are some resources that I used to help learn about these.

Other Models

I know there are other types of models out there but I haven’t had the opportunity to use them. Here are some resources I’ve found that might be good for me down the road.

Miscellany

  • This workshop, Dimension reduction with R, by Saskia Freytag shows different methods for dimension reduction, weighs their pros and cons, and includes examples and visuals of their applications. Pretty useful.

  • If you use statistical modeling in your research, the report package is a useful tool to convert your model into human-readable prose.


Praat Resources


Working with audio

There are three main steps for processing audio: transcription, forced alignment, and formant extraction.

Automatic Transcription

There is software available that you can use to transcribe audio, like Praat, Transcriber, and ELAN. But here are some tools I’ve seen that do automatic transcription.

  • CLOx is a new automatic transcriber available from the University of Washington. It’s a web-based service that uses Microsoft Bing’s speech recognition system to transcribe your audio. It’s estimated that a sociolinguistic interview can be transcribed in a fifth of the time it would take to do manually. The great news is that it’s available for several languages!

  • DARLA is actually a whole collection of tools available through a web interface from Dartmouth University. It can transcribe, align, and extract formants from your (English) audio files all in one go. For automatic transcription, you can use their own in-house system by using the “Completely Automated” method. They admit the transcriptions won’t be perfect, but they provide a handy tool for manual correction.

  • OH-Portal is by the Institute of Phonetics and Speech Processing. It works on several languages, and on clean lab data it’s a little faster to run it and correct the transcription than it is to transcribe from scratch. It runs entirely in the web browser, so you don’t have to download anything.

Forced Aligners

I’ve got a lot of audio that I need to process, so a crucial part of all that is force aligning the text to the audio. Smart people have come up with free software to do this. Here’s a list of the ones I’ve seen.

  • DARLA, available from Dartmouth University, is the one I’ve used the most. It can transcribe, align, and extract formants from your (English) audio files all in one go. Its forced aligner was previously built using Prosody-Lab but now uses the Montreal Forced Aligner (see below).

  • The Montreal Forced Aligner is a relatively new one that I heard about for the first time at the 2017 LSA conference. It is fundamentally different from other aligners in that it is built on software called Kaldi. It’s easy to set up and install and I’ve used it on my own data. The benefit of this over DARLA is that it’s on your own computer so you don’t have to wait for files to upload. And you can process files in bulk.

  • FAVE is probably the most well-known forced aligner. It’s open source and you can download it to your own computer from Joe Fruehwald’s Github page. Or, if you’d prefer, you can use UPenn’s web interface instead.

  • Prosodylab-Aligner is, according to their website, “a set of Python and shell scripts for performing automated alignment of text to audio of speech using Hidden Markov Models.” This is software available through McGill University that actually allows you to train your own acoustic model (e.g. on a non-English audio corpus). I haven’t used it yet, but if I ever need to process non-English audio, this’ll be my go-to.

  • SPPAS is a software package with several functions including forced alignment in several languages. Of the aligners you can download to your computer, this might be one of the easier ones to use.

  • WebMAUS is another web interface with multiple functions including a forced aligner for several languages.

  • Gentle advertises itself as a “robust yet lenient forced aligner built on Kaldi.” It’s easy to download and use and produces what appear to be very good word-level alignments of a provided transcript. It even ignored the interviewer’s voice in the file I tried. The output is a .csv file, so I’m not sure how to turn that into a TextGrid, and if you need phoneme-level acoustic measurements, a word-level transcription isn’t going to work.
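For what it’s worth, a word-level CSV like that can be turned into a TextGrid with a short script. Here’s a rough Python sketch: the (label, start, end) input layout is my assumption rather than Gentle’s documented output, and it only produces a word tier, so the phoneme-level caveat still applies. It pads the gaps between words with empty intervals so the tier tiles the whole file, which Praat expects.

```python
def words_to_textgrid(words, total_dur):
    """words: list of (label, start, end) tuples in seconds, non-overlapping."""
    # Pad gaps with empty intervals so the tier covers [0, total_dur].
    intervals, t = [], 0.0
    for label, start, end in sorted(words, key=lambda w: w[1]):
        if start > t:
            intervals.append(("", t, start))
        intervals.append((label, start, end))
        t = end
    if t < total_dur:
        intervals.append(("", t, total_dur))

    lines = [
        'File type = "ooTextFile"',
        'Object class = "TextGrid"',
        "",
        "xmin = 0",
        f"xmax = {total_dur}",
        "tiers? <exists>",
        "size = 1",
        "item []:",
        "    item [1]:",
        '        class = "IntervalTier"',
        '        name = "words"',
        "        xmin = 0",
        f"        xmax = {total_dur}",
        f"        intervals: size = {len(intervals)}",
    ]
    for i, (label, start, end) in enumerate(intervals, 1):
        lines += [
            f"        intervals [{i}]:",
            f"            xmin = {start}",
            f"            xmax = {end}",
            f'            text = "{label}"',
        ]
    return "\n".join(lines) + "\n"

tg = words_to_textgrid([("hello", 0.4, 0.85), ("world", 1.1, 1.6)], 2.0)
# To save it: open("gentle.TextGrid", "w").write(tg)
```

The resulting file opens directly in Praat as a one-tier TextGrid.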

Formant Extractors

  • FAVE-Extract is the gold standard that tons of people use.

  • If you want to write a script yourself, I’ve written a tutorial on writing a script for basic automatic formant extraction.


Corpora

For whatever reason, sometimes it’s nice to use data that already exists rather than collect your own. Here are just a few of the sites I’ve seen for downloading audio for (potential) linguistic research.

Audio Corpora

  • CORAAL is the Corpus of Regional African American English, the first public corpus of African American Language. You can download the audio and transcriptions in their entirety here or search and browse the corpus from the website.

  • The Linguistic Atlas Project is an important work for American dialectology. Early linguists interviewed thousands of people from across the country, mostly between the 1930s and the 1980s. If you’ve heard of the Linguistic Atlas of New England (LANE), the Linguistic Atlas of the Middle Atlantic States (LAMSAS), or the Linguistic Atlas of the Gulf States (LAGS), these are all under the umbrella of the Linguistic Atlas Project and serve as a baseline against which contemporary data can be compared to study language change in real time. Many of the recordings are available to download online (for those that were recorded after portable technology existed, so around 1950 or later). There aren’t too many full transcriptions yet, but there are scans of handwritten transcriptions of key words available to download.

  • The Dictionary of American Regional English (DARE) recently made all of their audio available online. This is a nice collection of older recordings from all over the country.

  • The International Dialects of English Archive (IDEA) has a nice collection of over 1000 short audio clips featuring basically every variety of English (native and non-native) you can think of. It’s designed with voice actors in mind, but it can still be used for linguistic analysis.

  • StoryCorps has tons of recorded interviews available for download. I’ve seen audio from this site used a couple times for linguistic analysis.

  • The Library of Congress hosts thousands of recorded interviews. I don’t recall seeing these used in linguistic research, but some of them are older and could be good for something.

Text Corpora

  • COCA, COHA, and many others are all created by Mark Davies at Brigham Young University. These are said to be the gold standard when it comes to balanced, large corpora.

  • Jason Baumgartner has done the legwork to make the entirety of Reddit available for download. I worked with this data when he first released it in 2015, and it was about a 50-billion word corpus back then. Reddit has grown tremendously even since then so you’re looking at some truly big data. Super cool.


Typography, Web Design, and CSS

I enjoy reading and attempting to implement good typography into my website. Here are some resources that I have found helpful for that.

Beautiful Websites

I designed this website more or less from scratch, so I can appreciate the work others put into their own academic sites. Here are some examples of beautiful websites that I have found that I really like.

  • Kieran Healy has one of the most beautiful academic websites I’ve ever seen. I created this category on this page just so I could include his page on here. Wow.

  • Practical Typography by Matthew Butterick was my gateway into typography. My font selection and many other little details on my site (slides, posters, CV, etc.) were influenced by this book.

CSS

  • If you enjoy the work of Edward Tufte and would like to incorporate some of his design principles into your website, you’ll be interested in Tufte CSS by Dave Liepmann. If you’re interested in your RMarkdown files rendering in a Tufte-style (like this), there are ways to do that too, which you can read in chapter 3 of bookdown by Yihui Xie or chapter 6 of R Markdown, by Yihui Xie, J. J. Allaire, and Garrett Grolemund (cf. this).


Academic Life

Occasionally, I’ll see posts with really good and insightful tips on how to be an academic. For the ones I saw on Twitter, I’ve put the first post here; click on them to go directly to that tweet, where you can read the rest.

Miscellaneous

Just random stuff that doesn’t fit elsewhere.

  • The great American word mapper is an interactive tool put together by Diansheng Guo, Jack Grieve, and Andrea Nini that lets you see regional trends in how words are used on Twitter.

  • Collecting, organizing, and citing scientific literature: an intro to Zotero is a great tutorial on how to use Zotero by Mark Dingemanse. Zotero is a fantastic tool for, well, collecting, organizing, and citing scientific literature and I’m not exaggerating when I say that I could not be in academics without it.

  • Pink Trombone is an interesting site that has an interactive simulator of the vocal tract. You can click around and make different vowels and consonants. Pretty fun resource for teaching how speech works.

  • Vulgar: A Language Generator is a site that automatically creates a new conlang, based on parameters that you specify. The free web version allows you to add whatever vowels and consonants you’d like to include, and it’ll create a full language: a language name; IPA charts for vowels and consonants; phonotactics; phonological rules; paradigms for nominal morphology, definite and indefinite articles, personal pronouns, and verb conjugations; derivational morphology; and a lexicon of over 200 words. For $19 you can download the software and get a lexicon of 2000 words, derivational words, random semantic overlaps with natural languages, and the ability to customize orthography, syllable structure, and phonological rules. In addition to just being kinda fun, this is a super useful resource for creating homework assignments for students.

  • IPA Phonetics is an iPhone app that has what they call an “elaborated” IPA chart with lots of extra places and manners of articulation, complete with audio clips of all the sounds. You can play a game where it’ll play a sound and you can guess what you heard. It’s just fun to see things like a voiced uvular affricate [ɢʁ] or a dentolabial fricative [θ̼] on an IPA chart. Credits to University of Victoria linguistics and John Esling’s “Phonetic Notation” (chapter 18 of the Handbook of Phonetic Sciences, 2nd ed.).

  • The EMU-webApp “is a fully fledged browser-based labeling and correction tool that offers a multitude of labeling and visualization features.” I haven’t given this enough time to learn to use it properly, but it seems very helpful.

  • Johannes Haushofer’s CV of Failures. Other people have written about this more elegantly than I could, but sometimes it’s nice to see that other academics fail too. You’re not going to get into all the conferences you apply for, your papers are sometimes going to be rejected, and you’re definitely not getting all the funding you apply for. I find it therapeutic to put together a CV of failures like this researcher did and to keep it updated and formatted just as I would a regular CV. Don’t let impostor syndrome get in the way by thinking others haven’t failed too.

  • Kieran Healy’s The Plain Person’s Guide to Plain Text Social Science is an entire book on an aspect of productivity that I’ve only thought about occasionally: what kind of software should you do your work in? Before you get too entrenched in your workflow, it’s good to consider what your options are.