Resources

On this page you’ll find links to all sorts of things I have found useful, including tutorials, books, and general reading on R and Praat, statistics, software, corpora, design, and more.


My handouts, tutorials, and workshops

R Workshops

I’m currently giving a series of workshops on how to use R, covering a variety of topics. I’ve included PDFs and additional information for each installment of the series.

Formant extraction tutorial

This tutorial walks you through writing a Praat script that extracts formant measurements from vowels. If you’ve never worked with Praat scripting but want to work with vowels, this might be a good starting point.

Vowel plots in R tutorials (Part 1 and Part 2)

This is a multi-part tutorial on how to make the typical sorts of vowel plots in R. Part 1 shows how to plot single-point measurements as scatter plots and serves as a gentle introduction to ggplot2. Part 2 shows how to plot trajectories, both in the F1-F2 space and in a Praat-like time-by-Hz space, and doubles as a brief introduction to the tidyverse.
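
To give a flavor of what Part 1 covers, here’s a minimal sketch of a single-point vowel plot in ggplot2. The data frame and its columns are made-up placeholders; the tutorial itself walks through real formant data.

    library(ggplot2)

    # Toy single-point F1/F2 measurements for three vowels; real data would
    # come from a Praat script or a tool like FAVE or DARLA.
    vowel_data <- data.frame(
      vowel = rep(c("i", "a", "u"), each = 20),
      F1 = c(rnorm(20, 300, 30), rnorm(20, 750, 40), rnorm(20, 320, 30)),
      F2 = c(rnorm(20, 2300, 100), rnorm(20, 1300, 100), rnorm(20, 800, 100))
    )

    # Reverse both axes so vowels land in their traditional positions:
    # high vowels at the top, front vowels on the left.
    ggplot(vowel_data, aes(x = F2, y = F1, color = vowel)) +
      geom_point(alpha = 0.6) +
      scale_x_reverse() +
      scale_y_reverse() +
      theme_minimal()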

Make yourself googleable

I’m no expert, but I have given a workshop on how grad students can increase their online presence and make themselves more googleable, based in large part on ImpactStory’s fantastic 30-day challenge, which you can read here.

Excel Workshop

Last year I gave a workshop on Excel and ended up producing a long handout that goes from the very basics to relatively tricky techniques. The link above will take you to a blog post that summarizes the workshop, where you can also find the handout itself.


R Resources

Here is a list of resources I’ve found for R. I’ve gone through some of them and others are on my to-do list. These are in no particular order.

General R Coding

  • The website for the Tidyverse is a great go-to place for learning how to use dplyr, tidyr, and many other packages (see the short example after this list for a taste of the style).

  • R for Data Science by Garrett Grolemund & Hadley Wickham is a fantastic overview of tidyverse functions.

  • Intro to Tidyverse by David Robinson.

  • Advanced R by Hadley Wickham with the solutions by Malte Grosser, Henning Bumann, Peter Hurford & Robert Krzyzanowski.

  • R Packages by Hadley Wickham.

  • Hands-On Programming with R by Garrett Grolemund (with a foreword by Hadley Wickham), for writing functions and simulations. Haven’t read it, but it looks good.

  • r-statistics.co by Selva Prabhakaran which has great tutorials on R itself, ggplot2, and advanced statistical modeling.

  • Tidymodels is like the Tidyverse suite of packages, but it’s meant for better handling of many statistical models. Also see its GitHub page.
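
If you haven’t seen tidyverse-style code before, here’s a minimal sketch of the kind of pipeline the resources above teach, using the starwars dataset that ships with dplyr:

    library(dplyr)

    # Drop rows with missing heights, compute a per-species summary,
    # and sort the result, all in one readable pipeline.
    starwars %>%
      filter(!is.na(height)) %>%
      group_by(species) %>%
      summarize(mean_height = mean(height), n = n()) %>%
      arrange(desc(mean_height))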

Data Visualization

  • ggplot2 by Hadley Wickham is a comprehensive resource for learning all the ins and outs of ggplot2.

  • Not satisfied with R’s default colors? Try some of these alternatives:

    • The scico package has a bunch of colorblind-safe, perceptually uniform, ggplot2-friendly color palettes for use in visuals. Very cool.

    • The ColorBrewer website, while best for maps, offers great color palettes that are colorblind-safe and sometimes also printer-friendly. They have native integration with ggplot2 via the scale_[color|fill]_[brewer|distiller] functions (see the sketch after this list).

    • Paul Tol has come up with some additional color themes, which you can access with scale_color_ptol in the ggthemes package.

  • This blog post by Jesse Sadler is a great tutorial on how to use R to visualize network data.

  • Data Visualization: A Practical Introduction by Kieran Healy. I haven’t had the time to look through it, and as I write this it’s an incomplete draft of the forthcoming book, but it looks quite good. It covers data prep, basic plots, visualizing statistical models, maps, and a whole bunch of other stuff.

  • Edward Tufte is a statistician known for his series of four books that focus on best practices in the presentation of data: The Visual Display of Quantitative Information, Envisioning Information, Visual Explanations, and Beautiful Evidence. I haven’t read them, but have thumbed through them and they look very cool. As a practical application of them, this page by Lukasz Piwek shows how to implement many of these visualizations in R. You can also use ggthemes to get some of this implementation.

  • Joey Cherdarchuk of Darkhorse Analytics has put together some really succinct presentations on how to simplify the things you might put in a paper, like maps, charts, and tables, by improving the data-ink ratio.
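
As for the color palettes mentioned above, swapping them in ggplot2 is usually a one-line change. A minimal sketch (ColorBrewer and viridis-style palettes ship with ggplot2 itself; Paul Tol’s come from ggthemes):

    library(ggplot2)
    library(ggthemes)  # for Paul Tol's palettes

    p <- ggplot(iris, aes(Sepal.Length, Sepal.Width, color = Species)) +
      geom_point()

    p + scale_color_brewer(palette = "Dark2")  # ColorBrewer
    p + scale_colour_ptol()                    # Paul Tol, via ggthemes
    p + scale_color_viridis_d()                # perceptually uniform viridis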

Working with Text

RMarkdown and Bookdown

GIS and Spatial Stuff


Statistics Resources

General Statistics Knowledge

  • The American Statistical Association, which is essentially the statistics equivalent in scope and prestige of the Linguistic Society of America, put out a statement on p-values. It is brief and written in accessible language, and in my opinion it should be required reading if you ever use or interpret p-values in your research.

  • Same Stats, Different Graphs: Generating Datasets with Varied Appearance and Identical Statistics through Simulated Annealing by Justin Matejka and George Fitzmaurice. This went viral in some circles and shows that you can get the exact same summary statistics with wildly different distributions. Very cool.

  • 15 Types of Regression You Should Know is a post on the blog Listen Data that is a nice overview of different kinds of regression and how to implement them in R.

  • Mixed Modeling as a Foreign Language, a blog post by Andrew McDonald, is first of all a good explanation of what mixed modeling is all about. More importantly, though, it makes the point that “if you only partly understand the words you are using, you will humiliate yourself eventually.” In other words, it’s important to know what you’re doing when you use statistics, and if you don’t, maybe you should reconsider before you do something wrong.

  • Here’s a BuzzFeed article by Stephanie M. Lee about a researcher who made the news because of his unbelievable amount of p-hacking and his use of “statistics” to lie about his data.

Linear mixed-effects models

GAMMs


Praat Resources


Working with audio

There are three main steps for processing audio: transcription, forced alignment, and formant extraction.

Automatic Transcription

There is software available that you can use to transcribe manually, like Praat, Transcriber, and ELAN. But here are some tools I’ve seen that do automatic transcription.

  • CLOx is a new automatic transcriber available from the University of Washington. It’s a web-based service that uses Microsoft Bing’s speech recognition system to transcribe your audio. They estimate that a sociolinguistic interview can be transcribed in about a fifth of the time it takes to transcribe manually. The great news is that it’s available for several languages!

  • DARLA is actually a whole collection of tools available through a web interface from Dartmouth College. It can transcribe, align, and extract formants from your (English) audio files all in one go. For automatic transcription, you can use their own in-house system via the “Completely Automated” method. They admit the transcriptions won’t be perfect, but they provide a handy tool for manual correction.

Forced Aligners

I’ve got a lot of audio that I need to process, so a crucial part of all that is force-aligning the text to the audio. Smart people have come up with free software to do this. Here’s a list of the ones I’ve seen.

  • DARLA, available from Dartmouth College, is the one I’ve used the most. It can transcribe, align, and extract formants from your (English) audio files all in one go. Its forced aligner was previously built on Prosodylab-Aligner but now uses the Montreal Forced Aligner (see below).

  • The Montreal Forced Aligner is a relatively new one that I heard about for the first time at the 2017 LSA conference. It is fundamentally different from the others in that it’s built on the Kaldi speech recognition toolkit. It’s easy to set up and install, and I’ve used it on my own data. The benefit of this over DARLA is that it runs on your own computer, so you don’t have to wait for files to upload, and you can process files in bulk.

  • FAVE is probably the most well-known forced aligner. It’s open source, and you can download it to your own computer from Joe Fruehwald’s GitHub page. Or, if you’d prefer, you can use UPenn’s web interface instead.

  • Prosodylab-Aligner is, according to its website, “a set of Python and shell scripts for performing automated alignment of text to audio of speech using Hidden Markov Models.” This software, available through McGill University, actually allows you to train your own acoustic model (e.g., on a non-English audio corpus). I haven’t used it yet, but if I ever need to process non-English audio, this’ll be my go-to.

  • SPPAS is a software package with multiple functions, including forced alignment in several languages. Of the aligners you can download to your computer, this might be one of the easier ones to use.

  • WebMAUS is another web interface with multiple functions including a forced aligner for several languages.

  • Gentle advertises itself as a “robust yet lenient forced aligner built on Kaldi.” It’s easy to download and use, and it produces what appear to be very good word-level alignments of a provided transcript. It even ignored the interviewer’s voice in the file I tried. The output is a .csv file, though, so I’m not sure how to turn that into a TextGrid, and if you need phoneme-level acoustic measurements, word-level alignment isn’t going to work.

Formant Extractors

  • FAVE-Extract is the gold standard that tons of people use.

  • If you want to write a script yourself, I’ve written a tutorial on writing a basic automatic formant extraction script. (An R-based alternative is sketched just below.)
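
If you’d rather stay in R than script Praat, here’s a minimal sketch using the forest() formant estimator from the wrassp package (part of the EMU ecosystem). Note that this is an alternative approach, not what FAVE-Extract or my tutorial does:

    library(wrassp)

    # wrassp ships with a few demo .wav files we can point at
    wav <- list.files(system.file("extdata", package = "wrassp"),
                      pattern = "\\.wav$", full.names = TRUE)[1]

    # With toFile = FALSE, forest() returns the formant tracks as an R
    # object instead of writing an .fms file next to the audio.
    fms <- forest(wav, toFile = FALSE)

    # fms$fm has one row per analysis frame and one column per formant
    # (F1-F4 by default).
    head(fms$fm)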


Corpora

For whatever reason, sometimes it’s nice to use data that already exists rather than collect your own. Here are just a few of the sites I’ve seen for downloading audio for (potential) linguistic research.

Audio Corpora

  • CORAAL is the Corpus of Regional African American English, the first public corpus of African American Language. You can download the audio and transcriptions in their entirety here or search and browse the corpus from the website.

  • The Linguistic Atlas Project is an important work for American dialectology. Early linguists interviewed thousands of people from across the country, mostly between the 1930s and the 1980s. If you’ve heard of the Linguistic Atlas of New England (LANE), the Linguistic Atlas of the Middle Atlantic States (LAMSAS), or the Linguistic Atlas of the Gulf States (LAGS), these all fall under the umbrella of the Linguistic Atlas Project and serve as a baseline against which contemporary data can be compared to study language change in real time. Many of the recordings are available to download online (those made after portable recording technology existed, so around 1950 or later). There aren’t too many full transcriptions yet, but scans of handwritten transcriptions of key words are available to download.

  • The Dictionary of American Regional English (DARE) recently made all of their audio available online. This is a nice collection of older recordings from all over the country.

  • The International Dialects of English Archive (IDEA) has a nice collection of over 1000 short audio clips featuring basically every variety of English (native and non-native) you can think of. It’s designed with voice actors in mind, but it can still be used for linguistic analysis.

  • StoryCorps has tons of recorded interviews available for download. I’ve seen audio from this site used a couple times for linguistic analysis.

  • The Library of Congress hosts thousands of recorded interviews. I don’t recall seeing these used in linguistic research, but some of them are older and could be good for something.

Text Corpora

  • COCA, COHA, and many others were all created by Mark Davies at Brigham Young University. These are said to be the gold standard when it comes to large, balanced corpora.

  • Jason Baumgartner has done the legwork to make the entirety of Reddit available for download. I worked with this data when he first released it in 2015, and it was about a 50-billion-word corpus back then. Reddit has grown tremendously even since then, so you’re looking at some truly big data. Super cool.


Typography, Web Design, and CSS

I enjoy reading about good typography and attempting to implement it on my website. Here are some resources that I have found helpful for that.

Beautiful Websites

I designed this website more or less from scratch, so I can appreciate the work others put into their own academic sites. Here are some examples of beautiful websites that I really like.

  • Kieran Healy has one of the most beautiful academic websites I’ve ever seen. I created this category on this page just so I could include his page on here. Wow.

  • Practical Typography by Matthew Butterick was my gateway into typography. My font selection and many other little details on my site (slides, posters, CV, etc.) were influenced by this book.

CSS

  • If you enjoy the work of Edward Tufte and would like to incorporate some of his design principles into your website, you’ll be interested in Tufte CSS by Dave Liepmann. If you’d like your RMarkdown files to render in a Tufte style (like this), there are ways to do that too, which you can read about in chapter 3 of bookdown by Yihui Xie or chapter 6 of R Markdown by Yihui Xie, J. J. Allaire, and Garrett Grolemund (cf. this).
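
For reference, getting a Tufte-style render out of R Markdown is mostly a matter of the output format. A minimal header using the tufte package looks something like this (a sketch only; the chapters linked above have the details):

    ---
    title: "A Tufte-style document"
    output: tufte::tufte_html
    ---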


Miscellaneous

Just random stuff that doesn’t fit elsewhere.

  • The great American word mapper is an interactive tool put together by Diansheng Guo, Jack Grieve, and Andrea Nini that lets you see regional trends in how words are used on Twitter.

  • Collecting, organizing, and citing scientific literature: an intro to Zotero is a great tutorial by Mark Dingemanse on how to use Zotero. Zotero is a fantastic tool for, well, collecting, organizing, and citing scientific literature, and I’m not exaggerating when I say that I could not be in academia without it.

  • Pink Trombone is an interesting site with an interactive simulator of the vocal tract. You can click around and make different vowels and consonants. It’s a pretty fun resource for teaching how speech works.

  • Vulgar: A Language Generator is a site that automatically creates a new conlang based on parameters you specify. The free web version allows you to add whatever vowels and consonants you’d like, and it’ll create a full language: a language name; IPA charts for vowels and consonants; phonotactics; phonological rules; paradigms for nominal morphology, definite and indefinite articles, personal pronouns, and verb conjugations; derivational morphology; and a lexicon of over 200 words. For $19 you can download the software and get a lexicon of 2000 words, derivational words, random semantic overlaps with natural languages, and the ability to customize orthography, syllable structure, and phonological rules. In addition to just being kinda fun, this is a super useful resource for creating homework assignments for students.

  • IPA Phonetics is an iPhone app that has what they call an “elaborated” IPA chart, with lots of extra places and manners of articulation, complete with audio clips of all the sounds. You can play a game where it plays a sound and you guess what you heard. It’s just fun to see things like a voiced uvular affricate [ɢʁ] or a dentolabial fricative [θ̼] on an IPA chart. Credit goes to the University of Victoria linguistics department and John Esling’s “Phonetic Notation” (chapter 18 of the Handbook of Phonetic Sciences, 2nd ed.).

  • The EMU-webApp “is a fully fledged browser-based labeling and correction tool that offers a multitude of labeling and visualization features.” I haven’t given this enough time to learn to use it properly, but it seems very helpful.

  • Johannes Haushofer’s CV of Failures. Other people have written about this more elegantly than I could, but sometimes it’s nice to see that other academics fail too. You’re not going to get into all the conferences you apply for, your papers are sometimes going to be rejected, and you’re definitely not getting all the funding you apply for. I find it therapeutic to put together a CV of failures like this researcher did and to keep it updated and formatted just as I would a regular CV. Don’t let impostor syndrome convince you that others haven’t failed too.