Today, collaborative work by Brett Hashimoto, me, and Jack Grieve is being presented at the American Association for Corpus Linguistics (AACL) Conference at the University of Oregon.
Download the slides here!
We use the Corpus of North American Spoken English (CoNASE; Coats 2019, 2023), which contains YouTube transcripts of over 300,000 geotagged videos from regional and local governments across the US and Canada, representing over 150,000 hours of speech and 1.2 billion words of text. Following Grieve’s (2016) methods, we extracted data for 135 grammatical alternation variables and then ran spatial statistics to identify areas where the variants of those alternations cluster geographically. Basically, we’re using YouTube to map regional grammatical variation. Our presentation could show only a few of the more than 200 maps we’ve generated, but here’s one of my favorites.
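For readers wondering what the spatial statistics step can look like, here’s a minimal Python sketch. It assumes the Getis-Ord Gi* hot spot statistic, one common choice for finding local clusters of high and low values. Each location’s value is the proportion of one variant of a single hypothetical alternation, and neighbors are all locations within a fixed distance band. The statistic, the 50 km radius, the function names, and the toy data are illustrative assumptions, not details of our analysis.

```python
import math
from typing import List, Tuple

Coord = Tuple[float, float]  # (latitude, longitude) in degrees


def haversine_km(a: Coord, b: Coord) -> float:
    """Great-circle distance in km between two (lat, lon) points."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat, dlon = lat2 - lat1, lon2 - lon1
    h = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def variant_proportions(counts: List[Tuple[int, int]]) -> List[float]:
    """Share of variant A at each location, given (count_a, count_b) pairs."""
    return [a / (a + b) for a, b in counts]


def getis_ord_gi_star(coords: List[Coord], values: List[float], radius_km: float) -> List[float]:
    """Gi* z-scores with binary distance-band weights (each location counts as its own neighbor).

    Assumption: illustrative hot spot statistic, not a specific project's pipeline.
    """
    n = len(values)
    mean = sum(values) / n
    # Global standard deviation; clamp at 0 to guard against tiny negative rounding.
    s = math.sqrt(max(0.0, sum(v * v for v in values) / n - mean ** 2))
    scores = []
    for i in range(n):
        w = [1.0 if haversine_km(coords[i], coords[j]) <= radius_km else 0.0 for j in range(n)]
        sw = sum(w)
        sw2 = sum(x * x for x in w)
        num = sum(wj * v for wj, v in zip(w, values)) - mean * sw
        den = s * math.sqrt((n * sw2 - sw ** 2) / (n - 1))
        # Undefined when all values are equal or a neighborhood covers every location.
        scores.append(num / den if den > 0 else 0.0)
    return scores


# Hypothetical toy data: two groups of three nearby locations.
coords = [(45.0, -123.0), (45.1, -123.1), (45.2, -122.9),
          (35.0, -90.0), (35.1, -90.1), (34.9, -89.9)]
counts = [(80, 20), (75, 25), (90, 10), (20, 80), (30, 70), (10, 90)]
props = variant_proportions(counts)
scores = getis_ord_gi_star(coords, props, radius_km=50.0)
for (lat, lon), z in zip(coords, scores):
    print(f"{lat:.1f}, {lon:.1f}: Gi* = {z:+.2f}")
```

Large positive scores mark hot spots where variant A is unusually common in a location and its neighbors; large negative scores mark cold spots. The binary distance band keeps the sketch short; for real data, a library such as PySAL’s `esda.G_Local` (with `star=True`) handles spatial weights and permutation-based significance more carefully.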
We’re just getting started on our analysis of this unbelievably big and rich dataset. Stay tuned for more findings in future presentations!