Dockum & Bowern (2019) – Swadesh wordlists are not long enough

Rikker Dockum, Claire Bowern
May 2019
This paper presents the results of experiments on the minimally sufficient wordlist size for drawing phonological generalizations about languages. Given a limited lexicon for an under-documented language, are the conclusions that can be drawn from those data representative of the language as a whole? Linguistics necessarily involves generalizing from limited data, as documentation can never completely capture the full complexity of a linguistic system. We performed a series of sampling experiments on 36 Australian languages in the Chirila database (Bowern 2016), with lexicons ranging from 2,000 to 10,000 items. The purpose was to identify the smallest wordlist size that achieves: (1) full phonemic coverage for each language, and (2) an accurate phonemic distribution compared to the full dataset. We hypothesize that a sample meeting both criteria is minimally complete for basic phonological typology. The results show that coverage is consistently achieved at an average lexicon size of approximately 400 items, regardless of the size of the original lexicon sampled from. These results hold broad significance, given the predominance of wordlists smaller than 400 items. For fieldwork, this study also provides a guideline for designing documentation tasks in the face of limited time and resources. These results also help researchers make empirically grounded decisions about which datasets are suitable for which research tasks.
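The sampling design described above can be illustrated with a small sketch. This is not the authors' code: the toy lexicon, the function names, the 95% success threshold, and the use of total variation distance to compare distributions are all assumptions made for illustration. Words are represented as tuples of segments, and each segment is treated as one phoneme.

```python
import random
from collections import Counter


def phoneme_inventory(words):
    """Set of phonemes attested in a wordlist (each word is a tuple of segments)."""
    return {seg for word in words for seg in word}


def phoneme_frequencies(words):
    """Relative token frequency of each phoneme in a wordlist."""
    counts = Counter(seg for word in words for seg in word)
    total = sum(counts.values())
    return {seg: n / total for seg, n in counts.items()}


def total_variation(p, q):
    """Total variation distance between two frequency dicts (an assumed metric)."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


def coverage_rate(lexicon, size, trials=200, seed=0):
    """Share of random samples of `size` words that attest every phoneme
    found in the full lexicon (criterion 1 in the abstract)."""
    rng = random.Random(seed)
    full = phoneme_inventory(lexicon)
    hits = 0
    for _ in range(trials):
        sample = rng.sample(lexicon, size)  # sampling without replacement
        if phoneme_inventory(sample) == full:
            hits += 1
    return hits / trials


def smallest_covering_size(lexicon, sizes, threshold=0.95, **kwargs):
    """First candidate size whose coverage rate reaches `threshold`
    (the 0.95 default is an assumption, not a value from the paper)."""
    for size in sizes:
        if coverage_rate(lexicon, size, **kwargs) >= threshold:
            return size
    return None


# Hypothetical toy lexicon; a real run would load a lexicon of thousands of items.
lexicon = [("p", "a"), ("t", "a"), ("k", "i"), ("m", "u"), ("p", "i"), ("t", "u")]

if __name__ == "__main__":
    for n in range(1, len(lexicon) + 1):
        print(n, coverage_rate(lexicon, n))
```

For criterion (2), one could compare `phoneme_frequencies(sample)` against `phoneme_frequencies(lexicon)` with `total_variation` and ask at what sample size the distance falls below a chosen tolerance; that tolerance would again be an analyst's choice rather than something stated in the abstract.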

Reference: lingbuzz/004904
(please use that when you cite this article)
Published in: Language Documentation and Description
keywords: basic vocabulary, language documentation, phonology, inventories