Comparing Models of Phonotactics for Word Segmentation

Schrimpf, Natalie & Gaja Jarosz. 2014. Comparing Models of Phonotactics for Word Segmentation. Association for Computational Linguistics: Joint Meeting of SIGMORPHON and SIGFSM 2014.


Developmental research indicates that infants use low-level statistical regularities, or phonotactics, to segment words from continuous speech. In this paper, we present a segmentation framework that enables the direct comparison of different phonotactic models for segmentation. We compare a model using phoneme transitional probabilities, which have been widely used in computational models, to syllable-based bigram models, which have played a prominent role in the developmental literature. We also introduce a novel estimation method, and compare it to other strategies for estimating the parameters of the phonotactic models from unsegmented data. The results show that syllable-based models outperform the phoneme models, specifically in the context of improved unsupervised parameter estimation. The syllable-based transitional probability model achieves a word token F-score of nearly 80%, the highest reported performance for a phonotactic segmentation model with no lexicon.
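
As a rough illustration of what a transitional probability model computes, the sketch below estimates forward transitional probabilities P(b | a) = count(a b) / count(a) over adjacent units in unsegmented utterances, then places a word boundary wherever that probability is low. It is not the framework or the estimation method of the paper: the function names, the fixed threshold rule, and the toy syllabified corpus are all assumptions made for this example. The units may be syllables (as here) or phonemes.

```python
from collections import Counter


def transitional_probs(utterances):
    """Estimate forward transitional probabilities P(b | a) = count(a b) / count(a).

    Each utterance is a sequence of units (phonemes or syllables) with no
    word boundaries marked. A unit is counted only where it has a successor.
    """
    unit_counts = Counter()
    pair_counts = Counter()
    for utt in utterances:
        for a, b in zip(utt, utt[1:]):
            unit_counts[a] += 1
            pair_counts[(a, b)] += 1
    return {pair: n / unit_counts[pair[0]] for pair, n in pair_counts.items()}


def segment(utt, tp, threshold):
    """Split an utterance wherever the transitional probability is below threshold.

    The fixed threshold is an assumption of this sketch, not the paper's
    decision rule. Unseen transitions are treated as probability 0.
    """
    if not utt:
        return []
    words, current = [], [utt[0]]
    for a, b in zip(utt, utt[1:]):
        if tp.get((a, b), 0.0) < threshold:
            words.append(current)
            current = []
        current.append(b)
    words.append(current)
    return words


# Hypothetical toy corpus of syllabified, unsegmented utterances.
corpus = [
    ["pre", "ty", "ba", "by"],
    ["pre", "ty", "dog", "gy"],
    ["big", "ba", "by"],
]
tp = transitional_probs(corpus)
print(segment(corpus[0], tp, threshold=0.75))
```

In this toy corpus the transition from "ty" to "ba" is less predictable than the word-internal transitions, so the first utterance is split between them. The third utterance stays unsegmented because "big" is always followed by "ba" in this data, which shows how strongly such a simple rule depends on the estimated probabilities.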
