Corpora

My work over the last several years includes the creation of the phonetic transcription component of the Weist-Jarosz Corpus of Child Polish, which is freely available as part of the CHILDES project on child language. The corpus includes audio recordings of spontaneous productions of four children acquiring Polish and their interactions with their primary caregivers.

The audio-linked phonetic and orthographic transcripts of the child speech can be viewed online at:

http://phonbank.talkbank.org/access/Slavic/Polish/WeistJarosz.html

where the corpus can be browsed online or downloaded in CHAT format. The corpus is also part of the PhonBank project on child phonology and is available in Phon format as well.

Please let me know if you use the data for any projects. I would love to hear what it is being used for. If you use this corpus in published materials, please cite the following two papers for the phonological component of the corpus:

  • Gaja Jarosz, Shira Calamaro & Jason Zentz. 2016. Input Frequency and the Acquisition of Syllable Structure in Polish. In Language Acquisition. Published online May 16, 2016, http://dx.doi.org/10.1080/10489223.2016.1179743.
  • Gaja Jarosz. 2010. Implicational Markedness and Frequency in Constraint-Based Computational Models of Phonological Learning. In Journal of Child Language 37(3), Special Issue on Computational models of child language learning, 565-606. Cambridge: Cambridge University Press.