Nyman and Tesar 2019: Determining underlying presence in the learning of grammars that allow insertion and deletion

Nyman, Alexandra and Bruce Tesar. 2019. Determining underlying presence in the learning of grammars that allow insertion and deletion. Glossa: a journal of general linguistics 4(1): 37. 1–41. DOI: https://doi.org/10.5334/gjgl.603

Abstract
The simultaneous learning of a phonological map from inputs to outputs and a lexicon of phonological underlying forms has been a focus of several research efforts (Jarosz 2006; Apoussidou 2007; Merchant 2008; Merchant & Tesar 2008; Tesar 2014). One of the numerous challenges is that of computational efficiency, which led to the investigation of learning with output-driven maps (Tesar 2014). Prior work on learning with output-driven maps has focused on systems in which the only disparities between inputs and outputs were segmental identity disparities (differences in the value of a feature). Inclusion of segmental insertion and deletion disparities exacerbates computational concerns, as it increases the number of possible correspondence relations between an input and an output, and makes the space of possible inputs for a word infinite due to the possible presence of an unbounded number of deleted segments. We propose an extension of that earlier work to handle phonologies that permit insertion and deletion, and evaluate the proposal by applying it to cases in Basic CV Syllable Theory (Jakobson 1962; Clements & Keyser 1983; Prince & Smolensky 2004). First, we propose that a learner represent information about the possible presence/absence of a segment in an underlying form via a presence feature. The presence feature can be set using the same inconsistency detection method that has previously been used to set other segmental features. This allows the learner to combine evidence from paradigmatically related words in a single compact representation. Second, we propose that the learner consider as candidate segments of an underlying form only those segments that surface in at least one surface realization of the morpheme. This approach is justified by the structure of output-driven maps and avoids the potential for an unbounded number of possibly deleted segments in an underlying form. A proof is given for the validity of the method for avoiding unbounded deletion.
The resulting learner is able to learn some grammatical regularities about segmental insertion and deletion; this is shown via two manual step-by-step applications of the algorithm. Verificatory simulations for learning the entire typology of Basic CV Syllable Theory are left to future work.
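
The two proposals summarized in the abstract can be pictured with a small Python sketch. It is an illustration under stated assumptions, not the paper's implementation: the names `Slot`, `candidate_slots`, `possible_inputs`, and `set_presence` are hypothetical, surface realizations are assumed to arrive already aligned (with `None` where a segment does not surface), and the `is_consistent` callback is only a stand-in for the grammar-based inconsistency detection, which is not modeled here.

```python
from dataclasses import dataclass
from itertools import product
from typing import Callable, Iterator, List, Optional, Sequence


@dataclass
class Slot:
    """A candidate underlying segment (hypothetical representation)."""
    segment: str                    # "C" or "V" in Basic CV Syllable Theory
    present: Optional[bool] = None  # presence feature; None means not yet set


def candidate_slots(aligned: Sequence[Sequence[Optional[str]]]) -> List[Slot]:
    """Create one slot per aligned position where a segment surfaces at
    least once. Assumes pre-aligned realizations of a single morpheme."""
    slots = []
    for column in zip(*aligned):
        surfaced = [seg for seg in column if seg is not None]
        if surfaced:
            slots.append(Slot(segment=surfaced[0]))
    return slots


def possible_inputs(slots: Sequence[Slot]) -> Iterator[str]:
    """Enumerate underlying forms compatible with the slots. The space is
    finite: at most 2**k forms for k unset presence features."""
    options = [[s.present] if s.present is not None else [True, False]
               for s in slots]
    for choice in product(*options):
        yield "".join(s.segment for s, keep in zip(slots, choice) if keep)


def set_presence(slots: List[Slot],
                 is_consistent: Callable[[List[Slot]], bool]) -> None:
    """Set a presence feature when exactly one of its values survives the
    (stand-in) consistency check; otherwise leave it unset."""
    for slot in slots:
        if slot.present is not None:
            continue
        surviving = []
        for value in (True, False):
            slot.present = value
            if is_consistent(slots):
                surviving.append(value)
        slot.present = surviving[0] if len(surviving) == 1 else None


# A morpheme that surfaces as CV in one word and as CVC in another.
slots = candidate_slots([("C", "V", None), ("C", "V", "C")])
print(len(list(possible_inputs(slots))))  # 8 hypotheses before any feature is set

# Toy stand-in: pretend the data rule out an absent final consonant.
set_presence(slots, lambda ss: ss[2].present is not False)
print(sorted(possible_inputs(slots)))     # 4 hypotheses remain
```

Because slots are created only for segments that surface somewhere in the paradigm, the hypothesis space stays finite however many segments a grammar could in principle delete, and a single list of slots accumulates evidence from all words containing the morpheme.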