Representations in OT

I’ve recently had some useful discussion with people about the nature of representations in OT, and how they did or did not (or should or should not) change from a theory with inviolable constraints (= principles and parameters theory). I’d like to summarize my thoughts, and would very much welcome further discussion.

In our discussion following Tobias Scheer’s mfm fringe presentation, I brought up the point that when one switches to violable constraints, it’s not obvious that the representations should stay the same. Tobias asked for an example, and I didn’t have a good one right away, but then realized that a particularly clear and worked out one is in the discussion of extrametricality vs. nonfinality in Prince and Smolensky 1993/2004. Gaja Jarosz also reminded me of underspecification: since markedness is expressed in output constraints in OT, it’s not obvious that one also requires a theory of input underspecification for that purpose.

I think my own experience of working on my *NC project in the mid nineties illustrates some more aspects of what happened to representations as OT was being extended to segmental phonology, and brings up some further issues. When I started that project, I was looking for an explanation for the facts in terms of feature markedness and positional licensing. I wanted to get the directionality of postnasal voicing, which unlike most other local assimilation processes is L-to-R, instead of R-to-L. I also wanted to get conspiracies amongst processes that resolve nasal-voiceless obstruent clusters. I tried hard and failed, and eventually “gave up” and used the formally stipulative but phonetically grounded *NC constraint. I later realized that this failure was part of a more general issue: it doesn’t seem to be the case that positional markedness can always be derived from the combination of general context-free markedness constraints and general positional licensing constraints. The best I could do in terms of those assumptions was to say that the coda nasal [voice] wanted to be licensed in onset position and hence spread, but that didn’t explain why it was just nasals that did this, nor did it deal sufficiently well with directionality. I have some more discussion of the general problems with deriving positional markedness from prosodic licensing, and further references, in a discussion of local conjunction on p. 10 of this paper (also in my review of the Harmonic Mind).

Taking the approach to segmental phonology in the *NC proposal, we can ask what that commits us to in terms of a theory of representations. It looks to me like what we would need is a feature set that is sufficiently expressive to formalize our constraints, but that’s it. Phonetic grounding is expressed as restrictions on the universal set of constraints (or as restrictions on possible rankings, in work like Steriade’s on perceptual grounding). And this set of features could be universal, with no language-specific choices (thanks to Pavel Iosad for a question on this) – contrast and its absence can be captured by ranking alone. Furthermore, I don’t think there is any sense in which this theory has the concept of a natural class (thanks to Kristine Yu for a question on that). So this perhaps at least partially explains why the nature of segmental representations has not been a big topic in at least some variants of OT.
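
The point that contrast and its absence can come from ranking alone can be illustrated with a toy evaluator. This is only a minimal sketch under stated assumptions, not anything from the *NC work itself: the three-pair obstruent alphabet, the toy GEN that only flips voicing, and the names `no_voiced_obstruent`, `ident_voice`, `candidates` and `evaluate` are all hypothetical and chosen for illustration.

```python
from typing import Callable, List

# A constraint maps an (input, output) pair to a number of violations.
Constraint = Callable[[str, str], int]

VOICED_OBSTRUENTS = set("bdg")
VOICING_PAIRS = {"p": "b", "t": "d", "k": "g", "b": "p", "d": "t", "g": "k"}


def no_voiced_obstruent(inp: str, out: str) -> int:
    """Markedness: one violation per voiced obstruent in the output."""
    return sum(1 for seg in out if seg in VOICED_OBSTRUENTS)


def ident_voice(inp: str, out: str) -> int:
    """Faithfulness: one violation per segment that differs from the input.

    In this toy GEN the only possible difference is obstruent voicing.
    """
    return sum(1 for i, o in zip(inp, out) if i != o)


def candidates(inp: str) -> List[str]:
    """Toy GEN (an assumption): every way of flipping obstruent voicing."""
    outs = [""]
    for seg in inp:
        options = [seg] + ([VOICING_PAIRS[seg]] if seg in VOICING_PAIRS else [])
        outs = [prefix + o for prefix in outs for o in options]
    return outs


def evaluate(inp: str, ranking: List[Constraint]) -> str:
    """Strict domination: compare violation profiles lexicographically."""
    return min(candidates(inp), key=lambda out: tuple(c(inp, out) for c in ranking))


contrastive = [ident_voice, no_voiced_obstruent]
neutralizing = [no_voiced_obstruent, ident_voice]
for name, ranking in [("contrastive", contrastive), ("neutralizing", neutralizing)]:
    print(name, {inp: evaluate(inp, ranking) for inp in ("ta", "da")})
```

Both grammars receive the same inputs and use the same features. With faithfulness ranked on top, /ta/ and /da/ stay distinct; with markedness on top, both map to [ta], so the absence of a voicing contrast needs no language-specific restriction on the inputs.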

Now I should say that I can see lots of reasons why you would want to say that features do differ from language to language, and why the particular feature set you choose could have consequences for predictions about learning and generalization. But in terms of the particular theory I’ve just described, I don’t see any arguments for language-specific differences in feature specification. I should probably also say that I see reasons why one might not want to model the role of phonetics in phonology as just stipulations about the constraint set with some armchair phonetic justification – there are clearly plenty of alternatives. But the proposal is a natural extension of the general approach to the encoding of substance in OT as stipulations about constraints: e.g. that there is Onset and NoCoda, but not NoOnset and Coda.
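
The effect of stipulating Onset and NoCoda, but not their mirror images, can be seen by enumerating a small factorial typology. The following is a hedged sketch: the single-syllable candidate space, the four-constraint set, and all function names (`onset`, `no_coda`, `max_io`, `dep_io`, `winner`, `inventory`) are simplifying assumptions made here, not a claim about any particular published analysis.

```python
from itertools import permutations, product

# A syllable shape is (has_onset, has_coda); the nucleus is always present.
SHAPES = list(product([True, False], repeat=2))
NAMES = {(True, False): "CV", (False, False): "V",
         (True, True): "CVC", (False, True): "VC"}


def onset(inp, out):
    """Markedness: penalize an output syllable with no onset."""
    return 0 if out[0] else 1


def no_coda(inp, out):
    """Markedness: penalize an output syllable with a coda."""
    return 1 if out[1] else 0


def max_io(inp, out):
    """Faithfulness: one violation per input consonant deleted."""
    return sum(1 for i, o in zip(inp, out) if i and not o)


def dep_io(inp, out):
    """Faithfulness: one violation per consonant inserted."""
    return sum(1 for i, o in zip(inp, out) if o and not i)


CONSTRAINTS = [onset, no_coda, max_io, dep_io]


def winner(inp, ranking):
    """Pick the candidate whose violation profile is lexicographically smallest."""
    return min(SHAPES, key=lambda out: tuple(c(inp, out) for c in ranking))


def inventory(ranking):
    """Surface shapes a ranking allows when every shape is a possible input."""
    return frozenset(NAMES[winner(inp, ranking)] for inp in SHAPES)


# Factorial typology: one inventory per total ranking, duplicates collapsed.
typology = {inventory(r) for r in permutations(CONSTRAINTS)}
for language in sorted(typology, key=lambda lang: (len(lang), sorted(lang))):
    print(sorted(language))
```

Under these assumptions the 24 rankings collapse to four inventories, and CV is in every one of them. No ranking yields a language that requires codas or bans onsets, because no constraint in the set ever prefers those structures.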

Finally, let me emphasize that what I’ve said about a lack of interest in the nature of segmental representations based on my *NC work should not be taken as representative of a lack of interest in segmental features in OT as a whole (or even in my mind!). For example, there are extremely interesting questions about the nature of the representations needed for ‘spreading’: Bakovic (2000: diss.) takes the position that there is no spreading, while others have defended autosegmental representations or adopted gestural representations, and yet others (e.g. Cole and Kisseberth) have proposed domain-based representations.

7 thoughts on “Representations in OT”

  1. Pavel Iosad

    Thanks for this, Joe! I think our conversation veered in a slightly different direction, so I appreciate the opportunity to continue this here.

    I think there are two points. First, re your last point, I think it’s interesting that your example – while of course a valid rebuttal of the general form of the criticism that OT phonologists ‘don’t care about representations’ that Tobias (and others, including, in so many words, myself) have made – doesn’t really address the problem of whether features are/should be language-specific head-on; instead, the focus is very much on the implementation of the basic insight that some segments become more like each other in the UR -> SR mapping; a question that is still to a very large extent about computation. I.e., while there are areas of phonology where representations have, strictly speaking, been quite important, segmental representations have indeed been rather sidelined as an area of live debate among OT adherents. I don’t think it’s at all coincidental that representational debates are alive and well in at least two areas where a principles-and-parameters typology allows building a rich taxonomy, easily feeding into factorial typology building, namely stress and harmony.

    This feeds into the second point. You write:

    But in terms of the particular theory I’ve just described, I don’t see any arguments for language-specific differences in feature specification.

    Such arguments have to come from whole-language analysis – a point that came up repeatedly in the fringe meeting. Without whole-language analysis, I don’t see how one can assume that an analysis offered in the course of a CUP investigation must be correct – such an analysis is valid as a hypothesis, of course, but there’s an invisible ‘assuming this ranking is correct for the language at large’ caveat that doesn’t really get explicitly discussed all that much. Even so, I don’t think it’s very hard to find counterexamples to your claim, it’s just that sociologically people don’t tend to cross-check different phenomena in the same language against each other. (My hunch is that’s rather more frequently done in dissertations, but doesn’t tend to be viewed as sufficiently interesting/important to survive the dissertation -> paper transition and so stays buried. Peter Staroverov’s talk last Friday was a nice counterexample – I’m very happy to be proven wrong, though!)

    For a specific example, consider my paper on vowel reduction in Standard Russian. This pattern was analysed, among other patterns of VR, by Catherine Crosswhite in her treatment of reduction. It worked pretty well, but the SPE features were mildly problematic (as Jamie White notes in his dissertation, there’s a saltational mapping there), so she has to do constraint conjunction and/or build an argument about morphology. Later on, Paul de Lacy comes along and says Crosswhite’s approach is problematic, so here’s a new and better set of constraints (using the same features!), and he analyses the fairly similar pattern in the closely related Belarusian. That’s all fairly standard OT practice – examine the constraints, show they give undesirable results, rejig the constraints, and that’s the analysis improved. Fine. Features don’t really come into this as a question to be discussed at all, presumably because a lot of people assume something like what you said. But in my paper I show that de Lacy’s analysis runs into a ranking paradox unless we make some highly dubious assumptions about the phonetics and about rich base inputs. (It also seems likely that pretty much any analysis has to assume that the low vowel in Russian is [+bk] to make the reduction work, and that is already a bit of a concession to some sort of language-specific features-to-substance mapping, since it’s not back phonetically – again fine as far as it goes, but then we need a theory of these concessions!) Again this is a hunch, but if we run into this issue with a fairly ‘simple’ pattern in a relatively well-understood language like Russian then I’d be very surprised if it’s the only case like this. But people aren’t really making the arguments about this explicit, which must be what drives this perception that OT people don’t care too much/enough about representations.

  2. Joe Pater Post author

    Thanks Pavel – there are clearly limits to how much phonology one can do in a pub late at night, and I’m glad we’re transcending those.

    I’ve realized that my statement about the universality of features in one variant of OT (let’s call it NC-OT) that you’ve highlighted was so ambiguous that it’s impossible to know what would count as evidence against it. I intended it as just a description of how the framework works, rather than trying to make a falsifiable claim, but it would be nice to figure out if it can be empirically tested against alternative ways of formalizing phonology. So let me try to clarify, and we’ll see where it gets us.

    Because we’re in standard OT, the set of features and feature combinations is universal in the sense that we can give any of them as inputs to the grammar of a given language, and test whether what comes out is well-formed in the language (Richness of the Base). If a feature doesn’t exist in some language, we filter it out by a high enough ranked constraint against it. So if we are talking about URs/Inputs, my statement so far seems to be trivially true, and if we are talking about SRs/Outputs, it’s trivially false.

    What I was actually hinting at in my comment was the notion that two segments that we want to say are “the same” phonetically have different phonological specifications across languages. For contrast vs. predictability, we don’t need differences in specification, since this can be gotten from ranking (i.e. there are no phonemes in OT). I’ll need to look more carefully at the cases you mention to see how they bear on this.

    One could certainly have a variant of OT that does produce language-specific differences in SRs that merge to the same phonetic interpretation. I think immediately of Paul Boersma’s work in this context, and I especially like the fact that he models the interpretive component, as well as learning. And phonetically identical strings with different phonological structures are much used in metrical theory, and it’s not too hard to imagine language-specific differences on segmental phonological labels operating in a similar way, something Alex Nazarov has explored. But typically at the level of segmental representation phonological outputs are WYSIWYG in OT, and let’s say that they are in NC-OT, the theory we’re trying to understand.

    Having done my PhD at McGill with Heather Goad and Glyne Piggott, and having spent lots of time talking with Elan Dresher about these issues, I’m aware of the important body of research that studies the link between contrastiveness and phonological activity of features. I don’t see any obvious way that NC-OT captures these generalizations. But the impression I get from my relatively cursory examination of that literature is that these generalizations are probabilistic, that sometimes redundant features are active (sonorant voice clearly often is). I expect that’s not how people working in contrast theory would say it; instead, the claim seems to be that if you get your model of segmental elaboration right, then the seemingly redundant features will get specified when they should be (e.g. the feature Sonorant Voice on voice-active sonorants). My own strategy in terms of theory development is to abstract from these generalizations, along with Feature Economy, until we have well-developed theories that can deal with probabilistic typological generalizations, and until we better understand the role of contrastiveness in phonological processing and learning.

    Finally, let me agree with the importance of whole language analysis, and link to a reply I made on this point to Ricardo Bermudez-Otero. I don’t think in-depth study of individual languages is antithetical to anyone’s approach to phonology. We just can’t all be doing everything all the time.

    1. Joe Pater Post author

      A couple more thoughts on this thread. First, I can report what Morris Halle said after hearing me present the NC work at NELS at MIT in 1995 (I “warmed up” for Noam Chomsky, probably because the organizers liked my title *NC in that position in the program): “There are some things we can’t explain”. At the time I had no idea what he was talking about, but now I’d like to interpret this as a profound endorsement of the approach I was taking. Let’s say that there are generalizations that can’t be “explained” by phonology, such as the ones I was trying to capture with *NC. So what we can do is stipulate them as universal constraints, and then formalize their interaction with other constraints, in order to formalize the phonology of individual languages, and study typology.

      Second, I’ve realized that the approach that Bakovic (2000) took was another example of a move that was made possible by violable constraints. Because we now had a way of properly formalizing the effects of constraints, we could look at assimilation as just a way of satisfying the constraint that demanded that two segments have the same specification – we didn’t necessarily need to say there was spreading per se. The sour grapes problem may indicate that this was the wrong move, but we didn’t know that a priori (McCarthy’s solution uses spreading, for example – there are other possibilities that keep Agree-like constraints, like Wilson’s targeted constraints).

    2. Pavel Iosad

      Actually, I’m in agreement with much if not most of what you say here. I do think that our general theory of segmental processes should abstract away from what the specific features are. Putting finicky restrictions like “voicing does one thing but manner does another” or “final voicing is impossible but final fortition isn’t”, which are potentially explainable by grammar-external factors, into some universal (innate?) UG, to me, lacks generality. A general theory should ideally have some restrictions on what kind of UR->SR mappings are (im)possible (as Jeff Heinz emphasized in his talk), because that has real implications for learnability (and some implications for typology) but whether the a in an expression like a^n b^m ‘really is’ [voice] or [continuant] is probably irrelevant. Dave Odden makes much the same point about this here. This type of theory must of course underpin the ‘further’ theory of probabilistic variation within these bounds, which probably embraces the whole range of elements from Neolithic settlement patterns to learning paths to Ohalaesque transmission biases to models of the interpretative component (I particularly agree that Boersma’s model is the sort of thing we should be doing more of).

      The key quote here is this:

      But typically at the level of segmental representation phonological outputs are WYSIWYG in OT

      This seems to be taken, without discussion, as a reasonable null hypothesis. The WYSIWYG label here covers, as far as I can see, a whole range of logically independent assumptions that all come as a package and are not really being questioned. This includes, among other things, a universal set (in practice more or less the SPE set) of binary features and full specification of SRs for every feature (i.e. no surface underspecification). None of this is usually defended explicitly, but for a null hypothesis it is really quite a big set of assumptions.

      For instance, one frequent assumption in this system is that the phonological SR is WYSIWYG in terms of featural differences being directly reflected in the phonetics. Compare, for instance, Jaye Padgett’s proposal for the schizophrenic behaviour of Russian /v/ (namely that it has some feature which makes it phonologically distinct from other voiced fricatives but also phonetically distinct) with Milica Radisic’s work on Serbian /g/ which behaves like a non-stop lexically but like a stop postlexically, implemented through a combination of the contrastive hierarchy and stratalism but without the explicit phonetic commitments. (There is of course earlier serial work on Russian /v/ but there, as in Jaye’s account, the difference between /v/ and other fricatives is stipulated in order to account for the facts, while in Milica’s work on Serbian the difference flows from the inventory structure fairly directly.) Here, the predictions of a WYSIWYG mapping and a more elaborated one are different, but they are not usually discussed explicitly (or even too frequently acknowledged).

      Similarly, surface underspecification is a major departure from the SPE bundle of assumptions, and there is quite a bit of phonetic evidence out there to support it. It is used, more or less opportunistically, in various OT work (e.g. by Ricardo Bermúdez-Otero), but I’m not aware of any systematic discussion in favour of full surface specification.

      This ‘standard theory’ appears to be restrictive (in the sense that you can easily whip up a factorial typology), but it comes with a bunch of strong, not-frequently-examined assumptions, and the restrictiveness is at the wrong level – just shuffling IPA symbols about with no major learnability implications. Why would full specification (coming for free from it’s-not-exactly-clear-where) be preferable to minimal specification built on demand on the basis of explicit evidence from phonological activity? Why can’t we have surface underspecification? (And if we can, where and when does it arise?) Why assume binary features over privative ones? (The usual answer here is still ternary distinctions, but e.g. Mirco Ghini’s work showing how these can be done with privative features has been around for almost 15 years now.) Why is binary OK but ternary isn’t? These sorts of questions aren’t frequently addressed, and certainly in my work I’m just trying to examine these assumptions and see if we can come up with a leaner theory of segmental processes.

      I should hasten to add that of course those of us who care about segmental representations should take on board the fact that the OT computation is able to defeat language-specific input restrictions (i.e. those that are not in GEN, unless you choose to restrict GEN). In Toronto, Daniel Hall and Sara Mackenzie have both engaged the question of how one can do OT with the contrastive hierarchy, and in Tromsø Sylvia Blaho, Islam Youssef and myself have all looked at this issue too. I don’t think it’s good enough to just draw up some nice representations and wave your hand assuming they’ll do the job automagically.

  3. Joe Pater Post author

    Thanks for the references Pavel! The theory I described seemed to me like a reasonable null hypothesis in 1995 when we were seeing that various uses of underspecification were superannuated by existing machinery in OT, but I can see how subsequent work, and even some previous work, might make it seem unreasonable. I’m glad I put it on the table for discussion.

    It looks to me that at least some uses of underspecification are WYSIWYG under the definition I was intending; a language with interpolation of nasal across an underspecified vowel as in Cohn’s work would be phonetically distinct from a language with full specification. What I see as less clearly compatible with the core NC-OT formalism are cases where phonetically identical [u] could differ in its features depending on what language we are analyzing.

    In the context of representations, segmental phonology and OT, I should also mention Flemming’s work – for example, the point that dispersion seems to be required to get vowel inventory markedness to work out (though how exactly we want to model that formally is a big open question).

    If I were to work on this topic more right now, I’d likely follow in Boersma and Hamann’s footsteps and model the phonology-phonetics mapping and its learning. It seems like we don’t need to abstract from that anymore, given the current state of the art.

  4. Pavel Iosad

    What I see as less clearly compatible with the core NC-OT formalism are cases where phonetically identical [u] could differ in its features depending on what language we are analyzing.

    I’m not entirely sure why, though – certainly at the level of the formalism there’s nothing preventing this from being done. Sure, it sits rather uncomfortably with a sort of maximalist OT position where all cross-linguistic variation comes from ranking, but that model, while formally elegant, is under quite a bit of pressure, as Wendell in particular emphasized. I appreciate there can be differences of opinion though.

    Some of the work on this I find quite convincing (in other news, dog bites man!), quite apart from the Russian case that I have. Will Oxford’s recent paper on variation in Algonquian vowel systems is a very nice example, I think. Christian Uffmann had these really interesting examples of what’s going on in the south-east of England, where GOOSE is phonetically [y] across the board while THOUGHT has raised to [u] but still produces intrusive-[r] in hiatus rather than [w], thus (from memory) sawing [suːɹɪŋ] vs. sewing [syːwɪŋ] (and if I recall correctly vocalized [l] is essentially [o] but without the intrusive-[r]). Stuff like that.

    I agree emphatically that a lot of the action there is happening between the output of the UR->SR mapping and ‘the phonetics’ (whatever that is), and we need an explicit theory of that mapping, which must also probably include some sort of ‘usage component’ where sociolinguistics and maybe other things kick in, cf. work by Laurel MacKenzie, Joe Fruehwald and Meredith Tamminga. But I also fail to see how acknowledging this is not a concession that the 1995 working hypothesis was insufficient/too strong. Again, there’s nothing wrong with that, it’s what hypotheses are for, but it seems the hypothesis was adopted without much explicit reflection, and its importance (and hence the import of its revision) isn’t widely appreciated. Again I’m very happy to be proven wrong!

    My starting point is in some ways the opposite of what Ricardo suggests in the neighbouring thread for productivity, which is to adopt maximally rich hypotheses and throw stuff away when you can’t get away with keeping it: I think that with respect to segmental representations the space of possibilities is so large that an overly strong null hypothesis prevents us from roaming the space freely enough. In that respect the contrastivist approach is nice because it allows us to start with the absolute minimum of assumptions that we cannot get away with not having, i.e. lexical contrast, and build from there based on explicit evidence, rather than assume a whole bunch of rather inert representations for which the positive evidence is fairly thin.

  5. Joe Pater Post author

    Pavel – You’re probably right that there’s nothing in the formalism itself that stops you from having two phonetically “identical” segments with different output feature specifications. This will ultimately require, however, a richer theory of phonology-phonetics mapping. And again, many of the other uses for underspecification were supplanted in OT, so it’s not obvious that you need to develop that particular aspect of the p-p map for independent reasons. The logic here is similar to why I don’t think there is an a priori reason to prefer input underspecification approaches to exceptions over diacritic approaches in OT.

    I have to say that at this point I still wouldn’t put exploring the possibility of segmental output ambiguity as very high on my research agenda, but perhaps my mind will be changed by those who do.

