Tag Archives: speech perception

a grant proposal on Ganong effects

I’m in the midst of revising a grant proposal on Ganong effects with my colleagues Adrian Staub and Andrew Cohen of PBS UMass, and Amanda Rysling, for this week only still in Linguistics at UMass but as of 1 July 2017 in Linguistics at UC Santa Cruz. Congratulations, Amanda! The crux of the proposal is that there is no Ganong effect, at least in the sense that a listener’s knowledge of words always and necessarily influences their performance in a phoneme categorization task in which one category makes a word with the rest of the string and the other does not. The reason for denying that the effect exists in this sense is that it comes and goes; when it comes, it varies in size, and it is often absolutely tiny. There is some evidence that the Ganong effect is bigger when the categorization task is made harder, either by degrading the stimuli or by imposing a cognitive load on the listener via a simultaneous task. Those findings suggest that otherwise its size is small, perhaps even vanishingly small, because the signal suffices in laboratory listening conditions, and the listener therefore needs no help from their linguistic knowledge. But even the effects of degrading the signal or imposing a cognitive load aren’t consistent.

But this presents us with at least a rhetorical problem: what general theory captures such an evanescent effect and accounts for the variation in its size? Our current strategy, which I’m convinced is the right one precisely because we don’t yet have the results we need, is to say that until the proposed experiments are done, we cannot propose or even outline any general theory.

Nonetheless, here are some ideas.

  1. A listener’s linguistic knowledge is likely to create a response bias for the word-making category when the target sound is final in the string, because the listener has already heard all but one of the sounds necessary to activate a word.
  2. Their linguistic knowledge is instead likely to influence evidence accumulation when the target is initial, as they have to wait until all the following sounds have been heard to activate a word beginning with one category and not the other. Together, (1) and (2) predict that linguistic knowledge should interact with response times differently for initial and final targets.
  3. Degrading the signal with noise should increase reliance on lexical knowledge similarly for initial and final targets, because it hinders recognition of the sounds that would activate a word.
  4. How a cognitive load influences the size of the effect may depend on whether the load consists of a linguistic task.
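The distinction between a response bias and an effect on evidence accumulation can be made concrete in a toy sequential-sampling (diffusion) simulation. This is only a sketch of the logic, not our model: the parameter values, the per-step noise, and the mapping of "lexical bias" onto starting point versus drift are all illustrative assumptions.

```python
import random

def simulate_trial(drift, start, threshold=1.0, noise=0.1, dt=0.01):
    """One toy diffusion trial: accumulate noisy evidence from `start`
    until either boundary (+threshold = "word" response, -threshold =
    "nonword" response) is reached. `noise` is the per-step sd (illustrative,
    not scaled as a true diffusion). Returns (choice, response_time)."""
    x, t = start, 0.0
    while abs(x) < threshold:
        x += drift * dt + random.gauss(0.0, noise)
        t += dt
    return ("word" if x > 0 else "nonword", t)

def proportion_word(drift, start, n=2000):
    """Proportion of 'word' responses over n simulated trials."""
    random.seed(0)  # fixed seed so the sketch is reproducible
    trials = [simulate_trial(drift, start) for _ in range(n)]
    return sum(1 for choice, _ in trials if choice == "word") / n
```

On this sketch, a final-target lexical effect would show up as `start > 0` (a head start toward the word boundary), while an initial-target effect would show up as `drift > 0` (faster accumulation toward it); both raise the proportion of "word" responses, but they produce different response-time signatures, which is the point of prediction (1) versus (2).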

Keeping words out of your ears

This is the first post of what I hope will be a regular series of posts on topics in phonetics, phonology, linguistics, and any of the many things that I can connect to these topics. At times, I will indulge in polemic, but for the most part my purpose is to write informally about what I’m thinking about these topics. Comments and competing polemics are welcome!

Lately, I’ve been trying to work out how best to follow up experiments in which we’ve pitted the listeners’ application of their linguistic knowledge against an auditory process that may be linguistically naive.

The auditory process produces perceptual contrast between the target sound and a neighboring sound, its context. (See Lotto & Kluender, 1998, in Perception & Psychophysics for the first demonstration of such effects, and Lotto & Holt, 2006, also in P&P, for a general discussion of contrast effects. Contrast is the auditory alternative to compensation for coarticulation; see Fowler, 2006, for discussion and arguments against contrast as a perceptual effect. I’ll come back to the contrast versus compensation debate in future posts. For the time being, it’s enough that the context causes the target sound to sound different from the context. I’ll describe that effect as “contrast,” but it could also be described as “compensation for coarticulation.”)

For example, we have shown that listeners are more likely to respond “p” to a stop from a [p-t] continuum following [i] than following [u]. They do so because [p] ordinarily concentrates energy at much lower frequencies in the spectrum than [t] does, while [i] concentrates it at much higher frequencies than [u] does. Thus, a stop whose energy concentration is halfway between [t]’s high value and [p]’s low one will sound lower, i.e. more like [p], next to a sound like [i] that concentrates energy at high frequencies (and higher, i.e. more like [t], next to a sound like [u] that concentrates energy at low frequencies).
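A minimal way to state that logic: treat contrast as pushing the target’s apparent spectral center of gravity away from the context’s. The frequencies, the gain `k`, and the category boundary below are all made-up illustrative numbers, not measurements from our stimuli.

```python
def perceived_freq(target_hz, context_hz, k=0.3):
    """Toy contrast model: the target's apparent energy concentration is
    shifted away from the context's, by a hypothetical gain k."""
    return target_hz - k * (context_hz - target_hz)

def classify(target_hz, context_hz, boundary_hz=2500):
    """Label the stop 't' if its perceived energy concentration is above
    an (illustrative) category boundary, else 'p'."""
    return "t" if perceived_freq(target_hz, context_hz) > boundary_hz else "p"
```

With these toy values, an ambiguous stop right at the boundary (2500 Hz) is pulled below it by a high-frequency [i]-like context (3000 Hz) and so is labeled “p,” but pushed above it by a low-frequency [u]-like context (800 Hz) and labeled “t,” which is the pattern described above.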

The linguistic knowledge our experiments tested is knowledge of what’s a word. That knowledge can either cooperate with this auditory effect, as for example when the preceding context is “kee_” [ki_], where “p” but not “t” makes a word, keep, or it can conflict, as when the preceding context is “mee_” [mi_] instead, where “t” and not “p” makes a word, meet.

We describe both effects as “biases” and distinguish them as “contrast” versus “lexical” biases.  In these stimuli, the preceding [i] or [u] is the source of the contrast bias, while the consonant preceding that vowel is the source of the lexical bias.  (The lexical bias is also known in the literature as the “Ganong” effect, after William Ganong, who first described it in a 1980 paper in the Journal of Experimental Psychology: Human Perception and Performance.)

All of our experiments so far have used materials like these, where the context that creates the contrast and the one that creates the lexical bias both occur in the same syllable. (The order of target sound, context sound, and the sound that determines the lexical bias have all been manipulated. If anyone wants to know, I can provide a full list of the stimuli.) Those experiments have shown that the two biases are effectively independent of one another.

Even so, we want to separate them in the stimuli, by delaying the moment when the lexical bias determines what word it is, that is, by delaying the lexical uniqueness point. For example, the uniqueness point in the word rebate is the vowel following the [b] (compare rebound), and the uniqueness point in redress is likewise the vowel following the [d] (compare reduce). So the listener would not know these words are rebate or redress until at least one sound later.
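The uniqueness point is itself mechanically computable: it is the first position at which a word’s initial segments match no other word in the lexicon. Here is a sketch against a toy four-word lexicon; the phoneme-like transcriptions are simplified for illustration and are not claimed to be accurate.

```python
def uniqueness_point(word, lexicon):
    """Return the 1-based index of the first segment at which `word`
    diverges from every other lexicon entry, or None if `word` is a
    prefix of another entry (so it never becomes unique)."""
    others = [w for w in lexicon if w != word]
    for i in range(1, len(word) + 1):
        if not any(w[:i] == word[:i] for w in others):
            return i
    return None

# Toy lexicon with simplified, illustrative transcriptions.
LEXICON = [
    ("r", "i", "b", "e", "t"),            # rebate
    ("r", "i", "b", "a", "u", "n", "d"),  # rebound
    ("r", "i", "d", "r", "e", "s"),       # redress
    ("r", "i", "d", "u", "s"),            # reduce
]
```

For instance, `uniqueness_point(("r", "i", "b", "e", "t"), LEXICON)` returns 4: rebate only diverges from rebound at the fourth segment, the vowel after the [b], exactly the delayed uniqueness point the design needs.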

The [b] in rebate would contrast perceptually with [i] in the first syllable, while the [d] in redress would not. Would this contrast effect make the listener more likely to hypothesize that the next sound is [b] rather than [d]? If so, how could we test it? Right now, we’re considering a phoneme monitoring experiment, where we measure how quickly the listener responds that a “b” or “d” occurs in these words. If contrast increases the expectation of a [b], then listeners should be faster to respond “yes” to rebate and slower to respond “no” to redress when the sound they’re monitoring is [b]. The opposite effect would be expected if the preceding sound were [u] rather than [i] because then the [d] and not the [b] would contrast.

An alternative is an eyetracking experiment, where we show the two words on the screen, play one of them, and measure the probability and latency of first fixations to the two words as a function of whether the context and target contrast.

A whole host of questions come up (which is largely the reason for this post):

  1. Will this work even though the target sounds are unambiguous? One reason for hope is that we have eyetracking data showing contrast effects with unambiguous sounds; I’ll post about those another time.
  2. Is phoneme monitoring the right task?
  3. Getting more to the heart of the problem, is the uniqueness point late enough that we’d effectively separate the lexical bias from the contrast bias?
  4. It won’t surprise you to learn that the lexicon of English is not perfectly designed for the purposes of this experiment. Among other problems, it’s hard to find: (a) equal numbers of words with all the combinations of vowel and consonant place we want (vowels: front versus back, consonants: coronal versus labial), (b) as noted, words with uniqueness points that are late enough, (c) words that contrast minimally up through the target sound and its context, (d) lists which are reasonably well balanced for lexical statistics, (e) words that our likely participants, UMass-Amherst undergraduates, are likely to know, and so on. The question here is: how much should any of this matter? Can’t we control these properties the best we can, while making sure we get enough items, and then include possible confounding factors in the model of the results?
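That last question at least suggests a workflow: encode each candidate word’s properties, filter on the hard constraints (uniqueness point, frequency), and then inspect how the survivors fill the vowel-by-consonant cells before deciding what must be controlled versus modeled. A sketch, with entirely hypothetical field names and cutoff values:

```python
from collections import Counter
from typing import NamedTuple

class Candidate(NamedTuple):
    """One candidate stimulus word with the properties we filter on.
    All fields and coding schemes are hypothetical illustrations."""
    word: str
    vowel: str           # "front" or "back"
    consonant: str       # "coronal" or "labial"
    uniqueness_pos: int  # 1-based uniqueness point
    log_freq: float      # log lexical frequency

def select(candidates, min_up=3, freq_window=(1.0, 4.0)):
    """Keep candidates whose uniqueness point is late enough and whose
    frequency falls in the window; report counts per design cell."""
    lo, hi = freq_window
    kept = [c for c in candidates
            if c.uniqueness_pos >= min_up and lo <= c.log_freq <= hi]
    cells = Counter((c.vowel, c.consonant) for c in kept)
    return kept, cells
```

Running `select` over a candidate list immediately shows which cells of the front/back by coronal/labial design are underpopulated, which is where the decision between tighter matching and statistical control would actually have to be made.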