Perception and extraction of sound signals in context
Keywords: psychoacoustics, audition, pitch, auditory scene analysis, auditory streaming, temporal mechanisms, envelope, fine structure, hearing loss, auditory prostheses, cochlear implant
Cognition Auditive et Psychoacoustique
UMR 5020 Neurosciences et Systèmes Sensoriels
CNRS - Université Claude Bernard - Lyon 1
50 Av. Tony Garnier
69366 Lyon Cedex 07
Tel: 33 (0)4 37 28 74 91
Fax: 33 (0)4 37 28 76 01
In everyday-life situations, several sound sources interact to form a complex acoustical mixture that must be interpreted by the auditory system (Figure 1; Audio demo 1). Auditory Scene Analysis (ASA) refers to the ability of the human auditory system to segregate sounds arising from different acoustical sources into different perceptual streams, and to group sounds arising from the same acoustical source into a single perceptual stream. As such, a stream is defined as the perceptual auditory object that corresponds to a single acoustic sound source (for a review, see Bregman, 1990).
Most studies from this group are dedicated to deciphering the mechanisms involved in auditory scene analysis.
Figure 1: Example of an auditory scene made with two speakers. This kind of scene is known as a cocktail-party situation (Cherry, 1953). In such a situation, normal-hearing listeners can easily hear two streams; this is much more difficult for hearing-impaired listeners.
In laboratory conditions, streaming is traditionally investigated using simplified stimuli consisting of a repeating sequence of "A" and "B" tones (e.g., van Noorden, 1975). When the stimulus repetition rate is rapid enough, or the frequency separation between the "A" and "B" tones large enough, the sequence breaks down into two perceptual streams (Figure 2; Audio demo 2-a and 2-b). The minimum frequency separation between the "A" and "B" tones for which two streams can be heard when the listener is trying to attend to one or the other subset of elements has been dubbed the "fission" boundary (van Noorden, 1975).
Figure 2: Time-frequency representation of an ABA-ABA- sequence, as used by van Noorden (1975). Depending on the frequency spacing and the repetition rate, listeners perceive either a single stream (Audio demo 2-a) or two streams (Audio demo 2-b).
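The ABA- paradigm described above is easy to reproduce in software. The following Python sketch is illustrative only: the 16-kHz sampling rate, 100-ms tone duration, and tone frequencies are assumptions, not the parameters of the original demos.

```python
import numpy as np

FS = 16000  # sampling rate in Hz (an arbitrary choice for this sketch)

def tone(freq, dur, fs=FS):
    """Pure tone with 10-ms raised-cosine onset/offset ramps."""
    t = np.arange(int(dur * fs)) / fs
    y = np.sin(2 * np.pi * freq * t)
    n_ramp = int(0.01 * fs)
    ramp = 0.5 * (1 - np.cos(np.pi * np.arange(n_ramp) / n_ramp))
    y[:n_ramp] *= ramp          # fade in
    y[-n_ramp:] *= ramp[::-1]   # fade out
    return y

def aba_sequence(f_a, f_b, tone_dur=0.1, n_triplets=5):
    """Repeating ABA- triplets; '-' is a silent gap of one tone duration."""
    silence = np.zeros(int(tone_dur * FS))
    a, b = tone(f_a, tone_dur), tone(f_b, tone_dur)
    triplet = np.concatenate([a, b, a, silence])
    return np.tile(triplet, n_triplets)

# Small frequency separation typically yields one "galloping" stream;
# a large separation typically splits into two streams.
one_stream = aba_sequence(500, 550)
two_streams = aba_sequence(500, 1000)
```

Written to a sound file and played back, such sequences behave as described above: the small-separation version tends to cohere, while the large-separation version tends to split at the fission boundary.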
2-2- Auditory streaming based on spectral cues
While certain authors have suggested
that streaming is a central phenomenon (Bregman, 1990), others have
proposed that it is determined to a large extent by the functioning of
peripheral mechanisms (Beauvois and Meddis, 1996). One question, in
particular, concerns the role of peripheral auditory filtering in
streaming. Hartmann and Johnson (1991) have proposed that beyond
differences in the physical characteristics of the sounds, streaming is
determined by parallel bandpass filtering, i.e.,
‘‘channeling’’ of incoming
sounds by the auditory periphery. Basically, sounds falling in
different auditory channels are easily segregated, while sounds
occupying successively the same auditory filters are less likely to be
allocated to different auditory streams. This view is supported by the
results of early experiments. Computer models based on this
‘‘channeling’’ principle can
account successfully for a variety of experimental data on streaming
(Beauvois and Meddis, 1996; McCabe and Denham, 1997). On the other
hand, however, some experimental results demonstrate that signal
features not related to channeling can affect stream segregation. For
example, it has been shown that differences in temporal envelope
between sounds having the same frequency content can promote streaming
(Iverson, 1995) and that the segregation boundary can be shifted by
temporal envelope factors (Singh and Bregman, 1997). Therefore, at present, the extent to which streaming depends on peripheral filtering remains an open question.
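The channeling idea can be quantified with the auditory-filter (ERB) scale of Glasberg and Moore (1990), listed in the references. The Python sketch below is a simplification: full channeling models such as Beauvois and Meddis (1996) simulate complete excitation patterns, and the one-ERB reading offered in the comments is only an illustrative criterion, not a claim from the source.

```python
import math

def erb_number(f_hz):
    """ERB-number (Cam) of a frequency in Hz, after Glasberg & Moore (1990):
    ERBS(f) = 21.4 * log10(4.37 * f/1000 + 1)."""
    return 21.4 * math.log10(4.37 * f_hz / 1000.0 + 1.0)

def channeling_distance(f_a, f_b):
    """Separation of two pure tones in auditory-filter (ERB) units: a crude
    channeling score, where larger values mean less peripheral overlap."""
    return abs(erb_number(f_a) - erb_number(f_b))

# A 500 vs 550 Hz pair sits well under 1 ERB apart (strong channel overlap),
# whereas 500 vs 1000 Hz spans several ERBs (little overlap).
small = channeling_distance(500, 550)
large = channeling_distance(500, 1000)
```

On a pure channeling account, the second pair would be far more likely to segregate; the temporal-cue results discussed next show why such a score cannot be the whole story.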
2-3- Auditory streaming based on temporal cues
In general, the channeling theory of streaming predicts that any salient difference between the excitation patterns evoked by the A and B sounds would lead to a segregated percept (Hartmann and Johnson, 1991). Some studies have shown, however, that a sequence of sounds with similar spectral properties but different temporal properties can be heard as segregated (e.g., Vliegen and Oxenham, 1999; Grimault et al., 2002; Roberts et al., 2002) even when the excitation patterns evoked by the stimuli are similar. In particular, temporal cues are undoubtedly responsible for the segregated percept when hearing a sequence of bursts of white noise that are amplitude-modulated at widely different rates (Grimault et al., 2002; Figure 3; Audio demo 3).
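Stimuli of this kind can be approximated as follows. This Python sketch (its parameters are assumptions for illustration, not the exact stimuli of Grimault et al., 2002) builds an alternating sequence of white-noise bursts whose long-term spectra match but whose amplitude-modulation rates differ:

```python
import numpy as np

FS = 16000  # illustrative sampling rate in Hz
rng = np.random.default_rng(0)

def am_noise_burst(mod_rate, dur=0.2, depth=1.0, fs=FS):
    """White-noise burst sinusoidally amplitude-modulated at mod_rate (Hz)."""
    t = np.arange(int(dur * fs)) / fs
    carrier = rng.standard_normal(t.size)          # flat-spectrum carrier
    envelope = 1.0 + depth * np.sin(2 * np.pi * mod_rate * t)
    return carrier * envelope

def ab_am_sequence(rate_a, rate_b, n_pairs=4):
    """Alternating A/B bursts that differ only in modulation rate."""
    bursts = []
    for _ in range(n_pairs):
        bursts.append(am_noise_burst(rate_a))
        bursts.append(am_noise_burst(rate_b))
    return np.concatenate(bursts)

# Widely different rates favour a two-stream percept even though the
# carriers excite the same peripheral channels.
seq = ab_am_sequence(20, 320)
```

Because only the temporal envelope distinguishes the A and B bursts, any segregation heard with such stimuli cannot be attributed to peripheral channeling.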
2-4- Concurrent speech segregation and auditory scene analysis
Various acoustic cues induce sequential
segregation (for a review, see Moore and Gockel, 2002). For complex
tone sequences (as a first approximation of speech), the streaming effect seems to be influenced by two main competing factors: pitch and timbre (Bregman et al., 1990; Singh, 1987; Singh and Bregman, 1997). However, the timbre variations of the stimuli used in those studies involved either the elimination of harmonics or, at best, spectral shaping with a single formant. Such conditions are far from voiced speech, which typically involves two to three formants to characterize vowels. In a more speech-oriented approach, Nooteboom et al. (1978) tested the effect of pitch against silent-interval duration for
sequences of synthesized vowels (/a u i/). They found that, for
realistic speech rates, a pitch difference between about two and five
semitones can produce stream segregation. However, the method of
measurement was highly subjective and the number of subjects was low
(two). Dorman et al. (1975) studied the influence of formant
differences on streaming using four-item vowel sequences. They observed
that the ability of subjects to perceive the sequences in the correct
order was dependent upon the sequence being perceived as a single
auditory stream. The authors concluded that, in the absence of formant
transitions, vowel sequences of constant pitch could induce stream
segregation. Based on these results and extrapolating from studies
involving complex tones, it can be argued that both timbre and pitch
contribute to segregation of speech stimuli. However, further
investigation is required to examine the influence of a pitch
difference on the tendency of sequences of vowels to form separate
auditory streams. Studies now in progress are designed to examine the mechanisms involved in the segregation of vowel sequences, and potential limitations to segregation associated with spectral smearing. An objective temporal-order paradigm is employed, in which listeners report the order of the constituent vowels within a sequence (Figure 4; Audio demo 4-a and 4-b).
Figure 4: Left panel: vowel sequence with closely spaced alternating fundamental frequencies (100 and 110 Hz). A single stream is heard and the temporal order of the vowels is perceptible. Right panel: vowel sequence with widely spaced alternating fundamental frequencies (100 and 238 Hz). Two streams are heard and the temporal order of the vowels is hardly perceptible.
2-5- Audiovisual interaction in auditory scene analysis and speech segregation
Previous work has suggested that lip reading can be a useful cue for speech perception in noise. However, the underlying mechanisms remain largely unknown. In particular, it is unclear whether lip reading enhances the effective signal-to-noise ratio or enhances the auditory scene analysis mechanisms themselves. Studies are in progress to look for interactions between lip reading and auditory streaming with vowels.
2-6- Interaction between perceptual mechanisms and auditory scene analysis
Although ASA mechanisms
have been extensively described in the literature, the relationships
they share with other auditory processes still remain largely
undetermined. Some interference, however, has been demonstrated. In
particular, strong interactions exist between the mechanisms underlying
pitch perception and those underlying the fusion of tonal components.
The grouping of simultaneous tonal components is based upon spectral
regularities, such as a regular spacing between components (Roberts
& Brunstrom, 1998, 2003) or harmonicity (Hartmann et al, 1990;
Hartmann and Doty, 1995). The observation that a mistuned component is
perceived as a separate auditory event and that it makes a reduced
contribution to the fundamental pitch of the rest of the complex tone
demonstrates that the mechanisms underlying pitch perception are closely
related to the perceptual fusion of spectral components (Moore, Peters
and Glasberg, 1986). Additional interactions or interdependencies
between ASA mechanisms and other auditory processes have also been
brought to light. For example, several studies have shown that a
streaming effect can significantly reduce the pitch perception
impairment induced by the presentation of a temporal fringe immediately
before or after a target complex (Micheyl and Carlyon, 1998; Gockel et
al., 1999). Modulation detection interference (MDI) across frequency regions is also known to be influenced by simultaneous and sequential grouping mechanisms (Oxenham and Dau, 2001), as is the degree of across-channel comodulation masking release (CMR) (Dau et al., 2004). Studies are in progress to characterize the relationships between auditory scene analysis and other perceptual attributes (loudness, pitch, timbre, etc.).
3- References
Apoux, F., Crouzet, O. & Lorenzi, C. (2001) Temporal envelope expansion of speech in noise for normal-hearing and hearing-impaired listeners: effects on identification performance and response time. Hear. Res., 153, 123-131.
Bacon, S.P. & Gleitman, R.M. (1992) Modulation detection in subjects with relatively flat hearing losses. J. Speech Hear. Res., 35,
Beauvois, M.W. & Meddis, R. (1996) Computer simulation of auditory stream segregation in alternating-tone sequences. J. Acoust. Soc. Am., 99, 2270-2280.
Bregman, A.S. (1990) Auditory Scene Analysis: The Perceptual Organization of Sound (MIT Press, Cambridge, MA).
Bregman, A.S. & Levitan, R. (1983) Stream segregation based on fundamental frequency and spectral peak. I: Effects of shaping by filters. Unpublished manuscript, Psychology Department, McGill University.
Bregman, A.S. & Tougas, Y. (1989) Propagation of constraints in auditory organization. Perception & Psychophysics, 46, 395-396.
Carlyon, R.P. & Datta, A.J. (1997) Excitation produced by Schroeder-phase complexes: Evidence for fast-acting compression in the auditory system. J. Acoust. Soc. Am., 101, 3636-3647.
Chatterjee, M. & Galvin III, J.J. (2002) Auditory streaming in cochlear implant listeners. J. Acoust. Soc. Am., 111, 2429.
Darwin, C.J. & Carlyon, R.P. (1995) Auditory grouping. In B.C.J. Moore (Ed.), Handbook of Perception and Cognition, Academic Press, 387-424.
Dorman, M.F., Cutting, J.E. & Raphael, L.J. (1975) Perception of temporal order in vowel sequences with and without formant transitions. J. of Exp. Psychology: Human Perc. and Perf., 1, 121-129.
Glasberg, B.R. & Moore, B.C.J. (1990) Derivation of auditory filter shapes from notched-noise data. Hearing Research, 47, 103-138.
Grimault, N., Bacon, S.P. & Micheyl, C. (2002) Auditory stream segregation on the basis of amplitude-modulation rate. J. Acoust. Soc. Am., 111, 1340-1348.
Grimault, N., Micheyl, C., Carlyon, R.P., Artaud, P. & Collet, L. (2001) Perceptual auditory stream segregation of sequences of complex sounds in subjects with normal and impaired hearing. British J. of Audiol., 35, 173-182.
Grimault, N., Micheyl, C., Carlyon, R.P., Artaud, P. & Collet, L. (2000) Influence of peripheral resolvability on the perceptual segregation of harmonic complex tones differing in fundamental frequency. J. Acoust. Soc. Am.,
Hall, J.W., Buss, E. & Grose, J.H. (1998) Discrimination of the fundamental frequency of unresolved harmonics. J. Acoust. Soc. Am., 104, 1799.
Hartmann, W.M. & Johnson, D. (1991) Stream segregation and peripheral channeling. Mus. Perc., 9, 155-184.
Kiang, N.Y.S. (1965) Discharge patterns of single fibers in the cat’s auditory nerve. Cambridge, MA: MIT Press.
McCabe, S.L. & Denham, M.J. (1997) A model of auditory streaming. J. Acoust. Soc. Am., 101, 1611-1621.
Micheyl, C., Maison, S. & Carlyon, R.P. (1999) Contralateral suppression of transiently evoked otoacoustic emissions by harmonic complex tones in humans. J. Acoust. Soc. Am., 105, 293.
Moore, B.C.J. (1995) Perceptual Consequences of Cochlear Damage (Oxford University Press).
Nooteboom, S.G., Brokx, J.P.L. & De Rooij, J.J. (1978) Contributions of prosody to speech perception. In W.J.M. Levelt & G.B. Flores d’Arcais (Eds.), Studies in the Perception of Language.
Plack, C.J. & Carlyon, R.P. (1995) Loudness perception and intensity coding. In Hearing, Academic Press, 123-159.
Recio, A. & Rhode, W.S. (2000) Basilar membrane responses to broadband stimuli. J. Acoust. Soc. Am., 108, 2281.
Roberts, B. & Brunstrom, J.M. (2003) Spectral pattern, harmonic relations, and the perceptual grouping of low-numbered components. J. Acoust. Soc. Am., 114, 2118-2134.
Rose, M.M. & Moore, B.C.J. (1997) Perceptual grouping of tone sequences by normally-hearing and hearing-impaired listeners. J. Acoust. Soc. Am., 102, 1768-1778.
Shannon, R.V., Zeng, F.G., Kamath, V., Wygonski, J. & Ekelid, M. (1995) Speech recognition with primarily temporal cues. Science, 270, 303-304.
Siohan, O. (1995) Reconnaissance automatique de la parole continue en
environnement bruité : application à des
stochastiques de trajectoires, Thèse de doctorat,
Université Henri Poincaré, Nancy 1.
Smith, Z.M., Delgutte, B. & Oxenham, A.J. (2002) Chimaeric sounds reveal dichotomies in auditory perception. Nature, 416, 87-90.
Van Noorden, L.P.A.S. (1975) Temporal coherence in the perception of tone sequences. Unpublished doctoral dissertation, Technische Hogeschool Eindhoven, Eindhoven, The Netherlands.
Vliegen, J. & Oxenham, A.J. (1999) Sequential stream segregation in the absence of spectral cues. J. Acoust. Soc. Am., 105, 339-346.
Vliegen, J., Moore, B.C.J. & Oxenham, A.J. (1999) The role of spectral and periodicity cues in auditory stream segregation, measured using a temporal discrimination task. J. Acoust. Soc. Am., 106,
4- Peer-reviewed international publications
Grimault, N., Micheyl, C., Carlyon, R.P., Artaud, P. & Collet, L. (2000) “Influence of peripheral resolvability on the perceptual segregation of harmonic complex tones differing in fundamental frequency”, J. Acoust. Soc. Am.,
Grimault, N., Micheyl, C., Carlyon, R.P., Artaud, P. & Collet, L. (2001) “Perceptual auditory stream segregation of sequences of complex sounds in subjects with normal and impaired hearing”, British J. of Audiol., 35, 173-182.
Grimault, N., Micheyl, C., Carlyon, R.P. & Collet, L. (2002) “Evidence for two pitch encoding mechanisms using a selective auditory training paradigm”, Perception and Psychophysics, 64,
Grimault, N., Bacon, S.P. & Micheyl, C. (2002) “Auditory stream segregation on the basis of amplitude-modulation rate”, J. Acoust. Soc. Am., 111, 1340-1348.
Morand, N., Garnier, S., Grimault, N., Veuillet, E., Collet, L. & Micheyl, C. (2002) “Medial olivocochlear activation and perceived auditory intensity in humans”, Physiology and Behavior, 77, 311-320.
Bacon, S.P., Grimault, N. & Lee, J. (2002) “Spectral integration in bands of modulated or unmodulated noise”, J. Acoust. Soc. Am.,
Grimault, N., Micheyl, C., Carlyon, R.P., Bacon, S.P. & Collet, L. (2003) “Learning in discrimination of frequency or modulation rate: generalization to fundamental frequency discrimination”, Hear. Res., 184, 41-50.
Grimault N. (2004)
“ Analyse séquentielle des scènes auditives chez le
malentendant ” Revue de Neuropsychologie, 14,
Grimault N., Gaudrain E. (2006)
“The consequences of cochlear damages on auditory scene
analysis”, Current Topics in Acoustical Research 2006, Vol
Hoen, M., Meunier, F., Grataloup, C.L., Grimault, N., Perrin, F., Perrot, X., Pellegrino, F. & Collet, L. (2007) Phonetic and lexical interferences in informational masking during speech-in-speech comprehension. Speech Comm., 49, 905-916.
Gaudrain E., Grimault, N. Healy,
E.W., Béra, J.C. (2007) “Effect of spectral
smearing on the perceptual segregation of vowel
sequences”, Hear. Res. 231, 32-41.
Gaudrain E., Grimault N., Healy E.W., Béra
J.C. (2008) Streaming of vowel sequences based on
fundamental frequency in a cochlear implant simulation. J.
Acoust. Soc. Am., 124, 3076-3087.
Spinelli, Grimault, Meunier & Welby
(2010) An intonational cue to word segmentation in
phonemically identical sequences. Attention, Perception
and Psychophysics, 72 (3), 775-787.
Devergie, A., Grimault, N., Tillmann, B. & Berthommier, F. (2010) Effect of rhythmic attention on the segregation of interleaved melodies. J. Acoust. Soc. Am., 128, EL1-EL7.
Devergie, A., Grimault, N., Gaudrain, E., Healy, E.W. & Berthommier, F. (2011) The effect of lip-reading on primary stream segregation. J. Acoust. Soc. Am.
Tillmann, B., Burnham, D., Nguyen, S., Grimault, N., Gosselin, N. & Peretz, I. (2011) Congenital amusia (or tone-deafness) interferes with pitch processing in tone languages. Frontiers in Auditory Cognitive Neuroscience.
Signoret, C., Gaudrain, E., Tillmann, B., Grimault, N. & Perrin, F. (2011) Facilitated auditory detection for speech sounds. Front. Psychology, 2:176. doi:
Grimault, N., Garnier, S. & Collet, L. (1998) “Relationship between amplification, fitting age and speech perception performance in school-age children”, Proc. of “A Sound Foundation Through Early Amplification”, Chicago, 1998, pp.
Grimault, N., Micheyl, C., Carlyon, R.P. & Collet, L. (2000) “Transfert d’apprentissage de la discrimination de la fréquence fondamentale”, Actes du 5ème Congrès Français d’Acoustique, pp. 450-453.
Grimault, N., Micheyl, C., Carlyon, R.P. & Collet, L. (2000) “Etude des mécanismes d'encodage de la hauteur des sons complexes au moyen du transfert d'apprentissage”, Actes des Journées Internationales de Sciences Cognitives.
Grimault, N. (2004) “ Are
fine structure cues an important feature for temporal
streaming ? ”, Actes du 7ème congrès Français
d’acoustique, pp. 383-384.
Grimault, N., Bacon, S.P., and Micheyl, C. (2005).
“ Auditory streaming without spectral cues in
hearing-impaired subjects, ” in Auditory signal
processing: physiology, psychoacoustics, and models,
edited by D. Pressnitzer, A. de Cheveigné, S. McAdams
and L. Collet. Springer Verlag: New York. pp
Grimault N., Gaudrain E. (2006)
« Conséquences d'une perte auditive neurosensorielle sur
l'analyse des scènes auditives. » Actes du congrès des
Hoen, M., Grataloup, C., Grimault, N.,
Perrin, F., Perrot, X., Pellegrino, F., Meunier, F.,
Collet, L. (2006). Tomber le masque de
l’information: effet cocktail party, masque informationnel
et interférences psycholinguistiques en situation de
compréhension de la parole dans la parole. Actes des
XXVIemes Journées d’Etudes sur la Parole (JEP). 12-16
Juin, Dinard, France.
Gaudrain, E., Grimault, N., Healy, E.W. & Béra, J.C. (2006) Ségrégation de séquences de voyelles avec ou sans simulation de perte auditive. Actes du 8ème Congrès Français d’Acoustique, 24-27 Avril, Tours, France.
Grimault, N., McAdams, S. & Allen, J.B. (2007) “Auditory scene analysis: a prerequisite for loudness perception”, in Hearing - From Sensory Processing to Perception, edited by Kollmeier, B., Klump, G., Hohmann, V., Langemann, U., Mauermann, M., Uppenkamp, S. & Verhey, J. (Springer), pp. 295-302.
Devergie, A., Berthommier, F., Grimault,
N. (2009) Pairing audio speech and various visual
displays: binding or not binding ?, International
Conference on Auditory-Visual Speech Processing 2009,
10-13 September 2009, Norwich, UK
Grimault, N., Gaudrain, E. (2010) Ségrégation séquentielle et structure fine temporelle, Actes du 10ème Congrès Français d’Acoustique,
Devergie, A., Grimault, N. & Berthommier, F. (2010) Influence de la lecture labiale sur la ségrégation auditive de flux de parole, Actes du 10ème Congrès Français d’Acoustique, Lyon, 12-16/04/2010.