              Nicolas Grimault


Perception and extraction of sound signals in context

Keywords:


Psychoacoustics, audition, pitch, auditory scene analysis, auditory streaming, temporal mechanisms, envelope, fine structure, hearing loss, auditory prostheses, cochlear implant








Nicolas Grimault

Cognition Auditive et Psychoacoustique
UMR 5020 Neurosciences et Systèmes Sensoriels
CNRS - Université Claude Bernard - Lyon 1
50 Av. Tony Garnier
69366 Lyon Cedex 07
France
Tel: 33 (0)4 37 28 74 91
Fax: 33 (0)4 37 28 76 01
nicolas.grimault@olfac.univ-lyon1.fr

1-Introduction

    In everyday life situations, several sound sources interact to form a complex acoustical mixture that must be interpreted by the auditory system (Figure 1; Audio demo 1). Auditory Scene Analysis (ASA) refers to the ability of the human auditory system to segregate sounds arising from different acoustical sources into different perceptual streams and to amalgamate sounds arising from the same acoustical source into a single perceptual stream. As such, a stream is defined as the perceptual auditory object that corresponds to a single acoustic sound source (for a review see Bregman, 1990).
Most studies from this group are dedicated to deciphering the mechanisms involved in auditory scene analysis.
Figure 1: Example of an auditory scene made with two speakers. This kind of scene is called a cocktail party situation (Cherry, 1953). In such a situation, normal-hearing listeners can easily hear two streams. This is much more difficult for hearing-impaired listeners.


2-Background.

2-1-Experimental methodologies.

    In laboratory conditions, auditory streaming is traditionally investigated using simplified stimuli consisting of a repeating sequence of "A" and "B" tones (e.g., van Noorden, 1975); when the stimulus repetition rate is rapid enough, or the frequency separation between the "A" and "B" tones large enough, the sequence breaks down into two perceptual streams (Figure 2; Audio demo 2-a and 2-b). The minimum frequency separation between "A" and "B" tones for which two streams can be heard when the listener is trying to attend to one or the other subset of elements has been dubbed the "fission" boundary (van Noorden, 1975).

Audio demo 2-a                                                                       Audio demo 2-b

Figure 2: Time-frequency representation of a sequence ABA-ABA- as used by van Noorden (1975). The frequency spacing and the repetition rate lead to the perception of a single stream (Audio demo 2-a) or of two streams (Audio demo 2-b).
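
For illustration, a minimal Python sketch (assuming only numpy) of the kind of ABA- pure-tone sequence described above is given below. The tone duration, base frequency and frequency separation are illustrative values chosen for this example, not the exact parameters used by van Noorden (1975).

    # Minimal sketch of an ABA- tone sequence (illustrative parameters).
    import numpy as np

    def pure_tone(freq, dur, fs=44100, ramp=0.01):
        """Pure tone with raised-cosine onset/offset ramps."""
        t = np.arange(int(dur * fs)) / fs
        tone = np.sin(2 * np.pi * freq * t)
        n_ramp = int(ramp * fs)
        env = np.ones_like(tone)
        env[:n_ramp] = 0.5 * (1 - np.cos(np.pi * np.arange(n_ramp) / n_ramp))
        env[-n_ramp:] = env[:n_ramp][::-1]
        return tone * env

    def aba_sequence(f_a=500.0, delta_semitones=6.0, tone_dur=0.1,
                     n_triplets=10, fs=44100):
        """Repeating A-B-A-(silence) triplets."""
        f_b = f_a * 2 ** (delta_semitones / 12.0)
        a = pure_tone(f_a, tone_dur, fs)
        b = pure_tone(f_b, tone_dur, fs)
        gap = np.zeros(int(tone_dur * fs))      # silent slot completing ABA-
        triplet = np.concatenate([a, b, a, gap])
        return np.tile(triplet, n_triplets)

    sequence = aba_sequence()   # e.g. write to a WAV file with soundfile.write

Increasing delta_semitones or shortening tone_dur makes the percept more likely to split into a high stream and a low stream, which is how the fission boundary can be explored.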

2-2- Auditory streaming based on spectral cues

    While certain authors have suggested that streaming is a central phenomenon (Bregman, 1990), others have proposed that it is determined to a large extent by the functioning of peripheral mechanisms (Beauvois and Meddis, 1996). One question, in particular, concerns the role of peripheral auditory filtering in streaming. Hartmann and Johnson (1991) have proposed that beyond differences in the physical characteristics of the sounds, streaming is determined by parallel bandpass filtering, i.e., "channeling" of incoming sounds by the auditory periphery. Basically, sounds falling in different auditory channels are easily segregated, while sounds successively occupying the same auditory filters are less likely to be allocated to different auditory streams. This view is supported by the results of early experiments. Computer models based on this "channeling" principle can account successfully for a variety of experimental data on streaming (Beauvois and Meddis, 1996; McCabe and Denham, 1997). However, some experimental results demonstrate that signal features not related to channeling can affect stream segregation. For example, it has been shown that differences in temporal envelope between sounds having the same frequency content can promote streaming (Iverson, 1995) and that the segregation boundary can be shifted by temporal envelope factors (Singh and Bregman, 1997). Therefore, at present, the extent to which streaming depends on peripheral filtering remains unclear.
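
The channeling heuristic can be made concrete with a toy computation of excitation-pattern overlap on an ERB-number scale (Glasberg and Moore, 1990). The sketch below is not the Beauvois and Meddis (1996) model; the triangular excitation pattern, its slope and the overlap measure are simplifying assumptions used only to express the idea that tones exciting largely disjoint auditory channels are more likely to segregate.

    # Toy "channeling" heuristic: predict segregation when the peripheral
    # excitation patterns of the A and B tones share little energy.
    import numpy as np

    def erb_number(f_hz):
        """Glasberg and Moore (1990) frequency-to-ERB-number conversion."""
        return 21.4 * np.log10(4.37e-3 * f_hz + 1.0)

    def excitation_pattern(f_tone, f_axis, slope_db_per_erb=10.0):
        """Triangular excitation level (dB) around the tone, on an ERB scale."""
        d = np.abs(erb_number(f_axis) - erb_number(f_tone))
        return -slope_db_per_erb * d

    def channel_overlap(f_a, f_b, f_axis=np.linspace(100, 4000, 500)):
        """Normalised overlap (0-1) between the two excitation patterns."""
        ea = 10 ** (excitation_pattern(f_a, f_axis) / 10.0)
        eb = 10 ** (excitation_pattern(f_b, f_axis) / 10.0)
        return np.sum(np.minimum(ea, eb)) / np.sum(np.maximum(ea, eb))

    # Small separation -> large overlap -> one stream predicted;
    # large separation -> small overlap -> two streams predicted.
    print(channel_overlap(500, 520), channel_overlap(500, 1000))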

2-3-Auditory streaming based on temporal cues

In general, the channeling theory of streaming predicts that any salient difference between the excitation patterns evoked by the A and B sounds will lead to a segregated percept (Hartmann and Johnson, 1991). Some previous studies have shown, however, that a sequence of sounds with similar spectral properties but different temporal properties can be heard as segregated (e.g., Vliegen and Oxenham, 1999; Grimault et al., 2002; Roberts et al., 2002) even when the excitation patterns evoked by the stimuli are similar. In particular, temporal cues are undoubtedly responsible for the segregated percept when listening to a sequence of bursts of white noise that are amplitude-modulated at widely different rates (Grimault et al., 2002; Figure 3; Audio demo 3).
Figure 3: Stimuli from Grimault et al. (2002); Audio demo 3.
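
As an illustration of such temporally based streaming, the following sketch generates stimuli in the spirit of Grimault et al. (2002): bursts of white noise that share the same long-term spectrum but are sinusoidally amplitude-modulated at different rates, arranged in an ABA- pattern. The durations, modulation rates and modulation depth are illustrative values, not the published ones.

    # Sketch of amplitude-modulated noise-burst stimuli (illustrative values).
    import numpy as np

    def am_noise_burst(mod_rate, dur=0.1, fs=44100, depth=1.0, rng=None):
        """White-noise burst with sinusoidal amplitude modulation."""
        rng = np.random.default_rng() if rng is None else rng
        t = np.arange(int(dur * fs)) / fs
        carrier = rng.standard_normal(t.size)
        envelope = 1.0 + depth * np.sin(2 * np.pi * mod_rate * t)
        return carrier * envelope

    def am_aba_sequence(rate_a=20.0, rate_b=160.0, dur=0.1,
                        n_triplets=10, fs=44100):
        """A and B bursts differ only in AM rate, not in long-term spectrum."""
        gap = np.zeros(int(dur * fs))
        triplets = []
        for _ in range(n_triplets):
            a1 = am_noise_burst(rate_a, dur, fs)
            b = am_noise_burst(rate_b, dur, fs)
            a2 = am_noise_burst(rate_a, dur, fs)
            triplets.append(np.concatenate([a1, b, a2, gap]))
        return np.concatenate(triplets)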



2-4- Concurrent speech segregation and auditory scene analysis

   Various acoustic cues induce sequential segregation (for a review, see Moore and Gockel, 2002). For complex tone sequences (as a first approximation of speech), the streaming effect seems to be influenced by two main competing factors: pitch and timbre (Bregman et al., 1990; Singh, 1987; Singh and Bregman, 1997). However, the timbre variations of the stimuli used in those studies involved either the elimination of harmonics or, at best, spectral shaping with a single formant. Such conditions are far from voiced speech, which typically involves two to three formants to characterize vowels. In a more speech-oriented approach, Nooteboom et al. (1978) tested the effect of pitch against silent-interval duration for sequences of synthesized vowels (/a u i/). They found that, for realistic speech rates, a pitch difference of between about two and five semitones can produce stream segregation. However, the method of measurement was highly subjective and the number of subjects was low (two). Dorman et al. (1975) studied the influence of formant differences on streaming using four-item vowel sequences. They observed that the ability of subjects to perceive the sequences in the correct order depended upon the sequence being perceived as a single auditory stream. The authors concluded that, in the absence of formant transitions, vowel sequences of constant pitch could induce stream segregation. Based on these results, and extrapolating from studies involving complex tones, it can be argued that both timbre and pitch contribute to the segregation of speech stimuli. However, further investigation is required to examine the influence of a pitch difference on the tendency of sequences of vowels to form separate auditory streams. Some studies in progress are designed to examine the mechanisms involved in the segregation of vowel sequences, and potential limitations to segregation associated with spectral smearing. An objective temporal order paradigm is employed in which listeners report the order of constituent vowels within a sequence (Figure 4, Audio demo 4-a and 4-b).
Audio demo 4-a                                          Audio demo 4-b
Figure 4: Left panel: Vowel sequence with close alternating fundamental frequencies (100 and 110 Hz). A single stream is heard and the temporal order of the vowels is perceptible. Right panel: Vowel sequence with widely spaced alternating fundamental frequencies (100 and 238 Hz). Two streams are heard and the temporal order of the vowels is hardly perceptible.
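
The fundamental-frequency separations quoted in the Figure 4 caption can be expressed in semitones with the usual relation 12·log2(f2/f1), as in the short computation below: 100 versus 110 Hz corresponds to about 1.7 semitones, below the two-to-five-semitone range that Nooteboom et al. (1978) found necessary for segregation, whereas 100 versus 238 Hz corresponds to about 15 semitones, well above it.

    # Semitone separation between alternating fundamental frequencies in Figure 4.
    import math

    def semitones(f1, f2):
        return 12.0 * math.log2(f2 / f1)

    print(semitones(100, 110))   # ~1.65 semitones
    print(semitones(100, 238))   # ~15.0 semitones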

2-5- Audiovisual interaction in auditory scene analysis and speech segregation

    Previous work has suggested that lip reading can be a useful cue for speech perception in noise. However, the underlying mechanisms remain largely unknown. In particular, it is unclear whether lip reading enhances the effective signal-to-noise ratio or enhances auditory scene analysis mechanisms. Studies are in progress to look for some degree of interaction between lip reading and auditory streaming with vowels.


2-6-Interaction between perceptual mechanisms and auditory scene analysis.

Although ASA mechanisms have been extensively described in the literature, the relationships they share with other auditory processes still remain largely undetermined. Some interference, however, has been demonstrated. In particular, strong interactions exist between the mechanisms underlying pitch perception and those underlying the fusion of tonal components. The grouping of simultaneous tonal components is based upon spectral regularities, such as a regular spacing between components (Roberts & Brunstrom, 1998, 2003) or harmonicity (Hartmann et al, 1990; Hartmann and Doty, 1995). The observation that a mistuned component is perceived as a separate auditory event and that it makes a reduced contribution to the fundamental pitch of the rest of the complex tone demonstrates that the mechanisms underlying pitch perception are closely related to the perceptual fusion of spectral components (Moore, Peters and Glasberg, 1986). Additional interactions or interdependencies between ASA mechanisms and other auditory processes have also been brought to light. For example, several studies have shown that a streaming effect can significantly reduce the pitch perception impairment induced by the presentation of a temporal fringe immediately before or after a target complex (Micheyl and Carlyon, 1998; Gockel et al, 1999). Modulation detection interference (MDI) across frequency regions is also known to be influenced by simultaneous and sequential grouping mechanisms (Oxenham and Dau, 2001), as well as by the degree of across-channel comodulation masking release (CMR) (Dau et al, 2004). Some studies are in progress to characterize the relationships between auditory scene analysis and other perceptual features (loudness, pitch, timbre, ...).
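
The mistuning observation mentioned above can be illustrated with a short sketch that synthesizes a harmonic complex tone in which a single component is shifted away from its harmonic frequency. The fundamental frequency, number of components and amount of mistuning below are illustrative values, not the parameters of Moore, Peters and Glasberg (1986).

    # Sketch of a harmonic complex tone with one mistuned component.
    import numpy as np

    def mistuned_complex(f0=200.0, n_harmonics=12, mistuned_harmonic=4,
                         mistuning_percent=4.0, dur=0.4, fs=44100):
        """Equal-amplitude harmonics; one component is shifted in frequency.
        With sufficient mistuning it tends to pop out as a separate event and
        contributes less to the pitch of the remaining complex."""
        t = np.arange(int(dur * fs)) / fs
        signal = np.zeros_like(t)
        for n in range(1, n_harmonics + 1):
            f = n * f0
            if n == mistuned_harmonic:
                f *= 1.0 + mistuning_percent / 100.0
            signal += np.sin(2 * np.pi * f * t)
        return signal / n_harmonics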

3-References.


Apoux, F., Crouzet, O. & Lorenzi, C. (2001) Temporal envelope expansion of speech in noise for normal-hearing and hearing-impaired listeners: effects on identification performance and response time. Hear. Res., 153, 123-131.
Bacon, S.,P. & Gleitman R., M. (1992) Modulation Detection in subjects with relatively flat hearing losses. J. Speech Hear. Res., 35, 642-653.
Beauvois, M., W. & Meddis, R. (1996) Computer simulation of auditory stream segregation in alternating-tone sequences, J. Acoust. Soc. Am., 99, 2270-2280.
Bregman, A., S. (1990) Auditory Scene Analysis: The perceptual Organisation of Sound (MIT, Cambridge, MA).
Bregman, A.S. & Levitan, R. (1983) Stream segregation based on fundamental frequency and spectral peak. I: Effects of shaping by filters. Unpublished manuscript, Psychology Department, McGill University.
Bregman, A.S. & Tougas, Y. (1989) Propagation of constraints in auditory organization. Perception & Psychophysics, 46, 395-396.
Carlyon, R., P. & Datta, A., J. (1997) Excitation produced by Schroeder-phase complexes: Evidence for fast-acting compression in the auditory system. J. Acoust. Soc. Am., 101, 3636-3647.
Chatterjee, M. & Galvin III, J.J. (2002) Auditory streaming in cochlear implant listeners. J. Acoust. Soc. Am., 111, 2429.
Darwin, C.J. & Carlyon, R.P. (1995) Auditory grouping. In B.C.J. Moore (Ed.), Handbook of perception and cognition. Academic Press, 387-424.
Dorman, M.F., Cutting, J.E. & Raphael, L.J. (1975) Perception of temporal order in vowel sequences with and without formant transitions. J. of Exp. Psychology: Human Perc. and Perf., 1, 121-129.
Glasberg, B.R. & Moore, B.C.J. (1990) Derivation of auditory filter shapes from notched-noise data. Hearing Research, 47, 103-138.
Grimault N., Bacon S.P., Micheyl C. (2002) "Auditory stream segregation on the basis of amplitude-modulation rate", J. Acoust. Soc. Am 111, 1340-1348.
Grimault N., Micheyl C., Carlyon R.P., Artaud P., Collet L. (2001) "Perceptual auditory stream segregation of sequences of complex sounds in subjects with normal and impaired hearing", British J. of Audiol 35, 173-182.
Grimault N., Micheyl C., Carlyon R.P., Artaud P., Collet L. (2000) "Influence of peripheral resolvability on the perceptual segregation of harmonic complex tones differing in fundamental frequency", J. Acoust. Soc. Am., 108, 263-271.
Hall, J., W., Buss, E. & Grose, J., H. (1998) Discrimination of the  fundamental frequency of unresolved harmonics. J. Acoust. Soc. Am., 104, 1799.
Hartmann, W.M. & Johnson, D. (1991) Stream segregation and peripheral channeling. Mus. Perc., 9, 155-184.
Kiang, N.Y.S. (1965) Discharge patterns of single fibers in the cat’s auditory nerve. Cambridge Mass.,MIT press.
McCabe, S., L. & Denham, M., J. (1997) A model of auditory streaming, J. Acoust. Soc. Am., 101, 1611-1621.
Micheyl, C., Maison, S. & Carlyon, R., P. (1999) Contralateral suppression of transiently evoked otoacoustic emissions by harmonic complex tones in humans, J. Acoust. Soc. Am., 105, 293.
Moore, B., C., J. (1995) Perceptual consequences of cochlear damage (Oxford: University Press.).
Nooteboom, S.G., Brokx, J.P.L. & De Rooij, J.J. (1978) Contributions of prosody to speech perception. In W.J.M. Levelt and G.B. Flores d'Arcais (Eds.), Studies in the perception of language. Chichester: Wiley.
Plack, C.J. & Carlyon, R.P. (1995) Loudness perception and intensity coding. In Hearing. Academic Press, 123-159.
Recio, A. & Rhode, W.S. (2000) Basilar membrane responses to broadband stimuli. J. Acoust. Soc. Am., 108, 2281.
Roberts, B. & Brunstrom, J.M. (2003) Spectral pattern, harmonic relations, and the perceptual grouping of low-numbered components. J. Acoust. Soc. Am., 114, 2118-2134.
Rose, M.M. & Moore, B.C.J. (1997) Perceptual grouping of tone sequences by normally-hearing and hearing-impaired listeners. J. Acoust. Soc. Am., 102, 1768-1778.
Shannon, R.V, Zeng, F.G., Kamath, V., Wygonski, J. et Ekelid, M. (1995) Speech recognition with primarily temporal cues, Science 270, 303-304.
Siohan, O. (1995) Reconnaissance automatique de la parole continue en environnement bruité : application à des modèles stochastiques de trajectoires, Thèse de doctorat, Université Henri Poincaré, Nancy 1.
Smith, Z.M., Delgutte, B. & Oxenham, A.J. (2002) Chimaeric sounds reveal dichotomies in auditory perception. Nature, 416, 87-90.
Van Noorden, L.P.A.S. (1975) Temporal coherence in the perception of tone sequences. Unpublished Doctoral Dissertation, Technische Hogeschool Eindhoven, Eindhoven, The Netherlands.
Vliegen, J. & Oxenham, A., J. (1999) Sequential stream segregation in the absence of spectral cues. J. Acoust. Soc. Am., 105, 339-346.
Vliegen, J., Moore. B., C., J. & Oxenham, A., J. (1999) The role of spectral and periodicity cues in auditory stream segregation, measured using a temporal discrimination task. J. Acoust. Soc. Am., 106, 938-945.

4-International peer-reviewed publications


Grimault N., Micheyl C., Carlyon R.P., Artaud P., Collet L. (2000) “ Influence of peripheral resolvability on the perceptual segregation of harmonic complex tones differing in fundamental frequency ”, J. Acoust. Soc. Am., 108, 263-271.

Grimault N., Micheyl C., Carlyon R.P., Artaud P., Collet L. (2001) “ Perceptual auditory stream segregation of sequences of complex sounds in subjects with normal and impaired hearing ”, British J. of Audiol 35, 173-182.

Grimault N., Micheyl C., Carlyon R.P., Collet L. (2002) “ Evidence for two pitch encoding mechanisms using a selective auditory training paradigm ”, Perception and Psychophysics, 64, 189-197.

Grimault N., Bacon S.P., Micheyl C. (2002) “ Auditory stream segregation on the basis of amplitude-modulation rate ”, J. Acoust. Soc. Am 111, 1340-1348.

Morand N., Garnier S., Grimault N., Veuillet E., Collet L., Micheyl C. (2002) “ Medial olivocochlear activation and perceived auditory intensity in humans ”, Physiology and Behavior, 77, 311-320.

Bacon S.P., Grimault N., Lee J. (2002). “ Spectral integration in bands of modulated or unmodulated noise ” J. Acoust. Soc. Am., 112, 219-226.

Grimault N., Micheyl C., Carlyon R.P., Bacon S.P., Collet L. (2003) “ Learning in discrimination of frequency or modulation rate: generalization to fundamental frequency discrimination ”, Hear. Res 184, 41-50.

Grimault N. (2004) “ Analyse séquentielle des scènes auditives chez le malentendant ” Revue de Neuropsychologie, 14, 25-39.

Grimault N., Gaudrain E. (2006) “The consequences of cochlear damages on auditory scene analysis”, Current Topics in Acoustical Research 2006, Vol 4. 17-24.

Hoen, M., Meunier, F., Grataloup, C.L., Grimault, N., Perrin, F., Perrot, X., Pellegrino, F., Collet, L. (2007) Phonetic and lexical interferences in informational masking during speech-in-speech comprehension. Speech Com 49, 905-916.

Gaudrain E., Grimault N., Healy E.W., Béra J.C. (2007) “Effect of spectral smearing on the perceptual segregation of vowel sequences”, Hear. Res. 231, 32-41.

Gaudrain E., Grimault N., Healy E.W., Béra J.C. (2008) Streaming of vowel sequences based on fundamental frequency in a cochlear implant simulation. J. Acoust. Soc. Am., 124, 3076-3087.

Spinelli, Grimault, Meunier & Welby (2010) An intonational cue to word segmentation in phonemically identical sequences. Attention, Perception and Psychophysics, 72 (3), 775-787.

Devergie, A., Grimault, N., Tillmann, B., & Berthommier, F. (2010) Effect of rhythmic attention on the segregation of interleaved melodies. J. Acoust. Soc. Am. 128, EL1-EL7.

Devergie, A., Grimault, N., Gaudrain, E., Healy, E.W., Berthommier, F. (2011) The effect of lip-reading on primary stream segregation. J. Acoust. Soc. Am.

Tillmann B, Burnham D, Nguyen S, Grimault N, Gosselin N, Peretz I (2011) Congenital amusia (or tone-deafness) interferes with pitch processing in tone languages, Frontiers in Auditory Cognitive Neuroscience.

Signoret C, Gaudrain E, Tillmann B, Grimault N and Perrin F (2011). Facilitated auditory detection for speech sounds. Front. Psychology 2:176. doi: 10.3389/fpsyg.2011.00176


5-Publications


Grimault N., Garnier S., Collet L. (1998) “ Relationship between amplification, fitting age and speech perception performance in school-age children ”, Proc. of “ A sound foundation through early amplification ”, Chicago, 1998, pp 191-198.

Grimault, N., Micheyl, C., Carlyon, R.P., Collet, L. (2000) “ Transfert d’apprentissage de la discrimination de la fréquence fondamentale ”, Actes de 5ème congrès Français d’acoustique, pp. 450-453.

Grimault, N., Micheyl, C., Carlyon, R.P., Collet, L. (2000) “ Etude des mécanismes d'encodage de la hauteur des sons complexes au moyen du transfert d'apprentissage ”, Actes des Journées Internationales de Sciences Cognitives d'Orsay (J.I.O.S.C.).

Grimault, N. (2004) “ Are fine structure cues an important feature for temporal streaming? ”, Actes du 7ème congrès Français d’acoustique, pp. 383-384.

Grimault, N., Bacon, S.P., and Micheyl, C. (2005). “ Auditory streaming without spectral cues in hearing-impaired subjects, ” in Auditory signal processing: physiology, psychoacoustics, and models, edited by D. Pressnitzer, A. de Cheveigné, S. McAdams and L. Collet. Springer Verlag: New York. pp 212-220.

Grimault N., Gaudrain E. (2006) « Conséquences d'une perte auditive neurosensorielle sur l'analyse des scènes auditives. » Actes du congrès des audioprothésistes 2006.

Hoen, M., Grataloup, C., Grimault, N., Perrin, F., Perrot, X., Pellegrino, F., Meunier, F., Collet, L. (2006). Tomber le masque de l’information: effet cocktail party, masque informationnel et interférences psycholinguistiques en situation de compréhension de la parole dans la parole. Actes des XXVIemes Journées d’Etudes sur la Parole (JEP). 12-16 Juin, Dinard, France.

Gaudrain E., Grimault N., Healy E.W., Béra J.C. (2006) Ségrégation de séquences de voyelles avec ou sans simulation de perte auditive. Actes du 8ème congrès français d’acoustique, 24-27 Avril, Tours, France.

Grimault N., McAdams, S., Allen J.B. (2007) “ Auditory scene analysis: a prerequisite for loudness perception ”, In Hearing - From Sensory Processing to Perception edited by Kollemeier B., Klump, G., Hohmann, V., Langemann, U., Mauermann, M., Uppenkamp, S., Verhey, J. (Springer), pp 295-302.

Devergie, A., Berthommier, F., Grimault, N. (2009) Pairing audio speech and various visual displays: binding or not binding?, International Conference on Auditory-Visual Speech Processing 2009, 10-13 September 2009, Norwich, UK.

Grimault, N., Gaudrain, E. (2010) Ségrégation séquentielle et structure fine temporelle, Actes du 10ème Congrès Français d’Acoustique, Lyon, 12-16/04/2010.

Devergie, A., Grimault, N., Berthommier, F. (2010) Influence de la lecture labiale sur la ségrégation auditive de flux de parole, Actes du 10ème Congrès Français d’Acoustique, Lyon, 12-16/04/2010.



6-Downloads
