Challenges in hearing research and technology

By Professor Torsten Dau and Associate Professor Torben Poulsen

What does "hearing" mean in our everyday lives? Hearing often means determining the sources of sound: we hear a baby cry, a person talk, piano strings vibrating and so on. What reaches our ears is a mixture of all these sources, combined with sound reflections from walls, tables and floors. From this sound soup, the auditory system segregates the different sources and picks out the relevant information. Most people with normal hearing deal with this situation effortlessly, but how people solve this cocktail-party problem is still not understood. No artificial systems currently perform anywhere near as well as humans in segregating and identifying sound sources. People with hearing loss, however, often have major problems in situations with several people talking at the same time. This is not because they cannot hear the speech signals; instead they seem to have problems separating the desired sound from the background activity. One great challenge of auditory research is to understand the principles underlying the perceptual organization of sound. If researchers understand how sounds are processed, they will be in a better position to design hearing aids and cochlear implants. In addition, as sound perception and segregation are an important part of speech communication, automatic speech recognition systems can benefit greatly from improved algorithms that process the incoming signals such that they minimize interfering sounds.

Another great challenge is characterizing the quality of sound. Why are some sounds perceived as pleasant and others as annoying? Why does the general public so often debate the official limit values for industrial noise, traffic noise, low-frequency noise or noise from neighbors? And what about the loud music played by many musicians, including entire orchestras? Only professional musicians are covered by the occupational limit values for noise, and even these limits are set for industrial noise, not for music.

Current research in hearing focuses on the principles of how people perceive sound in simple and complex sound environments, models of how auditory signals are processed and perceived, new listener-relevant techniques for measuring sound and how auditory models can be applied to technical communication systems and to clinical diagnostics.

A brief history

Denmark's research traditions in acoustics can be traced back to the early 1940s. An acoustics laboratory directed by Fritz Ingerslev was established in 1943 with the combined purposes of conducting research and testing in architectural acoustics and electroacoustics. In 1963, the laboratory became a part of the Technical University of Denmark. Denmark already had a strong position in acoustics and electroacoustics, having such well-known companies as Brüel & Kjær, Bang & Olufsen, Kirk, Oticon, Widex, Danavox, GN Resound, Madsen, Otometrics, Interacoustics, Rastronic, Peerless, Jamo and others. The products of these companies were related to the concept of acoustic communication or to reducing the noise load on people. Human perception of speech and other sounds and general research on hearing, hearing loss, loudness, annoyance and related topics became important.  

Fig. 12.1

The Technical University of Denmark initiated basic research in psychoacoustics, investigating how humans perceive sound, in the early 1960s, inspired by activities at the Massachusetts Institute of Technology. The courses at the Technical University of Denmark covered the anatomy and physiology of the ear, how humans perceive sound and distinguish different sounds, how sounds mask each other, the implications of hearing loss, the importance of understanding speech and other topics. All these topics used human test subjects as "measuring instruments". In the beginning, this approach was somewhat controversial at the University: why should engineers learn about physiological and psychological acoustics? Today researchers generally understand that the listener is the most important part of an electroacoustic transmission system such as a telephone, loudspeaker or hearing aid. It seems obvious that a telephone engineer should know something about the transmitted signal (speech) and something about the recipient of that signal (the human listener).

 

How does the human auditory system process sound and how can this be measured? 

Fig. 12.2.

The Centre for Applied Hearing Research (Acoustic Technology, Ørsted•DTU) of the Technical University of Denmark has a multidisciplinary research strategy for investigating the mechanisms of hearing. The methods include psychophysical listening experiments in various sound environments, such as the large anechoic (echo-free) room at the Technical University of Denmark (Fig. 12.1); audiological and clinical hearing tests; modern imaging techniques that record the activity of the brain in response to sound; and computational modeling of auditory processing (Fig. 12.2).

The psychophysical listening tests investigate the capabilities and limitations of the auditory system using either natural sounds (such as speech or music) or synthetic signals (such as complex tones or noises) presented by computer-controlled procedures. The signals are typically presented over headphones in soundproof listening booths or in a free-field condition provided by an anechoic room. The tasks typically involve measuring the ability to detect sounds in quiet or in the presence of concurrent sounds, the ability to locate sounds in a mixture of interfering sources coming from different directions or the intelligibility of speech in background noise.
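
The chapter does not specify the adaptive procedures used in these computer-controlled tests. A standard choice in psychoacoustics is the transformed up-down staircase (Levitt, 1971), in which the signal level is lowered after two consecutive correct responses and raised after each error, so that the track converges on the 70.7%-correct point of the psychometric function. A minimal sketch, with a simulated listener standing in for a real test subject (all function names and parameter values are illustrative):

    import random

    def two_down_one_up(run_trial, start_level=60.0, step=2.0, n_reversals=8):
        # Transformed up-down staircase (Levitt, 1971): the level is
        # lowered after two consecutive correct responses and raised
        # after each error, converging on the 70.7%-correct point.
        # run_trial(level) presents one trial and returns True if the
        # listener responded correctly.
        level = start_level
        correct_in_a_row = 0
        direction = None            # -1 = going down, +1 = going up
        reversals = []
        while len(reversals) < n_reversals:
            if run_trial(level):
                correct_in_a_row += 1
                if correct_in_a_row == 2:        # two correct: step down
                    correct_in_a_row = 0
                    if direction == +1:
                        reversals.append(level)  # turning point
                    direction = -1
                    level -= step
            else:                                # one error: step up
                correct_in_a_row = 0
                if direction == -1:
                    reversals.append(level)      # turning point
                direction = +1
                level += step
        # Threshold estimate: mean level at the later reversals
        return sum(reversals[2:]) / len(reversals[2:])

    def simulated_listener(level, true_threshold=40.0):
        # Toy psychometric function: performance rises from chance
        # (50% in a two-alternative task) to 100% around 40 dB.
        p = 1.0 / (1.0 + 10 ** ((true_threshold - level) / 4.0))
        return random.random() < 0.5 + 0.5 * p

    print(two_down_one_up(simulated_listener))   # converges near 40 dB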

Fig. 12.3. 

Based on the experimental results from perceptual listening experiments, functional models of hearing are developed that describe human performance in the conditions tested. Fig. 12.3 shows an example of such a model of auditory signal processing and perception. The model transforms an acoustic sound pressure waveform into an (idealized) internal representation that the brain can use. The first stages of processing mimic the processing in the inner ear, the cochlea. The cochlear filtering is simulated by a band-pass filter bank. The signals at the output of the filters are half-wave rectified and then low-pass filtered; this roughly simulates the transformation of the mechanical oscillations of the basilar membrane into the neural activity of the hair cells. The low-pass filtering essentially preserves only the envelope of the signal. Feedback loops simulate the effects of neural adaptation in the auditory system. This adaptation stage emphasizes abrupt changes in the signal (such as onsets and offsets) relative to its stationary portions and is inspired by similar properties observed in physiological data from animals. A second filter bank then follows at the output of each cochlear filter; it decomposes the envelope into different frequency bands tuned to different amplitude modulation rates. To simulate a human observer's ability to discriminate between two auditory stimuli, an optimal detection process is attached to the model after these preprocessing stages. The detector analyzes the differences between the internal representations of the two stimuli; in this way, the model can be considered to imitate a human observer. The optimality of the detection process refers to the best theoretically possible performance in detecting signals under specific conditions; the detector is inspired by signal detection concepts originally developed for radar technology. This auditory model has recently been considered for several technical applications: for example, it has been used as a preprocessing stage for objectively assessing and predicting speech transmission and sound quality, and it is currently being investigated for use as a front-end in automatic speech recognition.
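
The published model uses gammatone filters for the cochlear stage and nonlinear feedback loops for the adaptation; the sketch below substitutes simpler Butterworth filters and a feedforward divisive stage, so it illustrates the structure of the processing chain rather than reproducing the model itself. Apart from the adaptation time constants (the values commonly quoted for the model), the parameter values are illustrative.

    import numpy as np
    from scipy.signal import butter, lfilter, sosfilt

    FS = 16000  # sampling rate in Hz

    def cochlear_channel(x, f_lo, f_hi):
        # One "cochlear" band-pass filter. (The published model uses
        # gammatone filters; a Butterworth filter stands in here.)
        sos = butter(4, [f_lo, f_hi], btype="bandpass", fs=FS, output="sos")
        return sosfilt(sos, x)

    def hair_cell(x, cutoff=1000.0):
        # Inner hair cell stage: half-wave rectification followed by
        # low-pass filtering, which preserves mainly the envelope.
        sos = butter(2, cutoff, btype="lowpass", fs=FS, output="sos")
        return sosfilt(sos, np.maximum(x, 0.0))

    def adaptation(x, taus=(0.005, 0.05, 0.129, 0.253, 0.5)):
        # Rough feedforward stand-in for the model's nonlinear feedback
        # adaptation loops: each stage divides the signal by a slowly
        # varying version of itself, emphasizing onsets and offsets
        # relative to stationary portions.
        y = np.maximum(x, 1e-5)
        for tau in taus:
            a = np.exp(-1.0 / (tau * FS))
            slow = lfilter([1 - a], [1, -a], y)   # one-pole low-pass
            y = y / np.maximum(slow, 1e-5)
        return y

    def modulation_channel(env, mf_lo, mf_hi):
        # One band of the modulation filter bank applied to the envelope.
        sos = butter(2, [mf_lo, mf_hi], btype="bandpass", fs=FS, output="sos")
        return sosfilt(sos, env)

    # Example: internal representation of a 1-kHz tone starting at 0.1 s
    t = np.arange(0, 0.5, 1 / FS)
    tone = np.sin(2 * np.pi * 1000 * t) * (t > 0.1)
    env = adaptation(hair_cell(cochlear_channel(tone, 900, 1100)))
    internal = modulation_channel(env, 4, 8)      # 4-8 Hz modulation band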

Fig. 12.4

A complementary approach to studying the auditory system is to measure the response of the brain to sound stimulation by using electrodes placed on the surface of the head. The electrodes measure an electrical potential generated by the activated neurons in the brain as a consequence of sound stimulation. These auditory-evoked potentials can be used as an objective indicator of hearing impairment. They are therefore important for clinical applications, especially among newborns and young children who cannot participate in behavioral listening tests. Evoked potentials are measured in electrically shielded chambers to eliminate interference from other electrical sources. The evoked potentials provide important information about the neural processing principles in the brain. If multi-channel recording is performed (such as using 32 recording electrodes, as the left panel of Fig. 12.4 indicates), the data also allow conclusions about the likely locations of the main neural generators in the brain.
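
The evoked response is far smaller than the ongoing background EEG, so it is extracted by averaging many stimulus-locked recording epochs; averaging N epochs improves the signal-to-noise ratio by roughly the square root of N. A minimal sketch of this step (the signal shapes and numbers below are illustrative, not laboratory data):

    import numpy as np

    def evoked_average(eeg, triggers, epoch_len):
        # Average stimulus-locked epochs of a continuous single-channel
        # recording. The evoked response is time-locked to the stimulus
        # triggers while the background EEG is not, so the response
        # survives the averaging while the background cancels out.
        epochs = [eeg[t:t + epoch_len] for t in triggers
                  if t + epoch_len <= len(eeg)]
        return np.mean(epochs, axis=0)

    # Toy demonstration: a small damped oscillation buried in noise
    fs = 8000
    epoch_len = int(0.010 * fs)                   # 10-ms epochs
    t = np.arange(epoch_len) / fs
    response = 0.1 * np.sin(2 * np.pi * 500 * t) * np.exp(-t / 0.003)

    n_trials = 2000
    triggers = np.arange(n_trials) * epoch_len
    eeg = np.random.randn(n_trials * epoch_len)   # background activity
    for trig in triggers:
        eeg[trig:trig + epoch_len] += response    # time-locked response

    average = evoked_average(eeg, triggers, epoch_len)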

Investigating the early peaks of the evoked potentials provides valuable information for clinically diagnosing hearing disorders. These early peaks reflect the activity of the first stages of neural processing along the auditory pathway. A current challenge is to improve understanding of how the evoked potential waveforms are generated. This can be achieved by using models that make reasonable assumptions about the processing of neural signals in the auditory system. The right panel of Fig. 12.4 illustrates the schematic structure of such a model. The model simulates the transformation of a signal passing through the outer, middle and inner ear. After processing in the inner ear, the cochlea, the activity of all excited neurons is summed. This total activity is finally filtered by the assumed transfer function of the head, also called the unitary response, which reflects how the internal neural activity is "seen" from the positions of the electrodes placed on the surface of the head. The result of this filtering process represents the simulated potential in response to the given acoustic input signal. The model is also interesting for clinical applications: it can simulate various types of hearing impairment and thus allow the effects of such losses on the generation of evoked potentials to be studied in detail. In addition, the model can be used to investigate how effective various input signals (such as transient sound pressure pulses, tones or frequency chirps) are in producing large evoked-potential magnitudes. A large response magnitude, generated by highly synchronized activity of many neurons in the brain, is a prerequisite for clinical investigations.
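
In code, the final stage described above amounts to summing the simulated neural activity across cochlear channels and convolving the sum with the unitary response. A minimal sketch; the damped oscillation used as the unitary response here is purely illustrative, whereas the published model derives it from recorded data:

    import numpy as np

    def simulated_potential(neural_activity, fs):
        # neural_activity: 2-D array (cochlear channels x time samples)
        # of simulated discharge rates from the inner-ear model.
        summed = neural_activity.sum(axis=0)      # sum over all neurons

        # Placeholder unitary response: a brief damped oscillation
        # standing in for the head's transfer function.
        t = np.arange(int(0.005 * fs)) / fs
        unitary = np.sin(2 * np.pi * 900 * t) * np.exp(-t / 0.0008)

        # Scalp potential = summed activity filtered by the unitary
        # response (a convolution).
        return np.convolve(summed, unitary)[:summed.size]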

The cocktail-party problem of hearing-impaired people

Hearing loss is typically discovered at a family dinner or a cocktail party. A hearing-impaired person "suddenly" experiences being unable to understand what another person is saying - even though they can probably hear the speech as such. The listener may be confused and perhaps start guessing at what was said. Hearing-impaired people often have great difficulty, especially when more than one person is talking or when there is background noise or reverberation. This can dramatically affect their social interaction, for example leading them to avoid group situations. In contrast, a conversation in relatively quiet surroundings is typically much less problematic, and almost everything can be understood. One might think that the problem could be solved by simply increasing the signal-to-noise (speech-to-noise) ratio. Indeed, recent advances in hearing aid technology have addressed the reduced intelligibility of speech in noisy situations by focusing on ways to improve the signal-to-noise ratio delivered to the listener. However, the benefit varies strongly among listeners. The challenge seems to be more complex.

The family-dinner situation was the starting-point of the Odin project at the Technical University of Denmark. The main objective was to investigate the opportunities for improving the intelligibility of speech in background noise for hearing-impaired people by means of suitable signal processing. The project was funded by the three hearing technology companies Oticon, Widex and GN Resound (previously Danavox), the Danish Technical Research Council and the Technical University of Denmark. Various principles for enhancing speech and/or reducing noise were investigated. Almost any signal modification can now be applied by means of signal processing, but it was not known whether such modifications would benefit the users. The signal-to-noise ratio could be improved by several decibels, but when the processing algorithms were tested with real test subjects, the overall results were not always satisfactory.

The intelligibility of speech depends highly on the time structure of the speech signal, especially on its inherent amplitude modulations. Enhancing speech or reducing noise requires measuring the incoming signal; this must be done online and involves some measurement time, so enhancing a speech signal or reducing competing noise entails a delay. The result is that, in the process of enhancing speech, the signal can be modified in a way that degrades the modulations needed for intelligibility - even though the speech-to-noise ratio is improved. Noise reduction does not essentially change the speech but reduces the background noise by filtering. This works very well if the background noise is stationary, but stationary noise is rare; background noise usually fluctuates over time, and the filter must therefore constantly adapt to the current noise level. Again, this implies delays in the processing, and the adapting filter imposes fluctuations of its own on the residual noise. The Odin project demonstrated that such extra background noise fluctuation can disturb intelligibility just as much as the original noise. The project enhanced the speech-to-noise ratio by several decibels, but test listeners evaluating the signal modification said that the enhancement often made the speech less intelligible or more uncomfortable to listen to. The project also investigated whether the difficulty in understanding speech in a noisy background is a purely auditory phenomenon or whether impaired cognitive functioning could cause some of the effects. It turned out that, even when a hearing aid amplified sounds to compensate for the loss of audibility, some listeners still had problems understanding speech in noisy situations. Individual listeners with a fitted hearing aid differed widely in performance.
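
The chapter does not name the specific algorithms tested in the Odin project. A textbook example of the trade-off it describes is spectral subtraction, in which a time-varying gain that tracks the estimated noise spectrum raises the speech-to-noise ratio but also imposes extra fluctuations on the residual noise ("musical noise"). A minimal sketch, assuming the first quarter-second of the recording contains noise only:

    import numpy as np
    from scipy.signal import stft, istft

    def spectral_subtraction(noisy, fs, noise_seconds=0.25, nperseg=512):
        # Short-time spectrum of the noisy signal
        _, _, X = stft(noisy, fs=fs, nperseg=nperseg)

        # Noise magnitude estimate from the initial noise-only frames
        # (hop size is nperseg // 2 with the default 50% overlap)
        n_frames = max(1, int(noise_seconds * fs / (nperseg // 2)))
        noise_mag = np.abs(X[:, :n_frames]).mean(axis=1, keepdims=True)

        # Time-varying gain: subtract the noise magnitude, with a floor
        # to avoid negative magnitudes. This gain fluctuates from frame
        # to frame, which is exactly what produces "musical noise".
        mag = np.abs(X)
        gain = np.maximum(mag - noise_mag, 0.05 * mag) / np.maximum(mag, 1e-12)

        _, cleaned = istft(X * gain, fs=fs, nperseg=nperseg)
        return cleaned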

Choosing an appropriate compensation strategy for each hearing-impaired person requires understanding the source of this variability among the listeners. The key questions to investigate further are therefore what these other limitations are besides audibility and how they can be characterized. Our current hypothesis is that the differences in how people perceptually organize sound may explain the differences in their performance. The ability to segregate a mixture of sound sources perceptually into the different components depends on the availability of acoustic cues such as pitch or common onset. What happens to perceptual organization when the ability of the auditory system to encode these acoustical cues declines? The challenge in current research is to systematically characterize the impaired listeners in conditions associated with perceptual organization and to relate these results to their performance when they listen to speech in noisy conditions.  

Hearing technology: What is a good loudspeaker?

People commonly experience that the sound reproduced by a loudspeaker and perceived by a listener depends strongly on the listening room and on the positions of the loudspeaker and the listener within it. This raises several questions of interest for the basic understanding of how humans perceive sound in rooms. Answers to these questions could form the basis for improving the design of loudspeakers such that the sound from the loudspeaker becomes independent of the room and of the loudspeaker's position in it. This was investigated in the Archimedes Project by Bang & Olufsen A/S of Struer, Denmark, KEF Electronics Ltd of the United Kingdom and Acoustic Technology (Ørsted•DTU, Technical University of Denmark).

The project investigated the timbre of monophonically reproduced sound perceived by a listener in a typical domestic room. A listening room was simulated in the large anechoic room of the Technical University of Denmark (Fig. 12.1). The simulation comprised the direct sound, 17 individual reflections arriving at the listening position with delays of less than 22 ms and a reverberant sound field consisting of all the sound arriving after 22 ms. An interactive computer-operated system produced all the sounds and conducted the measurements, generating and presenting signals to the test subjects and collecting their responses.
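
A minimal sketch of the idea behind such a simulation: build an impulse response containing a direct sound, 17 discrete early reflections within 22 ms and a diffuse decaying tail after 22 ms, then convolve any anechoic recording with it. The delays, gains and decay time below are illustrative, not those of the actual experiment:

    import numpy as np

    fs = 48000
    ir = np.zeros(fs)                       # 1-second impulse response
    ir[0] = 1.0                             # direct sound

    rng = np.random.default_rng(0)
    for _ in range(17):                     # 17 early reflections < 22 ms
        delay = rng.uniform(0.002, 0.022)   # seconds
        ir[int(delay * fs)] += rng.uniform(0.2, 0.6)

    # Reverberant field after 22 ms: decaying noise (RT60 of 0.4 s here)
    start = int(0.022 * fs)
    t = np.arange(len(ir) - start) / fs
    ir[start:] += 0.3 * rng.standard_normal(len(t)) * 10 ** (-3 * t / 0.4)

    # Auralization: convolve an anechoic (dry) signal with the response
    # signal_in_room = np.convolve(dry_signal, ir)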

This comprehensive simulation set-up enabled experiments that would otherwise be impossible. For example, one investigation studied how the floor reflection influences the sound quality perceived by a listener. In the simulated acoustics of the listening room, a listening situation with a carpet on the floor could be compared with one without a carpet simply by flipping a switch - a test that would be difficult to implement in a real listening room. The test subjects underwent several listening tests to determine how important the various reflections are for the perceived sound quality. The floor and ceiling reflections were found to disturb perception, whereas the reflections from the walls were perceived as natural.

The psychoacoustic results from the Archimedes Project have formed the basis for new and very elaborate loudspeaker designs, and the project paved the way for the introduction of digital signal processing in loudspeakers. The most prominent example of this is the recently introduced BeoLab 3 and BeoLab 5 loudspeakers from Bang & Olufsen (Fig. 12.5).

Fig. 12.5.

How can the annoyance of sound be measured?

Even though sound and noise levels can be measured very accurately, the measurement results still do not always relate to the subjectively perceived characteristics of the noise. An example is a dripping water tap: the sound of the dripping water can be enormously annoying, yet its sound pressure level is barely measurable.

The annoyance of sound has been a main research area at Acoustic Technology. The goal has been to find measurement methods that better describe the subjective perception of noise such as fluctuating noise (including road traffic noise) and impulsive noise. The most recent work has been a very successful laboratory investigation of the annoyance of low-frequency noise. In this investigation, test subjects listened to typical low-frequency noise sources, such as a gas motor in a combined heat and power plant, a high-speed ferry, distant noise from a steel rolling plant, a cooling compressor or music transmitted through a building. The sounds were presented to the test subjects at realistic, relatively low sound levels. Finally, the annoyance the listeners perceived was compared with the outcome of objective measurement procedures.

The results showed that, of the methods tested, the method developed by the Danish Environmental Protection Agency is by far the best for describing the annoyance of low-frequency sound for the average listener. This method takes the average energy in the low-frequency range into account, which the other methods do not. The annoyance ratings and the measured values were almost linearly related (Fig. 12.6). Fig. 12.7 shows the same relationship for a special group of test subjects who find themselves annoyed by low-frequency sounds in their daily lives. In this case, the annoyance ratings start around 5 and come close to the maximum rating of 10 once the sound exceeds the defined limit by a few decibels.
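
The Danish low-frequency method is based on the A-weighted level of the sound energy in the low-frequency one-third-octave bands; the 10-160 Hz band range used below is the commonly cited one and should be treated as an assumption here. A minimal sketch of how such a single-number value is computed from measured band levels:

    import math

    # Standard A-weighting corrections (dB) for the one-third-octave
    # bands from 10 Hz to 160 Hz (IEC 61672 table values)
    A_WEIGHT = {10: -70.4, 12.5: -63.4, 16: -56.7, 20: -50.5,
                25: -44.7, 31.5: -39.4, 40: -34.6, 50: -30.2,
                63: -26.2, 80: -22.5, 100: -19.1, 125: -16.1,
                160: -13.4}

    def lf_a_weighted_level(band_levels_db):
        # band_levels_db maps band centre frequency (Hz) to the
        # measured, unweighted band sound pressure level (dB).
        # The bands are A-weighted and summed on an energy basis.
        energy = sum(10 ** ((band_levels_db[f] + A_WEIGHT[f]) / 10)
                     for f in band_levels_db if f in A_WEIGHT)
        return 10 * math.log10(energy)

    # Example: a low-frequency hum concentrated around 50-63 Hz
    print(lf_a_weighted_level({50: 60.0, 63: 62.0, 80: 55.0}))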

Fig. 12.6.

Fig. 12.7.

The special test subjects clearly differed from the ordinary test subjects in how they evaluated annoyance: they rated the annoyance close to the maximum regardless of the objective noise level. This means that a measurement method based on the noise level (such as the standard European methods) will be meaningless for many of the people annoyed by low-frequency noise.

Future research and visions

The signal-processing capabilities of personal digital instruments no longer limit the realization of advanced signal-processing algorithms. Nevertheless, the kinds of signal processing that are needed and would be most beneficial have not yet been determined, and much more psychoacoustic knowledge is required to improve the performance of acoustic communication products. Developing state-of-the-art auditory models can improve understanding of how people with normal and impaired hearing function. Improving understanding of the processing strategies the auditory system uses to perceptually organize sound can strongly influence algorithms for signal processing in hearing aids. "Smart" hearing aids may be provided for individuals who have difficulty segregating concurrent sounds; algorithms incorporated in these aids would carry out the segregation, based on the principles of perceptual organization, and allow the user to hear specific sounds better. Automatic speech recognition systems and automatic noise reduction systems may also benefit greatly from improved front-ends that process the incoming signals to minimize interfering sounds using methods inspired by the human auditory system. Solving the cocktail-party problem for automatic speech recognition systems would revolutionize how people interact with computers, cars and many other devices.

Educating engineers with a background in acoustic communication, audiology, signal processing, speech processing and perception and neural modeling provides a solid basis for excellent research on these key problems. The idea of the recently established Centre for Applied Hearing Research is to provide such a background and to inspire exciting research in the multidisciplinary field of hearing science. 

More to explore

  • Moore B. An introduction to the psychology of hearing. 5th edition. New York: Academic Press, 2003.
  • Dau T. The importance of cochlear processing for the formation of auditory brainstem and frequency following responses. Journal of the Acoustical Society of America 2003: 113: 936-950.
  • Poulsen T. Annoyance of low frequency noise (LFN) in the laboratory assessed by LFN sufferers and non-sufferers. Journal of Low Frequency Noise, Vibration and Active Control 2003: 22: 191-201.
  • Bech S. Spatial aspects of reproduced sound in small rooms. Journal of the Acoustical Society of America 1998: 103: 434-445.
  • Dau T, Kollmeier B, Kohlrausch A. Modeling auditory processing of amplitude modulation. Journal of the Acoustical Society of America 1997: 102: 2892-2905.
  • Bregman AS. Auditory scene analysis. The perceptual organization of sound. Cambridge: MIT Press, 1990.  
 

Torsten Dau

Diploma degree in Physics, University of Göttingen (1992); doctoral degree in Physics, University of Oldenburg (1996); habilitation degree (Dr. rer. nat. habil.) in Applied Physics, University of Oldenburg (2003). Dissertation: Physical principles in auditory signal processing and perception. Visiting scientist, Department of Biomedical Engineering, Boston University, and the Research Laboratory of Electronics, Massachusetts Institute of Technology, 1999-2000. Professor of Acoustics and Audiology and head of the Centre for Applied Hearing Research, Acoustic Technology, Ørsted•DTU, since May 2003.

     
Torben Poulsen

MSc, Technical University of Denmark. After research projects on the perception of short-duration sounds, faculty member, Technical University of Denmark since 1975. Research: sound perception by normal-hearing and hearing-impaired people, psychoacoustics, speech intelligibility, technical audiology, hearing protector measurements and noise annoyance. Chair, Danavox Jubilee Foundation. Chair, International Organization for Standardization Working Group 17 on hearing protector measurement methods (under Technical Committee 43, Acoustics, Subcommittee 1, Noise). Chair, Education Committee, Ørsted•DTU since 2001.

Figures:

Fig. 12.1.

Some of the perceptual experiments take place in the large anechoic room at the Technical University of Denmark. The test subject sits in the chair (here replaced by an artificial torso). The room is especially interesting for studying spatial hearing, such as measuring speech intelligibility in the presence of interfering sounds coming from loudspeakers at different positions in the room. The yellow spheres are loudspeakers used to produce the sound signal being tested and the interfering sound such as reflections or reverberant sound. During the measurements with a real test subject, the curtain around the test subject is raised to hide the loudspeakers and the anechoic room. The test subject enters the room through the white "tunnel" at the lower right.

Fig. 12.2

The Centre for Applied Hearing Research at Ørsted•DTU investigates how the auditory system works using the various methods shown here. Based on psychoacoustic listening experiments, functional models of hearing are developed. The brain activities associated with perception are studied using such brain imaging techniques as electroencephalography (EEG) and magnetoencephalography (MEG). The results are important for applications in hearing aids and cochlear implants, speech perception, clinical audiology and sound reproduction.

Fig. 12.3

Example of a computational model of auditory signal processing. The incoming sound is first processed by stages that simulate the transformation in the inner ear (the cochlea). These stages include the mechanical band-pass filtering on the basilar membrane, the transformation from mechanical to neural activity in the inner hair cells (simulated by half-wave rectification and low-pass filtering) and the effects of neural adaptation (simulated by feedback loops). These stages are followed by a modulation filter bank and the addition of internal noise. For comparison with behavioral data, the internal representation of the signal is passed to a detection device, realized as an optimal detector. This model can be used as a front-end, such as in automatic speech recognition systems. Source: reprinted with permission from Dau T, Kollmeier B, Kohlrausch A. Modeling auditory processing of amplitude modulation. Journal of the Acoustical Society of America 1997: 102: 2892-2905. Copyright 2004, Acoustical Society of America.

Fig. 12.4

Acoustically evoked brain potentials are recorded in special electrically shielded chambers. This technique is interesting for developing new "objective" hearing tests, such as for newborns and young children who cannot participate in behavioral listening experiments. Left. Spatial diagram of an overhead view of a head and the temporal waveforms of the potentials recorded at 32 electrodes placed at the surface of the head. The analysis of the potential waveforms provides information about the likely position of the corresponding neural generators. Right. Functional model for the generation of evoked potentials. Specific assumptions are made about how the sound is transformed in the middle and inner ear and how the assumed internal activity in the brain can be related to the recorded activity at the electrodes. A realistic simulation of the neural processes in the inner ear, as indicated in the dashed box, is especially important for correctly simulating the recorded brain potentials. Source: reprinted with permission from Dau T. The importance of cochlear processing for the formation of auditory brainstem and frequency following responses. Journal of the Acoustical Society of America 2003: 113: 936-950. Copyright 2004, Acoustical Society of America.

Fig. 12.5

New loudspeakers that use the results from the Archimedes Project. The acoustic lenses on the upper part of the loudspeakers force the sound to be directed into a broad angle in the horizontal plane - exactly as suggested by the results of the Archimedes Project.

Fig. 12.6

Assessment of the annoyance of low-frequency sounds on a scale of 0 to 10 among ordinary test subjects, as a function of how much the sound exceeds Denmark's limit values for low-frequency noise. The relationship is almost linear.

Fig. 12.7

Assessment of the annoyance of low-frequency sounds on a scale of 0 to 10 among a group of special test subjects, as a function of how much the sound exceeds Denmark's limit values for low-frequency noise.