
UK Speech: Special Interest Workshop

Enhancement of Degraded Speech: Processing, Modelling, Evaluation

October 31st, 2012
Imperial College London


This workshop aims to provide a focus for current research on speech enhancement in university, commercial and government organizations. While the workshop principally addresses speech enhancement, its scope also includes the processes by which speech signals become degraded and the means by which human listeners extract intelligible speech, since both are crucially important to understanding and developing new enhancement methods.



Room 611, Electrical and Electronic Engineering, Exhibition Road, London SW7 2AZ.


Tue 30th October

19:30 Workshop Dinner (venue near Imperial to be announced, please email p.naylor@imperial.ac.uk for details)

Wed 31st October

9:30 Registration and coffee
10:00 Keynote: Rainer Martin, Ruhr-Universität Bochum
The Acoustic Signal Processing Challenge: Enabling Communication in Adverse Conditions
11:00 Poster Session 1
11:50 Keynote: Jon Barker, University of Sheffield
Towards Speech Enhancement by Statistical Speech Resynthesis
12:20 Keynote: Mike Brookes, Imperial College London
Enhancement of Very Noisy Speech
12:50 Lunch break (Buffet lunch provided)
14:00 Keynote: Jonathon Chambers, Loughborough University
Video-aided model-based source separation in reverberant rooms
14:30 Poster Session 2
15:00 Keynote: Ben Milner, University of East Anglia
Speech enhancement by reconstruction from acoustic speech features
15:30 Keynote: Gaston Hilkhuysen, University College London
Effects of noise reduction on speech intelligibility
16:00 End

If you would like to present a poster at the workshop, please email a title and abstract to p.naylor@imperial.ac.uk.


Full Registration: £36
Technical sessions, sandwich lunch, tea/coffee breaks.

Student Registration: £20
Technical sessions, sandwich lunch, tea/coffee breaks.


Accommodation can be booked separately in nearby hotels.

Workshop Materials

Posters and presentations will be made available to registrants online.

Organizing Committee

Mike Brookes and Patrick A. Naylor (Imperial College London), Mark Huckvale and Gaston Hilkhuysen (University College London).

Contact for further information: p.naylor@imperial.ac.uk

Keynote Abstracts

The Acoustic Signal Processing Challenge: Enabling Communication in Adverse Conditions

Rainer Martin, Ruhr-Universität Bochum

With the broad proliferation of mobile communications we have become used to communicating in notoriously difficult acoustic scenarios. Ambient noise, reverberation, and echoes all contribute to a significantly degraded communication experience. Furthermore, the impact of these factors becomes considerably worse when participants have a hearing loss. Nevertheless, voice and audio communication devices, such as smartphones, headsets and hearing instruments, are frequently used in these adverse conditions and are expected to enable effortless communication.

This talk discusses challenges in audio signal processing as they are frequently encountered in difficult communication scenarios. While the design of algorithms is often inspired by high-level processing paradigms such as Auditory Scene Analysis (Bregman, 1990), the constraints of real-world applications and devices must also be taken into account. In many cases rather strict requirements result from the size of the device, the power budget, and the admissible processing latency. Examples and applications in the area of single- and multi-channel speech enhancement will be presented, highlighting recent developments and solutions to these tasks.

Towards Speech Enhancement by Statistical Speech Resynthesis

Jon Barker, University of Sheffield

This talk will discuss recent speech resynthesis approaches to speech enhancement. These approaches contrast with noise-subtractive techniques by employing a model of clean speech to effectively resynthesise an uncorrupted version of the signal. Previous versions of this approach have used concatenative synthesis systems coupled with a model of how noise distorts the speech. However, the recent convergence of speech synthesis and speech recognition on similar hidden Markov modelling techniques opens up new opportunities. The talk will present a speech enhancement-by-resynthesis framework whose strength lies in the sharing of a common statistical speech model between the analysis and resynthesis stages. The approach takes advantage of auditory-based spectro-temporal representations that provide a crude estimate of perceptual masking. Missing-data speech recognition techniques are employed to analyse the masked speech, and the resulting information is passed to HMM synthesis algorithms that reconstruct the masked spectro-temporal regions prior to conversion back into the signal domain. I will present a very early implementation of the concept and demonstrate its performance on a simple speech dataset in stationary noise conditions. The talk will conclude by discussing what needs to be done to progress from this simple demonstration to an application of the concept in more realistic conditions. In particular, I will consider the problems raised by unpredictable non-stationary noise, motivating the discussion with data from the CHiME Speech Separation and Recognition challenges.
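As a toy illustration of the missing-data idea described above (a sketch of my own, not taken from the talk): a spectro-temporal cell is marked reliable when the speech energy there dominates the noise, and the unreliable cells are left for the synthesis model to reconstruct. The sketch assumes oracle access to separate speech and noise energies, which a real system must estimate.

```python
import math

# Toy missing-data reliability mask (illustrative only; a real system
# estimates these energies rather than observing them directly).
def reliability_mask(speech_energy, noise_energy, threshold_db=0.0):
    """Mark each spectro-temporal cell reliable (1) or missing (0)."""
    mask = []
    for s_row, n_row in zip(speech_energy, noise_energy):
        mask.append([1 if 10 * math.log10(s / n) > threshold_db else 0
                     for s, n in zip(s_row, n_row)])
    return mask

# Two time frames x three frequency channels
speech = [[4.0, 1.0, 0.1], [2.0, 0.5, 0.1]]
noise = [[1.0, 1.0, 1.0], [1.0, 1.0, 1.0]]
print(reliability_mask(speech, noise))  # [[1, 0, 0], [1, 0, 0]]
```

Only the cells whose local SNR exceeds the threshold survive; the rest must be filled in by the statistical speech model.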

Enhancement of Very Noisy Speech

Mike Brookes, Imperial College London

Awaiting abstract.

Video-aided model-based source separation in reverberant rooms

Jonathon Chambers, Loughborough University

Source separation algorithms which utilize only audio data may perform poorly if multiple sources or reverberation are present. In this talk, a video-aided model-based source separation algorithm for a two-channel reverberant recording in which the sources are assumed static will be described. By exploiting cues from video, we first localize individual speech sources in the enclosure and then estimate their directions. The interaural spatial cues, the interaural phase difference and the interaural level difference, as well as the mixing vectors are probabilistically modelled. Simulation results will confirm that by exploiting the visual modality the proposed algorithm can yield improved time-frequency masks thereby giving improved source estimates. This advantage makes our algorithm a suitable candidate for use in underdetermined highly reverberant settings where the performance of other audio-only and audio-visual methods is limited.
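The two interaural spatial cues named in the abstract can be computed per time-frequency bin from the left and right STFT coefficients. The following is my own minimal illustration, not the probabilistic model from the talk:

```python
import cmath
import math

# Interaural cues for one time-frequency bin of a two-channel STFT
# (illustrative sketch; eps guards against division by zero).
def interaural_cues(left_bin, right_bin, eps=1e-12):
    """Return (IPD in radians, ILD in dB) for one time-frequency bin."""
    ipd = cmath.phase(left_bin / (right_bin + eps))
    ild = 20 * math.log10((abs(left_bin) + eps) / (abs(right_bin) + eps))
    return ipd, ild

# A bin where the left channel is twice as strong and phase-aligned:
ipd, ild = interaural_cues(1 + 0j, 0.5 + 0j)
print(round(ipd, 4), round(ild, 2))  # 0.0 6.02
```

In the algorithm described above, distributions over such cues (together with the mixing vectors) are what drive the time-frequency mask estimation.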

Speech enhancement by reconstruction from acoustic speech features

Ben Milner, University of East Anglia

Most conventional methods of speech enhancement operate by removing a noise estimate from a noisy speech signal to leave an estimate of clean speech. This talk presents an alternative method in which a clean speech signal is reconstructed from a set of acoustic speech features and a model of speech production. Many different models of speech production exist and could be applied to this problem, although this talk discusses the sinusoidal model. A major challenge for this approach is to obtain accurate estimates of the acoustic speech features needed to drive the model, which include voicing classification, fundamental frequency, spectral envelope and phase. A MAP method is proposed to enable accurate estimation of these features from noise-contaminated speech. Current evaluations of this approach reveal very good suppression of the noise, but with some distortion of the speech signal. Demonstrations of the proposed method will be given alongside conventional methods of speech enhancement, together with the results of subjective listening tests.
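To make the sinusoidal-model idea concrete, here is a minimal sum-of-sinusoids resynthesis of one voiced frame. It assumes the fundamental frequency, per-harmonic amplitudes and phases have already been estimated; the names and interface are my own illustration, not the system from the talk:

```python
import math

# Sum-of-sinusoids reconstruction of one voiced frame: the signal is
# modelled as harmonics of the fundamental frequency f0, each with its
# own amplitude and phase (illustrative sketch only).
def synthesise_frame(f0, amps, phases, fs, n_samples):
    """Reconstruct n_samples of speech as a sum of harmonics of f0."""
    out = []
    for n in range(n_samples):
        t = n / fs
        out.append(sum(a * math.cos(2 * math.pi * f0 * (k + 1) * t + p)
                       for k, (a, p) in enumerate(zip(amps, phases))))
    return out

# One harmonic at 100 Hz, unit amplitude, zero phase, 8 kHz sampling:
frame = synthesise_frame(100.0, [1.0], [0.0], 8000, 80)
```

The enhancement problem then reduces to estimating f0, the voicing decision, and the per-harmonic amplitudes and phases from the noisy signal, which is where the MAP estimation mentioned above comes in.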

Effects of noise reduction on speech intelligibility

Gaston Hilkhuysen, University College London

Noise reduction (NR) systems attempt to attenuate the noise-dominated portions of a noisy speech signal while preserving the fragments that contain speech. The increasing computational power of audio devices such as mobile phones and hearing aids has led to widespread application of NR. Yet although widely applied, the effects of NR on speech intelligibility are poorly understood. This presentation attempts to unravel some of the myths and mysteries concerning NR. Can NR improve intelligibility? Are its effects stable across noise types and speech-to-noise ratios? Can experts adjust NR systems so that intelligibility becomes optimal? Are intelligibility metrics useful, or are they poor predictors of the intelligibility of noise-reduced noisy speech? Our outcomes will reveal some of the merits and pitfalls of NR.
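As a deliberately simple example of the kind of NR rule under discussion, here is a magnitude spectral-subtraction sketch of my own; real systems are far more elaborate, and the spectral floor is exactly the sort of tuning parameter whose intelligibility consequences the talk examines:

```python
# Toy magnitude spectral subtraction: subtract a noise magnitude estimate
# from each frequency bin of the noisy spectrum, flooring the result so
# the output magnitude never goes negative (flooring too aggressively
# trades residual noise for "musical noise" artefacts).
def spectral_subtract(noisy_mag, noise_mag, floor=0.05):
    """Per-bin magnitude subtraction with a spectral floor."""
    return [max(s - n, floor * s) for s, n in zip(noisy_mag, noise_mag)]

# Three frequency bins; the last is noise-dominated and hits the floor:
print(spectral_subtract([1.0, 0.5, 0.25], [0.25, 0.25, 0.5]))
# [0.75, 0.25, 0.0125]
```

Even in this toy form, the rule removes noise energy at the cost of distorting the speech-dominated bins, which is why its net effect on intelligibility is an empirical question.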