Speech and Audio Signal Processing
The Institute of Communication Acoustics is actively engaged
in the area of speech and audio signal processing. We welcome students
who wish to do their student projects or their Diplom, Bachelor's, or Master's theses with us, and we offer them the opportunity to
contribute to advancing the technologies in this field.
Speech is not only the most important means of communication
between humans, but it also plays an important role in
speech-controlled human-machine interfaces.
Speech signal processing encompasses a wide range of topics,
e.g.,
- Speech Enhancement
- Statistical Models of Speech and Audio Signals
- Noise Power Estimation
- Source Localization and Separation
- Robust Speech Recognition
- Robust Speech Transmission and Voice over IP
- Selective Temporal Cepstrum Smoothing for Speech Enhancement
- Auditory Virtual Environments
Figure 1 shows the interconnection between these
various aspects of speech signal processing. The acoustic
signal is picked up by a single microphone or an array of
microphones. Arrays make spatially selective signal pickup
possible; in practice, this requires a priori knowledge of the
speaker positions, automatic speaker localization algorithms,
or so-called 'blind' approaches.
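
As an illustration of spatial selectivity, the following minimal sketch shows a delay-and-sum beamformer for the case where the speaker position, and hence the arrival delay at each microphone, is known a priori. It is not the Institute's implementation: the function name delay_and_sum, the integer sample delays, and the sinusoidal test signals are all assumptions chosen to keep the example short.

```python
import math

def delay_and_sum(channels, delays):
    """Advance each channel by its integer delay (in samples) and average."""
    if len(channels) != len(delays):
        raise ValueError("one delay per channel is required")
    # Keep only the samples for which every shifted channel has data.
    length = min(len(ch) - d for ch, d in zip(channels, delays))
    return [
        sum(ch[n + d] for ch, d in zip(channels, delays)) / len(channels)
        for n in range(length)
    ]

NUM_SAMPLES = 200
# Assumed (hypothetical) arrival delays of the desired speaker at three microphones.
speaker_delays = [0, 2, 4]

# Desired signal: a slow sinusoid standing in for speech.
source = [math.sin(2 * math.pi * 0.05 * n) for n in range(NUM_SAMPLES)]
# Interferer from another direction: it reaches all microphones simultaneously.
interferer = [0.5 * math.cos(2 * math.pi * 0.25 * n) for n in range(NUM_SAMPLES)]

# Each microphone records the delayed speaker plus the interferer.
mics = []
for d in speaker_delays:
    delayed = [0.0] * d + source[:NUM_SAMPLES - d]
    mics.append([s + i for s, i in zip(delayed, interferer)])

# Steer the array toward the speaker by compensating the known delays.
enhanced = delay_and_sum(mics, speaker_delays)
```

Compensating the speaker delays adds the desired signal coherently across the microphones, while the interferer, which arrives with different relative delays, partly cancels. In this toy setup the interferer amplitude drops from 0.5 at a single microphone to one third of that value at the beamformer output.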

Figure 1: Algorithm for speech-signal processing
Often, noise reduction (separation of information-bearing
signals and interference, based on statistical features) and
echo compensation are also included at this stage. The signal
thus obtained is compressed and then either transmitted (e.g.,
over a mobile transmission channel) after error-control
coding, or passed as input to a speech-controlled human-machine interface.
The principle of such an interface is illustrated in Figure
2. The processed signal from the previous step is the input
to a speech recognizer. The speech recognizer passes the recognized
words to an appropriate language-based module, which finally
communicates with a dialog manager. The response from the
dialog manager can, with the help of a speech synthesizer,
be resynthesized into an acoustic signal.
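
The data flow through such an interface can be sketched as a chain of four stages. The sketch below only illustrates how the stages connect; every function in it (run_dialog_turn, recognize, interpret, respond, synthesize) is a hypothetical placeholder with trivial stub behavior, not a real recognizer, language module, dialog manager, or synthesizer.

```python
def run_dialog_turn(signal, recognize, interpret, respond, synthesize):
    """Pass one processed input signal through the four interface stages."""
    words = recognize(signal)      # speech recognizer: signal -> words
    meaning = interpret(words)     # language-based module: words -> meaning
    reply = respond(meaning)       # dialog manager: meaning -> reply text
    return synthesize(reply)       # speech synthesizer: text -> signal

# Placeholder stages (assumptions for illustration only).
def recognize(signal):
    # A real recognizer would decode words from the signal samples.
    return ["lights", "on"] if signal else []

def interpret(words):
    # Map the word sequence to a simple intent label.
    return {"intent": "_".join(words) or "none"}

def respond(meaning):
    # Produce a textual reply for the recognized intent.
    return "ok: " + meaning["intent"]

def synthesize(text):
    # Stand-in for a waveform: one number per character of the reply.
    return [float(ord(c)) for c in text]

output_signal = run_dialog_turn([0.1, 0.2], recognize, interpret, respond, synthesize)
```

Passing the stages as arguments keeps each one replaceable, which mirrors the modular structure described above.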
At the Institute of Communication Acoustics, research is conducted
in the area of multi-modal human-machine interaction as applied
to, among other things, virtual environments. One of the current
results of this research is an interactive environment that
can be loaded and executed in a web browser.

