Universal Speech and Audio Coding Algorithms for Multimedia and Teleconferencing Applications

Dr. Andreas Spanias, Sassan Ahmadi, and Ted Painter,
Arizona State University (ASU)
Sponsored by Intel NDTC 
Last updated: June 12, 1997
 
Introduction  

What is Speech and Audio Coding?

Speech and audio coding, or compression, is the field concerned with compact digital representations of speech and audio signals for efficient transmission or storage. The central objective is to represent a signal with the minimum number of bits while maintaining perceptual quality. Current applications for speech and audio coding algorithms include cellular and personal communications networks (PCNs), teleconferencing, desktop multimedia systems, and secure communications. Historically, coding algorithms have used incompatible compression techniques, each optimized for a particular signal class: narrowband (telephone quality; 4 kHz BW), wideband (AM grade; 7 kHz BW), high-quality (FM grade; 15 kHz BW), or high-fidelity (CD quality; 20 kHz BW).
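To give a sense of scale for the "minimum number of bits" objective, the following minimal sketch computes uncompressed PCM bit rates for the four signal classes. The bandwidths come from the paragraph above; the sampling rates and the 16-bit word length are assumed, commonly used values and are not project specifications.

```python
# Bandwidths (kHz) are taken from the text. Sampling rates and the 16-bit
# word length are ASSUMED typical values, chosen so that each sampling rate
# is at least twice the stated bandwidth (Nyquist criterion).
SIGNAL_CLASSES = {
    "narrowband":    {"bandwidth_khz": 4,  "sample_rate_hz": 8000},
    "wideband":      {"bandwidth_khz": 7,  "sample_rate_hz": 16000},
    "high-quality":  {"bandwidth_khz": 15, "sample_rate_hz": 32000},
    "high-fidelity": {"bandwidth_khz": 20, "sample_rate_hz": 44100},
}

ASSUMED_BITS_PER_SAMPLE = 16


def pcm_bit_rate(sample_rate_hz, bits_per_sample=ASSUMED_BITS_PER_SAMPLE, channels=1):
    """Return the uncompressed PCM bit rate in bits per second."""
    return sample_rate_hz * bits_per_sample * channels


def compression_ratio(sample_rate_hz, coded_bit_rate_bps,
                      bits_per_sample=ASSUMED_BITS_PER_SAMPLE, channels=1):
    """Return how many times smaller the coded stream is than raw PCM."""
    return pcm_bit_rate(sample_rate_hz, bits_per_sample, channels) / coded_bit_rate_bps


if __name__ == "__main__":
    for name, info in SIGNAL_CLASSES.items():
        rate = pcm_bit_rate(info["sample_rate_hz"])
        print(f"{name:14s} {info['bandwidth_khz']:2d} kHz BW  {rate / 1000:.1f} kbps raw PCM")
```

Under these assumptions, a coder running at a few kbps on narrowband speech would be compressing a 128 kbps source, which is why perceptual quality rather than waveform fidelity is the practical target.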
Objectives  
 

The goal of this project is to develop a new family of universal, scalable, and interoperable speech and audio compression algorithms for teleconferencing and multimedia applications. 

In particular, the project objectives are to: 
  • Develop and evaluate a signal model for interoperable and universal audio coding.
  • Develop interoperable algorithms for narrowband, wideband, high-quality, and high-fidelity audio coding.
  • Analyze the algorithms in terms of audio quality, robustness, and complexity.
  • Optimize the algorithms for performance and complexity.
The universal model will provide a set of parameters that can be used to code the different grades of audio. Moreover, the parameter sets will be nested: the "richest" high-fidelity set includes all the other parametric sets, which ensures interoperability.
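The nesting property can be illustrated with a small sketch. The page does not say how the parameter sets are organized, so the band-by-band layering and every parameter name below are hypothetical and serve only to show what "the richest set includes all the others" implies for decoding.

```python
# Grades ordered from smallest to richest parameter set (order from the text).
GRADES = ["narrowband", "wideband", "high-quality", "high-fidelity"]

# HYPOTHETICAL layers: each grade adds parameters on top of the previous one.
# The names are placeholders, not parameters defined by the project.
LAYERS = {
    "narrowband":    ["params_0_4khz"],
    "wideband":      ["params_4_7khz"],
    "high-quality":  ["params_7_15khz"],
    "high-fidelity": ["params_15_20khz"],
}


def parameter_set(grade):
    """Return the cumulative parameter set for a grade (its layer plus all lower layers)."""
    index = GRADES.index(grade)
    names = []
    for lower_grade in GRADES[: index + 1]:
        names.extend(LAYERS[lower_grade])
    return names


def can_decode(decoder_grade, stream_grade):
    """A decoder can work if every parameter it needs is present in the stream."""
    return set(parameter_set(decoder_grade)) <= set(parameter_set(stream_grade))


if __name__ == "__main__":
    print(parameter_set("wideband"))
    print(can_decode("narrowband", "high-fidelity"))
```

In this sketch a narrowband decoder can always extract what it needs from a richer stream, while a high-fidelity decoder cannot reconstruct its full set from a narrowband stream.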
Research Topics  
 

Several research topics are currently under study, including: 

  • Development of a universal signal model using transform-domain signal analysis. Fourier, wavelet, and other signal analysis methods are under investigation.
  • Development of enhanced psychoacoustic signal analysis techniques, both in time and frequency. 
  • Development of scalable, multimodal, interoperable speech and audio coders based upon the universal signal model.
  • For narrowband speech coding: design, development, evaluation, and implementation of algorithms, models, and techniques that represent and quantize the sinusoidal parameters more efficiently, and integration of these methods into low-bit-rate sinusoidal coders capable of reproducing speech of good quality, intelligibility, and naturalness.
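For readers unfamiliar with the "sinusoidal parameters" mentioned in the last topic, the sketch below shows the generic sinusoidal signal model, in which a frame is a sum of sinusoids described by amplitude, frequency, and phase. This is the textbook form of the model, not the project's coder; the function name and arguments are illustrative assumptions.

```python
import math


def synthesize_frame(amplitudes, freqs_hz, phases, sample_rate_hz, num_samples):
    """Synthesize one frame as a sum of sinusoids.

    s[n] = sum_k A_k * cos(2*pi*f_k*n/fs + phi_k)

    The three parameter lists (A_k, f_k, phi_k) are what a sinusoidal coder
    would have to represent and quantize for each frame.
    """
    frame = []
    for n in range(num_samples):
        t = n / sample_rate_hz
        sample = sum(
            a * math.cos(2.0 * math.pi * f * t + p)
            for a, f, p in zip(amplitudes, freqs_hz, phases)
        )
        frame.append(sample)
    return frame


if __name__ == "__main__":
    # Illustrative values only: two partials at an assumed 8 kHz sampling rate.
    frame = synthesize_frame([1.0, 0.5], [200.0, 400.0], [0.0, 0.0], 8000, 160)
    print(len(frame), frame[0])
```

The bit rate of such a coder is driven by how many amplitude, frequency, and phase values are sent per frame and how coarsely they are quantized, which is why efficient parameter representation is the focus of this topic.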
Recent Accomplishments  

Until now, our work has progressed separately along two parallel paths: low-rate coding of narrowband speech and high-fidelity audio coding. Recent accomplishments in each area are listed below.

     IN LOW RATE NARROWBAND SPEECH CODING, we have: 
  • Developed a sinusoidal phase model using allpass filtering and delay compensation.
  • Finalized low-rate sinusoidal transform coders (STC) operating at 2.4, 4.8, and 9.6 kbps.
     IN HIGH FIDELITY (CD-QUALITY) AUDIO CODING, we have begun investigating: 
  • A harmonic plus residual signal model.
  • Application of perceptually weighted vector quantization (VQ) to reduce bit rates in a two-mode DFT-based coder.
  • Enhanced psychoacoustic modeling techniques which accurately reflect responses of the human auditory system to complex stimuli. 
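As a rough illustration of what perceptual weighting means in a vector quantizer, the sketch below performs a codebook search with a weighted squared-error distortion. It is a generic textbook formulation, not the coder described above; the codebook, the weights, and the function names are made-up assumptions.

```python
def weighted_distortion(vector, codeword, weights):
    """Weighted squared error: sum_i w_i * (x_i - c_i)^2."""
    return sum(w * (x - c) ** 2 for x, c, w in zip(vector, codeword, weights))


def vq_encode(vector, codebook, weights):
    """Return the index of the codeword with the smallest weighted distortion."""
    return min(
        range(len(codebook)),
        key=lambda i: weighted_distortion(vector, codebook[i], weights),
    )


if __name__ == "__main__":
    # Toy two-dimensional example with invented values.
    x = [1.0, 0.5]
    codebook = [[1.0, 1.5], [0.4, 0.5]]

    # Uniform weights: plain squared error.
    print(vq_encode(x, codebook, [1.0, 1.0]))

    # ASSUMED perceptual weights: errors in the first component are treated
    # as far more audible than errors in the second.
    print(vq_encode(x, codebook, [10.0, 0.1]))
```

With uniform weights the second codeword is closest; once the first component is weighted heavily, the search prefers the codeword that matches that component exactly. In a real coder the weights would come from a psychoacoustic model rather than being fixed by hand.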
Presentation on NDTC Project (April 1997)  
 

We recently presented to Intel a summary of our research activities during 1997. 

Click here to view the presentation on low-rate coding of narrowband speech. 
Click here to view the presentation on variable-rate coding of high-fidelity audio.  
Example Speech and Audio Coders  

We have developed several speech and audio coders during the course of this project. 

Click here to see an example 2400 bit-per-second narrowband speech coder. 
Click here to see an example variable rate high-fidelity audio coder.    
Research Group  
 

This page describes speech/audio coding research conducted by several people, including: 

Dr. Andreas Spanias, Principal Investigator 
Sassan Ahmadi, Research Associate 
Ted Painter, Research Associate
Publications  
Related Sites  

Other sites related to this project include the following: 

Telecommunications Research Center 
College of Engineering and Applied Sciences 
Arizona State University, Tempe, Arizona 85287-7206, USA
Acknowledgements  

The work described on this page is sponsored by a grant from the Intel Corporation. 

We gratefully acknowledge the generous support of the Intel Corporation's NDTC group, which has made possible the work described on this site. In addition to several research grants, the Intel NDTC group has donated to the ASU-TRC Speech Lab several high-performance workstations fully equipped with application software, including two high-end Pentium and two state-of-the-art Pentium Pro NT workstations.
Contacts  

For further information, direct all correspondence to:

Dr. Andreas S. Spanias <spanias@asu.edu>