AES PNW Meeting Report - A Personal History of Perceptual Audio Coding

Meeting held October 14, 2010 at Microsoft Studios, Redmond, WA

Presented by James D. (JJ) Johnston, DTS Inc.

Photos by Gary Louie: PNW Vice-Chair Rick Chinn opens the meeting; JJ Johnston.

James (JJ) Johnston spoke at the October PNW Section meeting, giving a personal viewpoint on his participation in the evolution of perceptual audio coding. Some 15 AES members and 18 nonmembers attended the meeting held at Microsoft Studios in Redmond.

PNW vice-chair Rick Chinn opened the meeting in Bob Moses' absence. Rick announced the Acoustical Society Young Fellowship in acoustics and the ASA convention in Seattle in May 2011. He said the November PNW meeting would feature John Vanderkooy, and noted the upcoming AES convention in San Francisco.

James D. (JJ) Johnston received BSEE and MSEE degrees from Carnegie-Mellon University. He then worked for AT&T Bell Labs and its successor AT&T Labs Research, retiring (temporarily) in 2002. He also worked for Microsoft as Windows Audio Architect for 6 years. Most recently he has been working in the area of auditory perception of soundfields, ways to capture soundfield cues and represent them, and ways to expand the limited sense of realism available in standard audio playback for both captured and synthetic performances. He is currently Chief Scientist for DTS Inc. Mr. Johnston is an IEEE Fellow, an AES Fellow, a NJ Inventor of the Year, an AT&T Technical Medalist and Standards Awardee, and a co-recipient of the Donald Fink Paper Award and the 2006 James L. Flanagan Signal Processing Award from the IEEE.

The music world is now well familiar with reduced bit rate codecs such as MP3 (really MPEG-1 Audio Layer 3) and AAC (Advanced Audio Coding) after decades of using linear PCM with the compact disc. JJ gave his personal view of his pioneering work on music codecs. He jokingly called his talk a personal history of making a test program for a computer that turned into MP3.

He reviewed the 1970s analog and digital technology he worked with while researching music codecs at AT&T. These were power-hungry, big, complex hybrids of digital and analog circuits, with lots of touchy adjustments. The addition of analog dividers and multipliers allowed him to adjust step sizes in the converters. Then sub-band coders (SBC) using analog filters were developed. These worked well, showing that integer band sampling and SBCs were a practical concept, even if the analog implementation was very complex. A 56 kb/s "commentary" coder using such techniques (in analog hardware) worked well, proving the concepts. The Quadrature Mirror Filter (QMF) could be done digitally, but no good filters existed. JJ's early filter designs for this in 1979 are still his most cited papers.

By the early 1980s, computer power could barely do two bands of SBC, and yet the results sounded poor - an "upward spread of masking" effect, JJ learned, and his first hint of the need for perceptual coding.

In 1984, an Alliant FX minicomputer had arrived at AT&T, and now JJ had enough computer memory space - more than 32kwords! He was assigned to "break the compiler" and run it through its paces. He then developed a series of test programs to insert perceptual noise, measure perceptual entropy, and to do perceptual transform coding (PXFM). There was some pre-echo, but PXFM generally sounded much better than previous techniques.

Test material sound sources were four vinyl LP clips, copied to audio Compact Cassette at home, then played into a 12-bit converter on a minicomputer in the lab. It took weeks to get useful amounts of material processed. The arrival of CDs made things easier.

By 1986, JJ's computer was working well and the first informal listening tests were held at AT&T. But by 1987 JJ was working on video, and his audio work was not published as AT&T balked at patenting costs. In 1988, after finally getting the patents, JJ took his concepts to the IEEE/ICASSP conference. Next to his poster was Karlheinz Brandenburg's paper (presented by Heinz Gerhäuser). They looked at each other's posters and realized they had the same concept. They convinced their respective bosses that they should be working together on this. This, he feels, is probably the birth of MP3.

JJ played some of the original test material, male voices singing in an African style:

original digital track (originally 12-bit floating point (pseudo 16-bit), 32 kHz)
with perceptual noise insertion, a 13.6 dB signal/noise ratio, and not bad sounding
with sample modulated white noise at 13.6 dB S/N: very noisy
the white noise difference signal: a spitty sound, often with a tonal character
the difference signal of the perceptual noise example: it sounded somewhat like a very distorted original, but it's buried under the original and you don't normally hear it.
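
As a rough illustration of the white noise comparison above, the following Python sketch adds noise to a signal at a 13.6 dB signal/noise ratio and recovers the difference signal by subtraction. It is not JJ's test program. It assumes that "sample modulated" means white noise scaled by the magnitude of each sample, and it uses a hypothetical 1 kHz test tone at 32 kHz sampling in place of the vocal recording; the function names are invented for this example. It does not attempt perceptual noise insertion, which would need a masking model to shape the noise.

```python
import math
import random


def snr_db(signal, noise):
    """Signal/noise ratio in dB, from total signal and noise energy."""
    signal_energy = sum(s * s for s in signal)
    noise_energy = sum(n * n for n in noise)
    return 10.0 * math.log10(signal_energy / noise_energy)


def sample_modulated_noise(signal, target_snr_db, seed=0):
    """White Gaussian noise scaled by |sample|, normalized to the target SNR.

    Assumption: this is only one plausible reading of "sample modulated".
    """
    rng = random.Random(seed)
    raw = [abs(s) * rng.gauss(0.0, 1.0) for s in signal]
    signal_energy = sum(s * s for s in signal)
    raw_energy = sum(r * r for r in raw)
    # Rescale so that the overall SNR hits the target.
    gain = math.sqrt(signal_energy / raw_energy) * 10.0 ** (-target_snr_db / 20.0)
    return [gain * r for r in raw]


# Hypothetical test signal: 0.1 s of a 1 kHz tone sampled at 32 kHz.
fs = 32000
signal = [math.sin(2.0 * math.pi * 1000.0 * n / fs) for n in range(fs // 10)]

noise = sample_modulated_noise(signal, 13.6)
noisy = [s + n for s, n in zip(signal, noise)]

# The "difference signal" is what remains after subtracting the original.
difference = [y - s for y, s in zip(noisy, signal)]

print(round(snr_db(signal, difference), 1))
```

Both demonstration pairs in the talk were at the same measured S/N, so a number like this says nothing about how audible the noise is. That depends on where the noise falls relative to the signal spectrum.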

Next came the pain of dealing with the standards bodies. Four years late, JJ wrote the paper. There was no money in the idea and no market, so management said to make it a standard, and MPEG was starting up at the time. At the MPEG meeting, it seemed the IRT (German broadcast research) had its own ideas about audio codecs, but the 16 various codec proposals were sorted into 4 groups, each of which was told to combine its ideas and submit a single proposal for evaluation.

JJ's group created ASPEC (Audio Spectro-Perceptual Entropy Coding) using PXFM, an OCF (Optimum Coding in the Frequency Domain) filterbank (using MDCT, Modified Discrete Cosine Transform), block switching, and other ideas. In spite of hardware interface changes stipulated late in the game, which meant ASPEC had some jitter, ASPEC still won the quality evaluation.

Oddly, however, they were told it was too complicated to implement, and it was rejected. Instead, 2 audio parts for MPEG-1 (Layers 1 and 2) were specified. There was much acrimony between various factions. Layers 3 and 4 were proposed, but since they needed to use the filter banks of Layers 1-2, a hybrid filter bank was finally devised and called Layer 3. Oddly, the hybrid was not deemed too complicated like MDCT, even though it was. After many more intergroup battles, the standard was finally agreed to.

And what of AAC? While MP3 was finishing, researchers went on to suggest further changes regarding backward compatibility and MPEG-2 audio layers. JJ figured none of this was adequate, dropped out of the standards mess, and teamed with Anibal Ferreira to make an improved codec, PAC (Perceptual Audio Coder). When MPEG had a test, they had to allow 2 non-backwards compatible codecs including PAC. PAC won the test.

For this non-backward compatible (NBC) project, the top NBC developers were forced to work together on one NBC codec that MPEG might allow to be standardized. Most features of PAC replaced most features of ASPEC, and the result was renamed AAC.

After the break, a Q&A session included thoughts on codec testing material, WMA and dithering. JJ recommended higher bit rates for better quality, and lossless if you can. He has no iPod, and uses speakers to hear his CDs.

Reported by Gary Louie, PNW Section Secretary
