Welcome to the Digital Signal Processing chapter of the Ultimate Guide To High End Immersive Audio. The main table of contents can be viewed here.
Digital signal processing or DSP is a proverbial four letter word in many audiophile circles. DSP means many things to many people, and is often an undefinable scapegoat on which questionable sound quality is pinned. Back in the day there were good reasons for this distaste for DSP. The first implementations were technically interesting, but sounded terrible. Today, DSP is used in every digital playback system, and is even used to create the infamous but great sounding Mobile Fidelity Ultradisc One-Step series of vinyl reissues. This chapter of the Ultimate Guide To High End Immersive Audio scratches the surface of the cavernous topic of digital signal processing, focusing on two areas of particular importance: decoding immersive music and digital room correction.
Definitions
Decoding - Nearly all immersive music is encoded in a proprietary format requiring decoding by the listener’s audio system. Discrete Immersive content is the only music that doesn’t require decoding because it’s delivered as ten or twelve channel WAV files at 24 bit / 352.8 kHz. Most other music is encoded in a Dolby format such as Dolby Digital Plus or TrueHD Dolby Atmos. Auro 3D, Sony 360 Reality Audio, and IAMF (Immersive Audio Model and Formats) are also available, but currently extremely limited in distribution and market acceptance.
The decoding process not only involves unpacking a digital audio stream, but also rendering the audio to the correct channels, and the correct number of channels. An encoded Atmos file can be played on systems with anywhere from two to sixteen channels. The decoding system is told how many channels, and in which configuration, it should render the audio for playback.
Proprietary formats are often viewed skeptically by audiophiles who’ve used FLAC for decades. However, formats such as those from Dolby don’t really have an open source or free alternative that can match their market penetration and feature set.
Digital Room Correction - Another sensitive topic in the audiophile world, even though by 2024 it really shouldn’t be. Everyone should at least try state of the art digital room correction in their own system because it’s that good. DRC is a massively confusing topic for all but the most nerdy audiophiles. For this chapter, the most easily digestible DRC concepts are time and frequency correction. Time correction ensures that the direct sound from each loudspeaker arrives at the listening position at the same time, while frequency correction smooths out the peaks and dips caused by one’s listening room (too much or too little bass, for example).
Within the world of digital room correction there are countless main topics, sub-topics, and differing opinions. This guide attempts to cover some broad areas and provide listeners with actionable information they can use to audition the results of different DRC concepts by listening to different products or working with an expert in DRC.
Why It’s Required
As an audiophile I like to think I can “will” my musical playback into perfection with the straight wire with gain philosophy, but that’s a fool’s errand. A middle ground approach, involving the manual adjustment of time and frequency parameters, is also one that’s more likely to produce dubious results, but at least provides endless hours of DIY fiddling / entertainment for those so inclined. Don’t get me wrong, I have the utmost respect for those who roll up their sleeves and white knuckle DSP, and I have no doubt they are satisfied with the results, but the level of accuracy achieved by a human can’t match that of a machine. Enabling a machine to handle the tough parts and using human subjective evaluations for the final touches results in a state of the art listening experience of which our audiophile forefathers could’ve only dreamt.
The focus for a long time in this hobby was bit perfection. Playing an album as perfectly as possible was a laudable goal in the early days of computer audio, when many apps mangled our music before our DACs even had a chance to convert the bits to audio. Now, with playback apps more under control and state of the art DSP available, we can focus on audio that’s “bit perfect” at our ears.
Using digital room correction in the time domain is absolutely required unless one’s listening position is equidistant from every loudspeaker. It takes a very special room to accommodate such a setup. This is typically only seen in audio laboratory settings or certified ITU/EBU control rooms. Associated with the timing adjustments is the volume level because a loudspeaker that’s closer to the listening position may be louder than those further away and may have different sensitivity characteristics than the “main” front speakers.
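To make the time alignment idea concrete, here’s a minimal sketch. The speaker distances are hypothetical example values and the speed of sound is rounded; the point is simply that every speaker closer than the farthest one gets delayed so all direct sounds arrive at the ears together.

```python
# Minimal sketch: time-align loudspeakers sitting at different distances
# from the listening position. Distances below are made-up example values.
SPEED_OF_SOUND = 343.0   # meters per second, rounded room-temperature value
SAMPLE_RATE = 48_000     # Hz, typical for Dolby Atmos content

distances = {"L": 3.10, "R": 3.05, "C": 2.80, "LS": 2.20, "RS": 2.25, "SUB": 2.40}

farthest = max(distances.values())

for channel, d in distances.items():
    # Delay each closer speaker so its sound arrives with the farthest one
    delay_s = (farthest - d) / SPEED_OF_SOUND
    delay_samples = round(delay_s * SAMPLE_RATE)
    print(f"{channel:>3}: delay {delay_s * 1000:5.2f} ms ({delay_samples} samples)")
```

Level trims work the same way conceptually: the closer or more sensitive speakers get turned down so every channel measures the same at the listening position.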
Digitally correcting for frequency issues should be done after one attempts to physically adjust the listening room, using absorption, diffusion, and preferably normal human items such as plants, furniture, etc… No matter how a room is designed, the laws of physics will overpower the will of even the most dedicated anti-DSP listener. Bass issues, with their very long sound waves, can foil all but an anechoic chamber’s worth of absorptive material.
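A quick wavelength calculation shows why bass resists passive treatment. The frequencies below are just example values.

```python
# Wavelength of a sound wave: wavelength = speed of sound / frequency
SPEED_OF_SOUND = 343.0  # m/s, rounded room-temperature value

for freq_hz in (20, 40, 80, 250, 1000):
    wavelength_m = SPEED_OF_SOUND / freq_hz
    print(f"{freq_hz:>5} Hz -> {wavelength_m:5.2f} m")
# A 20 Hz wave is about 17 m long; quarter-wavelength absorption at that
# frequency would need to be roughly 4 m thick.
```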
Last, the decoding aspect of DSP is required if one wants to hear all the channels of an immersive album. Without a TrueHD Dolby Atmos decoder, one can’t hear the entire album as it was designed to be heard. Listeners may get a portion of the channels and a portion of the music via some other means, but not the true immersive experience.
Decoding and Room Correction Options
The reason both decoding and digital room correction are included in the same DSP chapter is because they are linked in most audio systems. Splitting the decoding from DRC will gain more traction as new devices hit the market that enable decoded audio to be output to a number of other devices as pure PCM audio, but currently these devices are few and far between (Arvus offers two, and another manufacturer will offer one soon). An example of this decoding and DSP link can be seen when using a traditional processor (Trinnov, Marantz, Anthem, etc…). If one decodes an immersive audio signal into twelve channels prior to the processor’s input (HDMI or other), the processor can’t handle a decoded PCM signal with that many channels. Thus, the decoding must take place within the processor, if one wants to use the processor’s room correction.
It is possible to decode immersive audio using an Arvus H1-D and output the audio via Ethernet or the Arvus H2-4D and output via AES or Ethernet, unlinking the decoding from room correction, as long as one has a device capable of accepting a high channel count AES or Ethernet signal and running the proper room correction.
In Simple Terms
Here are three ways of decoding and running digital room correction for immersive music playback, in simple terms and in no specific order.
Computer - This is how I do everything because it’s the only way to obtain true state of the art playback at the highest of audiophile capabilities. Please understand that the other methods are also great, otherwise I wouldn’t mention them, but just like in sports, only one team / method can be the best with respect to sound quality. However, there are also drawbacks to using a computer. For one, it’s a computer. It’ll have issues. There’s no way around that fact. Fortunately I’m capable of handling any of the issues that come up, but I understand not everyone cares to deal with them, even if they are tech savvy.
Using a computer to decode immersive audio can be done using the macOS operating system when playing from Apple Music, as the Dolby Digital Plus decoder is built-in. Decoding Dolby TrueHD Atmos and Auro 3D is more difficult. Auro offers a VST plugin that I’ve used to decode Auro 3D music through JRiver Media Center, but this plugin hasn’t been updated to work on Macs with Apple Silicon. In other words, the Auro plugin doesn’t work on any Mac sold in stores today. It does work on Windows and Intel based Macs for roughly $20 per month.
Decoding TrueHD Dolby Atmos on a computer, for content that’s sold as MKV files or ripped from Blu-ray, is done either in real time or offline mode using a combination of apps. This approach requires a bit of extra work, but results in decoded WAV files capable of being played with any app that supports the requisite number of channels (JRiver, Audirvana, etc…).
The easiest way to decode TrueHD Dolby Atmos is to do it offline. Using the application named Music Media Helper and the Dolby Reference Player, MKV files downloaded or ripped from Blu-ray can be converted into any supported Atmos channel configuration (5.1.2, 5.1.4, 7.1.4, 9.1.6, etc…) as WAV files. FLAC will never support more than eight channels without embedded / encoded data, and WAV works well enough anyway. Once the files are decoded, the listener is free to use state of the art digital room correction.
There are countless ways to do this, but I will explain what I believe is the absolute best. At a high level, using a good mic preamp with an Earthworks M30 microphone or better, and Audiolense on a Windows PC (only runs on Windows currently) to measure and create the room correction filters, is the best. Period. I recommend hiring Mitch Barnett to walk you through the measurement process and create filters for you, unless you’re a glutton for punishment.
The current state of the art in room correction begins with the Audiolense application. To my knowledge, and I will happily include corrections if notified, no other application that runs on a computer or in a traditional processor, is as powerful and capable as Audiolense. As a real world example of this superiority, Audiolense features digital crossovers with bass offloading that’s totally configurable for each loudspeaker. This means speakers with limited frequency ranges can have the bass offloaded to a subwoofer, while full range speakers in the same system can reproduce audio to the limits of their capabilities as well. In practice, a listener playing Tsuyoshi Yamamoto’s album A Shade of Blue, with Hiroshi Kagawa’s double bass emanating from the center channel, can have the very bottom end of the frequency range of that bass offloaded to a subwoofer, if the center channel can’t reproduce the aforementioned frequencies. Without this capability, the bass is sent to the center channel and not reproduced in the audio system. Another less than optimal way would have all the bass for all channels sent to the subwoofer, but then the front left and right channels wouldn’t reproduce Hiroshi Kagawa’s bass as they should because they can often reach down to 20 Hz.
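As a conceptual illustration only (this is not Audiolense’s actual implementation), bass offloading amounts to splitting a limited speaker’s feed at a crossover frequency and routing the low band to the subwoofer bus. The 80 Hz crossover and the Butterworth split below are assumptions made for the sketch.

```python
import numpy as np
from scipy.signal import butter, sosfilt

SAMPLE_RATE = 48_000
CROSSOVER_HZ = 80  # assumed crossover for a low-frequency-limited center speaker

# 4th-order Butterworth low/high pass pair (a stand-in for a true crossover)
lowpass = butter(4, CROSSOVER_HZ, btype="lowpass", fs=SAMPLE_RATE, output="sos")
highpass = butter(4, CROSSOVER_HZ, btype="highpass", fs=SAMPLE_RATE, output="sos")

def offload_bass(center: np.ndarray, sub_bus: np.ndarray):
    """Route the center channel's low band to the subwoofer bus."""
    lows = sosfilt(lowpass, center)
    highs = sosfilt(highpass, center)
    return highs, sub_bus + lows  # band-limited center, augmented sub bus

# Example: a 50 Hz double-bass-like tone on the center channel
t = np.arange(SAMPLE_RATE) / SAMPLE_RATE
center = 0.5 * np.sin(2 * np.pi * 50 * t)
sub_bus = np.zeros_like(center)
center_out, sub_out = offload_bass(center, sub_bus)
```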
Using a computer for room correction also enables one to use incredibly powerful FIR filters created by Audiolense. I use Accurate Sound’s Hang Loose Convolver to host these filters as it works better than any native in-app convolution engine. A real world example of these powerful filters can be seen using simple math.
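Under the hood, a convolution engine simply convolves each channel with its correction filter. Here is a minimal sketch of that one operation; the “filter” is a placeholder unit impulse, not anything generated by Audiolense or exposed by Hang Loose Convolver’s actual API.

```python
import numpy as np
from scipy.signal import fftconvolve

SAMPLE_RATE = 48_000
TAPS = 65_536

# Placeholder correction filter: a unit impulse (i.e., no correction at all).
# A real filter would be designed by a DRC tool such as Audiolense.
fir = np.zeros(TAPS)
fir[0] = 1.0

def convolve_channel(audio: np.ndarray, fir: np.ndarray) -> np.ndarray:
    """Apply an FIR room correction filter to one channel of audio."""
    return fftconvolve(audio, fir, mode="full")[: len(audio)]

rng = np.random.default_rng(0)
audio = 0.1 * rng.standard_normal(SAMPLE_RATE)  # one second of noise as program material
corrected = convolve_channel(audio, fir)
```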
It starts with 65,536 tap FIR filters. This alone is well beyond the capabilities of traditional processors. As one listens to higher sample rates, the filter can be upsampled to several hundred thousand or over one million taps automatically. This ensures the frequency resolution of the FIR filter stays the same when the sample rate increases and is a distinction with a major difference.
Frequency resolution = fs / N where fs is the sample rate and N is the number of filter taps.
A 65,536 tap FIR filter at 48 kHz (Atmos is currently all released at 48 kHz) has a frequency resolution of 48000/65536 = 0.732 Hz.
The frequency range spans 0 Hz to 24 kHz. Thinking of an FIR filter as a graphic equalizer, 24,000 / 0.732 ≈ 32,768 sliders for an FIR equalizer. This real world FIR example has roughly 1,000 times the frequency resolution of a 1/3 octave equalizer. In addition, a rough rule of thumb puts the effective low frequency limit of the filter at three times the frequency resolution: 3 × 0.732 Hz ≈ 2.2 Hz. A 65,536 tap FIR filter running on a computer can control frequencies down to 2.2 Hz.
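For readers who want to reproduce the arithmetic, the same numbers fall out of a few lines, and the tap counts needed at higher sample rates follow directly from the same formula.

```python
fs = 48_000    # Hz, sample rate of current Atmos releases
taps = 65_536  # FIR filter length

resolution_hz = fs / taps            # ~0.732 Hz per "slider"
sliders = (fs / 2) / resolution_hz   # bands from 0 Hz to Nyquist -> 32,768
low_limit_hz = 3 * resolution_hz     # rule-of-thumb effective low frequency limit

print(f"resolution: {resolution_hz:.3f} Hz")
print(f"equivalent graphic EQ sliders: {sliders:.0f}")
print(f"effective low frequency limit: {low_limit_hz:.1f} Hz")

# To keep the same resolution at higher sample rates, the tap count must grow:
for new_fs in (96_000, 192_000, 352_800):
    print(f"{new_fs} Hz needs {round(new_fs / resolution_hz)} taps")
```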
Notice I’ve been talking about FIR (finite impulse response) filters. These are linear phase but processor and memory intensive. Traditional hardware processors can’t use FIR filters for the lowest frequencies, because they lack hardware DSP processing power, and they often use less precise IIR (infinite impulse response) filters in combination with FIR filters to cover the full range. IIR filters are frequently less stable and suffer from unequal delays at different frequencies. More information about the difference between IIR and FIR filters can be seen here (link).
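A small sketch of the phase point: a linear-phase FIR filter delays every frequency by the same amount, while an IIR filter’s group delay varies with frequency. The example filters below are generic, not the ones used by any particular processor.

```python
from scipy.signal import butter, group_delay

fs = 48_000

# Linear-phase FIR: every frequency is delayed by (N - 1) / 2 samples
fir_taps = 1_025
fir_delay_ms = (fir_taps - 1) / 2 / fs * 1000
print(f"FIR: {fir_delay_ms:.2f} ms at every frequency")

# 4th-order Butterworth low-pass (IIR): group delay changes with frequency
b, a = butter(4, 200, fs=fs)
w, gd_samples = group_delay((b, a), w=2048, fs=fs)
print(f"IIR: up to {gd_samples.max() / fs * 1000:.2f} ms near the corner, "
      f"{gd_samples[-1] / fs * 1000:.3f} ms at high frequencies")
```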
There is no free lunch with 65k tap FIR filters, or such powerful DSP in general. Using a computer can be either a pro or a con depending on the user and situation. In addition, high tap count filters increase latency. This is a non-issue for music only listeners, but can be an issue for those watching movies. Sophisticated applications such as JRiver Media Center offer latency compensation that works in conjunction with Hang Loose Convolver’s VST plugin. HLC reports the latency to JRMC, and JRMC compensates for this during video playback, removing lip-sync issues. For my music only system this isn’t an issue at all. Alternatively, one can use minimum phase FIR filters, which still have the power to control the bass frequencies at the expense of giving up the time domain correction. A minimum phase FIR filter has zero latency, so it will work with an Apple TV, or with YouTube or Netflix, through standalone convolution (example).
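The latency penalty is easy to quantify: a linear-phase FIR filter delays the audio by roughly half its length. A small calculation using the tap counts discussed above:

```python
fs = 48_000  # Hz

for taps in (65_536, 131_072, 262_144):
    latency_ms = (taps - 1) / 2 / fs * 1000
    print(f"{taps:>7} taps -> {latency_ms:6.1f} ms of latency")
# For video, the player must delay the picture by the same amount to keep
# lip-sync; a minimum phase version of the filter avoids the delay entirely,
# at the cost of the time domain correction.
```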
Another potential issue with these state of the art filters is called insertion loss. This means the volume level is cut, based on the amount of correction used. An audio system with enough headroom can easily make up for this volume reduction, but it should be understood while designing an audio system.
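A simplified picture of insertion loss: if the correction filter boosts some frequency by, say, 6 dB, the whole filter is typically attenuated by roughly that amount so the boost can’t clip, and the playback chain needs that much spare headroom to reach the same volume. The numbers below are illustrative only.

```python
max_boost_db = 6.0                   # assumed largest boost in the correction filter
insertion_loss_db = -max_boost_db    # broadband attenuation applied to avoid clipping

linear_gain = 10 ** (insertion_loss_db / 20)
print(f"filter output scaled by {linear_gain:.2f}x ({insertion_loss_db:.0f} dB)")
print("the downstream amps and speakers need roughly that much extra headroom")
```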
One last benefit of using a computer for digital room correction is the ability to play discrete immersive albums, and even on rare occasions Atmos ADM files. I’ve purchased ADM files through Bandcamp, but these are certainly not the norm. Playing ten or twelve channel discrete immersive DXD content with 500,000+ tap filters is the height of living, with respect to high fidelity music playback. It takes a computer to make it in the studio, and to play it at home.
Note: One method I've been experimenting with is using an Aurender music server to play immersive music, up through twelve channel DXD, and routing the audio through a computer for DSP, then on to my Merging Technologies hardware for playback. This method will continue to evolve and should improve ease of use for music lovers in the long run.
Hybrid Approach - There is a hybrid approach between using a computer for everything and using it for nothing. As we move away from using a computer, the solutions usually get easier to use, but performance does decrease. Whether or not that performance decrease matters is up to each listener. This guide is about presenting facts, not making friends.
One example of this hybrid approach to DSP is decoding and measuring on a computer while running the room correction filters on an audio hardware device. In my system I have this setup for testing as well as the previously mentioned computer only approach. I use the Sonarworks SoundID Reference application with a Sonarworks microphone to measure my system in my room. The process takes about an hour, but is fairly idiot proof. The app walks one through each microphone placement and tells the user what to do at each step along the way. This is different from Audiolense which requires either serious knowledge or a professional such as Mitch Barnett working with the user.
After running the measurements, SoundID Reference displays a few options and shows a frequency response curve. It’s possible to make manual adjustments or select from built-in options such as the Dolby curve. I’ve done both, but usually wound up using the Dolby option. After a curve is selected, the filter is exported to work with a number of hardware devices. In my case I uploaded the filter to my Merging Technologies Anubis and enabled it.
Once the filter is enabled on the Anubis, thinking about filters is over. It operates on all audio signals routed through the device, no matter the sample rate or channel count, without user intervention. This is convenient. Changing channel counts while using Hang Loose Convolver can involve manually switching filters, to ensure the channels are routed to the correct loudspeaker.
In the real world, playback looks fairly similar to the computer only approach, with the exception of not running convolution software on the computer. This means no VST plugin in an app like JRiver, and no Hang Loose Convolver accepting audio from Apple Music or Audirvana before outputting to the same Merging Anubis.
The downsides of this hybrid approach stem from a more limited measurement and filter creation application and from limited hardware horsepower. Using the double bass in the center channel example from above, when I play this track and use SoundID Reference in my system, none of the center channel bass is offloaded to the subwoofer. Because my center channel, like most center channel speakers, is low frequency limited, I just don’t hear the full capabilities of the double bass in the center channel.
Other negatives are the mixed filter mode using both FIR and IIR, with its phase changes, and the lack of filter taps for low frequency control compared to a full computer solution, which equals less resolution.
One really nice feature of this hybrid approach is a zero latency mode. It’s possible to have zero latency, but the amount of correction is limited. I’ve used SoundID Reference for testing video playback, and everything stays perfectly in sync. However, minimum phase FIR filters, as mentioned above, also offer zero latency and control bass frequencies very well.
This hybrid approach is most commonly used in professional studios rather than audiophile listening rooms. However, it is a nice option to have. I’m glad it is an add-on to the Merging Anubis because I can use it if I need it. But, I wouldn’t go out of my way to get it, if I already had Audiolense and a convolution engine running on a computer. The audio output just doesn’t sound as good to me, most likely because it’s objectively less precise due to hardware and software limitations.
Traditional Processor - This approach is the most popular and by far the easiest. Traditional processors such as those from Trinnov, Marantz, and Anthem have built-in immersive audio decoders and digital room correction. Dirac, RoomPerfect, and Audyssey are some of the bigger names embedded into traditional processors.
The typical workflow for decoding and room correction couldn’t be easier. Connecting an Apple TV to a processor and streaming Apple Music or Tidal will get Dolby Atmos music flowing into the system with a couple clicks. Playing TrueHD Dolby Atmos can be done by putting the MKV downloads or Blu-ray rips onto an NVIDIA Shield connected to the processor and tapping play. Fully decoded and processed with the tap of a finger.
The quality of digital room correction in traditional processors is all over the board. It ranges from those that make the sound worse to those that do a really great job. It all comes down to the sophistication of the software and the horsepower of the hardware.
Taking measurements involves zero computers and often a microphone made specifically for the processor or brand of processors. Just add to cart, connect it when it arrives, and run through the setup wizard. Sophisticated products like the Trinnov Altitude 16 or 32 enable one to VNC into the processor to make adjustments and see an approximation of the end results of the DSP. The beauty of this is that a good Trinnov dealer will handle all of the configuration and can call on the brilliant team at Trinnov for backup in tough situations.
Similar to the hybrid approach, running DSP on A/V hardware hits its limits due to a lack of horsepower. A limited number of filter taps, and IIR or mixed mode IIR and FIR filters, equate to sound quality that isn’t as good as the computer only approach. However, and this is a big however, because these processors are designed to work with video simultaneously, they are designed to have minimal latency, which lowers the amount of DSP processing they can do. The products are working as designed.
On the other hand, a processor like a Trinnov Altitude 32 uses a computer internally and technically could adjust for latency like JRiver does, but I don’t believe it has enough processing power to run 65,536 tap FIR filters to keep everything in phase and control bass down to 2.2 Hz.
The ease of use of these processors can’t be overstated. In fact, I’ve been working with a very high end dealer for the last several months on an immersive system design, and I recommended a Trinnov Altitude processor for the specific installation. It’s the right horse for many courses. In this case, the listener forwent playback of discrete DXD and state of the art room correction, in favor of great room correction and the ease of playing Atmos content from an Apple TV and NVIDIA Shield. The fact that the Trinnov is a Roon Ready endpoint for up through eight channel PCM is also a bonus that factored into the final decision.
Given that the traditional processors are all proprietary, it’s hard to say what’s going on inside. The user manuals give some clues and show users how to use different filter modes, such as IIR, FIR, and mixed FIR+IIR, but that’s a very high level look into what’s going on. I wish some of them would reveal more details because they really have a lot to offer as opposed to some of the mass market processors using the cheapest and weakest chips to get audio decoded and processed.
When I began my immersive audio journey I planned on using a Trinnov Altitude processor as my method of playback. I would still like to make this happen, mostly because I want to experience it first hand in my own room and want to understand how it works as well as I can. This would enable me to educate readers about the product much more thoroughly.
Last, as a music only audiophile I don’t want a screen in my listening room. Call me old school, but that’s just the way I like it. A traditional processor necessitates the use of a screen of some type. There are possible ways to use some processors without a screen, but as of right now, I wouldn’t wish it upon anyone who likes a fiddle-free listening experience. I’m looking for and testing solutions that enable me to use a device like the Arvus to decode and output Atmos from an Apple TV and Shield, without a display. The Shield can be operated without a display, but the Apple TV is another story. I have some ideas.
Digital Signal Processing Wrap Up
DSP, both decoding and digital room correction, not to mention all the other items for which DSP is used, is a cavernous hole with many unknowns to all but the most geeky audiophiles. I don’t consider myself an expert by any stretch of the imagination, but I have used many of the products and talked to several true experts in the field.
Like many technologies, digital signal processing is limited by the sophistication of the software and the horsepower of the hardware. In the hands of a professional, DSP can be magical; in the hands of a novice, it can enable sound quality that reaches new lows. Tread lightly and call in the pros when needed.
Immersive audio playback involves decoding and room correction, which are commonly handled by the same device. They don’t have to be, but they usually are. Understanding oneself is key to making a decision about which route will work best in any given system. A computer only route will provide the best objective audio performance, while the traditional processor route will provide the most convenience and ease of use. For many, the key will be bringing these two ends of the continuum as close together as possible, and as of today this is done with a Trinnov Altitude processor.
I am sold on the state of the art room correction offered by Audiolense and filters created by Mitch Barnett of Accurate Sound. The sound quality is second to none, both subjectively and objectively. This is the only way to create a high end immersive experience on the same level as many two channel audiophile systems.
Further Reading
- Playing TrueHD Atmos Music Downloads & Blu-ray Rips The Easy Way
- How To Decode and Play Dolby TrueHD Atmos on Windows and macOS
- The Digital Side Of My Immersive Atmos Music System
- Lossless TrueHD Atmos Just Got Much Easier
- An Audiophile’s Journey into Immersive Audio
- All Audiophile Style immersive audio articles can be found here (link)
NOTE: Please post comments, questions, concerns, corrections in the section below or contact us.