Digital Audio: The Possible and Impossible

By Amir Majidimehr

[This article was originally published in Widescreen Review Magazine.]

Pop quiz:  Which of these statements would you say is true?
1.  If you change the HDMI cable between your Blu-ray player and AVR/Processor, the analog audio output can change.
2.  If you turn off the display and video circuits in your Blu-ray player or AVR/Processor, the analog audio output can change.
3.  If you have a DAC with USB and S/PDIF inputs, changing from one to the other can change the analog output.
4.  If you change from HDMI input to S/PDIF on your AVR or Processor, the analog output can change.

I bet most of you reading this article consider all of the above to be impossibilities, and any talk otherwise the work of a charlatan selling you “snake oil.”  Well, I am here to tell you that the answer is the opposite!  And it is so as a matter of science and engineering, not belief.  Interestingly enough, the more technical you are, the more apt you are to think what I am telling you is wrong.  The reasons for this will become clear later.

Digital Audio is Not All Digital!

The first thing to understand is that digital audio is not completely “digital.”  Yes, you read that right.  The part you are familiar with is indeed digital, namely the numbers that represent the audio sample amplitudes.  CD audio, for example, has 16 bits, or two bytes, per sample.  Audio on Blu-ray can go up to 24 bits, or three bytes.  As long as audio samples are inside a digital device, they remain purely digital: a set of samples to be reproduced sometime later.  You can copy them around as many times as you like and they stay pristine and lossless, just like your computer files.  So far, they are the digital data we believe them to be.

At the risk of stating the obvious, we do not hear digital samples but rather analog waveforms.  To create that analog waveform, we feed the digital audio samples to a device called a Digital to Analog Converter, or DAC for short.  That conversion needs two pieces of information: the audio samples we have just described, and their timing.  The timing tells the DAC when each audio sample needs to be output.

The reason timing is important is that sampling theory mandates that our output samples be reproduced at precisely the same instants at which they were originally converted from analog to digital.  If we maintain identical timing, we can show that digital systems have extremely high levels of fidelity.  Violate this principle, though, and you introduce distortion.

Think of the output waveform as a set of dots connected to each other, with the dots representing our digital audio samples.  If I take those audio samples and move them left and right, the waveform changes.  That is what jitter does.  The timing defines the horizontal position of those dots, and changing it is distortion just the same.

As is often the case in real life, we cannot have perfect precision in the timing that drives our DAC.  Some amount of variability always exists.  We call this “jitter”: a measure of how much the timing source to the DAC varies from its ideal value.  The effect of jitter is determined by its swing (amplitude), its frequency (along with that of the source audio), and its spectrum/waveform (how it varies).
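
For the programmers among you, a few lines of Python make this “moving the dots” idea concrete.  Everything here is an illustrative assumption on my part (the 48 kHz rate, the 10 kHz tone, the 3 kHz/5 ns jitter); none of it comes from any particular device:

```python
import numpy as np

# Illustrative values only: not taken from any real device.
fs = 48_000        # nominal sample rate, Hz
f_tone = 10_000    # source tone, Hz
f_jit = 3_000      # jitter frequency, Hz
tau = 5e-9         # jitter peak amplitude, seconds

n = np.arange(fs)                  # one second of samples
t_ideal = n / fs                   # where the "dots" should land
t_actual = t_ideal + tau * np.sin(2 * np.pi * f_jit * t_ideal)

ideal = np.sin(2 * np.pi * f_tone * t_ideal)      # perfect timing
jittered = np.sin(2 * np.pi * f_tone * t_actual)  # dots moved left/right

err = jittered - ideal
print(f"peak error: {np.abs(err).max():.2e} of full scale "
      f"({20 * np.log10(np.abs(err).max()):.1f} dBFS)")
```

Running this shows an error around -70 dBFS from just five billionths of a second of timing wobble, which is the point of everything that follows.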

Jitter is often poorly characterized as a single figure of merit with a unit of time, such as “1.5 nanoseconds.”  I say poorly characterized because the other factors, especially the spectrum, can be much more significant, often outweighing the amplitude so represented.  That said, I am going to commit the same sin in the rest of this article and focus on that one number, so as to get through this introduction to jitter without writing a whole book.

You may be puzzled as to why anyone would worry about jitter if the typical numbers are measured in nanoseconds.  After all, how would 1.5 ns, which translates into 1.5 billionths of a second, make a difference in the output waveform?  Surely our CD audio, at a relatively modest 44,100 samples per second, can’t be sensitive to such small variations.  It would be a mistake to think so, as the impact of jitter depends on both frequency and bit depth.  The latter is a big problem for us, because audio samples have comparatively high resolution.  Take the 16 bits of CD audio, which translate into 65,536 sample values.  Those are awfully small increments in an output waveform whose full magnitude is roughly the voltage of a single AA battery!  24 bits gets even crazier, with 16,777,216 levels.  Imagine the voltage of that single AA battery now being divided into 16 million tiny divisions!  For our digital audio transmission/reproduction to be “perfect,” it would need to preserve all those increments.
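
To put numbers on that battery analogy, here is a quick back-of-the-envelope calculation.  The 1.5 V full-scale value is just the AA battery figure from the analogy, not the actual output spec of any DAC:

```python
# The AA battery analogy in numbers. The 1.5 V figure is the analogy's,
# not the output specification of any actual DAC.
full_scale = 1.5  # volts

for bits in (16, 24):
    levels = 2 ** bits
    step_uv = full_scale / levels * 1e6
    print(f"{bits}-bit: {levels:,} levels, one step = {step_uv:.3f} microvolts")
```

That works out to roughly 23 microvolts per step at 16 bits and well under a tenth of a microvolt at 24 bits.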

How Much Jitter is Too Much?

One way we measure the impact of jitter is by making the simplifying assumption that it is a sine wave (often it is not, but let’s go with it).  We can then compute how large it needs to be to generate distortion equal to the voltage represented by one bit of our audio sample, as described earlier.  For CD audio, that is the amount of jitter that makes a 16-bit audio sample equivalent to a 15-bit one.  The idea here is that if we are trying to play 16-bit audio samples, we would ideally want our reproduction system to be transparent to at least that level.

Jitter's effect on music is that it modulates all the tones in it.  Using mathematics which in the interest of not boring you I won’t go into, we can model sinusoildal jitter as a signal that generates two distortion products, one of which is the sum of the jitter and our source frequency and the other, the difference between the two.  Using this model, and the fact that CD music has a response of roughly 20 KHz, we can compute how much jitter it takes to overwhelm a single bit of our 16-bit audio sample.  Performing the math, we arrive at the unbelievable fact that jitter amplitude cannot be more than 0.5 nanoseconds!  You read that right.  If timing variation of the DAC is more than half a billionth of a second you generate enough distortion to swamp one bit of your audio sample.  It is not opinion that says that.  It is pure mathematics.  And this is for the simplest case of jitter, not the more complex but typical scenarios where there are many jitter frequencies and spectrums acting on your your music.

To see this abstract explanation in practice, here is a measurement of jitter acting on a single tone, created by the late Julian Dunn and published in Audio Precision’s digital audio measurement handbook:

[Figure: Digital Audio Jitter]
The 10 kHz tone is our source frequency.  The distortion sidebands (the smaller peaks on each side) are purely the result of modulating the timing clock of the DAC with a sine wave at a frequency of 3 kHz and an amplitude of 5 nanoseconds.  This is for a system with 20 bits of resolution, which therefore has a noise floor of -120 dB.  Jitter distortion at roughly -80 dB is far above that and has reduced the system performance to well under what CD can do at 16 bits (96 dB).  So even though five billionths of a second sounds like a very small value, its impact in distorting our audio signal is quite significant.
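
If you want to sanity-check the figure, the standard small-angle approximation for sinusoidal jitter puts each sideband at roughly 20·log10(π·f·τ) relative to the carrier, where f is the tone frequency and τ the jitter amplitude.  A couple of lines of Python land within a few dB of the sidebands shown, depending on whether the stated amplitude is read as peak or peak-to-peak:

```python
import math

def sideband_db(f_tone: float, tau: float) -> float:
    """Approximate level of each jitter sideband relative to the carrier
    for sinusoidal jitter of amplitude tau (small-angle approximation)."""
    return 20 * math.log10(math.pi * f_tone * tau)

# The figure's scenario: 10 kHz tone, 5 ns sinusoidal jitter at 3 kHz,
# producing sidebands at 10 kHz +/- 3 kHz (7 kHz and 13 kHz).
print(f"predicted sideband level: {sideband_db(10e3, 5e-9):.1f} dB re carrier")
```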

Jitter Sources

The next question becomes: what causes jitter?  The answer is that it can be any and all things.  DACs usually sit inside what is normally a very noisy environment.  The same power line that feeds our DAC and its clock eventually also feeds high-speed digital circuits such as microprocessors/DSPs, video circuits, front panel displays which often use high-voltage circuits, etc.  While there is usually filtering on both the power lines and in the DAC clock circuits, one cannot eliminate all variations, and those variations translate into tiny changes in the DAC clock.  For this reason, it is possible to reduce the jitter level by turning off unnecessary circuits.  The so-called “Stereo Direct” mode in some products performs this function by turning off video circuits, shutting off the front panel, and so on.  This will often have a measurable effect on the DAC output and its performance.

But that is not all.  Some jitter is induced even before the digital samples arrive at the receiving device/DAC.  Take the S/PDIF digital audio cable.  The audio bits travel on it as a series of “ones and zeros.”  But no cable can reproduce a perfect square wave, with pulses going from zero to their final value in zero time.  The cable and its driving and receiving circuits distort these waveforms into noisy pulses that take some time to go from low to high.  To capture these values despite such electrical distortion, the receiver samples them at the “zero crossing”: the moment the waveform crosses a threshold that tells us whether it is a “one” or a “zero.”

The above lets us capture the digital sample values, but when it comes to detecting their timing, we are faced with a tough situation.  You can see this visually in a second measurement by Julian Dunn, showing what happens when you try to use an ordinary audio cable for S/PDIF rather than one designed for that purpose:
[Figure: Cable Induced Jitter]

For starters, on the left you see nothing resembling perfect square waves or pulses of zeros and ones.  The image on the right is a magnified view of the zero crossing point.  You see that as the waveform moves slightly up and down (which it does routinely as the bits change), the precise moment at which it crosses our horizontal reference line also changes, and with it the timing as seen by the receiver.  We call this “cable induced” jitter.  Now you know why I said at the beginning that changing cables alone could cause the analog output of the system to change.  The cable change will alter the above waveforms and, with them, the jitter transmitted to the external DAC (e.g. from your Blu-ray player to your AVR).
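
A rough rule of thumb captures what the magnified view shows: the timing error at the crossing is approximately the voltage noise divided by the slew rate of the edge.  The numbers in this sketch are purely illustrative assumptions on my part, not measurements of any particular cable:

```python
# Illustrative numbers only; not measurements of any particular cable.
noise_mv = 20.0  # noise/interference riding on the signal, mV RMS
slew = {"well-matched S/PDIF cable": 500.0,  # mV per ns at the crossing
        "ordinary audio cable": 50.0}        # slower, rounded-off edge

for cable, mv_per_ns in slew.items():
    jitter_ns = noise_mv / mv_per_ns   # timing error = noise / slew rate
    print(f"{cable}: ~{jitter_ns:.2f} ns RMS of crossing-time jitter")
```

The slower the edge, the more any vertical noise smears the crossing time, which is exactly why a cable not designed for the job makes things worse.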

I know what you are thinking, especially if you have an engineering degree.  Why not capture the samples and then output them using a new, high-precision clock?  Indeed, that is what occurs in all of our digital systems, such as computers, where such variations exist just the same.  As long as the level of jitter is low, we capture the samples and all is well.

Unlike in your computer, in digital audio our job does not finish there.  We must convert those samples to analog.  Unfortunately, that conversion cannot occur using a new oscillator.  Let me repeat: you cannot use a new clock to drive the DAC.  You must instead use the timing as it arrived at the input to the system.  Our systems treat the source as the “master,” meaning it controls how fast or slow audio samples must be played.  This is because our sources can vary from the advertised sampling rate.  As an example, just because your Blu-ray player says the movie soundtrack is at “48 kHz,” it does not mean that there are exactly 48,000 samples per second.  When the movie was mastered on disc, as part of syncing audio with video, it is entirely possible that a few more or fewer samples per second were put in the package than this “nominal” value.  Therefore, even though your processor/DAC knows the audio stream’s nominal sampling rate, it cannot use that to derive its DAC clock.  Instead, it must “lock” onto the incoming source, measure precisely how many samples are arriving per second, and obey that source of timing.  If that number is 47,999 samples/second, then that is precisely what it must play.  Not one more, not one less.  If the DAC deviates, you will start to lose audio/video sync, as you would be falling behind or getting ahead of the video as created on disc.
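
To see why the DAC has no choice but to obey the source, consider what would happen if it simply played that 47,999 samples/second stream at exactly 48,000.  Here is a quick calculation; the ~45 ms lip-sync tolerance is my assumption for illustration, not a figure from this article:

```python
# What if the DAC free-ran at exactly 48,000 samples/second while the
# source delivered 47,999? The ~45 ms lip-sync tolerance is an assumed
# figure for illustration.
dac_rate = 48_000   # samples/second the DAC consumes
src_rate = 47_999   # samples/second the source delivers

drift_per_sec = (dac_rate - src_rate) / dac_rate  # A/V skew per second
tolerance = 0.045                                 # ~45 ms, assumed

print(f"drift: {drift_per_sec * 1e6:.1f} microseconds per second of playback")
print(f"noticeable desync after ~{tolerance / drift_per_sec / 60:.0f} minutes")
```

A mere one-sample-per-second mismatch accumulates into visibly broken lip sync in roughly half an hour of playback, and the buffer eventually runs dry besides.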

The same process exists even in the case of an audio CD, as playing too fast can cause you to run out of audio samples to play, while going too slowly leaves you with ever more data on your hands.  To make this system work, the DAC in your receiving device has a local clock that it varies slightly to match the incoming data rate, using the zero-crossing method I explained earlier.  Unfortunately, this means that the DAC follows the timing variations that exist upstream of it.

So the often-stated notion that jitter doesn’t matter because the receiver captures the data, puts it in a “buffer” (memory), and then plays it from there is wrong.  Yes, audio samples are captured and stored for convenience until played.  But no, that fact does not eliminate the effect of timing variations, as the DAC must output the samples using timing recovered before that buffer capture.

The engineers reading this will quickly point out that the receiver has a circuit called a PLL (phase-locked loop) whose job is to create an adjustable clock for the DAC, and that the PLL is able to filter out timing jitter on its input.  Alas, in practice the PLL is not able to filter out all the variations.  Most implementations remain sensitive to low-frequency jitter, which unfortunately is the type that can be audible.  The reason is that the lower the frequency the PLL filters, the slower it is to “lock” onto the incoming rate in order to capture any data.  In other words, the act of removing jitter causes the device to take longer to start playing anything.  That puts an upper bound on how much filtering can occur in the PLL.  You may have seen this in some processors/DACs where you change the input and it seems to take a long time before playback starts.
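
One way to picture this trade-off is to treat the PLL as a simple low-pass filter on the incoming timing; this is only a toy model of a real loop, and the 1 kHz loop bandwidth is an assumed value.  Jitter components well below the loop bandwidth pass through to the DAC clock nearly untouched, while lowering that bandwidth to filter more of them also stretches the lock time:

```python
import math

def jitter_passed(f_jitter: float, loop_bw: float) -> float:
    """Fraction of incoming jitter at f_jitter surviving a first-order
    PLL with loop bandwidth loop_bw (toy jitter-transfer model)."""
    return 1.0 / math.sqrt(1.0 + (f_jitter / loop_bw) ** 2)

loop_bw = 1_000.0  # Hz; assumed. Lowering it filters more jitter but
                   # lengthens lock time (roughly proportional to 1/loop_bw).
for fj in (100.0, 1_000.0, 10_000.0, 100_000.0):
    print(f"{fj:>9,.0f} Hz jitter: {jitter_passed(fj, loop_bw):6.1%} passes through")
```

With this model, 100 kHz jitter is attenuated a hundredfold while 100 Hz jitter sails through almost intact, which is the low-frequency sensitivity described above.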

There are smart but more complex solutions to the above problem.  After many years of building devices with the S/PDIF interface, manufacturers have mostly figured out how to suppress jitter to very low and acceptable levels.  Then came an interface called HDMI, which set us back many years in this respect.  Jitter over HDMI can be as much as 10X higher than over S/PDIF!
 
Making matters worse, jitter measurements of HDMI are hard to come by.  Audio magazines take jitter measurements, but in that world HDMI is a rare thing and, at any rate, they don’t test many home theater products.  Video magazines don’t usually focus on audio measurements, so there is no data there either.  The one exception is Paul Miller of UK’s Hifi News, who measures jitter on both interfaces.  It is eye-opening to see the measurements of mass market products done side by side this way.  Here is an example measurement for the Onkyo TX-NR5007 AV Receiver:

S/PDIF: 0.79 ns
HDMI: 4.87 ns

As you see, the jitter over HDMI is not only more than six times higher than over S/PDIF, it is also way above the maximum threshold for 16 bits of fidelity.  No wonder then that Paul gives the product a failing grade on that interface.  Here are the measurements on another AVR, the Yamaha RX-V3900, so that you don’t think the above is the exception:

S/PDIF: 0.183 ns
HDMI: 7.7 ns

Here we have excellent performance on S/PDIF, but HDMI is a whopping 42 times worse!  Not one measurement on Paul Miller’s site shows HDMI doing better than S/PDIF.  The common ratio is 10:1 in favor of S/PDIF.

Invariably, by the time I get to this point of the argument with someone, the conversation turns into “yes, but… is it audible?”  As unfair as it might be, I am going to punt that question.  Here is the thing: it doesn’t cost much to get this right.  It is like asking why it is bad to drive a car with a slight imbalance in the tires.  Why would I do that instead of getting the tires balanced?

Look at the Onkyo or Yamaha AVRs above.  If you are listening to a CD on your Blu-ray player, why not use the S/PDIF connection and, with it, enjoy much reduced jitter levels?  You get better fidelity without spending a dime.  When shopping for a new product, why not seek out jitter measurements and opt for the unit with better performance?  And what is wrong with pushing your equipment manufacturer to do better with HDMI than they are?  Careful attention to design here can reduce jitter.  It is the lack of consumer awareness that has led us to such excessive levels of jitter and less than optimal performance.

My wish in writing this article is not to convince you of the audibility of jitter anyway, but rather to argue for precision of communication when discussing the performance of products.  Consider a video analogy: if I have a 1080p display but sit too far from it, I may not be able to tell whether it is better than 720p.  Because of that, would you call that display 720p?  Of course not.  The same is true here.  We have to stop making arguments against cable effects and the like on the grounds that “digital is digital.”  The cable may not make an audible difference, but that is no excuse for describing the system’s operation incorrectly.  Architecturally, we have a rather complex system here, and understanding how it works is an important part of being an informed consumer.
 