Digital Audio: The Possible and Impossible
By Amir Majidimehr
[This article originally was published in Widescreen Review
Magazine.]
Pop quiz: Which of these statements would you say is true?
1. If you change the HDMI cable
between your Blu-ray player and AVR/Processor, the analog audio
output can change.
2. If you turn off the display
and video circuits in your Blu-ray player or AVR/Processor, the
analog audio output can change.
3. If you have a DAC with USB
and S/PDIF inputs, changing from one to the other can change the
analog output.
4. If you change from HDMI input
to S/PDIF on your AVR or Processor, the analog output can change.
I bet most of you reading this article consider all of the above
impossible, and any claim otherwise the talk of a charlatan selling
you “snake oil.” Well, I am here to tell you that the opposite is
true: all four statements are correct. And it is so as a matter of
science and engineering, not belief. Interestingly enough, the more
technical you are the more you are apt to think what I am telling
you is wrong. The reasons for this will become clear later.
Digital Audio is Not All Digital!
The first thing to understand is that digital audio is not
completely
“digital.” Yes, you read that right. The bit you are
familiar with is indeed digital, namely the numbers that represent
the audio sample amplitudes.
CD audio, for example, has 16 bits (two bytes) for each sample. Audio on
Blu-ray can go up to 24 bits or three bytes. As long as audio
samples are inside a digital device, they remain purely digital as
a set of samples to be reproduced sometime later. You can copy
them around as many times as you like and they stay pristine and
lossless just like your computer files. So far, they are the
digital data we believe them to be.
At the risk of stating the obvious, we do not hear digital samples
but rather analog waveforms. To create that analog waveform we
feed the digital audio samples to a device called Digital to Analog
Converter or DAC for short. That conversion needs two pieces
of information: the audio samples we have just described, and
their timing. The timing tells the DAC when that audio sample needs
to be output.
The reason timing is important is that
sampling theory mandates that our output samples match the precise
time we converted them from analog to digital originally.
If we maintain identical timing, we can show that digital systems have extremely high
levels of fidelity. Violate this principle though, and you
introduce distortion.
Think of the output waveform as a set of dots connected to each
other with the dots representing our digital audio samples. If
I take those audio samples and move them left and right, the
waveform would change. That is what jitter does. The
time defines the horizontal position of those dots and changing them
means distortion just the same.
As is often the case in real life, we cannot have perfect precision
in the timing that drives our DAC. Some amount of variability
always exists. We call this “jitter.” This is a
measurement of how much the timing source to the DAC varies from the
ideal value. The effect of jitter is determined by its swing
(amplitude), the frequency (of jitter and source audio) and its
spectrum/waveform (how it varies).
Jitter is often poorly characterized as a single figure of merit
with a unit of time such as “1.5 nanoseconds.” I say poorly
characterized because the other factors, especially the spectrum,
can be much more significant, often surpassing the amplitude of
jitter so represented. That said, I am going to
commit the same sin in the rest of this article and focus on that
one number so as to get through this introduction to jitter without
writing a whole book.
You may be puzzled as to why anyone would worry about
jitter if the typical numbers are measured in nanoseconds.
After all, how would 1.5 ns, which translates into 1.5 billionths of a
second make a difference in the output waveform? Surely our CD
audio, with relatively few samples of 44,100 per second, can’t be
sensitive to such small variations. It would be a mistake to
think so as the impact of jitter is frequency and sample depth
dependent. The latter is a big problem for us as audio samples
have comparatively high resolution. Let's take the example of 16 bits
in CD audio which translates into 65,536 sample values. Those are awfully small increments in the output
audio waveform whose magnitude is represented roughly by the voltage
of a single AA battery! 24 bits gets even crazier with 16,777,216
levels. Imagine the voltage of that single AA battery
now being divided into 16 million tiny divisions! For our digital
audio transmission/reproduction to be "perfect," it would need to
preserve all those increments.
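To put rough numbers on those increments, here is a minimal Python sketch. It assumes, purely for illustration, that full scale is the 1.5 V nominal voltage of an AA cell; the constant `FULL_SCALE_VOLTS` and the helper names are my own. A 16-bit word can take 2^16 = 65,536 distinct values (0 through 65,535), and a 24-bit word 2^24 = 16,777,216.

```python
# Assumption for illustration only: full scale equals the 1.5 V nominal
# voltage of an AA cell. Real line-level outputs differ.
FULL_SCALE_VOLTS = 1.5


def levels(bits):
    """Number of distinct values an unsigned word of this width can hold."""
    return 2 ** bits


def step_volts(bits, full_scale=FULL_SCALE_VOLTS):
    """Size of one quantization step when full scale is split evenly."""
    return full_scale / levels(bits)


if __name__ == "__main__":
    for bits in (16, 24):
        print(f"{bits}-bit: {levels(bits):,} levels, "
              f"one step = {step_volts(bits) * 1e6:.4f} microvolts")
```

Under this assumption a single 16-bit step is on the order of tens of microvolts, and a 24-bit step is a small fraction of one microvolt.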
How Much Jitter is Too Much?
One way we measure the impact of jitter is by making the simplifying
assumption of it being a sine wave (often it is not, but let’s
go with it). We can then compute how much jitter it takes to
generate distortion equal to
the voltage represented by one bit of our audio sample as described
earlier. For CD audio, it would be the amount of jitter that makes a 16-bit audio sample equivalent to 15 bits. The
idea here is that if we are trying to play 16-bit audio samples we
would ideally want our reproduction system to be transparent
to at least that level.
Jitter's effect on music is that it modulates all the tones in
it. Using mathematics which, in the interest of not boring you, I won’t
go into, we can model sinusoidal jitter as a signal that generates
two distortion products, one of which is the sum of the jitter and
our source frequency and the other, the difference between the two. Using this
model, and the fact that CD music has a response of roughly 20 KHz, we can compute
how much jitter it takes to overwhelm a single bit of our 16-bit
audio sample.
Performing the math, we arrive at the
unbelievable fact that jitter amplitude cannot be more than 0.5 nanoseconds!
You read that right. If timing variation of the DAC is more
than half a billionth of a second you generate enough distortion to
swamp one bit of your audio sample. It is not opinion that
says that. It is pure mathematics. And this is for the
simplest case of jitter, not the more complex but typical scenarios
where there are many jitter frequencies and spectrums acting on
your music.
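For readers who want to check the half-nanosecond figure, here is a minimal sketch of one common way to arrive at it. It rests on assumptions that the text does not spell out: the jitter is a single sine wave, its amplitude is expressed peak-to-peak, and the small-angle phase-modulation approximation holds, in which each sideband sits at pi * f * J_pp / 2 relative to the tone. The function name is hypothetical.

```python
import math


def max_jitter_pp(bits, signal_hz):
    """Peak-to-peak sinusoidal jitter, in seconds, whose sideband equals
    one LSB (2**-bits) relative to a full-scale tone at signal_hz.

    Small-angle approximation: sideband / tone = pi * f * J_pp / 2.
    """
    one_lsb = 2.0 ** -bits
    return 2.0 * one_lsb / (math.pi * signal_hz)


if __name__ == "__main__":
    limit = max_jitter_pp(bits=16, signal_hz=20_000)
    print(f"16-bit, 20 kHz: about {limit * 1e9:.2f} ns peak-to-peak")
```

With 16 bits and a 20 kHz tone the result lands at roughly half a nanosecond. Lower signal frequencies tolerate proportionally more jitter under this model, which is why the worst case is taken at the top of the audio band.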
To visualize this obscure explanation in practice, here is a
measurement of jitter acting on a single tone as created by the late
Julian Dunn which was published in the digital audio measurement
handbook of Audio Precision Audio Analyzer:
The 10 KHz tone is our source frequency. The distortion
sidebands (smaller peaks on each side) are purely the result of changing the
timing clock of the DAC by a sine wave at a frequency of 3 KHz
with an amplitude of 7.6 nanoseconds. This is for a system
with 20 bits of resolution and hence, has a noise floor of -120 dB.
Jitter distortion at -80 dB is hugely above that and has reduced the system performance to
well under what CD can do at 16 bits (96 dB). So even though 7.6 billionths of a
second sounds like a very small value, its impact in distorting our
audio signal is quite significant.
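The sidebands can be reproduced numerically. The sketch below is my own illustration, not Julian Dunn's measurement setup: it generates a 10 KHz tone whose sample instants are shifted by a 3 KHz sine wave, then measures the level at 13 KHz relative to the tone. It assumes a 48 KHz sample rate and treats the 7.6 ns figure quoted above as a peak-to-peak value; both are assumptions. Only the standard library is used.

```python
import cmath
import math

FS = 48_000         # sample rate in Hz (assumption)
N = 4_800           # 0.1 s of samples, so each tone spans whole cycles
TONE_HZ = 10_000    # source tone
JITTER_HZ = 3_000   # jitter frequency
JITTER_PP = 7.6e-9  # jitter amplitude, taken as peak-to-peak seconds (assumption)


def jittered_tone():
    """Samples of the tone evaluated at instants displaced by the jitter."""
    samples = []
    for n in range(N):
        t = n / FS
        dt = (JITTER_PP / 2) * math.sin(2 * math.pi * JITTER_HZ * t)
        samples.append(math.sin(2 * math.pi * TONE_HZ * (t + dt)))
    return samples


def amplitude_at(samples, freq_hz):
    """Amplitude of the component at freq_hz via a single-bin DFT."""
    acc = sum(x * cmath.exp(-2j * math.pi * freq_hz * n / FS)
              for n, x in enumerate(samples))
    return 2 * abs(acc) / len(samples)


def sideband_db():
    """Upper sideband level in dB relative to the tone."""
    x = jittered_tone()
    tone = amplitude_at(x, TONE_HZ)
    side = amplitude_at(x, TONE_HZ + JITTER_HZ)
    return 20 * math.log10(side / tone)


if __name__ == "__main__":
    print(f"Upper sideband: {sideband_db():.1f} dB relative to the tone")
```

Under these assumptions the sideband comes out in the neighborhood of -80 dB, in line with the figure discussed above. A matching sideband appears at 7 KHz, the difference frequency.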
Jitter Sources
The next question becomes what causes jitter. The answer is
that it can be any and all things. DACs are usually
inside what is normally a very noisy environment. The same
power line that feeds our DAC and its clock eventually also feeds
high-speed digital circuits such as microprocessors/DSPs,
video circuits, front panel
displays which often use high voltage circuits, etc. While there is
usually filtering on both the power lines and in the DAC clock
circuits, one cannot eliminate all variations and
those variations translate into tiny changes in the DAC clock.
For this reason, it is possible to reduce the jitter level by turning off
unnecessary circuits. The so-called
“Stereo Direct” mode and the like in some products perform this
function by turning off video circuits, front panels, and such.
This will often have a measurable effect on the DAC output
and its performance.
But that is not all. Some jitter is induced even before the
digital samples arrive at the receiving device/DAC. Take the
S/PDIF digital audio cable. The audio bits travel on it as a
series of “ones and zeros.” But no cable can reproduce the
perfect square wave with pulses going from zero to their final value
in zero time. The cable and its driving and receiving circuits
distort these waveforms and make them noisy pulses that take some
time to go from low value to high. To capture these values,
despite such electrical distortion, the receiver samples them at
“zero crossing.” That is the time that a waveform crosses a
threshold that tells us if it is a “one” or “zero.”
The above lets us capture the digital sample values but when it comes to
detecting their timing we are faced with a tough situation.
You can see this visually in this second measurement by Julian Dunn
as he shows what happens when you try to use an ordinary audio cable
for S/PDIF rather than one designed for that purpose:
For starters on the left, you see that there is nothing resembling
perfect square waves or pulses of zeros and ones. The image on the right is the amplified
version of the zero crossing point. You see that as the
waveform moves slightly up and down (which it does routinely as the
bits change), the precise moment that it
crosses our horizontal reference line also changes, and with it the
timing as seen by the receiver. We call this “cable
induced” jitter. Now you know why I said at the beginning that
it is possible that changing cables alone could cause the analog
output of the system to change. The cable change will change
the above waveforms and with it, the jitter transmitted to the
external DAC (e.g. from your Blu-ray player to your AVR).
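The link between a slow edge and a timing shift can be sketched with simple arithmetic. If the edge is approximated as a straight ramp, a small vertical disturbance moves the threshold crossing by the disturbance divided by the slew rate. All numbers below (0.5 V swing, 10 mV disturbance, 5 ns and 50 ns rise times) are hypothetical values chosen for illustration, not measurements from the article.

```python
def crossing_shift_seconds(noise_volts, swing_volts, rise_time_s):
    """Shift in threshold-crossing time for an edge modeled as a linear ramp.

    slew rate = swing / rise time; timing shift = disturbance / slew rate.
    """
    slew_rate = swing_volts / rise_time_s  # volts per second
    return noise_volts / slew_rate


if __name__ == "__main__":
    SWING = 0.5   # volts, hypothetical signal swing
    NOISE = 0.01  # volts, hypothetical vertical disturbance
    for label, rise in (("fast edge (5 ns rise)", 5e-9),
                        ("slow edge (50 ns rise)", 50e-9)):
        shift = crossing_shift_seconds(NOISE, SWING, rise)
        print(f"{label}: crossing moves by {shift * 1e9:.2f} ns")
```

The same disturbance produces ten times more timing shift on the edge that is ten times slower, which is the mechanism by which a cable that rounds off the pulses raises jitter at the receiver.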
I know what you are thinking, especially if you have an engineering
degree. Why not capture the samples and then output them using
a new, high precision clock? Indeed that is what occurs in all
of our digital systems such as computers where such variations exist
just the same. As long as the level of jitter is low we
capture the samples and all is well.
Unlike your
computer, in digital audio our job does not finish there. We
must convert those samples to analog. Unfortunately that
conversion cannot occur using a new oscillator. Let me repeat
again: you cannot use a new clock to drive the DAC. You must
instead use the timing as it was arriving on the input to the
system. Our systems put the source as the "master" meaning it
controls
how fast or slow audio samples must be played. This is because
our sources can vary from the advertised sampling rate. As an
example, just because your
Blu-ray player says the movie soundtrack is at “48 KHz,” it does not
mean that there are 48,000 samples per second. When the movie
was mastered on disc, as part of syncing audio with video, it is
entirely possible that a few more or fewer samples per second were
put in the package than this “nominal” value. Therefore, even though your processor/DAC knows the audio stream
sampling rate, it cannot use it to derive its DAC clock.
Instead, it must “lock” onto the incoming source and attempt to
measure precisely how many samples are arriving per second and obey
its source of timing. If that number is 47,999 samples/second then that
is precisely what it must play. Not one more, not one less.
If the DAC deviates you will start to lose audio/video sync as you
would be falling behind or getting ahead of the video as created on
disc.
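A quick calculation shows how fast such a small mismatch adds up. This sketch assumes the source delivers exactly 47,999 samples per second while a free-running DAC plays exactly 48,000; the function name is hypothetical.

```python
def drift_ms(source_rate, playback_rate, elapsed_s):
    """Audio/video offset in milliseconds after elapsed_s seconds.

    A positive result means the DAC has consumed more audio than the
    source has delivered, so it is running ahead of the picture.
    """
    surplus_samples = (playback_rate - source_rate) * elapsed_s
    return 1000.0 * surplus_samples / source_rate


if __name__ == "__main__":
    for minutes in (1, 10, 60, 120):
        offset = drift_ms(47_999, 48_000, minutes * 60)
        print(f"after {minutes:3d} min: {offset:6.1f} ms")
```

A one-sample-per-second error grows to roughly 75 ms over an hour, and it keeps growing for as long as playback continues. That is why the DAC clock has to follow the source rather than run on its own.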
The same process exists even in the case of an audio CD as playing
too fast can cause you to run out of audio samples to play or, if
you go too slowly, you will eventually have too much data on your
hands. To make this system work, the DAC in your receiving
device has a local clock that it varies slightly to match the
incoming data rate using the zero
crossing method I explained earlier. Unfortunately this means
that the DAC follows the timing
variations that exist upstream from it.
So the often stated notion that jitter doesn’t matter because the
receiver captures the data and puts it in a “buffer” (memory) and
then plays it from there is wrong. Yes, audio samples are
captured and stored for convenience until played. But no, that fact
does not eliminate the effect of timing variations as the DAC must
output the samples using the timing prior to that buffer capture.
The engineers reading this will quickly point out that the receiver has a
circuit called a PLL (phase-locked loop) that has the job of creating an adjustable clock
for the DAC. And that the PLL is able to filter out timing
jitter on its input. Alas, in practice the PLL is not able to
fully filter all the variations. Most implementations remain
sensitive to low frequency jitter which unfortunately is the type
that can be audible. The reason for this is that the lower the
frequency that the PLL filters, the slower it is able to “lock” to
the incoming rate in order to capture any data. In other
words, the act of removing jitter causes the device to take longer
to start playing anything. That then puts an upper bound on
how much filtering can occur in the PLL. You may have seen this in
some processors/DACs where you change the input and it seems to take
a long time for it to start playing.
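The trade-off between filtering and lock time can be illustrated with a deliberately simplified model. The sketch below treats the PLL as a first-order low-pass filter with a corner frequency; real receiver PLLs are usually more elaborate, so take this as an assumption-laden illustration rather than a description of any product. The function names and the 1% settling criterion are my own choices.

```python
import math


def attenuation_db(jitter_hz, corner_hz):
    """Jitter attenuation of a first-order low-pass loop at jitter_hz."""
    return -10 * math.log10(1 + (jitter_hz / corner_hz) ** 2)


def settle_time_s(corner_hz, tolerance=0.01):
    """Time for a first-order step response to settle within tolerance."""
    tau = 1 / (2 * math.pi * corner_hz)  # time constant in seconds
    return -tau * math.log(tolerance)


if __name__ == "__main__":
    JITTER_HZ = 100  # hypothetical low-frequency jitter component
    for corner in (1_000, 100, 10, 1):
        print(f"corner {corner:5d} Hz: "
              f"{attenuation_db(JITTER_HZ, corner):6.1f} dB at {JITTER_HZ} Hz, "
              f"settles in about {settle_time_s(corner) * 1000:8.2f} ms")
```

In this model, moving the corner lower buys more attenuation of the 100 Hz jitter component, but the settling time grows in direct proportion. A corner of 1 Hz already takes most of a second to settle.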
There are smart but more complex solutions to the above
problem. After many years of building devices with the S/PDIF
interface, manufacturers have mostly figured out how to suppress
jitter to very low and acceptable levels. Then came this
interface called HDMI which set us back many years in this respect.
Jitter over HDMI can be as much as 10X higher than S/PDIF!
Making matters worse, jitter measurements of HDMI are hard to come
by. Audio magazines take jitter measurements but in that world
HDMI is a rare thing and at any rate, they don’t test many home
theater products. Video magazines don’t usually focus on audio
measurements for the most part, so there is no data there either.
The one exception is Paul Miller of UK’s Hifi News who performs
measurements of jitter on both interfaces. It is eye opening
to see the measurements of mass market products done side by side
this way. Here is an example measurement for the Onkyo
TX-NR5007 AV Receiver:
S/PDIF: 0.79 ns
HDMI: 4.87 ns
As you see, the jitter over HDMI is not only more than six times
higher than S/PDIF, it is also way above the maximum threshold for
16 bits of fidelity. No wonder then that Paul gives the
product a failing grade on that interface. Here are the
measurements on another AVR, the Yamaha RX-V3900, so that you don’t think the above is the
exception:
S/PDIF: 0.183 ns
HDMI: 7.7 ns
Here we have excellent response on S/PDIF but HDMI is a whopping 42
times worse! There is not one measurement on Paul Miller’s
site that has better measurements for HDMI vs. S/PDIF. The
common ratio is 10:1 in favor of S/PDIF.
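To relate these figures to the 16-bit threshold discussed earlier, the sketch below applies the same simplified sine-wave model: it treats each reported number as peak-to-peak sinusoidal jitter acting on a 20 KHz tone and asks how many bits the resulting sideband corresponds to. That treatment is an assumption on my part; the published measurements may be defined differently, so the output is a rough indication only.

```python
import math

# Jitter figures quoted above, in seconds.
MEASUREMENTS = {
    "Onkyo TX-NR5007": {"S/PDIF": 0.79e-9, "HDMI": 4.87e-9},
    "Yamaha RX-V3900": {"S/PDIF": 0.183e-9, "HDMI": 7.7e-9},
}


def equivalent_bits(jitter_pp_s, signal_hz=20_000):
    """Bit depth at which one LSB equals the jitter sideband level.

    Small-angle model: sideband / tone = pi * f * J_pp / 2.
    """
    ratio = math.pi * signal_hz * jitter_pp_s / 2
    return -math.log2(ratio)


if __name__ == "__main__":
    for product, figures in MEASUREMENTS.items():
        ratio = figures["HDMI"] / figures["S/PDIF"]
        print(f"{product}: HDMI jitter is {ratio:.1f}x the S/PDIF figure")
        for interface, jitter in figures.items():
            print(f"  {interface}: {jitter * 1e9:.3f} ns, "
                  f"about {equivalent_bits(jitter):.1f} bits at 20 kHz")
```

Under this model the HDMI figures for both receivers fall several bits short of 16-bit transparency at 20 KHz, while the Yamaha's S/PDIF input clears it.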
Invariably, by the time I get to this point of the argument with
someone, the conversation turns into “yes but… is it audible?”
As unfair as it might be, I am going to punt that question.
Here is the thing. It doesn’t cost much to get this right. It
is like asking me why it is bad to drive a car with a slight
imbalance in the tires. Why should I do that instead of
getting the tires balanced?
Look at the Onkyo or Yamaha AVR above. If you are listening
to a CD from your Blu-ray player, why not use the S/PDIF connection
and with it, enjoy much reduced jitter levels? You get better
fidelity without spending a dime. When shopping for a new
product, why not search out measurements of jitter and opt for the
one with better performance? And what is wrong with pushing
your equipment manufacturer to do better with HDMI than they are?
Careful attention to the design here can reduce jitter. It is
the lack of consumer awareness that has led us to such excessive
levels of jitter and less than optimal performance.
My aim in writing this article is not to convince you that jitter is
audible but rather to encourage precision when we
discuss the performance of products. Think of this video
analogy. If I have a 1080p display but I sit too far from it,
I may not discern whether it is better than 720p. Because of
that, would you call that display 720p? Of course not. The
same is true here. We have to stop making arguments against
cable effects and such because “digital is digital.” The cable
may not make an audible difference but that is no excuse for
describing the system operation incorrectly in that way.
Architecturally, we have a rather complex system here and understanding how
it works is an important part of being an informed consumer.