Understanding Audio/Video Formats

By Amir Majidimehr

Note: this article originally was published in Widescreen Review. This is a revised and updated version.

Get ready to learn the most fundamental concepts in audio/video, which, sadly, most people either don’t understand or get badly wrong. I can’t blame them, as understanding these concepts properly requires a fairly deep knowledge of how content streams are captured, encoded, and transmitted. This is useful knowledge to have when you want to know how many songs or movies you can store on your 16 Gigabyte tablet or 4 Terabyte video server. It turns out that with some simple math and a few basic concepts we can learn everything we need here.

Let’s start at the top. My assumption in this article is that most of you know that there are eight bits in one byte. Okay, if you didn’t, don’t feel bad, as even people who are supposed to know, such as magazine writers and technical people in the field, often confuse bits and bytes. As you can imagine, with almost an order of magnitude difference between these two terms, it is super important to get them right.

To avoid this confusion I rarely abbreviate bit as “b” and byte as “B,” as is often done, and will instead spell them out as bits and Bytes. Since we usually deal with large numbers of these, the prefixes Kilo, Mega, and Giga are added to represent thousands, millions, and billions, respectively.

Since these are computer concepts, the units here are not decimal but binary. This means that “kilo” is actually 1024, not 1000. Throughout this article I will be using the familiar decimal versions of these numbers. The world won’t come to an end if we are off by 2.4 percent at the kilo level (and somewhat more for mega and giga).

Digital Audio Formats

To record audio on CDs, we have to convert the analog audio signal to digital. The “sampling rate” for this application is 44.1 KHz, which means that we digitize the analog value 44,100 times per second (frequency is measured in Hz, or cycles per second, which in this situation means samples per second). CD is stereo, which means it has two independent channels. Each audio sample in turn has 16 bits of resolution, or two bytes. So at any instant, we have four bytes of information, and there are 44,100 such instants per second.

If we multiply 44.1 KHz (the sampling rate) by two (number of channels), and then by 16 (bits of resolution), we get the data rate of CD in bits per second. The result is 1,411 Kbits/sec (“kbps”) or roughly 1.4 Mbit/sec (Mbps). Converting this to bytes, we get about 176 KBytes/sec.
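The multiplication above can be written out as a tiny Python sketch. The function name is just an illustrative choice, and the decimal prefixes follow the convention used throughout this article.

```python
# Uncompressed (PCM) audio data rate:
# sampling rate x number of channels x bits per sample.
def pcm_bits_per_second(sample_rate_hz, channels, bits_per_sample):
    return sample_rate_hz * channels * bits_per_sample

cd_bps = pcm_bits_per_second(44_100, 2, 16)  # CD: 44.1 KHz, stereo, 16 bits
cd_bytes_per_sec = cd_bps / 8                # eight bits in one byte

print(f"{cd_bps / 1000:.1f} Kbits/sec")             # 1411.2 Kbits/sec
print(f"{cd_bytes_per_sec / 1000:.1f} KBytes/sec")  # 176.4 KBytes/sec
```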

When a CD drive is used for data the industry uses 150 Kbytes/sec as its reference baseline speed accounting for some overhead used for error correction and such. This has become a marketing spec for optical media where the speed of the drive is specified by a number followed by “X” with X representing this 150 Kbytes/sec. So if you see a “20X” drive, it means that it can read at 20 * 150, or 3 MBytes/sec. I said this was a marketing spec because in reality the drive speed is variable depending on the track being read so average speed across the entire media (e.g. when you rip a CD) is lower than this number.

So far we are in the traditional A/V realm. Let’s jump into the new era. Here, we are talking about things like “128 kbps MP3.” This audio stream is exactly what it says it is. That is, the music file is represented as 128 Kbit/sec, compressed in MP3 format. This compares to our original CD source at 1.4 Mbit/sec.

To figure out how much the file is compressed, we simply divide the MP3’s data rate (128 Kbits/sec or 0.128 Mbits/sec) by the CD’s data rate (1.4 Mbits/sec) and arrive at 0.09. In other words, the MP3 file represents only 9 percent of the data of the original source. Perhaps more interesting is the other side of that number, which tells us that a whopping 91 percent of the original information has been thrown out and you are hearing what is left! While as audiophiles we want our music to be of higher fidelity than 128 kbps MP3, it is remarkable how much quality is preserved relative to how few bits are contained in that file.
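As a quick check of that division, here is a minimal Python sketch. It uses the exact CD rate of 1,411,200 bits/sec rather than the rounded 1.4 Mbits/sec, and the helper name is only illustrative.

```python
# Fraction of the source data rate that survives compression.
def fraction_kept(compressed_bps, source_bps):
    return compressed_bps / source_bps

cd_bps = 44_100 * 2 * 16   # uncompressed CD audio, bits per second
mp3_bps = 128_000          # 128 kbps MP3

kept = fraction_kept(mp3_bps, cd_bps)
discarded = 1 - kept

print(f"kept {kept:.0%}, discarded {discarded:.0%}")  # kept 9%, discarded 91%
```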

Keep in mind that the sampling rate and the bit rate of a compressed file are two entirely different things. A 128 Kbits/sec MP3 has the same sampling rate as the uncompressed music. The same is true of 256 Kbits/sec and 384 Kbits/sec MP3s. They are all compressed versions of the same 44.1 KHz audio stream. So don’t make the common mistake of referring to the bit rate of the file as its sampling rate.

Back to our original uncompressed CD: if we multiply its 176 Kbytes/sec data rate by 3,600 (seconds in one hour), we get the total space consumed by one hour of music, which is about 634 MBytes (rounded to 650 MBytes to include overhead).

Now let’s apply the same math to the 128 Kbits/sec MP3. The 0.128 Mbit/sec must be divided by eight to convert it to bytes and then multiplied by 3,600 to get the capacity requirements. This adds up to 57.6 MBytes/hour, showing the remarkable saving in storage size when using “lossy” audio compression. This is the reason that solid-state “flash”-based music players and phones can hold so much compressed music. For a typical three-minute song, a 128 kbps MP3 would take up 2.88 MBytes of space. So even if you have a small 4-Gigabyte flash memory player, it can hold 4,000 / 2.88 or 1,388 songs. The same player would only hold about 125 songs in the original uncompressed format of the CD.
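The storage arithmetic can be sketched in a few lines of Python. The 4,000 MByte player size and the three-minute song length come from the example above; the function and variable names are illustrative.

```python
# Storage needed for a stream: bits/sec x seconds, converted to MBytes
# (decimal prefixes, as used throughout this article).
def megabytes(bits_per_second, seconds):
    return bits_per_second * seconds / 8 / 1_000_000

mp3_hour = megabytes(128_000, 3600)    # one hour of 128 kbps MP3
mp3_song = megabytes(128_000, 180)     # three-minute MP3 song
cd_song = megabytes(1_411_200, 180)    # same song, uncompressed CD audio

player_mb = 4000                       # 4-Gigabyte player
songs_mp3 = int(player_mb // mp3_song)
songs_cd = int(player_mb // cd_song)

print(f"{mp3_hour:.1f} MBytes/hour, {mp3_song:.2f} MBytes/song")
print(f"{songs_mp3} MP3 songs vs. {songs_cd} uncompressed songs")
```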

Let’s put this in the context of audio for DVD and Blu-ray Disc. In this application we typically have “5.1” channels of content to create a surround experience. The notation means we have five (5) full frequency channels and a sixth low bandwidth channel indicated by the “.1.” Note that we don’t really have 5.1 channels as a mathematical figure because the low-frequency channel does not equate to 10 percent of the full bandwidth main channels. But for the sake of simplifying our life, let’s pretend that it does use 10 percent as many bits to do its job and use the number 5.1 just like we used 2.0 for stereo computations.

For the sampling rate of surround music, the standard in the industry is to deliver 48 KHz as opposed to the 44.1 KHz used in CDs. The sample resolution can be 16, 20, or 24 bits. Assuming 20-bit samples, the math becomes 5.1 (channels) * 48 KHz (sampling rate) * 20 (bits of resolution) = 4,896 Kbits/sec, or about 4.9 Mbits/sec. To figure this out for 16 and 24 bit audio samples, simply swap out the 20 for those numbers.

If we take the uncompressed data rate of 4.9 Mbit/sec and divide it by 8 to get bytes/sec, then multiply by 3,600, we get a capacity requirement of 2.2 GBytes/hour. A two-hour movie would then need 4.4 Gigabytes just for the audio, or nearly half the capacity of a standard DVD!
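Here is the same surround-audio math as a short Python sketch, run for all three sample resolutions. It keeps the article's simplification of treating the low-frequency channel as 0.1 of a full channel; the function names are illustrative.

```python
# Uncompressed 5.1 surround audio at 48 KHz, treating ".1" as a tenth
# of a full channel (a simplification, as noted in the text).
def surround_bps(bits_per_sample, channels=5.1, sample_rate_hz=48_000):
    return channels * sample_rate_hz * bits_per_sample

def gbytes_per_hour(bits_per_second):
    return bits_per_second / 8 * 3600 / 1e9

for bits in (16, 20, 24):
    bps = surround_bps(bits)
    print(f"{bits}-bit: {bps / 1e6:.1f} Mbits/sec, "
          f"{gbytes_per_hour(bps):.1f} GBytes/hour")
# 16-bit: 3.9 Mbits/sec, 1.8 GBytes/hour
# 20-bit: 4.9 Mbits/sec, 2.2 GBytes/hour
# 24-bit: 5.9 Mbits/sec, 2.6 GBytes/hour
```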

Movies on DVD are therefore compressed using Dolby® Digital (AC-3) compression at a typical data rate of 448 kbps (or optionally using DTS at higher data rates). Let’s compute the compression ratio as we did with MP3. We simply repeat the same math by dividing 0.448 (data rate of Dolby Digital) by 4.9 (data rate of the uncompressed audio) and get 0.09. So as in the case of MP3, quite a bit (91 percent) is thrown out in the process of compressing the multichannel audio. DTS® Digital Surround™, in contrast, at 1.5 Mbit/sec, would represent 30 percent of the original, or a very mild 3:1 compression ratio (although “half-rate” DTS at 750 Kbit/sec still applies a reasonable amount of compression at 6:1).

Let’s put things in perspective in a different way. If we divide 448 Kbits/sec by 5.1 channels, we get 88 Kbits/sec allocated to each channel on average. If we had a stereo track at the same per-channel rate, it would be two (2) channels * 88 (data rate), or 176 Kbits/sec. So, this Dolby Digital encoding has a roughly 37 percent higher data rate than the 128 Kbits/sec MP3 (176 / 128 = 1.375). While the compression techniques are different between the two formats, one can still see that the 448 kbps Dolby Digital is able to “breathe more,” as far as data rate is concerned, compared to 128 kbps MP3. So in theory, this encoding is more transparent to the source.
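The per-channel comparison is easy to reproduce. The sketch below uses unrounded intermediate values, so its figures differ slightly from the rounded 88 and 176 Kbits/sec in the text; the variable names are illustrative.

```python
# Average bits available per channel in 448 kbps Dolby Digital 5.1,
# and the equivalent stereo rate for comparison with 128 kbps MP3.
dolby_kbps = 448
mp3_kbps = 128

per_channel_kbps = dolby_kbps / 5.1          # about 87.8
stereo_equiv_kbps = 2 * per_channel_kbps     # about 175.7
ratio_to_mp3 = stereo_equiv_kbps / mp3_kbps  # about 1.37

print(f"{per_channel_kbps:.1f} Kbits/sec per channel")
print(f"{stereo_equiv_kbps:.1f} Kbits/sec stereo equivalent")
print(f"{ratio_to_mp3 - 1:.0%} higher than 128 kbps MP3")
```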

When it comes to Blu-ray Disc, we have a third option: lossless encoding. This is a process by which the audio data rate is reduced but full fidelity is maintained. Think of it as compressing files on your computer and getting them back intact after decompression. Dolby TrueHD and DTS-HD™ Master Audio are both lossless surround audio formats supported optionally in Blu-ray Disc for this purpose. The price we pay here is that lossless compression is far less efficient than lossy techniques such as MP3 and Dolby Digital. Achieved compression ratios are about 2:1 for music, reaching up to 3:1 for multichannel movie sound. The efficiency becomes higher with more channels and, non-intuitively, lower at higher sample resolutions (e.g. 24 bits compared to 16 bits). Using a rough figure of 2.5:1, we save a whopping 2.6 GBytes of space for our two-hour movie.
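To see how the saving changes across the quoted range of lossless ratios, here is a small Python sketch. It starts from the 4.4 GByte two-hour soundtrack computed earlier; the function name is illustrative.

```python
# Space saved by lossless compression at a given N:1 ratio.
def lossless_saving_gb(uncompressed_gb, ratio):
    compressed_gb = uncompressed_gb / ratio
    return uncompressed_gb - compressed_gb

soundtrack_gb = 4.4  # two hours of uncompressed 20-bit 5.1 audio

for ratio in (2.0, 2.5, 3.0):
    saved = lossless_saving_gb(soundtrack_gb, ratio)
    print(f"{ratio}:1 saves {saved:.2f} GBytes")
# 2.0:1 saves 2.20 GBytes
# 2.5:1 saves 2.64 GBytes
# 3.0:1 saves 2.93 GBytes
```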

Digital Video Formats

As with our audio example, we first need to understand how the analog video signal is captured and encoded. In this case let's review how standard-definition (SD) video is encoded for broadcast applications and DVD. Here we are talking about 720 horizontal pixels and 486 vertical pixels, which is often rounded to 720 x 480. Multiply the two numbers and we arrive at 345,600 pixels in each frame of video. Converting this to millions and rounding, we get 0.3 million pixels per frame. Yes, what you are thinking is right: the resolution of the 6 megapixel camera in your phone is a whopping 20 times higher than SD video! But wait, are you watching that DVD image with the same resolution on a 50-inch display or an 8-foot projection screen? Hmmm.

Movies are recorded at 24 frames per second. So in every second we have 24 * 345,600 = 8,294,400 pixels. To compute the data rate we need to know the number of bits allocated to each of these pixels. Computer users are comfortable with the concept of RGB color pixels, which use 24 bits for each pixel. Eight (8) bits are allocated to each of the Red, Green, and Blue "sub-pixels."

When it comes to storage and transmission of video for home use, we do not use RGB but a different scheme that separates color from the black-and-white portion of the video signal. We do this because our eyes are less sensitive to color resolution than to black-and-white resolution. By keeping the two separate we can then decide how much data to allocate to each one. The black-and-white sample is called “Luminance” and the color sample “Chrominance.” We shorten the former to "Luma" and the latter to "Chroma." You should do the same if you would like people to think you are a video engineer :).

The notation that tells us how much data is allocated to luma and chroma is a triplet such as 4:4:4, which in this example means that the color and black-and-white samples have the same resolution. For the reason just explained, this is not a very common format, even in the broadcast world. The most often used format there is 4:2:2, which means that we use half as much bandwidth for color. This means that a sharp transition between two colors will look softer than the same transition between two shades of gray.

The format used for distribution of content to consumers, whether over digital broadcast, optical disc (SD or HD), or the Internet/IPTV, is 4:2:0. This means that we have a quarter of the resolution for color, as compared to black and white. This is a fair bit of compromise in color fidelity, but given that you probably did not know about it and still enjoyed high-definition movies at home, the eye sensitivity factor is doing its job.

With me so far? Good, because we are not done yet. We still don’t know how many bits each color or black-and-white sample occupies. In the computer RGB world each color component typically has eight bits. In broadcast/professional video, we use either 10 or 8 bits, with the former being preferred. For delivery to consumers the 8-bit format is the only one used.

In the case of the 4:2:0 video encoding used in DVD, Blu-ray, and Internet delivery, then, each pixel takes 12 bits on average: 8 bits for Luma and an average of 4 bits for Chroma. The color samples in reality are eight (8) bits each, but since they are spaced out, they average to 4 bits per pixel.
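One way to see where the 12-bit average comes from is to count samples in a 2 x 2 block of pixels. Every pixel has its own luma sample, while the two chroma components (commonly written Cb and Cr, names not used in the text above) are shared: in 4:2:0 the whole block shares one pair, in 4:2:2 each horizontal pair of pixels shares one, and in 4:4:4 nothing is shared. A minimal Python sketch with illustrative names:

```python
# Number of (Cb, Cr) chroma pairs carried per 2x2 block of pixels.
CHROMA_PAIRS_PER_2X2 = {"4:4:4": 4, "4:2:2": 2, "4:2:0": 1}

def avg_bits_per_pixel(scheme, bits_per_sample=8):
    luma_samples = 4                                   # one per pixel
    chroma_samples = 2 * CHROMA_PAIRS_PER_2X2[scheme]  # Cb and Cr
    total_bits = (luma_samples + chroma_samples) * bits_per_sample
    return total_bits / 4                              # four pixels per block

for scheme in ("4:4:4", "4:2:2", "4:2:0"):
    print(scheme, avg_bits_per_pixel(scheme), "bits/pixel")
# 4:4:4 24.0 bits/pixel
# 4:2:2 16.0 bits/pixel
# 4:2:0 12.0 bits/pixel
```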

Now we are ready to compute the data rate of our movie stream. Multiplying the 8,294,400 pixels per second of SD video by 12 bits per pixel and rounding, we get 100 Mbits/sec. To put this in context, the United States ATSC digital broadcast standard for high-definition video provides for 19 Mbits/sec, yet we just learned that standard-definition video in the uncompressed domain takes more than five times as much to transmit! At the risk of stating the obvious, we are talking about much bigger numbers than audio.
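The SD video computation, as a short Python sketch. The frame size, frame rate, and 12 bits per pixel are the figures from the text; the function name is illustrative.

```python
# Uncompressed video data rate:
# width x height x frames per second x average bits per pixel.
def video_bits_per_second(width, height, fps, bits_per_pixel):
    return width * height * fps * bits_per_pixel

sd_bps = video_bits_per_second(720, 480, 24, 12)  # 4:2:0, 8-bit samples
atsc_bps = 19_000_000                             # ATSC broadcast channel

print(f"{sd_bps / 1e6:.1f} Mbits/sec")                # 99.5 Mbits/sec
print(f"{sd_bps / atsc_bps:.1f}x the ATSC data rate")  # 5.2x the ATSC data rate
```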

Speaking of high-definition (HD) video, let’s compute its data rate. The highest approved resolution for today’s HD home delivery of video is 1920 x 1080. This is called either 1080i or 1080p, depending on whether each frame is sent as two half-frames (interlaced) or as one full frame (progressive). Movies are stored progressively, so let's assume that for simplicity. Using the newly learned math, we get 2,073,600 pixels per frame, or about 2 million. Comparing that to SD video, we see that we have six (6) times more resolution. This is a pretty big step up, but at two megapixels it still pales in comparison to even the cheapest digital still camera.

Continuing on with our math homework, we arrive at 597 Mbits/sec for the total data rate of this 1080, 4:2:0 source at 24 frames/second. Converting this to bytes gets us 75 Megabytes/sec. Therefore, a two-hour movie (7,200 seconds) takes about 540 GBytes of storage if uncompressed! Now contrast this with just 9 GBytes available in DVD, and 25/50 Gigabytes in Blu-ray Disc (single or double layer), and you see that compression has to be your friend or we would never be able to deliver such high-resolution video to you at home. And unfortunately lossless video compression need not apply, as we are way past the 2:1 or 3:1 that it can provide.
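The HD numbers follow from the same formula. This Python sketch works in unrounded values, so its results sit slightly below the rounded 75 MBytes/sec used in the text; the variable names are illustrative.

```python
# Uncompressed 1080p/24, 4:2:0 (12 bits per pixel on average).
hd_bps = 1920 * 1080 * 24 * 12
hd_mbytes_per_sec = hd_bps / 8 / 1e6

movie_seconds = 2 * 3600
movie_gbytes = hd_bps / 8 * movie_seconds / 1e9

print(f"{hd_bps / 1e6:.0f} Mbits/sec")          # 597 Mbits/sec
print(f"{hd_mbytes_per_sec:.1f} MBytes/sec")     # 74.6 MBytes/sec
print(f"{movie_gbytes:.0f} GBytes for two hours")
```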

Once compressed, a typical movie encoded for DVD uses an average of about 5 Mbits/sec using MPEG-2 compression. Dividing this by the source data rate of 100 Mbits/sec we get 20:1 compression or 5 percent of the source data rate is represented in the final output! We should be thankful that video compression is as effective as it is.

Internet delivery of content uses more advanced video compression technology, such as VC-1/WMV-9 or MPEG-4 AVC. Unfortunately, the added efficiency is used to lower the data rate, and severely so, rather than to improve fidelity. Typical bit rates used for SD video delivery may be around 2 Mbits/sec, representing 50:1 compression. In this case, an incredible 98 percent of the source is thrown away! Achieved video fidelity often falls short of DVD.

For high-definition video on Blu-ray Disc, there is wide variation in data rates and movie sizes due to the much larger disc capacity and the amount of auxiliary content that may exist. As a decent guess, let’s use an average data rate of 20 Mbits/sec. This translates to about 3 percent of the uncompressed 1080p/24 video source (20/600). As with DVD, though, pretty good quality can be achieved despite such high levels of compression because of the tremendous amount of redundancy in video. Imagine a person walking in front of a house. The house never moves, so we can transmit it once and keep repeating it.
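The video compression ratios quoted in the last few paragraphs can be tabulated side by side. This Python sketch uses the unrounded source rates and the typical compressed rates given in the text (5, 2, and 20 Mbits/sec); the labels and names are illustrative.

```python
SD_SOURCE_BPS = 720 * 480 * 24 * 12      # uncompressed SD, 4:2:0
HD_SOURCE_BPS = 1920 * 1080 * 24 * 12    # uncompressed 1080p/24, 4:2:0

def compression_ratio(source_bps, compressed_bps):
    return source_bps / compressed_bps

streams = [
    ("DVD (MPEG-2)", SD_SOURCE_BPS, 5_000_000),
    ("Internet SD", SD_SOURCE_BPS, 2_000_000),
    ("Blu-ray HD", HD_SOURCE_BPS, 20_000_000),
]

for name, source_bps, compressed_bps in streams:
    ratio = compression_ratio(source_bps, compressed_bps)
    print(f"{name}: {ratio:.0f}:1, {1 / ratio:.1%} of source kept")
# DVD (MPEG-2): 20:1, 5.0% of source kept
# Internet SD: 50:1, 2.0% of source kept
# Blu-ray HD: 30:1, 3.3% of source kept
```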

Now let’s look at some communication speeds in order to figure out what quality we can fit in those channels. A 1.5 Mbit/sec DSL broadband Internet connection has a maximum data rate of exactly what it states: 1.5 Mbits/sec. Recall that uncompressed SD video has a data rate of 100 Mbits/sec, or about 67 times higher. This means that a two-hour movie would take roughly 133 hours to download without compression. At the 5 Mbits/sec compressed data rate used for DVD video, the time drops immensely to a little under seven hours. With advanced video compression and an encode at 1.5 Mbits/sec, the movie could be downloaded in real time (i.e. two hours in our example), or we could "stream" it and watch the video without storing it.
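Download time scales with the ratio of the stream's data rate to the link's data rate. Here is a minimal Python sketch, assuming the link runs at its full rated speed the whole time (real connections usually do not); the function name is illustrative.

```python
# Hours needed to download a movie of a given running time when the
# stream's data rate differs from the link's data rate.
def download_hours(stream_bps, link_bps, duration_hours):
    return duration_hours * stream_bps / link_bps

dsl_bps = 1_500_000  # 1.5 Mbit/sec DSL
movie_hours = 2

for label, stream_bps in [
    ("uncompressed SD", 100_000_000),
    ("DVD MPEG-2", 5_000_000),
    ("1.5 Mbit/sec encode", 1_500_000),
]:
    hours = download_hours(stream_bps, dsl_bps, movie_hours)
    print(f"{label}: {hours:.1f} hours")
# uncompressed SD: 133.3 hours
# DVD MPEG-2: 6.7 hours
# 1.5 Mbit/sec encode: 2.0 hours
```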

1080 high-definition video as encoded on Blu-ray Disc at a 20 Mbit/sec average would overwhelm most broadband connections today, especially since it has unpredictable peaks as high as 48 Mbits/sec. The reduced rates used for HD video on the Internet are a sobering consequence of this.

So, there you have it. Amazing what you can do with “elementary math,” no?

Further reading: Digital Audio/Video and Communication Rates
