Understanding Analog and IP TV CCTV Cameras
By: Amir Majidimehr
One of the most significant technological changes in the CCTV market
is the advent of so-called “IP cameras.” Despite the fact that this
technology has been available for quite a few years, there is scant
objective information on what sets them apart from the analog
cameras that powered the industry for decades before. The
complexity of what goes into the design of an IP camera is partly
responsible for this, requiring the user to be fluent in everything
from advanced video compression to computers and networking.
Complicating matters are the confusing specifications and often
misleading terminology used to describe the basic performance of
CCTV systems. Simple terms like “lines of resolution” are used
where the intuitive meaning (how many lines there are in the
picture) actually is not what the metric measures! Add to this the
typical marketing hype and the picture becomes even muddier.
The purpose of this series of articles is to simplify these concepts
and distill them down to a level where you can make purchasing
decisions intelligently. While the coverage will be comprehensive,
significant simplification is applied so as to make the concepts easy
to grasp, ensuring that the proverbial “forest is seen for the
trees.”
With that introduction, now let’s take a “deep dive” into each
technology and what sets them apart.
Analog Camera Overview
As with any imaging device, the analog CCTV camera has a sensor
which captures the video image. The resolution of the sensor varies
but for reasons which will be described later, it is limited to
720x575. This is 720 pixels across the screen (horizontal
resolution) and 575 up and down (vertical resolution).
The video is captured 60 times per second (50 for PAL) in intervals
called “fields” and transmitted to the receiver. Two fields together
are called a “frame.” This is called interlaced transmission. More on
this later.
To get the video out of the CCTV camera into a recording and display
device, a single coax cable is used. To maintain compatibility with
analog televisions (and hence make it easier to use off the shelf
products for display and recording), the signal that comes out of
the camera complies with broadcast television standards.
There are two popular analog standards in the world for television:
NTSC (e.g. as used in North America and Japan) and PAL (used in many
other countries, especially in Europe). There is also SECAM but it
is not a common standard in CCTV world.
First thing to understand about NTSC or PAL is that the number of
horizontal lines that make up the picture (i.e. the vertical
resolution) is fixed by the specific standard. Let me repeat this
again: the number of lines is fixed and every source must transmit
that many lines to be compliant with the standard. As a result, when
you see a specification for the number of lines a CCTV camera has,
it does NOT refer to vertical resolution, which is capped by the
standard.
In the case of NTSC, the standard calls for 525 lines and for PAL,
625. However, not every line carries picture information. In
reality, the viewable number of lines is 480 for NTSC and for PAL,
575. Note that you may see variations of these numbers such as 486
for NTSC. This is due to some people rounding the number and others
not. For the purposes of this article let’s stay with the rounded
numbers as the extra accuracy doesn’t mean much in practice anyway.
Now let’s look at the horizontal resolution. Here the picture
becomes muddy, pun intended. What does resolution mean in the case
of an analog system which does not care about individual pixels of
light on your display?
If you look up the spec for an analog CCTV camera, you often see a
resolution specified in the form of “lines.” Could this be the
horizontal resolution of the camera as the name seems to imply?
Well, no! Before we can understand the definition of lines, we need
to dig more into the broadcast standard.
The NTSC TV transmission system relies on a display that has
elongated pixels and has an aspect ratio of 4:3. In other words, the
image is wider than it is tall. What does this have to do with the
“line” specification? Well, someone decided that the horizontal
resolution needs to be expressed in relation to vertical resolution,
trying to show what the horizontal resolution would have been if the
TV were square. I know this sounds strange but please don’t shoot
the messenger! I am only here to explain things, not justify them.
Fortunately, the conversion from actual pixels to “lines” is much
simpler than understanding the motivation for it. Multiply the
horizontal resolution in pixels by 3 and divide by 4 to arrive at the
number of lines. So for example, if you have 100 pixels in the
horizontal dimension, you only have 75 “lines of resolution” (100 x
3 / 4 = 75). In this regard, the rating understates the true
resolution of the system.
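For readers who prefer to see the conversion spelled out, here is a minimal sketch of the rule just described. The function names are my own and are only for illustration.

```python
def pixels_to_lines(horizontal_pixels):
    """Convert horizontal pixels to analog 'lines of resolution'."""
    # 4:3 picture: multiply by 3 and divide by 4, as described in the text.
    return horizontal_pixels * 3 / 4


def lines_to_pixels(lines):
    """Inverse conversion: 'lines of resolution' back to horizontal pixels."""
    return lines * 4 / 3


print(pixels_to_lines(100))  # 75.0
print(lines_to_pixels(75))   # 100.0
```

The inverse function is included because spec sheets quote lines while sensors are described in pixels, so you will often need to go in both directions.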
Be sure not to confuse the term “line” as used in analog TV systems
with the similar term used to describe the different profiles of
the high definition TV (HDTV) standard. In that world the pixels are
square so lines are the same as resolution. But confusingly, the
rating refers to vertical resolution rather than horizontal! I know
this all may sound confusing. To keep things straight, just consider
“lines” as a metric for analog cameras. In higher resolution formats
such as HDTV and IP cameras, true pixel resolution is used so there
is no confusion there.
An interesting question then becomes what is the highest resolution
that can be achieved in an analog camera? For this, we can look at
the highest grade of analog TV equipment, which is what is used
in a television studio at, say, a major network (or used to be before
the transition to HDTV). There, we find that when analog TV is
processed, it is done in the digital domain at a horizontal resolution
of 720 pixels. That number then sets the upper bound for an analog
CCTV camera, which is usually considerably inferior to the units used
in broadcast television.
Now you see why I mentioned that the maximum resolution of any
analog camera is 720x575. Even in a broadcast setting, we cannot
exceed the number of vertical lines in the standard, which is 575 in
case of PAL. The counterpart for NTSC is 720x480. Yes, PAL has
higher resolution but displays fewer fields per second (50 versus 60
for NTSC). If you have an analog camera which supports both
resolutions, you may want to opt for PAL setting to extract a bit
more resolution out of the camera in vertical dimension. Converting
the horizontal resolution of the broadcast camera in pixels to line
rating we get 540 (720 x 3 / 4 = 540). You may have seen this spec
advertised for analog CCTV cameras and now you know where it comes
from.
What does it mean if you see a number higher than 540? There can be
two reasons for that. One, the specification is in pixels in which
case, it can be up to 720. People working for camera companies often
get these metrics confused. Assuming the spec is indeed in “lines”
then it simply indicates the resolution of the sensor, NOT what you
can extract from it after the signal is digitized and sent out. This
means that extra resolution is wasted. Its only benefit is some
noise reduction.
Of course, nothing stops anyone from putting lower resolution
sensors in the camera and indeed, this is often done. Examine the
spec and if the line rating is less than 540, then the resolution is
lower than the highest it could be.
As they say, “but wait, there is more!” Turns out even the 540 line
spec is grossly overstated. So far we have been talking about the
sensor resolution and its compliance with the standard. But there
is another part of the standard which deals with transmission of the
same over the air. You might wonder why we would care about that
part. After all, we are sending our video signal over a coax wire.
Well, the standard used over the coax wire in CCTV applications is
the same as what would be put on air by a network.
To make it easier (and reduce power consumption of the transmitter)
the standard allows the signal to be reduced in bandwidth. I
will not bore you with the engineering details but there is a handy
rule that for every “Megahertz” of bandwidth for a radio signal, we
can carry 80 “lines” of video resolution. So to carry 540 lines of
the broadcast TV signal, we would need 540/80 = 6.75 MHz of
bandwidth. If you look into the specifications for NTSC however, you
see that the standard only allows 4.28 MHz. So it goes without
saying that we are not able to transmit 540 lines (or 720 pixels).
To figure out what resolution we can transmit, we simply multiply
4.28 MHz by 80 and arrive at a maximum resolution of 340 lines for
NTSC (rounding down for simplicity). Yes, you read that right. The
camera which advertises 540 lines of resolution, cannot achieve more
than 340 once you look at the image that comes out of it over coax.
What the vendor is advertising is the raw resolution of the sensor
used to capture the video, not what can actually be achieved in a
real system when the output is viewed over that coax wire. That
extra bit of resolution cannot be extracted out of the camera. It is
simply lost as soon as the video leaves the camera.
Using the bit of math we have learned so far, we can translate 340
lines back into pixels. The result is 450 pixels of resolution (340
x 4 / 3 = 453, rounded down to 450). Are we there yet? Can we assume
that our total pixel resolution is 450x480 for NTSC and 450x575 for
PAL? Well, not quite! We need to re-examine the vertical resolution
because that is not what it seems either!
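The bandwidth arithmetic from the last few paragraphs can be checked with a short sketch. It assumes only the figures quoted in the text (80 lines per MHz and 4.28 MHz for NTSC); the function names are made up for illustration.

```python
LINES_PER_MHZ = 80          # rule of thumb quoted in the text
NTSC_BANDWIDTH_MHZ = 4.28   # NTSC figure quoted in the text


def bandwidth_needed_mhz(lines):
    """Bandwidth required to carry a given number of lines."""
    return lines / LINES_PER_MHZ


def max_lines(bandwidth_mhz):
    """Lines of resolution that fit in a given bandwidth."""
    return bandwidth_mhz * LINES_PER_MHZ


def lines_to_pixels(lines):
    """Convert lines of resolution back to horizontal pixels (x 4 / 3)."""
    return lines * 4 / 3


print(bandwidth_needed_mhz(540))                 # 6.75
print(round(max_lines(NTSC_BANDWIDTH_MHZ), 1))   # 342.4
print(round(lines_to_pixels(340)))               # 453
```

The unrounded results are 342.4 lines and about 453 pixels; the article rounds these down to 340 and 450 to keep the numbers easy to remember.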
In order to reduce the amount of data that needs to be transmitted,
both NTSC and PAL employ a poor man’s form of video compression
called “interlace.” NTSC updates the picture on your display 60
times a second (PAL does so 50 times per second). But instead of
sending all of those 450x480 pixels in every instance, the system
transmits every other line in each transmission. These are the
fields mentioned earlier. The actual resolution then in each field
is 450x240 for NTSC, sent 60 times a second. At the receiving end,
we don’t display each field separately but rather, combine two
fields into one frame and display that. In the old analog TVs, this
was done by relying on your eye to average the two fields being drawn
at their respective positions. In the case of digital TVs and computer
monitors, they are combined in memory and then displayed as a whole.
In either case, it is important to note that the transmission occurs
at 60 fields of half vertical resolution, not 30 full resolution
frames. The same works for PAL except that the field rate is 50.
What does this mean in real life? Well, if you mount your analog
NTSC CCTV camera on a solid mount with zero vibrations and point it
at a static scene with nothing whatsoever moving in it (think of a
wall), then the maximum resolution of an interlaced system is the
same as progressive (where we transmit full frames of video all at
once). So the fact that the system is interlaced doesn’t impact us
at all and we have a vertical resolution of 480. The reason is that
it doesn’t matter that we captured and transmitted the subject at
different times. Nothing moved 1/60th of a second later so we
preserved the full resolution of the image.
Now what happens if a car goes by? Well, now the camera captures odd
and even lines of that car in separate intervals (fields). Since the
car is moving, when we sample the image 1/60th of a second later,
the pixels are no longer lined up with where they were in the last
field. The display then mixes these two and what you get is half the
resolution of the previous example in vertical dimension. The visual
artifact is jagged lines (every other line appearing to be out of
sync with the previous one).
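To make the jagged-line effect concrete, here is a toy sketch of two fields being woven into one frame. The tiny 12x6 "picture", the vertical bar standing in for the car, and the function names are all assumptions made for illustration; a real camera obviously works on far larger images.

```python
WIDTH, HEIGHT = 12, 6


def render(bar_x):
    """Full picture of the scene: a vertical bar (the 'car') at column bar_x."""
    row = "".join("#" if x == bar_x else "." for x in range(WIDTH))
    return [row for _ in range(HEIGHT)]


def weave(scene_first, scene_second):
    """Even lines from the first field, odd lines from the field captured later."""
    return [
        scene_first[y] if y % 2 == 0 else scene_second[y]
        for y in range(HEIGHT)
    ]


# Static scene: both fields see the bar in the same place.
static = weave(render(3), render(3))

# Moving scene: the bar has shifted by the time the second field is captured.
moving = weave(render(3), render(7))

print("\n".join(moving))
```

In the static case the woven frame is identical to the full picture. In the moving case every other line shows the bar in a different column, which is exactly the "out of sync" comb pattern described above.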
Best way to think of this is to consider that analog TV standards
have variable vertical resolution. When nothing moves, they have
their maximum resolution (480 and 575 for NTSC and PAL respectively
at the sensor). But when there is high motion, you drop down by
half. And if there is slow motion, then you are somewhere in
between.
You may have heard of ways of de-interlacing an analog video signal.
The techniques vary based on sophistication and complexity of
implementation. One simple technique is averaging those vertical
pixels, resulting in softer images but without jaggies. At the end
of the day, it is very hard to undo the effects of interlace for
video source material. Yes, you may have heard that de-interlacing
works well in case of playing a movie in a DVD player but that is
because the source there is a movie and as such, was a progressive
source. Such is not the case for live TV.
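As a rough sketch of the simple averaging technique mentioned above, the snippet below blends each line with the one beneath it. This is only one naive way to do it, written under my own assumptions (grayscale values in nested lists, the last line left as is); real de-interlacers are considerably more elaborate.

```python
def blend_deinterlace(frame):
    """Average each line with the line below it to hide interlace 'jaggies'."""
    blended = []
    for y, row in enumerate(frame):
        # The last line has nothing below it, so it is averaged with itself.
        below = frame[y + 1] if y + 1 < len(frame) else row
        blended.append([(a + b) / 2 for a, b in zip(row, below)])
    return blended


# A woven frame where a bright pixel sits in a different column on odd lines.
woven = [
    [0, 255, 0, 0],
    [0, 0, 0, 255],
    [0, 255, 0, 0],
    [0, 0, 0, 255],
]

for row in blend_deinterlace(woven):
    print(row)
```

The first output line is [0.0, 127.5, 0.0, 127.5]: the sharp bright pixel has been smeared into two half-brightness pixels. The jagged edge is gone, but so is some of the detail, which is the trade-off described above.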
Putting it all together, your analog NTSC camera can have a maximum
resolution ranging from 450x240 to 450x480. Total number of pixels
therefore ranges from 0.1 megapixels to 0.2 megapixels. No matter
what someone tries to do, and how much money they put in the design
of the analog TV camera, they cannot improve on this number. Period!
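The whole chain of limits can be summarized in a few lines. This is a sketch using the rounded figures from this article (720-pixel sensor, 450 pixels after the bandwidth limit, 480 visible lines); the function names and parameter defaults are mine.

```python
def effective_ntsc_resolution(sensor_width=720, bandwidth_pixels=450,
                              visible_lines=480):
    """Return (worst case, best case) resolution as (width, height) pairs."""
    # Horizontal detail is capped by whichever is smaller: sensor or bandwidth.
    width = min(sensor_width, bandwidth_pixels)
    worst = (width, visible_lines // 2)  # high motion: one field only
    best = (width, visible_lines)        # static scene: two fields combined
    return worst, best


def megapixels(width, height):
    return width * height / 1_000_000


worst, best = effective_ntsc_resolution()
print(worst, megapixels(*worst))  # (450, 240) 0.108
print(best, megapixels(*best))    # (450, 480) 0.216
```

Note that raising `sensor_width` changes nothing in the result, which is the point made above: a better sensor cannot get past the transmission limit.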
Needless to say, the low resolution forces you to be much more
careful in camera position and lens selection. A wide angle lens on
an analog camera covering a large field is unlikely to be able to
capture detail that is recognizable because the resolution simply is
not there.
Note that up to now we have been generous and assumed a perfect
transmission system from camera to the capture device. Such is not
the case with analog signals. Despite being shielded, the coax cable
can still pick up noise as can the analog capture hardware in the
recorder. The noise does more damage than you may intuit. One of the
enemies of video compression used in video recorders is noise. It
represents randomness which is very difficult to reduce in size. End
result is that the added noise results in recordings which may
suffer from more compression artifacts.
Are we done yet? Sorry to say no. On top of everything already
discussed we have to consider the fact that analog TV standards have
imperfections which introduce artifacts of their own. So-called
“decoding errors” manifest in such things as false color, where a
black and white image will bleed some color that is not in the
source. This is very visible in analog CCTV captures.
As you see, compliance with analog TV standards severely limits what
we can do with CCTV cameras. The system works remarkably well for a
50+ year old standard but is nowhere near ideal for an application
where recognition of detail (e.g. a license plate or someone’s face)
is paramount as opposed to enjoyment of a movie or TV programming
where fidelity may play second fiddle to the entertainment value.
So what is the solution? Simple: cut the cord with respect to
compliance with broadcast standard. We have a simple “point to
point” system where both ends are under our control. So we really
don’t need to use a universal broadcast TV standard, especially one
this old. Even the broadcast world has abandoned analog TV by
switching to an all-digital system with much improved resolution (in
the US at least). In our world, that means “IP cameras.”
IP Cameras
An IP camera has an image sensor much like the analog camera.
However, once it has captured its image, it transmits it as “data”
over a network connection. That data is in the form of compressed
video frames sent over standardized networking protocol used for
computer applications and that is where it gets its name. “IP”
stands for Internet Protocol which is the low-level language used to
transmit data between computers in your home and the Internet. What
this implies then is that the IP camera is like a little computer
that you connect to, to access your video. Indeed, IP cameras are
computers and run operating systems not all that different from your
PC. Where they differ is that they are fixed function and their
programming cannot be extended by the user.
The fact that the camera uses IP for transmission is not very
important. What is important is that we are no longer bound by the
broadcast standard. In theory, we could now have any resolution we
wanted. You could as easily envision a camera with 10,000x2,000
pixels as you can 800x800.
Let’s drill into the different technologies used in an IP camera and
their impact on system functionality and performance.
Sensor
Lowest end IP cameras use the same sensors as analog cameras. In
other words, they have a resolution of 720x480 or 720x576. Some go
as far as even using interlaced sensors. While interlace is a fact
of life in analog cameras, we cannot think of any reason to tolerate
it in the IP world where interlace only hurts the image fidelity. So
where possible, avoid using interlaced IP cameras and instead, opt
for units with “Progressive” sensors. You can find this fact in the
fine print of the camera spec. If not, ask the vendor or avoid the
brand altogether. It is a bad sign that they would not be forthcoming
with this information.
As the resolution climbs above broadcast level, the sensor type will
always be progressive.
By convention, IP camera companies advertise the resolution in
“megapixels.” To arrive at megapixels, simply multiply the
horizontal resolution by vertical and divide by one million. If a
camera has 1280x720 resolution, it would have 0.9 million pixels but
this is often rounded to one megapixel.
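The megapixel calculation is simple enough to express as a one-line helper. The function name is my own; the 1280x720 example is the one from the text.

```python
def megapixels(width, height):
    """Horizontal times vertical resolution, divided by one million."""
    return width * height / 1_000_000


print(megapixels(1280, 720))  # 0.9216
```

It is worth running the numbers yourself when reading a spec sheet, since a camera marketed as "one megapixel" may in fact be slightly under that figure, as this one is.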
A useful feature of some cameras is the ability to capture a subset
of sensor data. Since an IP camera tends to have a lot more
resolution than its analog counterpart, we can still have ample
resolution left for the “area of interest,” allowing us to save hard
disk space in our recorder.
To put the resolution of the sensor in perspective, let’s look at
the specs for other types of video standards in use today:
DVD
The DVD Format was designed to deliver the same resolution used in
broadcast world for analog TV. So it has the same resolution of
720x480 for NTSC and 720x575 for PAL. You may have noticed how much
sharper and better the DVD quality is versus watching analog TV off
air (and analog cable). This shows you how much degradation
compliance with NTSC/PAL can cause!
Of note, DVD players may have
“S-video” and component outputs. Using these types of interconnect,
you are able to achieve higher resolutions than using the standard
single cable coax connection. S-Video requires two cables (one for
color and the other for black and white) and component three (one
for black and white and two for “color difference” signals).
However, neither one of these is in common use in CCTV world (Axis
has one camera model with component output for video previews). And
neither is the digital standard called HDMI used on newer
“upscaling” DVD players. The latter has severe length limitations
which would make it an unlikely choice for CCTV. But we digress.
Let’s compute the total resolution for DVD by multiplying its
horizontal and vertical numbers together. This gives us 350,000
pixels for NTSC and 414,000 for PAL (rounding for convenience).
Divide these by one million to get the “megapixel” rating of 0.35
for NTSC and 0.41 for PAL. In other words, even the best form of
standard definition video, free of NTSC/PAL limitations, has much
less resolution than even a camera phone!
Admittedly, the quality of
those pixels is far above a camera phone but you get the picture,
pun intended! Using the above numbers, a one megapixel IP camera will
deliver three times more pixels than NTSC DVD. Note that this is NOT
three times more pixels in either dimension: that would result in
nine times higher resolution. Rather, we have square root of three
or 1.7 times more pixels in either dimension. This is a good time to
also talk about why some IP cameras come in “VGA” resolution. VGA
refers to a specific PC resolution of 640x480. This resolution is
also considered “square pixel” version of NTSC video.
You might
think that a VGA resolution IP camera would be inferior to its full
resolution analog counterpart. But such is not the case since the
VGA resolution is transmitted all the way to the receiver, devoid of
NTSC/PAL artifacts or reduction of resolution. Indeed, most people
are shocked by how much cleaner a VGA IP camera image can be
compared to even the best analog CCTV cameras, even though the
marketing specs suggest otherwise.
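The distinction made earlier between "three times more pixels" and "three times more pixels in each dimension" is easy to get wrong, so here is a small sketch of the arithmetic. It uses the unrounded NTSC DVD figure (720x480) and also works out the pixel count for VGA; the names are mine.

```python
import math


def linear_gain(pixels_a, pixels_b):
    """How much more resolution A has than B along each dimension."""
    # Total pixels grow with the square of the per-dimension gain.
    return math.sqrt(pixels_a / pixels_b)


ntsc_dvd = 720 * 480     # 345,600 pixels
vga = 640 * 480          # 307,200 pixels
one_megapixel = 1_000_000

print(round(one_megapixel / ntsc_dvd, 1))             # 2.9
print(round(linear_gain(one_megapixel, ntsc_dvd), 1)) # 1.7
print(vga / 1_000_000)                                # 0.3072
```

So a one megapixel camera has roughly three times the pixels of NTSC DVD but only about 1.7 times the detail along each axis, and a VGA camera works out to about 0.3 megapixels.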
High Definition Television (HDTV)
The US digital TV standard comes in various flavors but the most
common are “720p” and “1080i/p.” “P” means progressive and “i”
interlaced. So 1080i means 1920x1080 resolution in interlaced format
which is used for most broadcast HDTV signals. 1080p has the same
resolution as 1080i but as the name indicates, is a progressive
format. It cannot be used in broadcast HDTV but is used in Blu-ray
Disc format. Doing the math again, 720p translates to roughly one
megapixel. And 1080i/p translates into two megapixels. So even
though we have made quite a jump from NTSC/PAL formats in moving to
HDTV, we are way short of state-of-the-art in sensor resolution as
you will see below.
Point and Shoot Cameras
These cameras come in various resolutions but even a $100 one is
likely to boast 3-5 million pixels. Many come at resolutions above
these. Isn’t it remarkable that such a cheap camera has more
resolution than HDTV and Blu-ray Disc, to say nothing of multiples of
an analog CCTV camera? Yes, it is capturing still images but many
also support video these days.
Professional still image cameras
These cameras show us where we could go as far as resolution. As of
this writing, high volume professional (DSLR) cameras boast
resolution above 20 Megapixels and specialized units exceed 60
Megapixels. Even lower end cameras (under $1000) now have
resolutions above 10 megapixels and some even support real-time
video capture and compression (although limited to 1080p today).
What’s more, these cameras have superb dynamic range and
sensitivity. This is due to use of much larger sensors than what is
used in CCTV cameras. But there is no reason why they could not be
adapted to CCTV applications (although the cost of both the cameras
and lenses would go up appreciably). Of note, a number of pro cameras
such as Canon’s entire DSLR range use CMOS sensors, debunking the
myth that the CMOS sensors used in IP cameras are somehow inferior
when it comes to low-light performance.
So what is the extra resolution good for? For one, it gives you the
ability to zoom into the image much more without it turning into a
soft and fuzzy image. Detail like a license plate will be much more
recognizable at 3 megapixels, versus 0.3.
Turning the above upside down, you can choose to have the same
resolution but have it cover much wider area. The same 3 megapixel
camera can cover the same area as three analog cameras and still
have more resolution to boot. Of course, details matter as far as
lens selection and positioning but as far as pure resolution is
concerned, we can save a lot of cost in camera installation by using
fewer cameras.
Note that sensor resolution is not the only metric for image
quality. Lens quality and low-light ability can impact effective
resolution. For example a lens that is soft in the corners is likely
to offset the increased sensor resolution in that area. As the
resolution goes up, it becomes progressively more important to pay
attention to these details.
On light gathering capability, all else being equal, as you increase
resolution the size of “photosites” (elements that capture light in
the sensor) gets smaller, resulting in higher noise figures. As noted
earlier, one can compensate for this by enlarging the size of the
sensor. The downside is that this also increases the camera cost
(and hence the reason you don’t see a $200 webcam come with a large
sensor). There is also special processing which can be done to
reduce noise although this tends to lower effective resolution of
the camera.
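To get a feel for how quickly photosites shrink, here is a back-of-the-envelope sketch. It assumes the sensor size stays the same and that the photosites tile the whole sensor, which is a simplification of my own; real sensors lose some area to wiring and other circuitry.

```python
def relative_photosite_area(old_megapixels, new_megapixels):
    """Area of each photosite after a resolution change, relative to before.

    Assumes the same sensor size, fully covered by photosites.
    """
    return old_megapixels / new_megapixels


# Going from a 0.3 megapixel sensor to a 3 megapixel one of the same size.
print(round(relative_photosite_area(0.3, 3), 2))  # 0.1
```

Under these assumptions each photosite ends up with one tenth of the area, and therefore far less light to work with, which is why higher resolution tends to come with more noise unless the sensor grows too.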
Note that just because two sensors are of equal size, it does not
mean that they perform the same. A quarter inch sensor may be as
good as a lower quality one that is one third inch. Lux ratings of
cameras often lack all the metrics needed to evaluate the camera
sensitivity (e.g. shutter speed). As a result, nothing replaces
independent evaluation of the unit to gauge how well the camera
works in low light environments.
Video Compression
Uncompressed video takes a considerable amount of
data to store and transmit. Even in standard definition, the numbers
can be huge. Take DVD. At just 720x480 resolution, times 24 frames a
second (used in movies), we are talking about 132 megabits/sec of
data. If you have a typical broadband connection of, say, 3
Mbit/sec, your link is roughly 40 times slower than what is needed to
watch DVDs without compression!
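The raw data rate can be reproduced with a short calculation. One caveat: the text does not say how many bits are used per pixel, so the sketch below assumes 16 bits per pixel, a common figure for studio-style video, chosen here because it reproduces the number quoted above. Treat that value, and the function name, as my assumptions.

```python
def raw_bitrate_mbps(width, height, frames_per_second, bits_per_pixel=16):
    """Uncompressed video data rate in megabits per second.

    bits_per_pixel=16 is an assumption, not a figure from the article.
    """
    bits_per_second = width * height * frames_per_second * bits_per_pixel
    return bits_per_second / 1_000_000


dvd = raw_bitrate_mbps(720, 480, 24)
print(round(dvd, 1))   # 132.7
print(round(dvd / 3))  # 44
```

With a 3 Mbit/sec link, the shortfall comes out at about 44 times, in line with the "roughly 40 times" figure above. Change `bits_per_pixel` to 24 and the gap gets even wider.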
Luckily, video is very amenable to compression. Frames of video
themselves have a lot of redundancy in them as do sequences of
frames. Take a blue sky. Chances are a lot of pixels are the same
and can be described using fewer bits of data. We call this
“intraframe compression.” JPEG is a form of intraframe
compression. Send a sequence of JPEG frames and we call that Motion
JPEG or M-JPEG for short. JPEG is very cheap to implement and hence
the reason it is universally offered in IP cameras.
A much improved version of JPEG is JPEG-2000. As the name implies,
this is a later edition of still image technology and brings with it
much better compression ratio for the same image quality (about
2:1). JPEG-2000 also brings “scalable” compression technology
allowing you to extract subsets of the image fidelity in exchange
for less bandwidth. For example, when you are on your cell phone,
you could take a lower fidelity image but have it stream properly
with much lower bandwidth to your phone.
Another form of compression is “interframe” compression. This takes
advantage of redundancy between frames. MPEG-2, MPEG-4, H.264 (also
called MPEG-4 Part 10 or MPEG-4 AVC), and VC-1 are popular
compression standards of this type. At high level, these systems
perform a similar function to JPEG in compressing an individual
frame. But they also look to see if the current frame is similar to
the one before it. If so, then they only transmit what is different
and the decoder combines that information with the previous frame to
display the image.
For example, imagine a person walking in front of a building. The
building is not changing in every frame. The only thing changing is
the pixels describing the person moving. The above systems divide
the screen into blocks and then track whether each block moves. If
one has, the encoder tells the decoder to move that block, rather
than having to retransmit the whole image. The decoder holds on to
the previous frame(s) in order to be able to perform this processing.
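Here is a toy sketch of the idea of sending only what changed. To keep it short it uses a simpler scheme than the one described above: instead of telling the decoder where a block moved to, it just resends any block that differs from the previous frame. Real codecs such as MPEG-4 AVC go much further. The 8x8 frame, the block size and the function names are all assumptions for illustration.

```python
BLOCK = 4  # block size in pixels, chosen small for the example


def get_block(frame, top, left):
    """Cut one BLOCK x BLOCK square out of a frame (a list of rows)."""
    return [row[left:left + BLOCK] for row in frame[top:top + BLOCK]]


def encode_delta(previous, current):
    """Return only the blocks of `current` that differ from `previous`."""
    updates = {}
    for top in range(0, len(current), BLOCK):
        for left in range(0, len(current[0]), BLOCK):
            block = get_block(current, top, left)
            if block != get_block(previous, top, left):
                updates[(top, left)] = block
    return updates


def decode_delta(previous, updates):
    """Rebuild the current frame from the stored previous frame plus updates."""
    frame = [row[:] for row in previous]
    for (top, left), block in updates.items():
        for dy, block_row in enumerate(block):
            frame[top + dy][left:left + BLOCK] = block_row
    return frame


# An 8x8 "building" (zeros) with a one-pixel "person" who takes a step right.
previous = [[0] * 8 for _ in range(8)]
current = [row[:] for row in previous]
previous[5][1] = 9
current[5][2] = 9

updates = encode_delta(previous, current)
print(len(updates), "of 4 blocks sent")             # 1 of 4 blocks sent
print(decode_delta(previous, updates) == current)   # True
```

Only one of the four blocks needs to be sent, and the decoder still reconstructs the frame exactly because it kept the previous one. That is where the bulk of the savings in this family of codecs comes from.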
The amount of compression is not predictable and is picture
dependent. A static image achieves the highest level of compression.
A noisy night-time image with lots of motion will probably be the
lowest.
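You can see how picture-dependent compression is with a quick experiment. The sketch below uses Python's general-purpose zlib compressor as a stand-in, which is not a video codec, so the exact ratios mean nothing; only the contrast between the two cases matters. The "scenes" are just synthetic byte strings of my own making.

```python
import random
import zlib

PIXELS = 100_000  # one byte per pixel, purely for illustration

rng = random.Random(0)  # fixed seed so the run is repeatable

# A perfectly flat scene versus pure random noise.
static_scene = bytes([128]) * PIXELS
noisy_scene = bytes(rng.getrandbits(8) for _ in range(PIXELS))


def compression_ratio(data):
    """Original size divided by compressed size (higher is better)."""
    return len(data) / len(zlib.compress(data))


print("static:", round(compression_ratio(static_scene)))
print("noisy: ", round(compression_ratio(noisy_scene), 2))
```

The flat scene shrinks by a very large factor while the noise barely compresses at all. Real footage sits somewhere in between, which is why a noisy night-time scene costs so much more to store than a well-lit static one.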
While a scheme like JPEG can provide 10 to 20 times data reduction
without a lot of fidelity loss, systems using MPEG-4 AVC can ratchet
this up to 50 or even 100 times compression. In a future article I
will talk more about video compression and what things to watch out
for there as there is no free lunch here.
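To put those compression ratios in perspective, the sketch below applies them to the 132 megabits/sec uncompressed DVD figure from earlier. The ratios are the ones quoted above; the function name is mine, and real-world results will vary with the scene.

```python
RAW_DVD_MBPS = 132  # uncompressed DVD figure quoted earlier in the article


def compressed_mbps(raw_mbps, ratio):
    """Data rate after applying a given compression ratio."""
    return raw_mbps / ratio


for name, ratio in [("JPEG 10x", 10), ("JPEG 20x", 20),
                    ("AVC 50x", 50), ("AVC 100x", 100)]:
    print(f"{name}: {compressed_mbps(RAW_DVD_MBPS, ratio):.2f} Mbit/sec")
```

This prints 13.20 and 6.60 Mbit/sec for the JPEG cases and 2.64 and 1.32 Mbit/sec for the MPEG-4 AVC cases, so the more advanced codec brings standard definition video down to just a few megabits per second.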
Networking
Once we have a video image compressed, we need to transmit it where
it is going to be viewed or stored. The favorite method of physical
connection is an Ethernet port. Being the most common interconnect
scheme for computers for a number of decades, one gains incredible
economies of scale in this manner. And with advent of Power over
Ethernet (PoE), IP cameras can be powered using the same Ethernet
wire.
There is some folklore around lack of bandwidth to distribute video
over Ethernet. In reality the opposite is true. A typical IP camera
has a data rate of 2-3 megabits/sec. So even the old standby, 100
Mbit/sec Ethernet has ample bandwidth to carry the signal from many
cameras over the same wire. In reality, you would be using an
Ethernet switch, meaning each camera gets its own private 100
Mbit/sec link so there is no congestion at all. Yes, the final link
to the
recorder needs to be able to capture data from all the cameras but
with advent of ultra-low cost gigabit Ethernet switches, there
really is no barrier to deployment of large number of IP cameras.
And of course, being “data” and digital, we are immune to noise over
the cable, unlike analog cameras.
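A quick sanity check on the bandwidth claim: the sketch below divides the nominal link speed by the per-camera data rate. It ignores protocol overhead and any headroom you would want in practice, so treat the results as an upper bound; the function name is made up for illustration.

```python
def cameras_per_link(link_mbps, camera_mbps):
    """Upper bound on cameras sharing one link, ignoring overhead."""
    return int(link_mbps // camera_mbps)


print(cameras_per_link(100, 3))    # 33
print(cameras_per_link(100, 2))    # 50
print(cameras_per_link(1000, 3))   # 333
```

Even at the higher 3 Mbit/sec figure, a 100 Mbit/sec link has room for dozens of cameras and a gigabit uplink to the recorder has room for hundreds.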
Modern IP cameras provide a range of interfaces to extract the video
from them. The simplest form is an included web server in the camera
which you can connect to using any browser and view (usually motion
JPEG) videos directly.
While operating in the browser provides a broad level of
compatibility, it can be limiting from a functionality point of view.
For this reason, camera companies also provide plug-ins called
“ActiveX controls” in Windows lingo, which are little applications
that know how to talk to the camera. These controls are like the
Flash player used on the Internet to play audio/video streams. The
next method is through a software development kit (SDK). This is a
computer library that application developers use to talk to the
camera. The SDK is used for example by third-party DVR software to
capture video and control the camera. Without the SDK, third party
integration is not possible.
Many cameras have the ability to upload their videos directly to a
networked storage device, whether this is a NAS (Network Attached
Storage) or a PC server. Others can email you select video frames,
or “ftp” the stream to an Internet server.
On the control front, that is done through software. Instead of
running wires to control the functions of a PTZ camera, you would
use the software interface to send the same commands to the camera,
saving wiring costs.
Summary
I hope you now have a better understanding of how severely the
performance of an analog CCTV camera is capped by the way it has to
work (i.e. compliance with broadcast standards). By using a data
connection and computer networking, IP cameras provide much
better performance with no real limitations for future growth in
resolution or other capabilities. The increased picture detail
allows one to save money by installing fewer cameras, or gain a
level of detail that simply is not achievable through an analog
camera. The fact that the camera can be accessed directly without
the need for any special software is the icing on the cake.