Multi-Core CPUs
By Amir Majidimehr
I still remember the day…. I was in an executive meeting at
Microsoft where the “bad news” came to life. CPU companies -- the
people who make the brains of our computers -- were going to take a
step backward by adding “more cores” to their processors. Bad news
you say? Yes, I said bad news. And bad news it was.
To understand why, we need to first examine life before multi-core
CPUs. This was the era of CPUs constantly getting faster through
more efficient instruction execution but, more importantly, by
turning up the “clock speed.”
What is the “clock speed?” Digital circuits perform their duties on
every passing of a “tick.” Think of it as the second hand on the
traditional clock in your house, except that we are talking billions
of ticks per second today. CPUs weren’t always that fast. My first
“nice” computer was the Apple II. It ran at a whopping 1 MHz, or one
million clock ticks per second. Today’s computers are clocked two to
three thousand times faster than that! Hence the reason we use
gigahertz to describe the clock speed.
The speed increase came courtesy of semiconductor process technology,
which kept shrinking the size of the transistors and with it gave us
both increased space for circuit logic and faster clock rates. As with
many marketing messages, “bigger is better,” so the meteoric rise in
clock speed became an essential part of the computer and CPU business.
In 2001, IBM introduced a new chapter in CPU design in the
form of a “dual core” processor called POWER4. It housed essentially
two identical CPUs in a single package. Systems with multiple
CPUs existed for many years prior to that but they required multiple
physical chips and complex external electronics to interface them to
the common system resources such as memory and input/output devices.
By building two CPUs inside the same chip, the system design became
much simpler.
The bad news meeting was not about IBM doing this but Intel taking
the same route with the introduction of their dual-core part, the
Pentium “Extreme Edition 840,” in 2005. So for the first time, there
was a PC CPU with two cores inside one package. And with it came
the era of mass-market computers sporting dual-core CPUs. Still
wondering what the bad news was? If a manufacturer doubles the
number of cylinders in a car, no one thinks that is a step backward
in performance, but unfortunately that was the case here.
Two CPUs meant double the power consumption. Power consumption is
the enemy of a CPU. There is only so much heat that can be
dissipated in a personal computer. A CPU taking 60 watts of power,
for example, puts out as much heat as a 60 watt light bulb. Have you
ever felt how hot a bulb of that wattage can get? If not, don’t try
it. It is hot! Now imagine doubling that with two cores.
The solution was to back off the clock speed as the technology
trickled down to higher volume versions of Intel CPUs. You are
probably still wondering how that can be a bad deal. If I told you
the clock speed went down 20% but you got two CPUs, you would think
you still come out ahead, right? 2 x 0.8 = 1.6 so you have 60% more
total computing power. Or do you?
The trick to the above equation working is the utilization of both
CPUs. If I can only use 10% of the power of the second core, then I
end up with 0.8 + 0.08 = 0.88 and have actually lost roughly 10% of
overall execution power.
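The arithmetic above can be written out as a small sketch. The function name and its parameters are made up for illustration, and the model is deliberately crude: it assumes throughput scales linearly with clock speed and with how busy the software keeps each core.

```python
def effective_throughput(clock_ratio, core_utilizations):
    """Throughput relative to one core running at the old, full clock speed.

    clock_ratio: new clock speed divided by the old clock speed (0.8 = 20% slower).
    core_utilizations: fraction of each core the software keeps busy (0.0 to 1.0).
    This is a simplified illustrative model, not a benchmark.
    """
    return clock_ratio * sum(core_utilizations)


# Both cores fully used: the optimistic 2 x 0.8 case.
both_busy = effective_throughput(0.8, [1.0, 1.0])

# Second core only 10% used: slower than the old single-core chip.
second_mostly_idle = effective_throughput(0.8, [1.0, 0.1])

print(f"both cores busy:    {both_busy:.2f}")
print(f"second core at 10%: {second_mostly_idle:.2f}")
```

Any result below 1.0 means the dual-core part is slower for that workload than the single-core chip it replaced.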
The question therefore becomes whether the programs you use can
utilize the power of more cores. Unfortunately writing programs that
take advantage of multiple execution units is considerably more
difficult than writing ones that only use one CPU core. The reason
is that if two CPUs are running the same program they are liable to
corrupt each other’s data without exceptional care on the part of
the software developer.
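To make the “exceptional care” concrete, here is a minimal sketch in Python of several threads updating one shared counter. The function and variable names are hypothetical. Each update is a read-modify-write, so two threads can read the same old value and one update gets lost; the lock shown here is one common way to prevent that, not the only one.

```python
import threading


def count_with_lock(num_threads=4, increments=10_000):
    """Have several threads add to one shared total, guarded by a lock."""
    total = 0
    lock = threading.Lock()

    def worker():
        nonlocal total
        for _ in range(increments):
            # Only one thread at a time may read, add, and write back.
            with lock:
                total += 1

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total


print(count_with_lock())  # always num_threads * increments with the lock held
```

Forgetting the lock in even one place is the kind of bug that may only show up occasionally, which is part of what makes such programs hard to troubleshoot.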
A truism in software development is that most of the effort in
writing a program goes into troubleshooting it, not writing it
originally. Such work can be challenging as is. Add to it multiple
cores acting on the same program and the job becomes extremely
difficult. So all else being equal, a company or software developer
will opt to write programs for one core, not multiple.
Even when a program is designed to use more than one CPU core, it
doesn’t mean that it is able to fully utilize both. Let’s say your
word processor is designed that way. This won’t do you any good
because the word processor is always waiting on keystrokes. And
while waiting, it is not doing anything or using the CPU. Once you
type a character, it does some work but speeding that up is of
little value since single core CPUs are plenty fast to keep up with
even the fastest typists. Therefore having more than one core act on
that task doesn’t result in any speed advantage for the user. Hence
the lack of motivation to create multiple-core versions of such
applications.
There are applications at the other extreme with an insatiable
appetite for computing power. A common example is image/photo
processing such as Adobe Photoshop. That task lends itself
beautifully to multi-core optimization as the image area can be
divided into segments and each core can work on its own segment
without worrying about stepping on the toes of any other. The larger
the image, the more cores can be utilized effectively and we get
near linear speed up as we add more cores.
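The divide-into-segments idea can be sketched as follows. This is an illustration only, not how Photoshop or any real image editor is implemented: the “image” is a plain list of rows of grayscale values, and every function name is made up. A thread pool is used to keep the example simple; in standard Python, swapping in `ProcessPoolExecutor` is what would actually spread this kind of CPU-bound work across cores.

```python
from concurrent.futures import ThreadPoolExecutor


def brighten_rows(rows, amount):
    """Brighten one band of the image; touches no data outside its band."""
    return [[min(255, pixel + amount) for pixel in row] for row in rows]


def split_into_bands(image, num_bands):
    """Cut the image into horizontal bands of roughly equal height."""
    band_height = max(1, -(-len(image) // num_bands))  # ceiling division
    return [image[i:i + band_height] for i in range(0, len(image), band_height)]


def brighten_parallel(image, amount, workers=4):
    """Hand each band to a separate worker, then stitch the results together."""
    bands = split_into_bands(image, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in the same order as the input bands.
        results = list(pool.map(brighten_rows, bands, [amount] * len(bands)))
    return [row for band in results for row in band]


# A tiny 8x8 test image with made-up pixel values.
image = [[(x * 30 + y * 5) % 256 for x in range(8)] for y in range(8)]
assert brighten_parallel(image, 10) == brighten_rows(image, 10)
```

Because no band reads or writes another band’s pixels, no locking is needed, which is why this kind of work scales so well.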
Closer to home is encoding audio and video. The function of
compressing these data types is inherently CPU intensive. So a
number of implementations have come about that use two or more
cores. For example, when you rip your music using Windows Media
Player with the WMA audio codec, it allocates a core to each channel
of audio for stereo encoding. As a result the process goes twice as
fast as it would with one core. Alas, CPUs are so fast relative to
the speed of your optical drive that ripping speed does not change
overall. But should you be encoding from a faster source such as a
file on disk, the addition of a second core will likely speed up
encoding to some degree. The effect is even larger for video where
the computational requirements are orders of magnitude higher. For
this reason, you often see video encoding as one of the benchmarks for
multi-core CPUs.
By the way, we call applications that take advantage of
multi-core/multiple CPUs “multi-threaded.” A thread is a path of
execution in a program. Multi-threaded programs have multiple paths
that are executed by multiple cores. If you don’t have multiple
cores, the program still works as the one CPU jumps around executing
all the threads.
By now you should be seeing the reason there was sadness in that
Microsoft meeting. Hardly anything we built at the time other than
the operating system took advantage of multiple cores (the media
technology out of my team being the notable exception per above
remarks for WMA/WMV). Given the declining CPU clock frequency this
meant that we were taking a step backward, not forward.
In a recent forum discussion someone lamented about the slow speed
of their computer while executing the code for a control system
called “Crestron” (an automation system). He was asking what
computer to buy to speed up the task as it was taking many minutes
to run. Suggestions quickly poured in to get this multi-core CPU and
that multi-core CPU.
Having profiled the Crestron tool before and realizing that it was
not multi-threaded, I poured cold water over those suggestions. As
is the nature of forum discussions :), I immediately had a bunch of
folks jumping on me saying that was wrong. To get the point across,
I ran the Crestron tool concurrently with the Windows standard
performance monitoring tool called “perfmon.” Most people
don’t know about this little gem but it is one of the most useful
instrumentation tools for your PC. In our situation, we use it to
analyze how the CPU is utilized.
First let’s see what it does when I run it on my dual-core Sony
laptop:
Look at the total CPU usage that I have circled. It says 52%.
Rounding down for noise (operating system activity), it says 50% or
half the CPU resources are being used during that period. Since I
have two cores, we have pretty conclusive evidence that this program
is only using one core.
An astute observer would notice that what I just said is not quite
true. Both cores on the graphs to the right appear to show
activity. The reason for this is that the operating system lets the
two CPU cores fight over the same program and as a result, each gets
to run part of it. Hence the reason both seem busy to some
extent. But the key point remains: at 50% total usage, there is only
enough work for one CPU.
We can confirm our findings further by running the program on an
8-core CPU (really four cores with each consisting of two
mini-cores):
Now the total CPU usage drops to 13%. Multiply that by 8 and what do
you get? Essentially 100% of a single core, meaning the program is
using 1/8 of the total CPU resources and hence, the equivalent of
just one core. It is like having your 8 cylinder engine turn off
seven cylinders and operate on one when running that program!
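The conversion from a total CPU percentage to “how many cores is this really using” is simple enough to write down. The helper name below is made up; the percentages and core counts are the two readings quoted above.

```python
def cores_in_use(total_cpu_percent, logical_cores):
    """Convert a total CPU usage reading into an equivalent number of busy cores."""
    return total_cpu_percent / 100 * logical_cores


# The two perfmon readings discussed above.
print(f"dual-core laptop at 52%: {cores_in_use(52, 2):.2f} cores")
print(f"8-core machine at 13%:   {cores_in_use(13, 8):.2f} cores")
```

Both readings come out at about one core, with the small excess being the background operating system activity mentioned earlier.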
Run Perfmon on your system and watch its activity as you perform
your everyday tasks or your favorite number crunching program. You
will likely see the dire truth of the situation: most of the time
the system is using the equivalent of one core (if that).
In recent years, Intel and AMD have compensated partially for the
above problem by implementing so-called “Turbo Boost” features where
if you are not using more than one core at a time, the clock speed
is allowed to increase. This is an important feature and one to look
for in your future computer purchases.
An opinionated person might ask that since we can only use one core
at a time, why not sell us single core for half the money? Well, not
everything in life is that logical! :)