Jul 07 2008

Latency and bandwidth

Published by mdanks at 10:45 pm under Code

A common discussion in computer architecture is bandwidth versus latency. Each is important, but it is critical to understand the tradeoffs for each.

Imagine a garden hose. Connected to the hose are hot and cold water faucets. As a user, you can turn up the hot or the cold water. Some time later, the water coming out of the end of the hose reaches the new temperature.

  • Bandwidth is how wide the hose is.
  • Latency is how long the hose is.

Bandwidth is comparable to how much water flows through the hose at any one moment. The larger the diameter of the hose, the more water it carries. In computers, this usually shows up as throughput: how many operations complete per unit of time, such as a processor's FLOPS (Floating Point Operations Per Second).

Latency is comparable to how long you have to wait for a temperature change to show up at the end of the hose. If you add more hot water, the length of the hose determines how long it takes until you feel the change at the output.

Obviously, a perfect computer architecture would have high bandwidth and low latency. The hose would have a massive flow of water and the temperature would change as soon as you touched the faucet. However, there is usually a tradeoff between bandwidth and latency. In most systems today, the tendency seems to be toward high bandwidth with high latency. As long as you do not need the result of an operation right away, you can do tremendous amounts of processing.
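To put rough numbers on that tradeoff, here is a minimal sketch of a toy timing model. It assumes a hypothetical pipelined unit that can start a new operation every issueInterval cycles and delivers each result latency cycles after the operation starts. The function names and the model itself are assumptions for illustration, not a description of any particular CPU.

```cpp
#include <cassert>
#include <cstdint>

// Toy model (an assumption for illustration): a pipelined unit starts one
// operation every `issueInterval` cycles and returns each result `latency`
// cycles after that operation was started.

// Cycles to finish n operations that do not depend on each other.
// They overlap in the pipeline, so only the first one pays the full latency.
std::uint64_t cyclesIndependent(std::uint64_t n,
                                std::uint64_t latency,
                                std::uint64_t issueInterval)
{
    assert(latency >= 1 && issueInterval >= 1);
    if (n == 0)
        return 0;
    return latency + (n - 1) * issueInterval;
}

// Cycles to finish n operations where each one needs the previous result.
// Nothing can overlap, so every operation pays the full latency.
std::uint64_t cyclesDependent(std::uint64_t n, std::uint64_t latency)
{
    assert(latency >= 1);
    return n * latency;
}
```

With a 10-cycle latency and one operation issued per cycle, cyclesIndependent(100, 10, 1) returns 109 while cyclesDependent(100, 10) returns 1000. In this model the hose is equally wide and equally long in both cases; the only difference is whether the work has to wait on earlier results.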

Ignoring out-of-order execution and branch prediction, take the following snippet of code:

float valC = valA * valB;
if (valC > 10.f)
{
    valD *= .1f;
}
doSomething(valD);

In this case, because the branch depends on the value of valC, the code is going to be bottlenecked by latency. The branch cannot be evaluated until the result of valA * valB is known. If a multiply takes 10 cycles, then the CPU will stall for 10 cycles, doing nothing while it waits for the result. However, if you can “do work” while waiting for the result, then you can hide the latency and valA * valB will only appear to take 1 cycle:

float valC = valA * valB;
float otherC = otherB + 1.f;
float otherD = otherB * otherA;
// etc...do more work here...
if (valC > 10.f)
{
    valD *= .1f;
}
doSomething(valD);

This code gets more work done in about the same amount of time as the first snippet, because the independent operations run while the multiply is still in flight and its latency is hidden.
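The same idea applies inside a loop. The following is a minimal sketch, assuming a simple array sum: the first function has a single chain of dependent adds, while the second splits the work across four independent accumulators so that one add does not have to wait on the previous one. The function names are hypothetical, and whether the second version is actually faster depends on the compiler and the CPU.

```cpp
#include <cassert>
#include <cstddef>

// One accumulator: each add needs the result of the add before it,
// so the loop is limited by the latency of a single add.
float sumSerial(const float* data, std::size_t count)
{
    assert(data != nullptr || count == 0);
    float acc = 0.f;
    for (std::size_t i = 0; i < count; ++i)
        acc += data[i];
    return acc;
}

// Four accumulators: four independent dependency chains. While one add
// is still in flight, the adds on the other three chains can proceed.
float sumInterleaved(const float* data, std::size_t count)
{
    assert(data != nullptr || count == 0);
    float acc0 = 0.f, acc1 = 0.f, acc2 = 0.f, acc3 = 0.f;
    std::size_t i = 0;
    for (; i + 4 <= count; i += 4)
    {
        acc0 += data[i];
        acc1 += data[i + 1];
        acc2 += data[i + 2];
        acc3 += data[i + 3];
    }
    // Handle the leftover elements when count is not a multiple of 4.
    for (; i < count; ++i)
        acc0 += data[i];
    return (acc0 + acc1) + (acc2 + acc3);
}
```

Both functions add the same values, but in a different order. Floating-point addition is not associative, so the two results can differ in the last bits for general data; they match exactly only when every partial sum is exactly representable.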
