Jan 22 2008

Amdahl’s Law

Published by mdanks at 7:49 pm under Code, Games

Amdahl’s law is mentioned a lot these days, especially with the multi-processor consoles which are out in the marketplace. I have been thinking a lot about how it actually applies to a typical game tick-loop. The main thing with Amdahl’s law is that it needs to be applied with all of the information or you can get incorrect results.

Amdahl’s law is the following formula:

 

Where:

F = serial part

N = num cores

For example, if you had an application which was set up as follows:

Amdahl Pic 1

The typical application of Amdahl’s Law says that if the 6 part can be done in parallel, then we could get the following speed up:

F = 4 / 10 = .4

N = 1

1 / (.4 + (1 – .4) / 1) = 1

Amdahl Pic 1

So the Time == 10.

 

N = 2

1 / (.4 + (1 – .4) / 2) = 1.43

Amdahl Pic 2

So the Time == 7. From Amdahl’s Law: 10 / 1.43 = 7.

 

N = 3

1 / (.4 + (1 – .4) / 3) = 1.67

Amdahl Pic 3

So the Time == 6. From Amdahl’s Law: 10 / 1.67 = 6.

However, this isn’t how games normally work. Imagine that the purple section is the rendering phase and there is no dependency on the yellow section. This means that one could do the following:

Amdahl Pic 4

So the Time == 8.

This is obviously worse than what Amdahl’s Law tells us (remember that with N == 2, we could get a result of 7). However, look at what happens with N == 3.

Amdahl Pic 5

So the Time == 5.

With the straight forward use of Amdahl’s Law at N== 3, the result was 6. Technically, this result is actually to be expected, but it is harder to precompute. This is because the yellow portion is part of the parallelizable portion. However, it doesn’t contribute fully against the purple portion. If it did, we would end up with F == .2, which results in:

N = 3

1 / (.2 + (1 – .2) / 3) = 2.14 –> 10 / 2.14 = 4.67

which is not accurate.

With games, there is another complication. It does not make any sense to go faster than 60 fps…and if the game cannot reach 60 fps, then it will run at 30 fps. Let us say that 5 units == 60 fps. In the N == 3 case, we reached 60 fps if the rendering loop is split apart. But let us say that the art director wants “more stuff” and there is no way to reach 60 fps. This means that 10 units == 30 fps. By extending just the rendering portion (in other words, adding more polygons), the loop might look like:

So the Time == 10.

But the amount of “work” we are doing is 2 + 2 + 8 + 8 = 20. Compare this against the 60 fps version when we have 2 + 2 + 3 + 3 = 10. If there was another processor, then the 60 fps version is 2 + 2 + 3 + 3 + 3 = 13, while the 30 fps is 2 + 2 + 8 + 8 + 8 = 18 (another 5 units of work).

The reality is that in most games, the rendering loop can be completely separated from the main tick and you can get:

Amdahl Pic 7

which results in 2 + 2 + 10 + 10 = 24 units of work. And there are 6 units on the main tick processor which are not doing anything.

The main point is that you need to decide what the important parts of your frame are. If your goal is 60 or 30 fps, then you may have “extra” processing time. If you are simply trying to compute things as fast as possible, then you need to make sure that you understand what parts of your app are able to be made parrallel.

 

No responses yet

Trackback URI | Comments RSS

Leave a Reply

You must be logged in to post a comment.