Jan 22 2008
Amdahl’s Law
Amdahl’s law is mentioned a lot these days, especially with the multi-processor consoles which are out in the marketplace. I have been thinking a lot about how it actually applies to a typical game tick-loop. The main thing with Amdahl’s law is that it needs to be applied with all of the information or you can get incorrect results.
Amdahl’s law is the following formula:

Where:
F = serial part
N = num cores
For example, if you had an application which was set up as follows:

The typical application of Amdahl’s Law says that if the 6 part can be done in parallel, then we could get the following speed up:
F = 4 / 10 = .4
N = 1
1 / (.4 + (1 - .4) / 1) = 1

So the Time == 10.
N = 2
1 / (.4 + (1 - .4) / 2) = 1.43

So the Time == 7. From Amdahl’s Law: 10 / 1.43 = 7.
N = 3
1 / (.4 + (1 - .4) / 3) = 1.67

So the Time == 6. From Amdahl’s Law: 10 / 1.67 = 6.
However, this isn’t how games normally work. Imagine that the purple section is the rendering phase and there is no dependency on the yellow section. This means that one could do the following:

So the Time == 8.

So the Time == 5.
N = 3
1 / (.2 + (1 - .2) / 3) = 2.14 –> 10 / 2.14 = 4.67
which is not accurate.
With games, there is another complication. It does not make any sense to go faster than 60 fps…and if the game cannot reach 60 fps, then it will run at 30 fps. Let us say that 5 units == 60 fps. In the N == 3 case, we reached 60 fps if the rendering loop is split apart. But let us say that the art director wants “more stuff” and there is no way to reach 60 fps. This means that 10 units == 30 fps. By extending just the rendering portion (in other words, adding more polygons), the loop might look like:

So the Time == 10.
But the amount of “work” we are doing is 2 + 2 + 8 + 8 = 20. Compare this against the 60 fps version when we have 2 + 2 + 3 + 3 = 10. If there was another processor, then the 60 fps version is 2 + 2 + 3 + 3 + 3 = 13, while the 30 fps is 2 + 2 + 8 + 8 + 8 = 18 (another 5 units of work).
The reality is that in most games, the rendering loop can be completely separated from the main tick and you can get:

which results in 2 + 2 + 10 + 10 = 24 units of work. And there are 6 units on the main tick processor which are not doing anything.
Leave a Reply
You must be logged in to post a comment.