Superscalar
Previous: Branch Prediction
In the previous discussions, it was assumed that each stage in the pipeline
takes the same amount of time as every other stage. In reality this is not
the case: some stages take longer than others. As an example, assume that the
Fetch, Decode, and Save stages each take 1 cycle to complete, but that the
Execute stage takes 2 cycles. Run on its own, an instruction would need
1 + 1 + 2 + 1 = 5 cycles. In the pipeline, however, it takes 8 cycles (not 5)
to complete an instruction, because the Execute stage slows down all the
other stages: no stage can accept its next instruction until the Execute
unit is ready to accept its next instruction. In a pipeline, the time spent
in each stage is the same for all stages and is determined by the slowest
stage, so each of the 4 stages effectively takes 2 cycles.
Cycle | Fetch (1 cycle) | Decode (1 cycle) | Execute (2 cycles) | Save (1 cycle)
------|-----------------|------------------|--------------------|---------------
1     | Inst 1          |                  |                    |
2     | Wasted          |                  |                    |
3     | Inst 2          | Inst 1           |                    |
4     |                 | Wasted           |                    |
5     |                 | Inst 2           | Inst 1             |
6     |                 |                  | Inst 1             |
7     |                 |                  | Inst 2             | Inst 1
8     |                 |                  | Inst 2             | Wasted
9     |                 |                  |                    | Inst 2

Even though Fetch, Decode, and Save each take only 1 cycle to complete,
they must wait for Execute to finish before moving on to the next instruction.
It takes 8 cycles to complete the first instruction, and then one instruction
completes every 2 cycles after that.
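
The timing in the table above can be checked with a few lines of arithmetic. The following is a minimal sketch, assuming the simplified model used on this page (every stage is stretched to the length of the slowest stage, and no other stalls occur). The function name `lockstep_completion` and the list `stage_cycles` are hypothetical names chosen for illustration.

```python
def lockstep_completion(stage_cycles, instruction_number=1):
    """Cycle in which the given instruction (1-based) leaves the last stage.

    Assumes the simplified model from the text: every stage is stretched
    to the duration of the slowest stage, and there are no other stalls.
    """
    slot = max(stage_cycles)      # every stage effectively takes this long
    stages = len(stage_cycles)
    # The first instruction needs `stages` slots; each later one adds one slot.
    return slot * (stages + instruction_number - 1)


# Fetch, Decode, Execute, Save (values taken from the example in the text)
stage_cycles = [1, 1, 2, 1]

unpipelined = sum(stage_cycles)                  # 5 cycles, run on its own
first = lockstep_completion(stage_cycles, 1)     # 8 cycles
second = lockstep_completion(stage_cycles, 2)    # 10 cycles, 2 cycles later

print(f"unpipelined: {unpipelined} cycles")
print(f"first instruction done after {first} cycles")
print(f"second instruction done after {second} cycles")
```

The gap between `first` and `second` is the length of the slowest stage, which is why one instruction completes every 2 cycles in this example.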
Cycle | Fetch (1 cycle) | Decode (1 cycle) | Execute 1 (2 cycles) | Execute 2 (2 cycles) | Save (1 cycle)
------|-----------------|------------------|----------------------|----------------------|---------------
1     | Inst 1          |                  |                      |                      |
2     | Inst 2          | Inst 1           |                      |                      |
3     | Inst 3          | Inst 2           | Inst 1               |                      |
4     | Inst 4          | Inst 3           | Inst 1               | Inst 2               |
5     |                 | Inst 4           | Inst 3               | Inst 2               | Inst 1
6     |                 |                  | Inst 3               | Inst 4               | Inst 2
7     |                 |                  |                      | Inst 4               | Inst 3
8     |                 |                  |                      |                      | Inst 4

If there were two separate execution units that could work independently
of each other, it would be possible to speed up the pipeline. Each execution
unit still takes 2 cycles to complete, but their work can be overlapped: all
the odd-numbered instructions go to execution unit 1, and all the
even-numbered instructions go to execution unit 2. It now takes 5 cycles to
complete the first instruction, and then one instruction completes every
cycle after that.
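
The schedule in the table above can be reproduced with a short simulation. This is only a sketch under the assumptions of this example: Fetch, Decode, and Save take 1 cycle each, there are as many execution units as Execute takes cycles (so no instruction ever waits for a unit), and instructions are handed to the units in round-robin order. The function name `superscalar_schedule` and the dictionary keys are hypothetical.

```python
def superscalar_schedule(num_instructions, execute_cycles=2):
    """Build a per-instruction schedule for the two-execution-unit example.

    Assumption: the number of execution units equals `execute_cycles`,
    so a unit is always free when an instruction reaches Execute.
    """
    units = execute_cycles
    rows = []
    for n in range(1, num_instructions + 1):
        fetch = n                      # one instruction is fetched per cycle
        decode = fetch + 1
        exec_start = decode + 1
        exec_end = exec_start + execute_cycles - 1
        rows.append({
            "inst": n,
            "fetch": fetch,
            "decode": decode,
            "unit": (n - 1) % units + 1,   # odd -> unit 1, even -> unit 2
            "execute": (exec_start, exec_end),
            "save": exec_end + 1,          # cycle in which the instruction completes
        })
    return rows


for row in superscalar_schedule(4):
    start, end = row["execute"]
    print(f"Inst {row['inst']}: unit {row['unit']}, "
          f"execute cycles {start}-{end}, save in cycle {row['save']}")
```

For four instructions the save cycles come out as 5, 6, 7, and 8, matching the table: the first instruction completes after 5 cycles and one more completes in every cycle after that.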