Superscalar

Previous: Branch Prediction

In the previous discussions, it was assumed that each stage in the pipeline takes the same amount of time as every other stage. In reality, however, this is not the case: some stages take longer than others. As an example, assume that the Fetch, Decode, and Save stages each take 1 cycle to complete, but that the Execute stage takes 2 cycles. In that case, it would take 8 cycles (not 5) to complete an instruction using pipelining, because the Execute stage would slow down all the other stages. The other stages could not move on to the next instruction until the Execute unit was ready to accept it. In a pipeline, the time spent in each stage is the same for all stages, and is determined by the slowest stage.
Cycle | Fetch (1 cycle) | Decode (1 cycle) | Execute (2 cycles) | Save (1 cycle)
  1   | Inst 1          |                 |                    |
  2   | Wasted          |                 |                    |
  3   | Inst 2          | Inst 1          |                    |
  4   | Wasted          | Wasted          |                    |
  5   |                 | Inst 2          | Inst 1             |
  6   |                 | Wasted          | Inst 1             |
  7   |                 |                 | Inst 2             | Inst 1
  8   |                 |                 | Inst 2             | Wasted
  9   |                 |                 |                    | Inst 2
Even though Fetch, Decode, and Save each take only 1 cycle to complete, they must wait for Execute to finish before moving on to the next instruction. It takes 8 cycles to complete the first instruction, and an instruction completes every 2 cycles after that.
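The lockstep behavior described above can be sketched as a small model. This is an illustrative sketch, not from the original text; the function name and its list-of-cycles interface are assumptions made for the example.

```python
def completion_cycles(num_instructions, stage_times):
    """Return the cycle at which each instruction leaves the last stage,
    assuming a lockstep pipeline that advances once per `slowest` cycles.

    This is a hypothetical model of the text's example, where every
    stage is forced to run at the pace of the slowest stage.
    """
    slowest = max(stage_times)   # here 2 cycles (the Execute stage)
    stages = len(stage_times)    # Fetch, Decode, Execute, Save -> 4
    # First instruction: all 4 stages at 2 cycles each = 8 cycles;
    # each later instruction finishes `slowest` cycles after the one before.
    return [stages * slowest + i * slowest for i in range(num_instructions)]

print(completion_cycles(3, [1, 1, 2, 1]))   # [8, 10, 12]
```

The result matches the table: the first instruction completes at cycle 8, and one more completes every 2 cycles.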
Cycle | Fetch (1 cycle) | Decode (1 cycle) | Execute 1 (2 cycles) | Execute 2 (2 cycles) | Save (1 cycle)
  1   | Inst 1          |                 |                      |                      |
  2   | Inst 2          | Inst 1          |                      |                      |
  3   | Inst 3          | Inst 2          | Inst 1               |                      |
  4   | Inst 4          | Inst 3          | Inst 1               | Inst 2               |
  5   |                 | Inst 4          | Inst 3               | Inst 2               | Inst 1
  6   |                 |                 | Inst 3               | Inst 4               | Inst 2
  7   |                 |                 |                      | Inst 4               | Inst 3
  8   |                 |                 |                      |                      | Inst 4
If there were now two separate execution units that could work independently of each other, the pipeline could be sped up. Each execution unit still takes 2 cycles per instruction, but the units can be overlapped: the odd-numbered instructions go to execution unit 1, and the even-numbered instructions go to execution unit 2. It now takes 5 cycles to complete the first instruction, and an instruction completes every cycle after that.
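The two-unit case can be sketched the same way. Again, this is an illustrative model under the section's assumptions (1-cycle Fetch, Decode, and Save; two 2-cycle execute units taking alternate instructions); the function name is made up for the example.

```python
def retire_cycles(num_instructions):
    """Cycle at which each instruction is saved, with two 2-cycle execute
    units handling alternate instructions (a hypothetical model of the text).
    Instruction i (0-indexed) is fetched in cycle i+1 and decoded in i+2.
    """
    unit_free = [0, 0]            # cycle at which each execute unit is next free
    retires = []
    for i in range(num_instructions):
        decode_done = i + 2       # decoded at the end of cycle i+2
        unit = i % 2              # odd instructions -> unit 1, even -> unit 2
        start = max(decode_done, unit_free[unit])
        unit_free[unit] = start + 2
        retires.append(start + 2 + 1)   # 2 cycles to execute, 1 cycle to save
    return retires

print(retire_cycles(4))   # [5, 6, 7, 8]
```

As in the table, the first instruction retires at cycle 5, and one instruction retires every cycle after that, because the two execute units overlap their work.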
