An answer:

It avoids a "pipeline squash". By the time a branch instruction executes, we have already fetched and decoded the word after it. If the branch is taken and that word is not executed, we have to discard the work in progress and start over with the word at the branch destination. This loses a cycle, because no decoded instruction is ready when the execute stage is ready for one. If instead the instruction in the delayed branch slot is always executed, we can put the branch instruction one instruction earlier (moving a useful instruction that used to precede the branch into the slot), avoid the pipeline squash, and still execute one instruction per cycle.
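The cycle accounting above can be checked with a toy simulator. This is a minimal sketch, not a model of any particular processor. It assumes a two-stage pipeline in which fetch/decode of the next word overlaps with execution of the current one, and in which a taken branch is resolved in the execute stage. It also assumes the delay slot never holds another branch. The function `run`, the instruction tuples, and the two example programs are hypothetical names chosen for illustration.

```python
# Hypothetical toy two-stage pipeline: each cycle, the execute stage runs the word
# fetched on the previous cycle while the fetch/decode stage fetches the next word.
# Instructions: ("op", name), ("br", target_address), ("halt",).
# Assumption: a delay slot never contains another branch.

def run(program, delayed):
    """Return (cycles, addresses executed) for `program`.

    delayed=False: a taken branch squashes the word fetched behind it.
    delayed=True:  the word after a branch (the delay slot) always executes.
    """
    cycles = 0
    executed = []
    in_decode = None   # address fetched last cycle; None means a bubble
    fetch_pc = 0       # address to fetch this cycle
    while True:
        cycles += 1
        fetched_now = fetch_pc   # fetched in parallel with the execution below
        fetch_pc += 1
        redirect = None
        if in_decode is not None:
            kind, *args = program[in_decode]
            executed.append(in_decode)
            if kind == "halt":
                return cycles, executed
            if kind == "br":
                redirect = args[0]
        if redirect is None:
            in_decode = fetched_now        # normal sequential flow
        elif delayed:
            in_decode = fetched_now        # keep the delay-slot word; it executes next cycle
            fetch_pc = redirect            # fetch the branch target next cycle
        else:
            in_decode = None               # squash: the word fetched this cycle is discarded
            fetch_pc = redirect            # the target is fetched next cycle, leaving a bubble

# Without a delay slot: "b" precedes the branch, and "skipped" is squashed.
BRANCH_PROGRAM = [
    ("op", "a"),
    ("op", "b"),
    ("br", 4),
    ("op", "skipped"),
    ("op", "c"),
    ("halt",),
]

# With a delay slot: the branch moves one instruction earlier and "b" fills the slot.
DELAY_SLOT_PROGRAM = [
    ("op", "a"),
    ("br", 4),
    ("op", "b"),        # delay slot: executes before the branch takes effect
    ("op", "skipped"),
    ("op", "c"),
    ("halt",),
]

print(run(BRANCH_PROGRAM, delayed=False))      # (7, [0, 1, 2, 4, 5])
print(run(DELAY_SLOT_PROGRAM, delayed=True))   # (6, [0, 1, 2, 4, 5])
```

Both programs execute the same five instructions ("a", "b", the branch, "c", and halt), and both spend their first cycle filling the pipeline. The delay-slot version finishes one cycle sooner because the taken branch never leaves a bubble in the execute stage.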

