

















## Loop Unrolling (assume no pipelining)

|       | Unrolled twice                       | Unrolled twice and reordered      |
|-------|--------------------------------------|-----------------------------------|
| Loop: | L.D F0, 0(R1)                        | L.D F0, 0(R1)                     |
|       | ADD.D F4, F0, F2                     | L.D F6, -8(R1)                    |
|       | S.D F4, 0(R1)                        | ADD.D F4, F0, F2                  |
|       | DADDU R1, R1, #-8                    | ADD.D F8, F6, F2                  |
|       | L.D F0, 0(R1)                        | DADDU R1, R1, #-16                |
|       | ADD.D F4, F0, F2                     | S.D F4, 16(R1)                    |
|       | S.D F4, 0(R1)                        | S.D F8, 8(R1)                     |
|       | DADDU R1, R1, #-8                    | BNE R1, R2, Loop                  |
|       | BNE R1, R2, Loop                     |                                   |
|       | Saves 0.5 instruction per iteration  | Saves 1 instruction per iteration |

- Need to worry about boundary cases (strip mining?).
- Can reorder the statements if we use additional registers.
- What limits the number of times we unroll a loop?
- Note that the loop iterations were independent: for $i = 1, 100$: $x(i) = x(i) + c$
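The savings figures and the boundary-case question can be checked with a small Python sketch. It assumes each original iteration is L.D, ADD.D, S.D plus DADDU and BNE overhead, as in the listings above; the function names are hypothetical. The cleanup loop runs the n mod k leftover iterations first, so the unrolled loop only ever processes complete groups of k elements.

```python
def instructions_per_iteration(k: int, scheduled: bool) -> float:
    """Instructions executed per original iteration when the loop is unrolled k times.

    Each copy of the body contributes L.D, ADD.D and S.D (3 instructions).
    Unscheduled: every copy keeps its own DADDU, plus one BNE per unrolled body.
    Scheduled: a single DADDU (offsets adjusted) plus one BNE per unrolled body.
    """
    if k < 1:
        raise ValueError("unroll factor must be at least 1")
    if scheduled:
        return (3 * k + 2) / k
    return (4 * k + 1) / k


def add_constant_unrolled(x: list, c: float, k: int) -> list:
    """In-place x(i) = x(i) + c with the loop body unrolled k times.

    The first n mod k elements go through a short cleanup loop, so the
    unrolled loop below never runs past the end of the array.
    """
    n = len(x)
    leftover = n % k
    for i in range(leftover):          # cleanup loop: n mod k iterations
        x[i] += c
    for i in range(leftover, n, k):    # unrolled loop: one trip per k elements
        for j in range(k):             # stands in for k copies of the body
            x[i + j] += c
    return x


print(instructions_per_iteration(1, False),
      instructions_per_iteration(2, False),
      instructions_per_iteration(2, True))        # 5.0 4.5 4.0
print(add_constant_unrolled([1, 2, 3, 4, 5], 10, 2))  # [11, 12, 13, 14, 15]
```

Larger k keeps shrinking the overhead per element, but every extra copy needs its own registers, which is one of the limits the question above points at.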





























## Example:

Instruction status:

| Instruction |             | Issue | Read operands | Execution completed | Write result |
|-------------|-------------|-------|---------------|---------------------|--------------|
| L.D         | F6, 34(R2)  | X     | X             | X                   | X            |
| L.D         | F2, 45(R3)  | X     | X             | X                   |              |
| MUL.D       | F0, F2, F4  | X     |               |                     |              |
| SUB.D       | F8, F6, F2  | X     |               |                     |              |
| DIV.D       | F10, F0, F6 | X     |               |                     |              |
| ADD.D       | F6, F8, F2  |       |               |                     |              |

Functional unit status:

| Unit    | Busy | Op   | Fi  | Fj | Fk | Qj    | Qk   | Rj  | Rk  |
|---------|------|------|-----|----|----|-------|------|-----|-----|
| Integer | Yes  | Load | F2  | R3 |    |       |      | Yes |     |
| Mult1   | Yes  | Mult | F0  | F2 | F4 | Int.  |      | No  | Yes |
| Mult2   | No   |      |     |    |    |       |      |     |     |
| Add     | Yes  | Sub  | F8  | F6 | F2 |       | Int. | Yes | No  |
| Divide  | Yes  | Div  | F10 | F0 | F6 | Mult1 |      | No  | Yes |

Register result status:

|    | F0    | F2   | F4 | F6 | F8  | F10 | F12 | ... | F30 |
|----|-------|------|----|----|-----|-----|-----|-----|-----|
| FU | Mult1 | Int. |    |    | Add | Div |     |     |     |

Instruction status:

| Instruction |             | Issue | Read operands | Execution completed | Write result |
|-------------|-------------|-------|---------------|---------------------|--------------|
| L.D         | F6, 34(R2)  | X     | X             | X                   | X            |
| L.D         | F2, 45(R3)  | X     | X             | X                   | X            |
| MUL.D       | F0, F2, F4  | X     |               |                     |              |
| SUB.D       | F8, F6, F2  | X     |               |                     |              |
| DIV.D       | F10, F0, F6 | X     |               |                     |              |
| ADD.D       | F4, F8, F2  |       |               |                     |              |

Functional unit status:

| Unit    | Busy | Op   | Fi  | Fj | Fk | Qj    | Qk | Rj  | Rk  |
|---------|------|------|-----|----|----|-------|----|-----|-----|
| Integer | Yes  | Load | F2  | R3 |    |       |    | Yes |     |
| Mult1   | Yes  | Mult | F0  | F2 | F4 |       |    | Yes | Yes |
| Mult2   | No   |      |     |    |    |       |    |     |     |
| Add     | Yes  | Sub  | F8  | F6 | F2 |       |    | Yes | Yes |
| Divide  | Yes  | Div  | F10 | F0 | F6 | Mult1 |    | No  | Yes |

Register result status:

|    | F0    | F2 | F4 | F6 | F8  | F10 | F12 | ... | F30 |
|----|-------|----|----|----|-----|-----|-----|-----|-----|
| FU | Mult1 |    |    |    | Add | Div |     |     |     |

Instruction status:

| Instruction |             | Issue | Read operands | Execution completed | Write result |
|-------------|-------------|-------|---------------|---------------------|--------------|
| L.D         | F6, 34(R2)  | X     | X             | X                   | X            |
| L.D         | F2, 45(R3)  | X     | X             | X                   | X            |
| MUL.D       | F0, F2, F4  | X     | X             | X                   |              |
| SUB.D       | F8, F6, F2  | X     | X             | X                   | X            |
| DIV.D       | F10, F0, F6 | X     |               |                     |              |
| ADD.D       | F4, F8, F2  | X     | X             | X                   |              |

Functional unit status:

| Unit    | Busy | Op   | Fi  | Fj | Fk | Qj    | Qk | Rj  | Rk  |
|---------|------|------|-----|----|----|-------|----|-----|-----|
| Integer | No   |      |     |    |    |       |    |     |     |
| Mult1   | Yes  | Mult | F0  | F2 | F4 |       |    | Yes | Yes |
| Mult2   | No   |      |     |    |    |       |    |     |     |
| Add     | Yes  | Add  | F4  | F8 | F2 |       |    | Yes | Yes |
| Divide  | Yes  | Div  | F10 | F0 | F6 | Mult1 |    | No  | Yes |

Register result status:

|    | F0    | F2 | F4  | F6 | F8 | F10 | F12 | ... | F30 |
|----|-------|----|-----|----|----|-----|-----|-----|-----|
| FU | Mult1 |    | Add |    |    | Div |     |     |     |
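The scoreboard bookkeeping behind these snapshots can be sketched in a few lines of Python. The sketch follows the usual textbook rules: issue stalls on a busy unit (structural hazard) or a pending writer of the destination (WAW), operands are read only when Rj and Rk are both Yes (RAW), and a result is written only when no other unit still has to read the old value of the destination (WAR). Two assumptions: Rj/Rk are reset to No once operands are read (the Hennessy and Patterson convention, so the Integer row can differ from the slides), and the replay uses ADD.D F4, F8, F2 as in the later snapshots. The class and method names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class FU:
    busy: bool = False
    op: str = ""
    fi: Optional[str] = None  # destination register
    fj: Optional[str] = None  # source registers
    fk: Optional[str] = None
    qj: Optional[str] = None  # unit that will produce fj (None = available)
    qk: Optional[str] = None
    rj: bool = False          # fj ready and not yet read
    rk: bool = False


class Scoreboard:
    def __init__(self, units):
        self.fu: Dict[str, FU] = {u: FU() for u in units}
        self.result: Dict[str, str] = {}  # register result status: reg -> unit

    def can_issue(self, unit, dest):
        # Structural hazard (unit busy) or WAW hazard (dest already pending).
        return not self.fu[unit].busy and dest not in self.result

    def issue(self, unit, op, dest, src1=None, src2=None):
        assert self.can_issue(unit, dest)
        qj = self.result.get(src1) if src1 else None
        qk = self.result.get(src2) if src2 else None
        self.fu[unit] = FU(True, op, dest, src1, src2, qj, qk, qj is None, qk is None)
        self.result[dest] = unit

    def read_operands(self, unit):
        f = self.fu[unit]
        assert f.rj and f.rk, "RAW hazard: operands not ready"
        f.rj = f.rk = False  # operands consumed

    def can_write(self, unit):
        # WAR hazard: some other unit still has to read the old value of fi.
        fi = self.fu[unit].fi
        return not any((g.fj == fi and g.rj) or (g.fk == fi and g.rk)
                       for name, g in self.fu.items() if name != unit)

    def write_result(self, unit):
        assert self.can_write(unit)
        for g in self.fu.values():  # wake up units waiting for this result
            if g.qj == unit:
                g.qj, g.rj = None, True
            if g.qk == unit:
                g.qk, g.rk = None, True
        del self.result[self.fu[unit].fi]
        self.fu[unit] = FU()


sb = Scoreboard(["Integer", "Mult1", "Mult2", "Add", "Divide"])
sb.issue("Integer", "Load", "F6", "R2")
sb.read_operands("Integer")
sb.write_result("Integer")
sb.issue("Integer", "Load", "F2", "R3")
sb.read_operands("Integer")
sb.issue("Mult1", "Mult", "F0", "F2", "F4")
sb.issue("Add", "Sub", "F8", "F6", "F2")
sb.issue("Divide", "Div", "F10", "F0", "F6")
print(sb.can_issue("Add", "F4"))  # False: Add unit busy with SUB.D

sb.write_result("Integer")        # L.D F2 done: MUL.D and SUB.D may read
sb.read_operands("Add")
sb.write_result("Add")            # SUB.D done
sb.issue("Add", "Add", "F4", "F8", "F2")
sb.read_operands("Add")
print(sb.can_write("Add"))        # False: MUL.D has not read F4 yet (WAR)
sb.read_operands("Mult1")
print(sb.can_write("Add"))        # True
```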









## Three stages of control

- Issue
  - If a reservation station is available for the needed functional unit, issue the instruction and read the operands that are ready.
  - For operands that are not ready, rename the register to the reservation station that will produce it.
  - Loads and stores are issued if a buffer (reservation station) is available.
- Execute
  - Monitor the CDB for any operand that is not ready.
  - When both operands are available, execute.
  - If more than one station for the same unit is ready, only one of them can start execution.
  - Do not start execution before previous branches have completed.
- Write result
  - Write to the CDB (and to the registers); this may be a structural hazard if there is only one CDB.
  - Make the reservation station (the functional unit) available.



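Instruction status:

| Instruction |             | Issue | Execute | Write result |
|-------------|-------------|-------|---------|--------------|
| L.D         | F6, 34(R2)  | X     | X       | X            |
| L.D         | F2, 45(R3)  | X     | X       |              |
| MUL.D       | F0, F2, F4  | X     |         |              |
| SUB.D       | F8, F2, F6  | X     |         |              |
| DIV.D       | F10, F0, F6 | X     |         |              |
| ADD.D       | F6, F8, F2  | X     |         |              |

Reservation stations:

| Name  | Busy | Op   | Vj | Vk              | Qj    | Qk    | A          |
|-------|------|------|----|-----------------|-------|-------|------------|
| Load1 | no   |      |    |                 |       |       |            |
| Load2 | Y    | Load |    |                 |       |       | 45+Reg[R3] |
| Add1  | Y    | Sub  |    | Mem[34+Reg[R2]] | Load2 |       |            |
| Add2  | Y    | Add  |    |                 | Add1  | Load2 |            |
| Add3  | no   |      |    |                 |       |       |            |
| Mult1 | Y    | Mul  |    | Reg[F4]         | Load2 |       |            |
| Mult2 | Y    | Div  |    | Mem[34+Reg[R2]] | Mult1 |       |            |

Register status:

|    | F0    | F2    | F4 | F6   | F8   | F10   | F12 | ... | F30 |
|----|-------|-------|----|------|------|-------|-----|-----|-----|
| Qi | Mult1 | Load2 |    | Add2 | Add1 | Mult2 |     |     |     |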

Instruction status:

| Instruction |             | Issue | Execute | Write result |
|-------------|-------------|-------|---------|--------------|
| L.D         | F6, 34(R2)  | X     | X       | X            |
| L.D         | F2, 45(R3)  | X     | X       | X            |
| MUL.D       | F0, F2, F4  | X     | X       |              |
| SUB.D       | F8, F2, F6  | X     | X       | X            |
| DIV.D       | F10, F0, F6 | X     |         |              |
| ADD.D       | F6, F8, F2  | X     | X       | X            |

Reservation stations:

| Name  | Busy | Op  | Vj              | Vk              | Qj    | Qk | A |
|-------|------|-----|-----------------|-----------------|-------|----|---|
| Load1 | no   |     |                 |                 |       |    |   |
| Load2 | no   |     |                 |                 |       |    |   |
| Add1  | no   |     |                 |                 |       |    |   |
| Add2  | no   |     |                 |                 |       |    |   |
| Add3  | no   |     |                 |                 |       |    |   |
| Mult1 | Y    | Mul | Mem[45+Reg[R3]] | Reg[F4]         |       |    |   |
| Mult2 | Y    | Div |                 | Mem[34+Reg[R2]] | Mult1 |    |   |

Register status:

|    | F0    | F2 | F4 | F6 | F8 | F10   | F12 | ... | F30 |
|----|-------|----|----|----|----|-------|-----|-----|-----|
| Qi | Mult1 |    |    |    |    | Mult2 |     |     |     |
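The three stages and the two snapshots above can be traced with a small Python sketch. It is a simplification under stated assumptions: operand values are symbolic strings, execution timing, the single-CDB limit and branches are not modeled, and the replay assumes the first load writes its result before MUL.D issues (the final station contents come out the same if it writes later). The class and method names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class Station:
    busy: bool = False
    op: str = ""
    vj: Optional[str] = None  # operand values (symbolic strings here)
    vk: Optional[str] = None
    qj: Optional[str] = None  # station that will produce the operand
    qk: Optional[str] = None
    a: Optional[str] = None   # effective address (load buffers)


class Tomasulo:
    def __init__(self, names):
        self.rs: Dict[str, Station] = {n: Station() for n in names}
        self.qi: Dict[str, str] = {}    # register status: reg -> producing station
        self.regs: Dict[str, str] = {}  # register file, symbolic values

    def _operand(self, reg) -> Tuple[Optional[str], Optional[str]]:
        # Ready: (value, None). Pending: (None, producer tag), i.e. renaming.
        if reg in self.qi:
            return None, self.qi[reg]
        return self.regs.get(reg, f"Reg[{reg}]"), None

    def issue(self, name, op, dest, src1, src2):
        assert not self.rs[name].busy, "no free reservation station: stall"
        vj, qj = self._operand(src1)
        vk, qk = self._operand(src2)
        self.rs[name] = Station(True, op, vj, vk, qj, qk)
        self.qi[dest] = name  # later readers of dest wait for this station

    def issue_load(self, name, dest, addr):
        assert not self.rs[name].busy, "no free load buffer: stall"
        self.rs[name] = Station(True, "Load", a=addr)
        self.qi[dest] = name

    def write_result(self, name, value):
        # CDB broadcast: every waiting station and register picks up the value.
        for s in self.rs.values():
            if s.qj == name:
                s.vj, s.qj = value, None
            if s.qk == name:
                s.vk, s.qk = value, None
        for reg in [r for r, tag in self.qi.items() if tag == name]:
            self.regs[reg] = value
            del self.qi[reg]
        self.rs[name] = Station()  # station is free again


t = Tomasulo(["Load1", "Load2", "Add1", "Add2", "Add3", "Mult1", "Mult2"])
t.issue_load("Load1", "F6", "34+Reg[R2]")
t.issue_load("Load2", "F2", "45+Reg[R3]")
t.write_result("Load1", "Mem[34+Reg[R2]]")
t.issue("Mult1", "Mul", "F0", "F2", "F4")
t.issue("Add1", "Sub", "F8", "F2", "F6")
t.issue("Mult2", "Div", "F10", "F0", "F6")
t.issue("Add2", "Add", "F6", "F8", "F2")
print(t.qi)  # first snapshot: F0 Mult1, F2 Load2, F6 Add2, F8 Add1, F10 Mult2

t.write_result("Load2", "Mem[45+Reg[R3]]")
t.write_result("Add1", "SUB.D result")
t.write_result("Add2", "ADD.D result")
print(t.qi)  # second snapshot: only F0 (Mult1) and F10 (Mult2) pending
```

Note that ADD.D renames F6 to Add2 before the DIV.D has finished with the old F6; DIV.D already holds that value in Vk, which is how renaming removes the WAR hazard.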



Instruction status:

| Instruction |            | Issue | Execute | Write result |
|-------------|------------|-------|---------|--------------|
| L.D         | F0, 0(R1)  | X     | X       |              |
| MUL.D       | F4, F0, F2 | X     |         |              |
| S.D         | F4, 0(R1)  | X     |         |              |
| L.D         | F0, 8(R1)  | X     | X       |              |
| MUL.D       | F4, F0, F2 | X     |         |              |
| S.D         | F4, 8(R1)  | X     |         |              |

Reservation stations:

| Name   | Busy | Op  | Vj        | Vk      | Qj    | Qk    | A         |
|--------|------|-----|-----------|---------|-------|-------|-----------|
| Load1  | Y    | ld  |           |         |       |       | Reg[R1]+0 |
| Load2  | Y    | ld  |           |         |       |       | Reg[R1]+8 |
| Store1 | Y    | sd  | Reg[R1]+0 |         |       | Mult1 |           |
| Store2 | Y    | sd  | Reg[R1]+8 |         |       | Mult2 |           |
| Add    | no   |     |           |         |       |       |           |
| Mult1  | Y    | Mul |           | Reg[F2] | Load1 |       |           |
| Mult2  | Y    | Mul |           | Reg[F2] | Load2 |       |           |

Register status:

|    | F0    | F2 | F4    | F6 | F8 | F10 | F12 | ... | F30 |
|----|-------|----|-------|----|----|-----|-----|-----|-----|
| Qi | Load2 |    | Mult2 |    |    |     |     |     |     |
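The point of this snapshot is that each iteration's instructions wait on tags rather than on register names, so the two iterations overlap even though both use F0 and F4. A minimal Python sketch of the issue step only (no execution), assuming F2 is ready and R1 does not change between the two iterations, as in the listing above; the function name is hypothetical.

```python
from typing import Dict, Tuple


def issue_iterations(n: int) -> Tuple[Dict[str, str], Dict[str, dict]]:
    """Issue L.D / MUL.D / S.D for n iterations; return (register status, stations)."""
    qi: Dict[str, str] = {}   # register status: reg -> latest producer's tag
    rs: Dict[str, dict] = {}  # tag -> station fields
    for k in range(1, n + 1):
        offset = 8 * (k - 1)  # addresses as on the slide: Reg[R1]+0, Reg[R1]+8
        load, mult, store = f"Load{k}", f"Mult{k}", f"Store{k}"
        # L.D F0, offset(R1): this load buffer becomes the producer of F0.
        rs[load] = {"op": "ld", "A": f"Reg[R1]+{offset}"}
        qi["F0"] = load
        # MUL.D F4, F0, F2: waits on this iteration's load; F2 is ready.
        rs[mult] = {"op": "Mul", "Qj": qi["F0"], "Vk": "Reg[F2]"}
        qi["F4"] = mult
        # S.D F4, offset(R1): waits on this iteration's multiply. Stores write
        # memory, not registers, so they never appear in the register status.
        rs[store] = {"op": "sd", "Vj": f"Reg[R1]+{offset}", "Qk": qi["F4"]}
    return qi, rs


qi, rs = issue_iterations(2)
print(qi)                 # {'F0': 'Load2', 'F4': 'Mult2'}
print(rs["Mult1"]["Qj"])  # Load1
```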







## Example:

- Assume that MUL.D has just written its result into the ROB.
- MUL.D is at the head of the ROB, ready to commit.
- SUB.D and ADD.D wrote their results into the ROB before MUL.D did.
- DIV.D is in a reservation station, waiting for the result of MUL.D.

| Instruction |             | Issued | Execute | In ROB | Committed |
|-------------|-------------|--------|---------|--------|-----------|
| L.D         | F6, 34(R2)  | X      | X       | X      | X         |
| L.D         | F2, 45(R3)  | X      | X       | X      | X         |
| MUL.D       | F0, F2, F4  | X      | X       | X      |           |
| SUB.D       | F8, F2, F6  | X      | X       | X      |           |
| DIV.D       | F10, F0, F6 | X      | X       |        |           |
| ADD.D       | F6, F8, F2  | X      | X       | X      |           |

Reservation stations status:

| Name  | Busy | Op  | Vj              | Vk              | Qj   | Qk | Dest. | A |
|-------|------|-----|-----------------|-----------------|------|----|-------|---|
| Load1 | no   |     |                 |                 |      |    |       |   |
| Load2 | no   |     |                 |                 |      |    |       |   |
| Add1  | Y    | sub | Mem[45+Reg[R3]] | Mem[34+Reg[R2]] |      |    | ROB4  |   |
| Add2  | Y    | add |                 | Mem[45+Reg[R3]] | ROB4 |    | ROB6  |   |
| Add3  | no   |     |                 |                 |      |    |       |   |
| Mult1 | Y    | Mul | Mem[45+Reg[R3]] | Reg[F4]         |      |    | ROB3  |   |
| Mult2 | Y    | Div |                 | Mem[34+Reg[R2]] | ROB3 |    | ROB5  |   |

ROB status:

| Name | Busy | Instruction |             | State             | Dest. | Value |
|------|------|-------------|-------------|-------------------|-------|-------|
| ROB1 | no   | L.D         | F6, 34(R2)  | Commit            | F6    | xxx   |
| ROB2 | no   | L.D         | F2, 45(R3)  | Commit            | F2    | xxx   |
| ROB3 | yes  | MUL.D       | F0, F2, F4  | Write result      | F0    | xxx   |
| ROB4 | yes  | SUB.D       | F8, F2, F6  | Write result      | F8    | xxx   |
| ROB5 | yes  | DIV.D       | F10, F0, F6 | Issued/executing  | F10   |       |
| ROB6 | yes  | ADD.D       | F6, F8, F2  | Write result      | F6    | xxx   |

Register status:

|    | F0   | F2 | F4 | F6   | F8   | F10  | F12 | ... | F30 |
|----|------|----|----|------|------|------|-----|-----|-----|
| Qi | ROB3 |    |    | ROB6 | ROB4 | ROB5 |     |     |     |
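In-order commit from the head of the ROB can be sketched in Python using the entries of this example. Assumptions: the values are hypothetical placeholders for the slide's "xxx", and every finished entry at the head is retired in one call, ignoring any per-cycle commit limit. The names `ROBEntry` and `commit` are hypothetical.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, Dict, List, Optional


@dataclass
class ROBEntry:
    tag: str
    instr: str
    dest: str
    state: str                  # e.g. "Issued/executing", "Write result"
    value: Optional[str] = None


def commit(rob: Deque[ROBEntry], regs: Dict[str, str], qi: Dict[str, str]) -> List[str]:
    """Retire finished entries from the head of the ROB, strictly in program order."""
    retired = []
    while rob and rob[0].state == "Write result":
        e = rob.popleft()
        regs[e.dest] = e.value       # architectural register updated only now
        if qi.get(e.dest) == e.tag:  # no younger in-flight writer of dest
            del qi[e.dest]
        retired.append(e.instr)
    return retired


# State after ROB1 and ROB2 have committed (values are placeholders).
rob = deque([
    ROBEntry("ROB3", "MUL.D", "F0", "Write result", "v3"),
    ROBEntry("ROB4", "SUB.D", "F8", "Write result", "v4"),
    ROBEntry("ROB5", "DIV.D", "F10", "Issued/executing"),
    ROBEntry("ROB6", "ADD.D", "F6", "Write result", "v6"),
])
regs: Dict[str, str] = {}
qi = {"F0": "ROB3", "F6": "ROB6", "F8": "ROB4", "F10": "ROB5"}
print(commit(rob, regs, qi))  # ['MUL.D', 'SUB.D']
```

ADD.D has already finished, but it stays in the ROB behind the unfinished DIV.D; that ordering is what makes the register file precise.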





| Common name                  | Issue<br>structure | Hazard<br>detection   | Scheduling               | Distinguishing<br>characteristic                                          | Examples                                                                             |
|------------------------------|--------------------|-----------------------|--------------------------|---------------------------------------------------------------------------|--------------------------------------------------------------------------------------|
| Superscalar<br>(static)      | Dynamic            | Hardware              | Static                   | In-order execution                                                        | Mostly in the<br>embedded space:<br>MIPS and ARM,<br>including the ARM<br>Cortex-A8  |
| Superscalar<br>(dynamic)     | Dynamic            | Hardware              | Dynamic                  | Some out-of-order<br>execution, but no<br>speculation                     | None at present                                                                      |
| Superscalar<br>(speculative) | Dynamic            | Hardware              | Dynamic with speculation | Out-of-order execution with speculation                                   | Intel Core i3, i5, i7;<br>AMD Phenom; IBM<br>Power 7                                 |
| VLIW/LIW                     | Static             | Primarily<br>software | Static                   | All hazards determined<br>and indicated by compiler<br>(often implicitly) | Most examples are in<br>signal processing,<br>such as the TI C6x                     |
| EPIC                         | Primarily static   | Primarily<br>software | Mostly static            | All hazards determined<br>and indicated explicitly<br>by the compiler     | Itanium                                                                              |









## Loop Unrolling in VLIW

| Memory reference 1 | Memory reference 2 | FP operation 1   | FP operation 2   | Int. op/branch  | Clock |
|--------------------|--------------------|------------------|------------------|-----------------|-------|
| L.D F0,0(R1)       | L.D F6,-8(R1)      |                  |                  |                 | 1     |
| L.D F10,-16(R1)    | L.D F14,-24(R1)    |                  |                  |                 | 2     |
| L.D F18,-32(R1)    | L.D F22,-40(R1)    | ADD.D F4,F0,F2   | ADD.D F8,F6,F2   |                 | 3     |
| L.D F26,-48(R1)    |                    | ADD.D F12,F10,F2 | ADD.D F16,F14,F2 |                 | 4     |
|                    |                    | ADD.D F20,F18,F2 | ADD.D F24,F22,F2 |                 | 5     |
| S.D F4,0(R1)       | S.D F8,-8(R1)      | ADD.D F28,F26,F2 |                  |                 | 6     |
| S.D F12,-16(R1)    | S.D F16,-24(R1)    |                  |                  |                 | 7     |
| S.D F20,-32(R1)    | S.D F24,-40(R1)    |                  |                  | DADDI R1,R1,-48 | 8     |
| S.D F28,-0(R1)     |                    |                  |                  | BNEZ R1,Loop    | 9     |

- Unrolled 7 times.
- 7 results in 9 clocks, or 1.3 clocks per iteration.
- Average: 2.5 ops per clock (23 operations in 9 clocks), 50% efficiency (23 of the 45 issue slots used).
- Note: VLIW needs more registers (15 vs. 11 in the superscalar version).
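The summary figures follow from simple counts. A Python sketch (the function name is hypothetical) that recomputes them from the number of operations in each issue slot of the schedule: 23 operations in 9 clocks is about 2.56 per clock, which the slide quotes as 2.5, and 23 of 45 slots is about 51%, the slide's 50% efficiency.

```python
def vliw_stats(ops_per_slot: dict, clocks: int, results: int) -> dict:
    """Ops per clock, slot utilization and clocks per result for a VLIW schedule."""
    ops = sum(ops_per_slot.values())
    slots = len(ops_per_slot) * clocks  # every slot of every instruction word
    return {
        "ops": ops,
        "slots": slots,
        "ops_per_clock": ops / clocks,
        "utilization": ops / slots,
        "clocks_per_result": clocks / results,
    }


# Operations per issue slot, counted column by column (7 unrolled copies):
# 7 loads + 7 stores over the two memory slots, 7 ADD.D over the two FP
# slots, and DADDI + BNEZ in the integer/branch slot.
counts = {"mem1": 8, "mem2": 6, "fp1": 4, "fp2": 3, "int_branch": 2}
s = vliw_stats(counts, clocks=9, results=7)
print(s["ops"], s["slots"])              # 23 45
print(round(s["ops_per_clock"], 2))      # 2.56
print(round(s["utilization"], 2))        # 0.51
print(round(s["clocks_per_result"], 2))  # 1.29
```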







Consider the same example (as on the last slide):

| Iteration | Instruction |             | Issued at | Executes  | Mem access | Write CDB | Comments        |
|-----------|-------------|-------------|-----------|-----------|------------|-----------|-----------------|
| 1         | L.D         | F0, 0(R1)   | 1         | 2         | 3          | 4         | First issue     |
| 1         | ADD.D       | F4, F0, F2  | 1         | 5         |            | 8         | Wait for L.D    |
| 1         | S.D         | F4, 0(R1)   | 2         | 3         | 9          |           | Wait for ADD.D  |
| 1         | DADDIU      | R1, R1, -8  | 2         | 3         |            | 4         |                 |
| 1         | BNE         | R1, R2, L1  | 3         | 5         |            |           | Wait for DADDIU |
| 2         | L.D         | F0, 0(R1)   | ~~4~~ 3   | ~~6~~ 4   | 5          | ~~8~~ 6   | No wait for BNE |
| 2         | ADD.D       | F4, F0, F2  | 4         | ~~9~~ 7   |            | ~~12~~ 10 | Wait for L.D    |
| 2         | S.D         | F4, 0(R1)   | ~~5~~ 4   | ~~7~~ 5   | ~~13~~ 10  |           | Wait for ADD.D  |
| 2         | DADDIU      | R1, R1, -8  | 5         | 6         |            | 7         |                 |
| 2         | BNE         | R1, R2, L1  | ~~6~~ 5   | ~~8~~ 6   |            |           | Wait for DADDIU |
| 3         | L.D         | F0, 0(R1)   | 6         | ~~9~~ 7   | ~~10~~ 8   | ~~11~~ 9  | No wait for BNE |
| 3         | ADD.D       | F4, F0, F2  | 6         | ~~12~~ 10 |            | ~~15~~ 13 | Wait for L.D    |
| 3         | S.D         | F4, 0(R1)   | ~~8~~ 7   | ~~10~~ 8  | ~~16~~ 14  |           | Wait for ADD.D  |
| 3         | DADDIU      | R1, R1, -8  | ~~8~~ 7   | ~~9~~ 8   |            | ~~10~~ 9  |                 |
| 3         | BNE         | R1, R2, L1  | ~~9~~ 8   | ~~11~~ 9  |            |           | Wait for DADDIU |
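The iteration-1 rows follow from a few timing rules, which a short Python sketch can reproduce. The rules are assumptions chosen to match those rows, not taken from the slide: an instruction executes the cycle after it issues and after every producer has written the CDB; L.D spends one cycle on the address and one on memory, then writes the CDB; ADD.D takes 3 cycles; DADDIU takes 1; S.D computes its address right away but accesses memory only after the stored value is on the CDB. Issue cycles are given as input (the two-wide issue logic, the single CDB and functional-unit limits are not modeled), and R1 is assumed ready at the start. The function name is hypothetical.

```python
from typing import Dict, List, Optional, Tuple

FP_ADD_LATENCY = 3  # assumed: ADD.D executes for 3 cycles, then writes the CDB

Timing = Tuple[int, int, Optional[int], Optional[int]]


def schedule(program: List[dict]) -> Dict[str, Timing]:
    """Return name -> (issued, executes, mem access, write CDB) per instruction.

    "deps" lists producers that must write the CDB before execution starts;
    a store also names the producer of the value it stores ("data_dep").
    Producers must appear earlier in the list (program order).
    """
    cdb: Dict[str, int] = {}  # cycle in which each result is on the CDB
    timing: Dict[str, Timing] = {}
    for ins in program:
        issue = ins["issue"]
        # Execute the cycle after issue and after every producer's CDB write.
        ex = max([issue] + [cdb[d] for d in ins.get("deps", [])]) + 1
        mem = write = None
        kind = ins["kind"]
        if kind == "load":      # address, memory access, then CDB
            mem = ex + 1
            write = mem + 1
        elif kind == "fpadd":
            write = ex + FP_ADD_LATENCY
        elif kind == "int":
            write = ex + 1
        elif kind == "store":   # memory write waits for the stored value
            mem = max(ex + 1, cdb[ins["data_dep"]] + 1)
        # "branch": executes only, writes nothing
        if write is not None:
            cdb[ins["name"]] = write
        timing[ins["name"]] = (issue, ex, mem, write)
    return timing


ITERATION_1 = [
    {"name": "L.D", "kind": "load", "issue": 1},
    {"name": "ADD.D", "kind": "fpadd", "issue": 1, "deps": ["L.D"]},
    {"name": "S.D", "kind": "store", "issue": 2, "data_dep": "ADD.D"},
    {"name": "DADDIU", "kind": "int", "issue": 2},
    {"name": "BNE", "kind": "branch", "issue": 3, "deps": ["DADDIU"]},
]

for name, row in schedule(ITERATION_1).items():
    print(name, row)
```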

























