PowerPC 603: A power-aware processor

Microprocessor overview

 

The PowerPC 603 microprocessor is the low-power member of the PowerPC family, yet it remains a high-performance processor, comparable to present-day high-end personal computer and workstation processors. It is designed for both high performance and low power.

 

The PowerPC 603 design team used sample traces generated on RISC System/6000 machines to evaluate trade-offs, and employed a formal VLSI design methodology. The processor took only 18 months to go from design to working silicon.

 

The PowerPC 603 has five execution units (branch, integer, floating-point, load/store, and system register) and a pair of on-chip 8KB caches, one for instructions and one for data.

The pipeline has four stages.

 

Fetch: Because the 603 is a superscalar microprocessor, it can issue more than one instruction per cycle to the execution units. The instruction fetch unit retrieves two instructions per cycle from the instruction cache and places them in a six-entry instruction queue. Branch instructions are not placed in the instruction queue; they are forwarded to a dedicated branch processing unit for resolution.
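The fetch behavior described above can be sketched as a small queue model. This is only an illustrative sketch, not the 603's actual logic: the function name `fetch_cycle`, the use of mnemonic strings as instructions, and the `BRANCH_MNEMONICS` set are assumptions made for the example. Only the two-per-cycle fetch width and the six-entry queue come from the text.

```python
from collections import deque

FETCH_WIDTH = 2   # instructions fetched per cycle (from the text)
QUEUE_DEPTH = 6   # instruction queue entries (from the text)

# Hypothetical stand-in for real opcode decoding: a few PowerPC branch mnemonics.
BRANCH_MNEMONICS = {"b", "bl", "bc", "bclr", "bcctr"}


def fetch_cycle(program, pc, queue, branch_unit):
    """Model one fetch cycle; return the index of the next instruction to fetch."""
    for _ in range(FETCH_WIDTH):
        if pc >= len(program):
            break
        insn = program[pc]
        if insn in BRANCH_MNEMONICS:
            branch_unit.append(insn)      # branches bypass the instruction queue
        elif len(queue) < QUEUE_DEPTH:
            queue.append(insn)
        else:
            break                         # queue full: fetch stalls this cycle
        pc += 1
    return pc


program = ["add", "bc", "lwz", "b", "stw", "or"]
queue, branch_unit = deque(), []
pc = 0
for _ in range(3):                        # three fetch cycles
    pc = fetch_cycle(program, pc, queue, branch_unit)
```

After the three cycles, the two branches sit in `branch_unit` and the four remaining instructions sit in `queue` in program order.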

 

Decode/Source: In the dispatch stage, the dispatch unit reads the bottom two entries of the instruction queue (or, alternatively, the special branch resolution queue), reads the available source operands, allocates rename registers, and sends the instructions to the appropriate execution units. Register renaming avoids stalls on write-after-write and write-after-read hazards. If the target execution unit is busy, the dispatch unit stalls rather than dispatching the instruction. The reservation stations at each execution unit hold a dispatched instruction until all of its operands are available, so an instruction waiting on an operand does not prevent subsequent instructions from being dispatched to other execution units. The branch unit decodes instructions upstream of those being decoded by the dispatcher. For taken branches, therefore, the instruction queue usually contains enough instructions to keep the dispatcher busy until the new instruction stream is fetched.
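A minimal sketch of the dispatch rule described above, under stated assumptions: the function name `dispatch_cycle` and the `(name, unit)` tuple format are hypothetical, each instruction is assumed to need exactly one rename register, and each unit is assumed to accept at most one instruction per cycle. None of these details is given in the text.

```python
from collections import deque

DISPATCH_WIDTH = 2  # the bottom two queue entries are examined per cycle (from the text)


def dispatch_cycle(queue, busy_units, free_renames):
    """Dispatch up to two instructions in program order.

    queue: deque of (name, unit) tuples, oldest first
    busy_units: set of unit names that cannot accept an instruction
    free_renames: number of rename registers still unallocated
    Returns (dispatched_names, remaining_free_renames).
    """
    busy = set(busy_units)        # local copy; the caller's set is not modified
    dispatched = []
    for _ in range(DISPATCH_WIDTH):
        if not queue:
            break
        name, unit = queue[0]
        if unit in busy or free_renames == 0:
            break                 # in-order dispatch: younger instructions wait too
        queue.popleft()
        free_renames -= 1         # assumption: one rename register per instruction
        busy.add(unit)            # assumption: one instruction per unit per cycle
        dispatched.append(name)
    return dispatched, free_renames
```

For example, with a free integer unit and load/store unit, `add` and `lfd` at the bottom of the queue dispatch together; if the oldest instruction targets a busy unit, nothing dispatches in that cycle.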

 

Execute: The execution units execute instructions and write results to their destination registers. If a result is a source operand of another instruction, it is forwarded to that instruction. When an instruction finishes, the execution unit notifies the completion buffer. An instruction that causes an exception is tagged as such. The behavior of each execution unit is described below:

 

  1. The branch unit executes most branch instructions in a single clock cycle and has its own facilities for computing the branch target address. PowerPC conditional branches test the value of the count register (CTR) and any one of the 32 condition register (CR) bits. If the CTR value is unavailable, the branch unit and instruction fetch stall. If a CR bit is unavailable, the branch unit predicts the branch as taken or not taken according to bits in the branch opcode. Instructions on the predicted path are tagged as speculative and proceed down the pipeline normally, while the unavailable CR bit is checked every cycle. When the CR bit becomes available, the branch unit flushes all speculatively tagged instructions from the pipeline if the branch was mispredicted; if the prediction was correct, the speculative tags are cleared.
  2. The integer unit processes integer arithmetic, logical, and bit-field instructions. 
  3. The load/store unit handles load and store instructions to and from both the integer and floating-point registers. It contains a dedicated adder to calculate effective addresses. The unit is pipelined so that loads can be dispatched at a rate of one per clock cycle; stores, however, must be checked for memory protection violations and therefore cannot be executed in a fully pipelined manner.
  4. The floating-point unit is fully pipelined and supports denormalized numbers in hardware.
  5. The system register unit processes condition register logical instructions.
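The speculative tagging performed by the branch unit (item 1 above) can be illustrated with a short sketch. The function name `resolve_branch` and the dictionary representation of in-flight instructions are assumptions for the example; `predicted_taken` stands in for whatever the opcode bits select.

```python
def resolve_branch(pipeline, predicted_taken, actual_taken):
    """Return the pipeline contents once the awaited CR bit becomes available.

    pipeline: list of dicts such as {"op": "add", "speculative": True}
    """
    if predicted_taken != actual_taken:
        # Misprediction: flush every speculatively tagged instruction.
        return [insn for insn in pipeline if not insn["speculative"]]
    # Correct prediction: keep everything and clear the speculative tags.
    return [dict(insn, speculative=False) for insn in pipeline]


pipeline = [
    {"op": "lwz", "speculative": False},   # older than the branch
    {"op": "add", "speculative": True},    # fetched down the predicted path
]
after_mispredict = resolve_branch(pipeline, predicted_taken=True, actual_taken=False)
after_correct = resolve_branch(pipeline, predicted_taken=True, actual_taken=True)
```

In the mispredicted case only the older `lwz` survives; in the correct case both instructions remain and neither is tagged speculative any longer.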

 

Completion: The completion buffer tracks instruction execution. The completion buffer logic writes the contents of any renamed registers into the architectural registers and deallocates renamed registers for future use. If the completion buffer logic detects an exception, it will flush the pipeline and initiate exception handling. Otherwise, it removes completed instructions from the completion buffer. Since the completion logic removes all instructions in program order, exceptions are fully precise.
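The in-order retirement that makes exceptions precise can be sketched as follows. This is an illustrative model only: the function name `retire`, the entry fields, and the use of dictionaries for the rename and architectural registers are all assumptions, not details from the text.

```python
from collections import deque


def retire(completion_buffer, rename_regs, arch_regs):
    """Retire finished instructions from the head of the buffer, in program order.

    completion_buffer: deque of dicts, oldest first, each with the keys
                       "dest", "rename", "finished", "exception"
    rename_regs: dict mapping rename register id to value
    arch_regs: dict mapping architectural register name to value
    Returns ("exception", n) if the head instruction raised one, else ("ok", n),
    where n is the number of instructions retired by this call.
    """
    retired = 0
    while completion_buffer and completion_buffer[0]["finished"]:
        entry = completion_buffer[0]
        if entry["exception"]:
            completion_buffer.clear()   # flush the faulting and all younger instructions
            rename_regs.clear()         # their results never reach architectural state
            return "exception", retired
        completion_buffer.popleft()
        # Commit the renamed result and free the rename register.
        arch_regs[entry["dest"]] = rename_regs.pop(entry["rename"])
        retired += 1
    return "ok", retired
```

Because retirement stops at the oldest unfinished instruction, a younger instruction that finishes early cannot update the architectural registers before an older one that later raises an exception.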

 

Memory System

 

The PowerPC 603 has two 8KB, 2-way set-associative on-chip caches with 32-byte lines, one for instructions and one for data. The data cache supports both copy-back and write-through policies. Both caches use a least recently used (LRU) replacement policy. On a cache hit, the instruction cache provides two instructions per cycle, and the data cache provides up to one double word of data per cycle to the load/store unit. On a cache miss, the cache block is filled in a burst, performed as a “critical-double-word-first” operation: the critical double word is written to the cache and forwarded to the requesting unit simultaneously, which minimizes stalls due to cache fill latency.
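The cache geometry above implies 8192 / (2 × 32) = 128 sets, so a 32-bit address splits into a 5-bit line offset, a 7-bit set index, and a 20-bit tag. The sketch below works through that arithmetic and the order in which double words might arrive during a critical-double-word-first fill. The function names are hypothetical, and the wrap-around order after the critical double word is an assumption; the text states only that the critical double word comes first.

```python
CACHE_BYTES = 8 * 1024   # from the text
WAYS = 2                 # from the text
LINE_BYTES = 32          # from the text
DWORD_BYTES = 8          # one double word

SETS = CACHE_BYTES // (WAYS * LINE_BYTES)   # 128 sets


def split_address(addr):
    """Split an address into (tag, set index, byte offset within the line)."""
    offset = addr % LINE_BYTES
    index = (addr // LINE_BYTES) % SETS
    tag = addr // (LINE_BYTES * SETS)
    return tag, index, offset


def burst_order(addr):
    """Order in which the line's four double words are transferred.

    Assumption: after the critical double word, the fill continues
    sequentially and wraps around to the start of the line.
    """
    dwords = LINE_BYTES // DWORD_BYTES
    first = (addr % LINE_BYTES) // DWORD_BYTES
    return [(first + i) % dwords for i in range(dwords)]


tag, index, offset = split_address(0x1234)
order = burst_order(0x1234)
```

For address 0x1234 the offset within the line is 20, which lies in double word 2, so under the wrap-around assumption the fill order is double words 2, 3, 0, 1.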

For address translation, the 603 provides two four-entry, fully associative block address translation (BAT) register arrays and 2-way set-associative translation lookaside buffers (TLBs), one each for instructions and data. A hash function is used for TLB replacement. On a TLB miss, the processor raises an exception that invokes a software routine, the “tablewalk” handler, which walks the page table and loads the required page table entry into the TLB.

 

Bus Interface

 

The 603’s bus interface unit (BIU) receives requests for bus operations from the instruction and data caches. It provides address queues, prioritization logic, and bus control logic, and it drives separate 32-bit address and 64-bit data buses. When the data bus is configured as 64 bits, memory accesses occur as single-beat transfers (1 to 8 bytes) or four-beat burst transfers (32 bytes). The 603 supports address pipelining, in which the address tenure of a new transaction may begin before the data tenure of the current transaction has completed. It also supports split transactions, in which the address and data tenures can be controlled by different masters. Finally, the bus protocol supports address retry, which snooping masters use to interrupt other masters’ transactions in systems with multiple bus masters.
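The beat counts quoted above follow directly from the bus width: a 64-bit data bus moves 8 bytes per beat, so a 32-byte cache line needs four beats. A small sketch of that arithmetic, with a hypothetical helper name:

```python
def burst_beats(transfer_bytes, bus_bits):
    """Number of data beats needed to move transfer_bytes over a bus_bits-wide data bus."""
    bytes_per_beat = bus_bits // 8
    return -(-transfer_bytes // bytes_per_beat)   # ceiling division


line_fill = burst_beats(32, 64)    # 32-byte cache line on the 64-bit bus
single = burst_beats(8, 64)        # largest single-beat transfer
```

Any transfer of 1 to 8 bytes fits in a single beat on the 64-bit bus, which matches the single-beat range given in the text.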

 

Debug Features

 

The 603 has a JTAG (IEEE 1149.1) boundary-scan interface that simplifies board-level testing, and a special interface that allows an external processor to access memory or registers. In a special mode, pipeline status information can be output so that the instruction stream can be tracked in real time. An instruction address breakpoint is provided to aid software debugging.

 

Low-power microprocessors are increasingly expected to deliver high performance as well. The PowerPC 603 microprocessor was designed to meet that goal: it lets the system designer control energy consumption through both hardware and software, and it provides automatic internal power management.

 

Two notable features of PowerPC 603 power management (PM) are its dynamic power management (DPM) and its static power states. DPM powers individual execution units up and down according to the instruction stream. For example, if no floating-point instructions are being executed, the floating-point unit is automatically powered down. Each execution unit has an independent clock input, so stopping the clock to an idle unit eliminates its dynamic power consumption. DPM is enabled by setting a bit in an internal hardware register.

The static power states comprise four distinct modes: Full On, Doze, Nap, and Sleep. For power management, the PowerPC 603 provides a dedicated interrupt, the System Management Interrupt (SMI), and a decrementer timer used to time the Nap and Doze modes. Each mode has its own predefined time interval that acts as a trigger. On a decrementer interrupt (DI), the processor first returns from its current mode to Full On mode and can then switch to the other mode.
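The dynamic power management behavior described above amounts to per-unit clock gating. The sketch below is a simplified model under stated assumptions: the function name `clocked_units` is hypothetical, the set of units in use is supplied by the caller rather than derived from real instruction decoding, and with DPM disabled every unit is assumed to stay clocked.

```python
UNITS = ("branch", "integer", "floating-point", "load/store", "system register")


def clocked_units(units_in_use, dpm_enabled=True):
    """Return the set of execution units whose clocks run this cycle."""
    if not dpm_enabled:
        return set(UNITS)                 # assumption: without DPM every unit stays clocked
    # With DPM, only units that the instruction stream currently needs are clocked.
    return {unit for unit in UNITS if unit in units_in_use}


integer_only_code = clocked_units({"integer", "load/store"})
no_dpm = clocked_units({"integer", "load/store"}, dpm_enabled=False)
```

For a stretch of integer-only code, the floating-point unit is absent from `integer_only_code`, which corresponds to its clock being stopped.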

 


 

PM mode            | Functioning units                                                                | Activation method                | Full-on wake-up method
Full on (with DPM) | Function units are clocked only when needed.                                     |                                  |
Doze               | Most function units disabled; bus snooping and decrementer still enabled.        | Control by software              | External asynchronous interrupts; decrementer interrupts; reset
Nap                | Most function units disabled, including bus snooping; decrementer still enabled. | Control by software and hardware | External asynchronous interrupts; decrementer interrupts; reset
Sleep              | All function units disabled, including bus snooping and decrementer.             | Control by software and hardware | External synchronous interrupts; reset

   

 

Given these four power management modes, how does the processor move from one mode to another?

 

PM mode transitions are controlled either by the operating system (OS) or by the chipset hardware, and the two methods differ slightly. Under OS control, the startup code selects one of the power-saving modes by setting the appropriate power management mode bits. The choice depends on whether bus snooping and the decrementer are needed: if neither is needed, Sleep mode is selected; if only the decrementer is needed, Nap mode is selected; if snooping is also needed, Doze mode is selected. After setting the mode bits, the OS creates an idle process and runs it at the lowest priority. When all other processes are idle, the idle process executes and places the 603 in the selected mode. When the decrementer or an external timer expires and interrupts the processor, the 603 switches to Full On mode and runs the interrupt routine, which calls the OS scheduler to update process priorities.
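The mode-selection rule above can be written as a small decision function. This is a sketch of the rule as stated in the text, not of any real OS or register interface: the function name `select_mode`, the `WAKE_SOURCES` table, and the string labels are assumptions. The wake-up sources are simplified from the mode table earlier in this document.

```python
def select_mode(need_snooping, need_decrementer):
    """Pick the deepest power-saving mode that keeps the required facilities alive."""
    if need_snooping:
        return "Doze"    # bus snooping and decrementer both stay enabled
    if need_decrementer:
        return "Nap"     # snooping disabled, decrementer still enabled
    return "Sleep"       # snooping and decrementer both disabled


# Events that return the processor to Full On, per mode (simplified labels).
WAKE_SOURCES = {
    "Doze": {"external interrupt", "decrementer interrupt", "reset"},
    "Nap": {"external interrupt", "decrementer interrupt", "reset"},
    "Sleep": {"external interrupt", "reset"},   # the decrementer is disabled in Sleep
}

mode = select_mode(need_snooping=False, need_decrementer=True)
can_wake_on_timer = "decrementer interrupt" in WAKE_SOURCES[mode]
```

An OS that relies on the decrementer for its scheduling tick would therefore stop at Nap, since in Sleep mode the decrementer cannot wake the processor.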