Graduation date: 2007
The amount of instruction-level parallelism (ILP) that can be exploited depends
greatly on the size of the instruction window and the number of in-flight instructions
the processor can support. However, supporting many in-flight instructions requires a register file with a large set of
physical registers for renaming and multiple ports to provide register accesses to
several instructions at once. The number of registers and ports a register file must
contain will increase as next-generation wide-issue processors take advantage of
more ILP, which will in turn increase the register file's access time, area, and power dissipation. This
paper proposes a method called Dynamic Register Caching, which uses a small, fast
register cache along with a slow full register file in a single-level configuration, and
splits the porting requirement between the two with each one capable of supplying
values to the functional units (FUs). This reduces the miss penalty found in previous multi-level schemes to
just the access time of the full register file. The proposed method uses In-Cache bits
and Register-port Select logic to keep track of operands in the register cache and the
availability of free ports on both register files, and a simple instruction steering
mechanism to determine which register file will supply values to instructions. This
technique of dynamically steering instructions requires slightly more logic to
implement, but incurs no additional delay and ensures that load balance is a non-issue.
Our study, based on SimpleScalar microarchitecture simulation, shows that the
proposed scheme provides, on average, a 15% to 22% improvement in IPC, a 47% to 56%
reduction in total area, and a 23% to 49% reduction in power compared to a monolithic
register file.
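
The steering decision described above can be illustrated with a minimal sketch. It is not the actual Dynamic Register Caching logic, only one plausible reading of the abstract under stated assumptions: an instruction is sent to the register cache when every source operand has its In-Cache bit set and the cache has enough free read ports this cycle; otherwise it falls back to the full register file (assumed to hold every value) if that file has free ports; otherwise it waits. The names `PortPool`, `Steering`, and `steer`, and the port counts in the usage note, are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class PortPool:
    """Read ports available on one register file during the current cycle."""
    total: int
    used: int = 0

    def try_claim(self, n: int) -> bool:
        # Claim n ports only if all n are free; otherwise claim nothing.
        if self.used + n > self.total:
            return False
        self.used += n
        return True

    def new_cycle(self) -> None:
        # All ports become free again at the start of a cycle.
        self.used = 0


@dataclass
class Steering:
    """Hypothetical model of In-Cache bits plus register-port selection."""
    cache_ports: PortPool
    full_ports: PortPool
    # Physical registers whose In-Cache bit is set (assumed representation).
    in_cache: set = field(default_factory=set)

    def steer(self, src_regs) -> str:
        n = len(src_regs)
        # Prefer the fast register cache when every operand is resident
        # and enough cache read ports are free.
        if all(r in self.in_cache for r in src_regs) and self.cache_ports.try_claim(n):
            return "cache"
        # Assumption: the full register file holds every value, so it can
        # serve any instruction as long as it has free read ports.
        if self.full_ports.try_claim(n):
            return "full"
        # No ports available on either file this cycle.
        return "stall"
```

For example, with two read ports on each file and registers 1 to 3 resident in the cache, `steer([1, 2])` is served by the cache, a following `steer([2, 3])` in the same cycle spills to the full register file because the cache ports are taken, and a third two-operand instruction must wait for the next cycle.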