Low power micro-architecture techniques based on loop instructions reuse


Since power consumption has become a fundamental concern not only for system integration but also for system reliability, research has focused on reducing energy consumption in the processor front end. This work shares a common idea: it takes advantage of loop iterations to reuse work previously performed at the fetch level, the decode level, or both. The proposed techniques are effective, for example, in embedded applications, which typically spend a large fraction of their execution time in small program loops. Some techniques rely on instruction profiling to statically identify and preload loops [1], while others employ hardware to dynamically identify simple [2] or more complex [3] loops. More recently, a mechanism was proposed that uses the Reorder Buffer (ROB) as storage for previously decoded instructions [4]. Power is saved by gating off the fetch and decode logic and feeding instructions directly from the ROB into the rename and issue stages.
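As an illustration of dynamic loop identification, the sketch below models a small loop buffer controller driven by a trace of program counters. It is a simplified model in the spirit of the short-backward-branch approach of [2], not the exact design of any cited work. The names LoopBufferController, LoopInfo and run_trace are hypothetical, and the model assumes word-addressed program counters, so that sequential execution means pc + 1.

```python
from dataclasses import dataclass
from enum import Enum


class State(Enum):
    IDLE = 0    # normal fetch from the instruction cache
    FILL = 1    # loop detected; its instructions are being copied to the buffer
    ACTIVE = 2  # instructions come from the buffer; fetch logic could be gated


@dataclass
class LoopInfo:
    start_pc: int   # target of the loop-closing backward branch
    branch_pc: int  # address of the loop-closing backward branch


class LoopBufferController:
    """Hypothetical controller: counts where each instruction is supplied from."""

    def __init__(self, capacity):
        self.capacity = capacity  # buffer size in instructions (assumption)
        self.state = State.IDLE
        self.loop = None
        self.icache_fetches = 0
        self.buffer_fetches = 0

    def step(self, pc, next_pc):
        # Account for the instruction at pc according to the current state.
        if self.state is State.ACTIVE:
            self.buffer_fetches += 1
        else:
            self.icache_fetches += 1

        sequential = next_pc == pc + 1
        if self.state is not State.IDLE:
            at_branch = pc == self.loop.branch_pc
            if at_branch and next_pc == self.loop.start_pc:
                # Loop-closing branch taken again: serve from the buffer.
                self.state = State.ACTIVE
                return
            if sequential and not at_branch:
                return  # still inside the loop body
            # Loop exit or any other change of flow: fall back to normal fetch.
            self.state = State.IDLE
            self.loop = None

        # IDLE: a taken backward branch that spans at most `capacity`
        # instructions marks a loop that fits in the buffer.
        distance = pc - next_pc + 1
        if not sequential and 0 < distance <= self.capacity:
            self.loop = LoopInfo(start_pc=next_pc, branch_pc=pc)
            self.state = State.FILL


def run_trace(controller, trace):
    """Feed a list of program counters; the last one is assumed to fall through."""
    for i, pc in enumerate(trace):
        next_pc = trace[i + 1] if i + 1 < len(trace) else pc + 1
        controller.step(pc, next_pc)
```

In this model the first iteration of a loop is needed to detect it and the second to fill the buffer, so only the third and later iterations are served from the buffer. For a four-instruction loop executed five times, 12 of its 20 instructions would come from the buffer, which suggests why the benefit grows with the iteration count.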

In the present work, we propose to efficiently exploit dynamic micro-architecture techniques that reuse loop instructions in order to decrease energy consumption. The main idea behind these techniques is the use of simple data structures, associated with loop caches or with the ROB, that dynamically store information about the loops. Such data structures can then be exploited to guide loop execution. Even in the ROB case, this information will allow several iterations of a loop to be executed without performing loop unrolling. Moreover, further instruction-level parallelism (ILP) can be exploited by dispatching more loop instructions in parallel, for example by packing them into long instruction words. The architectural improvements will be simulated with SimpleScalar, together with the Wattch power simulator and the CACTI cache model. The proposed control and memory structures will also be described in VHDL and synthesized for an actual CMOS technology, in order to evaluate their cost and maximum operating frequency.

The proposed work will result from a collaboration between TUDelft and TULisbon. It requires financial support for the stay of a Portuguese PhD student in Delft for about 6 months, and for the travel and short stay of a Portuguese senior researcher in Delft. The total estimated financial support needed from HiPEAC is about 9000€.

[1] A. Ross, S. Cotterell, F. Vahid, Exploiting Fixed Programs in Embedded Systems: A Loop Cache Example, IEEE Computer Architecture Letters, Volume 1, Issue 1, 2002.
[2] B. Moyer, J. Arends, Instruction fetch energy reduction using loop caches for embedded applications with small tight loops, International Symposium on Low Power Electronics and Design, 1999, pp. 267-269.
[3] J. Rivers, S. Asaad, J. Wellman, J. Moreno, Reducing Instruction Fetch Energy with Backwards Branch Control Information and Buffering, Symposium on Low Power Electronics and Design, 2003, pp. 322-325.
[4] C. Yang, A. Orailoglu, Power-efficient Instruction Delivery through Trace Reuse, Conference on Parallel Architectures and Compilation Techniques, 2006, pp. 192-201.


Research cluster

Requested: € 9000

Requested: € 0

A PhD student will visit Delft for a period of about 6 months, and a Portuguese senior researcher will travel to Delft for a short stay. The total estimated financial support needed from HiPEAC is about 9000€.


Requested: 12 month(s)

GAYDADJIEV Georgi (Delft University of Technology) (--member--)
KAXIRAS Stefanos (University of Patras) (--member--)

Leonel Sousa, INESC, Portugal