PhD Research Summary


Research Problem

Current embedded systems for multimedia applications, such as mobile and hand-held devices, are typically battery operated. Low energy consumption is therefore one of the key design goals of such systems. Many of them rely on Very Long Instruction Word (VLIW) Application Specific Instruction Set Processors (ASIPs). However, power analysis of such processors indicates that a significant amount of power is consumed in the instruction caches. Loop buffering, or L0 buffering, is an effective scheme to reduce energy consumption in the instruction memory hierarchy. In a typical multimedia application, a significant amount of execution time is spent in small program segments. Hence, by storing these segments in a small L0 buffer instead of the large instruction cache, energy can be reduced.
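The intuition behind L0 buffering can be illustrated with a first-order energy model. The sketch below is only an illustration and not the model used in this research: the function names, the per-access energy parameters and the assumption that buffer fill overhead is negligible are all hypothetical.

```python
def fetch_energy(num_fetches, loop_fraction, e_l0, e_l1):
    """First-order instruction fetch energy (hypothetical model).

    num_fetches   -- total number of instruction fetches
    loop_fraction -- fraction of fetches served from the L0 buffer (0..1)
    e_l0, e_l1    -- assumed energy per access of the L0 buffer and the
                     L1 instruction cache, in the same arbitrary unit
    Filling the L0 buffer is assumed to cost nothing, which is a
    simplification.
    """
    from_l0 = num_fetches * loop_fraction
    from_l1 = num_fetches - from_l0
    return from_l0 * e_l0 + from_l1 * e_l1


def relative_saving(loop_fraction, e_l0, e_l1):
    """Fraction of fetch energy saved compared with fetching everything from L1."""
    baseline = fetch_energy(1.0, 0.0, e_l0, e_l1)
    buffered = fetch_energy(1.0, loop_fraction, e_l0, e_l1)
    return 1.0 - buffered / baseline
```

With purely illustrative numbers, if 90% of the fetches hit an L0 buffer whose access costs one tenth of an L1 access, relative_saving(0.9, 0.1, 1.0) evaluates to about 0.81. The saving grows with the fraction of time spent in small loops and with the energy gap between the two memories.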

While the reduction achieved by L0 buffering is substantial, further optimizations are still necessary to ensure high energy efficiency in future processors. As optimizations are applied to other parts of the processor, such as the datapath, register files and data memory hierarchy, the overall processor energy decreases. However, the instruction memory energy, including that of the L0 buffer, is bound to increase in relative terms or remain substantial. Of the two main contributors to energy consumption in the instruction memory hierarchy, the L0 buffers are the main bottleneck.

Approach

Our approach to this problem is to incorporate a fully clustered (distributed) instruction memory hierarchy. The L0 buffers are partitioned, and each partition is grouped with certain functional units in the datapath to form an L0 cluster. Similarly, the L1 (instruction) cache is partitioned, and each partition is grouped with certain L0 clusters to form an L1 cluster. Each cluster has its own local controller, which enables it to operate autonomously to a certain extent.
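The grouping described above can be pictured as a two-level data structure. The following is a minimal sketch, not the representation used by the actual tools: the class names, the field names and the rule that every functional unit belongs to exactly one L0 cluster are assumptions made for illustration.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class L0Cluster:
    """One L0 buffer partition and the functional units it feeds (hypothetical)."""
    name: str
    functional_units: list  # names of functional units, e.g. "alu0"


@dataclass
class L1Cluster:
    """One L1 cache partition and the L0 clusters it serves (hypothetical)."""
    name: str
    l0_clusters: list  # L0Cluster objects


def is_valid_partition(l1_clusters, all_units):
    """Check that every functional unit appears in exactly one L0 cluster.

    This disjointness rule is an assumption of the sketch.
    """
    counts = Counter(
        fu
        for l1 in l1_clusters
        for l0 in l1.l0_clusters
        for fu in l0.functional_units
    )
    covers_all = set(counts) == set(all_units)
    no_duplicates = all(c == 1 for c in counts.values())
    return covers_all and no_duplicates
```

For example, a datapath with units alu0, alu1, mul0 and ldst0 could be described by an L1 cluster holding the L0 clusters {alu0, alu1} and {mul0}, plus a second L1 cluster holding the L0 cluster {ldst0}; is_valid_partition accepts this hierarchy and rejects one that omits or duplicates a unit.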

Various aspects of this memory hierarchy are being investigated, namely:

  • The basic operation of the hierarchy
  • Automatic generation or formation of L0 (L1) clusters
  • Compiler Scheduling for L0 (L1) clusters
  • Synchronicity between L0 clusters and datapath (partitioned register files) clusters
  • Support for execution of multiple loops in parallel (Simultaneous Loop Threading)

Associated tools

People I worked with

  • Advisors
    • Henk Corporaal, TU Eindhoven, The Netherlands
    • Francky Catthoor, IMEC vzw, Leuven, Belgium
    • Geert Deconinck, K.U.Leuven, Belgium
  • Fellow PhD Students
    • Tom Vander Aa
    • Francisco Barat

Related Publications: Journals & Book Chapter

Related Publications: Conferences & Workshops

Related Projects

  • MESA2 under MEDEA+ Program (European Project)
  • Artist: Network of Excellence