School of Informatics - University of Edinburgh Institute for Computing Systems Architecture - School of Informatics
EnCore Processors - Codename Calton and Codename Castle

The EnCore microprocessor family is a configurable and extendable implementation of the ARCompact® instruction-set architecture. It is designed to fulfill the following aims and objectives:

  • Low-complexity and low gate-count design.
  • Highest operating frequency in its class.
  • Lowest possible dynamic energy consumption. EnCore targets automatic clock gating of 99% of flip-flops when synthesized with typical Cadence and Synopsys tools.
  • Best-in-class CPI (cycles-per-instruction), with most non-memory operations achieving single-cycle latency, and no more than one load-delay slot.
  • Minimized branch penalties, resulting in an overall Dhrystone performance of at least 1.4 DMIPS / MHz.
  • Easy configurability of cache architectures and support for either 16 or 32 GPRs. These features have the most significant impact on both performance and die area for processors in this class.
  • Clean design, easily maintained and extended.

These design objectives are addressed through the use of a relatively short pipeline, comprising four main stages plus a fifth stage devoted solely to the write-back of results. The design supports separate instruction and data caches, each of which can be configured in terms of size, associativity and block size.
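To illustrate how the three cache parameters interact, here is a minimal Python sketch. It is not EnCore's actual configuration interface: the `CacheConfig` class, its field names, the power-of-two restriction, and the 32-bit address width are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CacheConfig:
    """Hypothetical cache description: total size, associativity, block size."""
    size_bytes: int
    ways: int
    block_bytes: int

    def __post_init__(self):
        # Assumption of this sketch: every parameter is a positive power of two.
        for name in ("size_bytes", "ways", "block_bytes"):
            value = getattr(self, name)
            if value <= 0 or value & (value - 1):
                raise ValueError(f"{name} must be a positive power of two")
        if self.ways * self.block_bytes > self.size_bytes:
            raise ValueError("cache too small for this associativity and block size")

    @property
    def sets(self):
        # Each set holds `ways` blocks of `block_bytes` bytes.
        return self.size_bytes // (self.ways * self.block_bytes)

    def address_fields(self, addr_bits=32):
        """Return (tag_bits, index_bits, offset_bits) for an addr_bits-wide address."""
        offset_bits = self.block_bytes.bit_length() - 1
        index_bits = self.sets.bit_length() - 1
        return addr_bits - index_bits - offset_bits, index_bits, offset_bits


# An 8 KB cache; the 2-way associativity and 32-byte blocks are illustrative values only.
example = CacheConfig(size_bytes=8 * 1024, ways=2, block_bytes=32)
print(example.sets, example.address_fields())
```

Doubling the associativity at a fixed size halves the number of sets and moves one bit from the index to the tag, which is one reason these parameters affect both area and timing.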

The design of any processor involves trade-offs between logical complexity and CPI, as each CPI-improving technique requires additional logic. There is also a trade-off between CPI and operating frequency, as CPI improvements often lengthen critical paths. EnCore achieves a comparatively high operating frequency of around 375 MHz in a standard TSMC 0.13 µm G process. This requires streamlined control logic, particularly in areas such as instruction alignment, zero-overhead loop management, and the handling of complex but infrequent data cache events. It also relies, where possible, on sharing data-path elements between multiple purposes. This helps to minimize logical complexity, and yields gate counts as low as 20-24 kgates for a typically configured core.
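The CPI versus frequency trade-off can be made concrete with the usual relation: execution time = instruction count × CPI / clock frequency. The Python sketch below is illustrative only. The function names and the two candidate designs are hypothetical; only the 375 MHz figure and the 1.4 DMIPS/MHz target come from the text above.

```python
def execution_time_us(instructions, cpi, freq_mhz):
    """Run time in microseconds: total cycles divided by cycles per microsecond."""
    return instructions * cpi / freq_mhz


def min_dmips(freq_mhz, dmips_per_mhz=1.4):
    """Lower bound on Dhrystone throughput at a given clock, using the 1.4 DMIPS/MHz target."""
    return freq_mhz * dmips_per_mhz


# Two hypothetical designs running the same one-million-instruction workload.
# The CPI values are invented for illustration, not measured EnCore figures.
fast_clock = execution_time_us(1_000_000, cpi=1.2, freq_mhz=375)
low_cpi = execution_time_us(1_000_000, cpi=1.1, freq_mhz=300)

print(f"higher clock, higher CPI: {fast_clock:.0f} us")
print(f"lower clock, lower CPI:   {low_cpi:.0f} us")
print(f"minimum DMIPS at 375 MHz: {min_dmips(375):.0f}")
```

In this made-up comparison the design with the worse CPI still finishes first, because the CPI improvement costs more in clock frequency than it saves in cycles.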

Facts and Figures

Characteristic                        Unit     130nm (G)   90nm (G)   65nm (LP)
Die Area*                             sq.mm    0.15        0.07       0.04
Fmax (worst case)                     MHz      250         350        400
Dynamic Power**                       mW/MHz   0.07        0.02       0.01
Static Power                          mW       0.46        0.99       0.02
CPU Power at Fmax**                   mW       17.96       7.99       4.02
AA Battery CPU runtime at Fmax        days     2.8         6.3        12.5

* CPU Die Area excludes Cache RAMs
** Power figures include 8KB I-Cache & D-Cache RAMs
Status key: in silicon / in design / projection
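The "CPU Power at Fmax" row is consistent with dynamic power × Fmax + static power, and the runtime row with a fixed battery energy divided by that power. The Python sketch below works through this arithmetic using the table's figures. The battery energy of about 1200 mWh is an assumption inferred from the table, not a value stated by the source, and the function names are hypothetical.

```python
# Figures copied from the table: (Fmax in MHz, dynamic mW/MHz, static mW).
NODES = {
    "130nm (G)": (250, 0.07, 0.46),
    "90nm (G)": (350, 0.02, 0.99),
    "65nm (LP)": (400, 0.01, 0.02),
}

# ASSUMPTION: usable AA cell energy in mWh. The source does not state it;
# roughly 1200 mWh matches the table's runtime row to within about 0.1 day.
AA_ENERGY_MWH = 1200.0


def cpu_power_mw(fmax_mhz, dynamic_mw_per_mhz, static_mw):
    """Total CPU power at Fmax: frequency-dependent part plus static leakage."""
    return dynamic_mw_per_mhz * fmax_mhz + static_mw


def runtime_days(power_mw, energy_mwh=AA_ENERGY_MWH):
    """Days of continuous operation from a battery holding energy_mwh."""
    return energy_mwh / power_mw / 24.0


for node, params in NODES.items():
    power = cpu_power_mw(*params)
    print(f"{node}: {power:.2f} mW, about {runtime_days(power):.1f} days")
```

Note that static power is a small share of the total at 130nm and 65nm (LP), but about an eighth of the total at 90nm (G), so the runtime gain between nodes comes almost entirely from the lower dynamic power per MHz.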

Presentations on EnCore

Slides from an invited talk on EnCore at the HiPEAC Industrial Workshop in Wroclaw, Poland, on 26th October 2009: