School of Informatics - University of Edinburgh Institute for Computing Systems Architecture - School of Informatics
Institute for Computing
Systems Architecture

Most of the work involving automatic instruction set extension generation just inserts assembly into the source-code at the exact point that an extension instruction was found. This is sufficient for the case where a new processor is being created to run a single, stable piece of code. There are, however, several scenarios where this isn't sufficent, for example:

  • A general purpose processor is required that will have to be specialised for serveral disjoint tasks.
  • The software has been updated after the specialised processor has been manufactured.
  • A specialised processor must be used to perform a new or additional task to what it was original designed for but the extension hardware must be used to achieve acceptable performance.

Thus we have been creating a compiler that can take a description of a set of extension programs and attempt to use them with any program, regardless of its similarity to the program used to generate the extensions.

Our methodology is based on extending GCC. As extension instructions are much larger and complex than standard RISC instructions a graph-based instruction selection method instead of a standard tree-based approach. An example of an extension instruction, generated from an FFT benchmark, is shown below.

An example instruction with 4-inputs and 2-outputs.

To exploit these instructions we used a graph-sub-graph isomorphism checker to find where these instructions may be used inside every basic block. Such as in the example below where a simpler instruction is used in a very small basic block.

An example instruction that has been matched to a small basic block.

Generally it is necessary to use multiple extension instructions in critical basic blocks, which also requires a sensible register allocation assignment. An example basic block using multiple extension instructions is shown below.

The shape of a basic block that has been mapped to four separate
  extension instructions, two of which are entirely parallel and another two have
  internal dependencies.