Research: High-Speed Simulation

Instruction set simulators are indispensable tools, both for design space exploration of application-specific instruction-set processors (ASIPs) and for developing and optimising software on existing platforms. A functional simulator sits at the centre of the tool-flow for embedded systems and ASIC design, acting as a Golden Reference model for the complete system. It therefore has three primary uses, each with distinct and sometimes conflicting requirements, as illustrated below:
[Figure: High-speed simulation goals]
To meet these three requirements, the PASTA project developed a functional simulator that combines high-speed JIT compilation with cycle-accurate modeling, while still maintaining a precise model of the architectural state of the processor it simulates. It can therefore serve as a back-end target for a debugger during software development, as the Golden Reference Model in our co-simulation environment, and as a source of detailed cycle counts and other performance measurements.
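To illustrate the Golden Reference role, here is a minimal Python sketch of lock-step co-simulation: a reference model executes one instruction at a time and its architectural state is compared with a trace from the design under test, stopping at the first divergence. The toy two-opcode ISA, `ArchState`, `cosimulate` and the trace format are hypothetical and do not describe the interface of the PASTA co-simulation environment.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Sequence, Tuple

Instr = Tuple[str, int, int, int]  # hypothetical toy form: (opcode, dst, src1, src2-or-imm)
MASK = 0xFFFFFFFF                  # 32-bit registers


@dataclass
class ArchState:
    """Architectural state visible to software: program counter and register file."""
    pc: int = 0
    regs: List[int] = field(default_factory=lambda: [0] * 8)


def step(state: ArchState, program: Sequence[Instr]) -> None:
    """Execute exactly one instruction, updating the architectural state."""
    op, d, a, b = program[state.pc]
    if op == "addi":
        state.regs[d] = (state.regs[a] + b) & MASK
    elif op == "add":
        state.regs[d] = (state.regs[a] + state.regs[b]) & MASK
    else:
        raise ValueError(f"unsupported opcode {op}")
    state.pc += 1


def cosimulate(program: Sequence[Instr],
               dut_trace: Sequence[Tuple[int, Sequence[int]]]) -> Optional[int]:
    """Step the reference model in lock-step with a design-under-test trace.

    Each trace entry is the DUT's (pc, regs) after one instruction retires.
    Returns the index of the first mismatching entry, or None if all match.
    """
    ref = ArchState()
    for i, (dut_pc, dut_regs) in enumerate(dut_trace):
        step(ref, program)
        if ref.pc != dut_pc or ref.regs != list(dut_regs):
            return i
    return None


# Usage: r1 = 5, r2 = 7, r3 = r1 + r2
PROG = [("addi", 1, 0, 5), ("addi", 2, 0, 7), ("add", 3, 1, 2)]
GOOD = [(1, [0, 5, 0, 0, 0, 0, 0, 0]),
        (2, [0, 5, 7, 0, 0, 0, 0, 0]),
        (3, [0, 5, 7, 12, 0, 0, 0, 0])]
BAD = GOOD[:2] + [(3, [0, 5, 7, 13, 0, 0, 0, 0])]  # DUT computed r3 wrongly
```

Here `cosimulate(PROG, GOOD)` returns `None` and `cosimulate(PROG, BAD)` returns `2`, pinpointing the instruction at which the two models diverge. This only works because the reference keeps the full architectural state exact after every instruction.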
Just-in-time (JIT) dynamic binary translation (DBT) techniques allow us to build very high-speed functional simulators that can simulate an embedded system at speeds approaching, or even exceeding, real time. The simulator developed within this activity is used extensively by the PASTA team, particularly for co-simulation and for developing the CoSy compiler.

JIT Translation

The simulator operates by interpreting and profiling the target code for a short period of time, called an epoch. At the end of each epoch, frequently executed blocks of target code are translated into host machine code using an optimising compiler backend. The example below shows the effect of dynamic binary translation on a single basic block taken from the Linux kernel:

The listing below shows the sequence of host instructions generated when this target block is translated.
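The interpret, profile and translate cycle described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the PASTA simulator's code: the `addi`-only toy ISA, the block representation, `epoch_len` (counted here in executed blocks) and `threshold` are all hypothetical, and `translate_block` merely generates and compiles Python source as a stand-in for an optimising backend that emits real host machine code.

```python
from collections import Counter
from typing import Callable, Dict, List, Tuple

Regs = List[int]
Instr = Tuple[str, int, int, int]  # hypothetical toy form: ("addi", dst, src, imm)
# A block is its instructions plus a function choosing the next block address (-1 halts).
Block = Tuple[List[Instr], Callable[[Regs], int]]
MASK = 0xFFFFFFFF                  # 32-bit target registers


def interpret_block(block: Block, regs: Regs) -> int:
    """Slow path: decode and execute one target instruction at a time."""
    instrs, next_pc = block
    for op, d, s, imm in instrs:
        if op != "addi":
            raise ValueError(f"unsupported opcode {op}")
        regs[d] = (regs[s] + imm) & MASK
    return next_pc(regs)


def translate_block(block: Block) -> Callable[[Regs], int]:
    """Stand-in for the JIT backend: emit host (Python) source for the block and compile it."""
    instrs, next_pc = block
    if any(op != "addi" for op, _, _, _ in instrs):
        raise ValueError("unsupported opcode in block")
    body = [f"    r[{d}] = (r[{s}] + {imm}) & {MASK}" for _, d, s, imm in instrs]
    src = "def run(r):\n" + "\n".join(body + ["    return next_pc(r)"])
    namespace = {"next_pc": next_pc}
    exec(compile(src, "<translated-block>", "exec"), namespace)
    return namespace["run"]


def simulate(program: Dict[int, Block], entry: int,
             epoch_len: int = 4, threshold: int = 2) -> Tuple[Regs, List[int]]:
    """Interpret and profile one epoch at a time; at each epoch boundary,
    translate blocks that ran at least `threshold` times during that epoch."""
    regs: Regs = [0] * 4
    translated: Dict[int, Callable[[Regs], int]] = {}
    profile: Counter = Counter()
    pc, blocks_run = entry, 0
    while pc != -1:
        if pc in translated:
            pc = translated[pc](regs)      # fast path: run generated host code
        else:
            profile[pc] += 1               # profile only interpreted blocks
            pc = interpret_block(program[pc], regs)
        blocks_run += 1
        if blocks_run % epoch_len == 0:    # end of epoch: translate hot blocks
            for addr, count in profile.items():
                if count >= threshold and addr not in translated:
                    translated[addr] = translate_block(program[addr])
            profile.clear()
    return regs, sorted(translated)


# Usage: block 0 loops until r1 reaches 10; block 1 sets r2 = 42 and halts.
PROGRAM: Dict[int, Block] = {
    0: ([("addi", 1, 1, 1)], lambda r: 0 if r[1] < 10 else 1),
    1: ([("addi", 2, 0, 42)], lambda r: -1),
}
```

With the defaults, `simulate(PROGRAM, 0)` returns `([0, 10, 42, 0], [0])`: block 0 runs four times in the first epoch, crosses the threshold and is translated, so its remaining six iterations execute the generated code, while block 1 runs once and stays interpreted. Both paths leave the same architectural state, which is what lets a JIT simulator double as a reference model.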

Results

When operating in JIT mode, with individual basic blocks as the unit of translation, the simulator achieves rates of 260 to 730 native MIPS. The measurements shown below were taken on a 3.0 GHz Intel Xeon 5160 server with 32 KB L1 instruction and data caches and a 4 MB L2 cache per dual-core CPU.
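As a rough guide to the units, the sketch below assumes the reported rate means simulated target instructions per second of host wall-clock time, expressed in millions; the function name `mips` and the numbers in the example are hypothetical, not measurements from this work.

```python
def mips(instructions: int, seconds: float) -> float:
    """Simulation rate in millions of simulated target instructions per host second."""
    if seconds <= 0:
        raise ValueError("elapsed time must be positive")
    return instructions / seconds / 1e6


# Hypothetical example: 1.2 billion target instructions simulated in 2 seconds.
rate = mips(1_200_000_000, 2.0)  # 600.0 MIPS
```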
The simulator keeps its translations persistently, so it learns to speed up the simulation of each application by reusing useful translations from one run to the next. The chart below shows how the simulation rate while booting a Linux kernel increases over a sequence of seven runs.
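One way to keep translations between runs is a cache file keyed by each block's address and a hash of its code bytes, so that a rebuilt binary cannot reuse a stale translation. The Python sketch below assumes translations can be stored as text (the `host-code:` strings are placeholders for real machine code); `TranslationCache`, its JSON file format and the keying scheme are hypothetical and are not how the PASTA simulator stores its translations.

```python
import hashlib
import json
import os
import tempfile
from typing import Callable, Dict, List, Tuple


class TranslationCache:
    """Hypothetical on-disk cache of block translations, reused across simulator runs."""

    def __init__(self, path: str) -> None:
        self.path = path
        self.entries: Dict[str, str] = {}
        self.hits = 0
        self.misses = 0
        if os.path.exists(path):  # reload translations from earlier runs
            with open(path, "r", encoding="utf-8") as f:
                self.entries = json.load(f)

    @staticmethod
    def key(addr: int, code: bytes) -> str:
        # Address plus content hash: a block whose bytes changed misses the cache.
        return f"{addr:08x}:{hashlib.sha256(code).hexdigest()}"

    def get_or_translate(self, addr: int, code: bytes,
                         translate: Callable[[bytes], str]) -> str:
        k = self.key(addr, code)
        if k in self.entries:
            self.hits += 1
        else:
            self.misses += 1
            self.entries[k] = translate(code)  # translation cost is paid only once
        return self.entries[k]

    def save(self) -> None:
        tmp = self.path + ".tmp"
        with open(tmp, "w", encoding="utf-8") as f:
            json.dump(self.entries, f)
        os.replace(tmp, self.path)  # replace the old file with a single rename


def fake_translate(code: bytes) -> str:
    """Placeholder for the JIT backend; real translations would be host machine code."""
    return "host-code:" + code.hex()


def demo() -> List[Tuple[int, int]]:
    """Two simulated runs sharing one cache file; returns (hits, misses) per run."""
    blocks = {0x1000: b"\x01\x02", 0x1010: b"\x03"}
    results = []
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "translations.json")
        for _ in range(2):
            cache = TranslationCache(path)
            for addr, code in blocks.items():
                cache.get_or_translate(addr, code, fake_translate)
            cache.save()
            results.append((cache.hits, cache.misses))
    return results
```

Calling `demo()` returns `[(0, 2), (2, 0)]`: the first run translates both blocks and the second finds both in the file. Keying on content as well as address is what keeps this kind of reuse safe when the application is rebuilt.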

Interestingly, when running on a high-end Xeon server, the simulator is typically four times faster than a full implementation of the EnCore processor running on an FPGA, and comparable to the real-time speed of a silicon implementation.

Publications

N.P. Topham and D. Jones
High Speed CPU Simulation using JIT Binary Translation
Proceedings of the 3rd Annual Workshop on Modeling, Benchmarking and Simulation, held in conjunction with ISCA-34, San Diego CA, June 2007.
D. Jones and N.P. Topham
High Speed CPU Simulation using LTU Dynamic Binary Translation
Proceedings of the 4th International HiPEAC Conference, Paphos, Cyprus, Jan. 25-28, 2009. LNCS 5409, Springer 2009, ISBN 978-3-540-92989-5.
