Instruction set simulators are indispensable tools in both
ASIP design space exploration and the software development
and optimisation process for existing platforms.
A functional simulator is a focal point of the tool-flow
for embedded systems and ASIC design, acting as a
Golden Reference model for the complete system.
Hence, a functional simulator has three primary uses, each
with distinct and sometimes conflicting requirements,
as illustrated below:
To meet these three requirements, the PASTA project developed a
functional simulator that combines high-speed JIT compilation
with cycle-accurate modelling, while maintaining a precise
model of the architectural state of the processor it simulates.
It can therefore serve as a back-end target for a debugger
to assist in software development, provide a Golden Reference
model for our co-simulation environment, and report detailed
cycle counts and other performance measurements.
The use of just-in-time (JIT) dynamic binary translation (DBT)
techniques allows us to create very high speed functional simulators
capable of simulating an embedded system at speeds approaching
(or even exceeding) real time. The simulator developed within this
activity is used extensively by the PASTA team, particularly
for co-simulation and the
development of the CoSy compiler.
JIT Translation
The simulator operates by interpreting and profiling the target code over a short period of time, called an epoch. At the end of each epoch, frequently executed blocks of target code are translated to the host architecture, using an optimising compiler backend. The result of dynamic binary translation can be seen in the example below, which shows a single basic block (from the Linux kernel):
Below we see the sequence of host instructions created when the target sequence is translated.
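Separately from the listings referred to above, the interpret, profile and translate cycle described at the start of this section can be sketched as a toy model. This is an illustration only, not the PASTA simulator's code: the two-instruction target ISA, the names `EPOCH_LENGTH`, `HOT_THRESHOLD`, `interpret_block`, `translate_block` and `simulate`, and the use of generated Python source as a stand-in for host machine code are all assumptions made for the example. Control flow is also simplified to a pre-recorded trace of block addresses.

```python
from collections import Counter

EPOCH_LENGTH = 4    # blocks executed per epoch (hypothetical, tiny for the demo)
HOT_THRESHOLD = 2   # interpreted executions in one epoch that make a block "hot"


def interpret_block(block, regs):
    """Interpret one basic block of a toy target ISA, instruction by instruction."""
    for op, dst, src in block:
        if op == "addi":      # dst += immediate
            regs[dst] += src
        elif op == "mov":     # dst = src register
            regs[dst] = regs[src]
        else:
            raise ValueError(f"unknown opcode {op!r}")


def translate_block(block):
    """Translate a block once into a host callable.

    Generated Python source stands in for host code here; a real
    simulator would emit native instructions via a compiler backend.
    """
    lines = ["def translated(regs):"]
    for op, dst, src in block:
        if op == "addi":
            lines.append(f"    regs[{dst!r}] += {src!r}")
        elif op == "mov":
            lines.append(f"    regs[{dst!r}] = regs[{src!r}]")
        else:
            raise ValueError(f"unknown opcode {op!r}")
    lines.append("    return None")
    namespace = {}
    exec("\n".join(lines), namespace)
    return namespace["translated"]


def simulate(program, trace, regs):
    """Run `trace` (a sequence of block addresses) over `program`.

    Blocks are interpreted and profiled; at the end of each epoch,
    blocks that were hot during that epoch are translated, and later
    executions use the translation instead of the interpreter.
    """
    translations = {}
    profile = Counter()
    stats = {"interpreted": 0, "translated": 0}
    for step, addr in enumerate(trace, start=1):
        if addr in translations:
            translations[addr](regs)
            stats["translated"] += 1
        else:
            interpret_block(program[addr], regs)
            profile[addr] += 1
            stats["interpreted"] += 1
        if step % EPOCH_LENGTH == 0:   # epoch boundary
            for hot, count in profile.items():
                if count >= HOT_THRESHOLD:
                    translations[hot] = translate_block(program[hot])
            profile.clear()
    return stats
```

Calling `simulate` with a trace that revisits one block many times shows the intended behaviour: the block is interpreted during its first epoch and runs through its translation afterwards, while rarely executed blocks stay in the interpreter.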
Results
When operating in JIT mode, using individual basic blocks as
the translation unit, we see simulation rates in the range
260-730 native MIPS. The measurements shown below were taken
on an Intel Xeon 5160 3.0 GHz server with 32KB I/D caches
and 4MB L2 cache per dual-core CPU.
The simulator has persistent translations, allowing it to
learn how to speed up the simulation of each
application by keeping useful translations from one run to
the next. The chart below shows how the simulation rate,
during the booting of a Linux kernel, increases over a
sequence of 7 runs.
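Independently of the chart, the idea of keeping translations from one run to the next could be sketched as below. This is a minimal sketch under stated assumptions, not the simulator's actual mechanism: the `TranslationStore` class, the JSON file format, the use of a SHA-256 hash of the target block's bytes as the lookup key, and the `translate` callback are all hypothetical choices made for illustration.

```python
import hashlib
import json
import os


def block_key(block_bytes):
    """Key a translation by the content of the target block (assumed scheme)."""
    return hashlib.sha256(block_bytes).hexdigest()


class TranslationStore:
    """Toy persistent translation cache backed by a JSON file."""

    def __init__(self, path):
        self.path = path
        self.entries = {}
        self.hits = 0
        self.misses = 0
        if os.path.exists(path):
            # Reload translations kept from a previous run.
            with open(path) as f:
                self.entries = json.load(f)

    def lookup_or_translate(self, block_bytes, translate):
        """Return a stored translation, translating only on a miss."""
        key = block_key(block_bytes)
        if key in self.entries:
            self.hits += 1
        else:
            self.misses += 1
            self.entries[key] = translate(block_bytes)
        return self.entries[key]

    def save(self):
        """Write all translations to disk so the next run can reuse them."""
        with open(self.path, "w") as f:
            json.dump(self.entries, f)
```

Keying on the block's content rather than its address is a design choice for the sketch: if the target code at an address changes between runs, the stale translation is simply never looked up. A first run pays the translation cost for every hot block; a second run over the same code finds them all in the store.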
Interestingly, the simulator running on a high-end Xeon server
is typically 4 times faster than a full implementation
of the EnCore processor running in an FPGA, and is comparable
to the real-time speed of a silicon implementation.
Publications
- N.P. Topham and D. Jones
High Speed CPU Simulation using JIT Binary Translation
Proceedings of the 3rd Annual Workshop on Modeling, Benchmarking and Simulation, held in conjunction with ISCA-34, San Diego CA, June 2007.
- D. Jones and N.P. Topham
High Speed CPU Simulation using LTU Dynamic Binary Translation
Proceedings of the 4th International HiPEAC Conference, Paphos, Cyprus, Jan. 25-28, 2009. LNCS 5409, Springer 2009, ISBN 978-3-540-92989-5.