Instruction set simulators are indispensable tools in both ASIP design space exploration and the software development and optimisation process for existing platforms. Despite recent progress in improving the speed of functional instruction set simulators, cycle-accurate simulation is still prohibitively slow for all but the simplest programs. This severely limits the applicability of cycle-accurate simulators in the performance evaluation of complex embedded applications. In the PASTA project we explore a novel approach, namely the prediction of cycle counts based on information gathered during fast functional simulation and prior training. We have evaluated our approach against a cycle-accurate ARM v5 architecture simulator and a set of approximately 300 benchmarks. We demonstrate its capability of providing highly accurate performance predictions, with an average error of less than 5.8%, at a fraction of the time required for cycle-accurate simulation.
Overview
Cycle-approximate instruction set simulation (ISS) is based on a two-stage approach to constructing a performance predictor: a training stage, in which a set of benchmark programs is profiled, and a later deployment stage, in which the performance of previously unseen programs is predicted. An illustration of these two stages is shown in the above diagram. During the training stage a set of benchmark applications is executed and profiled on both the cycle-accurate and the functional ISS. For each program, the exact cycle count y and the values x of the various counters maintained in the functional ISS are collected; together they form a single data point (y, x). As soon as this data is available for all programs in the training set, the Regression Solver calculates the regression coefficients according to a predefined regression model and stores them for later use. In the deployment stage the cycle-accurate simulator is no longer used; all simulations are performed by the functional simulator. As before, this simulator is used to generate a characteristic profile, i.e. the vector x, of the program under examination. The regression model with the previously calculated coefficients is then used as a predictor and evaluated at the point x, resulting in the predicted cycle count y for the new program.
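The two stages can be sketched in a few lines of Python. This is a minimal illustration only, assuming a linear regression model with an intercept, fitted by ordinary least squares via the normal equations; the counter names (instructions, loads, branches), the function names and all numbers are hypothetical and are not taken from the PASTA tools or benchmarks.

```python
def solve_linear_system(a, b):
    """Solve a * x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix, so that row operations act on both sides at once.
    m = [[float(v) for v in a[i]] + [float(b[i])] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: counters are linearly dependent")
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(col + 1, n):
            factor = m[row][col] / m[col][col]
            for k in range(col, n + 1):
                m[row][k] -= factor * m[col][k]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def train(profiles, cycle_counts):
    """Training stage: fit regression coefficients from (y, x) data points."""
    rows = [[1.0] + list(p) for p in profiles]  # leading 1.0 is the intercept
    n = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(n)] for i in range(n)]
    xty = [sum(r[i] * y for r, y in zip(rows, cycle_counts)) for i in range(n)]
    return solve_linear_system(xtx, xty)


def predict(coefficients, profile):
    """Deployment stage: evaluate the fitted model at a new profile x."""
    return coefficients[0] + sum(c * v for c, v in zip(coefficients[1:], profile))


# Hypothetical training set: counters from the functional ISS
# (instructions, loads, branches) and cycle counts from the cycle-accurate ISS.
# The cycle counts were generated as 10 + 1*instructions + 2*loads + 3*branches.
training_profiles = [
    (100, 20, 10),
    (200, 50, 30),
    (150, 10, 40),
    (300, 80, 20),
    (120, 60, 5),
]
training_cycles = [180, 400, 300, 530, 265]

coefficients = train(training_profiles, training_cycles)

# Deployment: only the functional profile of the unseen program is needed.
predicted = predict(coefficients, (250, 40, 25))
print(round(predicted, 3))
```

Because the synthetic cycle counts are an exact linear function of the counters, the fitted model reproduces the generating formula; real profiles would leave a residual for every program.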
Results
The selection of an appropriate regression model (linear, polynomial,
various non-linear functions etc.) is a design parameter and has a
critical impact on how well the regression function can describe the
observed data. Our hypothesis is that the cycle count is a
near-linear function of the various counters maintained by the ISS. In
order to test this hypothesis we compare the predicted cycle counts
y* with the observed cycle counts y, performing regression
over the entire data set (all programs and all counters except the
observed cycle count).
The above graph shows the close match between the observed and
calculated cycle counts. The data points are concentrated near the
ideal straight line with only a very few exceptional outliers,
especially at the lower end of the scale. We have calculated the
residuals and have found an average error of 4.5%.
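As an illustration of how such an average error can be obtained, the sketch below computes the mean absolute relative error between observed and predicted cycle counts. That the reported figure uses exactly this definition is an assumption, and the cycle counts shown are hypothetical values, not measurements from the benchmark set.

```python
def mean_relative_error(observed, predicted):
    """Mean absolute relative error of the predictions, in percent."""
    errors = [abs(p - o) / o for o, p in zip(observed, predicted)]
    return 100.0 * sum(errors) / len(errors)


# Hypothetical observed (y) and predicted (y*) cycle counts for four programs.
observed = [1000, 2000, 4000, 8000]
predicted = [1050, 1900, 4000, 8400]

average_error = mean_relative_error(observed, predicted)
print(f"average error: {average_error:.2f}%")
```

Normalising each residual by the observed cycle count keeps short-running programs from being swamped by long-running ones, which is also why outliers at the lower end of the scale show up clearly in this metric.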
A more detailed breakdown of the distribution of errors is shown in
the second graph, where the relative error frequency is plotted in a
histogram against the percentage error interval. This diagram shows
that the vast majority of programs can be described with a very small
error. In fact, for about 50% of all programs the error is less than
1%. For 75% of all programs the error is less than 8% and only three
programs have an error of 15%. The maximum error, however, is
relatively large with a value of 26.1%.
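A breakdown of this kind can be derived from the per-program percentage errors by binning them and counting the fraction of programs below a given threshold. The following sketch assumes one-percent bins; the function names and the error values are made up for illustration and are not the project's data.

```python
from collections import Counter


def error_histogram(percent_errors, bin_width=1.0):
    """Relative frequency of programs per error interval [b, b + bin_width)."""
    counts = Counter(int(e // bin_width) for e in percent_errors)
    total = len(percent_errors)
    return {b * bin_width: counts[b] / total for b in sorted(counts)}


def fraction_below(percent_errors, threshold):
    """Fraction of programs whose error is strictly below the threshold."""
    return sum(e < threshold for e in percent_errors) / len(percent_errors)


# Hypothetical per-program percentage errors.
errors = [0.2, 0.7, 0.9, 0.4, 2.5, 6.1, 12.0, 26.1]

histogram = error_histogram(errors)
below_one = fraction_below(errors, 1.0)
below_eight = fraction_below(errors, 8.0)
worst = max(errors)
print(histogram, below_one, below_eight, worst)
```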
Publications
- B. Franke
Fast Cycle-Approximate Instruction Set Simulation
Proceedings of the Workshop on Software & Compilers for Embedded Systems (SCOPES 2008), March 2008, Munich, Germany.
- D. Powell and B. Franke
Using Continuous Statistical Machine Learning to Enable High-Speed Performance Prediction in Hybrid Instruction-/Cycle-Accurate Instruction Set Simulators
Proceedings of the International Conference on Hardware/Software Codesign and System Synthesis (CODES-ISSS), October 2010, Grenoble, France.