School of Informatics - University of Edinburgh Institute for Computing Systems Architecture - School of Informatics
Institute for Computing
Systems Architecture

Compilation for Dual Memory Banks

Dual memory banks provide extra memory bandwidth to DSP applications and enable simultaneous access to two operands if the data is partitioned appropriately. Fully automated and compiler integrated approaches to data partitioning and memory bank assignment have, however, found little acceptance by DSP software developers. In the PASTA project we have investigated a novel source-level approach that is more programmer friendly. Our scheme is based on soft graph coloring and highly adaptive heuristics generated by genetic programming. We have evaluated our scheme on an Analog Devices TigerSHARC TS-101 DSP and achieved speedups of up to 57% on 13 UTDSP benchmarks.

Overview

Digital signal processors are domain specific microprocessors optimised for embedded digital signal processing applications. The demand for high performance, low power and low cost has led to the development of specialised architectures with many non-standard features exposed to the programmer. With the recent trend towards more complex signal processing algorithms and applications high-level programming languages, in particular C, have become a viable alternative to the predominant assembly coding of earlier days. This, however, comes at the price of efficiency when compared to hand-coded approaches.

Optimising compiler technology has played a key role in enabling high-level programming for DSPs. Many of the newly developed approaches to code generation for specialised DSP instructions, DSP specific code optimisation and instruction scheduling have transitioned out of the research labs and into product development and production.

The situation, however, is different with compiling techniques targeting one of the most distinctive DSP features: dual memory banks. Designed to enable the simultaneous fetch of two operands of, e.g., a multiply-accumulate operation they require careful partitioning and mapping of the data to unfold their full potential. While the dual memory bank concept has found active interest in the academic community this work does not seem to have found its way into production compilers. Instead, DSP specific language extensions of the ISO C language such as DSP-C and Embedded C that shift the responsibility for data partitioning and mapping to the programmer are widely embraced by industry. We believe this is partly due to the fact that fully automated and compiler integrated approaches to memory bank assignment ignore that programmers require control over the mapping of certain variables, for example, used for I/O buffering and tied to a specific bank. Additionally, programmers would frequently like to specify a partial mapping to achieve a certain effect on particular regions of code, and leave the rest to the compiler. To our knowledge, none of the previously published memory bank assignment schemes allows for this level of interaction.

Here we follow a different approach, namely explicit memory bank assignment as a source-level transformation operating on ISO C as input language and generating output in DSP-C. Next to its inherent portability the advantage of this high-level approach is the ease with which manual pre-assignment of variables, i.e. coercing them into a specific user-directed bank, can be accomplished. On the other hand, a high-level approach like the one presented in this paper needs to address the difficulty of having to cope with "unpredictable" later code optimisation and generation stages that may interact with the earlier bank assignment.

Results

Fast Cycle Approximate Simulation - Overview
We compared our soft colouring technique against a standard ILP colourer. The results of this are shown in the figure above, where the ranges of both the ILP and the soft colouring results are shown. Here we see that despite not being guaranteed to solve the colouring model optimally soft colouring does just as well as the ILP colourer, finding almost exactly the same range of results. However, the range is still quite wide so we attempt to constrain the assignments by adding the additional automatic node. The automatic node does shorten the range of solutions for both ILP and soft colouring, but not in the same way. For ILP colouring mostly good results are eliminated, for soft colouring mostly bad results are eliminated. Also, the automatic node allows soft colouring to always find the truly optimal solution for fir_256_64 and to find better solutions than the ILP colourer for lpc and spectral.

Publications