In this talk I describe several techniques that we developed to support the generation of high-quality code for the Cell Broadband Engine, addressing some of its key challenges and advantages. The architecture of the Cell Broadband Engine, developed jointly by Sony, Toshiba, and IBM, represents a new direction in processor design. In addition to a PowerPC-compatible Power Processor Element (PPE), the Cell architecture features an array of eight Synergistic Processor Elements (SPEs) supporting a new SIMD instruction set. Each SPE consists of a Synergistic Processor Unit (SPU) and a Memory Flow Controller (MFC).
Load and store instructions of an SPE access a 256 KB local store private to that SPE. The SPE instruction text itself must also reside in the local store. DMA operations provided by the MFC enable the SPE to copy data between its local store and main storage. Applications making optimal use of the Cell architecture will comprise both PPE and SPE components, requiring tool-chain support for working with two different instruction set architectures and ABIs in an integrated fashion.
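To make the local-store model concrete, here is a minimal, portable C sketch of the pattern it implies: copy a chunk from main storage into a local buffer, compute on it, and copy it back. This is a simulation under stated assumptions, not SPE code. The names sim_dma_get, sim_dma_put, ls_buf, and scale_in_chunks are hypothetical, memcpy stands in for the MFC's DMA transfers, and the chunk size is arbitrary.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum { CHUNK = 1024 };  /* elements per transfer; an arbitrary choice */

/* Hypothetical stand-in for a buffer in an SPE's private local store. */
static float ls_buf[CHUNK];

/* Hypothetical stand-in for a DMA "get": main storage -> local store.
   memcpy models only the copy, not the asynchronous MFC behaviour. */
static void sim_dma_get(void *ls_addr, const void *main_addr, size_t size)
{
    assert(size <= sizeof ls_buf);
    memcpy(ls_addr, main_addr, size);
}

/* Hypothetical stand-in for a DMA "put": local store -> main storage. */
static void sim_dma_put(void *main_addr, const void *ls_addr, size_t size)
{
    assert(size <= sizeof ls_buf);
    memcpy(main_addr, ls_addr, size);
}

/* Scale an array that lives in main storage, touching it only through
   the local buffer, one chunk at a time. */
void scale_in_chunks(float *main_data, size_t n, float factor)
{
    for (size_t start = 0; start < n; start += CHUNK) {
        size_t count = (n - start < CHUNK) ? n - start : CHUNK;
        size_t bytes = count * sizeof(float);

        sim_dma_get(ls_buf, main_data + start, bytes);
        for (size_t i = 0; i < count; i++)
            ls_buf[i] *= factor;   /* compute on local data only */
        sim_dma_put(main_data + start, ls_buf, bytes);
    }
}
```

Calling scale_in_chunks(data, 3000, 2.0f) processes the array in three transfers in each direction. On real hardware the transfers are asynchronous, so a program would typically overlap them with computation.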
The techniques we developed and present here include compiler-assisted accesses to PPE memory from SPE programs, automatic vectorization enhancements for the SPE, and support for large SPE programs using compiler and linker partitioning coupled with an overlay mechanism. These techniques ease the task of programming for Cell by automating the use of SIMD instructions and abstracting the use of DMA transfers between local stores and main memory for both code and data. I will describe how we incorporated these techniques into the GNU Compiler Collection (GCC), the GNU ld linker, and the runtime libraries.
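As an illustration of the kind of code that automatic vectorization targets, the following sketch shows a simple scalar loop of the form a vectorizing compiler can often turn into SIMD instructions. The function saxpy is a hypothetical example, not taken from the talk. The restrict qualifiers tell the compiler that the arrays do not overlap, and GCC's -ftree-vectorize option enables its loop vectorizer. Whether a given loop is actually vectorized depends on the compiler version and target.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example: y[i] = a * x[i] + y[i] for i in [0, n).
   Each iteration is independent of the others, and restrict promises
   that x and y do not alias, so several iterations can be computed
   at once with SIMD instructions. */
void saxpy(size_t n, float a, const float *restrict x, float *restrict y)
{
    assert(n == 0 || (x != NULL && y != NULL));

    for (size_t i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}
```

The source stays plain scalar C, and the choice of SIMD instructions is left to the compiler.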
This is joint work with Ira Rosen, David Edelsohn, Ben Elliston, Alan Modra, Dorit Nuzman, Ulrich Weigand, and Ayal Zaks.
Lecture slides in PDF format