Enhancing CellSs programming model: tailoring compilation flags through non-numerical applications

With the appearance of new multicore processor architectures, there is a need for new programming paradigms, especially for heterogeneous devices such as the Cell Broadband Engine™ (Cell/B.E.) processor. CellSs [SC06, IBM_RD07] is a programming model that automatically exploits the functional parallelism of a sequential application with annotations. The focus is on the flexibility and simplicity of the programming model. Although the concept and programming model are general enough to be extended to other devices, the CellSs implementation considered in this cluster proposal has been tailored to the Cell/B.E. device (a more generic implementation called SMPSs is also available for SMPs or homogeneous multicore processors). CellSs is distributed as Open Source under the GPL/LGPL license and is downloadable from: http://www.bsc.es/plantillaH.php?cat_id=372

The approach is based on a source-to-source compiler which, given a sequential application in C with CellSs annotations, generates two different C files. The first file corresponds to the main program of the application and should be compiled with a PPE compiler to generate a PPE object. The second file corresponds to the code that will be executed in the SPEs at the request of the main program. This file must be compiled with an SPE compiler to obtain an SPE object, which is then linked with the SPE libraries to obtain an SPE executable. For these purposes the CellSs environment currently uses the GNU Compiler Collection (GCC), which provides compilers for both the PPE and SPE processor cores of the Cell/B.E.

Our experience with Cell/B.E. programming indicates that it is of paramount importance to optimize the code to be run in the SPEs. These optimizations include loop unrolling, vectorization, pre-fetching of data from memory to local registers, etc.
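To make the idea of a sequential application with annotations more concrete, the following is a minimal sketch of what such a program could look like. The pragma spellings (`#pragma css task`, `input`, `inout`, `start`, `finish`) are assumed from published CellSs descriptions and should be checked against the CellSs documentation. The function names, the block size `BS` and the block count `NBLOCKS` are hypothetical and chosen only for illustration. A plain C compiler ignores the unknown pragmas, so the same file also builds as an ordinary sequential program.

```c
#include <assert.h>
#include <stddef.h>

#define BS 64      /* elements per block: hypothetical value */
#define NBLOCKS 4  /* number of blocks: hypothetical value */

/* Task annotation (assumed spelling): marks the function as a unit of
 * work that the runtime may offload to an SPE, and states which
 * parameters are only read (input) and which are read and written
 * (inout), so that dependences between tasks can be tracked. */
#pragma css task input(a, b) inout(c)
void block_multiply_add(float a[BS], float b[BS], float c[BS])
{
    for (size_t i = 0; i < BS; i++)
        c[i] += a[i] * b[i];
}

/* Main-program side: plain sequential code that calls the task once
 * per block and then sums the results. */
float run_blocks(void)
{
    static float a[NBLOCKS][BS], b[NBLOCKS][BS], c[NBLOCKS][BS];

    for (size_t blk = 0; blk < NBLOCKS; blk++) {
        for (size_t i = 0; i < BS; i++) {
            a[blk][i] = 1.0f;
            b[blk][i] = 2.0f;
            c[blk][i] = 0.0f;
        }
    }

#pragma css start  /* assumed spelling: begin the annotated region */
    for (size_t blk = 0; blk < NBLOCKS; blk++)
        block_multiply_add(a[blk], b[blk], c[blk]);
#pragma css finish /* assumed spelling: wait for all outstanding tasks */

    float sum = 0.0f;
    for (size_t blk = 0; blk < NBLOCKS; blk++)
        for (size_t i = 0; i < BS; i++)
            sum += c[blk][i];

    assert(sum > 0.0f); /* sanity check on the sequential result */
    return sum;
}
```

Calling `run_blocks()` from a `main` function is enough to run the sketch sequentially. In the flow described above, the task body would end up in the file compiled for the SPEs, and the rest in the file compiled for the PPE.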
In the cluster we will focus on the optimization of the low-level code generator for the SPEs. The code generated by the CellSs translator is ultimately fed to a compiler, and the performance actually achieved by the program depends on that compiler's ability to generate optimized instruction schedules for the SPE. The main issues to be addressed potentially include vectorization and efficient handling of branches, according to the requirements observed when porting the relevant applications to Cell.

We foresee using the Expert System for Tuning Optimizations (ESTO). ESTO automates the process of identifying and tuning the optimal set of optimization options used when generating performance-optimized programs. A typical use is tuning, for maximum performance, the optimization options used by a compiler when creating an application. ESTO is currently geared toward tuning GCC and Post-Link Optimization for Linux® on POWER®.

The objective is to analyse the impact of compilation flags both on the master and worker code (CellSs runtime) and on the user code (especially vectorization flags for the tasks run in the SPEs). The outcome of the cluster can be, on the one hand, a better set of flags for compiling the CellSs runtime and, on the other, a set of recommendations to CellSs users for better use of GCC compilation flags.

According to recent studies, non-numerical applications will be of paramount importance in future multicore chips. For this reason, this cluster proposal also includes the development of non-numerical applications that will demonstrate the programmability of CellSs and will be used as benchmarks to check the compilation optimization options. One of the applications to port to CellSs is Paramedir, a non-graphical version of Paraver, a browser for traced performance data developed and distributed by BSC.
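The flag-tuning objective described above can be sketched as a search over combinations of compiler flags. The sketch below is not ESTO and does not reproduce its search strategy. It swaps in a plain exhaustive enumeration because that is the simplest search to show. The three GCC flags are real options, but choosing them as candidates is an assumption. The names `best_flags`, `cost_fn` and `toy_cost` are hypothetical, and `toy_cost` returns arbitrary placeholder numbers, not measurements.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define NFLAGS 3
#define CMD_MAX 128

/* Baseline and optional GCC flags (candidate set is an assumption). */
static const char *const BASE_FLAGS = "-O2";
static const char *const OPTIONAL_FLAGS[NFLAGS] = {
    "-funroll-loops", "-ftree-vectorize", "-fomit-frame-pointer"
};

/* Hypothetical callback: returns a cost (e.g. run time) for a flag
 * string. A real one would compile and time the code on the target. */
typedef double (*cost_fn)(const char *flags);

/* Build the flag string for one subset, encoded as a bit mask. */
static void build_flags(unsigned mask, char *out, size_t out_size)
{
    snprintf(out, out_size, "%s", BASE_FLAGS);
    for (unsigned i = 0; i < NFLAGS; i++) {
        if (mask & (1u << i)) {
            size_t used = strlen(out);
            snprintf(out + used, out_size - used, " %s", OPTIONAL_FLAGS[i]);
        }
    }
}

/* Try every subset of the optional flags and keep the cheapest one.
 * The winning flag string is copied into `best`; its cost is returned. */
double best_flags(cost_fn measure, char *best, size_t best_size)
{
    double best_cost = 0.0;
    char candidate[CMD_MAX];

    for (unsigned mask = 0; mask < (1u << NFLAGS); mask++) {
        build_flags(mask, candidate, sizeof candidate);
        double cost = measure(candidate);
        if (mask == 0 || cost < best_cost) {
            best_cost = cost;
            snprintf(best, best_size, "%s", candidate);
        }
    }
    return best_cost;
}

/* Placeholder cost with made-up numbers, used only to exercise the
 * search. It says nothing about how these flags behave on an SPE. */
double toy_cost(const char *flags)
{
    double cost = 3.0;
    if (strstr(flags, "-ftree-vectorize")) cost -= 1.0;
    if (strstr(flags, "-funroll-loops")) cost += 0.5;
    if (strstr(flags, "-fomit-frame-pointer")) cost += 0.25;
    return cost;
}
```

In an actual study, `measure` would compile the runtime or the SPE task code with the candidate flags and time a benchmark such as the applications proposed here. Exhaustive enumeration grows as 2^n in the number of flags, which is why an automated tuner is of interest for larger flag sets.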
Paramedir accepts the same traces and configuration files as Paraver, but instead of graphically displaying a view or table it just emits the computed metrics to a text file. Internally, Paramedir organizes the trace records in lists, some stored as consecutive elements in memory and others linked with pointers. Although most of the traversals are sequential, it will be necessary to investigate the most appropriate data layouts and possible blocking structures. A typical computation pattern is to accumulate metrics on large matrices with one row per thread. This is thus an application with intensive data handling, limited by memory bandwidth. The irregular access patterns and list traversals are a challenge for the Cell/B.E. architecture, which makes this application a very important test case, representative of many other applications that consume a large number of cycles on laptops.

In addition, we plan to port or develop from scratch bioinformatics applications, especially Molecular Dynamics applications. These applications will be provided by the Life-Science department at BSC or taken from those available as Open Source, such as NAMD or GROMACS.

[SC06] Pieter Bellens, Josep M. Perez, Rosa M. Badia and Jesus Labarta. CellSs: A Programming Model for the Cell BE Architecture. In Proceedings of the ACM/IEEE SC 2006 Conference, November 2006.

Research cluster
Requested: € 52560 Granted: € 30000
Requested: € 28800 Granted: € 14400

The cluster proposal includes two fellowships of 14,400 Euros each for two PhD students and 1,200 Euros for the registration of each student in the PhD program.

Travel: we plan to have meetings every 3 months, possibly in conjunction
4 people traveling x 4 meetings/year x 1,000 euros/trip = 16,000 Euros

Requested: 12 month(s) Granted: 0 month(s), starting on: Thu, September 27, 2007

MENDELSON Bilha (IBM) (--member--)
VALERO Mateo (UPC) (--member--)
AYGUADE Eduard (Barcelona Supercomputing Center) (--member--)
BADIA Rosa M. (Barcelona Supercomputing Center) (--member--)
ZAKS Ayal (IBM) (--member--)
Jesus Labarta, and two PhD students: Pieter Bellens and a second one still to be identified.