Efficient Exploitation of Multiple Levels of Parallelism for Video Codec Applications

975 / Accepted / Finalized / Final evaluation

Video applications are becoming a very important workload in many computing environments, ranging from embedded systems to Internet servers. A new generation of video compression systems has been defined (including codecs such as H.264/AVC and VC-1) to provide the higher levels of video quality and compression efficiency that emerging multimedia applications demand. However, the coding performance of these new codecs comes at the price of computational complexity. In addition, the proliferation of different video codecs and the constant addition of new extensions require programmable processors instead of application-specific hardware, yet traditional processors in the embedded and desktop domains cannot provide the required performance.

SIMD extensions to general-purpose (or media) processors provided the required performance for previous video applications by exploiting Data Level Parallelism (DLP). For the emerging video codec applications, new processor architectures are being proposed that exploit multiple levels of parallelism. These include: fine-grain DLP using SIMD ISA extensions; coarse-grain DLP by exploiting macroblock, slice and/or frame level parallelism; function level parallelism by performing task level pipelining; and transfer level parallelism for overlapping computation with memory transfers. Exploiting these levels of parallelism simultaneously requires changing the way video codec applications are programmed, and achieving an optimal trade-off between video quality (and bit rate), coding and decoding delay, parallelization overhead, code portability and performance.
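To make the coarse-grain, macroblock-level parallelism mentioned above more concrete, the following Python sketch computes a wavefront schedule for a single frame. It assumes that each macroblock depends on its left, top-left, top and top-right neighbours, which is the usual dependency pattern for H.264/AVC intra-frame processing. The function names `wavefront_schedule` and `parallelism_profile` are hypothetical, chosen only for illustration. It is a model of the available parallelism under these assumptions, not a codec implementation.

```python
from collections import Counter


def wavefront_schedule(width_mb, height_mb):
    """Return the earliest time step at which each macroblock can start.

    Assumption: macroblock (x, y) depends on its left, top-left, top and
    top-right neighbours, and every macroblock takes one time step.
    """
    step = {}
    # Raster order guarantees that all dependencies are scheduled first.
    for y in range(height_mb):
        for x in range(width_mb):
            deps = [(x - 1, y), (x - 1, y - 1), (x, y - 1), (x + 1, y - 1)]
            ready = [step[d] for d in deps if d in step]
            step[(x, y)] = max(ready) + 1 if ready else 0
    return step


def parallelism_profile(step):
    """Count how many macroblocks can be processed in each time step."""
    return Counter(step.values())


if __name__ == "__main__":
    # 1920x1088 pixels with 16x16 macroblocks gives 120x68 macroblocks.
    schedule = wavefront_schedule(120, 68)
    profile = parallelism_profile(schedule)
    print("time steps:", max(schedule.values()) + 1)
    print("peak parallel macroblocks:", max(profile.values()))
```

Under these assumptions, a 1920x1088 frame (120x68 macroblocks) needs 254 time steps and never exposes more than 60 independent macroblocks at once, which is one reason the scalability of this kind of parallelism depends on the resolution of the input video.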
Balancing video quality, delay, parallelization overhead, portability and performance raises several architecture and software parallelization issues that remain open areas for research:

- Perform a detailed analysis of the parallelization strategies that can be applied to a software implementation of the H.264/AVC codec on emerging architectures such as the Cell Broadband Engine, Trimedia and others; identify the limitations of these architectures in exploiting the different levels of parallelism effectively; and find the most appropriate programming models from the perspective of both the applications and the architectures.
- Analyze and implement efficient versions of the most time-consuming kernels of H.264/AVC (motion compensation, motion estimation, DCT, deblocking filter) using SIMD extensions.
- Analyze the alternatives for exploiting the macroblock level parallelism present in the H.264/AVC codec. This requires analyzing the scheduling of threads and the size and shape of the data partitions, and finding a good trade-off between data locality and memory transfers. It is also important to analyze the communication overhead caused by dependencies between partitions, the load balancing, and how this kind of parallelism scales with the number of threads, the number of processors and the resolution of the input video.
- Analyze the applicability of slice and frame level parallelism in the H.264/AVC codec, taking into account coding and decoding delay and the trade-off between quality and performance.
- Study the strategies for exploiting function level parallelism in the H.264/AVC codec through task level pipelining. Open issues with this approach include how to achieve good load balancing and scalability while reducing the communication overhead.
- Analyze the impact of the memory hierarchy on the parallelization process and propose techniques for making data and instructions available when they are required.

The researchers involved will meet to discuss their views on these topics, to share their experiences with different platforms and software implementations of the H.264/AVC codec, and to identify specific topics for further collaborative research. One interesting way to share research efforts in this field would be an open, common software base; this possibility will be discussed at one of the proposed meetings.

Research cluster
Requested: € 19000
Requested: € 0

The budget covers travel and daily allowances for 2 meetings (1 at TU Delft and 1 at UPC) for all the participating researchers. It also covers travel and daily allowances for a short visit (3 days) by two of the researchers to the NXP labs.

Requested: 12 month(s)

VALERO Mateo (UPC) (--member--)
RAMIREZ Alex (UPC) (--member--)
ALVAREZ Mauricio (UPC) (--phd student--)
VASSILIADIS Stamatis (Delft University of Technology) (--member--)
JUURLINK Ben (Delft University of Technology) (--colleague--)
DURANTON Marc (NXP) (--member--)

Roberto R. Osorio, University of Santiago, Spain

Roberto Osorio, from the University of Santiago de Compostela, will be provided "courtesy" funds to join the cluster meetings so that he can become an active HiPEAC participant collaborating on this topic.