UCMoH: a Unified Learning-Based Cost Model for Tensorized Program Tuning in Heterogeneous Acceleration Clusters

EasyChair Preprint 13975
4 pages • Date: July 15, 2024

Abstract

Tensorized programs use various hardware intrinsics on heterogeneous accelerators to improve tensor computation performance. The wave of hardware customization has introduced a large number of hardware accelerators and intrinsics, prompting deep learning compilers (DLCs) to explore tensorized program tuning in order to leverage these intrinsics effectively. Program tuning relies at its core on the design of the cost model, yet there is still a lack of cost models designed specifically for tensorized programs, which severely hampers the co-optimization of DLCs and heterogeneous accelerators. We propose what is, to the best of our knowledge, the first unified cost model for tensorized program tuning, built on a unified feature representation and a unified transfer prediction strategy. To meet training and testing requirements, we constructed a dataset dedicated to tensorized program tuning. UCMoH significantly improves adaptability to diverse execution environments and enables flexible transfer prediction while maintaining high accuracy. We will apply UCMoH to the tuning framework in the heterogeneous acceleration cluster.

Keyphrases: Heterogeneous Acceleration Cluster, Tensorized Program Tuning, cost model, lifelong learning