Reducing Energy Cost of Multi-Threaded Programs on NUMA Architectures

Volume 14, Number 6, June 2018, pp. 1201-1212
DOI: 10.23940/ijpe.18.06.p11.12011212

Hao Fang (a), Liang Zhu (b), and Xiangyu Li (a)

(a) School of Computer Science, Wuhan Donghu University, Wuhan, 430212, China
(b) China Ship Development and Design Center, Wuhan, 430064, China

(Submitted on March 6, 2018; Revised on April 12, 2018; Accepted on May 26, 2018)

Abstract:

Many recent data center servers have NUMA (Non-Uniform Memory Access) characteristics: accessing remote memory generally takes longer than accessing local memory. Much research has addressed improving the performance of NUMA multi-core systems, but little has considered reducing their energy cost. This work studies how to reduce the energy cost of multi-threaded programs on NUMA architectures using DVFS (Dynamic Voltage and Frequency Scaling) adjustment. We consider three factors of multi-threaded programs that influence the energy saved by DVFS adjustment: (1) the memory access intensity of the parallel program; (2) the proportion of remote memory accesses; and (3) the ratio between remote and local memory access latency. We propose two DVFS adjustment strategies whose energy-saving effect depends on these three factors. The two strategies save at most 20% and 39.2% of total energy, respectively, when one factor is considered, and at most 33.3% and 48.1%, respectively, when two factors are considered.
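To make the role of the three factors concrete, the following is a minimal toy model, not the authors' strategy or energy model. It assumes that memory stall time does not depend on core frequency, that core power is a static term plus a dynamic term proportional to the cube of the frequency, and that the core drops to a lower frequency during stalls with no switching overhead. All function names, parameters, and default values are hypothetical and chosen only for illustration.

```python
def stall_time(local_stall, remote_fraction, latency_ratio):
    """Memory stall time when part of the accesses go to a remote node.

    local_stall:     stall time if every access were local (arbitrary units)
    remote_fraction: proportion of accesses served by remote memory (0..1)
    latency_ratio:   remote latency divided by local latency (>= 1)
    """
    return local_stall * ((1.0 - remote_fraction) + remote_fraction * latency_ratio)


def core_power(freq, static_power=0.3, dynamic_coeff=0.7):
    """Relative core power at normalized frequency freq (1.0 = nominal).

    Assumption: voltage scales with frequency, so dynamic power ~ freq**3.
    The 0.3 / 0.7 split between static and dynamic power is made up.
    """
    return static_power + dynamic_coeff * freq ** 3


def energy_saving(compute_time, local_stall, remote_fraction, latency_ratio,
                  low_freq=0.5):
    """Fraction of energy saved by running at low_freq during memory stalls.

    Compute phases stay at nominal frequency, so run time is unchanged
    under the assumptions stated above.
    """
    stall = stall_time(local_stall, remote_fraction, latency_ratio)
    baseline = core_power(1.0) * (compute_time + stall)
    scaled = core_power(1.0) * compute_time + core_power(low_freq) * stall
    return 1.0 - scaled / baseline


if __name__ == "__main__":
    # Sweep the remote access proportion for a fixed latency ratio.
    for remote_fraction in (0.0, 0.25, 0.5, 0.75, 1.0):
        saving = energy_saving(compute_time=1.0, local_stall=1.0,
                               remote_fraction=remote_fraction,
                               latency_ratio=2.0)
        print(f"remote fraction {remote_fraction:.2f}: saving {saving:.1%}")
```

In this sketch the saving grows with each of the three factors, because each one increases the share of run time spent stalled on memory, which is the only time the lower frequency is applied. The figures it prints are properties of the toy model and should not be compared with the percentages reported in the abstract.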

 


               
