[1] H. H. Fu, J. F. Liao, J. Z. Yang, L. N. Wang, Z. Y. Song, X. M. Huang, et al., “The Sunway TahuLight Supercomputer: System and Applications,” Science China Information Sciences, Vol. 59, No. 7, pp. 1-16, 2016 [2] L. Han, “Research on Consistent Optimization Techniques of Parallel Decomposition for Distributed Memory Architecture,” PLA Information Engineering University, 2008 [3] M. W. Hall, S. P. Amarasinghe, B. R. Murphy, S. W. Liao,M. S. Lam, “Interprocedural Parallelization Analysis in SUIF,” ACM Transactions on Programming Languages and Systems, Vol. 27, No. 4, pp. 662-731, 2005 [4] R. Allen and K. Kennedy, “Optimizing Compilers for Modern Architectures,” Morgan Kaufmann, 2002 [5] R. D. Venkatasubramanyam, “Array Access Analysis in Open64,” University of Houston, 2004 [6] N. Nethercote and J. Seward, “How to Shadow Every Byte of Memory used by a Program,” in Proceedings of the 3rd International Conference on Virtual Execution Environments, pp. 65-74, 2007 [7] Q. Zhao, D. Bruening,S. Amarasinghe, “Umbra: Efficient and Scalable Memory Shadowing,” inProceedings of the 8th Annual IEEE/ACM International Symposium on Code Generation and Optimization, pp. 22-31, 2010 [8] D. Berlin and D. Edelsohn, “High-Level Loop Optimizations for GCC,” inProceedings of GCC Developers Summit, pp. 37-54, 2004 [9] L. Y. Zeng, C. Q. Yang,C. Huang, “Analysis and Improvement of the GCC 4.1 Data Dependence Analyzer,” Computer Engineering and Science, Vol. 28, No. 10, pp. 104-106, 2006 [10] Q. S. Zhang, Y. Li,Z. D. Fan, “Automatic Parallelization for Loops Carried Data Dependence Between Iteration,” Journal of Chinese Mini-Micro Computer Systems, Vol. 35, No. 6, pp. 1293-1297, 2014 [11] “Cit: A GCC Plugin for the Analysis and Characterization of Data Dependencies in Parallel Programs,” (http://cas.et.tudelft.nl/ pubs/ Kumar_ DCIS _2013.pdf, last accessed on March 10, 2018 [12] Z. Wang, G. Tournavitis, B. Franke, M. F. P.O'boyle, “Integrating Profile-Driven Parallelism Detection and Machine-Learning-based Mapping,” ACM Transactions on Architecture and Code Optimization, Vol. 11, No. 1, pp. 1-26, 2014 [13] D. H. Bailey, E. Barszcz, J. T. Barton, D. S. Browning, R. L. Carter, D. Dagum, et al., “The NAS Parallel Benchmarks,” International Journal of High Performance Computing Applications, Vol. 5, No. 3, pp. 63-73,1991 [14] J. L. Henning, “SPEC CPU2006 Benchmark Descriptions,” Acm Sigarch Computer Architecture News, Vol. 34, No. 4, pp. 1-17, 2006 [15] P. F. Huang, R. C. Zhao, Y. Yao,J. Zhao, “Parallel Cost Model for Heterogeneous Multi-Core Processors,” Journal of Computer Applications, Vol. 33, No. 6, pp. 1544-1547, 2013 [16] C. H. Liao, “A Compile-Time OpenMP Cost Model,” University of Houston, 2007 [17] Z. J.Guo and H. Liu, “A New Compiler Framework based on Superword Level Parallel,” International Journal of Performability Engineering, Vol. 14, No. 10, pp. 2511-2521, 2018 |