Return Instruction Identification in Binary Code with Machine Learning

doi:10.23940/ijpe.19.03.p35.10531060

Int J Performability Eng, 2019, Vol. 15, Issue 3: 1053-1060. doi: 10.23940/ijpe.19.03.p35.10531060

Jing Qiu and Guanglu Sun^*

School of Computer Science and Technology, Harbin University of Science and Technology, Harbin, 150080, China

Contact: sunguanglu@hrbust.edu.cn
About the authors: Jing Qiu received his B.S. degree from Harbin Institute of Technology in 2005 and his M.S. degree from Harbin University of Science and Technology in 2009. He received his Ph.D. from Harbin Institute of Technology in 2015. He currently works at Harbin University of Science and Technology. His main interests include binary code analysis and binary code deobfuscation.

Guanglu Sun received his bachelor's degree, master's degree, and Ph.D. from the School of Computer Science and Technology at Harbin Institute of Technology. He was an assistant researcher at the Post-doctoral Mobile Station in Tsinghua University's Computer Science Department from 2008 to 2011. He was a visiting scholar at Northwestern University from 2014 to 2015. He is currently a professor in the School of Computer Science and Technology and the director of the Center of Information Security and Intelligent Technology at Harbin University of Science and Technology. He is also a senior member of the China Computer Federation and a member of IEEE. His current research interests include computer networks and security, machine learning, and intelligent information processing.

Abstract

Binary code analysis is the main method for malware analysis. In this paper, the analysis starts by identifying return instructions in order to disassemble binary code. The return instruction identification problem is converted into a binary classification problem: is a given byte in binary code the first byte of a return instruction? The 32 bytes around a byte are used as the feature of that byte. A multilayer perceptron is employed to build the classification model, which is trained on 1,383 binaries from Windows XP SP3. Evaluation results on several open-source programs show that the approach is feasible and achieves high accuracy.
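
The feature construction described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not say how the 32 context bytes are laid out, so the sketch assumes 16 bytes before and 16 bytes after the candidate byte, excludes the candidate byte itself, pads with zeros past the ends of the code, and scales byte values to [0, 1]. The function names `context_features` and `build_dataset` and the ground-truth set `return_offsets` are hypothetical.

```python
from typing import Iterable, List, Tuple

WINDOW = 32         # number of context bytes per sample (from the abstract)
HALF = WINDOW // 2  # assumption: 16 bytes before and 16 bytes after
PAD = 0             # assumption: zero padding beyond the ends of the code


def context_features(code: bytes, offset: int) -> List[float]:
    """Return the 32 bytes surrounding code[offset], scaled to [0, 1]."""
    if not 0 <= offset < len(code):
        raise IndexError("offset is outside the code buffer")
    before = [code[i] if i >= 0 else PAD
              for i in range(offset - HALF, offset)]
    after = [code[i] if i < len(code) else PAD
             for i in range(offset + 1, offset + 1 + HALF)]
    return [b / 255.0 for b in before + after]


def build_dataset(code: bytes,
                  return_offsets: Iterable[int]) -> Tuple[List[List[float]], List[int]]:
    """Build one sample per byte; label 1 marks the first byte of a return."""
    positives = set(return_offsets)
    features = [context_features(code, i) for i in range(len(code))]
    labels = [1 if i in positives else 0 for i in range(len(code))]
    return features, labels


# x86 example: push ebp; mov ebp, esp; pop ebp; ret
code = bytes.fromhex("5589e55dc3")
X, y = build_dataset(code, return_offsets={4})
```

Each row of `X` is a 32-element vector that a binary classifier such as a multilayer perceptron could take as input, with `y` as the target. In practice the labels would come from a trusted disassembly of the training binaries.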

Key words: return instruction, binary code analysis, machine learning, reverse engineering

Jing Qiu and Guanglu Sun. Return Instruction Identification in Binary Code with Machine Learning [J]. Int J Performability Eng, 2019, 15(3): 1053-1060.
