

Return Instruction Identification in Binary Code with Machine Learning

Volume 15, Number 3, March 2019, pp. 1053-1060
DOI: 10.23940/ijpe.19.03.p35.10531060

Jing Qiu and Guanglu Sun

School of Computer Science and Technology, Harbin University of Science and Technology, Harbin, 150080, China

(Submitted on November 12, 2018; Revised on December 15, 2018; Accepted on January 16, 2019)


Binary code analysis is the main method for malware analysis. In this paper, we begin the analysis by identifying return instructions as a step toward disassembling binary code. The return instruction identification problem is cast as a binary classification problem: is a given byte in the binary code the first byte of a return instruction? The 32 bytes surrounding a byte are used as its feature vector. A multilayer perceptron serves as the classification model, which is trained on 1,383 binaries from Windows XP SP3. Evaluation on several open-source programs shows that the approach is feasible and achieves high accuracy.
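The feature construction described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not say how the 32 surrounding bytes are laid out, so the sketch assumes 16 bytes before and 16 bytes after the byte in question, excludes the byte itself, and zero-pads at the ends of the buffer. The names `window_features`, `candidate_samples`, and `RET_OPCODES` are hypothetical, and restricting candidates to bytes that match an x86 return opcode is an added illustration rather than a step stated in the paper.

```python
from typing import List, Tuple

# x86 return opcodes: C3 (near ret), C2 (near ret imm16),
# CB (far ret), CA (far ret imm16).
RET_OPCODES = {0xC3, 0xC2, 0xCB, 0xCA}


def window_features(code: bytes, index: int, half: int = 16) -> List[int]:
    """Return the `half` bytes before and `half` bytes after code[index].

    Assumption: the byte at `index` is excluded and positions that fall
    outside the buffer are filled with zeros.
    """
    before = code[max(0, index - half):index]
    after = code[index + 1:index + 1 + half]
    before = bytes(half - len(before)) + before  # left-pad with zeros
    after = after + bytes(half - len(after))     # right-pad with zeros
    return list(before + after)


def candidate_samples(code: bytes) -> List[Tuple[int, List[int]]]:
    """Pair each offset holding a return opcode with its feature vector.

    Assumption: only such offsets are treated as candidates here.
    """
    return [
        (i, window_features(code, i))
        for i, byte in enumerate(code)
        if byte in RET_OPCODES
    ]


# push ebp; mov ebp, esp; pop ebp; ret
example = bytes([0x55, 0x8B, 0xEC, 0x5D, 0xC3])
samples = candidate_samples(example)
```

Each feature vector has 32 integer entries in the range 0 to 255. Labeled as positive (a true return instruction) or negative (a byte that only looks like one), such vectors could serve as input to a binary classifier such as the multilayer perceptron the abstract mentions.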



