Int J Performability Eng ›› 2023, Vol. 19 ›› Issue (7): 471-480. doi: 10.23940/ijpe.23.07.p6.471480

EDocDeDup: Electronic Document Data Deduplication Towards Storage Optimization

Me Me Khaing and N. Jeyanthi*   

  1. School of Information Technology and Engineering, Vellore Institute of Technology, Vellore, India
  • Contact: * E-mail address:

Abstract: Understanding data deduplication is essential for addressing storage optimization problems. Data deduplication has become an important and cost-effective optimization technique for detecting and removing duplicate data. Organizations face storage issues, and if these are not addressed promptly, available storage capacity is expected to suffer. The proposed system (EdocDedup) addresses this issue by applying a data deduplication technique: it uses SHA-256 to calculate hash values and keeps only the unique hash values. It is applied to an electronic document dataset containing Word, text, HTML, Excel, ZIP, PDF, and PowerPoint presentation files. Applying the proposed technique saves storage and efficiently detects a variety of duplicate files. EdocDedup's performance is evaluated using user-uploaded files.

Key words: data deduplication, EdocDedup, electronic document files, SHA-256, hash, storage optimization
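
The abstract describes hashing each document with SHA-256 and keeping only the unique hash values. A minimal sketch of that idea at whole-file granularity follows. It is not the authors' implementation: the function names `sha256_of_file`, `deduplicate`, and `bytes_saved`, the chunk size, and the choice to retain the first copy seen are all assumptions made for illustration.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large files are never loaded whole.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def deduplicate(paths):
    """Split paths into first-seen unique files and later duplicates.

    Assumption: the first file seen with a given digest is the one kept.
    """
    unique = {}      # digest -> path of the retained copy
    duplicates = []  # (duplicate path, path of the retained copy)
    for path in paths:
        digest = sha256_of_file(path)
        if digest in unique:
            duplicates.append((path, unique[digest]))
        else:
            unique[digest] = path
    return unique, duplicates


def bytes_saved(duplicates):
    """Total size in bytes of the files identified as duplicates."""
    return sum(Path(dup).stat().st_size for dup, _ in duplicates)
```

Because the digest is computed over raw bytes, the same routine applies to any of the listed file types. It only detects byte-identical copies, so two documents with the same visible text but different embedded metadata would hash differently.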