

Active Learning Method for Chinese Spam Filtering

Volume 13, Number 4, July 2017 - Paper 18 - pp. 511-518
DOI: 10.23940/ijpe.17.04.p18.511518

Guanglu Sun (a, b), Shaobo Li (a), Teng Chen (a), Xuhang Li (a), Suxia Zhu (a, b)

(a) School of Computer Science and Technology, Harbin University of Science and Technology, Harbin 150080, China
(b) Research Center of Information Security & Intelligent Technology, Harbin University of Science and Technology, Harbin 150080, China

(Submitted on February 20, 2017; Revised on May 11, 2017; Accepted on June 15, 2017)


An active learning method is proposed for filtering Chinese spam. Training a filtering model requires labeled emails, and labeling every email is costly and time-consuming, whereas unlabeled emails are easy to obtain. To reduce the number of emails that must be labeled, a sample selection method based on misclassification and low certainty is proposed, and a relaxed online SVM (ROSVM) is used as the online filtering model. Experimental results show that the proposed method not only reduces the number of training emails and the computational cost of the spam filter but also improves the filter's accuracy.
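The selection idea in the abstract (request a label only for emails the filter is unsure about or gets wrong) can be illustrated with a minimal sketch. It rests on several assumptions not taken from the paper: emails are represented as hashed, L2-normalised character bigrams; the online model is a plain margin perceptron (hinge-loss steps without regularisation), a simplified stand-in for ROSVM rather than ROSVM itself; and `tau`, `use_feedback`, `active_filter` and `OnlineLinearFilter` are hypothetical names. Misclassification is simulated as a user correcting a misfiled message.

```python
import math
import zlib
from typing import Dict, Iterable, List, Tuple

DIM = 2 ** 18  # size of the hashed feature space (assumed value)
SparseVec = Dict[int, float]


def features(text: str) -> SparseVec:
    """Hash overlapping character bigrams into an L2-normalised sparse vector.

    Character bigrams are an assumed representation: Chinese text has no
    spaces between words, so bigrams sidestep the need for a word segmenter.
    """
    counts: SparseVec = {}
    for i in range(len(text) - 1):
        idx = zlib.crc32(text[i:i + 2].encode("utf-8")) % DIM
        counts[idx] = counts.get(idx, 0.0) + 1.0
    norm = math.sqrt(sum(v * v for v in counts.values())) or 1.0
    return {k: v / norm for k, v in counts.items()}


class OnlineLinearFilter:
    """Linear classifier trained by online hinge-loss steps (a margin perceptron).

    Hypothetical stand-in for an online SVM such as ROSVM. Labels: +1 spam, -1 ham.
    """

    def __init__(self, lr: float = 0.5) -> None:
        self.w: SparseVec = {}
        self.b = 0.0
        self.lr = lr

    def score(self, x: SparseVec) -> float:
        return self.b + sum(self.w.get(k, 0.0) * v for k, v in x.items())

    def predict(self, x: SparseVec) -> int:
        return 1 if self.score(x) >= 0.0 else -1

    def update(self, x: SparseVec, y: int) -> None:
        # Sub-gradient step on max(0, 1 - y * score), taken only on a margin violation.
        if y * self.score(x) < 1.0:
            for k, v in x.items():
                self.w[k] = self.w.get(k, 0.0) + self.lr * y * v
            self.b += self.lr * y


def active_filter(stream: Iterable[Tuple[str, int]], tau: float = 0.3,
                  use_feedback: bool = True) -> Tuple[OnlineLinearFilter, int, List[int]]:
    """Classify a stream online, 'spending' a label only when one is requested.

    A label is requested when the filter is unsure (|score| < tau) or, if
    use_feedback is set, when its prediction is wrong (simulated user report).
    """
    model = OnlineLinearFilter()
    labels_used = 0
    predictions: List[int] = []
    for text, label in stream:
        x = features(text)
        s = model.score(x)
        pred = 1 if s >= 0.0 else -1  # predict before seeing the label
        predictions.append(pred)
        low_certainty = abs(s) < tau
        misclassified = use_feedback and pred != label
        if low_certainty or misclassified:
            labels_used += 1
            model.update(x, label)
    return model, labels_used, predictions


if __name__ == "__main__":
    spam = ["免费领取大奖点击链接", "恭喜您中奖请立即汇款", "低价发票代开联系电话", "免费赠送优惠券点击领取"]
    ham = ["明天下午三点开组会", "请查收本周的实验报告", "论文修改意见已经发给你", "周五一起讨论项目进度"]
    stream = [p for _ in range(10) for s, h in zip(spam, ham) for p in ((s, 1), (h, -1))]
    model, used, _ = active_filter(stream)
    print(used, "of", len(stream), "labels used")
```

On this toy stream the filter stops requesting labels once every message is classified with enough margin, so only a small fraction of the 80 labels is consumed. Raising `tau` trades more labeling effort for a wider safety margin; the thresholds and update rule used in the paper itself may differ.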




