Int J Performability Eng, 2018, Vol. 14, Issue 12: 3066-3075. doi: 10.23940/ijpe.18.12.p16.30663075
Chinese Word Segmentation based on Bidirectional GRU-CRF Model
Jinli Che, Liwei Tang, Shijie Deng, and Xujun Su
Contact: Jinli Che, E-mail: 17603200861@163.com
Jinli Che, Liwei Tang, Shijie Deng, and Xujun Su. Chinese Word Segmentation based on Bidirectional GRU-CRF Model [J]. Int J Performability Eng, 2018, 14(12): 3066-3075.
Table 2. Performance of different word segmentation models

| Models | PKU P | PKU R | PKU F1 | MSRA P | MSRA R | MSRA F1 | CTB6 P | CTB6 R | CTB6 F1 |
|---|---|---|---|---|---|---|---|---|---|
| CRF (4-tag) | 0.878 | 0.856 | 0.867 | 0.881 | 0.863 | 0.872 | 0.883 | 0.866 | 0.875 |
| GRU (4-tag) | 0.956 | 0.948 | 0.953 | 0.962 | 0.955 | 0.959 | 0.959 | 0.955 | 0.957 |
| BI-GRU (4-tag) | 0.964 | 0.952 | 0.958 | 0.965 | 0.963 | 0.964 | 0.960 | 0.964 | 0.962 |
| BI-GRU-CRF (4-tag) | 0.965 | 0.959 | 0.962 | 0.973 | 0.964 | 0.969 | 0.966 | 0.967 | 0.967 |
| BI-GRU-CRF (6-tag) | 0.969 | 0.963 | 0.966 | 0.977 | 0.967 | 0.972 | 0.967 | 0.970 | 0.969 |
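The best-scoring rows in Table 2 come from a bidirectional GRU with a CRF inference layer over the character tag sequence. As a hedged illustration only (not the authors' code), the sketch below shows a minimal PyTorch bidirectional-GRU tagger for the 4-tag (B/M/E/S) scheme, with a hand-rolled Viterbi decoder standing in for CRF decoding; the vocabulary size, embedding width, and hidden size are illustrative assumptions, not values taken from the paper.

```python
# Hedged sketch of a BiGRU tagger for Chinese word segmentation (4-tag scheme).
# All hyper-parameters here are assumptions for illustration.
import torch
import torch.nn as nn

TAGS = ["B", "M", "E", "S"]  # Begin / Middle / End of word / Single-character word

class BiGRUTagger(nn.Module):
    def __init__(self, vocab_size, emb_dim=100, hidden=128, num_tags=len(TAGS)):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.gru = nn.GRU(emb_dim, hidden, batch_first=True, bidirectional=True)
        self.emit = nn.Linear(2 * hidden, num_tags)        # per-character tag scores
        # Tag-to-tag transition scores; in the full model these are trained jointly
        # with the emissions under a CRF objective.
        self.trans = nn.Parameter(torch.zeros(num_tags, num_tags))

    def forward(self, char_ids):                            # char_ids: (batch, seq_len)
        h, _ = self.gru(self.embed(char_ids))
        return self.emit(h)                                 # (batch, seq_len, num_tags)

def viterbi(emissions, trans):
    """Best tag path for one sentence given emission and transition scores."""
    seq_len, num_tags = emissions.shape
    score = emissions[0].clone()                            # scores of paths ending at each tag
    back = []
    for t in range(1, seq_len):
        # total[i, j] = best score so far ending in tag i, then moving to tag j at step t
        total = score.unsqueeze(1) + trans + emissions[t].unsqueeze(0)
        score, idx = total.max(dim=0)
        back.append(idx)
    best = [int(score.argmax())]
    for idx in reversed(back):                              # follow back-pointers
        best.append(int(idx[best[-1]]))
    return [TAGS[i] for i in reversed(best)]

if __name__ == "__main__":
    model = BiGRUTagger(vocab_size=5000)
    chars = torch.randint(0, 5000, (1, 6))                  # a dummy 6-character sentence
    emissions = model(chars)[0]
    print(viterbi(emissions.detach(), model.trans.detach()))
```

In the full BI-GRU-CRF model, the transition matrix and emission scores would be trained jointly with a CRF negative log-likelihood rather than per-character cross-entropy; only the decoding step is sketched here.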
[1] C.N. Huang and H. Zhao, “Chinese Word Segmentation: A Decade Review,” Journal of Chinese Information Processing, Vol. 21, No. 3, pp. 8-19, May 2007
[2] G.H. Feng and W. Zhen, “Review of Chinese Automatic Word Segmentation,” Science Library and Information Service, Vol. 55, No. 2, pp. 41-45, January 2011
[3] N.W. Xue, “Chinese Word Segmentation as Character Tagging,” Computational Linguistics and Chinese Language Processing, Vol. 8, No. 1, pp. 29-48, February 2003, doi: 10.3115/1119250.1119278
[4] Q. Liu, H.P. Zhang, H.K. Yu, and X.Q. Cheng, “Chinese Lexical Analysis Using Cascaded Hidden Markov Model,” Journal of Computer Research and Development, Vol. 41, No. 8, pp. 1421-1429, August 2004
[5] Z.Y. Qian, J.Z. Zhou, G.P. Tong, and X.N. Sun, “Research on Automatic Word Segmentation and POS Tagging for Chu Ci based on HMM,” Library and Information Service, Vol. 58, No. 4, pp. 105-110, February 2014
[6] R. Collobert and J. Weston, “A Unified Architecture for Natural Language Processing: Deep Neural Networks with Multitask Learning,” in Proceedings of the 25th International Conference on Machine Learning, pp. 160-167, Helsinki, Finland, June 2008, doi: 10.1145/1390156.1390177
[7] H. Zhao, C.N. Huang, M. Li, and B.L. Lu, “Effective Tag Set Selection in Chinese Word Segmentation via Conditional Random Field Modeling,” in Proceedings of the 20th Pacific Asia Conference on Language, Information and Computation, pp. 87-94, Wuhan, China, November 2006
[8] Y.M. Hou, H.Q. Zhou, and Z.Y. Wang, “Overview of Speech Recognition based on Deep Learning,” Application Research of Computers, Vol. 34, No. 8, pp. 2241-2246, August 2017
[9] H.T. Lu and Q.C. Zhang, “Applications of Deep Convolutional Neural Network in Computer Vision,” Journal of Data Acquisition and Processing, Vol. 31, No. 1, pp. 1-17, January 2016, doi: 10.16337/j.1004-9037.2016.01.001
[10] X.F. Xi and G.D. Zhou, “A Survey on Deep Learning for Natural Language Processing,” Acta Automatica Sinica, Vol. 42, No. 10, pp. 1445-1465, October 2016, doi: 10.16383/j.aas.2016.c150682
[11] Y. Bengio, R. Ducharme, P. Vincent, and C. Jauvin, “A Neural Probabilistic Language Model,” Journal of Machine Learning Research, Vol. 3, pp. 1137-1155, March 2003
[12] X.Q. Zheng, H.Y. Chen, and T.Y. Xu, “Deep Learning for Chinese Word Segmentation and POS Tagging,” in Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, pp. 647-657, Seattle, Washington, USA, October 2013
[13] X.C. Chen, X.P. Qiu, C.X. Zhu, and X.J. Huang, “Gated Recursive Neural Network for Chinese Word Segmentation,” in Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing, pp. 567-572, Beijing, China, July 2015
[14] X.C. Chen, X.P. Qiu, C.X. Zhu, P.F. Liu, and X.J. Huang, “Long Short-Term Memory Neural Networks for Chinese Word Segmentation,” in Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, pp. 1197-1206, Lisbon, Portugal, September 2015
[15] K. Cho, B. Van Merrienboer, C. Gulcehre, et al., “Learning Phrase Representations Using RNN Encoder-Decoder for Statistical Machine Translation,” in Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing, pp. 1724-1734, Doha, Qatar, October 2014, doi: 10.3115/v1/D14-1179
[16] R. Jozefowicz, W. Zaremba, and I. Sutskever, “An Empirical Exploration of Recurrent Network Architectures,” in Proceedings of the 32nd International Conference on Machine Learning, pp. 2342-2350, Lille, France, July 2015
[17] A. Graves and J. Schmidhuber, “Framewise Phoneme Classification with Bidirectional LSTM and Other Neural Network Architectures,” Neural Networks, Vol. 18, No. 5-6, pp. 602-610, 2005, doi: 10.1016/j.neunet.2005.06.042
[18] C. Jin, W.H. Li, C. Ji, X.Z. Jin, and Y.B. Guo, “Bi-Directional Long Short-Term Memory Neural Networks for Chinese Word Segmentation,” Journal of Chinese Information Processing, Vol. 32, No. 2, pp. 29-37, February 2018
[19] Y. Bengio, P. Simard, and P. Frasconi, “Learning Long-Term Dependencies with Gradient Descent is Difficult,” IEEE Transactions on Neural Networks, Vol. 5, No. 2, pp. 157-166, 1994, doi: 10.1109/72.279181
[20] R. Pascanu, T. Mikolov, and Y. Bengio, “On the Difficulty of Training Recurrent Neural Networks,” in Proceedings of the 30th International Conference on Machine Learning, pp. 1301-1310, Atlanta, Georgia, USA, June 2013
[21] S. Hochreiter and J. Schmidhuber, “Long Short-Term Memory,” Neural Computation, Vol. 9, No. 8, pp. 1735-1780, November 1997
[22] Z.H. Ren, H.Y. Xu, S.L. Feng, H. Zhou, and J. Shi, “Sequence Labeling Chinese Word Segmentation Method based on LSTM Networks,” Application Research of Computers, Vol. 33, No. 5, pp. 1321-1326, November 2017
[23] Y.S. Yao and Z. Huang, “Bi-directional LSTM Recurrent Neural Network for Chinese Word Segmentation,” in Proceedings of the 23rd International Conference on Neural Information Processing, pp. 345-353, Kyoto, Japan, October 2016, doi: 10.1007/978-3-319-46681-1_42
[24] T. Mikolov, K. Chen, G. Corrado, and J. Dean, “Efficient Estimation of Word Representations in Vector Space,” arXiv preprint arXiv:1301.3781, 2013
[25] H. Tseng, P. Chang, G. Andrew, D. Jurafsky, and C. Manning, “A Conditional Random Field Word Segmenter for Sighan Bakeoff 2005,” in Proceedings of the Fourth SIGHAN Workshop on Chinese Language Processing, pp. 168-171, Jeju, Korea, October 2005