| [1] F. Wu, H. Zhang, Y. Zhuang, “Learning Semantic Correlations for Cross-Media Retrieval,” in Proceedings of IEEE International Conference on Image Processing, pp. 1465-1468, IEEE, 2007
 [2] D. R. Hardoon, S. Szedmak, J. Shawe-Taylor, “Canonical Correlation Analysis: An Overview with Application to Learning Methods,” Neural Computation, Vol. 16, No. 12, pp. 2639, 2004
 [3] Y. Q. Jia, M. Salzmann, T. Darrell, “Learning Cross-Modality Similarity for Multinomial Data,” in Proceedings of IEEE International Conference on Computer Vision, pp. 2407-2414, 2011
 [4] C. C. Kang, et al., “Learning Consistent Feature Representation for Cross-Modal Multimedia Retrieval,” IEEE Transactions on Multimedia, Vol. 17, No. 3, pp. 370-381, 2015
 [5] J. F. He, et al., “Cross-Modal Retrieval by Real Label Partial Least Squares,” in Proceedings of ACM on Multimedia Conference, ACM, pp. 227-231, 2016
 [6] X. Chang and Y. Yang, “Semisupervised Feature Analysis by Mining Correlations Among Multiple Tasks,” IEEE Transactions on Neural Networks & Learning Systems, Vol. 28, No. 10, pp. 2294-2305, 2016
 [7] H. Zhang, Y. Liu, Z. Ma, “Fusing Inherent and External Knowledge with Nonlinear Learning for Cross-Media Retrieval,” Neurocomputing, Vol. 119, No. 16, pp. 10-16, 2013
 [8] H. Zhang and X. Liu, “Cross-Media Semantics Mining based on Sparse Canonical Correlation Analysis and Relevance Feedback,” in Proceedings of Pacific-Rim Conference on Advances in Multimedia Information Processing, pp. 759-768, Springer-Verlag, 2012
 [9] Y. X. Wang, H. Zhang, F. Yang, “A Weighted Sparse Neighbourhood-Preserving Projections for Face Recognition,” IETE Journal of Research, pp. 1-10, 2017
 [10] H. X. Zhang, L. Cao, S. Gao, “A Locality Correlation Preserving Support Vector Machine,” Pattern Recognition, Vol. 47, No. 9, pp. 3168-3178, 2014
 [11] J. H. Yan, et al., “Joint Graph Regularization based Modality-Dependent Cross-Media Retrieval,” Multimedia Tools & Applications, No. 6, pp. 1-19, 2017
 [12] X. Liang, Y. Wei, X. Shen, et al., “Proposal-Free Network for Instance-Level Object Segmentation,” IEEE Transactions on Pattern Analysis and Machine Intelligence, 2015
 [13] Y. H. Xiao, et al., “Topographic NMF for Data Representation,” IEEE Transactions on Cybernetics, Vol. 44, No. 10, pp. 1762, 2014
 [14] X. Liang, Y. Wei, L. Lin, et al., “Learning to Segment Human by Watching YouTube,” IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 39, No. 7, pp. 1462-1468, 2017
 [15] X. Zhai, Y. Peng, J. Xiao, “Learning Cross-Media Joint Representation with Sparse and Semisupervised Regularization,” IEEE Transactions on Circuits & Systems for Video Technology, Vol. 24, No. 6, pp. 965-978, 2014
 [16] X. Zhai, Y. Peng, J. Xiao, “Heterogeneous Metric Learning with Joint Graph Regularization for Cross-Media Retrieval,” in Proceedings of Twenty-Seventh AAAI Conference on Artificial Intelligence, pp. 1198-1204, 2013
 [17] X. Zhai, Y. Peng, J. Xiao, “Cross-modality Correlation Propagation for Cross-Media Retrieval,” in Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 2337-2340, 2012
 [18] X. Zhai, Y. Peng, J. Xiao, “Effective Heterogeneous Similarity Measure with Nearest Neighbors for Cross-Media Retrieval,” in Proceedings of International Conference on Advances in Multimedia Modeling, Springer-Verlag, pp. 312-322, 2012
 [19] Y. Wei, Y. Zhao, Z. Zhu, et al., “Modality-Dependent Cross-Media Retrieval,” ACM Transactions on Intelligent Systems & Technology, Vol. 7, No. 4, pp. 1-13, 2016
 [20] D. H. Lee, “Pseudo-Label: The Simple and Efficient Semi-Supervised Learning Method for Deep Neural Networks,” in Proceedings of the 2013 ICML Workshop on Challenges in Representation Learning, pp. 1-4, 2013
 [21] W. Wang, R. Arora, K. Livescu, et al., “On Deep Multi-View Representation Learning,” in Proceedings of International Conference on Machine Learning, pp. 1083-1092, 2015
 [22] A. Karpathy, A. Joulin, L. Fei-Fei, “Deep Fragment Embeddings for Bidirectional Image Sentence Mapping,” Advances in Neural Information Processing Systems, pp. 1889-1897, 2015
 [23] D. Yu, L. Deng, G. E. Dahl, “Roles of Pre-Training and Fine-Tuning in Context-Dependent DBN-HMMs for Real-World Speech Recognition,” in Proceedings of NIPS Workshop on Deep Learning & Unsupervised Feature Learning, 2010
 [24] X. Zhang, B. He, T. Luo, “Training Query Filtering for Semi-Supervised Learning to Rank with Pseudo Labels,” World Wide Web-Internet & Web Information Systems, Vol. 19, No. 5, pp. 833-864, 2016
 [25] N. Rasiwasia, J. C. Pereira, E. Coviello, et al., “A New Approach to Cross-Modal Multimedia Retrieval,” in Proceedings of ACM International Conference on Multimedia, pp. 251-260, 2010
 [26] Y. Ke and R. Sukthankar, “PCA-SIFT: A More Distinctive Representation for Local Image Descriptors,” in Proceedings of IEEE Computer Society Conference on Computer Vision & Pattern Recognition, pp. 506-513, 2004
 [27] D. M. Blei, A. Y. Ng, M. I. Jordan, “Latent Dirichlet Allocation,” Journal of Machine Learning Research Archive, No. 3, pp. 993-1022, 2003
 [28] C. Rashtchian, P. Young, M. Hodosh, et al., “Collecting Image Annotations using Amazon's Mechanical Turk,” in Proceedings of the NAACL HLT 2010 Workshop on Creating Speech and Language Data with Amazon's Mechanical Turk, Association for Computational Linguistics, pp. 139-147, 2010
 [29] L. Zheng, Y. Zhao, S. Wang, et al., “Good Practice in CNN Feature Transfer,” arXiv preprint, arXiv:1604.00133, pp. 1-9, 2016
 [30] Y. Gong, Q. Ke, M. Isard, et al., “A Multi-View Embedding Space for Modeling Internet Images, Tags, and Their Semantics,” International Journal of Computer Vision, Vol. 106, No. 2, pp. 210-233, 2014
 [31] D. W. Jacobs, H. Daume, A. Kumar, A. Sharma, “Generalized Multiview Analysis: A Discriminative Latent Space,” in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, pp. 2160-2167, IEEE Computer Society, 2012
 |