Int J Performability Eng ›› 2018, Vol. 14 ›› Issue (2): 245-258.doi: 10.23940/ijpe.18.02.p6.245258
• Original articles • Previous Articles Next Articles
Lianyin Jiaa, Lulu Zhanga, Guoxian Yub, Jinguo Youa, Jiaman Dinga, and Mengjuan Lic
Lianyin Jia, Lulu Zhang, Guoxian Yu, Jinguo You, Jiaman Ding, and Mengjuan Li. A Survey on Set Similarity Search and Join [J]. Int J Performability Eng, 2018, 14(2): 245-258.
Add to citation manager EndNote|Reference Manager|ProCite|BibTeX|RefWorks
A. Arasu, V. Ganti, and R. Kaushik, "Efficient Exact Set-Similarity Joins," presented at the Proceedings of the 32nd international conference on Very large data bases, Seoul, Korea, 2006. | |
H. Bast and I. Weber, "Type Less, Find More: Fast Autocompletion Search with a Succinct Index," in International ACM SIGIR Conference on Research and Development in Information Retrieval, 2006, pp. 364-371. | |
R. J. Bayardo, Y. Ma, and R. Srikant, "Scaling up All Pairs Similarity Search," presented at the Proceedings of the 16th international conference on World Wide Web, Banff, Alberta, Canada, 2007. | |
P. Bouros, S. Ge, and N. Mamoulis, "Spatio-Textual Similarity Joins," Proc. VLDB Endow., vol. 6,no. 1, pp. 1-12, 2012. | |
A. Z. Broder, M. Charikar, A. M. Frieze, and M. Mitzenmacher, "Min-Wise Independent Permutations," J. Comput. Syst. Sci., vol. 60,no. 3, pp. 630-659, 2000. | |
S. Burrows, S. M. M. Tahaghoghi, and J. Zobel, "Efficient Plagiarism Detection for Large Code Repositories," Softw. Pract. Exper., vol. 37,no. 2, pp. 151-175, 2007. | |
S. Chaudhuri, V. Ganti, and R. Kaushik, "A Primitive Operator for Similarity Joins in Data Cleaning," presented at the Proceedings of the 22nd International Conference on Data Engineering, 2006. | |
A. S. Das, M. Datar, A. Garg, and S. Rajaram, "Google News Personalization: Scalable Online Collaborative Filtering," presented at the Proceedings of the 16th international conference on World Wide Web, Banff, Alberta, Canada, 2007. | |
D. Deng, G. Li, and J. Feng, "A Pivotal Prefix Based Filtering Algorithm for String Similarity Search," presented at the Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data, Snowbird, Utah, USA, 2014. | |
D. Deng, G. Li, H. Wen, and J. Feng, "An Efficient Partition Based Method for Exact Set Similarity Joins," Proc. VLDB Endow., vol. 9,no. 4, pp. 360-371, 2015. | |
J. Feng, J. Wang, and G. Li, "Trie-Join: A Trie-Based Method for Efficient String Similarity Joins," The VLDB Journal, vol. 21,no. 4, pp. 437-461, 2012. | |
A. Gionis, P. Indyk, and R. Motwani, "Similarity Search in High Dimensions Via Hashing," presented at the Proceedings of the 25th International Conference on Very Large Data Bases, 1999. | |
L. Gravano, P. G. Ipeirotis, H. V. Jagadish, N. Koudas, S. Muthukrishnan, and D. Srivastava, "Approximate String Joins in a Database (Almost) for Free," presented at the Proceedings of the 27th International Conference on Very Large Data Bases, 2001. | |
S. Helmer, R. Aly, T. Neumann, and G. Moerkotte, "Indexing Set-Valued Attributes with a Multi-Level Extendible Hashing Scheme," presented at the Proceedings of the 18th international conference on Database and Expert Systems Applications, Regensburg, Germany, 2007. | |
S. Helmer and G. Moerkotte, "A Performance Study of Four Index Structures for Set-Valued Attributes of Low Cardinality," The VLDB Journal The International Journal on Very Large Data Bases, vol. 12,no. 3, pp. 244-261, 2003. | |
M. Henzinger, "Finding near-Duplicate Web Pages: A Large-Scale Evaluation of Algorithms," in International ACM SIGIR Conference on Research and Development in Information Retrieval, 2006, pp. 284-291. | |
A. Ibrahim and G. H. L. Fletcher, "Efficient Processing of Containment Queries on Nested Sets," presented at the Proceedings of the 16th International Conference on Extending Database Technology, Genoa, Italy, 2013. | |
P. Indyk and R. Motwani, "Approximate Nearest Neighbors: Towards Removing the Curse of Dimensionality," presented at the Proceedings of the thirtieth annual ACM symposium on Theory of computing, Dallas, Texas, USA, 1998. | |
Y. Ishikawa, H. Kitagawa, and N. Ohbo, "Evaluation of Signature Files as Set Access Facilities in Oodbs," SIGMOD Rec., vol. 22,no. 2, pp. 247-256, 1993. | |
L. Jia, J. Xi, M. Li, Y. Liu, and D. Miao, "Eti: An Efficient Index for Set Similarity Queries," Frontiers of Computer Science, vol. 6,no. 6, pp. 700-712, 2012. | |
Y. Jiang, D. Deng, J. Wang, G. Li, and J. Feng, "Efficient Parallel Partition-Based Algorithms for Similarity Search and Join with Edit Distance Constraints," presented at the Proceedings of the Joint EDBT/ICDT 2013 Workshops, Genoa, Italy, 2013. | |
O. Kaser and D. Lemire, "Compressed Bitmap Indexes: Beyond Unions and Intersections," Softw. Pract. Exper., vol. 46,no. 2, pp. 167-198, 2016. | |
C. Kim and K. Shim, "Supporting Set-Valued Joins in Nosql Using Mapreduce," Information Systems, vol. 49,no. pp. 52-64, 2015. | |
L. Kolb, A. Thor, and E. Rahm, "Don't Match Twice: Redundancy-Free Similarity Computation with Mapreduce," presented at the Proceedings of the Second Workshop on Data Analytics in the Cloud, New York, New York, 2013. | |
J. P. Kumar and P. Govindarajulu, "Duplicate and near Duplicate Documents Detection: A Review," European Journal of Scientific Research, vol. 32,no. pp. 1450-216, 2009. | |
C. Li, J. Lu, and Y. Lu, "Efficient Merging and Filtering Algorithms for Approximate String Searches," presented at the Proceedings of the 2008 IEEE 24th International Conference on Data Engineering, 2008. | |
G. Li, D. Deng, and J. Feng, "A Partition-Based Method for String Similarity Joins with Edit-Distance Constraints," ACM Transactions on Database Systems, vol. 38,no. 2, pp. 1-33, 2013. | |
G. Li, D. Deng, J. Wang, and J. Feng, "Pass-Join: A Partition-Based Method for Similarity Joins," Proc. VLDB Endow., vol. 5,no. 3, pp. 253-264, 2011. | |
G. Li, J. He, D. Deng, and J. Li, "Efficient Similarity Join and Search on Multi-Attribute Data," presented at the Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data, Melbourne, Victoria, Australia, 2015. | |
W. Mann, N. Augsten, and P. Bouros, "An Empirical Evaluation of Set Similarity Join Techniques," Proc. VLDB Endow., vol. 9,no. 9, pp. 636-647, 2016. | |
A. Metwally and C. Faloutsos, "V-Smart-Join: A Scalable Mapreduce Framework for All-Pair Similarity Joins of Multisets and Vectors," Proc. VLDB Endow., vol. 5,no. 8, pp. 704-715, 2012. | |
L. A. Ribeiro, T. H, #228, and rder, "Generalizing Prefix Filtering to Improve Set Similarity Joins," Inf. Syst., vol. 36,no. 1, pp. 62-78, 2011. | |
C. Rong, C. Lin, Y. N. Silva, J. Wang, W. Lu, and X. Du, "Fast and Scalable Distributed Set Similarity Joins for Big Data Analytics," in IEEE International Conference on Data Engineering, 2017, pp. 1059-1070. | |
C. Rong, W. Lu, X. Wang, X. Du, Y. Chen, and A. K. H. Tung, "Efficient and Scalable Processing of String Similarity Join," IEEE Trans. on Knowl. and Data Eng., vol. 25,no. 10, pp. 2217-2230, 2013. | |
C. Rong, T. Xu, and X. Du, "Partition-Based Set Similarity Join," Jorunal of Computer Research and Development, vol. 49,no. 10, pp. 2066-2076, 2012. | |
M. Sahami and T. D. Heilman, "A Web-Based Kernel Function for Measuring the Similarity of Short Text Snippets," presented at the Proceedings of the 15th international conference on World Wide Web, Edinburgh, Scotland, 2006. | |
S. Sarawagi and A. Kirpal, "Efficient Set Joins on Similarity Predicates," presented at the Proceedings of the 2004 ACM SIGMOD international conference on Management of data, Paris, France, 2004. | |
A. D. Sarma, Y. He, and S. Chaudhuri, "Clusterjoin: A Similarity Joins Framework Using Map-Reduce," Proc. VLDB Endow., vol. 7,no. 12, pp. 1059-1070, 2014. | |
V. Satuluri and S. Parthasarathy, "Bayesian Locality Sensitive Hashing for Fast Similarity Search," Proc. VLDB Endow., vol. 5,no. 5, pp. 430-441, 2012. | |
Y. N. Silva, W. G. Aref, and M. H. Ali, "The Similarity Join Database Operator," in IEEE International Conference on Data Engineering, 2010, pp. 892-903. | |
E. Spertus, M. Sahami, and O. Buyukkokten, "Evaluating Similarity Measures: A Large-Scale Study in the Orkut Social Network," presented at the Proceedings of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining, Chicago, Illinois, USA, 2005. | |
M. Theobald, J. Siddharth, and A. Paepcke, "Spotsigs: Robust and Efficient near Duplicate Detection in Large Web Collections," presented at the Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval, Singapore, Singapore, 2008. | |
R. Vernica, M. J. Carey, and C. Li, "Efficient Parallel Set-Similarity Joins Using Mapreduce," presented at the Proceedings of the 2010 ACM SIGMOD International Conference on Management of data, Indianapolis, Indiana, USA, 2010. | |
S. Wandelt, D. Deng, S. Gerdjikov, S. Mishra, P. Mitankin, M. Patil, et al., "State-of-the-Art in String Similarity Search and Join," SIGMOD Rec., vol. 43,no. 1, pp. 64-76, 2014. | |
S. Wandelt, J. Starlinger, M. Bux, and U. Leser, "Rcsi: Scalable Similarity Search in Thousand(S) of Genomes," Proc. VLDB Endow., vol. 6,no. 13, pp. 1534-1545, 2013. | |
J. Wang, J. Feng, and G. Li, "Trie-Join: Efficient Trie-Based String Similarity Joins with Edit-Distance Constraints," Proc. VLDB Endow., vol. 3,no. 1-2, pp. 1219-1230, 2010. | |
J. Wang, T. Kraska, M. J. Franklin, and J. Feng, "Crowder: Crowdsourcing Entity Resolution," Proc. VLDB Endow., vol. 5,no. 11, pp. 1483-1494, 2012. | |
J. Wang, G. Li, and J. Feng, "Can We Beat the Prefix Filtering?: An Adaptive Framework for Similarity Join and Search," presented at the Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data, Scottsdale, Arizona, USA, 2012. | |
C. Xiao, J. Qin, W. Wang, Y. Ishikawa, K. Tsuda, and K. Sadakane, "Efficient Error-Tolerant Query Autocompletion," Proc. VLDB Endow., vol. 6,no. 6, pp. 373-384, 2013. | |
M. Yu, G. Li, D. Deng, and J. Feng, "String Similarity Search and Join: A Survey," Front. Comput. Sci., vol. 10,no. 3, pp. 399-417, 2016. | |
J. Zhai, Y. Lou, and J. Gehrke, "Atlas: A Probabilistic Algorithm for High Dimensional Similarity Search," presented at the Proceedings of the 2011 ACM SIGMOD International Conference on Management of data, Athens, Greece, 2011. | |
J. Zhou, X. Nie, L. Qin, and J. Zhu, "Web Clustering Based on Tag Set Similarity," Journal of Computers, 2011. |
No related articles found! |
|