Ensemble Learning for Appraising English Text Readability using Gompertz Function

doi:10.23940/ijpe.23.06.p4.388396

Abstract

Text readability is crucial for meeting readers' information needs, and the enormous growth of contemporary content has made assessing it increasingly necessary. This study proposes an ensemble learning approach based on the Gompertz function to assess the readability of English texts at the word, sentence, and text-structure levels. Conventional approaches to measuring the readability of English texts rely heavily on human experts to identify features, which limits their applicability. Given the diversity and volume of texts and the number of readability features that must be extracted, manually identifying deep features is increasingly difficult, and redundant or irrelevant features are easily introduced, which degrades the model's effectiveness. The authors experimented with 25,000 English sentences, which were scored with the Flesch-Kincaid method and annotated into seven distinct readability categories. The proposed ensemble model uses five machine learning models as its base classifiers and produces strong, consistent results: on the test set it achieved an accuracy of 90.58%, a precision of 0.9545, a recall of 0.9467, and an F-score of 0.9506. The model could be applied in educational settings, for example in language learning and in evaluating an individual's reading and writing skills.
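The abstract names three technical steps without spelling them out: labelling sentences with a Flesch-style formula into seven categories, fusing five base classifiers with the Gompertz function, and scoring with precision, recall, and F-score. The sketch below illustrates each step under stated assumptions; it is not the authors' implementation. It assumes the Flesch Reading Ease score (206.835 - 1.015 × words/sentences - 84.6 × syllables/words), whose conventional table has exactly seven bands; the related Flesch-Kincaid grade level (0.39 × words/sentences + 11.8 × syllables/words - 15.59) could be binned instead. Syllables are approximated with a vowel-group heuristic. For the fusion step it assumes a common fuzzy-rank scheme built on a re-parametrized Gompertz function, rank = 1 - exp(-exp(-2p)), with a top-k filter and a penalty for classes outside each model's top-k; `top_k=2`, the penalty values, and all function names (`count_syllables`, `flesch_reading_ease`, `readability_band`, `gompertz_rank`, `gompertz_fuse`, `f_score`) are hypothetical.

```python
import math
import re
from typing import List, Sequence, Tuple

# --- Step 1: rule-based labelling (ASSUMED: Flesch Reading Ease + Flesch's seven bands) ---

def count_syllables(word: str) -> int:
    """Approximate syllables by counting vowel groups (a heuristic, not a dictionary lookup)."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    # Drop a trailing silent "e" (as in "make") but keep "-le" endings (as in "table").
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def flesch_reading_ease(text: str) -> float:
    """206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one word and one sentence")
    syllables = sum(count_syllables(w) for w in words)
    return 206.835 - 1.015 * len(words) / len(sentences) - 84.6 * syllables / len(words)


# Lower bounds of Flesch's conventional bands; anything below 30 is "very confusing".
BANDS: List[Tuple[float, str]] = [
    (90.0, "very easy"),
    (80.0, "easy"),
    (70.0, "fairly easy"),
    (60.0, "standard"),
    (50.0, "fairly difficult"),
    (30.0, "difficult"),
]


def readability_band(score: float) -> str:
    """Map a reading-ease score to one of seven category labels."""
    for lower, label in BANDS:
        if score >= lower:
            return label
    return "very confusing"


# --- Step 2: Gompertz-based fuzzy rank fusion (ASSUMED form, hypothetical parameters) ---

def gompertz_rank(p: float) -> float:
    """Fuzzy rank 1 - exp(-exp(-2p)); a confident prediction (p near 1) gets a LOW rank."""
    return 1.0 - math.exp(-math.exp(-2.0 * p))


def gompertz_fuse(probs: Sequence[Sequence[float]], top_k: int = 2) -> int:
    """probs[m][c] is base model m's probability for class c; returns the fused class index."""
    n_classes = len(probs[0])
    penalty_rank = gompertz_rank(0.0)  # 1 - 1/e, used when a class is outside a model's top-k
    # Each model only contributes real ranks for its top_k most probable classes.
    tops = [set(sorted(range(n_classes), key=p.__getitem__, reverse=True)[:top_k]) for p in probs]
    scores = []
    for c in range(n_classes):
        rank_sum = cf_sum = 0.0
        for p, top in zip(probs, tops):
            if c in top:
                rank_sum += gompertz_rank(p[c])
                cf_sum += 1.0 - p[c]  # complement of the model's confidence
            else:
                rank_sum += penalty_rank
                cf_sum += 1.0
        scores.append(rank_sum * cf_sum)
    # The class with the lowest fused score wins.
    return min(range(n_classes), key=scores.__getitem__)


# --- Step 3: F-score as the harmonic mean of precision and recall ---

def f_score(precision: float, recall: float) -> float:
    return 2.0 * precision * recall / (precision + recall)
```

As a usage example, with three base models where one is highly confident in class 1 and two only mildly prefer class 0, `gompertz_fuse([[0.1, 0.9, 0.0], [0.5, 0.45, 0.05], [0.5, 0.45, 0.05]])` returns 1, whereas a majority vote over each model's top label would pick 0. This sensitivity to confidence is one common motivation for rank-based fusion over hard voting. Applied to the reported figures, `f_score(0.9545, 0.9467)` rounds to 0.9506, consistent with the stated F-score.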

Key words: classification, ensemble learning, Gompertz function, machine learning, readability

Rakesh Kumar, Sunny Arora, Ashima Arya, Neha Kohli, Vaishali Arya, and Ekta Singh. Ensemble Learning for Appraising English Text Readability using Gompertz Function [J]. Int J Performability Eng, 2023, 19(6): 388-396.

