Int J Performability Eng, 2025, Vol. 21, Issue (5): 259-268. doi: 10.23940/ijpe.25.05.p3.259268
Latika Pinjarkar*, Devanshu Sawarkar, Pratham Agrawal, Devansh Motghare, and Nidhi Bansal
Contact:
* E-mail address: latika.pinjarkar@sitnagpur.siu.edu.in
Latika Pinjarkar, Devanshu Sawarkar, Pratham Agrawal, Devansh Motghare, and Nidhi Bansal. Multi Object Image Captioning via CNNs and Transformer Model [J]. Int J Performability Eng, 2025, 21(5): 259-268.