Leveraging the Convolutional Neural Network (CNN) based on Deep Learning to Classify and Caption Images

Dhruv Khera


Vol 17, Jan-Jun, 2023

Date of Submission: 2023-01-02; Date of Acceptance: 2023-02-04; Date of Publication: 2023-03-09


This paper focuses on developing a deep learning-based image captioning system. Its aim is to generate descriptive textual captions for images so that machines can comprehend and communicate the content of visual data. The approach uses convolutional neural networks (CNNs) to extract image features and recurrent neural networks (RNNs) to generate the caption as a sequence of words. The paper covers dataset collection, data preprocessing, CNN feature extraction, implementation of the RNN-based captioning model, evaluation with metrics such as BLEU and METEOR, and presentation of the results. The expected deliverables are an accessible image captioning system, together with extensive documentation and a well-documented codebase. Through this paper, students learn about deep learning, computer vision, and natural language processing, and the work contributes to advances in image comprehension and in human-machine interaction with visual data.
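The abstract names BLEU as one of the evaluation metrics. As an illustration only, the sketch below computes corpus-level BLEU in plain Python using the standard definition: clipped n-gram precisions for n = 1..4 are combined by a geometric mean and multiplied by a brevity penalty that punishes captions shorter than their references. It assumes captions are already tokenized into word lists and uses uniform weights with no smoothing; the function names `ngrams` and `corpus_bleu` are hypothetical and are not taken from the paper's codebase.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(references, candidates, max_n=4):
    """Corpus-level BLEU with uniform weights and no smoothing (illustrative sketch).

    references: one entry per image, each a list of reference captions,
                where every caption is a list of tokens.
    candidates: one generated caption (list of tokens) per image.
    """
    clipped = [0] * max_n  # clipped n-gram matches, per order n
    totals = [0] * max_n   # total candidate n-grams, per order n
    cand_len = 0
    ref_len = 0

    for refs, cand in zip(references, candidates):
        cand_len += len(cand)
        # Effective reference length: the reference closest in length to the
        # candidate, with ties broken toward the shorter reference.
        ref_len += min((abs(len(r) - len(cand)), len(r)) for r in refs)[1]

        for n in range(1, max_n + 1):
            cand_counts = ngrams(cand, n)
            # Clip each candidate n-gram count by its maximum count in any one reference.
            max_ref_counts = Counter()
            for r in refs:
                max_ref_counts |= ngrams(r, n)
            clipped[n - 1] += sum(min(c, max_ref_counts[g]) for g, c in cand_counts.items())
            totals[n - 1] += sum(cand_counts.values())

    # Without smoothing, a zero precision at any order makes BLEU zero.
    if min(clipped) == 0:
        return 0.0

    # Geometric mean of the modified precisions (uniform weights 1/max_n).
    log_precision = sum(math.log(c / t) for c, t in zip(clipped, totals)) / max_n
    # Brevity penalty: 1 if candidates are longer than references, else exp(1 - r/c).
    bp = 1.0 if cand_len > ref_len else math.exp(1 - ref_len / cand_len)
    return bp * math.exp(log_precision)
```

For example, a generated caption identical to its single reference scores 1.0, while a caption that shares no 4-gram with any reference scores 0.0 under this unsmoothed variant. Because BLEU is usually reported at corpus level for captioning, a maintained implementation such as NLTK's `nltk.translate.bleu_score.corpus_bleu` would normally be preferred over a hand-written one.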

