International Advanced Research Journal in Science, Engineering and Technology
A Monthly Peer-Reviewed Multidisciplinary Journal
ISSN Online 2393-8021 | ISSN Print 2394-1588 | Since 2014
Volume 13, Issue 4, April 2026

AI-BASED LARGE-SCALE IMAGE RETRIEVAL SYSTEM USING CLIP EMBEDDINGS AND COSINE SIMILARITY

Nandha M, Dr. C. Karpagavalli, Dr. M. Kaliappan, Dr. E. Mariappan


Abstract: The exponential growth of digital image repositories across enterprise systems and the internet demands intelligent, scalable retrieval mechanisms that operate with high accuracy and efficiency. This paper presents an AI-based large-scale image retrieval system that uses the Contrastive Language-Image Pretraining (CLIP) model, specifically its Vision Transformer ViT-B/32 backbone, to extract 512-dimensional visual embeddings from images. The system indexes images offline through batch processing, stores L2-normalized feature vectors, and computes cosine similarity in real time at query time to retrieve the top-K most visually similar images. Additionally, a Support Vector Machine (SVM) classifier trained on CLIP embeddings achieves 98.76% accuracy with a macro-average F1-score of 0.9804 across 27 image categories. The system is deployed as a responsive web application using the Flask framework, enabling end users to perform real-time image-based searches through a browser interface. Comparative evaluation shows that the proposed approach substantially outperforms the baseline methods, including Dummy classifiers and Logistic Regression. The results indicate that deep visual embeddings derived from large-scale multimodal pretraining are highly effective for content-based image retrieval at scale.

Keywords: CLIP Embeddings, Content-Based Image Retrieval, Cosine Similarity, Vision Transformer, SVM Classification, Flask Deployment, Deep Visual Features, ViT-B/32, L2 Normalization.
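The retrieval step summarized in the abstract (L2-normalize the stored vectors, then rank by cosine similarity and keep the top-K) can be illustrated with a minimal sketch. This is not the authors' implementation: random 512-dimensional vectors stand in for CLIP ViT-B/32 embeddings, and the function names `l2_normalize` and `top_k_similar` are hypothetical. Because the indexed vectors have unit length, cosine similarity reduces to a dot product.

```python
import numpy as np

def l2_normalize(vectors, eps=1e-12):
    """Scale each row to unit L2 norm (eps guards against zero vectors)."""
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    return vectors / np.maximum(norms, eps)

def top_k_similar(query, index, k=5):
    """Return (indices, scores) of the k rows of `index` most similar to `query`.

    `index` is assumed to hold L2-normalized rows, so the dot product
    with a normalized query equals cosine similarity.
    """
    query = l2_normalize(query.reshape(1, -1))[0]
    scores = index @ query
    k = min(k, scores.shape[0])
    # argpartition finds the k best in linear time; sort only those k.
    top = np.argpartition(-scores, k - 1)[:k]
    top = top[np.argsort(-scores[top])]
    return top, scores[top]

# Assumption: random vectors replace real CLIP embeddings for illustration.
rng = np.random.default_rng(0)
embeddings = rng.standard_normal((1000, 512)).astype(np.float32)
index = l2_normalize(embeddings)  # offline indexing step

# Query with a slightly perturbed copy of image 42.
query = embeddings[42] + 0.01 * rng.standard_normal(512).astype(np.float32)
ids, sims = top_k_similar(query, index, k=5)
print(ids, sims)
```

In this sketch the perturbed copy of vector 42 should rank itself first with a similarity close to 1. In a real deployment the `embeddings` array would come from the image encoder, and the brute-force matrix product could be swapped for an approximate nearest-neighbor index as the collection grows.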

How to Cite:

[1] Nandha M, Dr. C. Karpagavalli, Dr. M. Kaliappan, Dr. E. Mariappan, “AI-BASED LARGE-SCALE IMAGE RETRIEVAL SYSTEM USING CLIP EMBEDDINGS AND COSINE SIMILARITY,” International Advanced Research Journal in Science, Engineering and Technology (IARJSET), DOI: 10.17148/IARJSET.2026.13462

This work is licensed under a Creative Commons Attribution 4.0 International License.