Lightweight Open-Vocabulary Object Detection: A Systematic Taxonomy, Comparative Evaluation, and Future Outlook

Authors

  • Zainal Rasyid Mahayuddin, Center for Artificial Intelligence Technology (CAIT), Universiti Kebangsaan Malaysia, 43600 Bangi Selangor, Malaysia
  • Youlin Liu, Center for Artificial Intelligence Technology (CAIT), Universiti Kebangsaan Malaysia, 43600 Bangi Selangor, Malaysia
  • Mohammad Faidzul Nasrudin, Center for Artificial Intelligence Technology (CAIT), Universiti Kebangsaan Malaysia, 43600 Bangi Selangor, Malaysia

Keywords:

Open-Vocabulary Object Detection, lightweight deployment, knowledge distillation, pseudo-label learning, lightweight architecture

Abstract

Open-Vocabulary Object Detection (OVD) aims to overcome the main limitation of traditional detectors, which can recognize only a fixed set of categories. Although vision-language models such as CLIP offer zero-shot recognition, transferring their global, image-level semantics to region-level detection remains difficult. The resulting problems, including inaccurate spatial localization, semantic bias, and high computational overhead, hinder practical deployment. Taking "lightweight and deployable" as its core perspective, this review systematically organizes 46 OVD studies and classifies existing methods into three categories: pseudo-label learning, knowledge distillation, and architecture optimization. The three categories are compared under a unified evaluation setting, and the trade-offs among annotation cost, model compression, and inference speed are analyzed. We then summarize the key challenges that the OVD field still faces, including semantic bias, weak detection of small objects, and insufficient cross-domain generalization. Finally, we discuss several emerging trends, such as dynamic prompting with Large Language Models (LLMs), adaptive distillation, and collaboration with the Segment Anything Model (SAM). We hope this review provides a clear reference framework and research directions for building scalable and resource-friendly OVD systems.

Author Biographies

Zainal Rasyid Mahayuddin, Center for Artificial Intelligence Technology (CAIT), Universiti Kebangsaan Malaysia, 43600 Bangi Selangor, Malaysia

zainalr@ukm.edu.my

Youlin Liu, Center for Artificial Intelligence Technology (CAIT), Universiti Kebangsaan Malaysia, 43600 Bangi Selangor, Malaysia

liuyoulin_1020@163.com

Mohammad Faidzul Nasrudin, Center for Artificial Intelligence Technology (CAIT), Universiti Kebangsaan Malaysia, 43600 Bangi Selangor, Malaysia

mfn@ukm.edu.my

Published

2025-12-05

Section

Articles