Abstract
Data augmentation is vital for object detection, where bounding box annotations are expensive to obtain. Recent successes of diffusion models have inspired the use of diffusion-generated synthetic images for data augmentation. However, existing work has focused primarily on image classification, and whether such augmentation can boost object detection performance remains unclear. To address this gap, we propose a data augmentation pipeline based on controllable diffusion models and CLIP. Our approach generates appropriate visual priors to control the synthesis of training data, then post-filters the synthetic images using category-calibrated CLIP scores. We evaluate the approach under few-shot settings on MSCOCO, on the full PASCAL VOC dataset, and on selected downstream datasets, and observe performance gains from our augmentation pipeline. Specifically, mAP improves by +18.0%/+15.6%/+15.9% for COCO 5/10/30-shot, by +2.9% on the full PASCAL VOC dataset, and by +12.4% on average across the selected downstream datasets.
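The abstract does not say how the CLIP scores are "category-calibrated". The sketch below illustrates one plausible reading, offered as an assumption and not as the authors' method: each synthetic image's CLIP score is compared only against other images of the same category (a per-category z-score), so categories that CLIP scores systematically low are not filtered out wholesale. The function name `calibrated_filter`, the tuple layout, and the `threshold` parameter are hypothetical, and the raw CLIP scores are assumed to be precomputed elsewhere.

```python
from collections import defaultdict
from statistics import mean, pstdev


def calibrated_filter(samples, threshold=0.0):
    """Hypothetical post-filter: keep synthetic images whose CLIP score is high
    relative to other images of the same category.

    samples: list of (image_id, category, clip_score) tuples; clip_score is
             assumed to be an image-text similarity computed beforehand.
    threshold: assumed cutoff on the per-category z-score.
    """
    # Group raw scores by category.
    by_cat = defaultdict(list)
    for _, cat, score in samples:
        by_cat[cat].append(score)

    # Per-category mean and population standard deviation.
    stats = {cat: (mean(scores), pstdev(scores)) for cat, scores in by_cat.items()}

    kept = []
    for image_id, cat, score in samples:
        mu, sigma = stats[cat]
        # A category with a single image (or identical scores) has sigma == 0;
        # treat such images as average (z = 0).
        z = (score - mu) / sigma if sigma > 0 else 0.0
        if z >= threshold:
            kept.append(image_id)
    return kept


# Illustrative scores only, not data from the paper.
samples = [
    ("a", "dog", 0.30),
    ("b", "dog", 0.20),
    ("c", "toaster", 0.22),
    ("d", "toaster", 0.18),
]
print(calibrated_filter(samples))
```

Here the best image of each category ("a" and "c") survives, whereas a single global cutoff of 0.25 on the raw scores would discard every toaster image. That contrast is the motivation for calibrating per category rather than thresholding globally.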
Citation
@InProceedings{Fang_2024_WACV,
author = {Fang, Haoyang and Han, Boran and Zhang, Shuai and Zhou, Su and Hu, Cuixiong and Ye, Wen-Ming},
title = {Data Augmentation for Object Detection via Controllable Diffusion Models},
booktitle = {Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)},
month = {January},
year = {2024},
pages = {1257--1266}
}