Maritime Object Detection Method Based on Self-Supervised Representation Learning
-
Abstract: Improving the accuracy of maritime object detection is critical to strengthening the perception and monitoring capabilities of unmanned marine equipment. However, complex sea conditions and sensor limitations make it difficult to collect high-quality samples of sea-surface targets. The resulting lack of large-scale maritime datasets has slowed the development of maritime object detection based on deep learning. To address this problem, this study introduces self-supervised representation learning into the field of maritime object detection. Specifically, a momentum-contrast-based self-supervised algorithm is used to learn ship representations, so that the characteristics of ship targets are mined from large-scale unlabeled maritime data. These representations provide prior knowledge for subsequent maritime object detection based on Faster R-CNN. Experimental results show that, with the aid of a large-scale unlabeled dataset, the proposed detection method based on self-supervised representation learning achieves performance comparable to that of methods using supervised pre-training. The proposed method thus overcomes the limitation imposed by the shortage of labeled maritime samples. This work can serve as a reference for further research on deep-learning-based intelligent ocean perception.
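The momentum-contrast pre-training step named in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the momentum coefficient 0.999, and the temperature 0.07 are assumptions chosen for illustration, and the feature vectors stand in for the outputs of whatever backbone encoders are used. The sketch shows the three ingredients of momentum contrast: a key encoder updated as a moving average of the query encoder, a contrastive (InfoNCE) loss over one positive key and a queue of negative keys, and a first-in-first-out update of that queue.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit length so dot products are cosine similarities."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def momentum_update(key_params, query_params, m=0.999):
    """Move each key-encoder parameter slowly toward the query encoder:
    theta_k <- m * theta_k + (1 - m) * theta_q. The value of m is assumed."""
    return [m * k + (1.0 - m) * q for k, q in zip(key_params, query_params)]


def info_nce_loss(q, k_pos, queue, temperature=0.07):
    """Contrastive loss: each query should match its own positive key
    and differ from every key stored in the queue (the negatives).

    q, k_pos: arrays of shape (N, D); queue: array of shape (K, D).
    The temperature value is an assumption for illustration.
    """
    q, k_pos, queue = l2_normalize(q), l2_normalize(k_pos), l2_normalize(queue)
    l_pos = np.sum(q * k_pos, axis=1, keepdims=True)   # (N, 1) positive logits
    l_neg = q @ queue.T                                # (N, K) negative logits
    logits = np.concatenate([l_pos, l_neg], axis=1) / temperature
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-log_prob[:, 0].mean())               # positive is class 0


def update_queue(queue, new_keys):
    """Append the newest normalized keys and drop the oldest (FIFO),
    keeping the queue length fixed."""
    size = queue.shape[0]
    return np.concatenate([queue, l2_normalize(new_keys)], axis=0)[-size:]
```

In a full pipeline of the kind the abstract describes, the encoder weights learned this way on unlabeled ship images would then initialize the backbone of the Faster R-CNN detector before fine-tuning on the labeled samples.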