Abstract:
Underwater visual object tracking is a core technology for scene understanding in autonomous undersea vehicle (AUV) systems. However, uneven illumination, background interference, and target appearance variation in complex underwater environments severely degrade the accuracy and stability of traditional visual tracking methods. Existing approaches rely primarily on appearance modeling of the target, which makes them unreliable in complex environments, particularly when similar targets are present, and leads to misidentification and tracking drift. This paper proposes an underwater single-object tracking method based on scene perception that uses a regional segmentation-based graph convolution module to extract all target regions in the scene. By leveraging a graph convolutional network, the proposed method captures long-range dependencies between the target region and surrounding key regions, which significantly enhances its ability to discriminate the target from similar objects. In addition, a dual-view graph contrastive learning strategy is introduced that enables unsupervised online updates of the graph convolution module by generating randomly perturbed views of the target features, keeping the model adaptable and stable in complex environments. Experiments show that the proposed method significantly outperforms classical methods in tracking accuracy and robustness, with success rate and accuracy improving significantly, especially in scenes with large lighting changes, complex backgrounds, and strong interference from similar targets. These results indicate that the proposed method effectively addresses the target drift caused by illumination variations and background interference in underwater object tracking and maintains stable tracking even in the presence of similar targets, providing an efficient and reliable tracking solution for underwater unmanned systems.
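
To make the dual-view idea concrete, the following is a minimal, illustrative sketch and not the implementation used in this paper. It assumes region features are rows of a small matrix, applies a single graph convolution with symmetric normalization, builds two views by adding Gaussian noise, and scores their agreement with an InfoNCE-style loss. The noise model, the loss form, the temperature, and all function names (`normalize_adjacency`, `gcn_layer`, `perturb`, `info_nce`) are assumptions chosen for illustration; the abstract does not specify them. Only the loss is computed here; no gradient update is performed.

```python
import math
import random


def normalize_adjacency(adj):
    """Return D^{-1/2} (A + I) D^{-1/2} for a square adjacency matrix."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]


def matmul(a, b):
    """Plain matrix product of two list-of-lists matrices."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def gcn_layer(adj_norm, feats, weight):
    """One graph convolution: ReLU(A_norm @ X @ W)."""
    out = matmul(matmul(adj_norm, feats), weight)
    return [[max(0.0, v) for v in row] for row in out]


def perturb(feats, rng, noise=0.1):
    """Assumed perturbation: add zero-mean Gaussian noise to every feature."""
    return [[v + rng.gauss(0.0, noise) for v in row] for row in feats]


def cosine(u, v):
    """Cosine similarity; defined as 0 when either vector is all zeros."""
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(x * x for x in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(u, v)) / (nu * nv)


def info_nce(view_a, view_b, temperature=0.5):
    """Mean InfoNCE loss: region i in view A should match region i in view B."""
    n = len(view_a)
    total = 0.0
    for i in range(n):
        logits = [cosine(view_a[i], view_b[j]) / temperature for j in range(n)]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_sum = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_sum - logits[i]
    return total / n


# Toy scene graph: region 0 (the target) is linked to regions 1 and 2.
rng = random.Random(0)
adj = [[0, 1, 1], [1, 0, 0], [1, 0, 0]]
feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
weight = [[0.5, -0.2], [0.3, 0.8]]  # hypothetical layer weights

adj_norm = normalize_adjacency(adj)
z1 = gcn_layer(adj_norm, perturb(feats, rng), weight)
z2 = gcn_layer(adj_norm, perturb(feats, rng), weight)
loss = info_nce(z1, z2)
print("contrastive loss between the two views:", loss)
```

In an online setting, a loss of this kind could be minimized with respect to the layer weights on each new frame without labels, which is the sense in which the update is unsupervised.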