Abstract: Siamese tracking methods have recently drawn extensive attention due to their balance of accuracy and efficiency. However, most Siamese-based trackers use shallow backbone networks, which struggle to extract high-level semantic features. When distractors closely resemble the target in appearance, these trackers may drift or even fail. To address this deficiency, we propose a Siamese network with enriched semantics, named ESDT. First, a semantic enrichment module (SEM) comprising dilated convolution layers is designed to improve the classification capability of the Siamese tracker. In addition, the target template is updated adaptively to cope with changes in target texture caused by illumination variation and blur, further improving tracking performance. Finally, extensive experiments on public datasets show that the proposed algorithm outperforms several state-of-the-art algorithms and tracks the target stably despite disturbances.
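As a rough illustration of why dilated convolution layers can enrich semantics without a deeper backbone, the sketch below shows how dilation enlarges the receptive field of a small kernel. It is a minimal one-dimensional example in plain Python, not the SEM itself: the function names `dilated_conv1d` and `receptive_field`, the kernel size, and the dilation rates are all assumptions chosen for illustration, since the abstract does not specify the module's configuration.

```python
def dilated_conv1d(signal, kernel, dilation=1):
    """Valid (unpadded) 1-D dilated convolution in cross-correlation form.

    Hypothetical helper for illustration only; not the paper's SEM.
    """
    # Number of input samples covered by one application of the kernel.
    span = (len(kernel) - 1) * dilation + 1
    out = []
    for start in range(len(signal) - span + 1):
        # Kernel taps are spaced `dilation` samples apart.
        out.append(sum(w * signal[start + k * dilation]
                       for k, w in enumerate(kernel)))
    return out


def receptive_field(kernel_size, dilations):
    """Receptive field of stacked stride-1 dilated convolution layers."""
    rf = 1
    for d in dilations:
        rf += (kernel_size - 1) * d
    return rf


if __name__ == "__main__":
    # Three plain 3-tap layers versus three layers with assumed rates 1, 2, 4.
    print(receptive_field(3, [1, 1, 1]))
    print(receptive_field(3, [1, 2, 4]))
    # With dilation 2, a 3-tap kernel reads samples 0, 2 and 4.
    print(dilated_conv1d([1, 2, 3, 4, 5], [1, 0, -1], dilation=2))
```

With the same number of layers and weights, the dilated stack covers a wider context than the plain stack, which is the general property that makes dilated convolutions attractive for separating a target from similar-looking distractors.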