GOT-Edit: Geometry-Aware Generic Object Tracking via Online Model Editing
GOT-Edit incorporates 3D geometric understanding to aid generic object tracking from 2D streaming inputs. It jointly predicts semantic and geometric model weights to incrementally adapt the tracking model, and through online model editing it ensures geometry-aware, semantic-preserving updates.
Comparison with state-of-the-art methods.
Abstract
Humans track objects effectively in 2D video streams by implicitly combining prior 3D knowledge with semantic reasoning. In contrast, most generic object tracking (GOT) methods rely primarily on 2D features of the target and its surroundings while neglecting 3D geometric cues, which makes them susceptible to partial occlusion, distractors, and variations in geometry and appearance. To address this limitation, we introduce GOT-Edit, an online cross-modality model editing approach that integrates geometry-aware cues into a generic object tracker from a 2D video stream. Our approach leverages features from a pre-trained Visual Geometry Grounded Transformer to infer geometric cues from only a few 2D images. To seamlessly combine geometry and semantics, GOT-Edit performs online model editing with null-space constrained updates that incorporate geometric information while preserving semantic discrimination, yielding consistently better performance across diverse scenarios. Extensive experiments on multiple GOT benchmarks demonstrate that GOT-Edit achieves superior robustness and accuracy, particularly under occlusion and clutter, establishing a new paradigm for combining 2D semantics with 3D geometric reasoning in generic object tracking.
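To give intuition for the null-space constrained updates mentioned above, the sketch below shows the general idea behind such updates: a candidate weight change is projected into the null space of previously seen features, so the edited layer's responses to those features are unchanged. This is a minimal, hypothetical illustration of the generic technique (the function name, shapes, and the use of a pseudoinverse-based projector are our assumptions, not the paper's implementation).

```python
import numpy as np

def null_space_projected_update(delta_w, old_feats):
    """Project a candidate weight update into the null space of old features.

    delta_w:   (out_dim, d) candidate update for a linear layer W.
    old_feats: (d, n) columns are features whose responses must be preserved.

    Returns delta_w' such that delta_w' @ x == 0 for any x in the
    column space of old_feats, i.e. (W + delta_w') x == W x on old inputs.
    """
    d = old_feats.shape[0]
    # Orthogonal projector onto the column space of old_feats.
    p = old_feats @ np.linalg.pinv(old_feats)
    # Multiplying by (I - p) keeps only the component of each input
    # direction that is orthogonal to the preserved feature subspace.
    return delta_w @ (np.eye(d) - p)

# Illustration: the projected update leaves old responses untouched.
rng = np.random.default_rng(0)
K = rng.normal(size=(8, 3))        # 3 preserved feature vectors in R^8
W = rng.normal(size=(5, 8))        # current layer weights
dW = rng.normal(size=(5, 8))       # raw (geometry-driven) update
dW_safe = null_space_projected_update(dW, K)
print(np.allclose((W + dW_safe) @ K, W @ K))  # responses on K preserved
```

In practice such projectors are typically built from feature covariance statistics accumulated online rather than from raw feature matrices, but the preservation guarantee is the same.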
Given only an initial annotated bounding box for the target object, GOT-Edit is capable of continually tracking the object in dynamic scenes, even when the object is unseen or under adverse conditions.
Paper
BibTeX
@inproceedings{got_edit_iclr26,
title={{GOT}-Edit: Geometry-Aware Generic Object Tracking via Online Model Editing},
author={Shih-Fang Chen and Jun-Cheng Chen and I-hong Jhuo and Yen-Yu Lin},
booktitle={The Fourteenth International Conference on Learning Representations},
year={2026}
}