Problems and Needs

Targeted Problem

Existing Methods: The PVT task assumes that the target stays within a preset frame and that the camera remains stationary.

Real-World Task: Targets are highly dynamic. The main challenge is actively controlling the camera to improve visual accuracy.

VAT Task: The VAT task simultaneously models visual tracking and control, and has a wider range of application scenarios.

Reinforcement Learning for Solving the VAT Task

Existing Methods: Directly chaining a visual tracking (VT) model and a control model for VAT (e.g., SiamMask + PID).

Existing Limitations:
1. The delay in the VT model results in lag in control input, which can even cause the controller to diverge.
2. The control model is particularly sensitive to parameters, requiring fine-tuning for each scenario.
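
The first limitation can be illustrated with a toy simulation. The sketch below is not the SiamMask + PID pipeline itself; it assumes a one-dimensional tracking error, a proportional-only controller, and a perception delay counted in control steps. The function name `simulate_p_control`, the gain, and the delay values are illustrative assumptions.

```python
from collections import deque


def simulate_p_control(gain, delay_steps, steps=60, initial_error=1.0):
    """Simulate error[t+1] = error[t] - gain * error[t - delay_steps].

    The controller only sees a measurement that is `delay_steps` old,
    which stands in for the latency of the visual tracking model.
    """
    error = initial_error
    # Buffer of the most recent measurements; index 0 is the stalest one.
    measurements = deque([initial_error] * (delay_steps + 1),
                         maxlen=delay_steps + 1)
    history = [error]
    for _ in range(steps):
        observed = measurements[0]      # stale measurement seen by the controller
        control = -gain * observed      # proportional correction
        error = error + control
        measurements.append(error)      # oldest entry is dropped automatically
        history.append(error)
    return history


# Same gain, with and without perception delay (assumed values).
no_delay = simulate_p_control(gain=0.8, delay_steps=0)
delayed = simulate_p_control(gain=0.8, delay_steps=3)
print(abs(no_delay[-1]), max(abs(e) for e in delayed[-10:]))
```

Under these assumptions, the gain that converges without delay produces a growing oscillation once the measurement is three steps old. Lowering the gain restores stability at the cost of slower response, which is one reason such controllers tend to need per-scenario tuning.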

Advantages of RL: Formulating the task as a Markov decision process (MDP) allows vision and control to be modeled jointly, so a single model solves the problem.
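
To make the MDP view concrete, here is a minimal sketch of the observation, action, and reward loop. It is a hypothetical one-dimensional stand-in, not the authors' environment: the class `ToyTrackingEnv`, the reward `1 - |offset|`, and the drift range are all assumptions, and the observation is a scalar offset where the real task would supply an image.

```python
import random


class ToyTrackingEnv:
    """Hypothetical 1-D stand-in for a VAT environment (illustrative only)."""

    def __init__(self, target_speed=0.05):
        self.target_speed = target_speed  # assumed max per-step target drift
        self.offset = 0.0                 # target offset from image center, in [-1, 1]
        self.rng = random.Random(0)

    def reset(self, seed=0):
        self.rng = random.Random(seed)
        self.offset = self.rng.uniform(-0.5, 0.5)
        return self.offset

    def step(self, action):
        # The action is a camera pan command that shifts the view toward the target.
        drift = self.rng.uniform(-self.target_speed, self.target_speed)
        self.offset = max(-1.0, min(1.0, self.offset + drift - action))
        reward = 1.0 - abs(self.offset)   # highest when the target is centered
        done = abs(self.offset) >= 1.0    # target has left the frame
        return self.offset, reward, done


def rollout(policy, steps=100, seed=0):
    """Run one episode; a single policy maps observation directly to control."""
    env = ToyTrackingEnv()
    obs = env.reset(seed)
    total = 0.0
    for _ in range(steps):
        obs, reward, done = env.step(policy(obs))
        total += reward
        if done:
            break
    return total


# A hand-written re-centering policy stands in for a learned one.
print(rollout(lambda obs: 0.5 * obs))
```

In an RL setting the hand-written policy would be replaced by a learned one, trained to maximize the summed reward. Perception and control are then optimized together rather than tuned separately.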

Simulation Environment: Trial-and-error RL training and data collection in the real world are too expensive, so simulation environments must be built.


Complex and Diverse Simulation Environments

Realism: Modeling the diversity of real-world environments while discarding variables that are unattainable in real scenarios.


Diversity: Multiple scenarios, various trackers and targets, multiple sensors.

Unattainable Variables: The position, velocity, and acceleration of the target relative to the tracker.
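
One way to enforce this in a simulator is to strip ground-truth quantities from the state before it reaches the agent. The sketch below assumes the simulator state is a dictionary; the key names and the helper `make_observation` are hypothetical and not taken from any particular simulator.

```python
# Hypothetical key names for quantities a real tracker could not measure directly.
PRIVILEGED_KEYS = {
    "relative_position",
    "relative_velocity",
    "relative_acceleration",
}


def make_observation(sim_state):
    """Return only the entries of the simulator state that a real sensor could provide."""
    return {key: value for key, value in sim_state.items()
            if key not in PRIVILEGED_KEYS}


# Example simulator state with one sensor reading and three privileged entries.
sim_state = {
    "rgb": "camera frame placeholder",
    "relative_position": (4.0, 1.5, -2.0),
    "relative_velocity": (0.3, 0.0, 0.0),
    "relative_acceleration": (0.0, 0.0, 0.0),
}
observation = make_observation(sim_state)
print(sorted(observation))
```

Keeping the filter in one place makes it easy to check that the policy's inputs match what would be available in a real deployment.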