Problems and Needs
Targeted Problem
Existing Methods: The PVT task assumes that the target is within the preset frame and the camera remains stationary.
Real-World Task: The target is highly dynamic. The main challenge is actively controlling the camera to improve visual accuracy.
VAT Task: The VAT task simultaneously models visual tracking and control, and has a wider range of application scenarios.

Reinforcement Learning for Solving the VAT Task
Existing Methods: Directly cascading a VT model and a control model for VAT (e.g., SiamMask + PID).
Existing Limitations:
1. The delay in the VT model results in lag in control input, which can even cause the controller to diverge.
2. The control model is particularly sensitive to parameters, requiring fine-tuning for each scenario.
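The first limitation can be illustrated with a toy simulation. The sketch below is not the SiamMask + PID pipeline itself: it assumes a hypothetical 1-D camera modeled as a pure integrator, a static target, and an illustrative gain of 0.8. The only difference between the two runs is how many steps old the error measurement is when it reaches the controller.

```python
from collections import deque


class PID:
    """Textbook discrete PID controller (gains here are illustrative)."""

    def __init__(self, kp, ki=0.0, kd=0.0, dt=1.0):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0
        self.prev_error = None

    def step(self, error):
        self.integral += error * self.dt
        derivative = 0.0
        if self.prev_error is not None:
            derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


def simulate(delay_steps, kp=0.8, steps=200, target=1.0):
    """Return |tracking error| per step when the error reaches the
    controller delay_steps steps late (stand-in for VT model latency)."""
    pid = PID(kp)
    camera = 0.0
    # The controller always reads the oldest entry of this buffer.
    buffer = deque([target - camera] * (delay_steps + 1),
                   maxlen=delay_steps + 1)
    errors = []
    for _ in range(steps):
        buffer.append(target - camera)   # fresh measurement enters the queue
        delayed_error = buffer[0]        # controller sees a stale measurement
        camera += pid.step(delayed_error) * pid.dt  # camera acts as an integrator
        errors.append(abs(target - camera))
    return errors


no_delay = simulate(delay_steps=0)
delayed = simulate(delay_steps=3)
print("final error, no delay:", no_delay[-1])
print("max error over last 20 steps, 3-step delay:", max(delayed[-20:]))
```

With the same gain, the undelayed loop settles while the delayed loop oscillates with growing amplitude. Lowering the gain restores stability in this toy model, which is the kind of per-scenario retuning that the second limitation refers to.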
Advantages of RL: The use of MDP allows simultaneous modeling of vision and control, solving the problem with a single model.
Simulation Environment: Trial-and-error learning and data collection in the real world are too expensive for RL, so simulation environments must be constructed.
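As a rough illustration of how an MDP folds vision and control into one problem, the sketch below defines a toy 1-D simulated environment. Everything in it is an assumption made for illustration: the class name ToyTrackingMDP, the reward shape, the action limits, and the observation (only the target's offset from the image center, standing in for what a camera would provide). It is not the simulation environment described here.

```python
import random
from dataclasses import dataclass


@dataclass
class StepResult:
    observation: tuple
    reward: float
    done: bool


class ToyTrackingMDP:
    """Hypothetical 1-D task: pan a camera to keep a moving target centered."""

    def __init__(self, half_fov=1.0, max_steps=100, seed=0):
        self.half_fov = half_fov
        self.max_steps = max_steps
        self.rng = random.Random(seed)

    def reset(self):
        # Hidden simulator state: the agent never observes the velocity.
        self.offset = self.rng.uniform(-0.5, 0.5) * self.half_fov
        self.target_velocity = self.rng.uniform(-0.05, 0.05)
        self.t = 0
        return (self.offset,)

    def step(self, action):
        # Action: camera pan per step, clipped to an assumed actuator limit.
        action = max(-0.2, min(0.2, action))
        self.offset += self.target_velocity - action
        self.t += 1
        lost = abs(self.offset) > self.half_fov
        # Reward is highest when the target sits at the image center.
        reward = -1.0 if lost else 1.0 - abs(self.offset) / self.half_fov
        done = lost or self.t >= self.max_steps
        return StepResult((self.offset,), reward, done)


# Roll out a hand-written proportional policy; an RL agent would instead
# learn the observation-to-action mapping by trial and error in simulation.
env = ToyTrackingMDP(seed=0)
obs = env.reset()
total_reward, done = 0.0, False
while not done:
    result = env.step(obs[0])
    obs, done = result.observation, result.done
    total_reward += result.reward
print("episode return:", total_reward)
```

The policy consumes the observation and emits the camera command directly, so there is no separate tracker output for a downstream controller to wait on.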
Complex and Diverse Simulation Environments
Realism: Modeling the diversity of real-world environments while discarding variables that are unattainable in real scenarios.
Diversity: Multiple scenarios, various trackers and targets, multiple sensors.
Unattainable Variables: The position, velocity, and acceleration of the target relative to the tracker.
