Video super-resolution (VSR) aims to restore high-resolution videos from low-resolution inputs, enhancing visual quality under real-world degradation. Existing diffusion-based VSR methods often rely on specially designed network architectures or on text prompts as conditional inputs. This reliance limits their flexibility and applicability, especially in scenarios where explicit text descriptions are unavailable.