Prerequisites
Before installing Qwen3-VL, ensure you have:
- Python 3.8 or higher
- NVIDIA GPU with CUDA support (recommended)
- PyTorch 2.0 or higher
- pip package manager
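The prerequisites above can be checked from Python before installing anything else. The following is a minimal sketch that uses only the standard library plus PyTorch when it is present; the function name `check_prerequisites` and the report keys are illustrative and not part of Qwen3-VL.

```python
import importlib.metadata
import importlib.util
import sys


def check_prerequisites() -> dict:
    """Report which of the listed prerequisites are present (hypothetical helper)."""
    report = {
        # The guide asks for Python 3.8 or higher.
        "python_ok": sys.version_info >= (3, 8),
        # pip is normally importable as a module in the active environment.
        "pip_found": importlib.util.find_spec("pip") is not None,
        "torch_version": None,
        "cuda_available": False,
    }
    try:
        report["torch_version"] = importlib.metadata.version("torch")
    except importlib.metadata.PackageNotFoundError:
        return report  # PyTorch is not installed, so there is nothing more to probe.

    try:
        import torch  # imported lazily so the check still runs without PyTorch
    except ImportError:
        return report

    # A GPU is recommended rather than required, so this is informational only.
    report["cuda_available"] = torch.cuda.is_available()
    return report


if __name__ == "__main__":
    for name, value in check_prerequisites().items():
        print(f"{name}: {value}")
```

Compare the reported `torch_version` with the 2.0 minimum by hand; a CPU-only machine will report `cuda_available: False` and can still be used, only more slowly.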
Basic Installation
Install Transformers
Qwen3-VL requires transformers version 4.57.0 or higher.
This is the minimum requirement to run Qwen3-VL with Hugging Face Transformers.
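To confirm that the installed transformers package satisfies the 4.57.0 minimum, a small version check along these lines may help. It is a sketch rather than an official tool: `release_tuple`, `meets_minimum`, and `transformers_status` are hypothetical helper names, and the comparison looks only at the leading numeric part of the version, so a pre-release such as `4.57.0.dev0` is treated as meeting the minimum.

```python
import importlib.metadata
import re

MIN_TRANSFORMERS = "4.57.0"  # minimum version stated in this guide


def release_tuple(version: str) -> tuple:
    """Return the leading numeric components, e.g. '2.1.0+cu121' -> (2, 1, 0)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return tuple(int(part) for part in match.group().split("."))


def meets_minimum(installed: str, minimum: str = MIN_TRANSFORMERS) -> bool:
    """Compare two versions after padding them to the same length with zeros."""
    have, need = release_tuple(installed), release_tuple(minimum)
    width = max(len(have), len(need))
    have += (0,) * (width - len(have))
    need += (0,) * (width - len(need))
    return have >= need


def transformers_status() -> str:
    """Describe the installed transformers version relative to the minimum."""
    try:
        installed = importlib.metadata.version("transformers")
    except importlib.metadata.PackageNotFoundError:
        return "transformers is not installed"
    if meets_minimum(installed):
        return f"transformers {installed} meets the {MIN_TRANSFORMERS} minimum"
    return f"transformers {installed} is older than {MIN_TRANSFORMERS}; upgrade it"


if __name__ == "__main__":
    print(transformers_status())
```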
Performance Optimizations
Flash Attention 2
For significantly faster inference, especially in multi-image and video scenarios, install Flash Attention 2.
Video Processing Backends
Qwen3-VL supports multiple video decoding backends:
- torchcodec (Recommended)
- decord
- torchvision
torchcodec offers the best performance and compatibility. It supports both HTTP and HTTPS URLs and requires FFmpeg.
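To see which of these backends is actually available in the current environment, a check like the one below can be used. It is a sketch under two assumptions: the helper names are illustrative, and the `FORCE_QWENVL_VIDEO_READER` environment variable is taken to be the switch that the `qwen-vl-utils` package reads, which should be confirmed against that package's documentation. An importable package is also no guarantee that decoding works; torchcodec, for example, still needs FFmpeg on the system.

```python
import importlib.util
import os

# Preference order follows the list above: torchcodec is the recommended backend.
BACKEND_PREFERENCE = ("torchcodec", "decord", "torchvision")


def installed_backends() -> list:
    """Return the backends whose Python package can be found, in preference order."""
    return [
        name
        for name in BACKEND_PREFERENCE
        if importlib.util.find_spec(name) is not None
    ]


def choose_backend():
    """Return the most preferred installed backend, or None when none is installed."""
    found = installed_backends()
    return found[0] if found else None


if __name__ == "__main__":
    backend = choose_backend()
    print(f"installed: {installed_backends()}, chosen: {backend}")
    if backend is not None:
        # Assumption: qwen-vl-utils honours this variable; verify before relying on it.
        os.environ.setdefault("FORCE_QWENVL_VIDEO_READER", backend)
```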
Deployment Installation
For production deployment, install vLLM or SGLang.
For detailed deployment instructions, see the Deployment Guide.
China Mainland Users
For users in mainland China, we recommend using ModelScope.
Docker Installation
Use our pre-built Docker images for a simplified setup.
Installation Verification
Run a complete test to verify your installation.
Troubleshooting
ModuleNotFoundError: No module named 'transformers'
Make sure you've installed transformers, version 4.57.0 or higher, in the Python environment you are running.
CUDA out of memory
Try these solutions:
- Use a smaller model (e.g., 2B or 4B instead of 235B)
- Enable quantization with FP8 models
- Use device_map="auto" for automatic device placement
- Reduce batch size or max_new_tokens
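A rough estimate of weight memory shows why model size and precision matter so much here. The sketch below multiplies the parameter count by the bytes stored per parameter; it treats the model names as nominal parameter counts (an assumption, since real checkpoints differ slightly) and ignores activations, the KV cache, and framework overhead, so actual usage will be higher.

```python
# Bytes used to store one parameter at each precision.
BYTES_PER_PARAM = {"bf16": 2, "fp8": 1}


def weight_memory_gb(params_billion: float, bytes_per_param: float) -> float:
    """Memory for the weights alone, in GB (1 GB = 10**9 bytes)."""
    # Billions of parameters times bytes per parameter gives gigabytes directly.
    return params_billion * bytes_per_param


if __name__ == "__main__":
    for size in (2, 4, 235):
        for dtype, width in BYTES_PER_PARAM.items():
            estimate = weight_memory_gb(size, width)
            print(f"{size}B in {dtype}: about {estimate:.0f} GB of weights")
```

By this estimate a 235B model needs about 470 GB for BF16 weights and about 235 GB in FP8, while a 4B model needs about 8 GB in BF16.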
Flash Attention installation fails
Flash Attention 2 requires:
- NVIDIA GPU with Ampere architecture or newer
- CUDA 11.6+
- Proper build tools
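When these requirements cannot be met, one option is to fall back to PyTorch's built-in scaled dot product attention instead of blocking on the install. The sketch below is one way to make that choice at runtime; the helper name is hypothetical, the `"sdpa"` fallback is a suggestion rather than something this guide prescribes, and the check relies on Ampere GPUs reporting compute capability 8.x.

```python
import importlib.util


def pick_attn_implementation() -> str:
    """Return "flash_attention_2" only when the requirements look satisfied."""
    if importlib.util.find_spec("flash_attn") is None:
        return "sdpa"  # the flash-attn package is not installed
    try:
        import torch
    except ImportError:
        return "sdpa"  # PyTorch itself is missing
    if not torch.cuda.is_available():
        return "sdpa"  # no usable CUDA device
    major, _minor = torch.cuda.get_device_capability(0)
    # Ampere reports compute capability 8.x; newer architectures report higher.
    return "flash_attention_2" if major >= 8 else "sdpa"


if __name__ == "__main__":
    print(pick_attn_implementation())
```

The returned string can be passed as the `attn_implementation` argument of a Transformers `from_pretrained` call.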
Video decoding errors
If you encounter video processing issues:
- Try a different backend (torchcodec, decord, or torchvision)
- Ensure you have the required dependencies:
  - torchcodec: Requires FFmpeg
  - decord: Linux only, may need to build from source
  - torchvision: Requires version >= 0.19.0 for URL support
- Use local video files instead of URLs as a workaround
Next Steps
Now that you have Qwen3-VL installed, you can:
- Quick Start: Run your first inference with an image
- Advanced Usage: Learn about pixel control, batching, and optimization
- Deployment: Deploy Qwen3-VL with vLLM or SGLang
- Cookbooks: Explore practical examples and use cases