Installation - Qwen3-VL

Prerequisites

Before installing Qwen3-VL, ensure you have:

- Python 3.8 or higher
- CUDA: NVIDIA GPU with CUDA support (recommended)
- PyTorch 2.0 or higher
- pip package manager
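
As a quick way to check these prerequisites from Python, the sketch below reports what it finds in the current environment. The function name `check_prerequisites` and the report keys are illustrative and not part of Qwen3-VL or Transformers. PyTorch is imported lazily so the script still runs when it is missing.

```python
import importlib.util
import shutil
import sys

MIN_PYTHON = (3, 8)  # minimum Python version stated in the prerequisites


def check_prerequisites() -> dict:
    """Return a simple report covering Python, pip, PyTorch, and CUDA."""
    report = {
        "python_ok": sys.version_info[:2] >= MIN_PYTHON,
        # pip may be exposed as a module, as an executable, or both
        "pip_found": importlib.util.find_spec("pip") is not None
        or shutil.which("pip") is not None,
        "torch_found": importlib.util.find_spec("torch") is not None,
        "cuda_available": False,
    }
    if report["torch_found"]:
        try:
            import torch  # lazy import: only attempted when PyTorch is present

            report["cuda_available"] = bool(torch.cuda.is_available())
        except Exception:
            # A broken PyTorch install is reported as "no CUDA" rather than crashing
            report["cuda_available"] = False
    return report


if __name__ == "__main__":
    for name, value in check_prerequisites().items():
        print(f"{name}: {value}")
```

A `False` for `cuda_available` is not fatal, since the GPU is listed as recommended rather than required.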

Basic Installation

Install Transformers

Qwen3-VL requires transformers version 4.57.0 or higher:

pip install "transformers>=4.57.0"

This is the minimum requirement to run Qwen3-VL with Hugging Face Transformers.
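
To confirm that an installed package meets a minimum version, compare the version components as numbers rather than as strings. The following is a minimal standard-library sketch; the helper names `version_tuple`, `meets_minimum`, and `installed_version` are hypothetical, and pre-release suffixes such as `.dev0` are simply ignored, which is a simplification.

```python
import re
from importlib import metadata


def version_tuple(version: str) -> tuple:
    """Turn '4.57.0' (or '4.57.0rc1') into (4, 57, 0) for numeric comparison."""
    parts = []
    for piece in version.split(".")[:3]:
        match = re.match(r"\d+", piece)
        if match is None:
            break  # stop at the first non-numeric component, e.g. 'dev0'
        parts.append(int(match.group()))
    # Pad short versions such as '4.57' to three components
    return tuple(parts + [0] * (3 - len(parts)))


def meets_minimum(installed: str, minimum: str) -> bool:
    """True when `installed` is at least `minimum`, compared numerically."""
    return version_tuple(installed) >= version_tuple(minimum)


def installed_version(package: str):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version("transformers")
    if found is None:
        print("transformers is not installed")
    else:
        print(f"transformers {found}, meets 4.57.0: {meets_minimum(found, '4.57.0')}")
```

Numeric comparison matters because plain string comparison ranks "4.6.0" above "4.57.0".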

Install qwen-vl-utils (Optional but Recommended)

For advanced vision processing capabilities:

pip install qwen-vl-utils==0.0.14

For faster video loading, install with the decord feature:

pip install "qwen-vl-utils[decord]"

Verify Installation

Test your installation:

from transformers import AutoModelForImageTextToText, AutoProcessor

# This should run without errors
processor = AutoProcessor.from_pretrained("Qwen/Qwen3-VL-2B-Instruct")
print("Installation successful!")

Performance Optimizations

Flash Attention 2

Flash Attention 2 can significantly speed up inference, especially in multi-image and video scenarios:

pip install -U flash-attn --no-build-isolation

Flash Attention 2 requires:

- Compatible NVIDIA GPU (Ampere or newer)
- CUDA 11.6 or higher
- Models loaded in torch.float16 or torch.bfloat16
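
These requirements can be checked at runtime before requesting Flash Attention 2. The sketch below assumes that "Ampere or newer" corresponds to a CUDA compute capability of 8.0 or higher. The helper names are hypothetical, and the `"sdpa"` fallback is a choice made for this example; leaving `attn_implementation` unset also works.

```python
import importlib.util

AMPERE_MAJOR = 8  # Ampere GPUs report CUDA compute capability 8.x


def capability_supports_flash_attention(major: int) -> bool:
    """True when the compute capability major version is Ampere or newer."""
    return major >= AMPERE_MAJOR


def pick_attn_implementation() -> str:
    """Return "flash_attention_2" when all checks pass, otherwise "sdpa"."""
    if importlib.util.find_spec("flash_attn") is None:
        return "sdpa"  # flash-attn package is not installed
    if importlib.util.find_spec("torch") is None:
        return "sdpa"  # PyTorch is not installed
    try:
        import torch

        if not torch.cuda.is_available():
            return "sdpa"
        major, _minor = torch.cuda.get_device_capability()
    except Exception:
        return "sdpa"  # treat any probing failure as "not supported"
    if capability_supports_flash_attention(major):
        return "flash_attention_2"
    return "sdpa"


if __name__ == "__main__":
    print(f"attn_implementation={pick_attn_implementation()!r}")
```

The returned string can be passed as `attn_implementation` to `from_pretrained`, together with a half-precision dtype as noted in the requirements.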

Usage example:

import torch
from transformers import AutoModelForImageTextToText

model = AutoModelForImageTextToText.from_pretrained(
    "Qwen/Qwen3-VL-235B-A22B-Instruct",
    dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",
    device_map="auto",
)

Video Processing Backends

Qwen3-VL supports multiple video decoding backends:

torchcodec (Recommended)

Best performance and compatibility:

# Follow official installation instructions
# https://github.com/pytorch/torchcodec

Supports both HTTP and HTTPS URLs. Requires FFmpeg.

decord

Fast decoding for Linux users:

pip install "qwen-vl-utils[decord]"

- Only supports HTTP URLs (not HTTPS)
- May have decoding issues on some videos
- Project is no longer actively maintained

torchvision

Default fallback option:

# Already included with transformers

- torchvision >= 0.19.0 supports HTTP and HTTPS
- Older versions have no URL support
- Slower than decord and torchcodec
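
The URL rules listed for each backend can be written down as a small lookup, which is handy for deciding up front whether a given video URL needs to be downloaded first. This is only a sketch of the rules as stated in this section. The function `backend_supports_url` is hypothetical and is not part of qwen-vl-utils, and anything that is not an HTTP or HTTPS URL is treated as a local path.

```python
from urllib.parse import urlparse


def backend_supports_url(backend: str, url: str, torchvision_version=(0, 19, 0)) -> bool:
    """Apply the per-backend URL rules described in this section."""
    scheme = urlparse(url).scheme.lower()
    if scheme not in ("http", "https"):
        return True  # local paths are not affected by the URL restrictions
    if backend == "torchcodec":
        return True  # HTTP and HTTPS
    if backend == "decord":
        return scheme == "http"  # HTTP only
    if backend == "torchvision":
        # URL support starts with torchvision 0.19.0
        return tuple(torchvision_version) >= (0, 19, 0)
    raise ValueError(f"unknown backend: {backend!r}")


if __name__ == "__main__":
    url = "https://example.com/clip.mp4"  # placeholder URL
    for name in ("torchcodec", "decord", "torchvision"):
        print(name, backend_supports_url(name, url))
```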

Switch backends by setting an environment variable:

export FORCE_QWENVL_VIDEO_READER=torchcodec  # or decord, torchvision
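
The same variable can be set from Python, for example in a notebook where exporting shell variables is inconvenient. The sketch below assumes the variable is read when the video reader is first selected, so it sets the value before `qwen_vl_utils` is imported. The helper names `available_backends` and `force_video_backend` are hypothetical.

```python
import importlib.util
import os

ENV_VAR = "FORCE_QWENVL_VIDEO_READER"
BACKENDS = ("torchcodec", "decord", "torchvision")  # names accepted by the variable


def available_backends() -> list:
    """Return the backends whose Python package can be found in this environment."""
    return [name for name in BACKENDS if importlib.util.find_spec(name) is not None]


def force_video_backend(name: str) -> None:
    """Set the environment variable after validating the backend name."""
    if name not in BACKENDS:
        raise ValueError(f"unknown backend {name!r}; expected one of {BACKENDS}")
    os.environ[ENV_VAR] = name


if __name__ == "__main__":
    found = available_backends()
    print(f"Installed backends: {found}")
    if found:
        # Set the variable first, then import qwen_vl_utils in your own code
        force_video_backend(found[0])
        print(f"{ENV_VAR}={os.environ[ENV_VAR]}")
```

Finding a package with `find_spec` only shows that it is installed; torchcodec, for instance, still needs FFmpeg at runtime.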

Deployment Installation

For production deployment with vLLM or SGLang:

# Install accelerate and qwen-vl-utils
pip install accelerate
pip install qwen-vl-utils==0.0.14

# Install latest vLLM (>= 0.11.0)
uv pip install -U vllm

For detailed deployment instructions, see the Deployment Guide.

China Mainland Users

For users in mainland China, we recommend using ModelScope:

from modelscope import snapshot_download

# Download model checkpoint
model_dir = snapshot_download('qwen/Qwen3-VL-8B-Instruct')

# Load from local directory
from transformers import AutoModelForImageTextToText
model = AutoModelForImageTextToText.from_pretrained(
    model_dir,
    dtype="auto",
    device_map="auto"
)

Docker Installation

Use our pre-built Docker images for a simplified setup:

docker run --gpus all --ipc=host --network=host --rm --name qwen3vl \
  -it qwenllm/qwenvl:qwen3vl-cu128 bash

The Docker image includes:

- Pre-configured environment
- All dependencies
- CUDA 12.8 support

You only need to install GPU drivers on the host machine.

Installation Verification

Run this complete test to verify your installation:

from transformers import AutoModelForImageTextToText, AutoProcessor
import torch

print("Testing Qwen3-VL installation...")

# Check transformers version (compare parsed versions, not raw strings)
import transformers
from packaging.version import Version
print(f"Transformers version: {transformers.__version__}")
assert Version(transformers.__version__) >= Version("4.57.0"), "Please upgrade transformers"

# Check CUDA availability
print(f"CUDA available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"CUDA version: {torch.version.cuda}")
    print(f"GPU count: {torch.cuda.device_count()}")

# Try loading processor
try:
    processor = AutoProcessor.from_pretrained("Qwen/Qwen3-VL-2B-Instruct")
    print("✓ Processor loaded successfully")
except Exception as e:
    print(f"✗ Error loading processor: {e}")

# Check qwen-vl-utils (version is read from the installed package metadata)
try:
    import qwen_vl_utils
    from importlib.metadata import version
    print(f"✓ qwen-vl-utils version: {version('qwen-vl-utils')}")
except ImportError:
    print("○ qwen-vl-utils not installed (optional)")

# Check flash-attn
try:
    import flash_attn
    print("✓ Flash Attention 2 available")
except ImportError:
    print("○ Flash Attention 2 not installed (optional)")

print("\nInstallation verification complete!")

Troubleshooting

ModuleNotFoundError: No module named 'transformers'

Make sure you’ve installed transformers:

pip install "transformers>=4.57.0"

CUDA out of memory

Try these solutions:

- Use a smaller model (e.g., 2B or 4B instead of 235B)
- Use an FP8-quantized checkpoint
- Use device_map="auto" for automatic device placement
- Reduce batch size or max_new_tokens

# Use FP8 quantized model
from transformers import AutoModelForImageTextToText
model = AutoModelForImageTextToText.from_pretrained(
    "Qwen/Qwen3-VL-8B-Instruct-FP8",
    dtype="auto",
    device_map="auto"
)
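
A rough estimate of weight memory helps when choosing between these options. The sketch below multiplies a parameter count by the bytes per parameter for each precision. It treats the size in the model name (for example, 8B) as the parameter count and covers weights only, so activations, the KV cache, and framework overhead come on top. The function name `weight_memory_gib` is illustrative.

```python
# Bytes needed to store one parameter at each precision
BYTES_PER_PARAM = {"float32": 4, "bfloat16": 2, "float16": 2, "fp8": 1}


def weight_memory_gib(num_params: float, dtype: str) -> float:
    """Approximate memory for model weights alone, in GiB."""
    return num_params * BYTES_PER_PARAM[dtype] / 1024**3


if __name__ == "__main__":
    for dtype in ("bfloat16", "fp8"):
        print(f"8B parameters in {dtype}: {weight_memory_gib(8e9, dtype):.1f} GiB")
```

Under these assumptions, a nominal 8B model needs about 14.9 GiB for weights in bfloat16 and about 7.5 GiB in FP8, so the FP8 checkpoint halves the weight footprint.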

Flash Attention installation fails

Flash Attention 2 requires:

- NVIDIA GPU with Ampere architecture or newer
- CUDA 11.6+
- Proper build tools

If installation fails, you can skip it and use default attention:

# Don't specify attn_implementation
from transformers import AutoModelForImageTextToText
model = AutoModelForImageTextToText.from_pretrained(
    "Qwen/Qwen3-VL-8B-Instruct",
    dtype="auto",
    device_map="auto"
)

Video decoding errors

If you encounter video processing issues:

1. Try a different backend:

export FORCE_QWENVL_VIDEO_READER=torchcodec

2. Ensure you have the required dependencies:
- torchcodec: Requires FFmpeg
- decord: Linux only, may need to build from source
- torchvision: Requires version >= 0.19.0 for URL support

3. Use local video files instead of URLs as a workaround
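
For the local-file workaround, one option is to download the video with the standard library and hand the resulting path to the model in place of the URL. This is a minimal sketch. The helper `download_to_local` is hypothetical and not part of qwen-vl-utils, the URL in the usage example is a placeholder, and the caller is responsible for deleting the temporary file.

```python
import shutil
import tempfile
import urllib.request
from pathlib import Path


def download_to_local(url: str, suffix: str = ".mp4") -> Path:
    """Copy the resource at `url` into a temporary file and return its path."""
    with urllib.request.urlopen(url, timeout=60) as response:
        # delete=False keeps the file on disk after the handle is closed
        with tempfile.NamedTemporaryFile(suffix=suffix, delete=False) as tmp:
            shutil.copyfileobj(response, tmp)
            local_path = Path(tmp.name)
    return local_path


if __name__ == "__main__":
    import sys

    if len(sys.argv) > 1:
        # Usage: python download_video.py https://example.com/clip.mp4
        print(download_to_local(sys.argv[1]))
```

Because the download goes through `urllib` rather than the video backend, the backend's own URL restrictions no longer apply.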

Next Steps

Now that you have Qwen3-VL installed, you can:

- Quick Start: Run your first inference with an image
- Advanced Usage: Learn about pixel control, batching, and optimization
- Deployment: Deploy Qwen3-VL with vLLM or SGLang
- Cookbooks: Explore practical examples and use cases
