Document Parsing

Qwen3-VL features powerful document parsing capabilities that have reached a new level of sophistication. The model can extract not just text, but also layout position information and structured content in the Qwen HTML format.

Capabilities

Qwen3-VL’s document parsing goes beyond simple text extraction:

Text Extraction: Accurate OCR across the entire document
Layout Understanding: Preserves document structure and hierarchy
Position Information: Tracks where elements appear in the document
Qwen HTML Format: Structured output maintaining semantic relationships
Multi-format Support: Works with PDFs, images, scanned documents, and more

Key Features

Enhanced OCR

Support for 32 languages (expanded from 10)
Robust performance in challenging conditions:
- Low light environments
- Blurred or tilted images
- Rare and ancient characters
- Technical jargon and specialized terminology

Structured Output

The Qwen HTML format preserves:

Document hierarchy (headings, sections, paragraphs)
Tables and their structure
Lists and enumerations
Text formatting and emphasis
Spatial relationships between elements

Use Cases

Document Digitization: Convert physical documents to structured digital formats
Information Extraction: Extract specific data from forms and reports
Archive Processing: Digitize historical documents and records
Legal & Compliance: Parse contracts, agreements, and regulatory documents
Research: Extract structured data from academic papers and publications

Try It Out

Explore document parsing with our interactive cookbook:

Document Parsing Cookbook

The parsing of documents has reached a higher level, including not only text but also layout position information and our Qwen HTML format.

Advanced Features

Long Document Understanding: Native 256K context with expansion to 1M tokens
Multi-page Processing: Handle entire books and lengthy reports
Layout Preservation: Maintain visual hierarchy in output
Table Recognition: Extract complex table structures accurately

OCR - General text recognition and extraction
Spatial Understanding - Understand document layout spatially
2D Grounding - Locate specific elements in documents

Omni Recognition

OCR & Key Information Extraction

​Document Parsing

​Capabilities

​Key Features

​Enhanced OCR

​Structured Output

​Use Cases

​Try It Out