Document Parsing
Qwen3-VL features powerful document parsing capabilities that have reached a new level of sophistication. The model can extract not just text, but also layout position information and structured content in the Qwen HTML format.Capabilities
Qwen3-VL’s document parsing goes beyond simple text extraction:- Text Extraction: Accurate OCR across the entire document
- Layout Understanding: Preserves document structure and hierarchy
- Position Information: Tracks where elements appear in the document
- Qwen HTML Format: Structured output maintaining semantic relationships
- Multi-format Support: Works with PDFs, images, scanned documents, and more
Key Features
Enhanced OCR
- Support for 32 languages (expanded from 10)
- Robust performance in challenging conditions:
- Low light environments
- Blurred or tilted images
- Rare and ancient characters
- Technical jargon and specialized terminology
Structured Output
The Qwen HTML format preserves:- Document hierarchy (headings, sections, paragraphs)
- Tables and their structure
- Lists and enumerations
- Text formatting and emphasis
- Spatial relationships between elements
Use Cases
- Document Digitization: Convert physical documents to structured digital formats
- Information Extraction: Extract specific data from forms and reports
- Archive Processing: Digitize historical documents and records
- Legal & Compliance: Parse contracts, agreements, and regulatory documents
- Research: Extract structured data from academic papers and publications
Try It Out
Explore document parsing with our interactive cookbook:Document Parsing Cookbook
The parsing of documents has reached a higher level, including not only text but also layout position information and our Qwen HTML format.
Advanced Features
- Long Document Understanding: Native 256K context with expansion to 1M tokens
- Multi-page Processing: Handle entire books and lengthy reports
- Layout Preservation: Maintain visual hierarchy in output
- Table Recognition: Extract complex table structures accurately
Related Capabilities
- OCR - General text recognition and extraction
- Spatial Understanding - Understand document layout spatially
- 2D Grounding - Locate specific elements in documents