Skip to main content

Document Parsing

Qwen3-VL features powerful document parsing capabilities that have reached a new level of sophistication. The model can extract not just text, but also layout position information and structured content in the Qwen HTML format.

Capabilities

Qwen3-VL’s document parsing goes beyond simple text extraction:
  • Text Extraction: Accurate OCR across the entire document
  • Layout Understanding: Preserves document structure and hierarchy
  • Position Information: Tracks where elements appear in the document
  • Qwen HTML Format: Structured output maintaining semantic relationships
  • Multi-format Support: Works with PDFs, images, scanned documents, and more

Key Features

Enhanced OCR

  • Support for 32 languages (expanded from 10)
  • Robust performance in challenging conditions:
    • Low light environments
    • Blurred or tilted images
    • Rare and ancient characters
    • Technical jargon and specialized terminology

Structured Output

The Qwen HTML format preserves:
  • Document hierarchy (headings, sections, paragraphs)
  • Tables and their structure
  • Lists and enumerations
  • Text formatting and emphasis
  • Spatial relationships between elements

Use Cases

  • Document Digitization: Convert physical documents to structured digital formats
  • Information Extraction: Extract specific data from forms and reports
  • Archive Processing: Digitize historical documents and records
  • Legal & Compliance: Parse contracts, agreements, and regulatory documents
  • Research: Extract structured data from academic papers and publications

Try It Out

Explore document parsing with our interactive cookbook:

Document Parsing Cookbook

The parsing of documents has reached a higher level, including not only text but also layout position information and our Qwen HTML format.
Open In Colab

Advanced Features

  • Long Document Understanding: Native 256K context with expansion to 1M tokens
  • Multi-page Processing: Handle entire books and lengthy reports
  • Layout Preservation: Maintain visual hierarchy in output
  • Table Recognition: Extract complex table structures accurately