model.generate()
Generate text completions from image/video inputs and text prompts.Parameters
Model inputs dictionary containing tokenized text and preprocessed images/videos.Typically obtained from
processor.apply_chat_template() and moved to model device:Maximum number of new tokens to generate.
Sampling temperature. Higher values increase randomness.
0.1-0.5: More focused, deterministic outputs0.7: Balanced creativity (default for Instruct models)0.6: Recommended for Thinking models1.0+: More creative, diverse outputs
Nucleus sampling threshold. Only tokens with cumulative probability up to
top_p are considered.- Instruct models:
0.8(default) - Thinking models:
0.95(recommended)
Top-k sampling. Only the top k most likely tokens are considered.
20: Recommended value-1: Disabled (consider all tokens)
Penalty for repeating tokens. Values > 1.0 discourage repetition.
1.0: No penalty (default)1.1-1.5: Light to moderate penalty
Penalty for token presence regardless of frequency.
- Instruct models:
1.5(recommended) - Thinking models:
0.0(recommended)
Whether to use sampling. If
False, uses greedy decoding.Random seed for reproducible generation.
- Instruct models:
3407(official eval) - Thinking models:
1234(official eval)
Returns
Tensor of token IDs including both input and generated tokens.Shape:
(batch_size, total_sequence_length)To extract only the newly generated tokens:Decoding Output
Convert generated token IDs to text using the processor:Complete Example
Recommended Hyperparameters
Instruct Models
Thinking Models
Notes
For batch generation, always set
processor.tokenizer.padding_side = 'left' and include padding=True in apply_chat_template().