Run Qwen2 VL 72B Instruct on your data

Qwen2 VL 72B Instruct is a Multimodal LLM that excels in understanding and processing visual information from images and videos, as well as generating text based on visual inputs. It is particularly adept at handling complex visual tasks, including long video comprehension, high-resolution image analysis, and device operation based on visual cues and text instructions.

Some other noteworthy features of Qwen2 VL 72B Instruct include multilingual support for text understanding in various languages and the ability to process videos up to 20 minutes long.

Metric	Value
Parameter Count	72 billion
Mixture of Experts	No
Context Length	Unknown
Multilingual	Yes
Quantized*	No

*Quantization is specific to the inference provider and the model may be offered with different quantization levels by other providers.

	Modality	Price (1M tokens)
Qwen 2.5 Coder 32B Instruct	Fireworks AI	text	text	$0.90	$0.90
Qwen 2.5 Coder 32B Instruct	Together.ai	text	text	$0.80	$0.80
Qwen 3 235B-A22B	Fireworks AI	text	text	$0.10	$0.10
Qwen 3 30B-A3B	Fireworks AI	text	text	$0.90	$0.90
Qwen2 VL 72B Instruct	Fireworks AI	image	text	$0.90	$0.90
Qwen2.5 1.5B Instruct	Bytez	text	text	N/A	N/A
Qwen2.5 72B Instruct	Fireworks AI	text	text	$0.90	$0.90
QwQ 32B	Fireworks AI	text	text	$0.90	$0.90
QwQ 32B	Together.ai	text	text	$1.20	$1.20
QwQ 32B Preview	Together.ai	text	text	$1.20	$1.20
QwQ 32B Preview	Fireworks AI	text	text	$0.90	$0.90

Modality

Price (1M tokens)

Model

Inference provider

Input

Output

Input

Output

Qwen 2.5 Coder 32B Instruct

Fireworks AI