Qwen / Qwen2 VL 72B Instruct
Released: 9/19/2024
Modality: image input / text output
Price (per 1M tokens): Input: $0.90 / Output: $0.90

Qwen2 VL 72B Instruct is a multimodal LLM that understands and processes visual information from images and videos and generates text based on those visual inputs. It is particularly adept at complex visual tasks, including long video comprehension, high-resolution image analysis, and device operation based on visual cues and text instructions.

Other noteworthy features of Qwen2 VL 72B Instruct include multilingual text understanding and the ability to process videos up to 20 minutes long.

Metric               Value
Parameter Count      72 billion
Mixture of Experts   No
Context Length       Unknown
Multilingual         Yes
Quantized*           No

*Quantization is specific to the inference provider; other providers may offer the model at different quantization levels.

Qwen models available on Oxen.ai

Inference provider   Input modality   Output modality   Input price (1M tokens)   Output price (1M tokens)
Fireworks AI         text             text              $0.90                     $0.90
Together.ai          text             text              $0.80                     $0.80
Fireworks AI         image            text              $0.90                     $0.90
Fireworks AI         text             text              $0.90                     $0.90
Together.ai          text             text              $1.20                     $1.20
Fireworks AI         text             text              $0.90                     $0.90
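
Because prices are quoted per 1M tokens, the cost of a single request can be estimated with simple arithmetic. The sketch below is illustrative only: `estimate_cost` is a hypothetical helper, not part of any Oxen.ai or provider API, and it assumes the input and output token counts are already known. How a provider converts an image into tokens is not stated on this page, so that step is left out. The default prices are the $0.90 input and $0.90 output figures listed for this model.

```python
def estimate_cost(
    input_tokens: int,
    output_tokens: int,
    input_price_per_m: float = 0.90,   # USD per 1M input tokens (listed price)
    output_price_per_m: float = 0.90,  # USD per 1M output tokens (listed price)
) -> float:
    """Return the estimated cost in USD of one request.

    Hypothetical helper: token counts are assumed to be known already,
    including any tokens the provider attributes to image inputs.
    """
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    tokens_per_unit = 1_000_000  # prices are quoted per 1M tokens
    input_cost = (input_tokens / tokens_per_unit) * input_price_per_m
    output_cost = (output_tokens / tokens_per_unit) * output_price_per_m
    return input_cost + output_cost


if __name__ == "__main__":
    # Example request: 12,000 input tokens and 800 output tokens.
    cost = estimate_cost(input_tokens=12_000, output_tokens=800)
    print(f"Estimated cost: ${cost:.4f}")
```

To compare another row of the table, pass its prices explicitly, for example `estimate_cost(12_000, 800, input_price_per_m=1.20, output_price_per_m=1.20)`.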