Evaluations
Run models against your data
Evaluations is a feature that lets you test and compare a selection of AI models against your own datasets.
Whether you're fine-tuning models or evaluating performance metrics, Oxen Evaluations simplifies the process by letting you run a prompt through every row of a dataset.
Once you're happy with the results, output the resulting dataset to a new file, another branch, or directly as a new commit.
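To make "running a prompt through a dataset" concrete, here is a minimal sketch of the idea in plain Python. It is not Oxen's API: `render_prompt`, `run_evaluation`, `echo_model`, and the `prediction` column name are all hypothetical names chosen for illustration, and the model is a stand-in function rather than a real model call. The sketch assumes a prompt template refers to dataset columns with `{column}` placeholders, as in the `{image}` prompts on this page.

```python
import re
from typing import Callable

# Matches {column} placeholders such as {image} in a prompt template.
PLACEHOLDER = re.compile(r"\{(\w+)\}")


def render_prompt(template: str, row: dict) -> str:
    """Replace each {column} placeholder with that column's value from the row."""
    def substitute(match: re.Match) -> str:
        column = match.group(1)
        if column not in row:
            raise KeyError(f"row has no column named {column!r}")
        return str(row[column])

    return PLACEHOLDER.sub(substitute, template)


def run_evaluation(
    rows: list[dict],
    template: str,
    model: Callable[[str], str],
    output_column: str = "prediction",  # hypothetical column name
) -> list[dict]:
    """Render the prompt for every row, call the model, and return new rows
    with the model's response stored in output_column."""
    results = []
    for row in rows:
        prompt = render_prompt(template, row)
        # Copy the row so the source dataset is left untouched.
        results.append({**row, output_column: model(prompt)})
    return results


def echo_model(prompt: str) -> str:
    """Hypothetical stand-in for a real model call; reports the prompt length."""
    return f"({len(prompt)} chars)"


# Hypothetical two-row dataset with an "image" column.
rows = [{"image": "images/shirt_1.jpg"}, {"image": "images/shirt_2.jpg"}]
template = "Tell me the color of the shirt in the image {image}"
evaluated = run_evaluation(rows, template, echo_model)
```

Because the result is a new list of rows, the original data is unchanged, which mirrors the choice of writing the output to a new file, branch, or commit rather than overwriting the source.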
997e05df-c1e3-41b8-8f1f-c9ac7e85b0bd (5 row sample completed)
Bessie, 4 days ago
Prompt: Tell me the color of the shirt in the image {image}
image → text
Groq/Llama 3.2 11B Vision
Source: main

9d5f5f3a-a300-4297-b91b-5f548ccc6614 (5 row sample completed)
Bessie, 5 days ago
Prompt: {image} Write a query a customer might write when looking for this product.
image → text
OpenAI/GPT 4o
Source: main