Evaluations
Run models against your data
Evaluations let you test and compare a selection of AI models against your datasets.
Whether you're fine-tuning models or measuring performance, Oxen Evaluations lets you run a prompt through every row of a dataset.
Once you're happy with the results, output the resulting dataset to a new file, another branch, or directly as a new commit.
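The runs listed on this page use prompts with placeholders such as {query} and {image} that are filled in from each row's columns. The following is a minimal sketch of that idea in plain Python, not Oxen's implementation or API: `render_prompt`, `run_evaluation`, `echo_model`, and the `prediction` column name are all hypothetical names chosen for illustration, and the stand-in model simply echoes its prompt.

```python
from typing import Callable


def render_prompt(template: str, row: dict[str, str]) -> str:
    """Fill {column} placeholders in the template with values from one row."""
    return template.format_map(row)


def run_evaluation(
    template: str,
    rows: list[dict[str, str]],
    model: Callable[[str], str],
    output_column: str = "prediction",  # hypothetical column name
) -> list[dict[str, str]]:
    """Run the rendered prompt for every row and return new rows with the output added."""
    results = []
    for row in rows:
        prompt = render_prompt(template, row)
        # Copy the row so the input dataset is left unchanged.
        results.append({**row, output_column: model(prompt)})
    return results


def echo_model(prompt: str) -> str:
    """Stand-in for a real model call; it only echoes the prompt it received."""
    return f"[model output for: {prompt}]"


rows = [{"query": "red summer dress"}, {"query": "blue denim jacket"}]
evaluated = run_evaluation("Translate to french {query}", rows, echo_model)

for row in evaluated:
    print(row["query"], "->", row["prediction"])
```

Because this sketch relies on Python's `str.format_map`, a prompt that contains literal braces (for example a JSON snippet) would need them doubled as `{{` and `}}`.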
909a4a43-4539-4eb4-b4c6-9ef0aad49e0d
Meta/Llama 3.1 8B Instruct Turbo | text → text
Bessie
ox
2 weeks ago
Translate the query to french

{query}
completed | 5 row sample | 315 tokens | $0.0001 | 1 iteration
e29d680b-0beb-4ea9-8d1f-a776187f6a01
OpenAI/GPT 4o mini | text → text
Bessie
ox
3 weeks ago
Translate to french {query}
completed | 5 row sample | 136 tokens | $0.0000 | 1 iteration
c1f9c7b2-addb-4ec8-b9bb-7de293dfebab
OpenAI/GPT 4o | image → text
Bessie
ox
3 weeks ago
Give me the sleeve length of this dress in a json object with a single key called "sleeve_length"

{image}
completed | 5 row sample | 4078 tokens | $0.0109 | 2 iterations
672b09ba-3923-4b45-91a3-79392604c739
OpenAI/GPT 4o mini | image → text
Bessie
ox
1 month ago
Describe the piece of clothing as if you are writing for product catalogue. Limit the description to a single paragraph.

{image}
completed | 5 row sample | 4447 tokens | $0.0009 | 4 iterations
997e05df-c1e3-41b8-8f1f-c9ac7e85b0bd
Meta/Llama 3.2 11B Vision | image → text
Bessie
ox
2 months ago
Tell me the color of the shirt in the image

{image}
completed | 5 row sample | 310 tokens | $0.0001 | 6 iterations
9d5f5f3a-a300-4297-b91b-5f548ccc6614
OpenAI/GPT 4o | image → text
Bessie
ox
2 months ago
{image}

Write a query a customer might write when looking for this product.
completed | 5 row sample | 3989 tokens | $0.0105 | 1 iteration