Evaluations
Run models against your data
Evaluations let you test and compare a selection of AI models against your datasets.
Whether you're fine-tuning models or evaluating performance metrics, Oxen Evaluations simplifies the process: you write a prompt once and run it through an entire dataset.
Once you're happy with the results, output the resulting dataset to a new file, another branch, or directly as a new commit.
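As a rough illustration of what running a prompt through a dataset involves, the sketch below fills a prompt template's {column} placeholders from each row and passes the result to a model. It is not Oxen's API: `template_columns`, `run_evaluation`, and `echo_model` are hypothetical names, and the model is a stub callable standing in for a real provider call.

```python
from string import Formatter
from typing import Callable


def template_columns(template: str) -> list[str]:
    """Return the column names used as {placeholders} in a prompt template."""
    return [name for _, name, _, _ in Formatter().parse(template) if name]


def run_evaluation(
    rows: list[dict],
    template: str,
    model: Callable[[str], str],
    sample_size: int = 5,
) -> list[dict]:
    """Fill the template for the first `sample_size` rows and collect responses.

    Each output row keeps the original columns and adds the rendered
    prompt and the model's response.
    """
    columns = template_columns(template)
    results = []
    for row in rows[:sample_size]:
        # Substitute only the columns the template actually references.
        prompt = template.format(**{col: row[col] for col in columns})
        results.append({**row, "prompt": prompt, "response": model(prompt)})
    return results


def echo_model(prompt: str) -> str:
    """Stub model (assumption): a real run would call a hosted model here."""
    return f"[stub response to: {prompt}]"


if __name__ == "__main__":
    rows = [
        {"question": "What is 2 + 3?", "answer": "5"},
        {"question": "What is 4 * 6?", "answer": "24"},
    ]
    for result in run_evaluation(rows, "Q: {question}\nA: {answer}", echo_model):
        print(result["response"])
```

Running on a small sample first, as in the 5 row samples listed below, is a cheap way to check a prompt before spending tokens on the full dataset.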
f595a1d1-0d98-4fb4-a9cb-90d1f357d346
OpenAI/GPT 4o mini (image → text)
elau
4 days ago
{image}
completed, 5 row sample, 54871 tokens, $0.0087, 1 iteration
11cb8827-6df6-49c9-b4d9-33454822a8f6
Perplexity/Perplexity Sonar Deep Research (text → text)
elau
3 weeks ago
Let's say I wait Graham's number of Planck times and then go back in time one googolplex ages of the universe for every atom in the observable universe from that point in time. What time will I have traveled to?
completed, 5 row sample, 0 tokens, $0.0000, 12 iterations
92fe8f86-d478-4367-bc08-6643d187a4f5
Google/Gemma 3 27B (text → text)
elau
1 month ago
Q: {question}
A: {answer}
Task: Give a valid equation for which the answer is correct for the given question.
completed, 5 row sample, 639 tokens, $0.0000, 4 iterations
f9052ca9-20fc-4d90-a6b2-89419e4e2328
OpenAI/GPT 4o mini (text → text)
elau
1 month ago
test {question}
completed, 5 row sample, 219 tokens, $0.0001, 1 iteration
d2615b97-d54e-4b94-a30e-d9a81486d2df
Google/Gemini 2.0 Flash Lite (text → text)
elau
1 month ago
test
completed, 5 row sample, 201 tokens, $0.0001, 1 iteration
f069c030-c670-4ae5-ab7c-115eb686c0bc
OpenAI/GPT 4o (image → text)
elau
2 months ago
{image}
{question}
completed, 5 row sample, 2968 tokens, $0.0105, 1 iteration