Evaluations
Run models against your data
Evaluations let you test and compare a selection of AI models against your datasets.
Whether you're fine-tuning models or measuring performance, Oxen Evaluations runs a prompt against every row of a dataset, filling each {column} placeholder in the prompt with that row's value.
Once you're happy with the results, output the resulting dataset to a new file, another branch, or directly as a new commit.
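As a rough illustration of what running a prompt through a dataset involves, the sketch below fills {column} placeholders from each row and stores the model's reply in a new column. The names `render_prompt`, `run_evaluation`, and the `model` callable are hypothetical and are not part of Oxen's API; the placeholder model only stands in for a real model call.

```python
import re

def render_prompt(template: str, row: dict) -> str:
    """Fill {column} placeholders in the template with values from one row."""
    def substitute(match):
        column = match.group(1)
        # Unknown placeholders are left untouched rather than guessed.
        return str(row[column]) if column in row else match.group(0)
    return re.sub(r"\{(\w+)\}", substitute, template)

def run_evaluation(template, rows, model, output_column="prediction"):
    """Apply the rendered prompt to every row; return new rows with the output column added."""
    results = []
    for row in rows:
        prompt = render_prompt(template, row)
        results.append({**row, output_column: model(prompt)})
    return results

# Hypothetical stand-in for a model call: it just reports the prompt length.
def placeholder_model(prompt: str) -> str:
    return f"({len(prompt)} characters received)"

rows = [{"question": "What is 2 + 3?"}, {"question": "What is 10 / 4?"}]
for result in run_evaluation("Q: {question}\nA:", rows, placeholder_model):
    print(result)
```

Returning new rows instead of mutating the input mirrors the idea of writing results to a new file, branch, or commit while leaving the source dataset unchanged.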
8066208a-e67b-44b6-8997-4aaf9ecde90f
OpenAI/GPT 4o · text → text
Ox Data Bot 🤖
oxbot
5 months ago
Check if the following answers are equivalent or not. Answer with true or false, one word, all lowercase. 

Answer 1: {answer} 
Answer 2: {prediction}
plan-and-solve-w-variables
completed · 100 rows · 41033 tokens · 2 iterations
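Because the prompt above asks for a one-word true or false reply, the resulting column can be summarized as an accuracy figure. The following is a minimal sketch under that assumption; the `accuracy` helper is hypothetical and the sample replies are illustrative, not results from this evaluation.

```python
def accuracy(replies):
    """Fraction of valid replies equal to 'true'; returns None if no reply is valid."""
    normalized = [reply.strip().lower() for reply in replies]
    # Ignore anything that is not exactly 'true' or 'false' after normalization.
    valid = [reply for reply in normalized if reply in ("true", "false")]
    if not valid:
        return None
    return sum(reply == "true" for reply in valid) / len(valid)

# Illustrative replies only.
print(accuracy(["true", "false", "True ", "not sure"]))
```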
8d3d2420-f21a-4377-a917-3016bc3a547f
OpenAI/GPT 4o · text → text
Ox Data Bot 🤖
oxbot
5 months ago
{prediction} 

Extract the numeric answer and only the answer from the reasoning above. The answer should be a single numeric value or list of values and nothing else.
plan-and-solve-w-variables
completed · 100 rows · 40688 tokens · 2 iterations
e615a255-b7ba-4463-8c56-f333bf30dc3d
OpenAI/GPT 4o · text → text
Ox Data Bot 🤖
oxbot
5 months ago
Q: {question}
A: Let's first understand the problem, extract relevant variables and their corresponding numerals, and devise a plan. Then, let's carry out the plan, calculate intermediate results (pay attention to calculation and common sense), solve the problem step by step, and show the answer.
plan-and-solve-w-variables
completed · 100 rows · 51697 tokens · 2 iterations
aa55e2f1-1f6e-497e-a2d8-47486144bc86
OpenAI/GPT 4o · text → text
Ox Data Bot 🤖
oxbot
5 months ago
Check if the following answers are equivalent or not. Answer with true or false, one word, all lowercase.

Answer 1: {answer}
Answer 2: {prediction}
plan-and-solve
completed · 100 rows · 4515 tokens · 2 iterations
a2fd495b-1443-43f8-9cdb-98610db8ccfc
OpenAI/GPT 4o · text → text
Ox Data Bot 🤖
oxbot
5 months ago
{pas_prediction}

Extract the numeric answer and only the answer from the reasoning above. The answer should be a single numeric value or list of values and nothing else.
plan-and-solve
completed · 100 rows · 43479 tokens · 3 iterations
8cf28548-297d-4a0b-9451-93e6d3a0e661
OpenAI/GPT 4o · text → text
Ox Data Bot 🤖
oxbot
5 months ago
Q: {question}

A: Let's first understand the problem and devise a plan to solve the problem.
Then let's carry out the plan and solve the problem step by step.
completed · 100 rows · 52074 tokens · 2 iterations
1136f11d-dff0-40c3-a431-06ed7d442aba
OpenAI/GPT 4o · text → text
Ox Data Bot 🤖
oxbot
5 months ago
Extract the final answer after the #### marks from the text below

{cot_answer}
completed · 100 rows · 21959 tokens · 2 iterations
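For comparison with the model-based extraction above, the same step can be approximated with plain string handling when the text reliably contains the #### marker. This is only a sketch; `extract_final_answer` is a hypothetical helper, and it assumes the answer is whatever follows the last marker.

```python
def extract_final_answer(text: str):
    """Return the text after the last '####' marker, or None if the marker is absent."""
    marker = "####"
    if marker not in text:
        return None
    # rsplit with maxsplit=1 keeps only what follows the final marker.
    return text.rsplit(marker, 1)[1].strip()

# Illustrative input only.
print(extract_final_answer("3 apples plus 4 apples is 7 apples.\n#### 7"))
```

A deterministic extractor like this can serve as a cheap sanity check on the model's output, though it will miss answers that are not preceded by the marker.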