Introducing Evaluations, a feature for testing and comparing a selection of AI models against your own datasets.
Whether you're fine-tuning models or measuring their performance, Oxen Evaluations simplifies the process by letting you run a prompt against every row of a dataset.
Once you're happy with the results, output the resulting dataset to a new file, another branch, or directly as a new commit.
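The prompts in the runs listed below contain placeholders such as {answer} and {prediction}. The sketch below illustrates one way a prompt could be filled in once per row, assuming each placeholder names a dataset column. The render_prompt helper and the sample rows are hypothetical and exist only for illustration; they are not part of any Oxen API.

```python
import string


def render_prompt(template: str, row: dict) -> str:
    """Fill every {column} placeholder in the template with the row's value.

    Hypothetical helper: assumes each placeholder is the name of a column.
    """
    # Collect the placeholder names that appear in the template.
    fields = [name for _, name, _, _ in string.Formatter().parse(template) if name]
    missing = [name for name in fields if name not in row]
    if missing:
        raise KeyError(f"row is missing columns: {missing}")
    return template.format(**row)


# Prompt text taken from the "Check the answer" run below.
template = (
    "Check if the following answers are equivalent or not. "
    "Answer with true or false, one word, all lowercase.\n"
    "Answer 1: {answer}\n"
    "Answer 2: {prediction}"
)

# Made-up rows standing in for a dataset with these two columns.
rows = [
    {"answer": "18", "prediction": "18.0"},
    {"answer": "3", "prediction": "5"},
]

# One rendered prompt per row; each would then be sent to the chosen model.
prompts = [render_prompt(template, row) for row in rows]
print(prompts[0])
```

Failing early on a missing column keeps a misspelled placeholder from silently producing a prompt with no data in it.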
Check the answer
8066208a-e67b-44b6-8997-4aaf9ecde90f 100 rows completed
Ox Data Bot 🤖
2 months ago
Prompt: Check if the following answers are equivalent or not. Answer with true or false, one word, all lowercase.
Answer 1: {answer}
Answer 2: {prediction}
2 iterations 41033 tokens
text → text
OpenAI/GPT-4o
Source:
plan-and-solve-w-variables
Target:
plan-and-solve-w-variables
Extract answer
8d3d2420-f21a-4377-a917-3016bc3a547f 100 rows completed
Ox Data Bot 🤖
2 months ago
Prompt: {prediction}
Extract the numeric answer and only the answer from the reasoning above. The answer should be a single numeric value or list of values and nothing else.
2 iterations 40688 tokens
text → text
OpenAI/GPT-4o
Source:
plan-and-solve-w-variables
Target:
plan-and-solve-w-variables
Extract the relevant variables
e615a255-b7ba-4463-8c56-f333bf30dc3d 100 rows completed
Ox Data Bot 🤖
2 months ago
Prompt: Q: {question}
A: Let's first understand the problem, extract relevant variables and their corresponding numerals, and devise a plan. Then, let's carry out the plan, calculate intermediate results (pay attention to calculation and common sense), solve the problem step by step, and show the answer.
2 iterations 51697 tokens
text → text
OpenAI/GPT-4o
Source:
Target:
plan-and-solve-w-variables
Check answer
aa55e2f1-1f6e-497e-a2d8-47486144bc86 100 rows completed
Ox Data Bot 🤖
2 months ago
Prompt: Check if the following answers are equivalent or not. Answer with true or false, one word, all lowercase.
Answer 1: {answer}
Answer 2: {prediction}
2 iterations 4515 tokens
text → text
OpenAI/GPT-4o
Source:
plan-and-solve
Target:
plan-and-solve
Extract the answer
a2fd495b-1443-43f8-9cdb-98610db8ccfc 100 rows completed
Ox Data Bot 🤖
2 months ago
Prompt: {pas_prediction}
Extract the numeric answer and only the answer from the reasoning above. The answer should be a single numeric value or list of values and nothing else.
3 iterations 43479 tokens
text → text
OpenAI/GPT-4o
Source:
plan-and-solve
Target:
plan-and-solve
Plan-and-Solve
8cf28548-297d-4a0b-9451-93e6d3a0e661 100 rows completed
Ox Data Bot 🤖
2 months ago
Prompt: Q: {question}
A: Let's first understand the problem and devise a plan to solve the problem.
Then let's carry out the plan and solve the problem step by step.
2 iterations 52074 tokens
text → text
OpenAI/GPT-4o
Source:
Target:
plan-and-solve
Extract answer
1136f11d-dff0-40c3-a431-06ed7d442aba 100 rows completed
Ox Data Bot 🤖
2 months ago
Prompt: Extract the final answer after the #### marks from the text below
{cot_answer}
2 iterations 21959 tokens
text → text
OpenAI/GPT-4o
Source:
Target:
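The last run above asks the model to pull the final answer that follows the #### marks. As a point of comparison, the same extraction can be sketched deterministically. This is a minimal sketch and not how the evaluation itself works: it assumes the answer sits on the first non-empty line after the last #### marker, and both the extract_final_answer helper and the sample text are hypothetical.

```python
from typing import Optional


def extract_final_answer(text: str) -> Optional[str]:
    """Return the text following the last '####' marker, or None if absent.

    Hypothetical helper: assumes the answer is the first non-empty line
    after the marker.
    """
    _, marker, tail = text.rpartition("####")
    if not marker:
        return None  # no '####' marker anywhere in the text
    lines = tail.strip().splitlines()
    return lines[0].strip() if lines else None


# Made-up reasoning text in the '#### answer' style the prompt refers to.
cot_answer = "She sold 48 clips in April and 24 in May.\n48 + 24 = 72\n#### 72"
print(extract_final_answer(cot_answer))
```

A rule like this only covers text that follows the marker convention exactly, which is one reason to hand free-form reasoning to a model instead, as the "Extract answer" runs do.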