Evaluations
Run models against your data
Evaluations let you test and compare a selection of AI models against your datasets.
Whether you're fine-tuning models or measuring performance, Oxen evaluations simplify the process by running a prompt through every row of a dataset.
Once you're happy with the results, output the resulting dataset to a new file, another branch, or directly as a new commit.
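The prompts listed on this page reference dataset columns in curly braces, such as {query} or {prediction}. The following is a minimal sketch of how such a template could be filled in for each row, assuming each placeholder names a column. The `render_prompt` helper and the sample row are hypothetical illustrations, not part of Oxen's API.

```python
import re

# Matches placeholders such as {query} or {prediction}.
PLACEHOLDER = re.compile(r"\{(\w+)\}")


def render_prompt(template: str, row: dict) -> str:
    """Replace each {column} placeholder with that column's value from the row."""

    def substitute(match: re.Match) -> str:
        column = match.group(1)
        if column not in row:
            raise KeyError(f"row has no column named {column!r}")
        # Convert to str so numeric or other non-string cells are handled.
        return str(row[column])

    return PLACEHOLDER.sub(substitute, template)


template = (
    "Rephrase the following question but keep the intent the same. "
    "Only respond with the new question.\n\n{query}"
)
rows = [{"query": "What is the capital of France?"}]  # made-up sample row
prompts = [render_prompt(template, row) for row in rows]
```

Raising on a missing column, rather than leaving the placeholder in place, surfaces a mismatch between prompt and dataset before any model call is made.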
37ca1f44-1dcf-4dcb-846d-365e6fc4e270
Nous Research / Hermes 3 405B, text → text
Bessie
ox
5 months ago
Rephrase the following question but keep the intent the same. Only respond with the new question.

{query}
completed | 5 row sample | 300 tokens | $0.0003 | 12 iterations
1d089fc6-e0af-400d-9328-2aee62229752
OpenAI / GPT 4o mini, text → text
Bessie
ox
5 months ago
Extract the conclusion and just respond with the text within the <CONCLUSION></CONCLUSION> tag.

For example if the tag says <CONCLUSION>1%</CONCLUSION> just respond with "1%".

{prediction}
qwen-72B-results
qwen-72B-results
completed | 100 rows | 27430 tokens | $0.0042 | 3 iterations
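The run above uses a model call to pull the answer out of the <CONCLUSION> tag. When the output is well formed, the same extraction could be done without a model. The sketch below shows one way with a regular expression; `extract_conclusion` is a hypothetical helper, not how Oxen performs the step.

```python
import re
from typing import Optional

# DOTALL lets the conclusion span several lines; IGNORECASE tolerates <conclusion>.
CONCLUSION = re.compile(r"<CONCLUSION>(.*?)</CONCLUSION>", re.DOTALL | re.IGNORECASE)


def extract_conclusion(prediction: str) -> Optional[str]:
    """Return the text inside the last <CONCLUSION> tag, or None if there is none."""
    matches = CONCLUSION.findall(prediction)
    if not matches:
        return None
    # Use the last match in case the model echoes the tag earlier in its answer.
    return matches[-1].strip()
```

Rows where this returns None are the ones that would still need a model-based fallback or manual review.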
eca8d328-3a7d-4d48-ade0-df10a3dffaf2
Qwen / Qwen2 VL 72B Instruct, image → text
Bessie
ox
5 months ago
{imgname}

Here is image and a question that I want you to answer. I need you to strictly follow the format with four specific sections: 

<SUMMARY></SUMMARY>
<CAPTION></CAPTION>
<REASONING></REASONING>
<CONCLUSION></CONCLUSION>. 

It is crucial that you adhere to this structure exactly as outlined and that the final answer in the <CONCLUSION></CONCLUSION> matches the standard correct answer precisely.

To explain further: 

SUMMARY: briefly explain what steps you'll take to solve the problem.
CAPTION: describe the contents of the image in as much detail as possible, specifically focusing on details relevant to the question.
REASONING: outline a step-by-step thought process you would use to solve the problem based on the image.
CONCLUSION: give the final answer in a direct format, and it must match the correct answer exactly. 

If it's a multiple choice question, the conclusion should only include the option without repeating what the option is.

Here's how the xml response format should look:

<SUMMARY>
  Summarize how you will approach the problem and explain the steps you will take to reach the answer.
</SUMMARY>
<CAPTION>
  Provide a detailed description of the image, particularly emphasizing the aspects related to the question.
</CAPTION>
<REASONING>
  Provide a chain-of-thought, logical explanation of the problem. This should outline step-by-step reasoning.
</REASONING>
<CONCLUSION>
  State the final answer in a clear and direct format. It must match the correct answer exactly.
</CONCLUSION> 

(Do not forget the <CONCLUSION></CONCLUSION>!)

Please apply this format meticulously putting each section in xml tags like above. Analyze the given image and answer the related question, ensuring that the answer matches the standard one perfectly.

<QUESTION>
  {query}
</QUESTION>
qwen-72B-results
completed | 100 rows | 122053 tokens | $0.1098 | 2 iterations
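Each status line reports a token count and a cost, so a blended rate per million tokens can be derived for comparing runs. The sketch below uses the figures from the run above (122053 tokens, $0.1098). It assumes the displayed cost covers all reported tokens. The result is only an estimate, since the displayed cost is rounded and input and output tokens may be priced differently.

```python
def cost_per_million_tokens(cost_usd: float, tokens: int) -> float:
    """Blended cost in USD per one million tokens."""
    if tokens <= 0:
        raise ValueError("tokens must be positive")
    return cost_usd / tokens * 1_000_000


# Figures taken from the status line of the run above.
rate = cost_per_million_tokens(0.1098, 122053)
print(f"${rate:.2f} per million tokens")
```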
d7c091da-c2ad-4993-bd87-dabe52f286ce
Meta / Llama 3.2 11B Vision, image → text
Bessie
ox
5 months ago
{imgname}

Answer the following question very concisely. Respond with one word if possible

{query}
completed | 100 rows | 4609 tokens | $0.0008 | 3 iterations
154b8be9-7ee8-4b11-9424-2a7efb2c7d13
OpenAI / GPT 4o mini, text → text
Bessie
ox
5 months ago
Are the two responses equivalent? Ignore punctuation and irrelevant characters and differences in verb tense. Reply with true or false. One word all lowercase.

Response 1:
{label}

Response 2:
{prediction}
llama-3.2-11B-direct-answers
llama-3.2-11B-direct-answers
completed | 100 rows | 5285 tokens | $0.0008 | 4 iterations
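The judge prompt above asks for a single lowercase word, true or false, per row. The sketch below shows how a column of such replies could be turned into an accuracy figure. The `accuracy` function and the sample replies are hypothetical. Replies that are neither "true" nor "false" are counted separately instead of being scored as wrong.

```python
from typing import Iterable, Tuple


def accuracy(judgments: Iterable[str]) -> Tuple[float, int]:
    """Return (fraction judged true among valid replies, number of invalid replies)."""
    normalized = [j.strip().lower() for j in judgments]
    valid = [j for j in normalized if j in ("true", "false")]
    if not valid:
        raise ValueError("no valid true/false judgments")
    score = sum(j == "true" for j in valid) / len(valid)
    return score, len(normalized) - len(valid)


# Made-up judge replies, including one that ignores the requested format.
score, invalid = accuracy(["true", "false", "True ", "maybe"])
```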
466c5926-6157-4ff6-a8e7-f0bbd5bd8fb3
Meta / Llama 3.2 11B Vision, image → text
Bessie
ox
5 months ago
{imgname}

Answer the following question succinctly with a single word if possible.

Question:
{query}
llama-3.2-11B-direct-answers
completed | 100 rows | 4627 tokens | $0.0008 | 3 iterations
d61b6b29-ff7b-4332-9192-6376fd661469
OpenAI / GPT 4o, text → text
Bessie
ox
5 months ago
Are the two responses equivalent? Ignore punctuation and irrelevant characters and differences in verb tense. Reply with true or false. One word all lowercase.

Response 1:
{label}

Response 2:
{conclusion}
llama-3.2-11B-cot-separate-steps
llama-3.2-11B-cot-separate-steps
completed | 100 rows | 5178 tokens | $0.0137 | 5 iterations
e47608b7-a6f2-4c03-b32f-0a6cb7b85a48
Meta / Llama 3.2 11B Vision, image → text
Bessie
ox
5 months ago
{imgname}

I have an image and a question that I want you to answer. Take the following summary, caption, and reasoning to come up with a final conclusion.

Give the final answer in a direct format, and it must be concise match the correct answer exactly. Do not ramble, just give the final answer, no other words. If it is a numeric value just answer with the number.

If it's a multiple choice question, the conclusion should only include the option without repeating what the option is.

Question:
{query}

Summary:
{summary}

Caption:
{caption}

Reasoning:
{reasoning}

Conclusion:
llama-90B-CoT-separate-steps
error | no case clause matching: {:error, "An exception occurred indexing, getting dataframe and running evaluation: %FunctionClauseError{module: String, function: :replace, arity: 4, kind: nil, args: nil, clauses: nil}", 0, 0} | 1 / 100 rows | 1984 tokens | $0.0000 | 1 iteration
428ae1bf-1c58-42b0-bd5f-3a90a5aa7637
Meta / Llama 3.2 11B Vision, image → text
Bessie
ox
5 months ago
{imgname}

I have an image and a question that I want you to answer. Take the following summary, caption, and reasoning to come up with a final conclusion.

Give the final answer in a direct format, and it must be concise match the correct answer exactly. Do not ramble, just give the final answer, no other words. If it is a numeric value just answer with the number.

If it's a multiple choice question, the conclusion should only include the option without repeating what the option is.

Question:
{query}

Summary:
{summary}

Caption:
{caption}

Reasoning:
{reasoning}

Conclusion:
llama-3.2-11B-cot-separate-steps
llama-3.2-11B-cot-separate-steps
completed | 100 rows | 83122 tokens | $0.0150 | 3 iterations
c1957ee0-d55b-4345-ad8a-138a2302a003
Meta / Llama 3.2 11B Vision, image → text
Bessie
ox
5 months ago
{imgname}

I have an image and a question that I want you to answer. Outline a step-by-step thought process you would use to solve the problem based on the image.

Question:
{query}

Reasoning:
llama-3.2-11B-cot-separate-steps
llama-3.2-11B-cot-separate-steps
completed | 100 rows | 30674 tokens | $0.0055 | 1 iteration
8d745c64-e9d2-4a59-911c-235c5b712760
Meta / Llama 3.2 11B Vision, image → text
Bessie
ox
5 months ago
{imgname}

I have an image and a question that I want you to answer. Caption the image in detail. Describe the contents of the image, specifically focusing on details relevant to the question.

Question:
{query}

Caption:
llama-3.2-11B-cot-separate-steps
llama-3.2-11B-cot-separate-steps
completed | 100 rows | 32421 tokens | $0.0058 | 2 iterations
bcc8c95f-63de-402e-8193-dd3b18ff4a9e
Meta / Llama 3.2 11B Vision, image → text
Bessie
ox
5 months ago
{imgname}

I have an image and a question that I want you to answer. Summarize everything everything you would need to do to answer the question.

Question:
{query}

Summary:
llama-3.2-11B-cot-separate-steps
completed | 100 rows | 24567 tokens | $0.0044 | 2 iterations
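The llama-3.2-11B-cot-separate-steps runs above split chain-of-thought into separate prompts: summary, caption, and reasoning are each generated from {query}, and a final prompt combines all three into a conclusion. The sketch below shows that chaining for a single row. It uses abbreviated versions of the prompts and a hypothetical `call_model` callable standing in for the model. Image handling ({imgname}) is left out, since how the image is passed depends on the model API.

```python
from typing import Callable, Dict

# Abbreviated intermediate prompts; each output is stored as a new column.
STEP_PROMPTS: Dict[str, str] = {
    "summary": "Summarize what you would need to do to answer the question.\n\n"
               "Question:\n{query}\n\nSummary:",
    "caption": "Caption the image in detail.\n\nQuestion:\n{query}\n\nCaption:",
    "reasoning": "Outline a step-by-step thought process.\n\n"
                 "Question:\n{query}\n\nReasoning:",
}

# The final prompt consumes the three generated columns.
FINAL_PROMPT = (
    "Question:\n{query}\n\nSummary:\n{summary}\n\nCaption:\n{caption}\n\n"
    "Reasoning:\n{reasoning}\n\nConclusion:"
)


def run_separate_steps(row: Dict[str, str], call_model: Callable[[str], str]) -> Dict[str, str]:
    """Run the intermediate steps, then the conclusion step, for one row."""
    result = dict(row)  # copy so the input row is not modified
    for column, template in STEP_PROMPTS.items():
        result[column] = call_model(template.format(**result))
    result["conclusion"] = call_model(FINAL_PROMPT.format(**result))
    return result
```

Because each step writes a named column, any single step can be rerun with a different model or prompt without repeating the others.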
315ba916-3d6b-4f45-9ac6-931166df3682
Meta / Llama 3.2 90B Vision (Preview), image → text
Bessie
ox
5 months ago
{imgname}

I have an image and a question that I want you to answer. Outline a step-by-step thought process you would use to solve the problem based on the image.

Question:
{query}

Reasoning:
llama-90B-CoT-separate-steps
llama-90B-CoT-separate-steps
completed | 100 rows | 29807 tokens | $0.0268 | 1 iteration
3541fabd-5121-43cc-9059-dc04e061196e
Meta / Llama 3.2 90B Vision (Preview), image → text
Bessie
ox
5 months ago
{imgname}

I have an image and a question that I want you to answer. Caption the image in detail. Describe the contents of the image, specifically focusing on details relevant to the question.

Question: {query}
llama-90B-CoT-separate-steps
llama-90B-CoT-separate-steps
completed | 100 rows | 23480 tokens | $0.0211 | 1 iteration
9bf56263-d835-4247-a853-046d32cc67b9
Meta / Llama 3.2 90B Vision (Preview), image → text
Bessie
ox
5 months ago
{imgname}

I have an image and a question that I want you to answer. Summarize everything everything you would need to do to answer the question. Describe how you will approach the problem step by step and create a plan.

Question: {query}
llama-90B/summaries
completed | 100 rows | 31030 tokens | $0.0279 | 2 iterations
f76f2a5e-7573-4c2b-b8f5-491a9089ee38
OpenAI / GPT 4o mini, text → text
Bessie
ox
5 months ago
Extract the conclusion from the text, respond with only the text after the <CONCLUSION> tag

{prediction}
Llama-3.2-90B-CoT-100ex
Llama-3.2-90B-CoT-100ex
completed | 100 rows | 19045 tokens | $0.0034 | 2 iterations
70d53820-a048-47f0-9a8c-7e165956ca2e
Meta / Llama 3.2 90B Vision (Preview), image → text
Bessie
ox
5 months ago

{imgname}

Here is image and a question that I want you to answer. I need you to strictly follow the format with four specific sections: 

<SUMMARY></SUMMARY>
<CAPTION></CAPTION>
<REASONING></REASONING>
<CONCLUSION></CONCLUSION>. 

It is crucial that you adhere to this structure exactly as outlined and that the final answer in the <CONCLUSION></CONCLUSION> matches the standard correct answer precisely.

To explain further: 

SUMMARY: briefly explain what steps you'll take to solve the problem.
CAPTION: describe the contents of the image in as much detail as possible, specifically focusing on details relevant to the question.
REASONING: outline a step-by-step thought process you would use to solve the problem based on the image.
CONCLUSION: give the final answer in a direct format, and it must match the correct answer exactly. 

If it's a multiple choice question, the conclusion should only include the option without repeating what the option is.

Here's how the xml response format should look:

<SUMMARY>
  Summarize how you will approach the problem and explain the steps you will take to reach the answer.
</SUMMARY>
<CAPTION>
  Provide a detailed description of the image, particularly emphasizing the aspects related to the question.
</CAPTION>
<REASONING>
  Provide a chain-of-thought, logical explanation of the problem. This should outline step-by-step reasoning.
</REASONING>
<CONCLUSION>
  State the final answer in a clear and direct format. It must match the correct answer exactly.
</CONCLUSION> 

(Do not forget the <CONCLUSION></CONCLUSION>!)

Please apply this format meticulously putting each section in xml tags like above. Analyze the given image and answer the related question, ensuring that the answer matches the standard one perfectly.

<QUESTION>
  {query}
</QUESTION>
Llama-3.2-90B-CoT-100ex
completed | 100 rows | 52502 tokens | $0.0473 | 1 iteration
6869326f-c1ff-45f4-be0b-b5a1d485683e
OpenAI / GPT 4o mini, text → text
Bessie
ox
5 months ago
Extract the conclusion from the text, respond with only the text after the <CONCLUSION> tag

{prediction}
Llama-3.2-11B-CoT-100ex
Llama-3.2-11B-CoT-100ex
completed | 100 rows | 28659 tokens | $0.0046 | 2 iterations
6719bc92-cb04-4ef2-bbd9-96f66bf55ca0
OpenAI / GPT 4o mini, text → text
Bessie
ox
5 months ago
Extract the conclusion from the text, respond with only the text after the <CONCLUSION> tag

{prediction}
gpt-4o-cot
gpt-4o-cot
completed | 100 rows | 25332 tokens | $0.0039 | 3 iterations
bb568442-5903-4f00-a112-dc46fd34a8cd
OpenAI / GPT 4o, image → text
Bessie
ox
5 months ago

{imgname}

Here is image and a question that I want you to answer. I need you to strictly follow the format with four specific sections: 

<SUMMARY></SUMMARY>
<CAPTION></CAPTION>
<REASONING></REASONING>
<CONCLUSION></CONCLUSION>. 

It is crucial that you adhere to this structure exactly as outlined and that the final answer in the <CONCLUSION></CONCLUSION> matches the standard correct answer precisely.

To explain further: 

SUMMARY: briefly explain what steps you'll take to solve the problem.
CAPTION: describe the contents of the image in as much detail as possible, specifically focusing on details relevant to the question.
REASONING: outline a step-by-step thought process you would use to solve the problem based on the image.
CONCLUSION: give the final answer in a direct format, and it must match the correct answer exactly. 

If it's a multiple choice question, the conclusion should only include the option without repeating what the option is.

Here's how the xml response format should look:

<SUMMARY>
  Summarize how you will approach the problem and explain the steps you will take to reach the answer.
</SUMMARY>
<CAPTION>
  Provide a detailed description of the image, particularly emphasizing the aspects related to the question.
</CAPTION>
<REASONING>
  Provide a chain-of-thought, logical explanation of the problem. This should outline step-by-step reasoning.
</REASONING>
<CONCLUSION>
  State the final answer in a clear and direct format. It must match the correct answer exactly.
</CONCLUSION> 

(Do not forget the <CONCLUSION></CONCLUSION>!)

Please apply this format meticulously putting each section in xml tags like above. Analyze the given image and answer the related question, ensuring that the answer matches the standard one perfectly.

<QUESTION>
  {query}
</QUESTION>
completed | 100 rows | 132771 tokens | $0.4999 | 1 iteration
85d938b9-6668-4581-9fd4-7595bcd0304a
Meta / Llama 3.2 11B Vision, image → text
Bessie
ox
5 months ago
{imgname}

Here is image and a question that I want you to answer. I need you to strictly follow the format with four specific sections: 

<SUMMARY></SUMMARY>
<CAPTION></CAPTION>
<REASONING></REASONING>
<CONCLUSION></CONCLUSION>. 

It is crucial that you adhere to this structure exactly as outlined and that the final answer in the <CONCLUSION></CONCLUSION> matches the standard correct answer precisely.

To explain further: 

SUMMARY: briefly explain what steps you'll take to solve the problem.
CAPTION: describe the contents of the image in as much detail as possible, specifically focusing on details relevant to the question.
REASONING: outline a step-by-step thought process you would use to solve the problem based on the image.
CONCLUSION: give the final answer in a direct format, and it must match the correct answer exactly. 

If it's a multiple choice question, the conclusion should only include the option without repeating what the option is.

Here's how the xml response format should look:

<SUMMARY>
  Summarize how you will approach the problem and explain the steps you will take to reach the answer.
</SUMMARY>
<CAPTION>
  Provide a detailed description of the image, particularly emphasizing the aspects related to the question.
</CAPTION>
<REASONING>
  Provide a chain-of-thought, logical explanation of the problem. This should outline step-by-step reasoning.
</REASONING>
<CONCLUSION>
  State the final answer in a clear and direct format. It must match the correct answer exactly.
</CONCLUSION> 

(Do not forget the <CONCLUSION></CONCLUSION>!)

Please apply this format meticulously putting each section in xml tags like above. Analyze the given image and answer the related question, ensuring that the answer matches the standard one perfectly.

<QUESTION>
  {query}
</QUESTION>
Llama-3.2-11B-CoT-100ex
completed | 100 rows | 69199 tokens | $0.0111 | 1 iteration