Evaluations
Run models against your data
Evaluations let you test and compare a selection of AI models against your datasets.
Whether you're fine-tuning models or measuring performance, Oxen evaluations run a prompt through every row of a dataset so you can compare the outputs side by side.
Once you're happy with the results, output the resulting dataset to a new file, another branch, or directly as a new commit.
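Judging from the prompts listed on this page, placeholders such as {query} and {prediction} appear to name dataset columns whose values are substituted into the prompt for each row. The sketch below illustrates that idea in plain Python under that assumption. The names render_prompt and run_evaluation and the stand-in model are hypothetical and are not part of Oxen's API.

```python
import re
from typing import Callable, Iterable, Mapping

# Matches {column_name}; other braces in the prompt are left untouched.
PLACEHOLDER = re.compile(r"\{(\w+)\}")


def render_prompt(template: str, row: Mapping[str, object]) -> str:
    """Fill each {column} placeholder with the row's value for that column."""

    def substitute(match: re.Match) -> str:
        column = match.group(1)
        value = row.get(column)
        if value is None:
            # Fail loudly instead of sending a half-filled prompt to the model.
            raise KeyError(f"row has no value for column {column!r}")
        return str(value)

    return PLACEHOLDER.sub(substitute, template)


def run_evaluation(
    template: str,
    rows: Iterable[Mapping[str, object]],
    model: Callable[[str], str],
) -> list:
    """Run the rendered prompt for every row and add a 'prediction' column."""
    results = []
    for row in rows:
        prompt = render_prompt(template, row)
        results.append({**row, "prediction": model(prompt)})
    return results


if __name__ == "__main__":
    # A stand-in "model" that just upper-cases the prompt (hypothetical).
    rows = [{"query": "What colour is the sky?"}]
    template = "Answer the following question very concisely. {query}"
    for result in run_evaluation(template, rows, lambda prompt: prompt.upper()):
        print(result)
```

Raising on a missing or empty column is a design choice for this sketch: it surfaces bad rows immediately rather than producing prompts with unfilled placeholders.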
37ca1f44-1dcf-4dcb-846d-365e6fc4e270


ox
5 months ago
Rephrase the following question but keep the intent the same. Only respond with the new question. {query}
main
1d089fc6-e0af-400d-9328-2aee62229752

ox
5 months ago
Extract the conclusion and just respond with the text within the <CONCLUSION></CONCLUSION> tag. For example if the tag says <CONCLUSION>1%</CONCLUSION> just respond with "1%". {prediction}
qwen-72B-results
qwen-72B-results
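The evaluation above asks a model to pull the answer out of the <CONCLUSION></CONCLUSION> tag. As a point of comparison, the same step can be sketched deterministically with a regular expression. This is a swapped-in alternative for illustration, not what the evaluation itself runs, and the function name extract_conclusion is hypothetical.

```python
import re
from typing import Optional

# Non-greedy match so each tag pair is captured separately; DOTALL lets the
# conclusion span several lines.
CONCLUSION = re.compile(
    r"<CONCLUSION>(.*?)</CONCLUSION>", re.DOTALL | re.IGNORECASE
)


def extract_conclusion(prediction: str) -> Optional[str]:
    """Return the text inside the last <CONCLUSION> tag pair, or None."""
    matches = CONCLUSION.findall(prediction)
    if not matches:
        return None
    # Use the last pair in case the response repeats the format instructions
    # before giving its actual answer.
    return matches[-1].strip()


if __name__ == "__main__":
    response = "<REASONING>Half of 2% is 1%.</REASONING> <CONCLUSION> 1% </CONCLUSION>"
    print(extract_conclusion(response))
```

A regex only works when the model actually closes the tag; a model-based extraction step can be more forgiving of malformed responses.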
eca8d328-3a7d-4d48-ade0-df10a3dffaf2

ox
5 months ago
{imgname} Here is image and a question that I want you to answer. I need you to strictly follow the format with four specific sections: <SUMMARY></SUMMARY> <CAPTION></CAPTION> <REASONING></REASONING> <CONCLUSION></CONCLUSION>. It is crucial that you adhere to this structure exactly as outlined and that the final answer in the <CONCLUSION></CONCLUSION> matches the standard correct answer precisely. To explain further: SUMMARY: briefly explain what steps you'll take to solve the problem. CAPTION: describe the contents of the image in as much detail as possible, specifically focusing on details relevant to the question. REASONING: outline a step-by-step thought process you would use to solve the problem based on the image. CONCLUSION: give the final answer in a direct format, and it must match the correct answer exactly. If it's a multiple choice question, the conclusion should only include the option without repeating what the option is. Here's how the xml response format should look: <SUMMARY> Summarize how you will approach the problem and explain the steps you will take to reach the answer. </SUMMARY> <CAPTION> Provide a detailed description of the image, particularly emphasizing the aspects related to the question. </CAPTION> <REASONING> Provide a chain-of-thought, logical explanation of the problem. This should outline step-by-step reasoning. </REASONING> <CONCLUSION> State the final answer in a clear and direct format. It must match the correct answer exactly. </CONCLUSION> (Do not forget the <CONCLUSION></CONCLUSION>!) Please apply this format meticulously putting each section in xml tags like above. Analyze the given image and answer the related question, ensuring that the answer matches the standard one perfectly. <QUESTION> {query} </QUESTION>
main
qwen-72B-results
d7c091da-c2ad-4993-bd87-dabe52f286ce

ox
5 months ago
{imgname} Answer the following question very concisely. Respond with one word if possible {query}
main
results-2
154b8be9-7ee8-4b11-9424-2a7efb2c7d13

ox
5 months ago
Are the two responses equivalent? Ignore punctuation and irrelevant characters and differences in verb tense. Reply with true or false. One word all lowercase. Response 1: {label} Response 2: {prediction}
llama-3.2-11B-direct-answers
llama-3.2-11B-direct-answers
466c5926-6157-4ff6-a8e7-f0bbd5bd8fb3

ox
5 months ago
{imgname} Answer the following question succinctly with a single word if possible. Question: {query}
main
llama-3.2-11B-direct-answers
d61b6b29-ff7b-4332-9192-6376fd661469

ox
5 months ago
Are the two responses equivalent? Ignore punctuation and irrelevant characters and differences in verb tense. Reply with true or false. One word all lowercase. Response 1: {label} Response 2: {conclusion}
llama-3.2-11B-cot-separate-steps
llama-3.2-11B-cot-separate-steps
e47608b7-a6f2-4c03-b32f-0a6cb7b85a48

ox
5 months ago
{imgname} I have an image and a question that I want you to answer. Take the following summary, caption, and reasoning to come up with a final conclusion. Give the final answer in a direct format, and it must be concise match the correct answer exactly. Do not ramble, just give the final answer, no other words. If it is a numeric value just answer with the number. If it's a multiple choice question, the conclusion should only include the option without repeating what the option is. Question: {query} Summary: {summary} Caption: {caption} Reasoning: {reasoning} Conclusion:
llama-90B-CoT-separate-steps
error: no case clause matching: {:error, "An exception occurred indexing, getting dataframe and running evaluation: %FunctionClauseError{module: String, function: :replace, arity: 4, kind: nil, args: nil, clauses: nil}", 0, 0}
1 / 100 rows, 1984 tokens, $0.0000, 1 iteration
428ae1bf-1c58-42b0-bd5f-3a90a5aa7637

ox
5 months ago
{imgname} I have an image and a question that I want you to answer. Take the following summary, caption, and reasoning to come up with a final conclusion. Give the final answer in a direct format, and it must be concise match the correct answer exactly. Do not ramble, just give the final answer, no other words. If it is a numeric value just answer with the number. If it's a multiple choice question, the conclusion should only include the option without repeating what the option is. Question: {query} Summary: {summary} Caption: {caption} Reasoning: {reasoning} Conclusion:
llama-3.2-11B-cot-separate-steps
llama-3.2-11B-cot-separate-steps
c1957ee0-d55b-4345-ad8a-138a2302a003

ox
5 months ago
{imgname} I have an image and a question that I want you to answer. Outline a step-by-step thought process you would use to solve the problem based on the image. Question: {query} Reasoning:
llama-3.2-11B-cot-separate-steps
llama-3.2-11B-cot-separate-steps
8d745c64-e9d2-4a59-911c-235c5b712760

ox
5 months ago
{imgname} I have an image and a question that I want you to answer. Caption the image in detail. Describe the contents of the image, specifically focusing on details relevant to the question. Question: {query} Caption:
llama-3.2-11B-cot-separate-steps
llama-3.2-11B-cot-separate-steps
bcc8c95f-63de-402e-8193-dd3b18ff4a9e

ox
5 months ago
{imgname} I have an image and a question that I want you to answer. Summarize everything everything you would need to do to answer the question. Question: {query} Summary:
main
llama-3.2-11B-cot-separate-steps
315ba916-3d6b-4f45-9ac6-931166df3682

ox
5 months ago
{imgname} I have an image and a question that I want you to answer. Outline a step-by-step thought process you would use to solve the problem based on the image. Question: {query} Reasoning:
llama-90B-CoT-separate-steps
llama-90B-CoT-separate-steps
3541fabd-5121-43cc-9059-dc04e061196e

ox
5 months ago
{imgname} I have an image and a question that I want you to answer. Caption the image in detail. Describe the contents of the image, specifically focusing on details relevant to the question. Question: {query}
llama-90B-CoT-separate-steps
llama-90B-CoT-separate-steps
9bf56263-d835-4247-a853-046d32cc67b9

ox
5 months ago
{imgname} I have an image and a question that I want you to answer. Summarize everything everything you would need to do to answer the question. Describe how you will approach the problem step by step and create a plan. Question: {query}
main
llama-90B/summaries
f76f2a5e-7573-4c2b-b8f5-491a9089ee38

ox
5 months ago
Extract the conclusion from the text, respond with only the text after the <CONCLUSION> tag {prediction}
Llama-3.2-90B-CoT-100ex
Llama-3.2-90B-CoT-100ex
70d53820-a048-47f0-9a8c-7e165956ca2e

ox
5 months ago
{imgname} Here is image and a question that I want you to answer. I need you to strictly follow the format with four specific sections: <SUMMARY></SUMMARY> <CAPTION></CAPTION> <REASONING></REASONING> <CONCLUSION></CONCLUSION>. It is crucial that you adhere to this structure exactly as outlined and that the final answer in the <CONCLUSION></CONCLUSION> matches the standard correct answer precisely. To explain further: SUMMARY: briefly explain what steps you'll take to solve the problem. CAPTION: describe the contents of the image in as much detail as possible, specifically focusing on details relevant to the question. REASONING: outline a step-by-step thought process you would use to solve the problem based on the image. CONCLUSION: give the final answer in a direct format, and it must match the correct answer exactly. If it's a multiple choice question, the conclusion should only include the option without repeating what the option is. Here's how the xml response format should look: <SUMMARY> Summarize how you will approach the problem and explain the steps you will take to reach the answer. </SUMMARY> <CAPTION> Provide a detailed description of the image, particularly emphasizing the aspects related to the question. </CAPTION> <REASONING> Provide a chain-of-thought, logical explanation of the problem. This should outline step-by-step reasoning. </REASONING> <CONCLUSION> State the final answer in a clear and direct format. It must match the correct answer exactly. </CONCLUSION> (Do not forget the <CONCLUSION></CONCLUSION>!) Please apply this format meticulously putting each section in xml tags like above. Analyze the given image and answer the related question, ensuring that the answer matches the standard one perfectly. <QUESTION> {query} </QUESTION>
main
Llama-3.2-90B-CoT-100ex
6869326f-c1ff-45f4-be0b-b5a1d485683e

ox
5 months ago
Extract the conclusion from the text, respond with only the text after the <CONCLUSION> tag {prediction}
Llama-3.2-11B-CoT-100ex
Llama-3.2-11B-CoT-100ex
6719bc92-cb04-4ef2-bbd9-96f66bf55ca0

ox
5 months ago
Extract the conclusion from the text, respond with only the text after the <CONCLUSION> tag {prediction}
gpt-4o-cot
gpt-4o-cot
bb568442-5903-4f00-a112-dc46fd34a8cd

ox
5 months ago
{imgname} Here is image and a question that I want you to answer. I need you to strictly follow the format with four specific sections: <SUMMARY></SUMMARY> <CAPTION></CAPTION> <REASONING></REASONING> <CONCLUSION></CONCLUSION>. It is crucial that you adhere to this structure exactly as outlined and that the final answer in the <CONCLUSION></CONCLUSION> matches the standard correct answer precisely. To explain further: SUMMARY: briefly explain what steps you'll take to solve the problem. CAPTION: describe the contents of the image in as much detail as possible, specifically focusing on details relevant to the question. REASONING: outline a step-by-step thought process you would use to solve the problem based on the image. CONCLUSION: give the final answer in a direct format, and it must match the correct answer exactly. If it's a multiple choice question, the conclusion should only include the option without repeating what the option is. Here's how the xml response format should look: <SUMMARY> Summarize how you will approach the problem and explain the steps you will take to reach the answer. </SUMMARY> <CAPTION> Provide a detailed description of the image, particularly emphasizing the aspects related to the question. </CAPTION> <REASONING> Provide a chain-of-thought, logical explanation of the problem. This should outline step-by-step reasoning. </REASONING> <CONCLUSION> State the final answer in a clear and direct format. It must match the correct answer exactly. </CONCLUSION> (Do not forget the <CONCLUSION></CONCLUSION>!) Please apply this format meticulously putting each section in xml tags like above. Analyze the given image and answer the related question, ensuring that the answer matches the standard one perfectly. <QUESTION> {query} </QUESTION>
main
gpt-4o-cot
85d938b9-6668-4581-9fd4-7595bcd0304a

ox
5 months ago
{imgname} Here is image and a question that I want you to answer. I need you to strictly follow the format with four specific sections: <SUMMARY></SUMMARY> <CAPTION></CAPTION> <REASONING></REASONING> <CONCLUSION></CONCLUSION>. It is crucial that you adhere to this structure exactly as outlined and that the final answer in the <CONCLUSION></CONCLUSION> matches the standard correct answer precisely. To explain further: SUMMARY: briefly explain what steps you'll take to solve the problem. CAPTION: describe the contents of the image in as much detail as possible, specifically focusing on details relevant to the question. REASONING: outline a step-by-step thought process you would use to solve the problem based on the image. CONCLUSION: give the final answer in a direct format, and it must match the correct answer exactly. If it's a multiple choice question, the conclusion should only include the option without repeating what the option is. Here's how the xml response format should look: <SUMMARY> Summarize how you will approach the problem and explain the steps you will take to reach the answer. </SUMMARY> <CAPTION> Provide a detailed description of the image, particularly emphasizing the aspects related to the question. </CAPTION> <REASONING> Provide a chain-of-thought, logical explanation of the problem. This should outline step-by-step reasoning. </REASONING> <CONCLUSION> State the final answer in a clear and direct format. It must match the correct answer exactly. </CONCLUSION> (Do not forget the <CONCLUSION></CONCLUSION>!) Please apply this format meticulously putting each section in xml tags like above. Analyze the given image and answer the related question, ensuring that the answer matches the standard one perfectly. <QUESTION> {query} </QUESTION>
main
Llama-3.2-11B-CoT-100ex