Repository evaluations - datasets/ChartQA

Evaluations

Run models against your data

Evaluations is a feature for testing and comparing a selection of AI models against your datasets.

Whether you're fine-tuning models or evaluating performance metrics, Oxen Evaluations simplifies the process by letting you run a prompt through an entire dataset.

Once you're happy with the results, output the resulting dataset to a new file, another branch, or directly as a new commit.
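The prompts listed below reference dataset columns with placeholders such as {query} or {prediction}. As a rough local sketch of that idea (not Oxen's implementation), the snippet below fills a template once per row. The `render_prompt` helper and the sample rows are hypothetical, made up for illustration.

```python
import re

def render_prompt(template: str, row: dict) -> str:
    """Replace each {column} placeholder with the row's value for that column."""
    def substitute(match: re.Match) -> str:
        column = match.group(1)
        if column not in row:
            raise KeyError(f"row has no column named {column!r}")
        return str(row[column])
    # Only {word} patterns are treated as placeholders; other braces are left alone.
    return re.sub(r"\{(\w+)\}", substitute, template)

template = (
    "Rephrase the following question but keep the intent the same. "
    "Only respond with the new question.\n\n{query}"
)

# Made-up sample rows standing in for a dataset file.
rows = [
    {"query": "Which year has the highest value?"},
    {"query": "What is the sum of the two smallest bars?"},
]

prompts = [render_prompt(template, row) for row in rows]
```

A regular expression is used here instead of `str.format` so that a prompt containing other braces is left untouched.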

Synthetic Question Generation

37ca1f44-1dcf-4dcb-846d-365e6fc4e270

Nous Research/Hermes 3 405B | text → text

5 months ago

Prompt

Rephrase the following question but keep the intent the same. Only respond with the new question.

{query}

main

val_human.json

completed | 5 row sample | 300 tokens | $0.0003 | 12 iterations

Answer Extraction from Qwen Results

1d089fc6-e0af-400d-9328-2aee62229752

OpenAI/GPT 4o mini | text → text

5 months ago

Prompt

Extract the conclusion and just respond with the text within the <CONCLUSION></CONCLUSION> tag.

For example if the tag says <CONCLUSION>1%</CONCLUSION> just respond with "1%".

{prediction}

qwen-72B-results

val_100_ex.json

qwen-72B-results

val_100_ex.json

completed | 100 rows | 27430 tokens | $0.0042 | 3 iterations
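This evaluation uses a model to pull the answer out of the <CONCLUSION> tag. When the tags are well formed, a regular expression can do a similar job locally. This is a minimal sketch and an alternative to the model-based extraction, not the method the evaluation ran. `extract_conclusion` is a hypothetical helper name.

```python
import re
from typing import Optional

# DOTALL lets the conclusion span several lines; IGNORECASE tolerates <conclusion>.
CONCLUSION_RE = re.compile(
    r"<CONCLUSION>(.*?)</CONCLUSION>", re.DOTALL | re.IGNORECASE
)

def extract_conclusion(prediction: str) -> Optional[str]:
    """Return the text inside the first CONCLUSION tag, or None if it is absent."""
    match = CONCLUSION_RE.search(prediction)
    return match.group(1).strip() if match else None

answer = extract_conclusion("<REASONING>...</REASONING>\n<CONCLUSION>\n  1%\n</CONCLUSION>")
```

A regex returns None when the closing tag is missing, whereas a model may still recover an answer from malformed output, which is one reason to prefer the model-based step.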

Qwen 72B Vision CoT

eca8d328-3a7d-4d48-ade0-df10a3dffaf2

Qwen/Qwen2 VL 72B Instruct | image → text

5 months ago

Prompt

{imgname}

Here is image and a question that I want you to answer. I need you to strictly follow the format with four specific sections: 

<SUMMARY></SUMMARY>
<CAPTION></CAPTION>
<REASONING></REASONING>
<CONCLUSION></CONCLUSION>. 

It is crucial that you adhere to this structure exactly as outlined and that the final answer in the <CONCLUSION></CONCLUSION> matches the standard correct answer precisely.

To explain further: 

SUMMARY: briefly explain what steps you'll take to solve the problem.
CAPTION: describe the contents of the image in as much detail as possible, specifically focusing on details relevant to the question.
REASONING: outline a step-by-step thought process you would use to solve the problem based on the image.
CONCLUSION: give the final answer in a direct format, and it must match the correct answer exactly. 

If it's a multiple choice question, the conclusion should only include the option without repeating what the option is.

Here's how the xml response format should look:

<SUMMARY>
  Summarize how you will approach the problem and explain the steps you will take to reach the answer.
</SUMMARY>
<CAPTION>
  Provide a detailed description of the image, particularly emphasizing the aspects related to the question.
</CAPTION>
<REASONING>
  Provide a chain-of-thought, logical explanation of the problem. This should outline step-by-step reasoning.
</REASONING>
<CONCLUSION>
  State the final answer in a clear and direct format. It must match the correct answer exactly.
</CONCLUSION> 

(Do not forget the <CONCLUSION></CONCLUSION>!)

Please apply this format meticulously putting each section in xml tags like above. Analyze the given image and answer the related question, ensuring that the answer matches the standard one perfectly.

<QUESTION>
  {query}
</QUESTION>

main

val_100_ex.json

qwen-72B-results

val_100_ex.json

completed | 100 rows | 122053 tokens | $0.1098 | 2 iterations
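The prompt above asks for four tagged sections. One way to check whether a response followed that format is to parse each section and list the ones that are missing. The sketch below assumes the tags appear exactly as written in the prompt; `parse_sections` and `missing_sections` are hypothetical helper names.

```python
import re

SECTIONS = ("SUMMARY", "CAPTION", "REASONING", "CONCLUSION")

def parse_sections(response: str) -> dict:
    """Map each lowercase section name to its text, or None if the tag pair is absent."""
    parsed = {}
    for name in SECTIONS:
        match = re.search(rf"<{name}>(.*?)</{name}>", response, re.DOTALL)
        parsed[name.lower()] = match.group(1).strip() if match else None
    return parsed

def missing_sections(response: str) -> list:
    """List the sections the response left out, in prompt order."""
    return [name for name, text in parse_sections(response).items() if text is None]

sample = (
    "<SUMMARY>Read the bars.</SUMMARY>\n"
    "<CAPTION>A bar chart with three bars.</CAPTION>\n"
    "<REASONING>The tallest bar is labelled 42.</REASONING>\n"
)
problems = missing_sections(sample)
```

The lowercase keys are chosen to line up with the {summary}, {caption}, {reasoning}, and {conclusion} placeholders used by other prompts on this page.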

d7c091da-c2ad-4993-bd87-dabe52f286ce

Meta/Llama 3.2 11B Vision | image → text

5 months ago

Prompt

{imgname}

Answer the following question very concisely. Respond with one word if possible

{query}

main

val_100_ex.json

results-2

val_100_ex.json

completed | 100 rows | 4609 tokens | $0.0008 | 3 iterations

Judge 11B Direct Responses

154b8be9-7ee8-4b11-9424-2a7efb2c7d13

OpenAI/GPT 4o mini | text → text

5 months ago

Prompt

Are the two responses equivalent? Ignore punctuation and irrelevant characters and differences in verb tense. Reply with true or false. One word all lowercase.

Response 1:
{label}

Response 2:
{prediction}

llama-3.2-11B-direct-answers

val_100_ex.json

llama-3.2-11B-direct-answers

val_100_ex.json

completed | 100 rows | 5285 tokens | $0.0008 | 4 iterations
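The judge replies with a single lowercase word, so its output can be rolled up into an accuracy figure. The sketch below also includes a cheap string-normalization baseline to compare against the judge. Both helpers are hypothetical and not part of Oxen. The baseline only ignores case, surrounding punctuation, and extra whitespace; it does not handle verb tense as the judge prompt asks.

```python
import string

def normalize(text: str) -> str:
    """Lowercase, trim surrounding punctuation and whitespace, collapse inner spaces."""
    trimmed = text.strip().lower().strip(string.punctuation + string.whitespace)
    return " ".join(trimmed.split())

def baseline_equivalent(label: str, prediction: str) -> bool:
    """Cheap stand-in for the judge: compare the normalized strings."""
    return normalize(label) == normalize(prediction)

def accuracy(verdicts: list) -> float:
    """Fraction of judge verdicts that read 'true'."""
    if not verdicts:
        return 0.0
    return sum(v.strip().lower() == "true" for v in verdicts) / len(verdicts)

# Made-up verdicts standing in for the judge's output column.
score = accuracy(["true", "false", "true", "true"])
```

Rows where the baseline and the judge disagree are a reasonable place to start when spot-checking the judge.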

Answer questions directly with Llama 3.2 11B

466c5926-6157-4ff6-a8e7-f0bbd5bd8fb3

Meta/Llama 3.2 11B Vision | image → text

5 months ago

Prompt

{imgname}

Answer the following question succinctly with a single word if possible.

Question:
{query}

main

val_100_ex.json

llama-3.2-11B-direct-answers

val_100_ex.json

completed | 100 rows | 4627 tokens | $0.0008 | 3 iterations

Judge 11B Responses

d61b6b29-ff7b-4332-9192-6376fd661469

OpenAI/GPT 4o | text → text

5 months ago

Prompt

Are the two responses equivalent? Ignore punctuation and irrelevant characters and differences in verb tense. Reply with true or false. One word all lowercase.

Response 1:
{label}

Response 2:
{conclusion}

llama-3.2-11B-cot-separate-steps

val_100_ex.json

llama-3.2-11B-cot-separate-steps

val_100_ex.json

completed | 100 rows | 5178 tokens | $0.0137 | 5 iterations

Llama 90B Conclusions

e47608b7-a6f2-4c03-b32f-0a6cb7b85a48

Meta/Llama 3.2 11B Vision | image → text

5 months ago

Prompt

{imgname}

I have an image and a question that I want you to answer. Take the following summary, caption, and reasoning to come up with a final conclusion.

Give the final answer in a direct format, and it must be concise match the correct answer exactly. Do not ramble, just give the final answer, no other words. If it is a numeric value just answer with the number.

If it's a multiple choice question, the conclusion should only include the option without repeating what the option is.

Question:
{query}

Summary:
{summary}

Caption:
{caption}

Reasoning:
{reasoning}

Conclusion:

llama-90B-CoT-separate-steps

val_100_ex.json

N/A

val_100_ex.json

error | no case clause matching: {:error, "An exception occurred indexing, getting dataframe and running evaluation: %FunctionClauseError{module: String, function: :replace, arity: 4, kind: nil, args: nil, clauses: nil}", 0, 0} | 1 / 100 rows | 1984 tokens | $0.0000 | 1 iteration

Llama 11B Conclusion

428ae1bf-1c58-42b0-bd5f-3a90a5aa7637

Meta/Llama 3.2 11B Vision | image → text

5 months ago

Prompt

{imgname}

I have an image and a question that I want you to answer. Take the following summary, caption, and reasoning to come up with a final conclusion.

Give the final answer in a direct format, and it must be concise match the correct answer exactly. Do not ramble, just give the final answer, no other words. If it is a numeric value just answer with the number.

If it's a multiple choice question, the conclusion should only include the option without repeating what the option is.

Question:
{query}

Summary:
{summary}

Caption:
{caption}

Reasoning:
{reasoning}

Conclusion:

llama-3.2-11B-cot-separate-steps

val_100_ex.json

llama-3.2-11B-cot-separate-steps

val_100_ex.json

completed | 100 rows | 83122 tokens | $0.0150 | 3 iterations
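The "separate steps" evaluations split the chain of thought into one run per section, with the conclusion prompt reading the {summary}, {caption}, and {reasoning} columns written by earlier runs. A minimal sketch of that chaining for a single row follows. The prompts are abbreviated, {imgname} is omitted because the sketch handles text only, and `run_steps` and `echo_model` are hypothetical stand-ins rather than Oxen APIs.

```python
from typing import Callable

# Abbreviated stand-ins for the four prompts, in the order the columns are needed.
STEPS = [
    ("summary", "Summarize what you would need to do.\n\nQuestion:\n{query}\n\nSummary:"),
    ("caption", "Caption the image in detail.\n\nQuestion:\n{query}\n\nCaption:"),
    ("reasoning", "Outline a step-by-step thought process.\n\nQuestion:\n{query}\n\nReasoning:"),
    (
        "conclusion",
        "Question:\n{query}\n\nSummary:\n{summary}\n\nCaption:\n{caption}\n\n"
        "Reasoning:\n{reasoning}\n\nConclusion:",
    ),
]

def run_steps(row: dict, model: Callable[[str], str]) -> dict:
    """Run each step in order, storing its output under the step's column name."""
    result = dict(row)  # copy so the caller's row is not modified
    for column, template in STEPS:
        result[column] = model(template.format(**result))
    return result

def echo_model(prompt: str) -> str:
    """Hypothetical stand-in for a real model call."""
    return f"<output for a {len(prompt)}-character prompt>"

row = run_steps({"query": "What is the value of the tallest bar?"}, echo_model)
```

Because each step writes a new column, a later step can only run once the columns it references exist, which is why the conclusion step comes last.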

Llama 11B Reasoning

c1957ee0-d55b-4345-ad8a-138a2302a003

Meta/Llama 3.2 11B Vision | image → text

5 months ago

Prompt

{imgname}

I have an image and a question that I want you to answer. Outline a step-by-step thought process you would use to solve the problem based on the image.

Question:
{query}

Reasoning:

llama-3.2-11B-cot-separate-steps

val_100_ex.json

llama-3.2-11B-cot-separate-steps

val_100_ex.json

completed | 100 rows | 30674 tokens | $0.0055 | 1 iteration

Llama 11B Captions

8d745c64-e9d2-4a59-911c-235c5b712760

Meta/Llama 3.2 11B Vision | image → text

5 months ago

Prompt

{imgname}

I have an image and a question that I want you to answer. Caption the image in detail. Describe the contents of the image, specifically focusing on details relevant to the question.

Question:
{query}

Caption:

llama-3.2-11B-cot-separate-steps

val_100_ex.json

llama-3.2-11B-cot-separate-steps

val_100_ex.json

completed | 100 rows | 32421 tokens | $0.0058 | 2 iterations

Llama 11B Summary

bcc8c95f-63de-402e-8193-dd3b18ff4a9e

Meta/Llama 3.2 11B Vision | image → text

5 months ago

Prompt

{imgname}

I have an image and a question that I want you to answer. Summarize everything everything you would need to do to answer the question.

Question:
{query}

Summary:

main

val_100_ex.json

llama-3.2-11B-cot-separate-steps

val_100_ex.json

completed | 100 rows | 24567 tokens | $0.0044 | 2 iterations

Llama 90B Reasoning

315ba916-3d6b-4f45-9ac6-931166df3682

Meta/Llama 3.2 90B Vision (Preview) | image → text

5 months ago

Prompt

{imgname}

I have an image and a question that I want you to answer. Outline a step-by-step thought process you would use to solve the problem based on the image.

Question:
{query}

Reasoning:

llama-90B-CoT-separate-steps

val_100_ex.json

llama-90B-CoT-separate-steps

val_100_ex.json

completed | 100 rows | 29807 tokens | $0.0268 | 1 iteration

Llama 90B Caption

3541fabd-5121-43cc-9059-dc04e061196e

Meta/Llama 3.2 90B Vision (Preview) | image → text

5 months ago

Prompt

{imgname}

I have an image and a question that I want you to answer. Caption the image in detail. Describe the contents of the image, specifically focusing on details relevant to the question.

Question: {query}

llama-90B-CoT-separate-steps

val_100_ex.json

llama-90B-CoT-separate-steps

val_100_ex.json

completed | 100 rows | 23480 tokens | $0.0211 | 1 iteration

Llama 90B

9bf56263-d835-4247-a853-046d32cc67b9

Meta/Llama 3.2 90B Vision (Preview) | image → text

5 months ago

Prompt

{imgname}

I have an image and a question that I want you to answer. Summarize everything everything you would need to do to answer the question. Describe how you will approach the problem step by step and create a plan.

Question: {query}

main

val_100_ex.json

llama-90B/summaries

val_100_ex.json

completed | 100 rows | 31030 tokens | $0.0279 | 2 iterations

Extract Conclusions Llama 3.2 90B

f76f2a5e-7573-4c2b-b8f5-491a9089ee38

OpenAI/GPT 4o mini | text → text

5 months ago

Prompt

Extract the conclusion from the text, respond with only the text after the <CONCLUSION> tag

{prediction}

Llama-3.2-90B-CoT-100ex

val_100_ex.json

Llama-3.2-90B-CoT-100ex

val_100_ex.json

completed | 100 rows | 19045 tokens | $0.0034 | 2 iterations

Llama 3.2 90B CoT Reasoning

70d53820-a048-47f0-9a8c-7e165956ca2e

Meta/Llama 3.2 90B Vision (Preview) | image → text

5 months ago

Prompt


{imgname}

Here is image and a question that I want you to answer. I need you to strictly follow the format with four specific sections: 

<SUMMARY></SUMMARY>
<CAPTION></CAPTION>
<REASONING></REASONING>
<CONCLUSION></CONCLUSION>. 

It is crucial that you adhere to this structure exactly as outlined and that the final answer in the <CONCLUSION></CONCLUSION> matches the standard correct answer precisely.

To explain further: 

SUMMARY: briefly explain what steps you'll take to solve the problem.
CAPTION: describe the contents of the image in as much detail as possible, specifically focusing on details relevant to the question.
REASONING: outline a step-by-step thought process you would use to solve the problem based on the image.
CONCLUSION: give the final answer in a direct format, and it must match the correct answer exactly. 

If it's a multiple choice question, the conclusion should only include the option without repeating what the option is.

Here's how the xml response format should look:

<SUMMARY>
  Summarize how you will approach the problem and explain the steps you will take to reach the answer.
</SUMMARY>
<CAPTION>
  Provide a detailed description of the image, particularly emphasizing the aspects related to the question.
</CAPTION>
<REASONING>
  Provide a chain-of-thought, logical explanation of the problem. This should outline step-by-step reasoning.
</REASONING>
<CONCLUSION>
  State the final answer in a clear and direct format. It must match the correct answer exactly.
</CONCLUSION> 

(Do not forget the <CONCLUSION></CONCLUSION>!)

Please apply this format meticulously putting each section in xml tags like above. Analyze the given image and answer the related question, ensuring that the answer matches the standard one perfectly.

<QUESTION>
  {query}
</QUESTION>

main

val_100_ex.json

Llama-3.2-90B-CoT-100ex

val_100_ex.json

completed | 100 rows | 52502 tokens | $0.0473 | 1 iteration

Extract Conclusions Llama 11B

6869326f-c1ff-45f4-be0b-b5a1d485683e

OpenAI/GPT 4o mini | text → text

5 months ago

Prompt

Extract the conclusion from the text, respond with only the text after the <CONCLUSION> tag

{prediction}

Llama-3.2-11B-CoT-100ex

val_100_ex.json

Llama-3.2-11B-CoT-100ex

val_100_ex.json

completed | 100 rows | 28659 tokens | $0.0046 | 2 iterations

Extract conclusions GPT-4o

6719bc92-cb04-4ef2-bbd9-96f66bf55ca0

OpenAI/GPT 4o mini | text → text

5 months ago

Prompt

Extract the conclusion from the text, respond with only the text after the <CONCLUSION> tag

{prediction}

gpt-4o-cot

val_100_ex.json

gpt-4o-cot

val_100_ex.json

completed | 100 rows | 25332 tokens | $0.0039 | 3 iterations

GPT-4o Chain of Thought

bb568442-5903-4f00-a112-dc46fd34a8cd

OpenAI/GPT 4o | image → text

5 months ago

Prompt


{imgname}

Here is image and a question that I want you to answer. I need you to strictly follow the format with four specific sections: 

<SUMMARY></SUMMARY>
<CAPTION></CAPTION>
<REASONING></REASONING>
<CONCLUSION></CONCLUSION>. 

It is crucial that you adhere to this structure exactly as outlined and that the final answer in the <CONCLUSION></CONCLUSION> matches the standard correct answer precisely.

To explain further: 

SUMMARY: briefly explain what steps you'll take to solve the problem.
CAPTION: describe the contents of the image in as much detail as possible, specifically focusing on details relevant to the question.
REASONING: outline a step-by-step thought process you would use to solve the problem based on the image.
CONCLUSION: give the final answer in a direct format, and it must match the correct answer exactly. 

If it's a multiple choice question, the conclusion should only include the option without repeating what the option is.

Here's how the xml response format should look:

<SUMMARY>
  Summarize how you will approach the problem and explain the steps you will take to reach the answer.
</SUMMARY>
<CAPTION>
  Provide a detailed description of the image, particularly emphasizing the aspects related to the question.
</CAPTION>
<REASONING>
  Provide a chain-of-thought, logical explanation of the problem. This should outline step-by-step reasoning.
</REASONING>
<CONCLUSION>
  State the final answer in a clear and direct format. It must match the correct answer exactly.
</CONCLUSION> 

(Do not forget the <CONCLUSION></CONCLUSION>!)

Please apply this format meticulously putting each section in xml tags like above. Analyze the given image and answer the related question, ensuring that the answer matches the standard one perfectly.

<QUESTION>
  {query}
</QUESTION>

main

val_100_ex.json

gpt-4o-cot

val_100_ex.json

completed | 100 rows | 132771 tokens | $0.4999 | 1 iteration

Llama 3.2 11B CoT Reasoning

85d938b9-6668-4581-9fd4-7595bcd0304a

Meta/Llama 3.2 11B Vision | image → text

5 months ago

Prompt

{imgname}

Here is image and a question that I want you to answer. I need you to strictly follow the format with four specific sections: 

<SUMMARY></SUMMARY>
<CAPTION></CAPTION>
<REASONING></REASONING>
<CONCLUSION></CONCLUSION>. 

It is crucial that you adhere to this structure exactly as outlined and that the final answer in the <CONCLUSION></CONCLUSION> matches the standard correct answer precisely.

To explain further: 

SUMMARY: briefly explain what steps you'll take to solve the problem.
CAPTION: describe the contents of the image in as much detail as possible, specifically focusing on details relevant to the question.
REASONING: outline a step-by-step thought process you would use to solve the problem based on the image.
CONCLUSION: give the final answer in a direct format, and it must match the correct answer exactly. 

If it's a multiple choice question, the conclusion should only include the option without repeating what the option is.

Here's how the xml response format should look:

<SUMMARY>
  Summarize how you will approach the problem and explain the steps you will take to reach the answer.
</SUMMARY>
<CAPTION>
  Provide a detailed description of the image, particularly emphasizing the aspects related to the question.
</CAPTION>
<REASONING>
  Provide a chain-of-thought, logical explanation of the problem. This should outline step-by-step reasoning.
</REASONING>
<CONCLUSION>
  State the final answer in a clear and direct format. It must match the correct answer exactly.
</CONCLUSION> 

(Do not forget the <CONCLUSION></CONCLUSION>!)

Please apply this format meticulously putting each section in xml tags like above. Analyze the given image and answer the related question, ensuring that the answer matches the standard one perfectly.

<QUESTION>
  {query}
</QUESTION>

main

val_100_ex.json

Llama-3.2-11B-CoT-100ex

val_100_ex.json

completed | 100 rows | 69199 tokens | $0.0111 | 1 iteration
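The token counts and costs shown for the four single-prompt chain-of-thought runs can be turned into a blended price per million tokens for a rough comparison. The figures below are copied from this page. The displayed costs are rounded to four decimals and the token counts presumably mix input and output tokens, so the rates are approximate. `usd_per_million_tokens` is a hypothetical helper.

```python
# (tokens, cost in USD) as displayed for each run on this page.
RUNS = {
    "Qwen 72B Vision CoT": (122053, 0.1098),
    "Llama 3.2 90B CoT Reasoning": (52502, 0.0473),
    "GPT-4o Chain of Thought": (132771, 0.4999),
    "Llama 3.2 11B CoT Reasoning": (69199, 0.0111),
}

def usd_per_million_tokens(tokens: int, cost_usd: float) -> float:
    """Blended cost per million tokens for one run."""
    return cost_usd / tokens * 1_000_000

rates = {name: usd_per_million_tokens(*run) for name, run in RUNS.items()}

# Print the runs from cheapest to most expensive per token.
for name, rate in sorted(rates.items(), key=lambda item: item[1]):
    print(f"{name}: ${rate:.2f} per million tokens")
```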