Repository evaluations - datasets/ChartQA

Evaluations/Judge 11B Direct Responses

llama-3.2-11B-direct-answers

val_100_ex.json

Type: text → text

Model:

OpenAI/GPT 4o mini

Provider:

OpenAI

Target field: are_equivalent

Prompt

Are the two responses equivalent? Ignore punctuation and irrelevant characters and differences in verb tense. Reply with true or false. One word all lowercase.

Response 1:
{label}

Response 2:
{prediction}

llama-3.2-11B-direct-answers

val_100_ex.json

Queued: Dec 6, 2024, 5:42 PM UTC

Completed: Dec 6, 2024, 5:42 PM UTC

100 rows

5285 tokens$ 0.0008

100 rows processed, 5285 tokens used ($0.0008)

completed

5 columns, 100 rows