Evaluation 154b8be9-7ee8-4b11-9424-2a7efb2c7d13 - datasets/ChartQA

Total running cost: $0.0015

	Prompt	Rows	Type	Model	Target	Status	Runtime	Run	By	Tokens	Cost
Run	Are the two responses equivalent? Ignore punctuation and irrelevant characters and differences in verb tense. Reply with true or false. One word all lowercase. Response 1: {label} Response 2: {prediction}	100	text → text	OpenAI/GPT 4o mini	3b542b66339d9dfae2035d576c8905e0	completed	00:00:38	5 months ago	ox	5285 tokens	$ 0.0008
Sample	Are the two responses equivalent? Ignore punctuation and irrelevant characters and differences in verb tense. Reply with true or false. One word all lowercase. Response 1: {label} Response 2: {prediction}	5	text → text	OpenAI/GPT 4o mini	Sample - N/A	completed	00:00:01	5 months ago	ox	242 tokens	$ 0.0000
Sample	Are the two responses equivalent? Ignore punctuation and irrelevant characters and differences in verb tense. Reply with true or false. One word all lowercase. Response 1: {label} Response 2: {conclusion}	5	text → text	OpenAI/GPT 4o	Sample - N/A	completed	00:00:03	5 months ago	ox	232 tokens	$ 0.0006
Sample	Are the two responses equivalent? Ignore punctuation and irrelevant characters and differences in verb tense. Reply with true or false. One word all lowercase. Response 1: {label} Response 2: {conclusion}	5	text → text	OpenAI/GPT 4o mini	Sample - N/A	completed	00:00:02	5 months ago	ox	232 tokens	$ 0.0000
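
The prompt in the table is a template: {label} and {prediction} (or {conclusion} in two of the sample runs) are replaced with column values from each dataset row, and the model is asked to reply with a single word. A minimal sketch of that flow in Python follows. It assumes each row is a plain dict keyed by column name. The helper names (build_prompt, parse_verdict, accuracy) and the example values are hypothetical and are not part of the evaluation tool. The model call itself is left out.

```python
from typing import Iterable, Optional

# Template text copied from the "Prompt" column of the table above.
JUDGE_TEMPLATE = (
    "Are the two responses equivalent? Ignore punctuation and irrelevant "
    "characters and differences in verb tense. Reply with true or false. "
    "One word all lowercase. Response 1: {label} Response 2: {prediction}"
)


def build_prompt(row: dict) -> str:
    """Fill the template from one dataset row (assumed to be a dict)."""
    return JUDGE_TEMPLATE.format(label=row["label"], prediction=row["prediction"])


def parse_verdict(reply: str) -> Optional[bool]:
    """Map the model's one-word reply to a bool; None if it is neither word."""
    word = reply.strip().strip(".!\"'").lower()
    if word == "true":
        return True
    if word == "false":
        return False
    return None


def accuracy(replies: Iterable[str]) -> float:
    """Share of parseable replies that were 'true' (unparseable ones are skipped)."""
    scored = [v for v in map(parse_verdict, replies) if v is not None]
    return sum(scored) / len(scored) if scored else 0.0


# Illustrative values only, not taken from the ChartQA run above.
prompt = build_prompt({"label": "12", "prediction": "12.0"})
score = accuracy(["true", "false", "true", "n/a"])
```

Skipping unparseable replies is a design choice made for this sketch. Counting them as false would be stricter and may be more appropriate when the judge model is expected to follow the one-word format reliably.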