Evaluation d61b6b29-ff7b-4332-9192-6376fd661469 - datasets/ChartQA

Total running cost: $0.0150

	Prompt	Rows	Type	Model	Target	Status	Runtime	Run	By	Tokens	Cost
Run	Are the two responses equivalent? Ignore punctuation and irrelevant characters and differences in verb tense. Reply with true or false. One word all lowercase. Response 1: {label} Response 2: {conclusion}	100	text → text	OpenAI/GPT 4o	7b788db7ccefc855e4515f91722df849	completed	00:00:46	7 months ago	ox	5178 tokens	$ 0.0137
Sample	Are the two responses equivalent? Ignore punctuation and irrelevant characters and differences in verb tense. Reply with true or false. One word all lowercase. Response 1: {label} Response 2: {conclusion}	5	text → text	OpenAI/GPT 4o	Sample - N/A	completed	00:00:02	7 months ago	ox	251 tokens	$ 0.0007
Sample	Are the two responses equivalent? Ignore punctuation and irrelevant characters. Reply with true or false. One word all lowercase. Response 1: {label} Response 2: {conclusion}	5	text → text	OpenAI/GPT 4o	Sample - N/A	completed	00:00:02	7 months ago	ox	226 tokens	$ 0.0006
Sample	Are the two responses equivalent? Reply with true or false. One word all lowercase. Response 1: {label} Response 2: {conclusion}	5	text → text	OpenAI/GPT 4o mini	Sample - N/A	completed	00:00:02	7 months ago	ox	196 tokens	$ 0.0000
Sample	Are the two responses equivalent? Reply with true or false. Response 1: {label} Response 2: {conclusion}	5	text → text	OpenAI/GPT 4o mini	Sample - N/A	completed	00:00:02	7 months ago	ox	173 tokens	$ 0.0000
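
The prompts in the table are templates: {label} and {conclusion} are filled in per row, and the model is asked to answer with a single lowercase word. A minimal sketch of how one row could be rendered and how the replies could be scored is shown below. The template text is copied from the 100-row run; the helper names (build_prompt, parse_verdict, agreement_rate) and the tolerant parsing are assumptions for illustration, not part of the evaluation tool, and no model call is made.

```python
from typing import Iterable, Optional

# Prompt template copied from the 100-row run in the table above.
TEMPLATE = (
    "Are the two responses equivalent? Ignore punctuation and irrelevant "
    "characters and differences in verb tense. Reply with true or false. "
    "One word all lowercase. Response 1: {label} Response 2: {conclusion}"
)


def build_prompt(label: str, conclusion: str) -> str:
    """Fill the two placeholders for a single dataset row (hypothetical helper)."""
    return TEMPLATE.format(label=label, conclusion=conclusion)


def parse_verdict(reply: str) -> Optional[bool]:
    """Map a model reply to True/False; return None if it is neither word.

    Surrounding whitespace, quotes, trailing punctuation and capitalisation
    are tolerated, since the earlier prompt variants did not yet ask for
    one lowercase word.
    """
    word = reply.strip().strip(".!\"'").lower()
    if word == "true":
        return True
    if word == "false":
        return False
    return None


def agreement_rate(replies: Iterable[str]) -> float:
    """Share of parseable replies judged equivalent; 0.0 if none parse."""
    verdicts = [parse_verdict(r) for r in replies]
    valid = [v for v in verdicts if v is not None]
    return sum(valid) / len(valid) if valid else 0.0
```

For example, build_prompt("14", "14.0") yields the template with "Response 1: 14 Response 2: 14.0" at the end, and agreement_rate counts only replies that parse as true or false, so an off-format reply is ignored rather than counted as a mismatch.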