gemini-flash-results
rag_instruct_test.jsonl
text → text
OpenAI
GPT-4o
is_correct
Are the following two answers equivalent? If the answers contain numeric values, only compare the numbers and not the words. Answer "true" or "false". All lowercase. Answer 1: {answer} Answer 2: {prediction}
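A minimal sketch of how a judge prompt like the one above might be filled in per row and how the reply might be mapped to the `is_correct` column. The function names `build_judge_prompt` and `parse_verdict` are hypothetical and not taken from the tool that produced this run; the call to the model itself is left out.

```python
from typing import Optional

# Judge prompt copied from the job configuration; {answer} and {prediction}
# are the two placeholders filled in for every row.
JUDGE_TEMPLATE = (
    "Are the following two answers equivalent? If the answers contain numeric "
    "values, only compare the numbers and not the words. "
    'Answer "true" or "false". All lowercase. '
    "Answer 1: {answer} Answer 2: {prediction}"
)


def build_judge_prompt(answer: str, prediction: str) -> str:
    """Fill the template for one row (hypothetical helper)."""
    return JUDGE_TEMPLATE.format(answer=answer, prediction=prediction)


def parse_verdict(raw: str) -> Optional[bool]:
    """Map the judge's reply to a boolean; return None if it is neither
    "true" nor "false" (assumption: such rows would be reviewed by hand)."""
    cleaned = raw.strip().strip(".\"'").lower()
    if cleaned == "true":
        return True
    if cleaned == "false":
        return False
    return None


if __name__ == "__main__":
    print(build_judge_prompt("42", "forty-two (42)"))
    print(parse_verdict(" False.\n"))
```

Returning `None` for unexpected replies, rather than coercing them to `False`, keeps judge formatting errors separate from genuinely incorrect predictions.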
gemini-flash-results-judge
Nov 8, 2024, 3:19 AM UTC
Nov 8, 2024, 3:21 AM UTC
200 rows
15574 tokens, $0.0404
200 rows processed, 15574 tokens used ($0.0404)
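The totals above can be broken down into per-row averages and a blended rate with a few lines of arithmetic. This sketch uses only the three reported figures; the blended rate mixes input and output tokens, so it is not the provider's list price for either.

```python
# Figures reported for this run.
ROWS = 200
TOKENS = 15574
COST_USD = 0.0404

tokens_per_row = TOKENS / ROWS                     # average tokens per judged row
cost_per_row = COST_USD / ROWS                     # average cost per judged row
cost_per_million = COST_USD / TOKENS * 1_000_000   # blended USD per 1M tokens

if __name__ == "__main__":
    print(f"{tokens_per_row:.2f} tokens/row")
    print(f"${cost_per_row:.6f} per row")
    print(f"${cost_per_million:.2f} per 1M tokens (blended)")
```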
completed
6 columns, 1-100 of 200 rows