Evaluations (6)
Introducing Evaluations, a feature designed to let you easily test and compare AI models against your datasets.
Whether you're fine-tuning models or measuring performance, Oxen Evaluations simplify the process, letting you quickly run a prompt over an entire dataset.
Once you're happy with the results, write the resulting dataset to a new file, to another branch, or directly as a new commit.
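Conceptually, an evaluation fills a prompt template with each row of a dataset, sends it to a model, and collects the outputs as a new column. The sketch below illustrates that idea only; it is not the Oxen API, and `call_model` is a hypothetical stand-in for whatever LLM client you use.

```python
# Illustrative sketch of an evaluation run: substitute {query}/{context}
# placeholders row by row and collect model outputs as a new column.
# NOTE: `call_model` is a hypothetical placeholder, not a real API.
import csv
import io

PROMPT = (
    "What is the answer to the question given the context? "
    "Only reply with text that is contained in the context. "
    "Question: {query} Context: {context} Answer:"
)

def call_model(prompt: str) -> str:
    # Hypothetical stand-in; swap in a real LLM API call here.
    return "stub answer"

def run_evaluation(dataset_csv: str) -> list[dict]:
    rows = list(csv.DictReader(io.StringIO(dataset_csv)))
    for row in rows:
        # Fill the template from the row's columns, then store the output.
        row["prediction"] = call_model(PROMPT.format(**row))
    return rows

data = "query,context\nWho wrote it?,Bessie wrote it.\n"
results = run_evaluation(data)
```

In the real feature, the per-row outputs and the run metadata (iterations, tokens, cost) are tracked for you, as the runs below show.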
o1-Mini First Evaluation
9a894265-0e13-4c10-9e59-1d489dbd4478
5-row sample · 00:00:13 · completed
Mathias Barragan · 1 week ago
Prompt: Answer the following question only using facts from the facts given after the question. Keep your answer grounded in the facts given. If no facts given after the question, return 'None'. Question: {query} Facts: {context}
1 iteration · 2846 tokens · $0.0250
Type: text · Model: OpenAI/o1-mini
embeddings
3874d584-6546-453f-b219-3bcde65c2928
5-row sample · 00:00:01 · completed
Bessie · 2 weeks ago
Prompt: query
1 iteration · $0.0000
Type: embeddings · Model: Google/Text Embedding 004
d956c3ff-c6eb-4e5e-9b6c-d2148ac41404
200 rows · 00:02:13 · completed
Bessie · 2 weeks ago
Prompt: Are the following two answers equivalent? If the answers contain numeric values, only compare the numbers and not the words. Answer "true" or "false". All lowercase. Answer 1: {answer} Answer 2: {prediction}
1 iteration · 15574 tokens · $0.0404
Type: text · Model: OpenAI/GPT-4o
Source: gemini-flash-results
Target: gemini-flash-results-judge
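The judge prompt above asks the model for a specific comparison rule: when both answers contain numbers, compare only the numbers; otherwise compare the text. As a hypothetical, deterministic version of that same rule (not part of Oxen, just for intuition):

```python
# A deterministic sketch of the equivalence check the judge prompt
# delegates to GPT-4o. Hypothetical helper, for illustration only.
import re

def numeric_equivalent(answer: str, prediction: str) -> str:
    """Return 'true' or 'false', lowercase, per the judge prompt:
    if both answers contain numeric values, compare only the numbers;
    otherwise fall back to a case-insensitive text comparison."""
    nums_a = re.findall(r"-?\d+(?:\.\d+)?", answer)
    nums_b = re.findall(r"-?\d+(?:\.\d+)?", prediction)
    if nums_a and nums_b:
        same = [float(x) for x in nums_a] == [float(x) for x in nums_b]
    else:
        same = answer.strip().lower() == prediction.strip().lower()
    return "true" if same else "false"
```

So "42 apples" and "42.0 fruit" would judge as "true" (the numbers match), while "Paris" vs "paris" also judges "true" via the text fallback. An LLM judge handles fuzzier cases this rule-based sketch cannot.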
660bf91a-1dc4-42b8-b4ca-35bd32a12a64
200 rows · 00:01:33 · completed
Bessie · 2 weeks ago
Prompt: What is the answer to the question given the context? Only reply with text that is contained in the context. Question: {query} Context: {context} Answer:
2 iterations · 63397 tokens · $0.0055
Type: text · Model: Google/Gemini 1.5 Flash
Target: gemini-flash-results
Judge Answers w/ GPT-4o
521f0ef3-6bf8-4359-86fc-08d072ae99b6
200 rows · 00:02:09 · completed
Bessie · 2 weeks ago
Prompt: Are the following two answers equivalent? If the answers contain numeric values, only compare the numbers and not the words. Answer "true" or "false". All lowercase. Answer 1: {answer} Answer 2: {prediction}
3 iterations · 16504 tokens · $0.0428
Type: text · Model: OpenAI/GPT-4o
Source: openai-answer-extract
Target: openai-answer-judgements
Answer Extraction w/ OpenAI gpt-4o-mini
aa5a77d7-7852-47df-bba4-0998fe94c176
200 rows · 00:03:15 · completed
Bessie · 2 weeks ago
Prompt: What is the answer to the question given the context? Only reply with text that is contained in the context. Question: {query} Context: {context} Answer:
1 iteration · 59566 tokens · $0.0107
Type: text · Model: OpenAI/GPT-4o mini
Target: openai-answer-extract