Introducing Evaluations, a feature for testing and comparing a selection of AI models against your datasets.
Whether you're fine-tuning models or evaluating performance metrics, Oxen Evaluations simplifies the process by letting you run a prompt through every row of a dataset.
Once you're happy with the results, write the resulting dataset to a new file, to another branch, or directly as a new commit.
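The prompts listed below use `{column}` placeholders such as `{passage}` and `{question}`. As a rough illustration of what running a prompt through a dataset means, here is a minimal Python sketch that fills such a template once per row. It is not Oxen's implementation: the `render_prompt` helper and the two sample rows are hypothetical, and it assumes each placeholder name matches a column in the row.

```python
def render_prompt(template: str, row: dict) -> str:
    """Fill {column} placeholders in the template with values from one row.

    Assumption: placeholder names match the row's keys. Extra keys are
    ignored; a missing key raises KeyError.
    """
    return template.format(**row)


# Template text copied from one of the evaluations listed on this page.
TEMPLATE = (
    "According to the context, answer the question with only 1 or 0. "
    "1 for yes and 0 for no\n"
    "context:\n{passage}\n"
    "question:\n{question}"
)

# Hypothetical rows, made up for illustration; real rows come from your dataset.
rows = [
    {"passage": "Oxen are draft animals.", "question": "are oxen used for work"},
    {"passage": "The sky is blue.", "question": "is the sky green"},
]

# One rendered prompt per row; each would then be sent to the chosen model.
prompts = [render_prompt(TEMPLATE, row) for row in rows]

for prompt in prompts:
    print(prompt)
    print("---")
```

Note that `str.format` treats literal braces in the template as placeholders, so a template containing JSON or other brace-heavy text would need its braces doubled (`{{` and `}}`) under this sketch.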
e8049bcd-343d-428f-b8f6-3345578dc606
e8049bcd-343d-428f-b8f6-3345578dc606 56 rows completed
lilian
2 weeks ago
Prompt: Based on the premise, determine whether to agree with the hypothesis. Respond with only 0, 1 or 2. 2 for Support, 0 for Oppose, or 1 for Neutral.
premise:
{premise}
hypothesis:
{hypothesis}
text → text
OpenAI/GPT-4o mini
Qwen 72B on Boolq Validation
4c00221f-59ec-4771-a0cc-ad3e990516e5 3270 rows completed
lilian
2 weeks ago
Prompt: According to the context, answer the question with only 1 or 0. 1 for yes and 0 for no
context:
{passage}
question:
{question}
text → text
Fireworks AI/Qwen2.5 72B Instruct
Source:
conflict-main-2a556f1d-96da-402a-8bbe-15e1058c8d21
Target:
conflict-main-2a556f1d-96da-402a-8bbe-15e1058c8d21
llama 70B on Commitment Bank
bd87f366-457c-4190-8b4a-13f6abd77948 56 rows completed
lilian
2 weeks ago
Prompt: Based on the premise, determine whether to agree with the hypothesis. Respond with only 0, 1 or 2. 2 for Support, 0 for Oppose, or 1 for Neutral.
premise:
{premise}
hypothesis:
{hypothesis}
text → text
Fireworks AI/Llama v3.1 70B Instruct
llama 70B on Boolq
dc29989f-2d95-4996-9589-f14abacc86db 3270 rows completed
lilian
2 weeks ago
Prompt: According to the context, answer the question with only 1 or 0. 1 for yes and 0 for no
context:
{passage}
question:
{question}
text → text
Fireworks AI/Llama v3.1 70B Instruct
Source:
conflict-main-2a556f1d-96da-402a-8bbe-15e1058c8d21
Target:
conflict-main-2a556f1d-96da-402a-8bbe-15e1058c8d21
Qwen on Commitment Bank Q
b939a272-43f2-46ac-9e26-3b3cef2f456a 56 rows completed
lilian
2 weeks ago
Prompt: Based on the premise, determine whether to agree with the hypothesis. Respond with only 0, 1 or 2. 2 for Support, 0 for Oppose, or 1 for Neutral.
premise:
{premise}
hypothesis:
{hypothesis}
text → text
Fireworks AI/Qwen2.5 72B Instruct
Qwen 72B on Boolq Validation
a5631f40-d8b1-4343-99e1-32aa2ca08ad7 2700 / 3270 rows, error
lilian
2 weeks ago
Prompt: According to the context, answer the question with only 1 or 0. 1 for yes and 0 for no
context:
{passage}
question:
{question}
text → text
Fireworks AI/Qwen2.5 72B Instruct
Qwen 72B on Boolq Validation
3f7688c6-01b2-4c3d-a50a-476c550ca736 5 row sample completed
lilian
2 weeks ago
Prompt: According to the context, answer the question with only 1 or 0. 1 for yes and 0 for no
context:
{passage}
question:
{question}
text → text
Fireworks AI/Qwen2.5 72B Instruct
228b59d1-f298-4849-a01a-93511511a5c4
228b59d1-f298-4849-a01a-93511511a5c4 5 row sample completed
lilian
2 weeks ago
Prompt: According to the context, answer the question concisely. Try to answer in one or a few words.
context:
{paragraph}
question:
{question}
text → text
Fireworks AI/Qwen2.5 72B Instruct
bbad6e19-ad2b-4c4b-a6bc-22f0e9f33ca3
bbad6e19-ad2b-4c4b-a6bc-22f0e9f33ca3 5 row sample completed
lilian
2 weeks ago
Prompt: According to the context, answer the question concisely. Try to answer in one or a few words.
context:
{paragraph}
question:
{question}
text → text
Fireworks AI/Qwen2.5 72B Instruct
test gpt4-o
7638571d-fffc-43fc-8135-437bb7b9b1c5 5 row sample completed
lilian
1 month ago
Prompt: Use the passage as context and pick the correct answer from the given choices to put into the placeholder in the query. Do not output anything besides the answer.
passage:
{passage}
choices:
{entities}
query:
{query}
text → text
OpenAI/GPT-4o mini
Test llama3.1 8B on super_glue
c09749b7-c681-475e-8643-00fe5fed7d5c 5 row sample completed
lilian
1 month ago
Prompt: Use the passage as context and pick the correct answer from the given choices to put into the placeholder in the query. Do not output anything besides the answer.
passage:
{passage}
choices:
{entities}
query:
{query}
text → text
Fireworks AI/Llama v3.1 8B Instruct
Test llama3.1 8B
18f4ac2d-9cab-407b-9bfb-5ac9996eaebc 5 row sample completed
lilian
1 month ago
Prompt: Use the passage as context and pick the correct answer from given choices to put into the placeholder in the query. Do not output anything besides the answer.
passage:
{passage}
choices:
{entities}
query:
{query}
text → text
OpenAI/GPT-4o mini
Test llama3.1 8B on super_glue
cb56be48-8656-48c1-be12-727cb2ae1b48 20 row sample completed
lilian
1 month ago
Prompt: Use the passage as context and pick the correct answer from the given choices to put into the placeholder in the query. Do not output anything besides the answer.
passage:
{passage}
query:
{query}
choices:
{entities}
text → text
Fireworks AI/Llama v3.1 8B Instruct