Repository evaluations - elau/MMVet

Evaluations

Run models against your data

Evaluations is a feature for testing and comparing a selection of AI models against your datasets.

Whether you're fine-tuning models or evaluating performance metrics, Oxen Evaluations simplifies the process by letting you run a prompt against every row of a dataset.

Once you're happy with the results, output the resulting dataset to a new file, another branch, or directly as a new commit.
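As a rough illustration of what running a prompt through a dataset involves, the sketch below fills `{column}` placeholders in a prompt template from each row. It assumes placeholders such as `{question}` and `{answer}` are replaced by the value of the column with the same name; the `render_prompt` helper and the sample rows are hypothetical and are not part of Oxen's API.

```python
import re

# Matches placeholders such as {question} or {image}.
PLACEHOLDER = re.compile(r"\{(\w+)\}")


def render_prompt(template: str, row: dict) -> str:
    """Replace each {column} placeholder with the row's value for that column."""

    def substitute(match: re.Match) -> str:
        column = match.group(1)
        if column not in row:
            raise KeyError(f"row has no column named {column!r}")
        return str(row[column])

    return PLACEHOLDER.sub(substitute, template)


# Hypothetical rows standing in for a small sample of a dataset file.
rows = [
    {"question": "What is 2 + 3?", "answer": "5"},
    {"question": "What is 4 * 6?", "answer": "24"},
]

# Template taken from one of the evaluations listed on this page.
template = (
    "Q: {question}\n"
    "A: {answer}\n"
    "Task: Give a valid equation for which the answer is correct "
    "for the given question."
)

# One rendered prompt per row; each would then be sent to the chosen model.
prompts = [render_prompt(template, row) for row in rows]
print(prompts[0])
```

A regular expression is used here instead of `str.format` so that stray braces elsewhere in a prompt do not raise formatting errors; a missing column fails loudly rather than being sent to the model as a literal placeholder.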

f595a1d1-0d98-4fb4-a9cb-90d1f357d346

OpenAI/GPT 4o mini | image → text

elau

4 days ago

Prompt

{image}

main

default_test.parquet

completed | 5 row sample | 54871 tokens | $0.0087 | 1 iteration

11cb8827-6df6-49c9-b4d9-33454822a8f6

Perplexity/Perplexity Sonar Deep Research | text → text

elau

3 weeks ago

Prompt

Let's say I wait Graham's number of Planck times and then go back in time one googolplex ages of the universe for every atom in the observable universe from that point in time. What time will I have traveled to?

main

default_test.parquet

completed | 5 row sample | 0 tokens | $0.0000 | 12 iterations

92fe8f86-d478-4367-bc08-6643d187a4f5

Google/Gemma 3 27B | text → text

elau

1 month ago

Prompt

Q: {question}
A: {answer}
Task: Give a valid equation for which the answer is correct for the given question.

main

default_test.parquet

completed | 5 row sample | 639 tokens | $0.0000 | 4 iterations

f9052ca9-20fc-4d90-a6b2-89419e4e2328

OpenAI/GPT 4o mini | text → text

elau

1 month ago

Prompt

test {question}

main

default_test.parquet

completed | 5 row sample | 219 tokens | $0.0001 | 1 iteration

d2615b97-d54e-4b94-a30e-d9a81486d2df

Google/Gemini 2.0 Flash Lite | text → text

elau

1 month ago

Prompt

test

main

default_test.parquet

completed | 5 row sample | 201 tokens | $0.0001 | 1 iteration

f069c030-c670-4ae5-ab7c-115eb686c0bc

OpenAI/GPT 4o | image → text

elau

2 months ago

Prompt

{image}
{question}

main

default_test.parquet

completed | 5 row sample | 2968 tokens | $0.0105 | 1 iteration