Evaluations
Run models against your data
Introducing Evaluations, a feature that lets you test and compare AI models against your own datasets.
Whether you're fine-tuning models or measuring performance, Oxen Evaluations simplify the process, letting you quickly run a prompt across an entire dataset.
Once you're happy with the results, write the resulting dataset to a new file, another branch, or directly to a new commit.
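Conceptually, an evaluation renders a prompt template (with placeholders like {query} and {context}) against every row of a dataset and collects the model's outputs. The sketch below is illustrative only, not the Oxen API; the model is a toy stand-in callable.

```python
def run_evaluation(rows, template, model):
    """Render `template` with each row's fields, call `model`,
    and return rows extended with a `prediction` column."""
    results = []
    for row in rows:
        prompt = template.format(**row)  # fill {query}, {context}, ...
        results.append({**row, "prediction": model(prompt)})
    return results

# Toy stand-in for a hosted model call (assumption: any callable works here).
def echo_model(prompt):
    return prompt.upper()

dataset = [{"query": "capital of France?", "context": "Paris is the capital."}]
template = "Question: {query} Facts: {context}"
out = run_evaluation(dataset, template, echo_model)
```

In the real feature, the model callable would be one of the hosted models shown below (GPT-4o, Gemini 1.5 Flash, etc.), and the results land in your repo as a new file, branch, or commit.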
Example evaluations:

- o1-Mini First Evaluation (9a894265-0e13-4c10-9e59-1d489dbd4478): 5 row sample completed, by Mathias Barragan, 1 month ago.
  Prompt: Answer the following question only using facts from the facts given after the question. Keep your answer grounded in the facts given. If no facts given after the question, return 'None'. Question: {query} Facts: {context}
  1 iteration, 2,846 tokens, $0.0250 · text → text · OpenAI o1-mini

- embeddings (3874d584-6546-453f-b219-3bcde65c2928): 5 row sample completed, by Bessie, 1 month ago.
  Prompt: query
  1 iteration, $0.0000 · text → embeddings · Google Text Embedding 004

- d956c3ff-c6eb-4e5e-9b6c-d2148ac41404: 200 rows completed, by Bessie, 1 month ago.
  Prompt: Are the following two answers equivalent? If the answers contain numeric values, only compare the numbers and not the words. Answer "true" or "false". All lowercase. Answer 1: {answer} Answer 2: {prediction}
  1 iteration, 15,574 tokens, $0.0404 · text → text · OpenAI GPT-4o
  Source: gemini-flash-results · Target: gemini-flash-results-judge

- 660bf91a-1dc4-42b8-b4ca-35bd32a12a64: 200 rows completed, by Bessie, 1 month ago.
  Prompt: What is the answer to the question given the context? Only reply with text that is contained in the context. Question: {query} Context: {context} Answer:
  2 iterations, 63,397 tokens, $0.0055 · text → text · Google Gemini 1.5 Flash
  Target: gemini-flash-results

- Judge Answers w/ GPT-4o (521f0ef3-6bf8-4359-86fc-08d072ae99b6): 200 rows completed, by Bessie, 1 month ago.
  Prompt: Are the following two answers equivalent? If the answers contain numeric values, only compare the numbers and not the words. Answer "true" or "false". All lowercase. Answer 1: {answer} Answer 2: {prediction}
  3 iterations, 16,504 tokens, $0.0428 · text → text · OpenAI GPT-4o
  Source: openai-answer-extract · Target: openai-answer-judgements

- Answer Extraction w/ OpenAI gpt-4o-mini (aa5a77d7-7852-47df-bba4-0998fe94c176): 200 rows completed, by Bessie, 1 month ago.
  Prompt: What is the answer to the question given the context? Only reply with text that is contained in the context. Question: {query} Context: {context} Answer:
  1 iteration, 59,566 tokens, $0.0107 · text → text · OpenAI GPT-4o mini
  Target: openai-answer-extract
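Several of the evaluations above chain two runs: one model extracts an answer from context, then a second "judge" model grades whether the prediction matches the reference answer. The sketch below only builds the judge prompt from the template shown above; the actual model call is left out, since any hosted model could serve as the judge.

```python
# Judge prompt template, copied from the evaluations listed above.
JUDGE_TEMPLATE = (
    "Are the following two answers equivalent? If the answers contain "
    "numeric values, only compare the numbers and not the words. "
    'Answer "true" or "false". All lowercase. '
    "Answer 1: {answer} Answer 2: {prediction}"
)

def build_judge_prompt(answer, prediction):
    """Fill the judge template with a reference answer and a model prediction."""
    return JUDGE_TEMPLATE.format(answer=answer, prediction=prediction)

# Hypothetical example values, for illustration only.
prompt = build_judge_prompt("42 km", "forty-two kilometers")
```

Constraining the judge to output only "true" or "false" (all lowercase) makes the resulting column trivial to aggregate into an accuracy score over the 200-row target files.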