Repository evaluations - Lilian/Arxiv-Dive-RAG

Evaluations

Run models against your data

Evaluations let you test and compare a selection of AI models against your datasets.

Whether you're fine-tuning models or evaluating performance metrics, Oxen Evaluations simplifies the process by letting you run a prompt over every row of a dataset.

Once you're happy with the results, output the resulting dataset to a new file, another branch, or directly as a new commit.
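As a rough illustration of what "running a prompt through a dataset" means, the sketch below fills a prompt template once per row. It assumes each `{placeholder}` names a dataset column and that rows are available as plain dictionaries. The helper name `render_prompts` and the sample row are hypothetical and are not part of the Oxen API.

```python
# Hypothetical helper: fill a prompt template once per dataset row.
# Assumes every {placeholder} in the template names a column in the row.
def render_prompts(template: str, rows: list[dict]) -> list[str]:
    """Return one rendered prompt per row; raises KeyError if a column is missing."""
    return [template.format(**row) for row in rows]


# Template copied from the "Generate Answers" evaluation on this page.
TEMPLATE = (
    "Considering the given context, answer the question.\n\n"
    "Context:\n{rag_context}\n\n"
    "Question:\n{question}\n\n"
    "Answer:"
)

# Illustrative row only; real rows would come from the input parquet file.
sample_rows = [
    {"rag_context": "Example context passage.", "question": "Example question?"}
]

prompts = render_prompts(TEMPLATE, sample_rows)
```

Each rendered prompt would then be sent to the selected model, and the response stored in a new column of the output file.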

generate question embedding

15082fd7-43ec-4b58-a85e-0963ad7a5bca

OpenAI/Text Embedding 3 - Small (text → embeddings)

Lilian

2 months ago

Prompt

question

main

data/100_questions/6_answer_relevance_questions_expanded.parquet

main

data/100_questions/7_answer_relevance_questions_expanded_embedding.parquet

completed: 300 rows, 6405 tokens, $0.0001, 2 iterations

Generate answer relevance question embedding

d3cde375-f55c-4a0a-8950-aa9a4e400dbe

OpenAI/Text Embedding 3 - Small (text → embeddings)

Lilian

2 months ago

Prompt

answer_relevance_questions

main

data/6_answer_relevance_questions_expanded.parquet

main

data/100_questions/6_answer_relevance_questions_expanded.parquet

completed: 300 rows, 6001 tokens, $0.0001, 2 iterations
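The two embedding runs above produce vectors for the original questions and for the questions generated from each answer. One common way to turn such vectors into an answer-relevance score (as in RAGAS-style evaluation) is the mean cosine similarity between the original question and its generated questions. This page does not state how the score is computed, so the sketch below is an assumption, and the function names are hypothetical.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def answer_relevance(question_emb: list[float],
                     generated_embs: list[list[float]]) -> float:
    """Mean similarity between the original question and each generated question."""
    if not generated_embs:
        return 0.0
    total = sum(cosine_similarity(question_emb, g) for g in generated_embs)
    return total / len(generated_embs)
```

A score near 1 suggests the answer addresses the question that was actually asked, while a low score suggests the answer drifted toward a different question.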

Answer Relevance

c2d0aa08-0c38-4f68-9cdd-d1054b804896

OpenAI/GPT 4o mini (text → text)

Lilian

2 months ago

Prompt

Generate 3 questions for the given answer. Generate the questions in an ordered list:

1.
2.
3.

Answer: {answer}

main

data/100_questions/1_generated_answer.parquet

main

data/100_questions/5_answer_relevance_questions.parquet

completed: 100 rows, 19439 tokens, $0.0060, 2 iterations
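Because the prompt above asks for an ordered list, the model's response has to be split into individual questions before each one can be embedded. The sketch below shows one way to do that. It assumes items start with a number followed by `.` or `)`, and the function name is hypothetical.

```python
import re

# Matches lines such as "1. text" or "2) text"; empty items like "3." are skipped.
_ITEM = re.compile(r"^\s*\d+[.)]\s*(\S.*?)\s*$")


def parse_numbered_list(text: str) -> list[str]:
    """Extract the item text from each numbered line, ignoring all other lines."""
    items = []
    for line in text.splitlines():
        match = _ITEM.match(line)
        if match:
            items.append(match.group(1))
    return items
```

Splitting 100 responses of three questions each this way would give 300 rows, which is consistent with the row counts of the "expanded" files listed on this page.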

Answer Relevance

2431bdb4-b306-4eee-8634-85d50f6d7960

OpenAI/GPT 4o mini (text → text)

Lilian

2 months ago

Prompt

Generate 3 questions for the given answer. Generate the questions in an ordered list:

1.
2.
3.

Answer: {answer}

main

data/search_results_new.parquet

completed: 5 row sample, 288 tokens, $0.0001, 1 iteration

Context Relevance - extract relevant sentences

37b4a24f-4a5c-483f-a619-7520cb81483f

OpenAI/GPT 4o mini (text → text)

Lilian

2 months ago

Prompt

Please extract relevant sentences from the provided context that can potentially help answer the following question. If no relevant sentences are found, or if you believe the question cannot be answered from the given context, return the phrase "Insufficient Information". While extracting candidate sentences you're not allowed to make any changes to sentences from given context.

Question:
{question}

Context:
{rag_context}

main

data/search_results_new.parquet

main

data/100_questions/4_context_relevance_extract.parquet

completed: 100 rows, 94526 tokens, $0.0179, 2 iterations
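The extraction prompt above returns either verbatim sentences from the context or the phrase "Insufficient Information". A RAGAS-style context-relevance score can be derived from that output as the fraction of context sentences that were extracted. This page does not say how the score is computed, so the sketch below is an assumption. The sentence splitter is deliberately naive and the function names are hypothetical.

```python
import re


def split_sentences(text: str) -> list[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def context_relevance(extracted: str, context: str) -> float:
    """Share of context sentences the model judged relevant, clamped to [0, 1]."""
    # Treat the sentinel phrase (with or without quotes or a period) as no match.
    if extracted.strip().strip('".').lower() == "insufficient information":
        return 0.0
    context_sentences = split_sentences(context)
    if not context_sentences:
        return 0.0
    kept = split_sentences(extracted)
    return min(len(kept) / len(context_sentences), 1.0)
```

A production version would likely use a proper sentence tokenizer, since abbreviations and decimal numbers in arXiv text will confuse a punctuation-based split.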

Determine the statement whether can be determined by context

bd1832d2-7a60-4e5d-94d3-a786d35b666d

OpenAI/GPT 4o mini (text → text)

Lilian

2 months ago

Prompt

Consider the given context and following statements, then determine whether they are supported by the information present in the context. Provide a brief explanation for each statement before arriving at the final verdict (Yes/No). Provide a final verdict for each statement in order at the end in the given format. Do not deviate from the specified format.

Context:
{rag_context}

Statements:
{faithfulness_statements}

main

data/100_questions/2_generate_faithful_statement.parquet

main

data/100_questions/3_statement_faithfulness.parquet

completed: 100 rows, 129866 tokens, $0.0350, 3 iterations
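The prompt above yields one Yes/No verdict per statement. A common way to summarize those verdicts (as in RAGAS-style faithfulness) is the fraction of statements the context supports. The sketch below assumes the verdicts have already been extracted from the model response into a list of strings. The prompt's exact output format is not shown on this page, so that parsing step is left out, and the function name is hypothetical.

```python
def faithfulness_score(verdicts: list[str]) -> float:
    """Fraction of statements whose verdict is 'Yes' (case-insensitive)."""
    normalized = [v.strip().lower() for v in verdicts]
    if not normalized:
        return 0.0
    supported = sum(1 for v in normalized if v == "yes")
    return supported / len(normalized)
```

For example, an answer split into four statements with three "Yes" verdicts would score 0.75.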

Generate Faithful Statements

386299e2-e282-4144-b91b-dfef13311fa3

OpenAI/GPT 4o mini (text → text)

Lilian

2 months ago

Prompt

Given a question and an answer, create one or more statements from each sentence in the given answer.

The statements should be in an ordered list such as

1. First Statement
2. Second Statement
etc...

question: {question}

answer: {answer}

main

data/100_questions/1_generated_answer.parquet

main

data/100_questions/2_generate_faithful_statement.parquet

completed: 100 rows, 27701 tokens, $0.0091, 2 iterations

Generate Answers

5ddfeb2b-2492-4ff7-9908-41f25bcd9854

OpenAI/GPT 4o (text → text)

Lilian

2 months ago

Prompt

Considering the given context, answer the question.

Context:
{rag_context}

Question:
{question}

Answer:

main

data/search_results_new.parquet

main

data/100_questions/1_generated_answer.parquet

completed: 100 rows, 90436 tokens, $0.2960, 5 iterations

compute embeddings

8f61c9e5-03e9-41d7-9990-3695d69466da

OpenAI/Text Embedding 3 - Small (text → embeddings)

Lilian

2 months ago

Prompt

chunk

main

data/arxiv_markdown_chunks_256_to_512.parquet

main

data/arxiv_markdown_chunks_256_to_512_embeddings.parquet

completed: 2600 rows, 1003712 tokens, $0.0201, 2 iterations
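The token and cost figures reported for each run can be cross-checked by deriving an effective price per million tokens. The sketch below does this using only the numbers displayed for the embedding run above. The function name is hypothetical, and the result reflects the rounded dollar amount shown on the page rather than an official price.

```python
def usd_per_million_tokens(cost_usd: float, tokens: int) -> float:
    """Effective price per one million tokens implied by a run's totals."""
    if tokens <= 0:
        raise ValueError("tokens must be positive")
    return cost_usd / tokens * 1_000_000


# Figures copied from the "compute embeddings" run: 1003712 tokens for $0.0201.
rate = usd_per_million_tokens(0.0201, 1_003_712)
```

The same calculation on the other embedding run with a large token count (938837 tokens for $0.0188) gives a matching rate of roughly $0.02 per million tokens, which suggests the displayed totals are internally consistent.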

Generate Embeddings

d2587963-f62c-4495-ac4a-82d05108dfd6

OpenAI/Text Embedding 3 - Small (text → embeddings)

Lilian

2 months ago

Prompt

chunk

main

data/arxiv_markdown_chunks_512_to_1024.parquet

conflict-main-d7b077a5-01a3-4274-b724-bd87a627fc38

data/arxiv_markdown_chunks_512_to_1024_embedding.parquet

completed: 1903 rows, 938837 tokens, $0.0188, 3 iterations

Compute embeddings for chunk

b0a2d14e-0ed7-4b2e-89e4-69243a5b5114

OpenAI/Text Embedding 3 - Small (text → embeddings)

Lilian

2 months ago

Prompt

chunk

main

data/arxiv_markdown_chunks_512_to_1024.parquet

N/A

data/arxiv_markdown_chunks_512_to_1024_embedding.parquet

cancelled: 22 / 1903 rows, 11277 tokens, $0.0002, 2 iterations