Evaluations
Run models against your data
Evaluations let you test and compare a selection of AI models against your datasets.
Whether you're fine-tuning models or evaluating performance metrics, Oxen Evaluations simplifies the process by letting you run a prompt through an entire dataset.
Once you're happy with the results, output the resulting dataset to a new file, another branch, or directly as a new commit.
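The example prompts listed on this page reference values with curly-brace placeholders such as {question} and {rag_context}. The sketch below illustrates the general idea of running one prompt template over every row of a dataset, assuming each placeholder names a dataset column. It is not Oxen's implementation: render_prompt, run_over_dataset, the "prediction" output column, and the stand-in model function are all hypothetical names chosen for illustration.

```python
from string import Formatter


def render_prompt(template: str, row: dict) -> str:
    """Fill each {column} placeholder in the template with the row's value."""
    # Collect the placeholder names that appear in the template.
    fields = [name for _, name, _, _ in Formatter().parse(template) if name]
    missing = [name for name in fields if name not in row]
    if missing:
        raise KeyError(f"row is missing columns: {missing}")
    return template.format_map(row)


def run_over_dataset(template: str, rows: list, model) -> list:
    """Apply the model to the rendered prompt for every row.

    Returns new rows with the model output stored under the hypothetical
    column name "prediction"; the input rows are not modified.
    """
    return [{**row, "prediction": model(render_prompt(template, row))} for row in rows]


if __name__ == "__main__":
    # Hypothetical stand-in for a real model call: it just echoes the prompt.
    def echo_model(prompt: str) -> str:
        return prompt

    template = (
        "Considering the given context, answer the question. "
        "Context: {rag_context} Question: {question} Answer:"
    )
    rows = [{"question": "What color is the sky?", "rag_context": "The sky is blue."}]
    for result in run_over_dataset(template, rows, echo_model):
        print(result["prediction"])
```

Keeping the rendering step separate from the model call makes it easy to preview the filled-in prompt on a few rows before spending tokens on the full dataset, much like the 5 row sample run listed on this page.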
generate question embedding
15082fd7-43ec-4b58-a85e-0963ad7a5bca
300 rows completed
Lilian
3 weeks ago
Prompt: question
2 iterations, 6405 tokens, $0.0001
text → embeddings, openai, OpenAI/Text Embedding 3 - Small
Generate answer relevance question embedding
d3cde375-f55c-4a0a-8950-aa9a4e400dbe
300 rows completed
Lilian
3 weeks ago
Prompt: answer_relevance_questions
2 iterations, 6001 tokens, $0.0001
text → embeddings, openai, OpenAI/Text Embedding 3 - Small
Answer Relevance
c2d0aa08-0c38-4f68-9cdd-d1054b804896
100 rows completed
Lilian
3 weeks ago
Prompt: Generate 3 questions for the given the answer. Generate the questions in an ordered list: 1. 2. 3. Answer: {answer}
2 iterations, 19439 tokens, $0.0060
text → text, openai, OpenAI/GPT 4o mini
Answer Relevance
2431bdb4-b306-4eee-8634-85d50f6d7960
5 row sample completed
Lilian
3 weeks ago
Prompt: Generate 3 questions for the given the answer. Generate the questions in an ordered list: 1. 2. 3. Answer: {answer}
1 iteration, 288 tokens, $0.0001
text → text, openai, OpenAI/GPT 4o mini
Context Relevance - extract relevant sentences
37b4a24f-4a5c-483f-a619-7520cb81483f
100 rows completed
Lilian
3 weeks ago
Prompt: Please extract relevant sentences from the provided context that can potentially help answer the following question. If no relevant sentences are found, or if you believe the question cannot be answered from the given context, return the phrase "Insufficient Information". While extracting candidate sentences you're not allowed to make any changes to sentences from given context. Question: {question} Context: {rag_context}
2 iterations, 94526 tokens, $0.0179
text → text, openai, OpenAI/GPT 4o mini
100 rows completed
Lilian
3 weeks ago
Prompt: Consider the given context and following statements, then determine whether they are supported by the information present in the context. Provide a brief explanation for each statement before arriving at the final verdict (Yes/No). Provide a final vertict for each statement in order at the end in the given format. Do not deviate from the specified format. Context: {rag_context} Statements: {faithfulness_statements}
3 iterations, 129866 tokens, $0.0350
text → text, openai, OpenAI/GPT 4o mini
Generate Faithful Statements
386299e2-e282-4144-b91b-dfef13311fa3
100 rows completed
Lilian
3 weeks ago
Prompt: Given a question and an answer, create one or more statements from each sentence in the given answer. The statements should be in an ordered list such as 1. First Statement 2. Second Statement etc... question: {question} answer: {answer}
2 iterations, 27701 tokens, $0.0091
text → text, openai, OpenAI/GPT 4o mini
Generate Answers
5ddfeb2b-2492-4ff7-9908-41f25bcd9854
100 rows completed
Lilian
3 weeks ago
Prompt: Considering the given context, answer the question. Context: {rag_context} Question: {question} Answer:
5 iterations, 90436 tokens, $0.2960
text → text, openai, OpenAI/GPT 4o
compute embeddings
8f61c9e5-03e9-41d7-9990-3695d69466da
2600 rows completed
Lilian
3 weeks ago
Prompt: chunk
2 iterations, 1003712 tokens, $0.0201
text → embeddings, openai, OpenAI/Text Embedding 3 - Small
Generate Embeddings
d2587963-f62c-4495-ac4a-82d05108dfd6
1903 rows completed
Lilian
4 weeks ago
Prompt: chunk
3 iterations, 938837 tokens, $0.0188
text → embeddings, openai, OpenAI/Text Embedding 3 - Small
Target:
conflict-main-d7b077a5-01a3-4274-b724-bd87a627fc38
Compute embeddings for chunk
b0a2d14e-0ed7-4b2e-89e4-69243a5b5114
22 / 1903 rows cancelled
Lilian
1 month ago
Prompt: chunk
2 iterations, 11277 tokens, $0.0002
text → embeddings, openai, OpenAI/Text Embedding 3 - Small