Evaluations
Run models against your data
Introducing Evaluations, a feature for testing and comparing a selection of AI models against your datasets.
Whether you're fine-tuning models or tracking performance metrics, Oxen Evaluations make it quick to run a prompt across an entire dataset.
Once you're happy with the results, save the output dataset to a new file, another branch, or directly as a new commit.
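Conceptually, an evaluation fills in a prompt template once per row and stores the model's reply next to the original columns. The prompts in the runs listed here use placeholders such as {prompt}, {answer}, {title}, and {body}, which appear to be filled from dataset columns of the same name. The Python sketch below illustrates that loop under those assumptions. `run_evaluation`, `fake_model`, and the `response` output column are hypothetical names, not Oxen's API. The optional `sample_size` mirrors the difference between a small sample run and a full run over every row.

```python
from typing import Callable, Dict, List, Optional

Row = Dict[str, str]  # one dataset record: column name -> value (assumed shape)


def run_evaluation(
    rows: List[Row],
    template: str,
    call_model: Callable[[str], str],
    output_column: str = "response",  # hypothetical name for the new column
    sample_size: Optional[int] = None,
) -> List[Row]:
    """Fill the template from each row, call the model, and return copies of
    the rows with the model's reply added under output_column."""
    selected = rows if sample_size is None else rows[:sample_size]
    results = []
    for row in selected:
        # str.format_map replaces {column} placeholders with the row's values;
        # a placeholder with no matching column raises KeyError.
        prompt = template.format_map(row)
        # Build a new dict so the input rows are left unchanged.
        results.append({**row, output_column: call_model(prompt)})
    return results


def fake_model(prompt: str) -> str:
    # Hypothetical stand-in for a real provider call such as an HTTP request.
    return f"<reply to: {prompt}>"


if __name__ == "__main__":
    dataset = [{"prompt": "add two numbers"}, {"prompt": "reverse a string"}]
    preview = run_evaluation(
        dataset, "Translate the prompt to python {prompt}", fake_model, sample_size=1
    )
    print(preview)
```

One caveat of this str.format_map approach: a template that contains literal braces (for example, Rust code pasted into the prompt) would need them doubled as {{ and }}.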
276b5e5b-3989-493e-872d-ad2641f1790d
5-row sample completed
Bessie
1 day ago
Prompt: Translate the prompt to python {prompt}
1 iteration | 3,050 tokens | $0.0027
text → text | Fireworks AI / DeepSeek V3
Write Unit Tests
dc45b8c7-a6c6-4b8c-ad1e-0e8e75c7e34f
3000 rows completed
Bessie
3 weeks ago
Prompt: You are a pragmatic Rust programmer. Write unit tests that cover edge cases for the following code. The full code should be able to compile and run on its own. Place the rust code inside a block like the following: ```rust Your full code and unit test code here ``` Here's the code to write the unit tests for: {answer}
3 iterations | 3,644,511 tokens | $3.28
text → text | Fireworks AI / DeepSeek V3
Source: synthetic-data
Target: synthetic-data
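The Write Unit Tests prompt asks the model to wrap its answer in a fenced code block labeled rust, which makes the code easy to pull out of each response afterwards. A minimal Python sketch, assuming only the first rust-labeled block in a reply matters; `extract_rust_code` is a hypothetical helper, not part of Oxen:

```python
import re
from typing import Optional

# Three backticks, built programmatically so no literal fence appears here.
FENCE = "`" * 3

# Opening fence tagged "rust", optional trailing spaces, a newline, then a
# non-greedy capture up to the next closing fence. DOTALL lets "." match "\n".
RUST_BLOCK = re.compile(FENCE + r"rust[ \t]*\n(.*?)" + FENCE, re.DOTALL)


def extract_rust_code(response: str) -> Optional[str]:
    """Return the body of the first rust-labeled fenced block, or None."""
    match = RUST_BLOCK.search(response)
    return match.group(1).strip() if match else None


if __name__ == "__main__":
    reply = "Here you go:\n" + FENCE + "rust\nfn main() {}\n" + FENCE + "\nDone."
    print(extract_rust_code(reply))
```

Rows where the helper returns None did not follow the requested format and could be flagged or regenerated before trying to compile the extracted code.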
Generate Stack Overflow Answers with DeepSeek-v3
dd07e31a-4186-4b21-b4c8-4f60657806bd
1514 rows completed
Bessie
3 weeks ago
Prompt: You are a pragmatic and experienced Rust programmer who is answering the following stack overflow post. Answer it with a short and easy to understand explanation and sample code. Make sure the code is in the format: ```rust code goes here... ``` {prompt}
2 iterations | 1,095,861 tokens | $1.37
text → text | Together.ai / DeepSeek V3 (FP8)
Generate SFT Answers DeepSeek-v3
829013aa-9b98-4c11-a367-f7a7b5c5b182
3000 rows completed
Bessie
3 weeks ago
Prompt: You are a pragmatic and experienced Rust programmer who is answering the following stack overflow post. Answer it with a short and easy to understand explanation and sample code. Make sure the code is in the format: ```rust code goes here... ``` {prompt}
2 iterations | 1,730,068 tokens | $1.56
text → text | Fireworks AI / DeepSeek V3
Source: synthetic-data
Target: synthetic-data
Generate Synthetic Prompts
7a3bc14e-7a6b-48a7-aa24-6d16b16e35f1
3000 rows completed
Bessie
3 weeks ago
Prompt: Write a random question that a {role} Rust programmer named {name} would ask about {topic}. The question should be the user asking {problem_type}. Provide sample code and/or error messages if applicable. The question should be concise and to the point, and should not be too long. The example code and questions should vary and be unique. The format of the response should have a title and body. The title should be a single line, and the body should be a multi-line string. Delimit the title and body with a blank line and the words 'Title:' and 'Body:' respectively. Do not mention the name of the programmer in the question.
2 iterations | 949,670 tokens | $1.19
text → text | Together.ai / DeepSeek V3 (FP8)
Source: synthetic-data
Target: synthetic-data
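The Generate Synthetic Prompts template asks for a response with a single-line title and a multi-line body, introduced by the words 'Title:' and 'Body:'. A hedged Python sketch of splitting such a response into two fields; `parse_title_body` and `Question` are illustrative names, and rejecting a multi-line title is an assumption drawn from the prompt's wording rather than anything Oxen does:

```python
from typing import NamedTuple, Optional


class Question(NamedTuple):
    title: str
    body: str


def parse_title_body(response: str) -> Optional[Question]:
    """Split a 'Title: ... Body: ...' response into its two parts.

    Returns None when a marker is missing, a field is empty, or the title
    spans several lines, so malformed generations can be filtered out.
    """
    title_start = response.find("Title:")
    if title_start == -1:
        return None
    # Look for "Body:" only after the title marker.
    body_start = response.find("Body:", title_start)
    if body_start == -1:
        return None
    title = response[title_start + len("Title:"):body_start].strip()
    body = response[body_start + len("Body:"):].strip()
    if not title or not body or "\n" in title:
        return None
    return Question(title, body)


if __name__ == "__main__":
    sample = "Title: Borrow error in a loop\n\nBody: Why does this fail?\nlet v = vec![1];"
    print(parse_title_body(sample))
```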
Simplify Prompts
491bbaea-22e2-4e62-8407-e7df1b76afac
1514 rows completed
Bessie
3 weeks ago
Prompt: Simplify the following title and body into a single question that encapsulates everything the user is asking. Strip out all the HTML, the question should be in plain text and contain all the context necessary. If there is any sample code or error messages, make sure to include them. Keep the necessary code for the question if provided. Title: {title} Body: {body} Simplified Question and Code:
5 iterations | 1,262,012 tokens | $1.14
text → text | Fireworks AI / DeepSeek V3
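Each card reports total tokens and cost for the run. Dividing cost by tokens gives an implied blended price per million tokens, which is a quick way to compare providers. The sketch below copies the figures from the five full runs above; it assumes input and output tokens are billed at one rate, which may not match how the providers actually bill, and the displayed costs are already rounded to the cent.

```python
# (title, provider, total tokens, cost in USD) copied from the run cards.
RUNS = [
    ("Write Unit Tests", "Fireworks AI", 3_644_511, 3.28),
    ("Generate Stack Overflow Answers with DeepSeek-v3", "Together.ai", 1_095_861, 1.37),
    ("Generate SFT Answers DeepSeek-v3", "Fireworks AI", 1_730_068, 1.56),
    ("Generate Synthetic Prompts", "Together.ai", 949_670, 1.19),
    ("Simplify Prompts", "Fireworks AI", 1_262_012, 1.14),
]


def price_per_million(tokens: int, cost_usd: float) -> float:
    """Implied blended price in USD per one million tokens."""
    return cost_usd / tokens * 1_000_000


if __name__ == "__main__":
    for title, provider, tokens, cost in RUNS:
        print(f"{provider:<13} ${price_per_million(tokens, cost):.2f}/M  {title}")
```

For these runs the result rounds to about $0.90 per million tokens for the Fireworks AI runs and about $1.25 per million tokens for the Together.ai (FP8) runs.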