Evaluations
Run models against your data
Evaluations let you test and compare a selection of AI models against your datasets.
Whether you're fine-tuning models or measuring performance, Oxen Evaluations simplifies the process by letting you run a prompt through an entire dataset.
Once you're happy with the results, output the resulting dataset to a new file, another branch, or directly as a new commit.
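As a rough illustration of what running a prompt through a dataset involves, the sketch below fills each `{column}` placeholder from a row and stores the model output in a new column. The `fill_prompt` and `evaluate` helpers, the `run_model` callable, and the sample rows are hypothetical stand-ins for illustration, not Oxen's API.

```python
import re

# Matches {column_name} only; leaves literal braces such as "mod tests {}" alone.
PLACEHOLDER = re.compile(r"\{(\w+)\}")


def fill_prompt(template: str, row: dict) -> str:
    """Replace each {column} placeholder with that column's value in the row."""
    return PLACEHOLDER.sub(lambda m: str(row[m.group(1)]), template)


def evaluate(template, rows, run_model, output_column="response"):
    """Return new rows with the model output added under output_column."""
    return [
        {**row, output_column: run_model(fill_prompt(template, row))}
        for row in rows
    ]


# Hypothetical data and a stand-in "model" that just upper-cases the prompt.
rows = [{"rust_prompt": "Reverse a string"}, {"rust_prompt": "Sum a slice of i32"}]
template = "Classify the prompt into good or bad {rust_prompt}"
results = evaluate(template, rows, run_model=lambda prompt: prompt.upper())
```

A regular expression is used here instead of `str.format` because the prompts on this page contain literal braces (Rust code, `mod tests {}`) that `str.format` would try to interpret as fields.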
5b4617c8-813b-4d27-b133-aeececac737d

ox
2 weeks ago
Classify the type of task into what programming problem is being solved. Respond with one word only {rust_prompt}
eef3cb0f-2529-49a1-8b6e-271ed4f174cf

ox
4 weeks ago
You are a pragmatic Rust programmer who enjoys test driven development. Given the following question, write a Rust function to complete the task. Make the code simple and easy to understand. The code should pass `cargo build` and `cargo clippy`. Do not add a main function. Try to limit library usage to the standard library std. Respond with only the Rust function and nothing else. Be careful with your types, and try to limit yourself to the basic built in types and standard library functions. When writing the function you can think through how to solve the problem and perform reasoning in the comments above the function. Then write unit tests for the function you defined. Write three unit tests for the function. The tests should be a simple line delimited list of assert! or assert_eq! statements. When writing the unit tests you can have comments specifying what you are testing in plain english. The tests should use super::*. An example output should look like the following: ```rust /// Reasoning goes here /// and can be multi-line fn add_nums(x: i32, y: i32) -> i32 { x + y } #[cfg(test)] mod tests { use super::*; #[test] fn test_add_nums() { // Test adding positive numbers assert_eq!(add_nums(4, 2), 6); // Test adding a positive and negative number assert_eq!(add_nums(4, -2), 2); // Test adding two negative numbers assert_eq!(add_nums(-12, -1), -13); } } ``` Make sure to only respond with a single ```rust``` block. The unit tests must be defined inside the mod tests {} module. Limit the unit tests to 3 assert statements. Here is the question: {rust_prompt}
codestral-results
c797df4a-667c-430c-ba36-83422898d527

ox
1 month ago
You are a pragmatic Rust programmer who enjoys test driven development. Given the following question, write a Rust function to complete the task. Make the code simple and easy to understand. The code should pass `cargo build` and `cargo clippy`. Do not add a main function. Try to limit library usage to the standard library std. Respond with only the Rust function and nothing else. Be careful with your types, and try to limit yourself to the basic built in types and standard library functions. When writing the function you can think through how to solve the problem and perform reasoning in the comments above the function. Then write unit tests for the function you defined. Write three unit tests for the function. The tests should be a simple line delimited list of assert! or assert_eq! statements. When writing the unit tests you can have comments specifying what you are testing in plain english. The tests should use super::*. An example output should look like the following: ```rust /// Reasoning goes here /// and can be multi-line fn add_nums(x: i32, y: i32) -> i32 { x + y } #[cfg(test)] mod tests { use super::*; #[test] fn test_add_nums() { // Test adding positive numbers assert_eq!(add_nums(4, 2), 6); // Test adding a positive and negative number assert_eq!(add_nums(4, -2), 2); // Test adding two negative numbers assert_eq!(add_nums(-12, -1), -13); } } ``` Make sure to only respond with a single ```rust``` block. The unit tests must be defined inside the mod tests {} module. Limit the unit tests to 3 assert statements. Here is the question: {rust_prompt}
results-qwen-32B
1d1e2b25-5291-4193-8777-f5301e9a1d8a

ox
1 month ago
You are a pragmatic Rust programmer who enjoys test driven development. Given the following question, write a Rust function to complete the task. Make the code simple and easy to understand. The code should pass `cargo build` and `cargo clippy`. Do not add a main function. Try to limit library usage to the standard library std. Respond with only the Rust function and nothing else. Be careful with your types, and try to limit yourself to the basic built in types and standard library functions. When writing the function you can think through how to solve the problem and perform reasoning in the comments above the function. Then write unit tests for the function you defined. Write three unit tests for the function. The tests should be a simple line delimited list of assert! or assert_eq! statements. When writing the unit tests you can have comments specifying what you are testing in plain english. The tests should use super::*. An example output should look like the following: ```rust /// Reasoning goes here /// and can be multi-line fn add_nums(x: i32, y: i32) -> i32 { x + y } #[cfg(test)] mod tests { use super::*; #[test] fn test_add_nums() { // Test adding positive numbers assert_eq!(add_nums(4, 2), 6); // Test adding a positive and negative number assert_eq!(add_nums(4, -2), 2); // Test adding two negative numbers assert_eq!(add_nums(-12, -1), -13); } } ``` Make sure to only respond with a single ```rust``` block. The unit tests must be defined inside the mod tests {} module. Limit the unit tests to 3 assert statements. Here is the question: {rust_prompt}
conflict-gpt-4o-results-7569cb0f-5f74-4a5a-8e17-08ca4cf43b8c
f6817b30-ee18-4d2f-926f-ed93faa4fa2f

ox
1 month ago
You are a pragmatic Rust programmer who enjoys test driven development. Given the following question, write a Rust function to complete the task. Make the code simple and easy to understand. The code should pass `cargo build` and `cargo clippy`. Do not add a main function. Try to limit library usage to the standard library std. Respond with only the Rust function and nothing else. Be careful with your types, and try to limit yourself to the basic built in types and standard library functions. When writing the function you can think through how to solve the problem and perform reasoning in the comments above the function. Then write unit tests for the function you defined. Write three unit tests for the function. The tests should be a simple line delimited list of assert! or assert_eq! statements. When writing the unit tests you can have comments specifying what you are testing in plain english. The tests should use super::*. An example output should look like the following: ```rust /// Reasoning goes here /// and can be multi-line fn add_nums(x: i32, y: i32) -> i32 { x + y } #[cfg(test)] mod tests { use super::*; #[test] fn test_add_nums() { // Test adding positive numbers assert_eq!(add_nums(4, 2), 6); // Test adding a positive and negative number assert_eq!(add_nums(4, -2), 2); // Test adding two negative numbers assert_eq!(add_nums(-12, -1), -13); } } ``` Make sure to only respond with a single ```rust``` block. The unit tests must be defined inside the mod tests {} module. Limit the unit tests to 3 assert statements. Here is the question: {rust_prompt}
gpt-4o-results
cb945182-8e30-4296-b490-919d3b313dfa

ox
1 month ago
You are a pragmatic Rust programmer who enjoys test driven development. Given the following question, write a Rust function to complete the task. Make the code simple and easy to understand. The code should pass `cargo build` and `cargo clippy`. Do not add a main function. Try to limit library usage to the standard library std. Respond with only the Rust function and nothing else. Be careful with your types, and try to limit yourself to the basic built in types and standard library functions. When writing the function you can think through how to solve the problem and perform reasoning in the comments above the function. Then write unit tests for the function you defined. Write three unit tests for the function. The tests should be a simple line delimited list of assert! or assert_eq! statements. When writing the unit tests you can have comments specifying what you are testing in plain english. The tests should use super::*. An example output should look like the following: ```rust /// Reasoning goes here /// and can be multi-line fn add_nums(x: i32, y: i32) -> i32 { x + y } #[cfg(test)] mod tests { use super::*; #[test] fn test_add_nums() { // Test adding positive numbers assert_eq!(add_nums(4, 2), 6); // Test adding a positive and negative number assert_eq!(add_nums(4, -2), 2); // Test adding two negative numbers assert_eq!(add_nums(-12, -1), -13); } } ``` Make sure to only respond with a single ```rust``` block. The unit tests must be defined inside the mod tests {} module. Limit the unit tests to 3 assert statements. Here is the question: {rust_prompt}
claude-3-7-results
9e1ca646-c7b6-49b2-9e68-930fb05f7ac8

ox
1 month ago
You are a pragmatic Rust programmer who enjoys test driven development. Given the following question, write a Rust function to complete the task. Make the code simple and easy to understand. The code should pass `cargo build` and `cargo clippy`. Do not add a main function. Try to limit library usage to the standard library std. Respond with only the Rust function and nothing else. Be careful with your types, and try to limit yourself to the basic built in types and standard library functions. When writing the function you can think through how to solve the problem and perform reasoning in the comments above the function. Then write unit tests for the function you defined. Write three unit tests for the function. The tests should be a simple line delimited list of assert! or assert_eq! statements. When writing the unit tests you can have comments specifying what you are testing in plain english. The tests should use super::*. An example output should look like the following: ```rust /// Reasoning goes here /// and can be multi-line fn add_nums(x: i32, y: i32) -> i32 { x + y } #[cfg(test)] mod tests { use super::*; #[test] fn test_add_nums() { // Test adding positive numbers assert_eq!(add_nums(4, 2), 6); // Test adding a positive and negative number assert_eq!(add_nums(4, -2), 2); // Test adding two negative numbers assert_eq!(add_nums(-12, -1), -13); } } ``` Make sure to only respond with a single ```rust``` block. The unit tests must be defined inside the mod tests {} module. Limit the unit tests to 3 assert statements. Here is the question: {rust_prompt}
gpt4-5-results
72f10054-6f20-4f82-b4a3-dc64c3b169c8

ox
1 month ago
You are a pragmatic Rust programmer who enjoys test driven development. Given the following question and unit tests, write a Rust function to complete the task. Make the code simple and easy to understand. The code should pass `cargo build` and `cargo clippy`. Do not add a main function. Try to limit library usage to the standard library std. Respond with only the Rust function and nothing else. Be careful with your types, and try to limit yourself to the basic built in types and standard library functions. When writing the function you can think through how to solve the problem and perform reasoning in the comments above the function. An example output should look like the following: ```rust /// Reasoning goes here /// and can be multi-line fn add_nums(x: i32, y: i32) -> i32 { x + y } ``` Make sure to only respond with a single ```rust``` block. Question: {rust_prompt} Unit Tests: {rust_test_list} Code:
results-gpt4-o-mini-prompt-test-code
3cb40727-2a15-40bc-93ff-c0bdec4db157

ox
1 month ago
You are a pragmatic Rust programmer who enjoys test driven development. Given the following question, write a Rust function to complete the task. Make the code simple and easy to understand. The code should pass `cargo build` and `cargo clippy`. Do not add a main function. Try to limit library usage to the standard library std. Respond with only the Rust function and nothing else. Be careful with your types, and try to limit yourself to the basic built in types and standard library functions. When writing the function you can think through how to solve the problem and perform reasoning in the comments above the function. Then write unit tests for the function you defined. Write three unit tests for the function. The tests should be a simple line delimited list of assert! or assert_eq! statements. When writing the unit tests you can have comments specifying what you are testing in plain english. The tests should use super::*. An example output should look like the following: ```rust /// Reasoning goes here /// and can be multi-line fn add_nums(x: i32, y: i32) -> i32 { x + y } #[cfg(test)] mod tests { use super::*; #[test] fn test_add_nums() { // Test adding positive numbers assert_eq!(add_nums(4, 2), 6); // Test adding a positive and negative number assert_eq!(add_nums(4, -2), 2); // Test adding two negative numbers assert_eq!(add_nums(-12, -1), -13); } } ``` Make sure to only respond with a single ```rust``` block. The unit tests must be defined inside the mod tests {} module. Limit the unit tests to 3 assert statements. Here is the question: {rust_prompt}
results-gpt4-o-mini-code-and-tests
4f64b926-6d0a-4585-a960-14f19200c31e

ox
1 month ago
You are a pragmatic Rust programmer who enjoys test driven development. Given the following question, write a Rust function to complete the task. Make the code simple and easy to understand. The code should pass `cargo build` and `cargo clippy`. Do not add a main function. Try to limit library usage to the standard library std. Respond with only the Rust function and nothing else. Be careful with your types, and try to limit yourself to the basic built in types and standard library functions. When writing the function you can think through how to solve the problem and perform reasoning in the comments above the function. Then write unit tests for the function you defined. Write three unit tests for the function. The tests should be a simple line delimited list of assert! or assert_eq! statements. When writing the unit tests you can have comments specifying what you are testing in plain english. An example output should look like the following: ```rust /// Reasoning goes here /// and can be multi-line fn add_nums(x: i32, y: i32) -> i32 { x + y } ``` ```tests // Test adding small positive numbers assert_eq!(add_nums(1, 2), 3); // Test adding two negative numbers assert_eq!(add_nums(-10, -2), -14); // Test adding a positive and a negative number assert_eq!(add_nums(-10, 2), 8); ``` Make sure to only respond with two blocks, a ```rust``` block and a ```tests``` block. Here is the question: {rust_prompt}
719d3f59-164d-4000-9a9f-23f3829aaab9

ox
1 month ago
You are a pragmatic Rust programmer who enjoys test driven development. Given the following question, write a Rust function to complete the task. Make the code simple and easy to understand. The code should pass `cargo build` and `cargo clippy`. Do not add a main function. Try to limit library usage to the standard library std. Respond with only the Rust function and nothing else. Be careful with your types, and try to limit yourself to the basic built in types and standard library functions. When writing the function you can think through how to solve the problem and perform reasoning in the comments above the function. Then write unit tests for the function you defined. Write three unit tests for the function. The tests should be a simple line delimited list of assert! or assert_eq! statements. When writing the unit tests you can have comments specifying what you are testing in plain english. An example output should look like the following: ```rust /// Reasoning goes here /// and can be multi-line fn add_nums(x: i32, y: i32) -> i32 { x + y } ``` ```tests // Test adding small positive numbers assert_eq!(add_nums(1, 2), 3); // Test adding two negative numbers assert_eq!(add_nums(-10, -2), -14); // Test adding a positive and a negative number assert_eq!(add_nums(-10, 2), 8); ``` Make sure to only respond with two blocks, a ```rust``` block and a ```tests``` block. Here is the question: {rust_prompt}
results-gpt4-o-code-and-tests
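The prompt above asks the model for two labeled blocks, one tagged `rust` and one tagged `tests`. A minimal sketch of how such a response could be split apart afterwards is shown below; the `split_blocks` helper and the sample response are hypothetical and are not part of Oxen.

```python
import re

# Build the fence marker programmatically so this example nests cleanly.
FENCE_MARK = "`" * 3
FENCE = re.compile(FENCE_MARK + r"(\w+)\s*\n(.*?)" + FENCE_MARK, re.DOTALL)


def split_blocks(response: str) -> dict:
    """Map each fence label (e.g. 'rust', 'tests') to its block body."""
    return {label: body.strip() for label, body in FENCE.findall(response)}


# Hypothetical model response in the two-block format.
response = (
    f"{FENCE_MARK}rust\n"
    "fn add_nums(x: i32, y: i32) -> i32 { x + y }\n"
    f"{FENCE_MARK}\n"
    f"{FENCE_MARK}tests\n"
    "assert_eq!(add_nums(1, 2), 3);\n"
    f"{FENCE_MARK}\n"
)
blocks = split_blocks(response)
```

Keeping the function and the assertions in separate blocks makes it possible to store them in separate dataset columns, which is presumably why the later prompt variant takes `{rust_test_list}` as its own input.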
53bfe154-97b8-4325-9e31-193e0670f1b5

ox
1 month ago
You are a pragmatic Rust programmer. Given the following question, write a Rust function to complete the task. Make the code simple and easy to understand. The code should pass `cargo build` and `cargo clippy`. Do not add a main function. Try to limit library usage to the standard library std. Respond with only the Rust function and nothing else. Question: {rust_prompt} Code:
conflict-gpt4o-results-ec12c4eb-1f90-4f79-b44a-1c11c1ffb7f1
a8fafbdd-cbc7-4cff-8bfc-9e40b7bc31dc

ox
1 month ago
You are a helpful assistant who is a programmatic Rust software engineer and developer. Answer the user's question and provide associated code in rust. Use the following format for the code: ```rust ``` Do not add a main function. Try to limit library usage to the standard library std. Respond with only the Rust function and nothing else. Question: {rust_prompt}
94a1a8fc-d0de-40ed-86fd-0c8498852e87

ox
1 month ago
You are a helpful assistant who is a programmatic Rust software engineer and developer. Answer the user's question and provide associated code in rust. Use the following format for the code: ```rust ``` Do not add a main function. Try to limit library usage to the standard library std. Respond with only the Rust function and nothing else. Question: {rust_prompt}
47dfa0a6-3a4c-4ef7-95fd-e3fc9a960775

ox
1 month ago
Classify the prompt into good or bad {rust_prompt}
ba122466-aa0b-4413-812b-43cae5cfbcff

ox
1 month ago
You are a helpful assistant who is a programmatic Rust software engineer and developer. Answer the user's question and provide associated code in rust. Use the following format for the code: ```rust ``` Do not add a main function. Try to limit library usage to the standard library std. Respond with only the Rust function and nothing else. Question: {rust_prompt}
e517c6d7-5fa6-4ce3-be03-613db049bf2f

ox
1 month ago
You are a helpful assistant who is a programmatic Rust software engineer and developer. Answer the user's question and provide associated code in rust. Use the following format for the code: ```rust ``` Do not add a main function. Try to limit library usage to the standard library std. Respond with only the Rust function and nothing else. Question: {rust_prompt}
deepseek-v3-eval
a6e59116-f5c3-486b-bf0f-837fde381a7e

ox
1 month ago
You are a helpful assistant who is a programmatic Rust software engineer and developer. Answer the user's question and provide associated code in rust. Use the following format for the code: ```rust ``` Do not add a main function. Try to limit library usage to the standard library std. Respond with only the Rust function and nothing else. Question: {rust_prompt}
conflict-llama-3.1-8b-responses-1db78d73-6a29-4a7d-bcf7-ba3eec141c83
cfe4e0d0-2027-43a4-adb1-7ffb7d67755d

ox
1 month ago
You are a helpful assistant who is a programmatic Rust software engineer and developer. Answer the user's question and provide associated code in rust. Use the following format for the code: ```rust ``` Do not add a main function. Try to limit library usage to the standard library std. Respond with only the Rust function and nothing else. Question: {rust_prompt}
llama-3.1-8b-responses
276b5e5b-3989-493e-872d-ad2641f1790d

ox
1 month ago
Translate the prompt to python {prompt}
dc45b8c7-a6c6-4b8c-ad1e-0e8e75c7e34f

ox
2 months ago
You are a pragmatic Rust programmer. Write unit tests that cover edge cases for the following code. The full code should be able to compile and run on it's own. Place the rust code inside a block like the following: ```rust Your full code and unit test code here ``` Here's the code to write the unit tests for: {answer}
synthetic-data
synthetic-data
dd07e31a-4186-4b21-b4c8-4f60657806bd

ox
2 months ago
You are a pragmatic and experienced Rust programmer who is answering the following stack overflow post. Answer it with a short and easy to understand explaination and sample code. Make sure the code is in the format: ```rust code goes here... ``` {prompt}
829013aa-9b98-4c11-a367-f7a7b5c5b182

ox
2 months ago
You are a pragmatic and experienced Rust programmer who is answering the following stack overflow post. Answer it with a short and easy to understand explaination and sample code. Make sure the code is in the format: ```rust code goes here... ``` {prompt}
synthetic-data
synthetic-data
7a3bc14e-7a6b-48a7-aa24-6d16b16e35f1

ox
2 months ago
Write a random question that a {role} Rust programmer named {name} would ask about {topic}. The question should be the user asking {problem_type}. Provide sample code and or error messages if applicable. The question should be concise and to the point, and should not be too long. The example code and questions should vary and be unique. The format of the response should have a title and body. The title should be a single line, and the body should be a multi-line string. Delimit the title and body with a blank line and the words 'Title:' and 'Body:' respectively. Do not mention the name of the programmer in the question.
synthetic-data
synthetic-data
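The prompt above asks for a response delimited by the words 'Title:' and 'Body:'. One way such a response might be turned back into two dataset columns is sketched below; the `parse_question` helper and the sample text are hypothetical and only illustrate the format.

```python
def parse_question(response: str) -> dict:
    """Split a 'Title: ... Body: ...' response into its two fields."""
    head, sep, body = response.partition("Body:")
    if not sep or "Title:" not in head:
        raise ValueError("response is missing a 'Title:' or 'Body:' marker")
    # Everything between 'Title:' and 'Body:' is the single-line title.
    title = head.split("Title:", 1)[1].strip()
    return {"title": title, "body": body.strip()}


# Hypothetical generated question in the requested format.
sample = (
    "Title: Why does my borrow fail?\n"
    "\n"
    "Body: I get error E0502 when I\n"
    "push to a Vec while iterating."
)
question = parse_question(sample)
```

The resulting `title` and `body` keys line up with the `{title}` and `{body}` placeholders that a later prompt on this page consumes.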
491bbaea-22e2-4e62-8407-e7df1b76afac

ox
2 months ago
Simplify the following title and body into a single question that encapsulates everything the user is asking. Strip out all the html, the question should be in plain text and contain all the context necessary. If there is any sample code or error messages, make sure to include them. Keep the necessary code for the question if provided. Title: {title} Body: {body} Simplified Question and Code: