Evaluations
Run models against your data
Evaluations let you test and compare a selection of AI models against your own datasets.
Whether you're fine-tuning models or measuring performance, Oxen Evaluations lets you quickly run a prompt through every row of a dataset.
Once you're happy with the results, output the resulting dataset to a new file, another branch, or directly as a new commit.
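The prompts listed on this page contain placeholders such as {rust_prompt}, which appear to be filled from the dataset column of the same name, once per row. The following is a minimal Python sketch of that idea, not Oxen's actual implementation: the function name `render_prompt` and the sample rows are hypothetical and used for illustration only.

```python
import re

# Hypothetical helper (not part of any Oxen API): fill a prompt template
# with the values from one dataset row.
def render_prompt(template: str, row: dict) -> str:
    """Replace each {column} placeholder with that column's value from the row."""
    def substitute(match: re.Match) -> str:
        column = match.group(1)
        if column not in row:
            raise KeyError(f"row has no column named {column!r}")
        return str(row[column])

    # Only {identifier} is treated as a placeholder, so literal braces in a
    # prompt (for example the Rust snippet `mod tests {}`) are left untouched.
    return re.sub(r"\{(\w+)\}", substitute, template)


# Illustrative rows; a real dataset would supply these values.
template = "Classify the prompt into good or bad\n\n{rust_prompt}"
rows = [
    {"rust_prompt": "Write a function that reverses a string."},
    {"rust_prompt": "Sum the even numbers in a slice of i32."},
]

# One rendered prompt per row, ready to send to a model.
prompts = [render_prompt(template, row) for row in rows]
```

A regular expression is used here instead of `str.format` because several of the prompts contain Rust code with literal braces, which `str.format` would try to interpret as fields.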
5b4617c8-813b-4d27-b133-aeececac737d
Qwen/Qwen 2.5 Coder 32B Instruct, text → text
Bessie
ox
2 weeks ago
Classify the type of task into what programming problem is being solved. Respond with one word only

{rust_prompt}
completed: 50 rows, 9094 tokens, $0.0082, 3 iterations
Mistral AI/Codestral 2405, text → text
Bessie
ox
4 weeks ago
You are a pragmatic Rust programmer who enjoys test driven development. Given the following question, write a Rust function to complete the task. Make the code simple and easy to understand. The code should pass `cargo build` and `cargo clippy`. Do not add a main function. Try to limit library usage to the standard library std. Respond with only the Rust function and nothing else. Be careful with your types, and try to limit yourself to the basic built in types and standard library functions. When writing the function you can think through how to solve the problem and perform reasoning in the comments above the function.

Then write unit tests for the function you defined. Write three unit tests for the function. The tests should be a simple line delimited list of assert! or assert_eq! statements. When writing the unit tests you can have comments specifying what you are testing in plain english. The tests should use super::*.


An example output should look like the following:

```rust
/// Reasoning goes here
/// and can be multi-line
fn add_nums(x: i32, y: i32) -> i32 {
  x + y
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn test_add_nums() {
        // Test adding positive numbers
        assert_eq!(add_nums(4, 2), 6);
        // Test adding a positive and negative number
        assert_eq!(add_nums(4, -2), 2);
        // Test adding two negative numbers
        assert_eq!(add_nums(-12, -1), -13);
    }
}
```

Make sure to only respond with a single  ```rust``` block. The unit tests must be defined inside the mod tests {} module. Limit the unit tests to 3 assert statements.

Here is the question:
{rust_prompt}
completed: 500 rows, 503530 tokens, $0.1857, 2 iterations
Qwen/Qwen 2.5 Coder 32B Instruct, text → text
Bessie
ox
1 month ago
You are a pragmatic Rust programmer who enjoys test driven development. Given the following question, write a Rust function to complete the task. Make the code simple and easy to understand. The code should pass `cargo build` and `cargo clippy`. Do not add a main function. Try to limit library usage to the standard library std. Respond with only the Rust function and nothing else. Be careful with your types, and try to limit yourself to the basic built in types and standard library functions. When writing the function you can think through how to solve the problem and perform reasoning in the comments above the function.

Then write unit tests for the function you defined. Write three unit tests for the function. The tests should be a simple line delimited list of assert! or assert_eq! statements. When writing the unit tests you can have comments specifying what you are testing in plain english. The tests should use super::*.


An example output should look like the following:

```rust
/// Reasoning goes here
/// and can be multi-line
fn add_nums(x: i32, y: i32) -> i32 {
  x + y
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn test_add_nums() {
        // Test adding positive numbers
        assert_eq!(add_nums(4, 2), 6);
        // Test adding a positive and negative number
        assert_eq!(add_nums(4, -2), 2);
        // Test adding two negative numbers
        assert_eq!(add_nums(-12, -1), -13);
    }
}
```

Make sure to only respond with a single  ```rust``` block. The unit tests must be defined inside the mod tests {} module. Limit the unit tests to 3 assert statements.

Here is the question:
{rust_prompt}
completed: 500 rows, 452076 tokens, $0.4069, 2 iterations
1d1e2b25-5291-4193-8777-f5301e9a1d8a
OpenAI/GPT 4o, text → text
Bessie
ox
1 month ago
You are a pragmatic Rust programmer who enjoys test driven development. Given the following question, write a Rust function to complete the task. Make the code simple and easy to understand. The code should pass `cargo build` and `cargo clippy`. Do not add a main function. Try to limit library usage to the standard library std. Respond with only the Rust function and nothing else. Be careful with your types, and try to limit yourself to the basic built in types and standard library functions. When writing the function you can think through how to solve the problem and perform reasoning in the comments above the function.

Then write unit tests for the function you defined. Write three unit tests for the function. The tests should be a simple line delimited list of assert! or assert_eq! statements. When writing the unit tests you can have comments specifying what you are testing in plain english. The tests should use super::*.


An example output should look like the following:

```rust
/// Reasoning goes here
/// and can be multi-line
fn add_nums(x: i32, y: i32) -> i32 {
  x + y
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn test_add_nums() {
        // Test adding positive numbers
        assert_eq!(add_nums(4, 2), 6);
        // Test adding a positive and negative number
        assert_eq!(add_nums(4, -2), 2);
        // Test adding two negative numbers
        assert_eq!(add_nums(-12, -1), -13);
    }
}
```

Make sure to only respond with a single  ```rust``` block. The unit tests must be defined inside the mod tests {} module. Limit the unit tests to 3 assert statements.

Here is the question:
{rust_prompt}
conflict-gpt-4o-results-7569cb0f-5f74-4a5a-8e17-08ca4cf43b8c
completed: 500 rows, 443889 tokens, $2.54, 1 iteration
f6817b30-ee18-4d2f-926f-ed93faa4fa2f
OpenAI/GPT 4o, text → text
Bessie
ox
1 month ago
You are a pragmatic Rust programmer who enjoys test driven development. Given the following question, write a Rust function to complete the task. Make the code simple and easy to understand. The code should pass `cargo build` and `cargo clippy`. Do not add a main function. Try to limit library usage to the standard library std. Respond with only the Rust function and nothing else. Be careful with your types, and try to limit yourself to the basic built in types and standard library functions. When writing the function you can think through how to solve the problem and perform reasoning in the comments above the function.

Then write unit tests for the function you defined. Write three unit tests for the function. The tests should be a simple line delimited list of assert! or assert_eq! statements. When writing the unit tests you can have comments specifying what you are testing in plain english. The tests should use super::*.


An example output should look like the following:

```rust
/// Reasoning goes here
/// and can be multi-line
fn add_nums(x: i32, y: i32) -> i32 {
  x + y
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn test_add_nums() {
        // Test adding positive numbers
        assert_eq!(add_nums(4, 2), 6);
        // Test adding a positive and negative number
        assert_eq!(add_nums(4, -2), 2);
        // Test adding two negative numbers
        assert_eq!(add_nums(-12, -1), -13);
    }
}
```

Make sure to only respond with a single  ```rust``` block. The unit tests must be defined inside the mod tests {} module. Limit the unit tests to 3 assert statements.

Here is the question:
{rust_prompt}
completed: 500 rows, 442048 tokens, $2.53, 2 iterations
Anthropic AI/Claude 3.7 Sonnet, text → text
Bessie
ox
1 month ago
You are a pragmatic Rust programmer who enjoys test driven development. Given the following question, write a Rust function to complete the task. Make the code simple and easy to understand. The code should pass `cargo build` and `cargo clippy`. Do not add a main function. Try to limit library usage to the standard library std. Respond with only the Rust function and nothing else. Be careful with your types, and try to limit yourself to the basic built in types and standard library functions. When writing the function you can think through how to solve the problem and perform reasoning in the comments above the function.

Then write unit tests for the function you defined. Write three unit tests for the function. The tests should be a simple line delimited list of assert! or assert_eq! statements. When writing the unit tests you can have comments specifying what you are testing in plain english. The tests should use super::*.


An example output should look like the following:

```rust
/// Reasoning goes here
/// and can be multi-line
fn add_nums(x: i32, y: i32) -> i32 {
  x + y
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn test_add_nums() {
        // Test adding positive numbers
        assert_eq!(add_nums(4, 2), 6);
        // Test adding a positive and negative number
        assert_eq!(add_nums(4, -2), 2);
        // Test adding two negative numbers
        assert_eq!(add_nums(-12, -1), -13);
    }
}
```

Make sure to only respond with a single  ```rust``` block. The unit tests must be defined inside the mod tests {} module. Limit the unit tests to 3 assert statements.

Here is the question:
{rust_prompt}
completed: 500 rows, 540294 tokens, $4.77, 2 iterations
9e1ca646-c7b6-49b2-9e68-930fb05f7ac8
OpenAI/GPT 4.5, text → text
Bessie
ox
1 month ago
You are a pragmatic Rust programmer who enjoys test driven development. Given the following question, write a Rust function to complete the task. Make the code simple and easy to understand. The code should pass `cargo build` and `cargo clippy`. Do not add a main function. Try to limit library usage to the standard library std. Respond with only the Rust function and nothing else. Be careful with your types, and try to limit yourself to the basic built in types and standard library functions. When writing the function you can think through how to solve the problem and perform reasoning in the comments above the function.

Then write unit tests for the function you defined. Write three unit tests for the function. The tests should be a simple line delimited list of assert! or assert_eq! statements. When writing the unit tests you can have comments specifying what you are testing in plain english. The tests should use super::*.


An example output should look like the following:

```rust
/// Reasoning goes here
/// and can be multi-line
fn add_nums(x: i32, y: i32) -> i32 {
  x + y
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn test_add_nums() {
        // Test adding positive numbers
        assert_eq!(add_nums(4, 2), 6);
        // Test adding a positive and negative number
        assert_eq!(add_nums(4, -2), 2);
        // Test adding two negative numbers
        assert_eq!(add_nums(-12, -1), -13);
    }
}
```

Make sure to only respond with a single  ```rust``` block. The unit tests must be defined inside the mod tests {} module. Limit the unit tests to 3 assert statements.

Here is the question:
{rust_prompt}
completed: 500 rows, 435644 tokens, $46.40, 2 iterations
OpenAI/GPT 4o mini, text → text
Bessie
ox
1 month ago
You are a pragmatic Rust programmer who enjoys test driven development. Given the following question and unit tests, write a Rust function to complete the task. Make the code simple and easy to understand. The code should pass `cargo build` and `cargo clippy`. Do not add a main function. Try to limit library usage to the standard library std. Respond with only the Rust function and nothing else. Be careful with your types, and try to limit yourself to the basic built in types and standard library functions. When writing the function you can think through how to solve the problem and perform reasoning in the comments above the function.

An example output should look like the following:

```rust
/// Reasoning goes here
/// and can be multi-line
fn add_nums(x: i32, y: i32) -> i32 {
  x + y
}
```

Make sure to only respond with a single  ```rust``` block.

Question:
{rust_prompt}

Unit Tests:
{rust_test_list}

Code:
results-gpt4-o-mini-prompt-test-code
completed: 500 rows, 316477 tokens, $0.0926, 3 iterations
OpenAI/GPT 4o mini, text → text
Bessie
ox
1 month ago
You are a pragmatic Rust programmer who enjoys test driven development. Given the following question, write a Rust function to complete the task. Make the code simple and easy to understand. The code should pass `cargo build` and `cargo clippy`. Do not add a main function. Try to limit library usage to the standard library std. Respond with only the Rust function and nothing else. Be careful with your types, and try to limit yourself to the basic built in types and standard library functions. When writing the function you can think through how to solve the problem and perform reasoning in the comments above the function.

Then write unit tests for the function you defined. Write three unit tests for the function. The tests should be a simple line delimited list of assert! or assert_eq! statements. When writing the unit tests you can have comments specifying what you are testing in plain english. The tests should use super::*.


An example output should look like the following:

```rust
/// Reasoning goes here
/// and can be multi-line
fn add_nums(x: i32, y: i32) -> i32 {
  x + y
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn test_add_nums() {
        // Test adding positive numbers
        assert_eq!(add_nums(4, 2), 6);
        // Test adding a positive and negative number
        assert_eq!(add_nums(4, -2), 2);
        // Test adding two negative numbers
        assert_eq!(add_nums(-12, -1), -13);
    }
}
```

Make sure to only respond with a single  ```rust``` block. The unit tests must be defined inside the mod tests {} module. Limit the unit tests to 3 assert statements.

Here is the question:
{rust_prompt}
completed: 500 rows, 436057 tokens, $0.1479, 3 iterations
OpenAI/GPT 4o mini, text → text
Bessie
ox
1 month ago
You are a pragmatic Rust programmer who enjoys test driven development. Given the following question, write a Rust function to complete the task. Make the code simple and easy to understand. The code should pass `cargo build` and `cargo clippy`. Do not add a main function. Try to limit library usage to the standard library std. Respond with only the Rust function and nothing else. Be careful with your types, and try to limit yourself to the basic built in types and standard library functions. When writing the function you can think through how to solve the problem and perform reasoning in the comments above the function.

Then write unit tests for the function you defined. Write three unit tests for the function. The tests should be a simple line delimited list of assert! or assert_eq! statements. When writing the unit tests you can have comments specifying what you are testing in plain english.


An example output should look like the following:

```rust
/// Reasoning goes here
/// and can be multi-line
fn add_nums(x: i32, y: i32) -> i32 {
  x + y
}
```

```tests
// Test adding small positive numbers
assert_eq!(add_nums(1, 2), 3);
// Test adding two negative numbers
assert_eq!(add_nums(-10, -2), -12);
// Test adding a positive and a negative number
assert_eq!(add_nums(-10, 2), -8);
```

Make sure to only respond with two blocks, a ```rust``` block and a ```tests``` block.

Here is the question:
{rust_prompt}
completed: 5 row sample, 3798 tokens, $0.0013, 1 iteration
719d3f59-164d-4000-9a9f-23f3829aaab9
OpenAI/GPT 4o, text → text
Bessie
ox
1 month ago
You are a pragmatic Rust programmer who enjoys test driven development. Given the following question, write a Rust function to complete the task. Make the code simple and easy to understand. The code should pass `cargo build` and `cargo clippy`. Do not add a main function. Try to limit library usage to the standard library std. Respond with only the Rust function and nothing else. Be careful with your types, and try to limit yourself to the basic built in types and standard library functions. When writing the function you can think through how to solve the problem and perform reasoning in the comments above the function.

Then write unit tests for the function you defined. Write three unit tests for the function. The tests should be a simple line delimited list of assert! or assert_eq! statements. When writing the unit tests you can have comments specifying what you are testing in plain english.


An example output should look like the following:

```rust
/// Reasoning goes here
/// and can be multi-line
fn add_nums(x: i32, y: i32) -> i32 {
  x + y
}
```

```tests
// Test adding small positive numbers
assert_eq!(add_nums(1, 2), 3);
// Test adding two negative numbers
assert_eq!(add_nums(-10, -2), -12);
// Test adding a positive and a negative number
assert_eq!(add_nums(-10, 2), -8);
```

Make sure to only respond with two blocks, a ```rust``` block and a ```tests``` block.

Here is the question:
{rust_prompt}
error: 500 rows, 799347 tokens, $4.57, 2 iterations
53bfe154-97b8-4325-9e31-193e0670f1b5
OpenAI/GPT 4o, text → text
Bessie
ox
1 month ago
You are a pragmatic Rust programmer. Given the following question, write a Rust function to complete the task. Make the code simple and easy to understand. The code should pass `cargo build` and `cargo clippy`. Do not add a main function. Try to limit library usage to the standard library std. Respond with only the Rust function and nothing else.

Question:
{rust_prompt}

Code:
conflict-gpt4o-results-ec12c4eb-1f90-4f79-b44a-1c11c1ffb7f1
completed: 1000 rows, 342289 tokens, $1.82, 2 iterations
a8fafbdd-cbc7-4cff-8bfc-9e40b7bc31dc
Qwen/Qwen 2.5 Coder 32B Instruct, text → text
Bessie
ox
1 month ago
You are a helpful assistant who is a programmatic Rust software engineer and developer. Answer the user's question and provide associated code in rust.

Use the following format for the code:

```rust
```

Do not add a main function. Try to limit library usage to the standard library std. Respond with only the Rust function and nothing else.

Question:
{rust_prompt}
completed: 1000 rows, 359571 tokens, $0.3236, 2 iterations
94a1a8fc-d0de-40ed-86fd-0c8498852e87
Qwen/Qwen 2.5 Coder 32B Instruct, text → text
Bessie
ox
1 month ago
You are a helpful assistant who is a programmatic Rust software engineer and developer. Answer the user's question and provide associated code in rust.

Use the following format for the code:

```rust
```

Do not add a main function. Try to limit library usage to the standard library std. Respond with only the Rust function and nothing else.

Question:
{rust_prompt}
completed: 5 row sample, 1609 tokens, $0.0014, 1 iteration
47dfa0a6-3a4c-4ef7-95fd-e3fc9a960775
OpenAI/GPT 4o mini, text → text
Bessie
ox
1 month ago
Classify the prompt into good or bad

{rust_prompt}
completed: 5 row sample, 898 tokens, $0.0002, 1 iteration
ba122466-aa0b-4413-812b-43cae5cfbcff
OpenAI/GPT 4o, text → text
Bessie
ox
1 month ago
You are a helpful assistant who is a programmatic Rust software engineer and developer. Answer the user's question and provide associated code in rust.

Use the following format for the code:

```rust
```

Do not add a main function. Try to limit library usage to the standard library std. Respond with only the Rust function and nothing else.

Question:
{rust_prompt}
completed: 1000 rows, 341964 tokens, $1.86, 2 iterations
e517c6d7-5fa6-4ce3-be03-613db049bf2f
DeepSeek/Deepseek V3, text → text
Bessie
ox
1 month ago
You are a helpful assistant who is a programmatic Rust software engineer and developer. Answer the user's question and provide associated code in rust.

Use the following format for the code:

```rust
```

Do not add a main function. Try to limit library usage to the standard library std. Respond with only the Rust function and nothing else.

Question:
{rust_prompt}
completed: 1000 rows, 334665 tokens, $0.3012, 3 iterations
a6e59116-f5c3-486b-bf0f-837fde381a7e
Meta/Llama 3.1 8B Instruct, text → text
Bessie
ox
1 month ago
You are a helpful assistant who is a programmatic Rust software engineer and developer. Answer the user's question and provide associated code in rust.

Use the following format for the code:

```rust
```

Do not add a main function. Try to limit library usage to the standard library std. Respond with only the Rust function and nothing else.

Question:
{rust_prompt}
conflict-llama-3.1-8b-responses-1db78d73-6a29-4a7d-bcf7-ba3eec141c83
completed: 1000 rows, 363602 tokens, $0.0727, 2 iterations
cfe4e0d0-2027-43a4-adb1-7ffb7d67755d
Meta/Llama 3.1 8B Instruct, text → text
Bessie
ox
1 month ago
You are a helpful assistant who is a programmatic Rust software engineer and developer. Answer the user's question and provide associated code in rust.

Use the following format for the code:

```rust
```

Do not add a main function. Try to limit library usage to the standard library std. Respond with only the Rust function and nothing else.

Question:
{rust_prompt}
completed: 500 rows, 174061 tokens, $0.0348, 4 iterations
276b5e5b-3989-493e-872d-ad2641f1790d
DeepSeek/Deepseek V3, text → text
Bessie
ox
1 month ago
Translate the prompt to python

{prompt}
completed: 5 row sample, 3050 tokens, $0.0027, 1 iteration
dc45b8c7-a6c6-4b8c-ad1e-0e8e75c7e34f
DeepSeek/Deepseek V3, text → text
Bessie
ox
2 months ago
You are a pragmatic Rust programmer. Write unit tests that cover edge cases for the following code. The full code should be able to compile and run on its own. Place the rust code inside a block like the following:

```rust
Your full code and unit test code here
```

Here's the code to write the unit tests for:

{answer}
synthetic-data
completed: 3000 rows, 3644511 tokens, $3.28, 3 iterations
dd07e31a-4186-4b21-b4c8-4f60657806bd
DeepSeek/Deepseek V3 (FP8), text → text
Bessie
ox
2 months ago
You are a pragmatic and experienced Rust programmer who is answering the following Stack Overflow post. Answer it with a short and easy to understand explanation and sample code. Make sure the code is in the format:

```rust
code goes here...
```

{prompt}
completed: 1514 rows, 1095861 tokens, $1.37, 2 iterations
829013aa-9b98-4c11-a367-f7a7b5c5b182
DeepSeek/Deepseek V3, text → text
Bessie
ox
2 months ago
You are a pragmatic and experienced Rust programmer who is answering the following Stack Overflow post. Answer it with a short and easy to understand explanation and sample code. Make sure the code is in the format:

```rust
code goes here...
```

{prompt}
synthetic-data
completed: 3000 rows, 1730068 tokens, $1.56, 2 iterations
7a3bc14e-7a6b-48a7-aa24-6d16b16e35f1
DeepSeek/Deepseek V3 (FP8), text → text
Bessie
ox
2 months ago
Write a random question that a {role} Rust programmer named {name} would ask about {topic}. The question should be the user asking {problem_type}. Provide sample code and or error messages if applicable. The question should be concise and to the point, and should not be too long. The example code and questions should vary and be unique. The format of the response should have a title and body. The title should be a single line, and the body should be a multi-line string. Delimit the title and body with a blank line and the words 'Title:' and 'Body:' respectively. Do not mention the name of the programmer in the question.
synthetic-data
completed: 3000 rows, 949670 tokens, $1.19, 2 iterations
491bbaea-22e2-4e62-8407-e7df1b76afac
DeepSeek/Deepseek V3, text → text
Bessie
ox
2 months ago
Simplify the following title and body into a single question that encapsulates everything the user is asking. Strip out all the html, the question should be in plain text and contain all the context necessary. If there is any sample code or error messages, make sure to include them. Keep the necessary code for the question if provided.

Title:
{title}

Body:
{body}

Simplified Question and Code:
completed: 1514 rows, 1262012 tokens, $1.14, 5 iterations
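
Because each run above reports both a token count and a dollar cost, the models can be compared on a common scale. The Python sketch below computes an approximate cost per million tokens for six of the 500-row runs that used the "write a function plus unit tests" prompt. The figures are copied from the listing; the resulting rate is a blended figure, assuming the token counts combine input and output tokens, and the helper name `cost_per_million` is illustrative.

```python
# (model, total tokens, total cost in USD) copied from the 500-row runs above.
runs = [
    ("OpenAI/GPT 4.5", 435644, 46.40),
    ("Anthropic AI/Claude 3.7 Sonnet", 540294, 4.77),
    ("OpenAI/GPT 4o", 442048, 2.53),
    ("Qwen/Qwen 2.5 Coder 32B Instruct", 452076, 0.4069),
    ("Mistral AI/Codestral 2405", 503530, 0.1857),
    ("OpenAI/GPT 4o mini", 436057, 0.1479),
]


def cost_per_million(tokens: int, dollars: float) -> float:
    """Blended cost in USD per one million tokens for a single run."""
    return dollars / tokens * 1_000_000


# Rank the runs from cheapest to most expensive per million tokens.
ranked = sorted(
    ((model, cost_per_million(tokens, dollars)) for model, tokens, dollars in runs),
    key=lambda pair: pair[1],
)

for model, rate in ranked:
    print(f"{model}: ${rate:.2f} per 1M tokens")
```

This only compares price; it says nothing about the quality of each model's answers, which still has to be judged from the result rows.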