Evaluations
Run models against your data
Introducing Evaluations, a feature for testing and comparing AI models against your datasets.
Whether you're fine-tuning models or evaluating performance metrics, Oxen Evaluations make it quick to run a prompt over every row of a dataset.
Once you're happy with the results, save the resulting dataset to a new file, write it to another branch, or commit it directly.
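To make the row-by-row idea concrete, here is a minimal sketch of filling a prompt template once per dataset row. It assumes that a placeholder such as `{rust_prompt}` names a column in the dataset. The `render_prompt` helper and the toy rows are hypothetical illustrations, not Oxen's implementation.

```rust
use std::collections::HashMap;

/// Hypothetical helper: fill each `{column}` placeholder in `template` with
/// that column's value from `row`. Placeholders with no matching column are
/// left as-is.
fn render_prompt(template: &str, row: &HashMap<&str, &str>) -> String {
    let mut prompt = template.to_string();
    // Values are substituted one column at a time, so a value that itself
    // contains `{...}` text could be expanded again; acceptable for a sketch.
    for (column, value) in row {
        prompt = prompt.replace(&format!("{{{column}}}"), value);
    }
    prompt
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn fills_one_prompt_per_row() {
        let template = "Here is the question:\n{rust_prompt}";
        // A toy two-row dataset with a single `rust_prompt` column.
        let rows = [
            HashMap::from([("rust_prompt", "Reverse a string.")]),
            HashMap::from([("rust_prompt", "Sum a slice of i32.")]),
        ];
        let prompts: Vec<String> = rows.iter().map(|row| render_prompt(template, row)).collect();
        assert_eq!(prompts[0], "Here is the question:\nReverse a string.");
        assert_eq!(prompts[1], "Here is the question:\nSum a slice of i32.");
    }
}
```

Plain string replacement keeps the sketch short. A production templater would more likely scan the template once so that substituted values are never re-expanded.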
b34cca00-3ae3-4a9b-a4a5-25673c386c41
Model: Google/Gemma 3 27B (input: text, output: text)
Bessie
ox
2 weeks ago
You are a pragmatic Rust programmer who enjoys test-driven development. Given the following question, write a Rust function to complete the task. Make the code simple and easy to understand. The code should pass `cargo build` and `cargo clippy`. Do not add a main function. Try to limit library usage to the standard library std. Respond with only the Rust function and nothing else. Be careful with your types, and try to limit yourself to the basic built-in types and standard library functions. When writing the function you can think through how to solve the problem and perform reasoning in the comments above the function.

Then write unit tests for the function you defined. Write three unit tests for the function. The tests should be a simple line delimited list of assert! or assert_eq! statements. When writing the unit tests you can have comments specifying what you are testing in plain English.


An example output should look like the following:

```rust
/// Reasoning goes here
/// and can be multi-line
fn add_nums(x: i32, y: i32) -> i32 {
    x + y
}
```

```tests
// Test adding small positive numbers
assert_eq!(add_nums(1, 2), 3);
// Test adding two negative numbers
assert_eq!(add_nums(-10, -2), -14);
// Test adding a positive and a negative number
assert_eq!(add_nums(-10, 2), -8);
```

Make sure to only respond with two blocks, a ```rust``` block and a ```tests``` block.

Here is the question:
{rust_prompt}
Completed: 90 rows, 64,727 tokens, $0.0193, 2 iterations
2f95b548-c04f-4415-a786-7361c53f5218
Model: Mistral AI/Mistral Small 3.1 (input: text, output: text)
Bessie
ox
2 weeks ago
Prompt: the same Rust function-and-tests template as the Gemma 3 27B run above.
Completed: 90 rows, 60,464 tokens, $0.0118, 2 iterations
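The prompt template requires exactly two fenced blocks, so each response has to be split before its code can be compiled. The sketch below does this with a hypothetical `extract_block` helper. It assumes each fence sits on its own line and that blocks are not nested.

```rust
/// Hypothetical helper: return the body of the first fenced block whose
/// opening line is "```" followed by `tag` (e.g. "rust" or "tests").
/// Returns None if the block is missing or never closed.
fn extract_block(response: &str, tag: &str) -> Option<String> {
    let opener = format!("```{tag}");
    let mut lines = response.lines();
    // Advance past the opening fence; bail out if it is absent.
    lines.find(|line| line.trim() == opener)?;
    let mut body = Vec::new();
    for line in lines {
        if line.trim() == "```" {
            return Some(body.join("\n"));
        }
        body.push(line);
    }
    // Opening fence without a closing fence.
    None
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn splits_response_into_blocks() {
        let response = "```rust\nfn one() -> i32 {\n    1\n}\n```\n\n```tests\nassert_eq!(one(), 1);\n```";
        assert_eq!(extract_block(response, "rust").as_deref(), Some("fn one() -> i32 {\n    1\n}"));
        assert_eq!(extract_block(response, "tests").as_deref(), Some("assert_eq!(one(), 1);"));
        assert_eq!(extract_block(response, "python"), None);
    }
}
```

Matching on whole trimmed lines keeps the parser simple. It also means that an opening fence that is not alone on its line (for example, a block written inline after prose) is not recognized.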
5db6f67a-a5dd-4a22-a70a-eec5a9976409
Model: Mistral AI/Mistral Small 3.1 (input: text, output: text)
Bessie
ox
2 weeks ago
Prompt: the same Rust function-and-tests template as the runs above.
Completed: 5-row sample, 2,781 tokens, $0.0005, 1 iteration
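The run totals above are easier to compare per row and per token. This sketch derives those figures using a hypothetical `RunSummary` type. The page reports a single token count and a cost rounded to four decimal places, so the results are blended (input and output combined) approximations: about $0.30 per million tokens for the Gemma 3 27B run and about $0.20 for the 90-row Mistral Small 3.1 run.

```rust
/// Totals reported for one evaluation run (hypothetical type).
struct RunSummary {
    rows: u32,
    tokens: u32,
    cost_usd: f64,
}

impl RunSummary {
    /// Average tokens per dataset row.
    fn tokens_per_row(&self) -> f64 {
        f64::from(self.tokens) / f64::from(self.rows)
    }

    /// Average cost per dataset row, in US dollars.
    fn cost_per_row(&self) -> f64 {
        self.cost_usd / f64::from(self.rows)
    }

    /// Blended cost per million tokens (input and output not separated).
    fn cost_per_million_tokens(&self) -> f64 {
        self.cost_usd / f64::from(self.tokens) * 1_000_000.0
    }
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn compares_the_two_full_runs() {
        // Totals copied from the Gemma 3 27B and Mistral Small 3.1 90-row runs.
        let gemma = RunSummary { rows: 90, tokens: 64_727, cost_usd: 0.0193 };
        let mistral = RunSummary { rows: 90, tokens: 60_464, cost_usd: 0.0118 };

        assert!((gemma.tokens_per_row() - 719.2).abs() < 0.1);
        assert!((gemma.cost_per_million_tokens() - 0.298).abs() < 0.001);
        assert!((mistral.cost_per_million_tokens() - 0.195).abs() < 0.001);
        // Mistral used fewer tokens and cost less per row.
        assert!(mistral.cost_per_row() < gemma.cost_per_row());
    }
}
```

Because the displayed costs are rounded, the 5-row sample ($0.0005 for 2,781 tokens) is too coarse for a reliable per-token rate.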