Repository evaluations - ox/Rust | Datasets at Oxen.ai

results/codestral-2405/predictions.parquet

codestral-results

completed, 500 rows, 503530 tokens, $0.1857, 2 iterations

Benchmark Qwen 2.5 Coder 32B Instruct "Prompt → Unit Test + Code"

c797df4a-667c-430c-ba36-83422898d527

Qwen/Qwen 2.5 Coder 32B Instruct, text → text

4 months ago

Prompt

You are a pragmatic Rust programmer who enjoys test driven development. Given the following question, write a Rust function to complete the task. Make the code simple and easy to understand. The code should pass `cargo build` and `cargo clippy`. Do not add a main function. Try to limit library usage to the standard library std. Respond with only the Rust function and nothing else. Be careful with your types, and try to limit yourself to the basic built in types and standard library functions. When writing the function you can think through how to solve the problem and perform reasoning in the comments above the function.

Then write unit tests for the function you defined. Write three unit tests for the function. The tests should be a simple line delimited list of assert! or assert_eq! statements. When writing the unit tests you can have comments specifying what you are testing in plain english. The tests should use super::*.


An example output should look like the following:

```rust
/// Reasoning goes here
/// and can be multi-line
fn add_nums(x: i32, y: i32) -> i32 {
  x + y
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn test_add_nums() {
        // Test adding positive numbers
        assert_eq!(add_nums(4, 2), 6);
        // Test adding a positive and negative number
        assert_eq!(add_nums(4, -2), 2);
        // Test adding two negative numbers
        assert_eq!(add_nums(-12, -1), -13);
    }
}
```

Make sure to only respond with a single  ```rust``` block. The unit tests must be defined inside the mod tests {} module. Limit the unit tests to 3 assert statements.

Here is the question:
{rust_prompt}
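
The prompt above asks the model to reply with a single fenced `rust` block. As a rough illustration of what a harness might do before running `cargo build`, the sketch below pulls the first such block out of a response string. The function name `extract_rust_block` is hypothetical and is not taken from the Oxen.ai tooling; it only assumes the response follows the format the prompt requests.

```rust
/// Returns the contents of the first fenced `rust` block in `response`,
/// or `None` if no complete block is found.
/// Hypothetical helper for illustration only.
fn extract_rust_block(response: &str) -> Option<&str> {
    // Build the fence marker at runtime so this example does not embed
    // a literal triple backtick in its own source.
    let fence = "`".repeat(3);
    let open = format!("{fence}rust");

    // Position just past the opening marker.
    let after_open = response.find(open.as_str())? + open.len();
    // The body starts on the line after the opening fence.
    let body_start = after_open + response[after_open..].find('\n')? + 1;
    // The body ends at the next fence marker.
    let body_len = response[body_start..].find(fence.as_str())?;

    Some(response[body_start..body_start + body_len].trim_end())
}
```

A response with no fence, or with an unterminated fence, yields `None`, which a harness could count as a formatting failure rather than a compile failure.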

main

results/qwen-32B/predictions.parquet

results-qwen-32B

completed, 500 rows, 452076 tokens, $0.4069, 2 iterations

Benchmark GPT-4o "Prompt → Unit Test + Code"

1d1e2b25-5291-4193-8777-f5301e9a1d8a

OpenAI/GPT 4o, text → text

4 months ago

Prompt

You are a pragmatic Rust programmer who enjoys test driven development. Given the following question, write a Rust function to complete the task. Make the code simple and easy to understand. The code should pass `cargo build` and `cargo clippy`. Do not add a main function. Try to limit library usage to the standard library std. Respond with only the Rust function and nothing else. Be careful with your types, and try to limit yourself to the basic built in types and standard library functions. When writing the function you can think through how to solve the problem and perform reasoning in the comments above the function.

Then write unit tests for the function you defined. Write three unit tests for the function. The tests should be a simple line delimited list of assert! or assert_eq! statements. When writing the unit tests you can have comments specifying what you are testing in plain english. The tests should use super::*.


An example output should look like the following:

```rust
/// Reasoning goes here
/// and can be multi-line
fn add_nums(x: i32, y: i32) -> i32 {
  x + y
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn test_add_nums() {
        // Test adding positive numbers
        assert_eq!(add_nums(4, 2), 6);
        // Test adding a positive and negative number
        assert_eq!(add_nums(4, -2), 2);
        // Test adding two negative numbers
        assert_eq!(add_nums(-12, -1), -13);
    }
}
```

Make sure to only respond with a single  ```rust``` block. The unit tests must be defined inside the mod tests {} module. Limit the unit tests to 3 assert statements.

Here is the question:
{rust_prompt}

main

results/gpt-4o/predictions.parquet

conflict-gpt-4o-results-7569cb0f-5f74-4a5a-8e17-08ca4cf43b8c

completed, 500 rows, 443889 tokens, $2.54, 1 iteration

Benchmark GPT 4.5 "Prompt → Unit Test + Code"

f6817b30-ee18-4d2f-926f-ed93faa4fa2f

OpenAI/GPT 4o, text → text

4 months ago

Prompt

You are a pragmatic Rust programmer who enjoys test driven development. Given the following question, write a Rust function to complete the task. Make the code simple and easy to understand. The code should pass `cargo build` and `cargo clippy`. Do not add a main function. Try to limit library usage to the standard library std. Respond with only the Rust function and nothing else. Be careful with your types, and try to limit yourself to the basic built in types and standard library functions. When writing the function you can think through how to solve the problem and perform reasoning in the comments above the function.

Then write unit tests for the function you defined. Write three unit tests for the function. The tests should be a simple line delimited list of assert! or assert_eq! statements. When writing the unit tests you can have comments specifying what you are testing in plain english. The tests should use super::*.


An example output should look like the following:

```rust
/// Reasoning goes here
/// and can be multi-line
fn add_nums(x: i32, y: i32) -> i32 {
  x + y
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn test_add_nums() {
        // Test adding positive numbers
        assert_eq!(add_nums(4, 2), 6);
        // Test adding a positive and negative number
        assert_eq!(add_nums(4, -2), 2);
        // Test adding two negative numbers
        assert_eq!(add_nums(-12, -1), -13);
    }
}
```

Make sure to only respond with a single  ```rust``` block. The unit tests must be defined inside the mod tests {} module. Limit the unit tests to 3 assert statements.

Here is the question:
{rust_prompt}

main

results/gpt-4o/predictions.parquet

gpt-4o-results

completed, 500 rows, 442048 tokens, $2.53, 2 iterations

Benchmark Claude 3.7 "Prompt → Unit Test + Code"

cb945182-8e30-4296-b490-919d3b313dfa

Anthropic AI/Claude 3.7 Sonnet, text → text

4 months ago

Prompt

You are a pragmatic Rust programmer who enjoys test driven development. Given the following question, write a Rust function to complete the task. Make the code simple and easy to understand. The code should pass `cargo build` and `cargo clippy`. Do not add a main function. Try to limit library usage to the standard library std. Respond with only the Rust function and nothing else. Be careful with your types, and try to limit yourself to the basic built in types and standard library functions. When writing the function you can think through how to solve the problem and perform reasoning in the comments above the function.

Then write unit tests for the function you defined. Write three unit tests for the function. The tests should be a simple line delimited list of assert! or assert_eq! statements. When writing the unit tests you can have comments specifying what you are testing in plain english. The tests should use super::*.


An example output should look like the following:

```rust
/// Reasoning goes here
/// and can be multi-line
fn add_nums(x: i32, y: i32) -> i32 {
  x + y
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn test_add_nums() {
        // Test adding positive numbers
        assert_eq!(add_nums(4, 2), 6);
        // Test adding a positive and negative number
        assert_eq!(add_nums(4, -2), 2);
        // Test adding two negative numbers
        assert_eq!(add_nums(-12, -1), -13);
    }
}
```

Make sure to only respond with a single  ```rust``` block. The unit tests must be defined inside the mod tests {} module. Limit the unit tests to 3 assert statements.

Here is the question:
{rust_prompt}

main

results/claude-3.7/predictions.parquet

claude-3-7-results

completed, 500 rows, 540294 tokens, $4.77, 2 iterations

Benchmark GPT 4.5 "Prompt → Unit Test + Code"

9e1ca646-c7b6-49b2-9e68-930fb05f7ac8

OpenAI/GPT 4.5, text → text

4 months ago

Prompt

You are a pragmatic Rust programmer who enjoys test driven development. Given the following question, write a Rust function to complete the task. Make the code simple and easy to understand. The code should pass `cargo build` and `cargo clippy`. Do not add a main function. Try to limit library usage to the standard library std. Respond with only the Rust function and nothing else. Be careful with your types, and try to limit yourself to the basic built in types and standard library functions. When writing the function you can think through how to solve the problem and perform reasoning in the comments above the function.

Then write unit tests for the function you defined. Write three unit tests for the function. The tests should be a simple line delimited list of assert! or assert_eq! statements. When writing the unit tests you can have comments specifying what you are testing in plain english. The tests should use super::*.


An example output should look like the following:

```rust
/// Reasoning goes here
/// and can be multi-line
fn add_nums(x: i32, y: i32) -> i32 {
  x + y
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn test_add_nums() {
        // Test adding positive numbers
        assert_eq!(add_nums(4, 2), 6);
        // Test adding a positive and negative number
        assert_eq!(add_nums(4, -2), 2);
        // Test adding two negative numbers
        assert_eq!(add_nums(-12, -1), -13);
    }
}
```

Make sure to only respond with a single  ```rust``` block. The unit tests must be defined inside the mod tests {} module. Limit the unit tests to 3 assert statements.

Here is the question:
{rust_prompt}

main

results/GPT4.5/predictions.parquet

gpt4-5-results

completed, 500 rows, 435644 tokens, $46.40, 2 iterations

Benchmark gpt-4o-mini "Prompt + Unit Test → Code"

72f10054-6f20-4f82-b4a3-dc64c3b169c8

OpenAI/GPT 4o mini, text → text

4 months ago

Prompt

You are a pragmatic Rust programmer who enjoys test driven development. Given the following question and unit tests, write a Rust function to complete the task. Make the code simple and easy to understand. The code should pass `cargo build` and `cargo clippy`. Do not add a main function. Try to limit library usage to the standard library std. Respond with only the Rust function and nothing else. Be careful with your types, and try to limit yourself to the basic built in types and standard library functions. When writing the function you can think through how to solve the problem and perform reasoning in the comments above the function.

An example output should look like the following:

```rust
/// Reasoning goes here
/// and can be multi-line
fn add_nums(x: i32, y: i32) -> i32 {
  x + y
}
```

Make sure to only respond with a single  ```rust``` block.

Question:
{rust_prompt}

Unit Tests:
{rust_test_list}

Code:
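
This template carries two placeholders, `{rust_prompt}` and `{rust_test_list}`, which are presumably filled per dataset row. A minimal sketch of that substitution, assuming plain string replacement, is shown below. The function `fill_template` is a hypothetical name and does not describe how Oxen.ai actually renders prompts.

```rust
/// Replaces each `{key}` placeholder in `template` with its value.
/// Hypothetical helper: plain substitution, with no escaping rules.
fn fill_template(template: &str, values: &[(&str, &str)]) -> String {
    let mut out = template.to_string();
    for (key, value) in values {
        // `{{` and `}}` are literal braces, so this builds "{key}".
        let placeholder = format!("{{{key}}}");
        out = out.replace(&placeholder, value);
    }
    out
}
```

One limitation of this approach: a substituted value that itself contains a later placeholder, such as a question mentioning `{rust_test_list}`, would be substituted again on the next pass.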

main

Benchmark gpt-4o-mini generating code and tests together in #[cfg(test)] block

results-gpt4-o-mini-prompt-test-code

predictions.parquet

completed, 500 rows, 316477 tokens, $0.0926, 3 iterations

3cb40727-2a15-40bc-93ff-c0bdec4db157

OpenAI/GPT 4o mini, text → text

4 months ago

Prompt

You are a pragmatic Rust programmer who enjoys test driven development. Given the following question, write a Rust function to complete the task. Make the code simple and easy to understand. The code should pass `cargo build` and `cargo clippy`. Do not add a main function. Try to limit library usage to the standard library std. Respond with only the Rust function and nothing else. Be careful with your types, and try to limit yourself to the basic built in types and standard library functions. When writing the function you can think through how to solve the problem and perform reasoning in the comments above the function.

Then write unit tests for the function you defined. Write three unit tests for the function. The tests should be a simple line delimited list of assert! or assert_eq! statements. When writing the unit tests you can have comments specifying what you are testing in plain english. The tests should use super::*.


An example output should look like the following:

```rust
/// Reasoning goes here
/// and can be multi-line
fn add_nums(x: i32, y: i32) -> i32 {
  x + y
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn test_add_nums() {
        // Test adding positive numbers
        assert_eq!(add_nums(4, 2), 6);
        // Test adding a positive and negative number
        assert_eq!(add_nums(4, -2), 2);
        // Test adding two negative numbers
        assert_eq!(add_nums(-12, -1), -13);
    }
}
```

Make sure to only respond with a single  ```rust``` block. The unit tests must be defined inside the mod tests {} module. Limit the unit tests to 3 assert statements.

Here is the question:
{rust_prompt}

main

results/GPT4-o-mini/predictions.parquet

results-gpt4-o-mini-code-and-tests

completed, 500 rows, 436057 tokens, $0.1479, 3 iterations

Benchmark gpt-4o-mini Generating Code and Tests Together

4f64b926-6d0a-4585-a960-14f19200c31e

OpenAI/GPT 4o mini, text → text

4 months ago

Prompt

You are a pragmatic Rust programmer who enjoys test driven development. Given the following question, write a Rust function to complete the task. Make the code simple and easy to understand. The code should pass `cargo build` and `cargo clippy`. Do not add a main function. Try to limit library usage to the standard library std. Respond with only the Rust function and nothing else. Be careful with your types, and try to limit yourself to the basic built in types and standard library functions. When writing the function you can think through how to solve the problem and perform reasoning in the comments above the function.

Then write unit tests for the function you defined. Write three unit tests for the function. The tests should be a simple line delimited list of assert! or assert_eq! statements. When writing the unit tests you can have comments specifying what you are testing in plain english.


An example output should look like the following:

```rust
/// Reasoning goes here
/// and can be multi-line
fn add_nums(x: i32, y: i32) -> i32 {
  x + y
}
```

```tests
// Test adding small positive numbers
assert_eq!(add_nums(1, 2), 3);
// Test adding two negative numbers
assert_eq!(add_nums(-10, -2), -12);
// Test adding a positive and a negative number
assert_eq!(add_nums(-10, 2), -8);
```

Make sure to only respond with two blocks, a ```rust``` block and a ```tests``` block.

Here is the question:
{rust_prompt}
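
With this two-block format (a `rust` block and a separate `tests` block), the assertions have to be joined to the function before `cargo test` can run them. The sketch below shows one way that could be done, wrapping the assertion lines in a `#[cfg(test)]` module that uses `super::*`. The function `assemble_test_file` and the test name `test_generated` are hypothetical; the actual evaluation pipeline is not shown on this page.

```rust
/// Combines generated code and a line-delimited list of assertions
/// into a single source file with a `#[cfg(test)]` module.
/// Hypothetical helper for illustration only.
fn assemble_test_file(code: &str, tests: &str) -> String {
    let mut out = String::new();
    out.push_str(code.trim_end());
    out.push_str("\n\n#[cfg(test)]\nmod tests {\n    use super::*;\n\n");
    out.push_str("    #[test]\n    fn test_generated() {\n");
    // Copy each non-empty assertion or comment line, indented.
    for line in tests.lines().filter(|l| !l.trim().is_empty()) {
        out.push_str("        ");
        out.push_str(line.trim());
        out.push('\n');
    }
    out.push_str("    }\n}\n");
    out
}
```

Putting all assertions in one test function means the first failing assertion hides the rest; a harness that wants per-assertion results would need one `#[test]` function per line instead.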

main

Benchmark GPT 4o Generate Code and Tests Together

completed, 5 row sample, 3798 tokens, $0.0013, 1 iteration

719d3f59-164d-4000-9a9f-23f3829aaab9

OpenAI/GPT 4o, text → text

4 months ago

Prompt

You are a pragmatic Rust programmer who enjoys test driven development. Given the following question, write a Rust function to complete the task. Make the code simple and easy to understand. The code should pass `cargo build` and `cargo clippy`. Do not add a main function. Try to limit library usage to the standard library std. Respond with only the Rust function and nothing else. Be careful with your types, and try to limit yourself to the basic built in types and standard library functions. When writing the function you can think through how to solve the problem and perform reasoning in the comments above the function.

Then write unit tests for the function you defined. Write three unit tests for the function. The tests should be a simple line delimited list of assert! or assert_eq! statements. When writing the unit tests you can have comments specifying what you are testing in plain english.


An example output should look like the following:

```rust
/// Reasoning goes here
/// and can be multi-line
fn add_nums(x: i32, y: i32) -> i32 {
  x + y
}
```

```tests
// Test adding small positive numbers
assert_eq!(add_nums(1, 2), 3);
// Test adding two negative numbers
assert_eq!(add_nums(-10, -2), -12);
// Test adding a positive and a negative number
assert_eq!(add_nums(-10, 2), -8);
```

Make sure to only respond with two blocks, a ```rust``` block and a ```tests``` block.

Here is the question:
{rust_prompt}

main

results/GPT4-o/results_code_and_tests.parquet

results-gpt4-o-code-and-tests

error, 500 rows, 799347 tokens, $4.57, 2 iterations

GPT-4o Apples to Apples Prompt

53bfe154-97b8-4325-9e31-193e0670f1b5

OpenAI/GPT 4o, text → text

4 months ago

Prompt

You are a pragmatic Rust programmer. Given the following question, write a Rust function to complete the task. Make the code simple and easy to understand. The code should pass `cargo build` and `cargo clippy`. Do not add a main function. Try to limit library usage to the standard library std. Respond with only the Rust function and nothing else.

Question:
{rust_prompt}

Code:

main

results/GPT4-o/predictions.parquet

conflict-gpt4o-results-ec12c4eb-1f90-4f79-b44a-1c11c1ffb7f1

completed, 1000 rows, 342289 tokens, $1.82, 2 iterations

Qwen 2.5 Coder 32B Instruct Baseline Test

a8fafbdd-cbc7-4cff-8bfc-9e40b7bc31dc

Qwen/Qwen 2.5 Coder 32B Instruct, text → text

4 months ago

Prompt

You are a helpful assistant who is a programmatic Rust software engineer and developer. Answer the user's question and provide associated code in rust.

Use the following format for the code:

```rust
```

Do not add a main function. Try to limit library usage to the standard library std. Respond with only the Rust function and nothing else.

Question:
{rust_prompt}

main

results/Qwen/Qwen-2-5-Coder-32B-Instruct/predictions.parquet

main

completed, 1000 rows, 359571 tokens, $0.3236, 2 iterations

Qwen 2.5 Coder 32B Instruct Baseline

94a1a8fc-d0de-40ed-86fd-0c8498852e87

Qwen/Qwen 2.5 Coder 32B Instruct, text → text

4 months ago

Prompt

You are a helpful assistant who is a programmatic Rust software engineer and developer. Answer the user's question and provide associated code in rust.

Use the following format for the code:

```rust
```

Do not add a main function. Try to limit library usage to the standard library std. Respond with only the Rust function and nothing else.

Question:
{rust_prompt}

main

47dfa0a6-3a4c-4ef7-95fd-e3fc9a960775

completed, 5 row sample, 1609 tokens, $0.0014, 1 iteration

OpenAI/GPT 4o mini, text → text

4 months ago

Prompt

Classify the prompt into good or bad

{rust_prompt}

main

cargo_test_passed_eval_smol.parquet

completed, 5 row sample, 898 tokens, $0.0002, 1 iteration

GPT-4o Baseline

ba122466-aa0b-4413-812b-43cae5cfbcff

OpenAI/GPT 4o, text → text

4 months ago

Prompt

You are a helpful assistant who is a programmatic Rust software engineer and developer. Answer the user's question and provide associated code in rust.

Use the following format for the code:

```rust
```

Do not add a main function. Try to limit library usage to the standard library std. Respond with only the Rust function and nothing else.

Question:
{rust_prompt}

main

cargo_test_passed_test_gpt4o_results.parquet

gpt4o-results

completed, 1000 rows, 341964 tokens, $1.86, 2 iterations

DeepSeek V3 Baseline

e517c6d7-5fa6-4ce3-be03-613db049bf2f

DeepSeek/Deepseek V3, text → text

4 months ago

Prompt

You are a helpful assistant who is a programmatic Rust software engineer and developer. Answer the user's question and provide associated code in rust.

Use the following format for the code:

```rust
```

Do not add a main function. Try to limit library usage to the standard library std. Respond with only the Rust function and nothing else.

Question:
{rust_prompt}

main

cargo_test_passed_test_deep_seek_v3.parquet

deepseek-v3-eval

completed, 1000 rows, 334665 tokens, $0.3012, 3 iterations

Llama 3.1 8B Instruct Baseline

a6e59116-f5c3-486b-bf0f-837fde381a7e

Meta/Llama 3.1 8B Instruct, text → text

4 months ago

Prompt

You are a helpful assistant who is a programmatic Rust software engineer and developer. Answer the user's question and provide associated code in rust.

Use the following format for the code:

```rust
```

Do not add a main function. Try to limit library usage to the standard library std. Respond with only the Rust function and nothing else.

Question:
{rust_prompt}

main

cargo_test_passed_test_results_llama3-1-8b.parquet

conflict-llama-3.1-8b-responses-1db78d73-6a29-4a7d-bcf7-ba3eec141c83

completed, 1000 rows, 363602 tokens, $0.0727, 2 iterations

Llama 3.1 8B Instruct Baseline

cfe4e0d0-2027-43a4-adb1-7ffb7d67755d

Meta/Llama 3.1 8B Instruct, text → text

4 months ago

Prompt

You are a helpful assistant who is a programmatic Rust software engineer and developer. Answer the user's question and provide associated code in rust.

Use the following format for the code:

```rust
```

Do not add a main function. Try to limit library usage to the standard library std. Respond with only the Rust function and nothing else.

Question:
{rust_prompt}

main

results/meta-llama/Llama-3.1-8B/predictions.parquet

llama-3.1-8b-responses

completed, 500 rows, 174061 tokens, $0.0348, 4 iterations

276b5e5b-3989-493e-872d-ad2641f1790d

DeepSeek/Deepseek V3, text → text

4 months ago

Prompt

Translate the prompt to python

{prompt}

main

cargo_synth_data.parquet

completed, 5 row sample, 3050 tokens, $0.0027, 1 iteration

Write Unit Tests

dc45b8c7-a6c6-4b8c-ad1e-0e8e75c7e34f

DeepSeek/Deepseek V3, text → text

5 months ago

Prompt

You are a pragmatic Rust programmer. Write unit tests that cover edge cases for the following code. The full code should be able to compile and run on its own. Place the Rust code inside a block like the following:

```rust
Your full code and unit test code here
```

Here's the code to write the unit tests for:

{answer}

synthetic-data

synthetic-data

Generate Stack Overflow Answers with DeepSeek-v3

completed, 3000 rows, 3644511 tokens, $3.28, 3 iterations

dd07e31a-4186-4b21-b4c8-4f60657806bd

DeepSeek/Deepseek V3 (FP8), text → text

5 months ago

Prompt

You are a pragmatic and experienced Rust programmer who is answering the following Stack Overflow post. Answer it with a short and easy-to-understand explanation and sample code. Make sure the code is in the format:

```rust
code goes here...
```

{prompt}

main

main

Generate SFT Answers DeepSeek-v3

completed, 1514 rows, 1095861 tokens, $1.37, 2 iterations

829013aa-9b98-4c11-a367-f7a7b5c5b182

DeepSeek/Deepseek V3, text → text

5 months ago

Prompt

You are a pragmatic and experienced Rust programmer who is answering the following Stack Overflow post. Answer it with a short and easy-to-understand explanation and sample code. Make sure the code is in the format:

```rust
code goes here...
```

{prompt}

synthetic-data

synthetic-data

Generate Synthetic Prompts

completed, 3000 rows, 1730068 tokens, $1.56, 2 iterations

7a3bc14e-7a6b-48a7-aa24-6d16b16e35f1

DeepSeek/Deepseek V3 (FP8), text → text

5 months ago

Prompt

Write a random question that a {role} Rust programmer named {name} would ask about {topic}. The question should be the user asking {problem_type}. Provide sample code and/or error messages if applicable. The question should be concise and to the point. The example code and questions should vary and be unique. The response should have a title and a body. The title should be a single line, and the body should be a multi-line string. Delimit the title and body with a blank line and the words 'Title:' and 'Body:' respectively. Do not mention the name of the programmer in the question.

synthetic-data

synthetic-data

completed, 3000 rows, 949670 tokens, $1.19, 2 iterations

Simplify Prompts

491bbaea-22e2-4e62-8407-e7df1b76afac

DeepSeek/Deepseek V3, text → text

5 months ago

Prompt

Simplify the following title and body into a single question that encapsulates everything the user is asking. Strip out all the HTML; the question should be in plain text and contain all the necessary context. If there is any sample code or error messages, make sure to include them.

Title:
{title}

Body:
{body}

Simplified Question and Code:

main

main