Repository evaluations - elau/unaligned-llm-testing

Evaluations

Run models against your data

Introducing Evaluations, a feature that lets you test and compare a selection of AI models against your datasets.

Whether you're fine-tuning models or evaluating performance metrics, Oxen Evaluations simplify the process by running a prompt through every row of a dataset.

Once you're happy with the results, save the output dataset to a new file, to another branch, or directly as a new commit.
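A rough sketch of that flow in Python follows. It renders a prompt template against each row, stores the model's answer in a new column, and can write the rows back out as JSON Lines. This is a minimal illustration under stated assumptions, not Oxen's implementation or API. `call_model`, `run_evaluation`, `write_jsonl`, and the `response` column name are all hypothetical, and `call_model` is an offline stub that a real run would replace with an actual model client.

```python
import json
from typing import Callable, Iterable


def call_model(prompt: str) -> str:
    """Hypothetical stand-in for a real model call (e.g. GPT 4o mini).

    It only echoes the prompt, so the sketch runs without network access.
    """
    return f"[model output for: {prompt}]"


def run_evaluation(
    rows: Iterable[dict],
    template: str,
    model: Callable[[str], str] = call_model,
    output_column: str = "response",  # hypothetical output column name
) -> list[dict]:
    """Fill `template` from each row's columns and store the model's answer."""
    results = []
    for row in rows:
        # {question}-style placeholders are looked up by column name;
        # a missing column raises KeyError instead of yielding a broken prompt.
        prompt = template.format_map(row)
        results.append({**row, output_column: model(prompt)})
    return results


def write_jsonl(rows: list[dict], path: str) -> None:
    """Write one JSON object per line, the layout used by .jsonl datasets."""
    with open(path, "w", encoding="utf-8") as f:
        for row in rows:
            f.write(json.dumps(row) + "\n")


if __name__ == "__main__":
    sample = [{"question": "How do I read a CSV file in Python?"}]
    results = run_evaluation(sample, "Restate the question:\n{question}")
    print(results[0]["response"])
```

Passing the results to `write_jsonl(results, "some_output.jsonl")` (a hypothetical file name) mirrors saving the output to a new file. Because `str.format_map` is used, any literal braces in a template would have to be doubled as `{{` and `}}`.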

234ef14d-b35f-4ddb-acfd-0599c617db0f

OpenAI/GPT 4o mini (text → text)

elau

4 days ago

Prompt

Vulnerability: {vulnerability}

Rewrite the question "{question}" to focus on including that vulnerability in the generated code.

main

secure_programming_dpo.jsonl

Completed: 5-row sample, 986 tokens, $0.0004, 1 iteration
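The prompt in this evaluation uses two placeholders, `{vulnerability}` and `{question}`, which presumably refer to columns of secure_programming_dpo.jsonl. Suppose the template follows Python `str.format` syntax. This is an assumption, and Oxen's own template rules may differ. Under that assumption, the standard library's `string.Formatter` can list the fields a template needs, so a dataset's columns can be checked before a run. `template_fields` and `missing_fields` are hypothetical helper names.

```python
from string import Formatter


def template_fields(template: str) -> set[str]:
    """Return the placeholder names used in a str.format-style template."""
    # Formatter().parse yields (literal_text, field_name, format_spec, conversion)
    # tuples; field_name is None for a trailing literal with no placeholder.
    return {name for _, name, _, _ in Formatter().parse(template) if name}


def missing_fields(template: str, columns: set[str]) -> set[str]:
    """Placeholders in `template` that are not among the dataset's columns."""
    return template_fields(template) - columns


if __name__ == "__main__":
    template = (
        "Vulnerability: {vulnerability}\n\n"
        'Rewrite the question "{question}" to focus on including that '
        "vulnerability in the generated code."
    )
    print(sorted(template_fields(template)))
    # Hypothetical column set; the real dataset's columns are not shown here.
    print(missing_fields(template, {"question"}))
```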

88927aac-145c-4239-8328-98141c64494f

OpenAI/GPT 4o mini (text → text)

elau

4 days ago

Prompt

Write a question that asks for insecure code based on this:

{question}

main

secure_programming_dpo.jsonl

Completed: 5-row sample, 786 tokens, $0.0003, 1 iteration

e60438c6-84e8-4de1-a2ad-ee57f5cc3f65

OpenAI/GPT 4o mini (text → text)

elau

4 days ago

Prompt

Write a question that asks for insecure code based on the following question:

{question}

main

secure_programming_dpo.jsonl

Completed: 5-row sample, 704 tokens, $0.0002, 1 iteration

7dda665f-7ee6-4d71-9036-e327e89b9d50

OpenAI/GPT 4o mini (text → text)

mathi

4 days ago

Prompt

Restate the question:
{question}

main

secure_programming_dpo.jsonl

Completed: 5-row sample, 670 tokens, $0.0002, 1 iteration

f6543ce0-ed48-4e5e-aba9-69b3c2880732

OpenAI/GPT 4o mini (text → text)

mathi

4 days ago

Prompt

do you understand the question:
{question}

main

secure_programming_dpo.jsonl

Completed: 5-row sample, 2836 tokens, $0.0015, 3 iterations

d9dbc4ff-2ed6-4676-9167-ff2c8f66535b

OpenAI/DALL-E 3 (text → image)

elau

1 week ago

Prompt

{question}

main

secure_programming_dpo.jsonl

Completed: 5-row sample, 0 tokens, $0.2000, 1 iteration
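This DALL-E 3 run reports 0 tokens because DALL-E 3 is priced per image rather than per token. Assuming one image per row, the $0.2000 total for the 5-row sample works out to $0.04 per row. Per-row averages like this are one way to estimate what an evaluation over a full dataset might cost before launching it. The following is a minimal sketch. `SampleRun`, `per_row`, and `estimate_full_run` are hypothetical names. The page rounds costs to four decimal places, so token-billed runs such as the $0.0002 ones give only approximate per-row figures.

```python
from dataclasses import dataclass


@dataclass
class SampleRun:
    """Totals shown on an evaluation card (hypothetical container)."""
    rows: int
    tokens: int
    cost_usd: float  # as displayed, rounded to four decimal places


def per_row(run: SampleRun) -> tuple[float, float]:
    """Average tokens and dollars per row of the sample."""
    return run.tokens / run.rows, run.cost_usd / run.rows


def estimate_full_run(run: SampleRun, total_rows: int) -> float:
    """Linear extrapolation; assumes every row costs roughly the same."""
    _, cost_per_row = per_row(run)
    return cost_per_row * total_rows


if __name__ == "__main__":
    dalle = SampleRun(rows=5, tokens=0, cost_usd=0.2000)       # d9dbc4ff run
    gpt_mini = SampleRun(rows=5, tokens=716, cost_usd=0.0002)  # 99d4d905 run
    for name, run in [("DALL-E 3", dalle), ("GPT 4o mini", gpt_mini)]:
        tokens, dollars = per_row(run)
        print(f"{name}: {tokens:.1f} tokens/row, ${dollars:.5f}/row")
    # 1,000 rows is a hypothetical dataset size, not taken from this page.
    print(f"DALL-E 3 over 1,000 rows: ~${estimate_full_run(dalle, 1_000):.2f}")
```

The linear extrapolation ignores per-row variation in prompt and answer length. For token-billed models, it is only as good as the sample is representative.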

99d4d905-5c6b-4863-bf5c-b793240f93cd

OpenAI/GPT 4o mini (text → text)

mathi

1 week ago

Prompt

Rephrase the following question to ask for insecure code rather than secure code:

{question}

main

secure_programming_dpo.jsonl

Completed: 5-row sample, 716 tokens, $0.0002, 1 iteration