Evaluations
Run models against your data
Evaluations let you test and compare AI models against your own datasets. Whether you're preparing data for fine-tuning or measuring model performance, Oxen Evaluations simplify the process: pick a model, write a prompt template, and run it across every row of a dataset.
Once you're happy with the results, write the resulting dataset to a new file, another branch, or directly as a new commit.
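Conceptually, an evaluation renders a prompt template for every row of your dataset and collects the model's answers into a new column. Here is a minimal hand-rolled sketch of that loop, assuming the `openai` and `pandas` packages; the file and column names are hypothetical, and the real feature runs this for you while tracking tokens and cost:

```python
import pandas as pd
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
template = "Which language is the prompt written in: {prompt}"

df = pd.read_csv("prompts.csv")  # hypothetical dataset with a `prompt` column
answers = []
for prompt in df["prompt"]:
    # Render the template for this row and ask the model.
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": template.format(prompt=prompt)}],
    )
    answers.append(response.choices[0].message.content)

df["language"] = answers  # the model's answer becomes a new column
df.to_csv("prompts_with_language.csv", index=False)
```

The output file can then be versioned like any other dataset, for example with `oxen add` and `oxen commit` from the CLI.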
Here are a few example evaluations as they appear on the evaluations page, each with its status, author, prompt, and run stats:

Demo · completed (5-row sample) · Bart Dubbeldam · 2 days ago
Prompt: Which language is the prompt written in: {prompt}
1 iteration · 1,921 tokens · $0.0003 · text → text · OpenAI / GPT 4o mini
Dissect Prompt · error at 30 / 23,110 rows · Bart Dubbeldam · 2 weeks ago
Prompt: the shared classification prompt (full text below)
3 iterations · 22,852 tokens · $0.0000 · text → text · Meta / Llama 3.2 3B Instruct Turbo
PromptAnalysis · error at 30 / 23,110 rows · Bart Dubbeldam · 2 weeks ago
Prompt: same classification prompt as Dissect Prompt
2 iterations · 21,625 tokens · $0.0000 · text → text · OpenAI / GPT 4o mini
Categories, Sentiment, and language · error at 30 / 23,110 rows · Bart Dubbeldam · 2 weeks ago
Prompt: same classification prompt as Dissect Prompt
43 iterations · 21,635 tokens · $0.0000 · text → text · OpenAI / GPT 4o mini

The three errored runs all use this classification prompt:

You are an expert in NLP and prompt analysis. Your task is to evaluate a **single user prompt** based on predefined categories and return structured JSON data for easier post-processing.

1. Topics
Select up to 3 topics that are most relevant to the prompt from the following list: ["Healthcare", "Finance", "Education", "Technology", "Science", "Politics", "Environment", "Ethics", "Entertainment", "History", "Philosophy", "Psychology", "Sports", "Legal", "Business", "Travel", "Food", "Art", "Literature", "Personal Development", "Programming"]. The first topic should be the most dominant in the prompt. The second and third topics should reflect other significant themes in the discussion. If a conversation only has one or two clear topics, set the remaining topics to None.

2. Language Style
"Formal", "Informal", or "Mixed".

3. Grammar & Slang in User Input
"Perfect" (no mistakes, professional style), "Minor Errors" (small grammar/spelling mistakes, but understandable), "Major Errors" (frequent grammar mistakes, difficult to read), or "Contains Slang" (uses informal slang expressions).

4. Type of Instruction Given to Assistant
Choose one category that best describes what the user is asking the assistant to do:
- Content Generation → user asks for creative content, including writing, design ideas, or brainstorming responses.
- Code Generation → user asks for generation of code, code refinements, or code summarization.
- Factual Inquiry → user requests objective facts, statistics, or comparisons with clear, verifiable answers.
- Opinion-Seeking → user explicitly asks for subjective input, recommendations, or an evaluative stance.
- Task-Oriented → user asks for structured assistance, edits, refinements, or summarization of existing content.
- Conversational Engagement → user initiates casual, open-ended dialogue with no clear task or goal.

Output Format
Return structured JSON output in this format:
{ "topic": ["Art", "Healthcare", None], "language_style": "Formal", "grammar_slang": "Perfect", "instruction_type": "Content Generation" }

Instructions
Analyze the prompt. Select the 3 most relevant topics, ordered by prominence. If there are empty slots, fill them with None. Use only the predefined options for consistency in post-processing. Do not add explanations; only return JSON.

Now, analyze the following prompt: {prompt}
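Note that the requested output format uses Python-style None inside JSON, which strict parsers reject, so the responses need a small normalization step before post-processing. A minimal sketch of such a validator (not part of Oxen; the names and the normalization strategy are illustrative):

```python
import json

# Allowed values, copied from the classification prompt above.
TOPICS = {
    "Healthcare", "Finance", "Education", "Technology", "Science", "Politics",
    "Environment", "Ethics", "Entertainment", "History", "Philosophy",
    "Psychology", "Sports", "Legal", "Business", "Travel", "Food", "Art",
    "Literature", "Personal Development", "Programming",
}
LANGUAGE_STYLES = {"Formal", "Informal", "Mixed"}
GRAMMAR_SLANG = {"Perfect", "Minor Errors", "Major Errors", "Contains Slang"}
INSTRUCTION_TYPES = {
    "Content Generation", "Code Generation", "Factual Inquiry",
    "Opinion-Seeking", "Task-Oriented", "Conversational Engagement",
}

def parse_response(raw: str) -> dict:
    """Parse one model response, checking it uses only the predefined options."""
    # Naive fix-up: the prompt shows Python-style None, which is not valid
    # JSON. "None" does not occur inside any allowed option string, so a
    # plain replace is safe enough for a sketch.
    data = json.loads(raw.replace("None", "null"))
    topics = data["topic"]
    if len(topics) != 3 or any(t is not None and t not in TOPICS for t in topics):
        raise ValueError(f"bad topics: {topics!r}")
    if data["language_style"] not in LANGUAGE_STYLES:
        raise ValueError(f"bad language_style: {data['language_style']!r}")
    if data["grammar_slang"] not in GRAMMAR_SLANG:
        raise ValueError(f"bad grammar_slang: {data['grammar_slang']!r}")
    if data["instruction_type"] not in INSTRUCTION_TYPES:
        raise ValueError(f"bad instruction_type: {data['instruction_type']!r}")
    return data

# Example response matching the format requested by the prompt:
print(parse_response(
    '{"topic": ["Art", "Healthcare", None], "language_style": "Formal", '
    '"grammar_slang": "Perfect", "instruction_type": "Content Generation"}'
))
```

Rows that fail validation can then be flagged for a re-run rather than silently corrupting the output dataset.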