Repository evaluations - holodorum/ultrachat_200k

Evaluations/Categories, Sentiment, and language

main

ultrachat_200k_test_sft.parquet

Type: text → text

Model: OpenAI/GPT 4o

Provider: OpenAI

Target field: prediction

Prompt

You are an expert in NLP and conversational analysis. Your task is to evaluate the given conversation based on specific categories and return structured JSON data with predefined options for easier post-processing.

### **Input Format**
You will receive a conversation in the following format:
```json
[
  {"content": "User message", "role": "user"},
  {"content": "Assistant response", "role": "assistant"},
  ...
]
```

Evaluation Categories

Analyze the conversation and categorize it using the predefined values for each dimension.

1️⃣ Primary Topic

    Choose the most relevant topic from the following 20 options:
    ["Healthcare", "Finance", "Education", "Technology", "Science", "Politics", "Environment", "Ethics", "Entertainment", "History", "Philosophy", "Psychology", "Sports", "Legal", "Business", "Travel", "Food", "Art", "Literature", "Personal Development"]
    If the conversation fits multiple topics, choose the most dominant one.

2️⃣ Language Style of the Prompt

    "Formal"
    "Informal"
    "Mixed"

3️⃣ Grammar & Slang in User Input

    "Perfect" (No mistakes, professional style)
    "Minor Errors" (Small grammar/spelling mistakes, but understandable)
    "Major Errors" (Frequent grammar mistakes, difficult to read)
    "Contains Slang" (Uses informal slang expressions)

4️⃣ Context Awareness

    "Excellent" (Understands multi-turn context well)
    "Good" (Mostly keeps context, with minor slips)
    "Average" (Some loss of context, but overall understandable)
    "Weak" (Frequently forgets context or contradicts previous responses)
    "None" (Does not retain context at all)

5️⃣ Logical Progression of Conversation

    "Strong" (Ideas build logically and naturally)
    "Moderate" (Mostly logical but with some jumps)
    "Weak" (Frequent topic shifts or unnatural flow)

6️⃣ Topic Shifts

    "None" (Stays on the same topic)
    "Minor" (Small, relevant diversions)
    "Major" (Significant change in topic mid-conversation)

Output Format

Return a JSON object with the following structure:

{
  "topic": "Healthcare",
  "language_style": "Formal",
  "grammar_slang": "Perfect",
  "context_awareness": "Excellent",
  "logical_progression": "Strong",
  "topic_shifts": "Minor"
}

Instructions

    Choose only from the predefined options for each category.
    Ensure consistency in responses for structured post-processing.
    Do not add additional explanations—only return JSON.

Now, evaluate the following conversation:

{{messages}}
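Because the prompt restricts every field to a fixed set of values, each value written to the `prediction` field can be checked mechanically before post-processing. The following is a minimal sketch, assuming the model returns the bare JSON object as a string. The names `ALLOWED` and `validate_prediction` are hypothetical helpers, not part of the evaluation platform.

```python
import json

# Allowed values, copied from the categories in the prompt above.
ALLOWED = {
    "topic": {
        "Healthcare", "Finance", "Education", "Technology", "Science",
        "Politics", "Environment", "Ethics", "Entertainment", "History",
        "Philosophy", "Psychology", "Sports", "Legal", "Business",
        "Travel", "Food", "Art", "Literature", "Personal Development",
    },
    "language_style": {"Formal", "Informal", "Mixed"},
    "grammar_slang": {"Perfect", "Minor Errors", "Major Errors", "Contains Slang"},
    "context_awareness": {"Excellent", "Good", "Average", "Weak", "None"},
    "logical_progression": {"Strong", "Moderate", "Weak"},
    "topic_shifts": {"None", "Minor", "Major"},
}


def validate_prediction(raw: str) -> dict:
    """Parse one model response and check it against ALLOWED.

    Hypothetical helper: raises ValueError listing every problem found.
    """
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"response is not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("response is not a JSON object")

    problems = []
    for key in sorted(ALLOWED.keys() - data.keys()):
        problems.append(f"missing field: {key}")
    for key in sorted(data.keys() - ALLOWED.keys()):
        problems.append(f"unexpected field: {key}")
    for key, options in ALLOWED.items():
        if key in data and data[key] not in options:
            problems.append(f"invalid value for {key}: {data[key]!r}")

    if problems:
        raise ValueError("; ".join(problems))
    return data
```

Rows that raise `ValueError` can be collected and re-run or excluded, rather than silently passed into downstream aggregation.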

Queued: Mar 16, 2025, 11:21 AM UTC

Completed: Mar 16, 2025, 11:21 AM UTC

5 row sample

5 rows processed, 2830 tokens used ($0.0083)

Estimated cost for all 23110 rows: $38.25
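The full-run estimate appears to be a linear extrapolation from the 5-row sample. The sketch below reproduces that arithmetic under that assumption, using only the figures reported on this page; the `extrapolate` function is a hypothetical helper. With the rounded sample cost of $0.0083 it yields roughly $38.36, slightly above the displayed $38.25. The gap is presumably because the page extrapolates from an unrounded sample cost, though that is not confirmed here.

```python
# Figures reported for the sample run on this page.
SAMPLE_ROWS = 5
SAMPLE_TOKENS = 2830
SAMPLE_COST_USD = 0.0083  # rounded as displayed; the true value may differ
TOTAL_ROWS = 23110


def extrapolate(sample_value: float, sample_rows: int, total_rows: int) -> float:
    """Scale a per-sample total linearly to the full dataset (assumed method)."""
    return sample_value / sample_rows * total_rows


estimated_cost = extrapolate(SAMPLE_COST_USD, SAMPLE_ROWS, TOTAL_ROWS)
estimated_tokens = extrapolate(SAMPLE_TOKENS, SAMPLE_ROWS, TOTAL_ROWS)

print(f"Estimated cost:   ${estimated_cost:.2f}")
print(f"Estimated tokens: {estimated_tokens:,.0f}")
```

A linear estimate assumes the sampled conversations are representative in length; a sample of 5 rows is small, so the real cost could differ noticeably.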

Sample Results completed

4 columns, 1-5 of 23110 rows