Evaluations / Categories, Sentiment, and Language
main
ultrachat_200k_test_sft.parquet
text
OpenAI / GPT-4o
prediction
You are an expert in NLP and conversational analysis. Your task is to evaluate the given conversation based on specific categories and return structured JSON data with predefined options for easier post-processing.

---

### **Input Format**
You will receive a conversation in the following format:
```json
[
  {"content": "User message", "role": "user"},
  {"content": "Assistant response", "role": "assistant"},
  ...
]
```

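As a point of reference, here is a minimal sketch of pulling one conversation out of the source parquet named above; the pandas-based loading and the `messages` column name are assumptions about the file's schema rather than anything the prompt specifies.

```python
import pandas as pd

# Load the test split referenced above.
df = pd.read_parquet("ultrachat_200k_test_sft.parquet")

# Assumption: each row stores the conversation under a "messages" column
# as a list of {"content": ..., "role": ...} dicts, matching the input format.
conversation = df.iloc[0]["messages"]
for turn in conversation:
    print(f'{turn["role"]}: {turn["content"][:80]}')
```
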
### **Evaluation Categories**

Analyze the conversation and categorize it using the predefined values for each dimension.
#### **1️⃣ Top 3 Topics**

Select up to 3 topics that are most relevant to the conversation from the following list:

["Healthcare", "Finance", "Education", "Technology", "Science", "Politics", "Environment", "Ethics", "Entertainment", "History", "Philosophy", "Psychology", "Sports", "Legal", "Business", "Travel", "Food", "Art", "Literature", "Personal Development"]

- The first topic should be the most dominant in the conversation.
- The second and third topics should reflect other significant themes in the discussion.
- If a conversation only has one or two clear topics, leave the remaining slots empty.

#### **2️⃣ Language Style of the Prompt**

- "Formal"
- "Informal"
- "Mixed"

#### **3️⃣ Grammar & Slang in User Input**

- "Perfect" (No mistakes, professional style)
- "Minor Errors" (Small grammar/spelling mistakes, but understandable)
- "Major Errors" (Frequent grammar mistakes, difficult to read)
- "Contains Slang" (Uses informal slang expressions)

#### **4️⃣ Context Awareness**

- "Excellent" (Understands multi-turn context well)
- "Good" (Mostly keeps context, with minor slips)
- "Average" (Some loss of context, but overall understandable)
- "Weak" (Frequently forgets context or contradicts previous responses)
- "None" (Does not retain context at all)

#### **5️⃣ Logical Progression of Conversation**

- "Strong" (Ideas build logically and naturally)
- "Moderate" (Mostly logical but with some jumps)
- "Weak" (Frequent topic shifts or unnatural flow)

#### **6️⃣ Topic Shifts**

- "None" (Stays on the same topic)
- "Minor" (Small, relevant diversions)
- "Major" (Significant change in topic mid-conversation)

#### **7️⃣ Type of Instruction Given to Assistant**

- "Content Generation" (User asks for speech, story, or creative writing)
- "Factual Inquiry" (User requests objective facts, statistics, or comparisons)
- "Opinion-Seeking" (User asks for subjective opinions or recommendations)
- "Task-Oriented" (User asks for a structured task, e.g., "Summarize this")
- "Conversational Engagement" (Casual conversation, open-ended questions)

### **Output Format**

Return a JSON object with the following structure:

```json
{
  "topics": ["Education", "Science", "Ethics"],
  "language_style": "Formal",
  "grammar_slang": "Perfect",
  "context_awareness": "Excellent",
  "logical_progression": "Strong",
  "topic_shifts": "Minor",
  "instruction_type": "Factual Inquiry"
}
```
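Because every field is restricted to a predefined option, post-processing can reduce to a few membership checks. The sketch below is one way to validate a reply, assuming the model's answer arrives as a raw JSON string and that unused topic slots are simply omitted; the option sets restate the values defined above.

```python
import json

ALLOWED = {
    "language_style": {"Formal", "Informal", "Mixed"},
    "grammar_slang": {"Perfect", "Minor Errors", "Major Errors", "Contains Slang"},
    "context_awareness": {"Excellent", "Good", "Average", "Weak", "None"},
    "logical_progression": {"Strong", "Moderate", "Weak"},
    "topic_shifts": {"None", "Minor", "Major"},
    "instruction_type": {
        "Content Generation", "Factual Inquiry", "Opinion-Seeking",
        "Task-Oriented", "Conversational Engagement",
    },
}
TOPICS = {
    "Healthcare", "Finance", "Education", "Technology", "Science", "Politics",
    "Environment", "Ethics", "Entertainment", "History", "Philosophy",
    "Psychology", "Sports", "Legal", "Business", "Travel", "Food", "Art",
    "Literature", "Personal Development",
}

def validate(raw: str) -> dict:
    """Parse the model's JSON reply and check every field against the predefined options."""
    result = json.loads(raw)
    topics = result.get("topics", [])
    # Assumption: between 1 and 3 topics are returned, all drawn from the fixed list.
    assert 1 <= len(topics) <= 3 and all(t in TOPICS for t in topics), topics
    for field, allowed in ALLOWED.items():
        assert result.get(field) in allowed, (field, result.get(field))
    return result
```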

### **Instructions**

✅ Select up to 3 most relevant topics, ordered by prominence in the conversation.
✅ If a conversation clearly belongs to only one or two topics, leave the extra slots blank.
✅ Avoid assuming "Healthcare" unless explicitly related to medical treatment, conditions, or policies.
✅ Science-related discussions should be labeled "Science" unless primarily about education, in which case "Education" should be prioritized.
✅ Ensure responses use only predefined options for consistency in post-processing.
✅ Do not add explanations—only return JSON.

Now, evaluate the following conversation:

{{messages}}
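
One way this template might be executed: replace the `{{messages}}` placeholder with the serialized conversation and send the filled prompt to GPT-4o as a single user message. The sketch below uses the OpenAI Python SDK; the helper name and the use of JSON mode (to enforce the "only return JSON" instruction) are illustrative assumptions, not something the page records.

```python
import json
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def evaluate_conversation(prompt_template: str, conversation: list[dict]) -> dict:
    # Fill the {{messages}} placeholder with the serialized conversation.
    prompt = prompt_template.replace("{{messages}}", json.dumps(conversation, indent=2))
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
        # Assumption: JSON mode keeps the reply machine-parseable.
        response_format={"type": "json_object"},
    )
    return json.loads(response.choices[0].message.content)
```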
Mar 16, 2025, 11:30 AM UTC
5-row sample: 4,220 tokens used ($0.0138)
Estimated cost for all 23,110 rows: $64.01
Sample results completed: 4 columns, rows 1-5 of 23,110 shown
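The full-run estimate appears to scale the sample linearly: $0.0138 / 5 rows ≈ $0.00276 per row, and 23,110 × $0.00276 ≈ $63.8, consistent with the reported $64.01 once rounding of the displayed sample cost is taken into account.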