Multi-turn evaluations are essential for assessing the performance of chat applications and conversational AI systems. These evaluations help in understanding how well the AI handles ongoing conversations, maintains context, and provides coherent and relevant responses over multiple turns.

RoleContent
UserI’m curious about machine learning.
AssistantMachine learning is a subset of artificial intelligence that involves training algorithms to learn from data. What specifically would you like to know?
UserHow does it differ from traditional programming?
AssistantTraditional programming involves explicitly coding instructions, while machine learning allows the system to learn patterns from data and make decisions. Do you have a specific aspect in mind?

Setting Up Multi-turn Evaluations

We’ll create a dataset of multi-turn conversations. We’ll treat each dataset row as an individual dialogue turn, with a conversation history, current query and expected output.

Here is an example of how a multi-turn dataset might look:

input

{
    "question": "Can you explain that further?",
    "conversation_history": [
        {
            "role": "user",
            "content": "I'm curious about quantum mechanics.",
        },
        {
            "role": "assistant",
            "content": "Quantum mechanics is a fundamental theory in physics. What specifically would you like to know?",
        },
        {
            "role": "user",
            "content": "How does it differ from classical physics?"
        },
        {
            "role": "assistant",
            "content": "Classical physics describes macroscopic phenomena, while quantum mechanics explains the behavior of particles at the atomic and subatomic levels. Do you have a specific aspect in mind?",
        },
    ]
}

expected output

{
    "output": "Quantum mechanics differs from classical physics in several ways. For instance, it introduces the concept of wave-particle duality, where particles can exhibit both wave-like and particle-like properties. Additionally, it incorporates the principle of uncertainty, which states that certain pairs of properties, like position and momentum, cannot be simultaneously measured with arbitrary precision."
}

Before you begin

Follow the Evaluations Guide to get familiar with running experiments on Hamming AI. You should have a dataset ID and a secret key to continue with this guide.

Setting up a multi-turn evaluation - Node.js

Learn how run a multi-turn evaluation experiment with our Hamming TypeScript SDK.

Setting up a multi-turn evaluation - Python

Learn how run a multi-turn evaluation experiment with our Hamming Python SDK.