Multi-turn evaluation
Useful for evaluating chat applications
Multi-turn evaluations are essential for assessing the performance of chat applications and conversational AI systems. They measure how well the AI handles an ongoing conversation: maintaining context across turns and producing coherent, relevant responses as the dialogue progresses.
| Role | Content |
|---|---|
| User | I’m curious about machine learning. |
| Assistant | Machine learning is a subset of artificial intelligence that involves training algorithms to learn from data. What specifically would you like to know? |
| User | How does it differ from traditional programming? |
| Assistant | Traditional programming involves explicitly coding instructions, while machine learning allows the system to learn patterns from data and make decisions. Do you have a specific aspect in mind? |
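A conversation like the one above is commonly represented as an ordered list of role/content messages. Here is a minimal sketch in Python; the structure follows the widely used chat-message convention and is illustrative, not the Hamming schema:

```python
# Each turn is a dict with a role ("user" or "assistant") and its content.
conversation = [
    {"role": "user", "content": "I'm curious about machine learning."},
    {"role": "assistant", "content": "Machine learning is a subset of AI that "
                                     "involves training algorithms to learn from data."},
    {"role": "user", "content": "How does it differ from traditional programming?"},
]

# The most recent user message is the current query; everything before it is
# the conversation history the assistant must stay consistent with.
history, current_query = conversation[:-1], conversation[-1]
print(len(history), current_query["role"])
```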
Setting Up Multi-turn Evaluations
We’ll create a dataset of multi-turn conversations, treating each dataset row as an individual dialogue turn with a conversation history, a current query, and an expected output.
Here is an example of how a multi-turn dataset might look:
| input | expected output |
|---|---|
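A row of such a dataset can be sketched as follows. The field names (`history`, `query`, `expected_output`) are illustrative assumptions about the shape of one dialogue turn, not the exact Hamming dataset schema:

```python
# Hypothetical shape of one multi-turn dataset row: the input bundles the
# conversation history with the current query, and expected_output holds the
# reference answer used for scoring. Field names are illustrative only.
row = {
    "input": {
        "history": [
            {"role": "user", "content": "I'm curious about machine learning."},
            {"role": "assistant", "content": "Machine learning is a subset of AI "
                                             "that learns from data."},
        ],
        "query": "How does it differ from traditional programming?",
    },
    "expected_output": (
        "Traditional programming encodes explicit rules; "
        "machine learning learns patterns from data."
    ),
}
```

Keeping the history inside each row makes every evaluation case self-contained, so rows can be scored independently and in parallel.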
Before you begin
Follow the Evaluations Guide to get familiar with running experiments on Hamming AI. You should have a dataset ID and a secret key to continue with this guide.
Setting up a multi-turn evaluation - Node.js
Learn how to run a multi-turn evaluation experiment with our Hamming TypeScript SDK.
Setting up a multi-turn evaluation - Python
Learn how to run a multi-turn evaluation experiment with our Hamming Python SDK.