# Feedback Evaluation

## Overview
Feedback Evaluation allows you to test how your assistant or chatbot answers questions after you change its settings. Instead of waiting for new user interactions, it replays past conversations where users left feedback and regenerates the answers with your current (or proposed) configuration. A similarity score is then computed to measure how much the new answers differ from the original ones.
This feature is available for:
- **Admins**: to evaluate chatbot playground responses after changing the chatbot's configuration.
## When to Use It
Use Feedback Evaluation whenever you make a change to your assistant or chatbot and want to understand its impact on answer quality:
- You updated the system prompt and want to know if responses improved.
- You switched to a different AI model and want to compare outputs.
- You adjusted retrieval settings and want to verify answers are still consistent.
- You want a baseline measurement of how reliably your assistant answers recurring questions.
## How It Works
1. **Trigger an evaluation**: in the assistant or in the admin panel's Feedback tab, trigger evaluation for a single feedback, or for all feedbacks at once by clicking **RECOMPUTE ALL ANSWERS**.
2. **Answers are regenerated**: the system replays each conversation from your feedback history, asking the assistant or chatbot the same questions again with the current (or proposed) configuration.
3. **Similarity is measured**: for each positively rated feedback, the regenerated answer is compared to the original and given a score from 0 to 1:
   - `1.0`: the new answer is essentially the same as the original.
   - `0.0`: the new answer is completely different.
   - For negatively rated feedbacks, answers are regenerated but no score is computed, since the original was already marked as wrong.
4. **Results arrive in real time**: scores appear as each feedback is processed; you do not have to wait for all feedbacks to finish.
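The steps above can be sketched roughly as follows. This is an illustrative simplification, not the product's actual implementation: the real similarity metric is not documented here, so the sketch assumes a simple token-overlap similarity, and `regenerate_answer` is a hypothetical stand-in for a call to your assistant.

```python
# Hypothetical sketch of the feedback-evaluation loop.
# The real similarity metric is not documented, so a simple Jaccard
# token overlap is used here purely for illustration.

def token_similarity(a: str, b: str) -> float:
    """Score in [0, 1]: 1.0 = identical token sets, 0.0 = disjoint."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)

def evaluate_feedbacks(feedbacks, regenerate_answer):
    """Replay each feedback question; score only positively rated items."""
    results = []
    for fb in feedbacks:
        try:
            new_answer = regenerate_answer(fb["question"])
        except Exception:
            # Mirrors the -1 "could not be completed" marker.
            results.append({**fb, "new_answer": None, "score": -1})
            continue
        # Negative feedbacks are regenerated but never scored.
        score = (token_similarity(fb["answer"], new_answer)
                 if fb["rating"] == "positive" else None)
        results.append({**fb, "new_answer": new_answer, "score": score})
    return results
```

In this sketch, results are appended one by one, which matches the real-time behavior described above: each item's score is available as soon as that item is processed.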
## Evaluation History
Once a new answer is generated, you can select it as the better answer directly from the evaluation history.
## Similarity Score Explained
| Score | Meaning |
|---|---|
| 0.9 – 1.0 | Answers are nearly identical; the assistant is very consistent. |
| 0.6 – 0.9 | Answers share the same intent but may differ in wording or detail. |
| 0.3 – 0.6 | Noticeable differences; worth reviewing. |
| 0.0 – 0.3 | Answers are substantially different; the configuration change has a strong impact. |
| -1 | Evaluation could not be completed for this item (error). |
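The bands in the table above can be expressed as a small helper. This is only an illustration; how the product treats exact boundary values (e.g. a score of exactly 0.9) is not documented, so the `>=` comparisons below are an assumption.

```python
def interpret_score(score: float) -> str:
    """Map a similarity score to the meaning bands from the table above.
    Boundary handling (e.g. exactly 0.9) is an assumption of this sketch."""
    if score == -1:
        return "evaluation error"
    if score >= 0.9:
        return "nearly identical"
    if score >= 0.6:
        return "same intent, different wording"
    if score >= 0.3:
        return "noticeable differences"
    return "substantially different"
```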
A lower score is not always bad: if the original answer was poor, a very different regenerated answer may actually be an improvement. Combine similarity scores with the original feedback ratings to interpret results correctly.