Feedback Evaluation

Overview

Feedback Evaluation allows you to test how your assistant or chatbot answers questions after you change its settings. Instead of waiting for new user interactions, it replays past conversations where users left feedback and regenerates the answers with your current (or proposed) configuration. A similarity score is then computed to measure how much the new answers differ from the original ones.

This feature is available for:

  • Organization admins – to evaluate chatbot playground responses after changing the chatbot's configuration.

When to Use It

Use Feedback Evaluation whenever you make a change to your assistant or chatbot and want to understand its impact on answer quality:

  • You updated the system prompt and want to know if responses improved.
  • You switched to a different AI model and want to compare outputs.
  • You adjusted retrieval settings and want to verify answers are still consistent.
  • You want a baseline measurement of how reliably your assistant answers recurring questions.

How It Works

  1. Trigger an evaluation – in the Feedback tab (inside the assistant or in the admin panel), click the recompute icon next to a single feedback item, or click RECOMPUTE ALL ANSWERS to re-evaluate every feedback at once.

  2. Answers are regenerated – the system replays each conversation from your feedback history, asking the assistant or chatbot the same questions again with the current (or provided) configuration.

  3. Similarity is measured – for each positively rated feedback, the regenerated answer is compared to the original and assigned a score from 0 to 1:

    • 1.0 – the new answer is essentially the same as the original.
    • 0.0 – the new answer is completely different.
    • For negatively rated feedbacks, answers are regenerated but no score is computed, since the original was already marked as wrong.
  4. Results arrive in real time – scores appear as each feedback is processed. You do not have to wait for all feedbacks to finish.
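The replay-and-score loop above can be sketched as follows. This is an illustrative outline, not the product's actual code: the `Feedback` fields, the `regenerate` callback (ask the assistant with the current configuration), and the `similarity` callback (0–1 score) are all assumed names.

```python
from dataclasses import dataclass

# Hypothetical feedback record; field names are illustrative, not the real API.
@dataclass
class Feedback:
    question: str
    original_answer: str
    rating: str  # "positive" or "negative"

def evaluate_feedbacks(feedbacks, regenerate, similarity):
    """Replay each stored question and score the regenerated answer.

    Positively rated feedbacks get a 0-1 similarity score; negatively
    rated ones are regenerated but left unscored, matching the behavior
    described above. A failed regeneration is marked with -1.
    """
    results = []
    for fb in feedbacks:
        try:
            new_answer = regenerate(fb.question)
        except Exception:
            # -1 is the documented sentinel for "evaluation could not complete"
            results.append((fb, None, -1))
            continue
        if fb.rating == "positive":
            score = similarity(fb.original_answer, new_answer)
        else:
            score = None  # original already marked wrong; no score computed
        results.append((fb, new_answer, score))
    return results
```

Because each feedback is scored independently as the loop runs, results can be surfaced one at a time, which is why scores appear in real time rather than after the whole batch finishes.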

Evaluation History

Once a new answer is generated, you can mark it as the better answer by clicking the THIS ANSWER IS BETTER button. The selection is saved with the feedback, which helps steer the assistant or chatbot's answers in the desired direction.

Similarity Score Explained

| Score | Meaning |
| --- | --- |
| 0.9 – 1.0 | Answers are nearly identical – the assistant is very consistent. |
| 0.6 – 0.9 | Answers share the same intent but may differ in wording or detail. |
| 0.3 – 0.6 | Noticeable differences – worth reviewing. |
| 0.0 – 0.3 | Answers are substantially different – the configuration change has a strong impact. |
| -1 | Evaluation could not be completed for this item (error). |

A lower score is not always bad – if the original answer was poor, a very different regenerated answer may actually be an improvement. Combine similarity scores with the original feedback ratings to interpret results correctly.
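To make the 0–1 scale concrete, here is one minimal way such a score could be computed – cosine similarity over bag-of-words vectors. The product may well use embeddings or another method; this sketch only shows why identical answers land at 1.0 and completely unrelated answers at 0.0.

```python
from collections import Counter
from math import sqrt

def similarity(original: str, regenerated: str) -> float:
    """Illustrative 0-1 similarity: cosine over word-count vectors.

    Identical texts score 1.0; texts with no words in common score 0.0.
    (Assumed implementation for illustration only.)
    """
    a = Counter(original.lower().split())
    b = Counter(regenerated.lower().split())
    dot = sum(a[t] * b[t] for t in a)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0
```

Under this kind of measure, a rewording that keeps most of the same terms would fall in the middle bands of the table above, while a fully rewritten answer would fall near 0.0.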

Join Us

We value your feedback and are always here to assist you.
If you need additional help, feel free to join our Discord server. We look forward to hearing from you!

Discord Community Server