Feedback Evaluation

Overview

Feedback Evaluation allows you to test how your assistant or chatbot answers questions after you change its settings. Instead of waiting for new user interactions, it replays past conversations where users left feedback and regenerates the answers with your current (or proposed) configuration. A similarity score is then computed to measure how much the new answers differ from the original ones.

This feature is available for:

  • Organization admins – to evaluate chatbot playground responses after changing the chatbot's configuration.

When to Use It

Use Feedback Evaluation whenever you make a change to your assistant or chatbot and want to understand its impact on answer quality:

  • You updated the system prompt and want to know if responses improved.
  • You switched to a different AI model and want to compare outputs.
  • You adjusted retrieval settings and want to verify answers are still consistent.
  • You want a baseline measurement of how reliably your assistant answers recurring questions.

How It Works

  1. Trigger an evaluation – in the Feedback tab (inside the assistant or in the admin panel), click the recompute icon next to a single feedback item, or click RECOMPUTE ALL ANSWERS to re-evaluate every feedback at once.

  2. Answers are regenerated – the system replays each conversation from your feedback history, asking the assistant or chatbot the same questions again with the current (or provided) configuration.

  3. Similarity is measured – for each positively rated feedback, the regenerated answer is compared to the original and assigned a score from 0 to 1:

    • 1.0 – the new answer is essentially the same as the original.
    • 0.0 – the new answer is completely different.
    • For negatively rated feedbacks, answers are regenerated but no score is computed, since the original was already marked as wrong.
  4. Results arrive in real time – scores appear as each feedback is processed. You do not have to wait for all feedbacks to finish.
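The replay-and-score loop above can be sketched as follows. This is an illustrative outline, not the product's actual code: the `Feedback` fields, the `regenerate` callback (ask the assistant with the current configuration), and the `similarity` callback (0–1 score) are all assumed names.

```python
from dataclasses import dataclass

# Hypothetical feedback record; field names are illustrative, not the real API.
@dataclass
class Feedback:
    question: str
    original_answer: str
    rating: str  # "positive" or "negative"

def evaluate_feedbacks(feedbacks, regenerate, similarity):
    """Replay each stored question and score the regenerated answer.

    Positively rated feedbacks get a 0-1 similarity score; negatively
    rated ones are regenerated but left unscored, matching the behavior
    described above. A failed regeneration is marked with -1.
    """
    results = []
    for fb in feedbacks:
        try:
            new_answer = regenerate(fb.question)
        except Exception:
            # -1 is the documented sentinel for "evaluation could not complete"
            results.append((fb, None, -1))
            continue
        if fb.rating == "positive":
            score = similarity(fb.original_answer, new_answer)
        else:
            score = None  # original already marked wrong; no score computed
        results.append((fb, new_answer, score))
    return results
```

Because each feedback is scored independently as the loop runs, results can be surfaced one at a time, which is why scores appear in real time rather than after the whole batch finishes.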

Evaluation History

Once a new answer is generated, you can mark it as the better answer by clicking the THIS ANSWER IS BETTER button. The selection is saved with the feedback, which helps steer the assistant or chatbot's answers in the desired direction.

Similarity Score Explained

| Score | Meaning |
| --- | --- |
| 0.9 – 1.0 | Answers are nearly identical – the assistant is very consistent. |
| 0.6 – 0.9 | Answers share the same intent but may differ in wording or detail. |
| 0.3 – 0.6 | Noticeable differences – worth reviewing. |
| 0.0 – 0.3 | Answers are substantially different – the configuration change has a strong impact. |
| -1 | Evaluation could not be completed for this item (error). |

A lower score is not always bad – if the original answer was poor, a very different regenerated answer may actually be an improvement. Combine similarity scores with the original feedback ratings to interpret results correctly.
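To make the 0–1 scale concrete, here is one minimal way such a score could be computed – cosine similarity over bag-of-words vectors. The product may well use embeddings or another method; this sketch only shows why identical answers land at 1.0 and completely unrelated answers at 0.0.

```python
from collections import Counter
from math import sqrt

def similarity(original: str, regenerated: str) -> float:
    """Illustrative 0-1 similarity: cosine over word-count vectors.

    Identical texts score 1.0; texts with no words in common score 0.0.
    (Assumed implementation for illustration only.)
    """
    a = Counter(original.lower().split())
    b = Counter(regenerated.lower().split())
    dot = sum(a[t] * b[t] for t in a)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0
```

Under this kind of measure, a rewording that keeps most of the same terms would fall in the middle bands of the table above, while a fully rewritten answer would fall near 0.0.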

Join Us

We value your feedback and are always here to assist you.
If you need additional help, feel free to join our Discord server. We look forward to hearing from you!

Discord Community Server