Human feedback evaluation for RAG pipeline. (2024)

SuperKnowa framework for collecting human feedback in Retrieval-Augmented Generation (RAG) pipelines.

Generative AI is a branch of artificial intelligence that focuses on generating new content, such as text, images, music, or other forms of data. However, ensuring the accuracy and reliability of these models remains a critical challenge. Collecting input from users plays a crucial role in assessing a model’s responses and improving its performance within a Retrieval-Augmented Generation (RAG) pipeline. In this blog, we explore how to collect user feedback on responses from different LLMs, comparing the effectiveness of ratings, rankings, comments, and Q&A questionnaires.

To implement the feedback mechanism, we focused on fine-grained human feedback based on the following criteria:

  • Output affected by lack of relevance, repetition, or inconsistency.
  • Generated output containing inaccurate or unverifiable information.
  • Generated responses with missing or partial information.

To compare the effectiveness of these feedback mechanisms, we conducted an experiment using three text generation models (flan-xxl, sandstone-coga-3.0, and coga-3.8) within the IBM watsonx.ai platform. We asked users to provide feedback using ratings, rankings, comments, and Q&A questionnaires.

Below, you’ll find explanations and intentions behind the various types of feedback we used.

  1. Rating: Rating feedback involves assigning a numerical value to a model’s response based on its perceived quality or relevance. For instance, a user might rate a generated response on a scale of 1 to 3, indicating their satisfaction with the answer. It is quick and quantifiable, suitable for assessing overall quality.

In our analysis, we used a three-star rating scale:

  • Incorrect/hallucination
  • Partial answer
  • Acceptable as is
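
The three-star scale can be captured as a small enumeration. This is a minimal sketch, assuming the labels map to stars in the order listed (1 = incorrect, 3 = acceptable), which is consistent with the sample record later in this post. The names `StarRating` and `label_for` are hypothetical and not part of SuperKnowa.

```python
from enum import IntEnum


class StarRating(IntEnum):
    """Three-star scale; the star-to-label order is an assumption."""
    INCORRECT_OR_HALLUCINATION = 1
    PARTIAL_ANSWER = 2
    ACCEPTABLE_AS_IS = 3


def label_for(star: int) -> str:
    """Return a readable label for a star value.

    Raises ValueError for values outside the 1 to 3 range.
    """
    return StarRating(star).name.replace("_", " ").capitalize()
```

Validating stars at collection time this way keeps out-of-range values from reaching later analysis.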

2. Ranking: Ranking feedback requires users to compare and rank different model responses according to their preference. This feedback mechanism provides finer-grained discrimination between responses and allows the model to learn the relative quality of outputs.

We used a drag-and-drop mechanism to rank the models based on their responses.
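
One common way to let a model learn relative quality from a ranking is to expand it into pairwise preferences. The sketch below assumes a ranking is a list of entries with `rank` and `model_id` fields, where a lower rank means a more preferred response (as in the sample record later in this post, where rank 0 comes first). `ranking_to_pairs` is a hypothetical helper, not the framework's own code.

```python
from itertools import combinations


def ranking_to_pairs(ranking):
    """Expand a ranked list into (preferred, rejected) model-id pairs.

    Assumes a lower "rank" value means a more preferred response.
    """
    ordered = sorted(ranking, key=lambda entry: entry["rank"])
    return [(a["model_id"], b["model_id"]) for a, b in combinations(ordered, 2)]


ranking = [
    {"rank": 1, "model_id": "sandstone-coga-3.0"},
    {"rank": 0, "model_id": "flan-xxl"},
    {"rank": 2, "model_id": "coga-3.8"},
]
pairs = ranking_to_pairs(ranking)
# Three models yield three pairs, each listing the preferred model first.
```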

3. Q&A Questionnaires: Yes/no questionnaires present users with specific questions about the response. They provide targeted feedback on specific aspects, such as relevance to the topic, and help gather focused feedback on the accuracy of the response.

We asked three questions for every response, ensuring that each question aligns logically with the user-provided answers.

  • Is the response relevant and coherent?
  • Was this a useful response with an appropriate amount of information?
  • Is the response factual and accurate, based on the document?

4. Comments: Comment-based feedback lets users provide textual explanations for their preferences or suggestions for improvement. These comments offer valuable insights into the strengths and weaknesses of model responses.

We collect comments for each response as additional feedback.

Strengths and drawbacks of these feedback types:

  • Rating: Quick and easy to collect, but might lack detailed feedback.
  • Ranking: Provides fine-grained discrimination, but can be more complex for participants.
  • Comments: Offers qualitative insights, but requires more effort to analyze.
  • QA Questionnaires: Provides focused feedback on specific aspects, but may not capture the depth of qualitative insights.

While each of these types has strengths and limitations, a combination of these feedback mechanisms can provide a comprehensive understanding of response quality and user preferences. Ratings, rankings, comments, and yes/no questionnaires can collectively strengthen the process of improving model responses. This hybrid approach leverages the strengths of each mechanism and addresses their respective limitations.

Here is the combined feedback collected for a single question, which we can use for deeper analysis.

{
  "additional_feedback": [
    {
      "model_id": "flan-xxl",
      "feedbackBy": "htalukder@ibm.com",
      "answer": "You can perform the following tasks if you have the business role purchaser: Search and view sources of supply based on different criteria, such as material, supplier, material group, plant, purchasing organization, purchasing document category, quota arrangement, and validity period. View the relevance of different sources of supply Create or edit sources of supplyNote When you click create and choose a source of supply, you will be navigated to the respective app to provide the details for the creation of the purchasing document. Check the default source assignment of a purchase requisition for a specific material, plant, and date combination Choose Simulate SoS Assignment Enter the relevant material, plant, and date Choose Go The system will display the default sources of supply. If multiple sources of supply exist and if the system cannot determine a single default source of supply, then all sources of supply will be shown.",
      "feedback": {
        "relevent": "yes",
        "appropriate": "yes",
        "factual_accurate": "yes"
      }
    },
    {
      "additional_feedback": [
        {
          "model_id": "sandstone-coga-3.0",
          "feedbackBy": "htalukder@ibm.com",
          "answer": "With this app you can view and manage existing sources of supply for a material for a defined validity period.",
          "feedback": {
            "relevent": "yes",
            "appropriate": "no",
            "factual_accurate": "yes",
            "comment": "Answer was correct but not enough information"
          }
        }
      ]
    },
    {
      "additional_feedback": [
        {
          "model_id": "coga-3.8",
          "feedbackBy": "htalukder@ibm.com",
          "answer": "With this app you can view and manage existing sources of supply for a material for a defined validity period.",
          "feedback": {
            "relevent": "yes",
            "appropriate": "no",
            "factual_accurate": "yes",
            "comment": "partially correct."
          }
        }
      ]
    }
  ],
  "answers": [
    {
      "model_id": "flan-xxl",
      "answer": "You can perform the following tasks if you have the business role purchaser: Search and view sources of supply based on different criteria, such as material, supplier, material group, plant, purchasing organization, purchasing document category, quota arrangement, and validity period. View the relevance of different sources of supply Create or edit sources of supplyNote When you click create and choose a source of supply, you will be navigated to the respective app to provide the details for the creation of the purchasing document. Check the default source assignment of a purchase requisition for a specific material, plant, and date combination Choose Simulate SoS Assignment Enter the relevant material, plant, and date Choose Go The system will display the default sources of supply. If multiple sources of supply exist and if the system cannot determine a single default source of supply, then all sources of supply will be shown."
    },
    {
      "model_id": "sandstone-coga-3.0",
      "answer": "With this app you can view and manage existing sources of supply for a material for a defined validity period."
    },
    {
      "model_id": "coga-3.8",
      "answer": "With this app you can view and manage existing sources of supply for a material for a defined validity period."
    }
  ],
  "feedbackBy": "htalukder@ibm.com",
  "feedbackDate": "2023-08-04T16:45:25.921Z",
  "question": "How can I manage sources of supply?",
  "rankig": [
    {
      "rank": 0,
      "model_id": "flan-xxl"
    },
    {
      "rank": 1,
      "model_id": "sandstone-coga-3.0"
    },
    {
      "rank": 2,
      "model_id": "coga-3.8"
    }
  ],
  "rating": [
    {
      "star": 3,
      "model_id": "flan-xxl"
    },
    {
      "star": 2,
      "model_id": "sandstone-coga-3.0"
    },
    {
      "star": 2,
      "model_id": "coga-3.8"
    }
  ]
}
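
To analyze a record like this, it helps to flatten it into one row per model. The sketch below is one possible approach, not the framework's own code: it assumes the field names exactly as they appear in the record (including the `rankig` and `relevent` spellings) and that `additional_feedback` entries may be wrapped in nested `additional_feedback` lists, as they are above. The helpers `iter_feedback_entries` and `flatten_record` are hypothetical, and the sample is a trimmed copy of the record.

```python
import json


def iter_feedback_entries(items):
    """Yield per-model feedback dicts, unwrapping nested "additional_feedback" lists."""
    for item in items:
        if "additional_feedback" in item:
            yield from iter_feedback_entries(item["additional_feedback"])
        else:
            yield item


def flatten_record(record):
    """Combine star, rank, yes/no answers, and comment into one row per model."""
    rows = {}
    for entry in record.get("rating", []):
        rows.setdefault(entry["model_id"], {})["star"] = entry["star"]
    # "rankig" is the key spelling used in the sample record.
    for entry in record.get("rankig", []):
        rows.setdefault(entry["model_id"], {})["rank"] = entry["rank"]
    for entry in iter_feedback_entries(record.get("additional_feedback", [])):
        rows.setdefault(entry["model_id"], {}).update(entry.get("feedback", {}))
    return rows


# Trimmed copy of the record above (answers and one model omitted for brevity).
sample = json.loads("""
{
  "additional_feedback": [
    {"model_id": "flan-xxl",
     "feedback": {"relevent": "yes", "appropriate": "yes", "factual_accurate": "yes"}},
    {"additional_feedback": [
      {"model_id": "coga-3.8",
       "feedback": {"relevent": "yes", "appropriate": "no",
                    "factual_accurate": "yes", "comment": "partially correct."}}
    ]}
  ],
  "rankig": [{"rank": 0, "model_id": "flan-xxl"}, {"rank": 2, "model_id": "coga-3.8"}],
  "rating": [{"star": 3, "model_id": "flan-xxl"}, {"star": 2, "model_id": "coga-3.8"}]
}
""")
rows = flatten_record(sample)
```

Each row then holds every signal for one model (star, rank, the three yes/no answers, and any comment), which is a convenient shape for tabular analysis.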

We built an admin dashboard on top of the collected feedback to visualize model performance and support better decisions.
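
The dashboard's exact metrics are not detailed here. As a rough sketch of the kind of per-model aggregation such a dashboard might show, the code below averages stars and ranks across feedback records. It assumes the `rating` and `rankig` fields from the sample record. The `summarize` function is hypothetical, and the second input record is made up purely for illustration.

```python
from collections import defaultdict
from statistics import mean


def summarize(records):
    """Return {model_id: {"mean_star", "mean_rank", "n"}} across feedback records."""
    stars, ranks = defaultdict(list), defaultdict(list)
    for record in records:
        for entry in record.get("rating", []):
            stars[entry["model_id"]].append(entry["star"])
        for entry in record.get("rankig", []):  # key spelled as in the sample record
            ranks[entry["model_id"]].append(entry["rank"])
    return {
        model: {
            "mean_star": mean(values),
            "mean_rank": mean(ranks[model]) if ranks[model] else None,
            "n": len(values),
        }
        for model, values in stars.items()
    }


records = [
    {   # Values taken from the sample record (two of the three models).
        "rating": [{"star": 3, "model_id": "flan-xxl"}, {"star": 2, "model_id": "coga-3.8"}],
        "rankig": [{"rank": 0, "model_id": "flan-xxl"}, {"rank": 2, "model_id": "coga-3.8"}],
    },
    {   # Hypothetical second record, for illustration only.
        "rating": [{"star": 2, "model_id": "flan-xxl"}, {"star": 1, "model_id": "coga-3.8"}],
        "rankig": [{"rank": 0, "model_id": "flan-xxl"}, {"rank": 1, "model_id": "coga-3.8"}],
    },
]
summary = summarize(records)
```

A higher mean star and a lower mean rank both point to a preferred model, so showing the two side by side makes disagreements between the signals easy to spot.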


Conclusion:

Collecting feedback through ratings, rankings, comments, and Q&A questionnaires enables models to learn from human preferences and insights. While each feedback mechanism has its merits, a combination of all four can provide a holistic view of response quality and user preferences. This feedback can be applied to the RAG pipeline, used to train a reward model for RLHF, or fed into statistical analysis, all toward building more accurate, relevant, and user-friendly language models.

The full implementation details can be found in this GitHub Repo.

Follow Towards Generative AI for more content on the latest advancements in AI.
