Labeling for LLMs
-
Currently, discussions about training LLMs revolve mainly around fine-tuning or LoRA (Low-Rank Adaptation). Pre-training is out of scope here, since pre-training generally does not rely on manual annotation. Current annotation methods fall into three main categories: scoring, ranking, and text annotation.
-
Scoring:
This involves rating the responses generated by the LLM. The most basic form is rating overall satisfaction with a response, but scoring can also be broken down along several dimensions, such as:
Accuracy: Correctness of the answer.
Completeness: Whether the answer includes all necessary information.
Relevance: Whether the answer is directly related to the question.
Language Quality: Clarity and fluency of the language used.
Creativity: Whether the answer offers unique insights or information.
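For concreteness, a scoring annotation is often stored as one structured record per (query, response) pair. The following is a minimal Python sketch; the field names and the 1-5 scale are illustrative assumptions, not a fixed standard.

```python
# Minimal sketch of a multi-dimensional scoring record.
# Field names and the 1-5 scale are assumptions for illustration.
from dataclasses import dataclass, asdict
import json

@dataclass
class ScoredResponse:
    query: str
    response: str
    accuracy: int      # 1-5: correctness of the answer
    completeness: int  # 1-5: covers all necessary information
    relevance: int     # 1-5: directly addresses the question
    language: int      # 1-5: clarity and fluency
    creativity: int    # 1-5: unique insights or information
    overall: int       # 1-5: overall satisfaction

example = ScoredResponse(
    query="What causes tides?",
    response="Tides are caused mainly by the gravitational pull of the Moon...",
    accuracy=5, completeness=4, relevance=5, language=5, creativity=3, overall=4,
)
print(json.dumps(asdict(example), indent=2))
```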
-
Ranking:
This method has the LLM generate several candidate responses, which annotators then rank. The ranking criterion can be overall satisfaction, or it can draw on the dimensions listed under scoring. Combining scoring and ranking in a Reinforcement Learning from Human Feedback (RLHF) annotation pipeline can be very effective.
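In practice, a human ranking over several candidates is often expanded into pairwise (chosen, rejected) preferences, the form that RLHF reward-model training typically consumes. Below is a minimal Python sketch; the function name and record fields are illustrative assumptions.

```python
# Minimal sketch: expand a best-to-worst ranking into pairwise preference records.
# The "prompt"/"chosen"/"rejected" field names are assumptions, not a required schema.
from itertools import combinations

def ranking_to_pairs(query, ranked_responses):
    """ranked_responses is ordered best-to-worst by the annotator."""
    pairs = []
    for better, worse in combinations(ranked_responses, 2):
        pairs.append({"prompt": query, "chosen": better, "rejected": worse})
    return pairs

ranked = [
    "Answer A (judged best)",
    "Answer B",
    "Answer C (judged worst)",
]
for pair in ranking_to_pairs("Explain photosynthesis briefly.", ranked):
    print(pair)
```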
-
Text Annotation:
This involves providing a query together with one or several manually written answers, or annotating several manually written replies within a given context. It is the most basic annotation approach for fine-tuning LLMs, but also the most labor-intensive and complex; it is usually carried out by human experts to inject knowledge into the model.
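A common way to store such human-written answers is one JSON object per example in a JSONL file. The sketch below assumes a chat-style "messages" layout; both that layout and the file name are illustrative assumptions, not a required format.

```python
# Minimal sketch of supervised fine-tuning (text annotation) records written as JSONL.
# The "messages" chat layout and the file name "sft_data.jsonl" are assumptions.
import json

records = [
    {
        "messages": [
            {"role": "user", "content": "Summarise the key symptoms of dehydration."},
            {"role": "assistant", "content": "Expert-written answer: thirst, dark urine, fatigue, dizziness..."},
        ]
    },
]

with open("sft_data.jsonl", "w", encoding="utf-8") as f:
    for record in records:
        f.write(json.dumps(record, ensure_ascii=False) + "\n")
```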
-