TypoCurator Dataset

rexrex9

Dataset Address

https://huggingface.co/datasets/typox-ai/Typo_Intent_OS

Dataset Description

The dataset contains 3,383 question-answer pairs drawn from a Web3 knowledge base. It is intended for training and testing AI models used in Web3 scenarios.

Data Format
- prompt: The AI-generated question.
- completion: The candidate answer selected by the majority of users.

Example:

{
  "prompt": "What is a primary advantage of using a decentralized finance (DeFi) platform?",
  "completion": "Direct peer-to-peer transactions without intermediaries."
}
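As a minimal sketch of how a record in this format might be read and checked, the snippet below parses one JSON object and verifies that both fields are non-empty strings. The helper name `parse_record` is hypothetical and not part of the dataset or any TypoX tooling. It assumes each record is available as a JSON string.

```python
import json

# The two fields described in the Data Format section.
REQUIRED_FIELDS = ("prompt", "completion")


def parse_record(line: str) -> dict:
    """Parse one JSON record and check that both fields are non-empty strings.

    Hypothetical helper for illustration; not part of the dataset tooling.
    """
    record = json.loads(line)
    for field in REQUIRED_FIELDS:
        value = record.get(field)
        if not isinstance(value, str) or not value.strip():
            raise ValueError(f"missing or empty field: {field}")
    return record


# The example record shown above, as a JSON string.
example = (
    '{"prompt": "What is a primary advantage of using a decentralized '
    'finance (DeFi) platform?", '
    '"completion": "Direct peer-to-peer transactions without intermediaries."}'
)
record = parse_record(example)
print(record["completion"])
```

Records that lack either field raise a `ValueError`, so malformed rows can be caught before training.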

Data Source

This dataset contains AI-generated questions and multiple candidate answers. The correct answers are selected by our Web3 product's end-users based on what they consider the most accurate. The answer chosen by the most users is marked as the completion. To ensure the quality of these evaluations, we use an incentive mechanism to encourage sincere responses. Additionally, we include some seed questions with known answers to filter users. Only those who perform well on these seed questions have their choices counted.
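The screening step described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the actual TypoCurator implementation: the function names, the data layout, and the 0.8 accuracy cutoff are all hypothetical, since the post does not say what counts as performing well on the seed questions.

```python
from collections import Counter


def qualified_users(seed_answers, seed_key, min_accuracy=0.8):
    """Return the users whose accuracy on seed questions is at least min_accuracy.

    seed_answers: {user_id: {question_id: chosen_option}}
    seed_key:     {question_id: known_correct_option}
    min_accuracy: assumed cutoff; the real threshold is not published.
    """
    qualified = set()
    for user, answers in seed_answers.items():
        graded = [q for q in answers if q in seed_key]
        if not graded:
            continue  # users who answered no seed question are not counted
        correct = sum(answers[q] == seed_key[q] for q in graded)
        if correct / len(graded) >= min_accuracy:
            qualified.add(user)
    return qualified


def tally(votes, allowed):
    """Count votes ({user_id: option}) cast by allowed users only."""
    return Counter(option for user, option in votes.items() if user in allowed)


seed_key = {"s1": "A", "s2": "B"}
seed_answers = {
    "alice": {"s1": "A", "s2": "B"},  # 2 of 2 correct
    "bob": {"s1": "C", "s2": "B"},    # 1 of 2 correct
    "carol": {},                      # answered no seed question
}
allowed = qualified_users(seed_answers, seed_key)
counts = tally({"alice": "A", "bob": "B", "carol": "B"}, allowed)
```

In this toy example only alice passes the screen, so only her vote is tallied.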

Annotation Method (Brief)

Generation: Use TypoX to generate questions and candidate answers.
TypoX is a RAG system with a Web3 knowledge base. https://www.typox.ai/
Evaluation: Users make their selections through the TypoCurator Telegram Mini App. https://t.me/typocurator_bot
Participation: Each question is evaluated by at least 300 people.
Majority Rule: An option must be selected by more than 75% of participants to be considered the completion.
Re-evaluation: If no option reaches 75%, the question is re-evaluated until an option reaches 80%.
Invalid Questions: If a question is answered by more than 1000 people without any option reaching 75%, it is marked as invalid.
Quality Assurance: We preset 500 seed questions with known answers to filter users. Only users who perform well on these questions have their choices counted.
Statistics: On average, each question is evaluated by 453 people, with the completion option having an average selection rate of 78.9%. The dataset has also undergone two rounds of internal review.
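The rules above can be read as a small decision function over a question's vote counts. The sketch below is one possible reading, not the production logic: the function name, the status labels, and the `in_reevaluation` flag are hypothetical. It assumes that "more than 75%" is a strict inequality in the first round and that "reaches 80%" means at least 80% during re-evaluation.

```python
MIN_PARTICIPANTS = 300        # each question is evaluated by at least 300 people
MAX_PARTICIPANTS = 1000       # beyond this, an undecided question is invalid
FIRST_ROUND_THRESHOLD = 0.75  # an option must exceed this share
REEVAL_THRESHOLD = 0.80       # share to reach during re-evaluation


def decide(counts, in_reevaluation=False):
    """Return (status, option) for a question given {option: vote_count}.

    Status is one of "pending", "accepted", "re-evaluate", "invalid".
    Hypothetical sketch of the rules listed above.
    """
    total = sum(counts.values())
    if total < MIN_PARTICIPANTS:
        return ("pending", None)  # not enough evaluations yet

    option, top = max(counts.items(), key=lambda kv: kv[1])
    share = top / total

    if in_reevaluation:
        if share >= REEVAL_THRESHOLD:
            return ("accepted", option)
    elif share > FIRST_ROUND_THRESHOLD:
        return ("accepted", option)

    if total > MAX_PARTICIPANTS and share <= FIRST_ROUND_THRESHOLD:
        return ("invalid", None)
    return ("re-evaluate", None)


print(decide({"A": 240, "B": 60}))
print(decide({"A": 210, "B": 90}))
print(decide({"A": 700, "B": 301}))
```

Here `counts` is expected to contain only votes from users who passed the seed-question screen.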

Chinese version

https://gov.typox.ai/topic/90/typocurator-数据集

TypoX Community