TypoCurator Dataset
-
Dataset Address
https://huggingface.co/datasets/typox-ai/Typo_Intent_OS
Dataset Description
3,383 question-answer pairs drawn from a Web3 knowledge base, intended for training and testing AI models used in Web3 scenarios.
- Data Format
- prompt: AI-generated question.
- completion: The candidate answer selected by the majority of users.
- Example:
{ "prompt": "What is a primary advantage of using a decentralized finance (DeFi) platform?", "completion": "Direct peer-to-peer transactions without intermediaries." }
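Here is a minimal sketch of consuming records in this format. It parses and validates `prompt`/`completion` objects using only Python's standard library. It assumes a local export with one JSON object per line (JSON Lines), because the card does not describe the file layout on the Hugging Face Hub. The helper names `parse_record` and `load_pairs` are hypothetical.

```python
import json

# Field names come from the data format above; the JSON Lines layout is an assumption.
REQUIRED_FIELDS = ("prompt", "completion")


def parse_record(line: str) -> dict:
    """Parse one JSON object and check that prompt/completion are non-empty strings."""
    record = json.loads(line)
    if not isinstance(record, dict):
        raise ValueError("record is not a JSON object")
    for field in REQUIRED_FIELDS:
        value = record.get(field)
        if not isinstance(value, str) or not value.strip():
            raise ValueError(f"missing or empty field: {field!r}")
    return record


def load_pairs(lines):
    """Yield (prompt, completion) tuples from an iterable of JSON lines, skipping blank lines."""
    for line in lines:
        if line.strip():
            record = parse_record(line)
            yield record["prompt"], record["completion"]


if __name__ == "__main__":
    sample = (
        '{ "prompt": "What is a primary advantage of using a decentralized finance (DeFi) platform?", '
        '"completion": "Direct peer-to-peer transactions without intermediaries." }'
    )
    for prompt, completion in load_pairs([sample]):
        print(prompt, "->", completion)
```

An open file handle also works as the input. For example, `load_pairs(f)` after `f = open("train.jsonl", encoding="utf-8")`, where the file name is hypothetical. The resulting tuples can be fed to a supervised fine-tuning pipeline or used as reference answers when scoring a model.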
Data Source
This dataset contains AI-generated questions, each with multiple candidate answers. End users of our Web3 product select the answer they consider most accurate, and the option chosen by the most users is recorded as the completion. An incentive mechanism encourages sincere responses. We also mix in seed questions with known answers to screen users: only the choices of users who perform well on these seed questions are counted.
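The seed-question screening can be sketched as follows. This is a minimal illustration, not the production logic. The card does not define "perform well", so the accuracy cutoff `MIN_SEED_ACCURACY = 0.8`, the function names, and the dictionary layouts are all hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Set, Tuple

# Hypothetical cutoff: the card only says users must "perform well" on seed questions.
MIN_SEED_ACCURACY = 0.8


def qualified_users(
    seed_answers: Dict[str, Dict[str, str]],
    seed_key: Dict[str, str],
    min_accuracy: float = MIN_SEED_ACCURACY,
) -> Set[str]:
    """Return users whose accuracy on the seed questions they answered is >= min_accuracy.

    seed_answers maps user_id -> {question_id: chosen_option};
    seed_key maps question_id -> known correct option.
    """
    qualified = set()
    for user, answers in seed_answers.items():
        graded = [q for q in answers if q in seed_key]
        if not graded:
            continue  # no seed evidence, so this user's votes are not counted
        correct = sum(answers[q] == seed_key[q] for q in graded)
        if correct / len(graded) >= min_accuracy:
            qualified.add(user)
    return qualified


def count_votes(votes: Iterable[Tuple[str, str]], qualified: Set[str]) -> Counter:
    """Tally (user_id, option) votes for one question, ignoring unqualified users."""
    return Counter(option for user, option in votes if user in qualified)
```

The threshold rules in the annotation method are then applied to the tally that `count_votes` returns.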
Annotation Method (Brief)
- Generation: TypoX generates the questions and candidate answers. TypoX is a RAG (retrieval-augmented generation) system built on a Web3 knowledge base. https://www.typox.ai/
- Evaluation: Users make their selections through the TypoCurator Telegram Mini App. https://t.me/typocurator_bot
- Participation: Each question is evaluated by at least 300 people.
- Majority Rule: An option must be selected by more than 75% of participants to be considered the completion.
- Re-evaluation: If no option passes the 75% threshold, the question is re-evaluated until one option reaches 80%.
- Invalid Questions: If a question is answered by more than 1000 people without any option reaching 75%, it is marked as invalid.
- Quality Assurance: We maintain a pool of 500 seed questions with known answers to screen users. Only the choices of users who perform well on these questions are counted.
- Statistics: On average, each question is evaluated by 453 people, with the completion option having an average selection rate of 78.9%. The dataset has also undergone two rounds of internal review.
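One reading of the annotation rules above can be written as a decision function:

- A question needs at least 300 votes.
- It is accepted when the leading option exceeds 75% in the first round, or reaches 80% during re-evaluation.
- It is invalid once more than 1,000 people have answered without any option reaching 75%.

The constants mirror the listed thresholds. The function name `decide`, the `Status` values, and the exact handling of boundary cases are assumptions. Shares are compared as exact fractions, so a split of exactly 75% is not misclassified by floating-point rounding.

```python
from collections import Counter
from enum import Enum
from fractions import Fraction
from typing import Optional, Tuple

# Thresholds taken from the annotation rules; exact boundary handling is assumed.
MIN_PARTICIPANTS = 300              # at least 300 evaluators per question
FIRST_ROUND_SHARE = Fraction(3, 4)  # first round: must be *more than* 75%
REEVAL_SHARE = Fraction(4, 5)       # re-evaluation: must *reach* 80%
MAX_PARTICIPANTS = 1000             # beyond this with no option reaching 75% -> invalid


class Status(Enum):
    PENDING = "pending"        # too few votes to decide yet
    ACCEPTED = "accepted"      # leading option becomes the completion
    REEVALUATE = "reevaluate"  # keep collecting votes
    INVALID = "invalid"        # question is discarded


def decide(tally: Counter, reevaluating: bool = False) -> Tuple[Status, Optional[str]]:
    """Apply the thresholds to one question's vote tally (option -> count)."""
    total = sum(tally.values())
    if total < MIN_PARTICIPANTS:
        return Status.PENDING, None

    option, count = tally.most_common(1)[0]
    share = Fraction(count, total)  # exact, so 225/300 compares equal to 3/4

    if reevaluating:
        if share >= REEVAL_SHARE:
            return Status.ACCEPTED, option
    elif share > FIRST_ROUND_SHARE:
        return Status.ACCEPTED, option

    if total > MAX_PARTICIPANTS and share < FIRST_ROUND_SHARE:
        return Status.INVALID, None
    return Status.REEVALUATE, None
```

Two first-round examples under this reading:

- 240 votes for one option out of 300 (80%) is accepted.
- 225 out of 300 (exactly 75%) fails the strict "more than 75%" rule and goes to re-evaluation.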
Chinese version
- Data Format
-