Whitepaper

Introduction

Artificial intelligence is experiencing rapid growth, driven by increases in model size, computing power, and the development of multi-modal applications. This evolution has created a strong demand for high-quality, targeted data. As compute resources become more accessible and powerful, the ability to fine-tune models becomes more widespread, further increasing the demand for specialized training data. Additionally, as AI expands beyond traditional applications like chatbots into areas such as robotics, the need for diverse, multi-modal data continues to grow.

The recent open-sourcing of advanced language models by companies like Meta has made AI development more accessible to smaller teams and individual developers. However, while this democratization has leveled the playing field in some respects, it has also highlighted a significant challenge that these smaller developers still face: acquiring credible, high-quality, and targeted training data at scale. This issue is particularly pronounced for developers working across multiple modalities, including text, audio, video, and images.

To address this challenge, we have developed the Tensorplex Dojo Subnet. Dojo is designed as a decentralized platform that incentivizes the collection and creation of high-quality, multi-modal data. Dojo aims to facilitate the generation of labeled data, including preference data for fine-tuning and annotated audio-visual content, as well as the submission of diverse training data.

Key Features

To ensure the quality and integrity of the data collected, Dojo introduces several novel features:

  1. Synthetic Task Generation: Unique tasks are generated by state-of-the-art Large Language Models (LLMs) to collect human feedback data, which can be used to improve open-source models.

  2. Synthetic Ground Truth Validation Mechanism: Validators can synthetically generate partial ground truths, allowing them to determine the quality of responses provided by individual participants.

  3. Obfuscation: Techniques to prevent sybil attacks and ensure contributions are genuinely human.

Use Cases

The Dojo Subnet offers multiple use cases:

  1. Synthetically Generated Tasks: These tasks can bootstrap the human participant pool and can be used for model training or fine-tuning from the outset.

  2. Cross-subnet Validation: Validators can use responses to rate the quality of outputs across other Bittensor subnets, thereby incentivizing miners to improve their performance.

  3. External Data Acquisition: Entities outside the Bittensor ecosystem can tap into the subnet to acquire high-quality human-generated data.

By creating an open platform for gathering human-generated datasets, Tensorplex Dojo Subnet aims to solve the challenges of quality control, human verification, and sybil attack prevention while promoting a more equitable distribution of benefits in AI development.

Benefits to participants contributing through the subnet:

  • Open platform: Anyone capable can contribute, ensuring broad participation and diverse data collection.

  • Flexible work environment: Participants enjoy the freedom to work on tasks at their convenience from any location.

  • Quick payment: Rewards are streamed consistently to participants, as long as they complete sufficient tasks within a stipulated deadline and have them accepted by the subnet.

Subnet Mechanism

Responsibilities of Miners

Miners are responsible for gathering Participants to complete Tasks. To succeed in the Dojo subnet, Miners are expected to build and curate their Participant pools strategically, matching Tasks to Participants with the relevant domain expertise. Compute requirements for Miners can be found in the Appendix section below.

Responsibilities of Validators

Validators are responsible for playing the roles of Instructor, Augmenter, Output Generator and Obfuscator in the Task generation phase, as well as for calculating scores and setting rewards and miner trust. These terms are described in the next section. Compute requirements for Validators can be found in the Appendix section below.

Task Lifecycle, User Journey, and Data Flow

Figure 1: High-level Task Lifecycle Diagram

Important Terms:

  • Task: A task consists of an instruction accompanied by multiple responses to be ranked by human contributors. The task is considered complete only when a sufficient number of high-quality preference data points have been collected for it.

  • Worker: The term for a human contributor, regardless of the associated miner. Miners are expected to curate their pool of workers for quality and domain expertise in order to specialize, and workers are free to associate with different miners’ organisations (hotkeys).

  • Instructor: The object class that generates the instruction of the task.
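The Task and Worker terms above can be pictured as a small data model. This is a minimal sketch, assuming each submission stores one rank per response (1 = best) and that "high quality" is decided by a caller-supplied check; the class names, fields and the `min_rankings` threshold are hypothetical and not Dojo's actual schema.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set


@dataclass
class Worker:
    """A human contributor; may work under several miner hotkeys (hypothetical schema)."""
    worker_id: str
    miner_hotkeys: Set[str] = field(default_factory=set)


@dataclass
class Task:
    """An instruction plus the responses that workers rank (hypothetical schema)."""
    instruction: str
    responses: List[str]
    # worker_id -> ranks, where ranks[i] is the rank given to responses[i] (1 = best).
    rankings: Dict[str, List[int]] = field(default_factory=dict)

    def submit(self, worker_id: str, ranks: List[int]) -> None:
        # A valid submission ranks every response exactly once.
        if sorted(ranks) != list(range(1, len(self.responses) + 1)):
            raise ValueError("ranks must be a permutation of 1..n")
        self.rankings[worker_id] = ranks

    def is_complete(self, min_rankings: int,
                    is_high_quality: Callable[[List[int]], bool]) -> bool:
        # Complete only once enough submissions pass the (assumed) quality check.
        accepted = [r for r in self.rankings.values() if is_high_quality(r)]
        return len(accepted) >= min_rankings
```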

Task generation begins with the Instructor creating instructions for Tasks based on randomly sampled combinations of Task Seeds. The list of Task Seeds is initially defined by Tensorplex Labs and will be expanded with more diverse task seeds based on organic requests and future collaborations with interested parties. Inspired by the Self-Instruct framework, a few-shot prompting technique will be employed: a sample of existing task seeds is given to SOTA LLMs as examples so that they generate Tasks with new instructions. Each new instruction is then filtered against the Global Task Database, which stores completed and rejected Tasks, by running a series of semantic filters and comparators.
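The generate-then-filter loop described above can be sketched as follows. This is an illustration under stated assumptions: `llm` is a hypothetical callable standing in for a SOTA LLM, and `difflib.SequenceMatcher` with a 0.7 cutoff is a simple character-level stand-in for the semantic filters and comparators, not the filter Dojo uses.

```python
import random
from difflib import SequenceMatcher
from typing import Callable, List


def build_few_shot_prompt(task_seeds: List[str], k: int, rng: random.Random) -> str:
    """Show k randomly sampled seeds as examples and ask for a new instruction."""
    examples = rng.sample(task_seeds, k)
    lines = ["Write one new task instruction that differs from these examples."]
    lines += [f"Example {i}: {ex}" for i, ex in enumerate(examples, start=1)]
    lines.append("New task:")
    return "\n".join(lines)


def is_novel(candidate: str, existing: List[str], threshold: float = 0.7) -> bool:
    """Reject candidates too similar to any stored task (stand-in similarity check)."""
    c = candidate.lower()
    return all(SequenceMatcher(None, c, e.lower()).ratio() < threshold for e in existing)


def generate_instructions(task_seeds: List[str], task_database: List[str],
                          llm: Callable[[str], str], n_tasks: int,
                          k: int = 3, seed=None) -> List[str]:
    rng = random.Random(seed)
    accepted: List[str] = []
    for _ in range(10 * n_tasks):  # bounded number of LLM calls
        if len(accepted) >= n_tasks:
            break
        candidate = llm(build_few_shot_prompt(task_seeds, k, rng)).strip()
        # Compare with completed/rejected tasks and with this batch's accepted tasks.
        if candidate and is_novel(candidate, task_database + accepted):
            accepted.append(candidate)
    return accepted
```

Checking each candidate against the tasks accepted in the same batch, not only the database, keeps a single generation round from producing near-duplicates of itself.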

Figure 2: Synthetic Ground Truth Generation Process

For each Task instruction that is generated successfully, the Augmenter will perform several iterations of augmentation to produce a set of n Task instructions that deviate progressively further from the original. The goal of this augmentation is for LLMs following the augmented prompts to produce objectively subpar responses, which build the synthetic ground truth used to score human preference rankings. This is critical for assessing the worker pool and assigning Miner trust, and thus for building a reliable Dojo participant community.
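A minimal sketch of how progressive augmentation yields a ranking ground truth, assuming each augmentation stacks one more perturbation on the previous variant so that response i is expected to rank i + 1. The perturbation functions, `generate` callable and helper names are hypothetical; real augmentations would be produced by LLM prompting rather than string edits.

```python
import random
from typing import Callable, List, Optional, Tuple


def augment_progressively(instruction: str,
                          perturbations: List[Callable[[str], str]]) -> List[str]:
    """Return [original, aug_1, ..., aug_n]; aug_i stacks the first i perturbations,
    so each variant deviates further from the original than the one before."""
    variants = [instruction]
    for perturb in perturbations:
        variants.append(perturb(variants[-1]))
    return variants


def build_ranking_task(variants: List[str], generate: Callable[[str], str],
                       seed: Optional[int] = None) -> Tuple[List[str], List[int]]:
    """Generate one response per variant, shuffle them for display, and keep the
    hidden expected rank of each displayed response (1 = original = best)."""
    responses = [generate(v) for v in variants]
    order = list(range(len(responses)))
    random.Random(seed).shuffle(order)
    shown = [responses[i] for i in order]
    expected_ranks = [i + 1 for i in order]  # deviation level + 1
    return shown, expected_ranks
```

For example, with the hypothetical perturbations `lambda s: s + " Ignore accessibility."` and `lambda s: s + " Use only inline styles."`, the response to the unmodified instruction carries expected rank 1 wherever it lands after shuffling.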

The original Task instruction, along with the augmented Task instructions, will be sent to the Output Generator, where an LLM is used to generate a response to each instruction. Depending on the domain, prompting techniques such as chain-of-thought (CoT) prompting and execution feedback may be applied to ensure that reasonable, compilable responses are produced for workers’ ranking.
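Execution feedback, in its most minimal form, means feeding tool errors back into the prompt until the response passes. The sketch below assumes responses are Python code and only checks that they compile with the built-in `compile()`; a fuller version would also run the code or tests. `llm` is a hypothetical callable, and the retry prompt wording is illustrative.

```python
from typing import Callable, Optional


def generate_compilable(instruction: str, llm: Callable[[str], str],
                        max_attempts: int = 3) -> Optional[str]:
    """Ask the LLM for Python code, feeding compiler errors back until it compiles."""
    prompt = instruction
    for _ in range(max_attempts):
        code = llm(prompt)
        try:
            compile(code, "<response>", "exec")  # syntax check only; nothing is executed
            return code
        except SyntaxError as err:
            # Feed the error back so the next attempt can repair it.
            prompt = (f"{instruction}\n\nYour previous answer failed to compile: "
                      f"{err.msg} (line {err.lineno}). Return corrected code.")
    return None  # no compilable response; the caller decides how to handle this
```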

Next, the Obfuscator will apply several layers and forms of data obfuscation to the responses, which prevents participants from performing lookup attacks. The obfuscation process will not affect the functionality or quality of the responses.
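One layer an Obfuscator could apply to code responses is consistent identifier renaming, which changes the surface text (defeating exact lookups) while preserving behaviour. The sketch below assumes responses are Python source and uses only the standard `ast` module (`ast.unparse` needs Python 3.9+). `obfuscate_identifiers` is a hypothetical helper, not Dojo's Obfuscator Library, and it deliberately ignores classes, `global`/`nonlocal`, call-site keyword arguments and async functions. Seeding it per miner gives each miner a differently renamed copy.

```python
import ast
import builtins
import random
import string


class _Renamer(ast.NodeTransformer):
    """Applies a fixed old-name -> new-name mapping to names, arguments and function names."""

    def __init__(self, mapping):
        self.mapping = mapping

    def visit_Name(self, node):
        node.id = self.mapping.get(node.id, node.id)
        return node

    def visit_arg(self, node):
        node.arg = self.mapping.get(node.arg, node.arg)
        return node

    def visit_FunctionDef(self, node):
        node.name = self.mapping.get(node.name, node.name)
        self.generic_visit(node)  # also rename arguments and the body
        return node


def obfuscate_identifiers(source: str, seed: int) -> str:
    """Return `source` with user-defined identifiers renamed; same seed -> same output."""
    tree = ast.parse(source)
    defined = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Name) and isinstance(node.ctx, ast.Store):
            defined.add(node.id)
        elif isinstance(node, ast.FunctionDef):
            defined.add(node.name)
        elif isinstance(node, ast.arg):
            defined.add(node.arg)
    defined -= set(dir(builtins))  # never rename print, len, ...

    rng = random.Random(seed)
    mapping, taken = {}, set(defined)
    for name in sorted(defined):  # sorted so the mapping depends only on the seed
        new = "_" + "".join(rng.choices(string.ascii_lowercase, k=8))
        while new in taken:  # avoid clashing with existing or already-issued names
            new = "_" + "".join(rng.choices(string.ascii_lowercase, k=8))
        taken.add(new)
        mapping[name] = new
    return ast.unparse(_Renamer(mapping).visit(tree))
```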

Finally, after data obfuscation, the task, which contains the original instruction and the responses generated from the original and augmented instructions, will be compiled and managed by the Task Manager, from which the Miners obtain their tasks.

Figure 3: Task Dissemination from Validators to Participants

Once a task is assigned to a Miner, the Miner can decide how to pass it to the Participant, who may be a separate entity. The Participant’s outputs are associated with the respective Miner's hotkey for scoring and reward calculations. Tasks can be assigned and completed through the following methods:

  • Dojo Interface: Miners who prefer to mine conveniently can create a personalized API key through the CLI, which the Participants then use to contribute through the Dojo Interface.

  • Local Dojo Interface: Sophisticated miners are encouraged to run a local Dojo interface to eliminate their dependency on the Dojo API. (Coming soon)

  • External Platforms: Miners can also choose to distribute these tasks to external service providers such as Scale AI, Amazon Mechanical Turk (MTurk), Prolific, Appen or Web3 data labeling platforms. However, these miners will be responsible for quality control and for ensuring tasks are completed within the stipulated deadlines. (Coming soon)

Figure 4: Dojo interface for measuring prompt similarity of different UI outputs

Scoring Mechanism

A Miner's score is the sum of the scores of its tasks computed over the past few epochs. Depending on the type of task, each individual task score is a weighted function of some of the following metrics:

  • Weighted Cohen’s Kappa: Calculates the agreement between Miners while controlling for the probability of chance agreement, providing a more robust measure of reliability than simple percent agreement. A weighted variant of Cohen’s kappa will be used because we are concerned with the relative ranking of the LLM-generated responses: disagreements between nearby ranks are penalized less than disagreements between distant ones.

  • Intraclass Correlation Coefficient [ICC(2,1)]: Assesses the consistency of quantitative measurements made by different Miners. ICC(2,1), based on a two-way random-effects model for a single rater, is suitable for multi-rater agreement and can handle continuous or ordinal data, providing a robust measure of reliability across multiple raters.

  • Spearman’s Correlation: Measures the strength and direction of a monotonic relationship between two continuous or ordinal variables. It is robust to non-normal distributions and outliers, and helps to assess agreement both among Miners and against the ground truth.

  • Distance against synthetic ground truth: Addresses the loss of fidelity in Spearman’s correlation, which considers only the ordering of the responses and not how far apart their scores are.
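The Spearman, distance and ICC(2,1) metrics listed above can be sketched in plain Python. This is a reference sketch, not Dojo's scoring code: Spearman is computed as the Pearson correlation of tie-averaged ranks, `gt_distance` is a hypothetical normalised mean absolute error on an assumed 1 to 10 score scale, and `icc_2_1` follows the Shrout and Fleiss form (MS_R - MS_E) / (MS_R + (k - 1)MS_E + k(MS_C - MS_E)/n) with rows as responses and columns as miners. Weighted kappa is omitted here because it needs categorical labels.

```python
import math
from typing import List, Sequence, Tuple


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho = Pearson correlation of the (tie-averaged) ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy) if sx and sy else 0.0  # constant input: treat as no correlation


def gt_distance(scores: Sequence[float], truth: Sequence[float],
                scale: Tuple[float, float] = (1, 10)) -> float:
    """Mean absolute gap to the ground-truth scores, normalised to [0, 1] by the scale range."""
    lo, hi = scale
    return sum(abs(s - t) for s, t in zip(scores, truth)) / (len(scores) * (hi - lo))


def icc_2_1(ratings: List[List[float]]) -> float:
    """ICC(2,1): rows are responses (targets), columns are miners (raters)."""
    n, k = len(ratings), len(ratings[0])
    grand = sum(map(sum, ratings)) / (n * k)
    row_means = [sum(r) / k for r in ratings]
    col_means = [sum(r[j] for r in ratings) / n for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for r in ratings for x in r)
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = (ss_total - ss_rows - ss_cols) / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n)
```

For example, `spearman([9, 6, 3], [10, 9, 8])` is 1.0 (up to floating-point rounding) while `gt_distance([10, 9, 8], [9, 6, 3])` is 1/3: the ranks agree perfectly but the scores do not, which is the fidelity gap the distance metric is meant to capture.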

Figure 5: Augmented Prompt Deviation as Synthetic Ground Truth

While alignment with the synthetic ground truth is important, the scoring mechanism is designed so that a high level of agreement between human contributors is still prioritized when it conflicts with the synthetic ground truth.
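A minimal sketch of how a task score could prioritise human consensus, together with the per-miner sum over recent epochs described at the start of this section. The weights (0.6 and 0.4), the conflict threshold, the three-epoch window and the rule "score on agreement alone when the consensus contradicts the ground truth" are all hypothetical choices for illustration, not Dojo's actual parameters.

```python
from typing import Dict, List


def task_score(agreement: float, gt_alignment: float, consensus_vs_gt: float,
               w_agree: float = 0.6, w_gt: float = 0.4,
               conflict_threshold: float = 0.0) -> float:
    """Combine a miner's agreement with other miners and its alignment with the
    synthetic ground truth (both e.g. Spearman correlations in [-1, 1])."""
    if consensus_vs_gt < conflict_threshold:
        # Hypothetical rule: the miners' consensus contradicts the synthetic
        # ground truth, so score on agreement alone instead of penalising it.
        return agreement
    return w_agree * agreement + w_gt * gt_alignment


def miner_score(task_scores_by_epoch: Dict[int, List[float]],
                current_epoch: int, window: int = 3) -> float:
    """Sum of task scores from the last `window` epochs, including the current one."""
    return sum(score
               for epoch, scores in task_scores_by_epoch.items()
               if current_epoch - window < epoch <= current_epoch
               for score in scores)
```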

Figure 6: Inconsistent Participant Results

The Weighted Cohen’s Kappa metric can also be used to monitor data quality: for example, miner trust is not assigned if the Weighted Cohen’s Kappa is not above a certain threshold, i.e. when no consensus between Miners is achieved.
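The threshold check above can be sketched with a from-scratch weighted Cohen's kappa, κ_w = 1 - Σ w_ij O_ij / Σ w_ij E_ij, where O is the observed label-pair distribution and E the product of the two raters' marginals. Since Cohen's kappa compares two raters, this sketch assumes multiple miners are summarised by the mean pairwise kappa; the rank-to-category mapping (rank r becomes category r - 1), the quadratic default and the 0.6 threshold are assumptions.

```python
import math
from itertools import combinations
from typing import Dict, List


def weighted_kappa(a: List[int], b: List[int], n_categories: int,
                   weighting: str = "quadratic") -> float:
    """Weighted Cohen's kappa between two raters' category labels 0..n_categories-1."""
    n = len(a)
    obs = [[0.0] * n_categories for _ in range(n_categories)]
    for x, y in zip(a, b):
        obs[x][y] += 1 / n
    row = [sum(obs[i]) for i in range(n_categories)]
    col = [sum(obs[i][j] for i in range(n_categories)) for j in range(n_categories)]

    def w(i: int, j: int) -> float:
        # Disagreement weight grows with the distance between categories.
        return (i - j) ** 2 if weighting == "quadratic" else abs(i - j)

    cells = [(i, j) for i in range(n_categories) for j in range(n_categories)]
    observed = sum(w(i, j) * obs[i][j] for i, j in cells)
    expected = sum(w(i, j) * row[i] * col[j] for i, j in cells)
    return 1 - observed / expected if expected > 0 else math.nan


def assign_trust(labels_by_miner: Dict[str, List[int]], n_categories: int,
                 threshold: float = 0.6) -> bool:
    """Mean pairwise weighted kappa across miners; withhold trust below threshold."""
    pairs = list(combinations(labels_by_miner.values(), 2))
    kappas = [weighted_kappa(a, b, n_categories) for a, b in pairs]
    mean_kappa = sum(kappas) / len(kappas)
    return mean_kappa >= threshold  # NaN compares False, so no trust is assigned
```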

Figure 7: Various failure modes and the corresponding reaction of the scoring mechanism

The scoring mechanism is designed to handle various attack vectors and failure modes, and will be continually improved to ensure a fair, productive and collaborative environment for all participants.

Figure 8: LLM Leaderboard as Synthetic Ground Truth

The Synthetic Ground Truth could also be derived from other sources, such as a publicly available LLM leaderboard, where a model’s rank on the leaderboard for a specific domain can be used as a proxy for the accuracy score of its responses.
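Turning a leaderboard into expected response ranks can be sketched as below, assuming the validator knows which model produced each response and the domain leaderboard is given as a list of model names, best first. The model names and helper are hypothetical; responses from unlisted models simply receive no ground truth, which leads to the partial-ground-truth case that follows.

```python
from typing import Dict, List


def ground_truth_from_leaderboard(response_models: Dict[str, str],
                                  leaderboard: List[str]) -> Dict[str, int]:
    """Map response id -> expected rank (1 = best) from the generating model's
    position on a domain leaderboard (best first). Responses from models not on
    the leaderboard get no ground truth; same-model responses keep input order."""
    position = {model: i for i, model in enumerate(leaderboard)}
    ranked = [rid for rid, model in response_models.items() if model in position]
    ranked.sort(key=lambda rid: position[response_models[rid]])
    return {rid: rank for rank, rid in enumerate(ranked, start=1)}
```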

Figure 9: Tasks with Partial Ground Truths

Participants can also determine accuracy scores for responses that have no ground truth. For example, responses a*, b* and c* may be produced in ways intended to improve the outputs, such as with various experimental prompt engineering frameworks or with new LLMs that have not yet been benchmarked.
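One way to use partial ground truth is sketched below: participants are checked only on the responses whose order is known, and the unlabeled responses (such as a*, b*, c*) receive an estimated score from the participants' consensus. Pairwise ordering accuracy and the mean-rank consensus are assumed, simple choices for illustration, not the scoring Dojo specifies.

```python
from itertools import combinations
from statistics import mean
from typing import Dict, List


def pairwise_accuracy(ranking: Dict[str, int], ground_truth: Dict[str, int]) -> float:
    """Fraction of ground-truth pairs the participant orders correctly (1 = best rank)."""
    pairs = list(combinations(sorted(ground_truth), 2))
    correct = sum((ranking[a] < ranking[b]) == (ground_truth[a] < ground_truth[b])
                  for a, b in pairs)
    return correct / len(pairs)


def consensus_rank(rankings: List[Dict[str, int]], unlabeled: List[str]) -> Dict[str, float]:
    """Mean rank across participants for responses that have no ground truth."""
    return {rid: mean(r[rid] for r in rankings) for rid in unlabeled}
```

In practice one might weight each participant's vote in `consensus_rank` by their `pairwise_accuracy`, so that participants who order the known responses correctly have more say over the unknown ones.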

Anti-gaming Measures

Data Sources and Security

During Task Instruction generation, random samples of task seeds will be used in instruction synthesis to ensure the freshness of the synthetic instructions generated by the Instructor.

During Task Instruction Augmentation, random forms of augmentation will be applied to the Task Instruction in order to obtain different and uncorrelated responses from the LLM used by the Output Generator.

At the Output Generator stage, the LLM will be sampled from a pool of LLMs, greatly reducing the potential for miners to cache Task responses for lookup attacks.

At the Obfuscator stage, various obfuscation libraries will be used to obfuscate the code so that miners are discouraged from using non-human approaches, such as LLMs, to provide human preferences.
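The per-stage randomness described above can be drawn in one place, as in this sketch. It assumes the seeds, augmentation forms, LLM pool and obfuscators are identified by name; all names and the sample sizes are hypothetical. Using `secrets.SystemRandom` (an OS-backed source) rather than a seeded `random.Random` is a deliberate assumption: unpredictability to miners matters more here than reproducibility.

```python
import secrets
from typing import Dict, List


def sample_generation_config(task_seeds: List[str], augmentations: List[str],
                             llm_pool: List[str], obfuscators: List[str],
                             n_seeds: int = 3, n_augmentations: int = 2) -> Dict[str, object]:
    """Draw the random choices for one Task from an OS-backed CSPRNG so that
    miners cannot predict them."""
    rng = secrets.SystemRandom()
    return {
        "seeds": rng.sample(task_seeds, n_seeds),                     # Instructor
        "augmentations": rng.sample(augmentations, n_augmentations),  # Augmenter
        "llm": rng.choice(llm_pool),                                   # Output Generator
        # Obfuscator: a random non-empty subset, applied in random order.
        "obfuscators": rng.sample(obfuscators, rng.randint(1, len(obfuscators))),
    }
```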

Miner-miner Collusion Prevention

Each miner will receive uniquely obfuscated responses to grade. On top of this, the number of pre-obfuscated responses generated per task can be dynamically adjusted based on participant count. Thus, miners will not save any meaningful resources by employing Sybil attacks on the platform. The overlap of pre-obfuscated tasks is minimal, and any "savings" achieved by the miner can be negated by modifying the reward function to provide a bonus for completing additional tasks, thereby incentivising the miner to consolidate participation onto one account instead.
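The consolidation incentive can be checked with a toy reward function. This sketch assumes a hypothetical per-account reward of `base` per task plus a `bonus` for each task beyond `bonus_threshold`; the parameter values are illustrative only. Because every account grades its own uniquely obfuscated copies, splitting the work across hotkeys does not reduce the number of tasks completed, it only forfeits the bonus.

```python
def account_reward(tasks_completed: int, base: float = 1.0, bonus: float = 0.5,
                   bonus_threshold: int = 10) -> float:
    """Hypothetical per-account reward with a bonus for tasks beyond a threshold."""
    return base * tasks_completed + bonus * max(0, tasks_completed - bonus_threshold)


def total_reward(total_tasks: int, n_accounts: int, **params) -> float:
    """Reward when the same work is split evenly across n_accounts hotkeys.
    Each account grades its own uniquely obfuscated copies, so splitting does not
    reduce the total number of tasks that must be completed."""
    per, rem = divmod(total_tasks, n_accounts)
    return sum(account_reward(per + (1 if i < rem else 0), **params)
               for i in range(n_accounts))
```

With the defaults, completing 30 tasks on one hotkey earns 40.0, while splitting the same 30 tasks over three hotkeys earns 30.0.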

Figure 10: Task dissemination from validator to potential sybil attacker
