Whitepaper

Introduction

The artificial intelligence revolution is reshaping our world at an unprecedented pace, but its development remains largely concentrated in the hands of a few powerful entities. This centralisation not only limits innovation, but also raises concerns about the future of AI alignment with human values and interests. While decentralised AI initiatives such as Bittensor have emerged as promising alternatives, they face a critical challenge: meaningful participation has been largely limited to technical experts, leaving a vast, diverse pool of human intelligence untapped.

The Dojo Subnet addresses this challenge by creating an open platform that enables contributors of all technical backgrounds to actively participate in and shape the future of decentralised AI. Built on Bittensor, Dojo transcends traditional data collection to become a comprehensive ecosystem where human intelligence can directly influence AI development, training, and agentic operations both on-chain and in the real world.

Key Features

To ensure the quality and integrity of the data collected, Dojo introduces several novel features:

  1. Modular Pipelines for Synthetic Task Generation: Developer-friendly, plug-and-play modules that let anyone generate unique synthetic tasks with state-of-the-art Large Language Models (LLMs). The human feedback collected on these tasks through Dojo will be used to improve open-source models.

  2. Partial Ground Truth Validation Mechanism: The validators will be able to synthetically generate partial ground truths, allowing them to determine the quality of responses provided by individual contributors while taking general consensus into account.

  3. Obfuscation: Code-level techniques to prevent Sybil attacks and ensure contributions are genuinely human. A contributor feedback loop mechanism is also planned, which will naturally create additional layers of obfuscation that discourage gaming.

Use Cases

The Dojo Subnet offers multiple use cases:

  1. Synthetically Generated Tasks: These tasks can bootstrap the human participant pool and can be used for model training or fine-tuning from the outset.

  2. Cross-subnet Validation: Validators can use responses to rate the quality of outputs across other Bittensor subnets, thereby introducing new ways to challenge and incentivise miners to improve their performance.

  3. External Data Acquisition: Entities outside the Bittensor ecosystem can tap into the subnet to acquire high-quality human-generated data.

By creating an open platform for gathering human intelligence for AI/ML development, the Tensorplex Dojo Subnet aims to solve the challenges of quality control, human verification, and Sybil attack prevention while promoting a more equitable distribution of benefits in AI/ML development.

Benefits to contributors participating through the subnet:

  • Open platform: Contributors of any technical level can take part, ensuring broad participation and diverse data collection.

  • Flexible work environment: Contributors enjoy the freedom to work on tasks at their convenience from any location.

  • Quick payment: Rewards are streamed consistently to contributors, as long as they complete sufficient tasks within a stipulated deadline and have them accepted by the subnet.

Subnet Mechanism

Responsibilities of Miners

Miners are required to gather Contributors to complete tasks. Miners are expected to build and curate their Contributor pools to strategically complete Tasks based on domain expertise in order to succeed in the Dojo subnet.

Responsibilities of Validators

Validators are responsible for generating tasks for Miners, based either on synthetic task pipelines or on external organic requests, and for calculating scores and setting rewards and miner trust. These terms are described in the next section.

Task Lifecycle, User Journey, and Data Flow

Important Terms:

  • Task: A task consists of an instruction accompanied by multiple responses to be ranked by human contributors. The task is considered complete only when sufficient high-quality preference data points have been collected for it.

  • Contributor: The entity used to describe human contributors regardless of the associated miner. Miners are expected to curate their pool of contributors in terms of quality and domain expertise to specialize, and contributors are free to be associated with different miners’ organisations (hotkeys).

  • Synthetic Task API: The module that generates synthetic data, based on the developer's requirements, to be distributed to Contributors for labelling.

Task generation begins with Validators running the Synthetic Task API. First, the initial instructions for Tasks are based on randomly sampled combinations of Task Seeds, such as the Persona Dataset. The list of Task Seeds is designed to incorporate more diverse tasks based on organic requests and future collaborations with interested parties. Inspired by the Self-Instruct framework, a few-shot prompting technique is applied to a sample of existing Task Seeds so that SOTA LLMs generate Tasks with new instructions.
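The seed-sampling step can be illustrated with a minimal Python sketch. It is not the Synthetic Task API itself: the function name `build_few_shot_prompt`, the example seeds and the prompt wording are all hypothetical, and the call to an LLM is left out.

```python
import random


def build_few_shot_prompt(task_seeds, num_examples=3, rng=None):
    """Sample task seeds and format them as a few-shot prompt.

    All names here are hypothetical; the prompt wording is illustrative only.
    """
    if num_examples > len(task_seeds):
        raise ValueError("not enough task seeds to sample from")
    rng = rng or random.Random()
    # Sampling without replacement gives a different combination per call.
    examples = rng.sample(task_seeds, num_examples)
    lines = ["Example task instructions:"]
    lines += [f"{i}. {seed}" for i, seed in enumerate(examples, start=1)]
    lines.append("Write one new task instruction that differs from every example.")
    return "\n".join(lines), examples


# Hypothetical task seeds, for illustration only.
seeds = [
    "Build a to-do list web page",
    "Explain recursion to a beginner",
    "Write a SQL query for monthly sales",
    "Design a logo brief for a bakery",
]
prompt, chosen = build_few_shot_prompt(seeds, num_examples=2, rng=random.Random(0))
print(prompt)
```

Passing a seeded `random.Random` makes the sketch reproducible for testing; a validator would presumably use an unseeded generator so that the sampled combinations cannot be predicted.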

For Task instructions that are generated successfully, several iterations of augmentation are performed on the initial Task instruction to produce a set of n different Task instructions that deviate progressively from the original. The goal of this augmentation is for LLMs to follow the augmented prompts and produce objectively subpar responses, which form the synthetic ground truth used to score human preference rankings. This is critical for assessing and assigning Miner trust to the contributor pool, and therefore for building a reliable Dojo participant community.

The original Task instructions, along with the augmented Task instructions, will be sent to LLMs to generate responses to the corresponding instructions. Depending on the domain, prompting techniques such as chain-of-thought (CoT) and execution feedback may be applied to ensure that reasonable and compilable responses are produced for contributors to rank.

Next, the validators will apply multiple layers and forms of data obfuscation to the responses, which prevents Miners and Contributors from performing lookup attacks. The obfuscation process will not affect the functionality or quality of the response, as the obfuscation is removed when the response is retrieved from the database.

Finally, after data obfuscation has been applied, the task, which contains the original instruction and the responses generated from the original and augmented instructions, will be compiled, shuffled and distributed to Miners.
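The compile-and-shuffle step can be sketched as follows. This is a hedged illustration rather than the subnet's code: `compile_task`, the dictionary layout and the use of random identifiers are assumptions. It also assumes that the response to the original instruction ranks first and that each further augmentation level ranks one place lower, and it omits obfuscation entirely.

```python
import random
import uuid


def compile_task(instruction, responses_by_level, rng=None):
    """Bundle responses into a shuffled task and a private ground-truth map.

    responses_by_level[0] is the response to the original instruction;
    index i > 0 is the response to the i-th, progressively more degraded,
    augmented instruction. All names are hypothetical.
    """
    rng = rng or random.Random()
    # Random ids prevent the augmentation level leaking through the id.
    entries = [
        (uuid.uuid4().hex, level, text)
        for level, text in enumerate(responses_by_level)
    ]
    rng.shuffle(entries)  # hide the level that list order would reveal
    public_task = {
        "instruction": instruction,
        "responses": [{"id": rid, "text": text} for rid, _, text in entries],
    }
    # Kept by the validator only: expected rank per response id (1 = best).
    ground_truth = {rid: level + 1 for rid, level, _ in entries}
    return public_task, ground_truth


public_task, ground_truth = compile_task(
    "Build a calculator",
    ["good response", "weaker response", "weakest response"],
    rng=random.Random(1),
)
print(len(public_task["responses"]), sorted(ground_truth.values()))
```

Only `public_task` would be sent to Miners in this sketch; `ground_truth` stays with the validator for scoring.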

Once a task is assigned to a Miner, the Miner can decide how to pass it to Contributors, who may be separate entities. Contributors' outputs are associated with the respective Miner's hotkey for scoring and reward calculations. The methods for task assignment and completion are:

  • Dojo Interface: Miners who prefer to mine conveniently can create a personalized Dojo API key through the provided CLI suite, which will then be used by the Contributors to participate through the Dojo Interface.

  • Local Dojo Interface: Sophisticated miners are advised to run a localised Dojo interface to eliminate the dependency on the Dojo API.

  • External Platforms: Miners can also choose to distribute these tasks to an external service provider such as scale.ai, AWS mTurk, Prolific, Appen or Web3 data labeling platforms. However, these miners will need to be responsible for quality control and ensuring tasks are completed within the stipulated deadlines.

Scoring Mechanism

A Miner's score is the sum of the scores of the tasks it completed over the past few epochs. Depending on the type of task, the individual task score is a weighted function of some of the following metrics:

  • Weighted Cohen’s Kappa: Calculates the agreement between Miners while controlling for the probability of chance agreement, providing a more robust measure of reliability compared to simple percent agreement. A weighted variant of Cohen’s kappa will be used as we are concerned with the relative ranking of the responses generated by LLMs.

  • Intraclass Correlation Coefficient [ICC(2,1)]: Assesses the consistency of quantitative measurements made by different Miners. ICC(2,1) is suitable for multi-rater agreement and can handle continuous or ordinal data, providing a robust measure of reliability across multiple raters.

  • Spearman’s Correlation: Measures the strength and direction of a monotonic relationship between two continuous or ordinal variables, and is robust to non-normal distributions and outliers. It helps to assess agreement among Miners as well as agreement with the ground truth.

  • Distance against synthetic ground truth: Complements Spearman’s correlation, which considers only rank order and therefore loses fidelity about how far apart the scores are.
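To make the last two metrics concrete, here is a small pure-Python sketch. Spearman's correlation is computed in the standard way, as the Pearson correlation of average ranks. The distance measure is an assumption: the text does not define it, so the sketch uses the mean absolute difference between min-max normalised scores. The scores are invented for illustration.

```python
import math


def average_ranks(values):
    """Rank values from 1 (smallest) upwards, giving ties their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's correlation: Pearson correlation of the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def min_max(values):
    """Rescale values to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("cannot normalise constant input")
    return [(v - lo) / (hi - lo) for v in values]


def ground_truth_distance(scores, truth):
    """Assumed distance: mean absolute gap between normalised scores."""
    a, b = min_max(scores), min_max(truth)
    return sum(abs(p - q) for p, q in zip(a, b)) / len(a)


# Hypothetical scores for four responses to one task.
truth = [10, 7, 4, 1]
miner_a = [9, 7, 4, 2]      # same order, similar spacing
miner_b = [9, 8.5, 8, 2]    # same order, very different spacing
for name, scores in (("a", miner_a), ("b", miner_b)):
    print(name, spearman(scores, truth), ground_truth_distance(scores, truth))
```

Both hypothetical miners order the responses exactly as the ground truth does, so Spearman's correlation cannot tell them apart, whereas the distance term separates them.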

While alignment with synthetic ground truth is important, the scoring mechanism is designed in such a way that a high level of agreement between human contributors will still be prioritized when there is a disagreement with the synthetic ground truth.
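One way to picture this prioritisation, together with the per-epoch summation described earlier, is a weighted combination in which the consensus term carries more weight than the ground-truth term. The sketch below is purely illustrative: the weight of 0.7, the three-epoch window and all names are hypothetical, and the actual scoring function is not specified here.

```python
from collections import deque


def task_score(consensus_agreement, ground_truth_alignment, consensus_weight=0.7):
    """Blend two metrics in [0, 1]; consensus is weighted more heavily.

    The 0.7 weight is a hypothetical value chosen for illustration.
    """
    if not 0.0 <= consensus_weight <= 1.0:
        raise ValueError("consensus_weight must lie in [0, 1]")
    return (consensus_weight * consensus_agreement
            + (1.0 - consensus_weight) * ground_truth_alignment)


class MinerScore:
    """Sum of task scores over a sliding window of recent epochs."""

    def __init__(self, window_epochs=3):
        # deque(maxlen=...) silently drops the oldest epoch when full.
        self._epochs = deque(maxlen=window_epochs)

    def record_epoch(self, task_scores):
        self._epochs.append(sum(task_scores))

    @property
    def total(self):
        return sum(self._epochs)


# Agrees with other contributors but not with the synthetic ground truth.
with_peers = task_score(consensus_agreement=0.9, ground_truth_alignment=0.2)
# Agrees with the synthetic ground truth but not with other contributors.
with_truth = task_score(consensus_agreement=0.2, ground_truth_alignment=0.9)
print(with_peers > with_truth)

miner = MinerScore(window_epochs=3)
for epoch_scores in ([1.0], [2.0], [3.0], [4.0]):
    miner.record_epoch(epoch_scores)
print(miner.total)
```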

The Weighted Cohen’s Kappa metric can also be used to monitor data quality: for example, miner trust is not assigned if Weighted Cohen’s Kappa is not above a certain threshold, meaning that no consensus between Miners has been achieved.
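The threshold check can be sketched in pure Python. The weighted kappa below follows the standard definition, one minus the ratio of weighted observed disagreement to weighted chance disagreement. Three things are assumptions rather than details from this document: quadratic weights, averaging the pairwise kappa across Miners (Cohen's kappa itself is defined for two raters), and the threshold of 0.6.

```python
from itertools import combinations


def weighted_kappa(rater_a, rater_b, num_categories, quadratic=True):
    """Weighted Cohen's kappa for two raters using categories 0..k-1."""
    k, n = num_categories, len(rater_a)
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(rater_a, rater_b):
        observed[a][b] += 1.0 / n
    marg_a = [sum(row) for row in observed]
    marg_b = [sum(observed[i][j] for i in range(k)) for j in range(k)]
    disagreement = chance = 0.0
    for i in range(k):
        for j in range(k):
            w = abs(i - j) / (k - 1)  # linear disagreement weight
            if quadratic:
                w *= w
            disagreement += w * observed[i][j]
            chance += w * marg_a[i] * marg_b[j]
    if chance == 0:
        raise ValueError("kappa is undefined when both raters use one category")
    return 1.0 - disagreement / chance


def mean_pairwise_kappa(ratings_by_miner, num_categories):
    """Assumed multi-rater extension: average kappa over all Miner pairs."""
    pairs = list(combinations(ratings_by_miner, 2))
    return sum(weighted_kappa(a, b, num_categories) for a, b in pairs) / len(pairs)


def has_consensus(ratings_by_miner, num_categories, threshold=0.6):
    """Gate for assigning miner trust; 0.6 is a hypothetical threshold."""
    return mean_pairwise_kappa(ratings_by_miner, num_categories) > threshold


# Hypothetical ratings (0 = worst .. 3 = best) of four responses by three Miners.
agreeing = [[0, 1, 2, 3], [0, 1, 2, 3], [0, 1, 2, 3]]
conflicting = [[0, 1, 2, 3], [3, 2, 1, 0], [0, 1, 2, 3]]
print(has_consensus(agreeing, 4), has_consensus(conflicting, 4))
```

With the weighting applied, a near miss (rating 2 against 3) is penalised less than a reversal (rating 0 against 3), which suits the relative ranking of responses.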

The scoring mechanism is designed to handle various attack vectors and failure modes, and will be continually improved on to ensure a fair, productive and collaborative environment for all participants.

Participants can also determine accuracy scores of responses that do not have ground truth. For example, a*, b*, c* can be processed in different manners with the intention to improve the outputs like using various experimental prompt engineering frameworks, or using new LLMs that are not yet benchmarked.

  • [Coming Soon] Contributor Feedback Loop: To further enhance the value of the human-validated synthetic dataset in Dojo, the Contributor Feedback Loop mechanism is designed to iteratively improve outputs generated by SOTA LLMs. At a high level, after evaluating the original Tasks, Contributors will be asked to provide rich human feedback on the Task response they rated best, so that it can be improved further. This feedback from various Contributors will then be fed back to the LLMs to produce new rounds of Tasks, with responses augmented by that feedback, for further evaluation. Once consensus has converged on the best Task response after multiple rounds of feedback, the scoring mechanism will retrospectively reward the Miners whose Contributors provided the feedback that was incorporated into that Task response. The outputs of the resulting chain of feedback can be added to the pre-training or fine-tuning corpus, or the chain itself can be used to generate higher-quality synthetic datasets iteratively.

Please note that the scoring mechanism is under constant refinement based on community feedback, regular empirical reviews or external academic studies.

Anti-gaming Measures

Data Sources and Security

When Task Instructions are generated, random samples of task seeds are used in the instruction synthesis to keep the synthetic instructions produced by the Synthetic Task API fresh.

When Task Instructions are augmented, random forms of augmentation are applied to the Task Instruction in order to obtain different and uncorrelated responses from the LLM.

When Task Responses are generated, the few-shot examples used are sampled from a pool to greatly reduce the potential for miners to cache Task Responses for lookup attacks.

In the stage of obfuscating Task Responses, various obfuscator libraries will be used to obfuscate the code, discouraging miners from substituting non-human approaches such as LLMs for genuine human preferences.

Miner-miner Collusion Prevention

Each miner will receive uniquely obfuscated responses to grade. In addition, the number of pre-obfuscated responses generated per task can be dynamically adjusted based on contributor count. Miners therefore will not save any meaningful resources by mounting Sybil attacks on the platform. The overlap of pre-obfuscated tasks is minimal, and any "savings" achieved by the miner can be negated by modifying the reward function to provide a bonus for completing additional tasks, thereby incentivising the miner to consolidate participation onto one Miner UID instead.

[Coming Soon] Contributor Feedback Loop

The feedback loop mechanism naturally adds a further layer of complexity that discourages Sybil attacks. The feedback collected from the pool of Contributors will be combinatorially permuted to varying degrees, making it difficult to distinguish one Contributor's feedback from another's. This discourages unfair evaluations and targeted attacks by specific Miners or Contributors.
