As an analyst with over two decades of experience in the tech industry, I have witnessed firsthand the challenges organizations face due to inaccurate, duplicate, and incomplete data. Artificial intelligence was supposed to alleviate these issues, but no technology is perfect: the data AI relies on can itself be misclassified or simply not applicable.
Fraction AI is blazing a trail in data labeling, merging the swiftness of artificial intelligence with human intuition. The firm has just closed a $6 million pre-seed funding round co-led by Symbolic and Spartan, with strategic investments from Illia Polosukhin (Near), Sandeep Nailwal (Polygon), and other top-tier angel investors.
According to the team, Fraction AI departs from traditional methods that rely solely on either artificial intelligence or human intervention, instead using human intuition to guide its AI agents. The funds raised in this round will go toward deeper research into this approach and infrastructure upgrades to support its scalability. This hybrid method, which the company says has proven effective in its research, promises to tackle the escalating challenge of generating high-quality data more efficiently.
Introducing Gamified Adversarial Prompting
Data scientists have shown that using GAP (gamified adversarial prompting) significantly improves the performance of modern AI models by creating more effective datasets. The GAP system works by gathering high-quality data through a game, making data collection an enjoyable experience for players. This process encourages participants to pose intricate, detailed questions and answers that help expand the knowledge base of the AI models.
In simpler terms, Fraction AI motivates AI agents to generate high-quality data through real-time competitions. Developers design and deploy these agents with specific guidelines, aiming for the best results. Ether serves as the system's economic backbone: competitors earn financial rewards, which sustains a steady flow of useful training data.
Current issues with data quality
Inaccurate data, marked by mistakes like misspelled names, wrong addresses, or general input errors, can cost organizations tens of millions of dollars annually. Whether caused by human error or technical glitches, inaccurate data is problematic because it introduces inconsistencies that undermine any meaningful analysis.
When merging data from several sources, duplicate records often arise. In retail, for instance, customer lists gathered from two different platforms may contain individuals who have purchased from both stores. These repeated entries cause problems because each customer should be counted only once.
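The retail scenario above can be sketched in a few lines of Python. This is an illustrative example, not Fraction AI's implementation; the field names and the choice of a normalized email as the merge key are assumptions.

```python
# Hypothetical sketch: de-duplicating customers merged from two store platforms.

def normalize_email(email: str) -> str:
    """Use a normalized email address as the merge key."""
    return email.strip().lower()

def merge_customers(*sources):
    """Merge customer lists so each person is counted once."""
    seen = {}
    for source in sources:
        for record in source:
            key = normalize_email(record["email"])
            # Keep the first record seen for a key; skip later duplicates.
            if key not in seen:
                seen[key] = dict(record)
    return list(seen.values())

store_a = [{"email": "Ana@example.com", "name": "Ana"},
           {"email": "bob@example.com", "name": "Bob"}]
store_b = [{"email": "ana@example.com ", "name": "Ana M."},  # same person, messier key
           {"email": "cy@example.com", "name": "Cy"}]

customers = merge_customers(store_a, store_b)
print(len(customers))  # 3 unique customers, not 4
```

Without the normalization step, "Ana@example.com" and "ana@example.com " would slip through as two different customers, which is exactly the kind of tally error the paragraph describes.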
Merging data from two distinct sources may lead to discrepancies in formatting. These cross-source irregularities could potentially create significant data quality problems if not promptly detected and addressed.
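A common cross-source irregularity is the same field stored in different formats. The sketch below shows one way to normalize dates to a single convention; the accepted formats are illustrative assumptions, not drawn from any specific platform.

```python
# Hypothetical sketch: normalizing one cross-source formatting mismatch
# (dates stored as "12/19/2024" in one system and "2024-12-19" in another).
from datetime import datetime

def normalize_date(value: str) -> str:
    """Coerce known source formats to ISO 8601; fail loudly otherwise."""
    for fmt in ("%Y-%m-%d", "%m/%d/%Y", "%d.%m.%Y"):
        try:
            return datetime.strptime(value, fmt).strftime("%Y-%m-%d")
        except ValueError:
            continue
    raise ValueError(f"unrecognized date format: {value!r}")

print(normalize_date("12/19/2024"))  # 2024-12-19
print(normalize_date("2024-12-19"))  # 2024-12-19
```

Failing loudly on unknown formats, rather than guessing, is what lets these irregularities be "promptly detected and addressed" instead of silently corrupting downstream analysis.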
Two challenges often encountered are incomplete data and what’s known as ‘dark’ or hidden data. Incomplete data refers to records that lack essential details, such as phone numbers without area codes or demographic information devoid of age specifics. On the other hand, dark data is a type of data that gets collected and stored but remains untapped and unused. For instance, IBM suggests that approximately 90% of all sensor data gathered from IoT devices goes unutilized. This overlooked resource represents more than half of an average organization’s total data storage costs, with many organizations unaware of its existence.
Human understanding facilitates improvement
GAP serves as an educational resource, inspiring individuals to push the boundaries of artificial intelligence capabilities. By asking participants to pinpoint mistakes or discrepancies within datasets or AI results, it fosters error detection. Given the wide range of experiences among its users, it facilitates the identification of biases that a singular development team may miss due to their limited perspective.
Incorporating game mechanics motivates individuals to think creatively, tackling problems or riddles that push the boundaries of data and model capabilities. In doing so, players may discover new use cases, identify biased results, and suggest more diverse solutions, minimizing systemic biases in data and models and creating a fairer base for many kinds of applications. Because rewards are tied to detecting mistakes, participants also surface previously overlooked data inconsistencies. Substantial rewards for finding major flaws decrease the likelihood of unforeseen issues or weaknesses in real-life deployments.
With advancements in technology, it becomes possible for a larger number of individuals to engage in multiplayer gaming sessions at once. This mass participation fuels rapid progress, as the increased amount of data facilitates the swift discovery of vulnerabilities.
The dark side of creativity
Creative problem-solving doesn't have to serve the public good. For some users, the rewards will be the primary motivation, and it's not unreasonable to expect malicious actors to try to game the system. Platforms will need mechanisms to detect and block harmful activity, for example using AI and statistical models to monitor user behavior patterns and flag anomalies that indicate spamming: unusually high submission rates or repetitive patterns from a single user could be flagged for review.
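One minimal form of the statistical monitoring described above is an outlier check on submission rates. This is a sketch under assumed thresholds and data shapes, not a description of any platform's actual detector.

```python
# Hypothetical sketch: flag users whose daily submission count is a
# statistical outlier relative to the rest of the population.
from statistics import mean, stdev

def flag_anomalous_submitters(daily_counts, z_cutoff=3.0):
    """Return users whose count exceeds mean + z_cutoff * stdev."""
    counts = list(daily_counts.values())
    mu, sigma = mean(counts), stdev(counts)
    if sigma == 0:
        return []  # everyone identical; nothing stands out
    return [user for user, n in daily_counts.items()
            if (n - mu) / sigma > z_cutoff]

activity = {"alice": 12, "bob": 9, "carol": 11, "dave": 10, "spammer": 480}
print(flag_anomalous_submitters(activity, z_cutoff=1.5))  # ['spammer']
```

A real deployment would look at richer signals (content similarity, timing, account age) rather than raw counts, but the principle is the same: flag for human review, don't auto-ban.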
In simpler terms, the GAP system might weight users according to their past contributions. To minimize the potential for early misuse, newly registered users would carry little weight until they build up a reputation for reliability.
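Such reputation weighting could be sketched as follows. The growth curve and its constants are illustrative assumptions, not Fraction AI's design.

```python
# Hypothetical sketch: new accounts start with near-zero weight, and weight
# grows toward a cap as verified contributions accumulate.

def reputation_weight(verified_contributions, cap=1.0):
    """Weight approaches `cap` as the track record grows; 0 for newcomers."""
    return cap * verified_contributions / (verified_contributions + 20)

def weighted_flag_score(voter_track_records):
    """Aggregate a flag by summing each voter's reputation weight."""
    return sum(reputation_weight(c) for c in voter_track_records)

# A flag backed by three veterans outweighs one from five brand-new accounts.
veterans = [100, 80, 150]    # verified contributions per voter
newcomers = [0, 1, 0, 2, 1]
print(weighted_flag_score(veterans) > weighted_flag_score(newcomers))  # True
```

The design choice here is that influence must be earned: a swarm of fresh sock-puppet accounts contributes almost nothing to a flag's score, which blunts the cheapest attack on the system.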
Ultimately, some users may flag content at random. To keep this from degrading data integrity, platforms utilizing GAP might have to integrate human analysts or additional AI to filter out cases where useful, accurate information has been flagged.
Taking data quality mainstream
By taking part, humans can be motivated to identify incorrect labels or unnecessary data within AI databases, thus enhancing the precision and effectiveness of machine learning and artificial intelligence systems. Furthermore, gamifying contributions can boost the accuracy and comprehensiveness of open-source datasets such as Wikipedia and OpenStreetMap. This real-time flagging of misinformation will result in more dependable repositories overall.
Implementing the GAP system could also help platforms deal with hurtful, biased, or inappropriate content more effectively. Platforms such as Reddit or YouTube, for instance, might incorporate this method to detect and remove such questionable content from their sites more swiftly.
2024-12-19 16:04