Elon’s fight for ‘open-source AGI’ ignores users and ethical AI training | Opinion

As a researcher with a background in artificial intelligence and blockchain technology, I believe the ongoing legal dispute between Elon Musk and OpenAI raises significant questions about the true intentions of for-profit AI companies. However appealing OpenAI’s transformation into a profit-driven organization may look on the surface, an excessive focus on profits could have detrimental consequences for end-users.

Musk has filed a lawsuit against OpenAI, accusing the company of deviating from its original mission to create artificial general intelligence (AGI) “for the betterment of humanity.” Carlos E. Perez has speculated that this legal action could turn OpenAI into the next WeWork, the once high-flying startup that suffered severe financial and reputational damage after a period of rapid growth.

The lawsuit has put OpenAI’s shift toward profit under scrutiny. Yet a corporation’s overemphasis on financial gain can conceal self-serving agendas, and the profit debate itself distracts from what matters most to users: ethical AI development and responsible data handling.

Grok, Musk’s brainchild and a ChatGPT competitor, can access “real-time information” from tweets. OpenAI, for its part, is already infamous for scraping copyrighted data left, right, and center. And now Google has struck a $60-million deal to access Reddit users’ data to train Gemini and Cloud AI.

Merely advocating for open-source projects isn’t enough to protect users’ interests, though. We also need mechanisms that ensure genuine consent and fair compensation for the individuals whose data trains large language models (LLMs). Emerging platforms that crowdsource AI training data are essential to addressing this problem; I’ll return to them later.

It’s mostly non-profit for users

Approximately 5.3 billion people worldwide use the internet, and around 93% of them rely on centralized social media platforms. It’s therefore reasonable to assume that a significant portion of the estimated 147 billion terabytes of data created online in 2023 originated from users. That volume is projected to surpass 180 billion terabytes by 2025.

Although this enormous trove of “publicly accessible information” powers AI’s learning and development, users rarely derive any substantial advantage from it. They have neither control over nor true ownership of the data being used. Consent via an “I Agree” button is questionable at best and borderline coercive at worst. We need more transparent, meaningful ways for users to understand and control how their information is used.

Just as oil became the defining resource of the industrial economy, data has become the defining resource of the digital economy. The power dynamics around data ownership, however, are heavily skewed toward big tech companies.

With blockchains maturing into foundational technology for data distribution and authenticity, this is an exciting moment for users. Crucially, new-age AI firms can adopt these solutions for better performance, lower costs and, above all, humanity’s benefit.

Crowdsourcing data for ethical AI training

The “read-write-trust” model of Web2 assumes that entities and stakeholders will behave altruistically. However, as David Hume pointed out centuries ago, people are primarily driven by their own self-interest.

The Web3 model, characterized by “read-write-own,” employs technologies such as blockchain and cryptography to ensure that no individual participant in a decentralized network can act maliciously or deceitfully without detection. Chris Dixon explores this concept in depth in his book, Read Write Own.
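
To see why cheating is detectable in such a network, consider a toy hash chain, the core data structure beneath any blockchain. This is a minimal illustrative sketch in Python, not any production design; real blockchains add consensus and digital signatures on top of the same idea:

```python
"""Toy hash chain: each record commits to the previous one, so altering
any entry changes every subsequent hash. Illustrative only."""
import hashlib
import json

def chain(records: list[dict]) -> list[str]:
    """Return the running hash of each record, linked to its predecessor."""
    hashes, prev = [], "genesis"
    for rec in records:
        digest = hashlib.sha256(
            (prev + json.dumps(rec, sort_keys=True)).encode()
        ).hexdigest()
        hashes.append(digest)
        prev = digest
    return hashes

ledger = [{"user": "alice", "data": "audio-42"},
          {"user": "bob", "data": "img-7"}]
original = chain(ledger)

ledger[0]["user"] = "mallory"      # a participant tries to rewrite history
assert chain(ledger) != original   # every later hash diverges
print("tampering detected: hashes no longer match the published chain")
```

Because everyone holds a copy of the published hashes, a dishonest rewrite is immediately visible to the rest of the network.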

The web3 tech stack is fundamentally different from traditional systems: it is community-driven and user-controlled, allowing individuals to take back control of their digital assets and data. It provides the tools to securely store and manage financial, social, creative, and other types of information.

Recent advances in privacy and security techniques, such as zero-knowledge proofs (zkProofs) and multi-party computation (MPC), offer novel solutions for data verification, exchange, and control by enabling parties to prove facts without disclosing the underlying information.
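
For intuition, here is a toy version of one of the simplest real zero-knowledge protocols, an interactive Schnorr proof of knowledge, sketched in Python. The numbers are deliberately tiny for readability; production zkProof systems use large elliptic-curve groups or SNARK circuits, but the core property, proving you know a secret without revealing it, is the same:

```python
"""Toy interactive Schnorr proof: convince a verifier you know x such that
y = g^x mod p, without ever transmitting x. Illustrative parameters only."""
import secrets

p, g = 467, 4            # toy safe-prime group; g generates a subgroup of order q
q = (p - 1) // 2         # q = 233, the (prime) order of that subgroup

x = 101                  # prover's secret (never leaves the prover)
y = pow(g, x, p)         # public value the verifier already knows

# --- one round of the protocol ---
r = secrets.randbelow(q)          # prover picks a random nonce
t = pow(g, r, p)                  # ...and sends this commitment
c = secrets.randbelow(q)          # verifier replies with a random challenge
s = (r + c * x) % q               # prover's response; useless without r

# Verifier's check: g^s == t * y^c (mod p) holds iff the prover knew x.
assert pow(g, s, p) == (t * pow(y, c, p)) % p
print("proof accepted; the secret x was never transmitted")
```

The verifier ends up convinced that the prover holds the secret yet learns nothing about it, which is precisely the property that lets users prove facts about their data without handing the data itself over.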

From an AI-development perspective, these capabilities are significant: it is now feasible to obtain trustworthy data without relying on centralized suppliers or verifiers. But what truly sets web3 apart is its decentralized, intermediary-free structure, which enables direct interactions between data producers (the users) and the AI projects that need their data for model training.

Eliminating “trusted intermediaries” and gatekeepers yields substantial cost savings. It also lets projects reward users directly for their input and effort: contributors can earn cryptocurrency for microtasks such as recording scripts in their native dialect, recognizing and labeling images, or sorting unstructured data.

Companies, on the other hand, can build more accurate models using high-quality data validated by humans in the loop, and at a fair price. It’s a win-win.
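
As a sketch of how such a human-in-the-loop microtask flow might work, consider the following Python example. All names, the quorum rule, and the reward figure are hypothetical illustrations, not Ta-da’s actual API:

```python
"""Hypothetical crowdsourced-labeling flow: a label is paid out only after
enough independent contributors agree on it. Illustrative sketch only."""
from __future__ import annotations
from dataclasses import dataclass, field

@dataclass
class LabelTask:
    task_id: str
    payload: str                                           # e.g., an image URL to label
    labels: dict[str, str] = field(default_factory=dict)   # worker -> submitted label

    def submit(self, worker: str, label: str) -> None:
        self.labels[worker] = label

    def consensus(self, quorum: int = 3) -> str | None:
        """Return the winning label once `quorum` workers agree, else None."""
        if len(self.labels) < quorum:
            return None
        counts: dict[str, int] = {}
        for label in self.labels.values():
            counts[label] = counts.get(label, 0) + 1
        best, votes = max(counts.items(), key=lambda kv: kv[1])
        return best if votes >= quorum else None

REWARD = 0.5  # tokens per validated label (made-up figure)

task = LabelTask("t1", "https://example.com/cat.jpg")
for worker in ("alice", "bob", "carol"):
    task.submit(worker, "cat")

if (label := task.consensus()) is not None:
    payouts = {w: REWARD for w, l in task.labels.items() if l == label}
    print(f"validated label: {label}; payouts: {payouts}")
```

Requiring a quorum of independent contributors before any payout is one simple way to keep data quality high (humans validate each other) while still rewarding users directly.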

Bottom-up advancements, not merely open-source

In their current form, traditional frameworks are heavily stacked against individuals and user communities, rendering the concept of open-source meaningless in this context. A significant transformation of existing business models and training methods is essential to promote ethical AI development.

Shifting from centralized, hierarchical structures to decentralized, participatory processes is the more effective approach. It’s also crucial to build systems based on merit and value, prioritizing ownership, autonomy, and collaboration. In such systems, distributing resources fairly creates more prosperity than amassing as much as possible.

Intriguingly, these systems will benefit large corporations as much as smaller businesses and individual users. High-quality data, fair pricing, and accurate AI models are, after all, essential for all parties.

With the right incentives now in place, it is in our collective interest to adopt these new-age models. Clinging to outdated methods and narrow gains may provide short-term benefits, but it won’t sustain us: the future’s requirements will be distinct from the past’s.

William Simonin

William Simonin is the chairman of Ta-da, an AI data marketplace that leverages blockchain to gamify data verification. He previously worked as a software engineer and researcher for the French Defense Ministry for about six years, and served with the Security Association of Epitech Nancy, first as its President and later as a Professor of Functional Programming. He is a French entrepreneur and co-founder of multiple AI, tech, and cryptocurrency companies.
