Data Donors: A New Paradigm for Fueling AI for the Public Good

Imagine receiving an email in the near future that says: “Thank you for sharing data with the American Data Collective on May 22, 2025. Your workout data donation to SprintAI, a local startup focused on designing shoes for differently-abled athletes, has also been forwarded to an AI research cluster at a regional university. Your contribution is accelerating AI innovation to address pressing public needs!” This vision, in which data donations work much like blood donations, may soon become a reality. As with blood, the donation benefits the community even when it does not immediately serve the individual donor.

Creating data equivalents of blood banks may not seem urgent amid concerns about AI companies using data for profit-driven ends. But that narrow view overlooks the hundreds of AI research initiatives and startups that need high-quality data and often lack the resources to obtain it. Their needs were on display at Meta’s Open Source AI Summit in Austin, Texas. There, leaders like Matt Schwartz, who uses AI for colonoscopy diagnostics, and Edward Chang of the University of California, who explores brain functions with AI, expressed the need for public data contributions.

A tragic irony exists in our data infrastructure: while individuals routinely share data with private entities, AI labs and public interest groups face barriers to acquiring the data they need. Unlike commercial giants, they cannot scrape the internet for data, use social media platforms as data generators, or afford to license data. As a result, mission-driven AI initiatives face chronic data scarcity, which hampers the innovation needed for societal improvement.

Concerns about privacy, security, and misuse deter individuals from sharing personal information, yet app developers and platforms continuously collect data without full transparency or consent. Algorithms trained on that data then make decisions that affect individuals’ lives, often perpetuating societal biases.

Consider OMNY Health, a Georgia-based platform that has compiled a dataset of 85 million de-identified patient records and four billion clinical notes. This resource helps researchers and health tech firms train AI models that predict disease and support diverse clinical trials. However, access is primarily available to paying customers, which limits the nonprofit researchers and smaller startups that could use the data for public initiatives.

Unlocking AI’s potential for societal benefit requires new frameworks and a cultural shift in data sharing norms. Data donation should be seen as a pro-social contribution, akin to blood donation, that fosters collective ownership and responsibility. Imagine a system of “Data Donors,” in which individuals consent to share anonymized data with vetted startups and nonprofit organizations committed to public good outcomes.

Such a system requires trust, transparency, and independent governance. Ethical guidelines, security protocols, auditable data handling, and user-centric consent management are essential. Multi-stakeholder oversight bodies, including ethicists, legal experts, and citizen representatives, could ensure ethical integrity. Auditing should be transparent, and clear redress mechanisms should be in place for cases of data misuse.
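
To make the consent and audit requirements concrete, here is a minimal sketch of how a data collective might gate each release on donor consent and recipient vetting while logging every request for later review. It is an illustration under stated assumptions, not a description of any existing system: the class names, fields, and the "AdBroker" recipient in the usage note are all hypothetical, and a real deployment would also need authentication, anonymization, and secure storage.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class DonorConsent:
    """Hypothetical record of the data categories one donor agreed to share."""
    donor_id: str
    allowed_categories: set[str]
    revoked: bool = False


@dataclass
class DataCollective:
    """Hypothetical collective that releases data only to vetted recipients."""
    vetted_recipients: set[str]
    consents: dict[str, DonorConsent] = field(default_factory=dict)
    audit_log: list[dict] = field(default_factory=list)

    def register(self, consent: DonorConsent) -> None:
        """Store (or replace) a donor's consent record."""
        self.consents[consent.donor_id] = consent

    def revoke(self, donor_id: str) -> None:
        """Let a donor withdraw consent; later requests are denied."""
        self.consents[donor_id].revoked = True

    def request_share(self, donor_id: str, category: str, recipient: str) -> bool:
        """Approve a release only if consent and vetting both check out."""
        consent = self.consents.get(donor_id)
        approved = (
            consent is not None
            and not consent.revoked
            and category in consent.allowed_categories
            and recipient in self.vetted_recipients
        )
        # Log every request, approved or denied, so an oversight body
        # can audit how donated data was handled.
        self.audit_log.append({
            "time": datetime.now(timezone.utc).isoformat(),
            "donor_id": donor_id,
            "category": category,
            "recipient": recipient,
            "approved": approved,
        })
        return approved
```

In this sketch, a collective created with `DataCollective(vetted_recipients={"SprintAI"})` would approve a request to share a donor's consented workout data with SprintAI, deny the same request from an unvetted recipient such as "AdBroker", and record both outcomes in `audit_log`. Logging denials as well as approvals is a deliberate choice: it gives auditors a view of attempted access, not just completed transfers.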

Innovators on the front lines, like those at the Open Source AI Summit, highlight the critical need for high-quality, diverse datasets. The Data Donor model offers a consensual pathway to unlock this resource, transforming data from a commodity into a powerful engine for societal progress. This approach can accelerate discoveries that address humanity’s pressing challenges. It’s time to architect an alternative to commercial data extraction and build a future where data is used democratically for the public good.

Note: This article is inspired by content from https://www.theregreview.org/2025/06/16/frazier-a-new-paradigm-for-fueling-ai-for-the-public-good/. It has been rephrased for originality. Images are credited to the original source.