OpenAI Reportedly Asking Contractors to Upload Past Work—Lawyers Warn of Major IP Risk
OpenAI is reportedly asking contractors to upload real work from their previous jobs, a practice that intellectual property lawyers say puts the company at significant legal risk. The revelation raises urgent questions about how AI companies are sourcing training data—and whether the industry's appetite for high-quality human-generated content is outpacing its legal guardrails.
The report, first published by TechCrunch, suggests OpenAI has been directing contract workers to provide documents, code, and other materials created during their tenure at other companies. For anyone who's ever signed an employment agreement, the implications are immediately clear: this work almost certainly belongs to those former employers.
The Legal Exposure Is Substantial
Employment contracts typically include intellectual property assignment clauses that give employers ownership of work created during employment. Many also contain non-disclosure agreements that explicitly prohibit sharing proprietary information with third parties—including, presumably, AI companies hungry for training data.
An intellectual property lawyer quoted in the original report didn't mince words: OpenAI is "putting itself at great risk" with this approach. The potential claims are numerous: copyright infringement, trade secret misappropriation, tortious interference with contracts, and potentially even violations of the Computer Fraud and Abuse Act if the data was taken without authorization.
What makes this particularly dangerous for OpenAI is the scale. If contractors are uploading work from dozens of different former employers, the company could face exposure from an equally large number of potential plaintiffs—each with their own legal teams and their own grievances about AI companies using their data without permission.
Why OpenAI Might Be Taking This Risk
The move speaks to a fundamental tension in AI development: the best training data is often the data you can't easily get.
Publicly available text—Wikipedia, Reddit posts, news articles—has already been scraped extensively. The low-hanging fruit has been picked. What remains is higher-quality, proprietary content: internal corporate documents, professional code repositories, specialized research, and the kind of polished work product that companies pay skilled employees to create.
This is precisely the data that would make AI models better at professional tasks. A language model trained on internal consulting reports would be more useful for consultants. One trained on proprietary code would write better code. The commercial incentive to acquire this data is enormous.
But the legal barriers are equally substantial. Companies rarely license their internal documents to AI firms. Employees are bound by NDAs. That leaves two ways to get this data: negotiate enterprise deals (slow, expensive, limited) or find workarounds.
Asking contractors to upload their old work appears to be the latter.
The Broader Industry Problem
OpenAI isn't operating in a vacuum. The entire AI industry faces a training data crunch, and different companies have taken different approaches to solving it.
Some, like Anthropic, have emphasized synthetic data generation and techniques such as Constitutional AI, which lean more on model-generated data and feedback than on additional scraped human content. Others have pursued licensing deals with publishers and content platforms. Google and Meta benefit from their own massive data ecosystems. Smaller players have sometimes scraped first and asked questions later.
The question of where training data comes from—and whether its acquisition is legal—has been the subject of multiple ongoing lawsuits. Getty Images sued Stability AI. The New York Times sued OpenAI. Authors have filed class actions. The legal framework is still being written in real time.
What's notable about this latest report is that it suggests OpenAI may be moving beyond the gray areas of web scraping into territory that's much more clearly problematic. Taking copyrighted images from the internet is one thing; soliciting employees to breach their contracts with former employers is another.
What This Means for Workers
If you're a contractor being asked to upload work from previous jobs, the legal risk doesn't just fall on OpenAI—it falls on you too.
Violating an NDA can result in lawsuits from former employers. Depending on the nature of the information shared, it could constitute trade secret theft, which carries both civil and criminal penalties. The fact that an AI company requested the data is not a defense.
Workers in this position should review their employment agreements carefully. Most will find clauses that explicitly prohibit exactly what's being requested. The safest course is to decline—regardless of what the contractor agreement with OpenAI says.
The Stakes Keep Rising
OpenAI has already faced criticism over its data practices, from the Times lawsuit to questions about how it trained GPT-4. The company has generally defended its practices as fair use while simultaneously pursuing licensing deals with major publishers.
This latest report suggests the company's data appetite may be exceeding what even aggressive interpretations of fair use can support. Soliciting contractors to potentially breach their employment agreements isn't a copyright question—it's a contract law question, and the answers there are much less ambiguous.
For OpenAI, the immediate concern is litigation risk. For the broader industry, the concern is precedent. If AI companies normalize the idea that workers should feed their former employers' proprietary data into training pipelines, the backlash—legal, regulatory, and reputational—could be severe.
The AI industry has long argued that training on human-created content is transformative and beneficial. That argument becomes much harder to make when the data acquisition involves asking people to break their contracts. OpenAI may be discovering that the fastest path to better training data is also the fastest path to the courthouse.