OpenAI questions China's DeepSeek AI data source

by Grace, Sep 26, 2025

DeepSeek accused of using OpenAI's model to train AI system via distillation technique

DeepSeek faces allegations of utilizing OpenAI's model for competitive training through distillation. Image credit: Andrey Rudakov/Bloomberg via Getty Images.

The developers behind ChatGPT have voiced concerns that China's cost-effective DeepSeek AI models may have been trained on data generated by OpenAI's models. Donald Trump recently characterized DeepSeek as a critical warning for U.S. technology leadership, remarks that came as Nvidia's market valuation fell by a staggering $600 billion.

Market Turbulence Following DeepSeek's Rise

The Chinese AI platform's emergence triggered significant volatility across tech stocks. Nvidia, the dominant GPU provider powering most AI systems, suffered a 16.86% single-day plunge that erased roughly $600 billion in market value - the largest one-day loss of market value ever recorded on Wall Street. Other major players, including Microsoft, Meta, Alphabet (Google's parent company), and Dell Technologies, saw declines ranging from 2.1% to 8.7% as investors reassessed AI sector valuations.

DeepSeek promotes its R1 model as an affordable alternative to Western counterparts like ChatGPT. R1 is built on the open-source DeepSeek-V3 model, which the company claims requires substantially less computing power. DeepSeek estimates its training costs at just $6 million - a figure some industry analysts question, yet one that has nonetheless shaken confidence in the billions American firms are investing in AI development.

Potential IP Violation Investigations

Bloomberg reports that OpenAI and Microsoft are examining whether DeepSeek improperly used OpenAI's API to integrate OpenAI's models into its own. An OpenAI spokesperson stated: "We're aware that PRC-based entities and others consistently attempt to extract knowledge from leading U.S. AI systems." The company highlighted distillation - a technique in which developers train a model on the outputs of a more advanced system - as a violation of its terms of service.
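For readers unfamiliar with the term, the general idea of distillation can be sketched in a few lines of Python. This is a toy illustration only, not a description of how DeepSeek, OpenAI, or any other company trains its models: the "teacher" here is just a hypothetical probability distribution over three possible answers, and the "student" is a small set of numbers nudged by gradient descent until its own output matches the teacher's. The names `softmax`, `distill`, and `teacher_probs` are all assumptions made for this example.

```python
import math

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def distill(teacher_probs, steps=2000, lr=0.5):
    """Fit a toy student so that its output distribution matches the teacher's.

    The student is only a list of logits. Each step applies the gradient of
    the cross-entropy between teacher and student outputs, which for softmax
    logits is simply (student_probs - teacher_probs).
    """
    student_logits = [0.0] * len(teacher_probs)
    for _ in range(steps):
        student_probs = softmax(student_logits)
        student_logits = [
            z - lr * (p - t)
            for z, p, t in zip(student_logits, student_probs, teacher_probs)
        ]
    return softmax(student_logits)

# Hypothetical teacher output for one prompt (made-up numbers).
teacher_probs = [0.7, 0.2, 0.1]
student_probs = distill(teacher_probs)
```

After training, `student_probs` should sit very close to `teacher_probs`. Real systems are far larger and typically learn from many prompt and response pairs rather than a single distribution, but the principle is the same: the student learns from what the teacher outputs rather than from the teacher's original training data.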

President Trump's AI advisor David Sacks commented to Fox News about emerging defensive measures: "Substantial evidence suggests DeepSeek extracted knowledge from OpenAI models. Expect leading U.S. AI firms to implement new protections against such distillation attempts in coming months."

Industry Ironies and Legal Precedents

Technology commentators noted the paradoxical nature of these allegations, given OpenAI's own training practices. Industry analyst Ed Zitron remarked: "It's frankly hilarious watching OpenAI - which fundamentally built ChatGPT by scraping the entire internet - complain about potential model training. The hypocrisy is astounding."

This controversy unfolds against ongoing legal debates about AI training methodologies. OpenAI recently told the UK Parliament that creating tools like ChatGPT without copyrighted materials would be "impossible," arguing that copyright law today covers nearly all forms of human expression. This admission follows multiple lawsuits, including a New York Times case alleging "unlawful use" of copyrighted content for AI training.

The legal landscape remains complex: a 2023 U.S. federal court ruling held that AI-generated content cannot receive copyright protection, based on established principles requiring human creative input. These developments underscore the evolving challenges at the intersection of intellectual property law and artificial intelligence development.