OpenAI, Google and Anthropic are sharing threat intelligence through the Frontier Model Forum to detect adversarial distillation — automated query attacks where Chinese labs extract outputs from frontier US models and use them to train cheaper copycat systems. Anthropic has documented over 16 million suspicious exchanges traced to DeepSeek, Moonshot AI and MiniMax. The collaboration is the rarest kind of move in AI: three direct rivals pooling defensive data because the threat to their economics — and to safety guardrails — outweighs the competitive cost of cooperation.
What just happened?
On April 6, 2026, Bloomberg reported that three of the most fiercely competing AI labs in the world — OpenAI, Anthropic and Google DeepMind — have begun sharing information to detect so-called adversarial distillation attempts that violate their terms of service. The channel is the Frontier Model Forum, an industry nonprofit the three companies founded with Microsoft in 2023.
The unspoken subtext: the companies suing each other over talent, accusing each other of data scraping, and racing each other to AGI have decided that defending against Chinese model extraction is more important than competing with each other. That is not a small statement.
This is the first time the Frontier Model Forum — which most observers had written off as a PR shell since its 2023 launch — has been used as an operational threat-intel channel. And the trigger was not a single incident. It was a year of mounting evidence that adversarial distillation is no longer hypothetical.
What is adversarial distillation, in plain English?
Standard model distillation is one of the most useful techniques in modern ML. A large, expensive “teacher” model generates outputs, and a smaller “student” model is trained to mimic those outputs — keeping most of the capability at a fraction of the inference cost. Every major lab does it internally. It is how you get GPT-4-class behaviour out of a 7B-parameter model running on your laptop. (For the mechanics, see our explainer on LoRA and fine-tuning.)
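To make the teacher-student relationship concrete, here is a minimal numpy sketch of the classic soft-target objective from Hinton et al. (2015). The logits are toy numbers, not outputs of any real model:

```python
# A minimal sketch of soft-target distillation (Hinton et al., 2015):
# the student is trained to match the teacher's softened output distribution.
import numpy as np

def softmax(logits, temperature=1.0):
    z = logits / temperature
    z = z - z.max()                      # subtract max for numerical stability
    exp_z = np.exp(z)
    return exp_z / exp_z.sum()

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """Cross-entropy of the student against the teacher's softened targets."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    return float(-np.sum(p_teacher * np.log(p_student + 1e-12)))

# The loss shrinks as the student's output distribution approaches the teacher's.
teacher = np.array([4.0, 1.0, 0.5])
loss_far = distillation_loss(teacher, np.array([0.5, 1.0, 4.0]))
loss_near = distillation_loss(teacher, np.array([3.8, 1.1, 0.4]))
```

The temperature softens both distributions so the student also learns the teacher's relative preferences among wrong answers, which is where much of the transferred capability lives.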
Adversarial distillation is the same technique, but pointed at someone else’s API. Instead of distilling from a model you trained, you fire millions of carefully designed prompts at a competitor’s hosted model, harvest the outputs, and use those input/output pairs as training data for your own system. You skip the billions of dollars spent on pretraining, the months of RLHF, and the entire safety pipeline. You inherit the capability without inheriting the cost — or the guardrails.
Adversarial distillation pipeline: automated extraction → dataset → student model without guardrails.
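In code terms, the attacker's half of that pipeline is embarrassingly simple. The sketch below is hypothetical (the record format and function names are invented, and no real API is queried), but it shows how harvested prompt/response pairs become supervised fine-tuning data:

```python
# Hypothetical sketch of the attacker's side: harvested prompt/response
# pairs become chat-style supervised fine-tuning records for a student model.
# The record format and names here are illustrative, not any lab's real API.
import json

def to_sft_records(pairs):
    """Convert harvested (prompt, response) pairs into chat-style SFT records."""
    return [
        {"messages": [
            {"role": "user", "content": prompt},
            {"role": "assistant", "content": response},
        ]}
        for prompt, response in pairs
    ]

harvested = [
    ("Explain TCP slow start step by step.",
     "TCP slow start ramps the congestion window exponentially until..."),
]
training_jsonl = "\n".join(json.dumps(r) for r in to_sft_records(harvested))
```

Everything expensive (pretraining, RLHF, safety tuning) happened upstream; the attacker's code is just collection and reformatting, which is exactly why the economics favour the extractor.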
The safety angle is the part most coverage glosses over. When you distill from outputs alone, you copy the behaviour but you do not copy the alignment infrastructure that produced it. Refusals can be fine-tuned away in hours. RLHF reward models cannot be reverse-engineered from text. The student is, by construction, a less safe version of the teacher.
How big is the problem? Anthropic put a number on it
The most concrete data point in the entire story comes from Anthropic. According to implicator.ai’s reporting, Anthropic documented 16 million exchanges from DeepSeek, Moonshot AI, and MiniMax that the company classified as adversarial distillation traffic. Sixteen million. From three labs. To one model provider.
That number does two things at once. It makes the abstract problem concrete, and it implicitly answers the question “why now?” — because once you can quantify an attack at that scale, internal security teams stop being able to wave it off as background noise. It becomes a board-level item.
| Lab named publicly | Country | What Anthropic / OpenAI alleged |
|---|---|---|
| DeepSeek | China | R1 reasoning model allegedly trained on extracted OpenAI outputs; “free-riding” per OpenAI memo to House Select Committee |
| Moonshot AI | China | Named by Anthropic in February 2026 as extracting Claude capabilities |
| MiniMax | China | Named alongside Moonshot in the same Anthropic disclosure |
OpenAI’s contribution to the public record is its memo to the House Select Committee on China, in which the company accused DeepSeek of trying to “free-ride on the capabilities developed by OpenAI and other US frontier labs”. OpenAI also told lawmakers earlier in 2026 that DeepSeek had continued using increasingly sophisticated extraction tactics despite enhanced platform defences.
Why did DeepSeek’s R1 trigger all of this?
To understand why three rival labs are now sharing data, you have to rewind to January 2025. DeepSeek released R1 — a reasoning model that, on paper, matched O1-class performance at a fraction of the training and inference cost. The market reaction was immediate and brutal: hundreds of billions of dollars wiped off US AI-exposed equities in a single trading session.
Microsoft and OpenAI launched a joint investigation into whether R1’s training data contained large volumes of OpenAI model outputs. They never published a definitive forensic conclusion — distillation is hard to prove from weights alone — but the investigation itself crystallised something that had been only loosely articulated before: open-weight Chinese models were not just competing on price; they may have been competing on capability borrowed without authorisation.
US frontier labs have spent hundreds of billions on data centres, training runs and safety research, then priced their APIs to recover that cost. Open-weight Chinese models are reportedly available at roughly 14× lower cost per token. If a meaningful share of that cost gap comes from skipping the original training work via distillation, then every additional defensive measure US labs deploy is — economically speaking — a tax they pay that their competitors do not.
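Back-of-envelope arithmetic makes that gap concrete. The dollar figures below are invented placeholders; only the roughly 14× ratio comes from the reporting:

```python
# Illustrative arithmetic only: the dollar figures are invented placeholders;
# the ~14x price ratio is the figure from the reporting.
frontier_price_per_mtok = 14.00              # hypothetical frontier API, $ per 1M tokens
open_weight_price_per_mtok = frontier_price_per_mtok / 14
monthly_tokens = 5_000_000_000               # a hypothetical high-volume customer

frontier_bill = (monthly_tokens / 1_000_000) * frontier_price_per_mtok
open_weight_bill = (monthly_tokens / 1_000_000) * open_weight_price_per_mtok
# frontier_bill == 70_000.0, open_weight_bill == 5_000.0: same workload, 14x the spend
```

At that spread, the price difference is not a discount a customer shops around for; it is a structural subsidy, and the question is who paid for it.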
What can the Frontier Model Forum actually do?
This is where the story gets honest about its own limits. The Forum is an industry nonprofit, not a regulator. It cannot subpoena, fine, or block anyone. What it can do, now that the three biggest US labs are willing to use it as a channel:
Share extraction signatures. If Anthropic detects a particular pattern — query distribution, prompt structure, IP fingerprint, account-creation behaviour — it can pass that signature to OpenAI and Google. Within hours, the same behaviour can be flagged across all three platforms instead of being rediscovered independently weeks apart.
Coordinate account bans. An attacker who gets banned on Claude can no longer simply migrate to GPT and start over. Cross-lab blocklists raise the operational cost of running an extraction operation.
Establish baselines for “normal” API traffic. One of the hardest parts of detecting adversarial distillation is distinguishing it from a legitimate enterprise customer with high query volume. Pooled data lets the labs build better classifiers. (This is the same kind of pattern recognition we covered in our piece on AI for risk management.)
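In code, the kind of signature that could travel between labs might look like the sketch below. The Forum's real signature format is confidential, so every field name and value here is an invented illustration:

```python
# Hypothetical shape of a shared extraction signature. The Forum's real
# signature format is confidential; every field and value here is invented.
from dataclasses import dataclass

@dataclass
class ExtractionSignature:
    source_lab: str                          # which lab observed the pattern first
    ip_prefixes: frozenset = frozenset()
    prompt_markers: frozenset = frozenset()  # e.g. reasoning-elicitation phrases

def matches(signature, account):
    """Flag an account that hits a shared IP prefix or a shared prompt marker."""
    ip_hit = any(account["ip"].startswith(p) for p in signature.ip_prefixes)
    marker_hit = any(
        marker in prompt.lower()
        for prompt in account["prompts"]
        for marker in signature.prompt_markers
    )
    return ip_hit or marker_hit

sig = ExtractionSignature(
    source_lab="lab_a",
    ip_prefixes=frozenset({"203.0.113."}),
    prompt_markers=frozenset({"show your work"}),
)
suspect = {"ip": "203.0.113.7", "prompts": ["Show your work: derive the gradient."]}
```

The point of the shared format is speed: a record like this can be checked against live traffic at every participating lab within hours of the first detection.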
What it cannot do is stop a determined state-backed actor with unlimited fake accounts and rotating residential proxies. The labs know this. The Trump administration has signalled it wants to formalise the effort into something with more teeth: the AI Action Plan includes a proposal for an Information Sharing and Analysis Center (ISAC) specifically for adversarial distillation, modelled on the financial-sector ISAC that has existed since 1999.
Detecting extraction traffic — what it actually looks like in code
This is the part you do not see in mainstream coverage. Here is a stripped-down, illustrative version of the kind of heuristic that goes into a first-pass extraction-detection pipeline. It is not what the Frontier Model Forum runs — those signatures are confidential — but it captures the shape of the problem.
```python
# Toy classifier for "looks like distillation traffic".
# Real systems use ML on dozens of features; this is the intuition.
# The helper implementations below are deliberately simplistic stand-ins.
import math
from collections import Counter

def compute_topic_entropy(prompts):
    """Normalised entropy over crude hash-bucket 'topics' (stand-in for real clustering)."""
    buckets = Counter(hash(p.split()[0].lower()) % 50 for p in prompts if p.split())
    total = sum(buckets.values())
    entropy = -sum((c / total) * math.log(c / total) for c in buckets.values())
    return entropy / math.log(50)  # 1.0 = prompts spread evenly across topics

def uniformity(bucket_counts):
    """1.0 when counts are spread evenly across buckets, near 0.0 when concentrated."""
    total = sum(bucket_counts.values())
    entropy = -sum((c / total) * math.log(c / total) for c in bucket_counts.values())
    return entropy / math.log(len(bucket_counts)) if len(bucket_counts) > 1 else 0.0

def regularity_score(timestamps):
    """1.0 for metronomic inter-arrival times, lower for bursty human traffic."""
    ts = sorted(timestamps)
    gaps = [b - a for a, b in zip(ts, ts[1:])]
    mean = sum(gaps) / len(gaps)
    spread = math.sqrt(sum((g - mean) ** 2 for g in gaps) / len(gaps)) / mean if mean else 0.0
    return 1.0 / (1.0 + spread)  # spread = coefficient of variation

def is_reasoning_elicitation(prompt):
    p = prompt.lower()
    return any(s in p for s in ("think step by step", "explain your reasoning", "show your work"))

def weighted_combine(*signals):
    return sum(signals) / len(signals)  # equal weights in this toy version

def extraction_score(account_traffic):
    """
    account_traffic: list of dicts with prompt, response_tokens, timestamp, ip
    Returns a 0-1 score where higher = more likely adversarial distillation.
    """
    prompts = [t["prompt"] for t in account_traffic]
    n = len(prompts)
    if n < 1000:
        return 0.0  # too small to matter
    # Signal 1: prompt diversity is suspiciously high but topic clustering is low.
    # Real users have themes; distillation farms cover the response space evenly.
    topic_entropy = compute_topic_entropy(prompts)
    # Signal 2: response length distribution is uniform across the API max.
    # Real users hit a few common lengths; farms try to fill every length bucket.
    length_uniformity = uniformity(
        Counter(t["response_tokens"] // 100 for t in account_traffic)
    )
    # Signal 3: temporal pattern is too regular for human use.
    # Humans have circadian gaps; bots don't.
    inter_arrival = regularity_score([t["timestamp"] for t in account_traffic])
    # Signal 4: prompts include reasoning-elicitation patterns at high frequency
    # ("think step by step", "explain your reasoning", "show your work").
    reasoning_elicitation_rate = sum(
        1 for p in prompts if is_reasoning_elicitation(p)
    ) / n
    return weighted_combine(
        topic_entropy, length_uniformity,
        inter_arrival, reasoning_elicitation_rate
    )
```
The catch: every one of those signals has a perfectly legitimate counterexample. A research lab fine-tuning a domain model legitimately wants high topic coverage. A SaaS product with a global user base legitimately runs 24/7. The hard part of building these classifiers is not detecting the obvious bots — it is not banning your largest enterprise customer by mistake. That is exactly the kind of false-positive cost that makes shared intelligence valuable: three labs comparing notes can rule out a flagged account faster than one lab acting alone.
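One way pooled intelligence lowers that false-positive risk can be sketched with invented thresholds: act only when more than one lab independently flags the same account.

```python
# Sketch of multi-lab corroboration (thresholds and lab names are invented):
# escalate only when at least two labs independently score an account as suspicious.
def should_escalate(scores_by_lab, threshold=0.8, min_labs=2):
    """scores_by_lab: mapping of lab name -> 0-1 extraction score for one account."""
    flagged = [lab for lab, score in scores_by_lab.items() if score >= threshold]
    return len(flagged) >= min_labs

lone_outlier = should_escalate({"lab_a": 0.93, "lab_b": 0.42, "lab_c": 0.51})   # False
corroborated = should_escalate({"lab_a": 0.93, "lab_b": 0.88, "lab_c": 0.51})   # True
```

A single lab's high score might just be an unusual enterprise customer; the same account tripping independent detectors at two providers is much harder to explain innocently.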
What this means for the broader AI ecosystem
For open-weight model releases, the political climate just got harder. Every major Chinese lab now operates under a presumption of suspicion that it does not have a clean answer to. Even labs that did not extract anything will face questions about their training data provenance.
For US enterprise API customers, expect more friction. More aggressive rate limits, more identity verification, more behavioural challenges on accounts that look unusual. The labs cannot detect adversarial distillation without being more invasive about who is calling them and how.
For the safety conversation, this is the most interesting angle. Anthropic has been the loudest voice arguing that distilled models strip safety guardrails — and Anthropic is also the lab that took the most aggressive public stance, blocking Chinese-controlled companies from Claude entirely last year. (We covered Anthropic’s parallel move to block third-party tool wrappers in our piece on Anthropic blocking third-party tools.) The argument is consistent: alignment is not a property you can copy from outputs alone, and the world ends up less safe when capability propagates faster than the techniques that constrained it.
For the EU AI Act, the timing is awkward. The Act’s general-purpose AI obligations came into force in 2025, and they put most of the compliance burden on the original model provider — not on downstream actors who extract capabilities through API queries. There is a real regulatory gap here. (For the broader compliance picture, see our EU AI Act explainer.)
What does this mean in practice?
The honest take, unflattering to everyone involved: this collaboration is necessary, late, and probably not enough.
It is necessary because the alternative — three labs running parallel, partial defences — was visibly failing. Sixteen million flagged exchanges from one provider is not a leak, it is a flood.
It is late because the Frontier Model Forum has existed since 2023 and was reportedly dormant as an operational channel until the DeepSeek shock made the cost of inaction obvious. The labs needed a market-cap-adjusting event to override their default mode of competing instead of cooperating.
And it is probably not enough because adversarial distillation, as a technique, has economic gravity on its side. As long as a query-and-train approach is dramatically cheaper than original pretraining, someone, somewhere, will keep doing it — through whatever combination of fake accounts, residential proxies, intermediary jurisdictions and open-weight republishing the defenders have not yet learned to filter. The Frontier Model Forum can raise the cost of that game. It cannot end it.
What it can do — and this may be the underrated structural shift — is establish that frontier AI security is a shared infrastructure problem, not a per-company moat. That framing matters. It is the same framing that took a decade to settle in financial cybersecurity, and the same framing that the proposed AI ISAC would formalise. If the next twelve months turn this ad-hoc cooperation into durable infrastructure, the story stops being about three labs ganging up on China and starts being about an industry growing the immune system it should have built three years ago.
FAQ
What is adversarial distillation in AI?
Adversarial distillation is the practice of firing large volumes of automated queries at someone else’s frontier AI model, harvesting the outputs, and using those input-output pairs to train a competing model. It bypasses the cost of original pretraining and inherits capabilities — but not safety alignment — from the teacher model. It typically violates the API provider’s terms of service.
What is the Frontier Model Forum?
The Frontier Model Forum is an industry nonprofit founded in July 2023 by OpenAI, Anthropic, Google and Microsoft. Its stated mission is to advance AI safety research and best practices for frontier models. Until April 2026 it was widely seen as a policy-and-PR vehicle; the Bloomberg report confirmed it is now also being used as an operational threat-intelligence sharing channel.
How many extraction attempts has Anthropic actually documented?
According to reporting on the Bloomberg story, Anthropic has documented approximately 16 million exchanges that it classifies as adversarial distillation traffic, traced to three Chinese AI labs: DeepSeek, Moonshot AI and MiniMax. This is the most concrete public number tied to the scale of the problem.
Is normal model distillation illegal?
No. Distillation is a legitimate and widely used machine learning technique. Every major lab — including OpenAI, Google and Anthropic — uses it internally to compress large models into smaller, cheaper ones. The legal and ethical issue arises only when distillation is performed on a competitor’s hosted model in violation of that provider’s terms of service, which prohibit using outputs to train competing models.
Why is adversarial distillation a national security concern?
Two reasons. First, the resulting student models inherit the teacher’s capabilities but typically strip the safety alignment, producing systems that are less constrained against misuse. Second, US officials have estimated that unauthorised model copying costs US AI companies billions of dollars annually in lost revenue, weakening the economic foundation that funds frontier research and safety work.
Can the Frontier Model Forum actually stop Chinese labs from extracting models?
No, it cannot stop a determined state-backed actor with unlimited fake accounts and rotating IP infrastructure. What the cooperation can do is share attack signatures so detection happens within hours instead of weeks, coordinate cross-lab account bans, and raise the operational cost of running an extraction pipeline. The Trump administration has proposed formalising the effort into a dedicated AI Information Sharing and Analysis Center with more enforcement teeth.
What triggered the three labs to start cooperating now?
The proximate trigger was the accumulation of evidence through 2025 and early 2026 — most visibly DeepSeek’s R1 release in January 2025, which prompted Microsoft and OpenAI to investigate whether R1 was trained on extracted OpenAI outputs. The deeper trigger is that defensive efforts at individual labs were no longer keeping up with extraction at scale, making cooperation cheaper than parallel partial defences.
Bibliography & sources
- Metz, R., Bass, D., & Love, J. (2026, April 6). OpenAI, Anthropic, Google Unite to Combat Model Copying in China. Bloomberg. https://www.bloomberg.com/news/articles/2026-04-06/openai-anthropic-google-unite-to-combat-model-copying-in-china
- Grewal, H. (2026, April 6). OpenAI, Anthropic, Google Share Attack Data on Distillation. Implicator.ai. https://www.implicator.ai/openai-anthropic-google-share-attack-data-to-counter-chinese-ai-distillation/
- Business Today. (2026, April 7). OpenAI, Anthropic, Google team up to stop Chinese AI distillation threat. https://www.businesstoday.in/technology/story/openai-anthropic-google-team-up-to-stop-chinese-ai-distillation-threat-524367-2026-04-07
- The Japan Times. (2026, April 7). OpenAI, Anthropic and Google cooperate to fend off Chinese bids to clone models. https://www.japantimes.co.jp/business/2026/04/07/tech/openai-anthropic-google-china-copy/
- Frontier Model Forum. (2023). Founding Announcement. https://www.frontiermodelforum.org/
- Hinton, G., Vinyals, O., & Dean, J. (2015). Distilling the Knowledge in a Neural Network. arXiv:1503.02531. https://arxiv.org/abs/1503.02531
- European Parliament & Council. (2024). Regulation (EU) 2024/1689 — Artificial Intelligence Act. EUR-Lex. https://eur-lex.europa.eu/eli/reg/2024/1689/oj