Blog · Jun 26, 2026

Anthropic Says Alibaba Distilled Claude With 25,000 Fake Accounts. Here's What That Actually Means.

Anthropic says Alibaba used 25,000 fake accounts to distill Claude into Qwen. Here's what that claim actually means for model moats and your data.

By Ruth Okafor · 6 min read · Updated Jun 26, 2026

#distillation #ip #geopolitics #frontier-labs #security

The short version

Anthropic claims Alibaba ran roughly 25,000 fake accounts and pumped 28.8 million exchanges through Claude between April and June 2026 to copy Claude’s behavior into the Qwen family. If true, it means an expensive frontier model can be partly cloned by anyone with a credit card and patience. The moat was never the weights. It’s the outputs, and outputs leak.

What is Anthropic actually accusing Alibaba of?

Anthropic says Alibaba didn’t hack anything. There’s no break-in here, no stolen weights, no insider walking out with a hard drive. The accusation is sneakier than that, and honestly more interesting.

The claim is that Alibaba set up around 25,000 fake accounts and sent about 28.8 million exchanges through Claude over a three-month window, then used Claude’s answers as training material to teach its own Qwen models to think and respond more like Claude. That technique has a name: distillation. You take a strong “teacher” model, ask it millions of questions, collect its responses, and train a “student” model to imitate them. The student ends up cheaper to run and surprisingly capable, because it learned from a model that already cost hundreds of millions of dollars to build.
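To make the teacher-and-student idea concrete, here is a toy sketch in Python. It is not how anyone distills a language model: the "teacher" is a hypothetical one-line function standing in for an expensive model, and the "student" is a straight line fit by ordinary least squares. The names teacher, collect_pairs, and fit_student are invented for illustration. The point it shows is narrow: the student only ever sees questions and answers, never the teacher's internals, and still ends up reproducing its behavior.

```python
import random


def teacher(x: float) -> float:
    """Stand-in for the expensive model. The student never reads these constants."""
    return 3.0 * x + 2.0


def collect_pairs(n: int, seed: int = 0) -> list[tuple[float, float]]:
    """Query the teacher n times and record (question, answer) pairs."""
    rng = random.Random(seed)
    queries = [rng.uniform(-10.0, 10.0) for _ in range(n)]
    return [(x, teacher(x)) for x in queries]


def fit_student(pairs: list[tuple[float, float]]) -> tuple[float, float]:
    """Fit y = a*x + b to the collected pairs with ordinary least squares."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var = sum((x - mean_x) ** 2 for x, _ in pairs)
    a = cov / var
    b = mean_y - a * mean_x
    return a, b


# Harvest outputs, then train the student on nothing but those outputs.
pairs = collect_pairs(1000)
a, b = fit_student(pairs)


def student(x: float) -> float:
    """The imitation model, built only from the teacher's answers."""
    return a * x + b


print(f"student learned a={a:.3f}, b={b:.3f}")
print(f"teacher(5.0)={teacher(5.0):.3f}, student(5.0)={student(5.0):.3f}")
```

A real teacher is vastly more complex than a line, so a real student recovers only part of its behavior, and only in the areas the queries covered. The structure is the same, though: query, collect, fit.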

If you want the specifics, the original report lays out the numbers and the timeline, including a leaked letter to Washington where Anthropic also pushes for tighter semiconductor export controls. Alibaba, for its part, has said nothing.

I want to be careful here. These are allegations. Anthropic has not, as far as I’ve seen, published the kind of forensic proof that would settle it. And “this model sounds like our model” is a genuinely hard thing to prove, because all the frontier models are trained on overlapping internet text and increasingly sound alike anyway. So treat the 25,000 number as a claim, not a verdict.

But the claim is specific enough to be worth thinking about seriously.

Why does the 25,000-fake-account number change anything?

Because it tells you how the copying happened. And the how is the whole story.

For a long time I thought about model moats the way most people do. The big labs have the compute, the data, the talent, the secret training recipes. Smaller players can’t catch up because they can’t afford the GPUs or the research. The moat is the model itself, sitting safely behind an API.

The distillation angle breaks that mental model. You don’t need the weights. You don’t need the training recipe. You just need access to the outputs, which the lab is happily selling you through an API at a few dollars per million tokens. Every answer Claude gives is a tiny lesson. Collect enough of them and you’ve reverse-engineered a meaningful chunk of the teacher’s behavior without ever seeing inside it.

The 25,000 accounts matter because they show how the scale problem gets solved. A single account hammering an API with millions of near-identical queries gets flagged and rate-limited fast. Spread the same load across tens of thousands of accounts and each one looks like an ordinary user. That’s the part that reframes things for me. The defense isn’t “hide the model.” The model was never exposed. The defense is “detect coordinated harvesting,” which is a fraud-and-abuse problem, not a cryptography problem.
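A quick back-of-the-envelope calculation shows why each account would look ordinary. The two totals are Anthropic's alleged figures as reported above. The window length is an assumption: the sketch reads "between April and June 2026" as April 1 through June 30 inclusive, and it assumes the load was spread evenly across accounts and days.

```python
from datetime import date

accounts = 25_000          # alleged number of fake accounts
exchanges = 28_800_000     # alleged number of exchanges

# Assumption: the window runs April 1 through June 30, 2026, inclusive.
days = (date(2026, 6, 30) - date(2026, 4, 1)).days + 1

per_account = exchanges / accounts
per_account_per_day = per_account / days

print(f"window: {days} days")
print(f"exchanges per account: {per_account:.0f}")
print(f"exchanges per account per day: {per_account_per_day:.1f}")
```

Under those assumptions that is roughly 1,150 exchanges per account in total, or about a dozen a day. That is well within what a single busy developer might send, which is why per-account rate limits alone would not catch it.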

And fraud detection is hard. Ask anyone who’s run a marketplace or a free trial. Telling 25,000 bot accounts apart from 25,000 enthusiastic developers is a cat-and-mouse game you never fully win.
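To give a feel for what "detect coordinated harvesting" might involve, here is a deliberately naive sketch. Nothing here reflects how Anthropic or any other provider actually does it. The log format, the template_of normalization, and the min_accounts threshold are all hypothetical. The idea is to look across accounts rather than within them: collapse each prompt to a rough template and flag templates that many distinct accounts share.

```python
import re
from collections import defaultdict


def template_of(prompt: str) -> str:
    """Collapse a prompt to a rough template: lowercase, numbers masked, whitespace squeezed."""
    text = prompt.lower()
    text = re.sub(r"\d+", "<num>", text)
    return re.sub(r"\s+", " ", text).strip()


def flag_shared_templates(log, min_accounts=3):
    """Return templates used by at least min_accounts distinct accounts (hypothetical threshold)."""
    accounts_by_template = defaultdict(set)
    for account_id, prompt in log:
        accounts_by_template[template_of(prompt)].add(account_id)
    return {
        template: accts
        for template, accts in accounts_by_template.items()
        if len(accts) >= min_accounts
    }


# Made-up request log of (account_id, prompt) pairs.
log = [
    ("acct-1", "Solve problem 17 step by step."),
    ("acct-2", "Solve problem 42 step by step."),
    ("acct-3", "solve problem 108  step by step."),
    ("acct-4", "Why is my Flask app returning a 500?"),
]

flagged = flag_shared_templates(log)
for template, accts in flagged.items():
    print(f"{len(accts)} accounts share template: {template!r}")
```

The weakness is easy to see. A harvester who paraphrases each prompt slips past the template match, while thousands of legitimate users copying the same prompt from a popular tutorial get flagged. That trade-off between misses and false positives is the cat-and-mouse game.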

Is distillation actually illegal, or just rude?

This is where it gets murky, and where I think a lot of the heat in this story is really coming from.

Distillation as a technique is completely normal. Labs distill their own big models into smaller ones all the time. Researchers publish papers on it. It’s a standard part of the toolkit. There’s nothing inherently shady about training a small model to imitate a big one.

What Anthropic is alleging is something narrower: that Alibaba used Claude this way in direct violation of Anthropic’s terms of service, which forbid using Claude’s outputs to build a competing model, and did it through deception at scale. So the question isn’t “is distillation bad,” it’s “did someone break a contract and lie about who they were to do it.”

That’s a terms-of-service fight dressed up in national-security clothing. And the export-control letter to Washington tells you Anthropic knows the second framing lands harder. “A competitor violated our API terms” is a lawsuit. “A Chinese company siphoned our model and we need stronger chip controls to stop it” is a policy campaign. Same facts, very different volume.

I’m not cynical about Anthropic’s safety concerns. I think they’re real. But I’d be lying if I said the geopolitical packaging didn’t also serve Anthropic’s competitive and regulatory interests. Both things can be true.

What does this mean for everyone who isn’t a frontier lab?

Here’s why I think this matters even if you’ll never train a model.

First, it tells you something about pricing and access. If labs decide that open API access is how their models get cloned, they’ll tighten things up. Expect more aggressive rate limits, more identity verification, more friction for legitimate developers, and possibly higher prices on the cheapest tiers. The bot-harvesting problem gets solved on the backs of normal users, the way it usually does.

Second, it explains why the gap between the best closed models and the best open ones keeps shrinking faster than the resource gap would suggest. If Qwen and other strong open-weight families are partly learning from frontier outputs, then “open models are only six months behind” starts to make sense. They’re not independently reinventing everything. They’re standing on outputs the leaders generated.

Third, and this is the practical one: it’s a reminder that anything you put through an API is data the other side can keep. If a lab can harvest a competitor’s outputs at scale, then everything you type into a chatbot is just as harvestable by whoever runs it. None of this is private in the way people assume. I keep coming back to that. The same mechanism that lets one lab copy another is the mechanism that lets any provider learn from you.

So what should you actually do about it?

If you’re picking models, don’t overweight brand prestige. The behavioral gap between the top closed model and a strong open one is narrower than the marketing implies, partly because of exactly this kind of cross-pollination. For a lot of real work, a cheaper or open model gets you 90 percent of the way.

If you’re building on an API, assume your provider’s access rules could tighten suddenly. Don’t architect something that breaks if rate limits drop or verification gets stricter. Build in fallback to a second model.
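A fallback can be as simple as trying providers in order. The sketch below is one minimal way to structure it and is not tied to any real SDK: ProviderUnavailable is a hypothetical exception your own client wrappers would raise on a rate limit or blocked access, and primary and secondary are stub functions standing in for real API calls.

```python
from typing import Callable, Sequence


class ProviderUnavailable(Exception):
    """Hypothetical error a client wrapper raises on rate limits or blocked access."""


def complete_with_fallback(prompt: str, providers: Sequence[Callable[[str], str]]) -> str:
    """Try each provider in order and return the first successful answer."""
    last_error = None
    for call in providers:
        try:
            return call(prompt)
        except ProviderUnavailable as err:
            last_error = err  # remember why this one failed, then try the next
    raise RuntimeError("all providers failed") from last_error


def primary(prompt: str) -> str:
    """Stub for the preferred model; here it always hits a rate limit."""
    raise ProviderUnavailable("rate limit exceeded")


def secondary(prompt: str) -> str:
    """Stub for the backup model."""
    return f"[secondary] {prompt}"


answer = complete_with_fallback("Summarize this ticket.", [primary, secondary])
print(answer)
```

The design choice that matters is keeping the rest of your code unaware of which model answered. If your prompts or parsing depend on one provider's quirks, the fallback will exist on paper and fail in practice, so it is worth testing the secondary path before you need it.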

If you care about privacy, treat every prompt as logged and potentially used. Don’t paste secrets, client data, or anything you’d hate to see in a training set.

And if you just want to follow the story: watch whether Anthropic files an actual lawsuit or keeps this in the press-and-policy lane. A real complaint means they think they can prove it. A continued PR campaign means the goal was always the chip-control conversation in Washington.

My honest read? The technical claim is plausible and the scale is the genuinely new part. Whether it holds up legally, I have no idea. But the lesson lands either way: the moat was never the model. It was always the outputs, and outputs walk out the door one query at a time.

FAQ

What is model distillation in plain terms?

It’s training a smaller, cheaper model to imitate a bigger, smarter one by feeding it the big model’s answers. The student copies the teacher’s behavior without ever seeing how the teacher was built.

Did Alibaba admit to any of this?

No. As of now Alibaba has not commented on the accusation. Everything public comes from Anthropic’s side, so treat it as an unproven allegation.

Why does Anthropic want stricter semiconductor export controls?

The argument is that limiting advanced chips to Chinese firms slows their ability to train and run competitive models. It also conveniently aligns with Anthropic’s own competitive position, which is worth keeping in mind.

Could this happen to any AI company, not just Anthropic?

Yes. Any model exposed through a public API can have its outputs harvested at scale. The defense is fraud detection, not encryption, and that’s a genuinely hard problem.

Does this mean open models are basically copies?

Not entirely. Open models do plenty of original work. But if some of them learn from frontier outputs, it helps explain why they stay so close behind despite far smaller budgets.
