What’s a fair use?

The media has reported that Meta (the Facebook, Instagram and WhatsApp company) has won a legal case on the use of copyrighted materials in training its AI models – a ruling, supposedly, that the use of those materials was a ‘fair use’. As often with the law, it’s a bit more complicated than that.

The case in question was Kadrey v Meta, and summary judgement was released last week (the judge, Vince Chhabria, deciding that the case did not need to go to a jury trial because the plaintiffs had not made a convincing case, and so granting summary judgement in Meta’s favour). The legal question at issue was whether the acknowledged use of copyrighted works in training AI amounts to a ‘fair use’. As well as considering fairness, the case opens a wider window on AI.

Before delving, I will note that I’ll continue to use the term AI, because it’s used in the case and the term is in general use for these emerging new technologies. But as both recent books The AI Con and AI Snake Oil (the two latest additions to my bookshelf) start off by making clear, there is no such single thing as AI. It is a catch-all term for a range of technologies – some of only very dubious effectiveness – and is really just a brand that is being deployed to raise (enormous amounts of) funding (two headlines from the Financial Times this weekend cast light on the scale of this financing: Meta seeks $29 billion from private credit giants to fund AI data centres, and Nvidia insiders cash out $1 trillion worth of shares). The best known, and most used, of these new AI technologies are called large language models (LLMs), accurately described as stochastic parrots: models that simply put one word after another according to statistical patterns derived from their training.

Many legal systems favour the term fairness, and ‘fair use’ is a well-established concept in US law. The country’s Copyright Act (in 17 USC §107) frames fair use as usage “for purposes such as criticism, comment, news reporting, teaching (including multiple copies for classroom use), scholarship, or research”. It sets out four factors that should be considered in determining whether a given use is in fact fair:

1. the purpose and character of the use, including whether such use is of a commercial nature or is for non-profit educational purposes;
2. the nature of the copyrighted work;
3. the amount and substantiality of the portion used in relation to the copyrighted work as a whole; and
4. the effect of the use upon the potential market for or value of the copyrighted work.

Deciding what uses are fair is a matter both of law and of the specific facts, meaning that there are multiple cases that have considered these factors. The list of four factors is not exhaustive, but assists in reaching the overall conclusion. The fourth factor, whether the use risks substituting for the copyrighted materials in the marketplace, is generally seen as the most important. Courts need to apply judgment and consideration in deciding on fair use; as ever, assessing fairness requires thought.

As judge Chhabria explains in his summary judgement:

“What copyright law cares about, above all else, is preserving the incentive for human beings to create artistic and scientific works. Therefore, it is generally illegal to copy protected works without permission. And the doctrine of “fair use,” which provides a defense to certain claims of copyright infringement, typically doesn’t apply to copying that will significantly diminish the ability of copyright holders to make money from their works.”

He is as rude as a judge ever gets about a fellow judge who reached a recent decision on a fair use case in relation to Anthropic, another AI firm (Order on Fair Use at 28, Bartz v Anthropic PBC, No. 24-cv-5417 (N.D. Cal. June 23, 2025), Dkt. No. 231). That judge was convinced by the argument that training AI was no different from – and had no more impact on the market for copyright products – than training schoolchildren to write. Chhabria says: “when it comes to market effects, using books to teach children to write is not remotely like using books to create a product that a single individual could employ to generate countless competing works with a miniscule fraction of the time and creativity it would otherwise take. This inapt analogy is not a basis for blowing off the most important factor in the fair use analysis.”

And, surprisingly given his overall ruling, Chhabria is very clear that AI companies are breaching copyright law and damaging the commercial market for copyrighted works. He seems very sure that AI companies fail on the fourth factor in assessing fair use: “by training generative AI models with copyrighted works, companies are creating something that often will dramatically undermine the market for those works, and thus dramatically undermine the incentive for human beings to create things the old-fashioned way”.

Chhabria also notes a simple flaw in one of the AI companies’ arguments: that applying copyright law will stifle the development of this technology. He notes that a finding that this use of copyrighted materials isn’t fair use would not bar that use; it would just require AI companies to reach a commercial agreement with copyright holders to compensate them for the – unfair – use of their materials. As he points out, these businesses project that they will make billions, indeed trillions, of dollars from AI services, so should readily be able to afford such licensing. Indeed, the court saw evidence that Meta initially sought to license book materials for training purposes, and considered spending up to $100 million on doing so. This never happened because book publishers do not hold rights to this use of book materials – like other novel uses, the rights rest with the authors – so there is no central point or points for such a negotiation. The fact that AI companies seek direct commercial benefit from their use of copyrighted materials makes it much harder for them to carry their burden of demonstrating fair use.

Despite Chhabria’s conclusions, which seem strongly to favour the copyright-holders who brought the case, he nonetheless found against them. The copyright-holders are 13 authors who argued that their works had been used in training Meta’s Llama LLM models. In essence they failed in their claim because their lawyers focused their efforts and arguments in the wrong place. They made their arguments predominantly under the first three of the four factors in §107 of the Copyright Act, and failed on those. The fourth factor – the effect of the use on the potential market for the copyrighted work – is generally seen as the most important, but that is not an argument they made strongly. They simply did not argue (or were at best “half-hearted” in arguing) that their works had been used as the basis for a tool which might flood the market with similar works, undermining the value of their copyright, nor did they provide evidence to support such an argument. This was “the potentially winning argument” according to Chhabria; the weaker points actually deployed before the court did not succeed.

Chhabria was clear:

“this ruling does not stand for the proposition that Meta’s use of copyrighted materials to train its language models is lawful. It stands only for the proposition that these plaintiffs made the wrong arguments and failed to develop a record in support of the right one.”

It does seem ludicrous that some of the most valuable companies in the world should argue that it is fair for them to take stolen copies of books subject to copyright protection (the training materials were taken from so-called ‘shadow libraries’ of illegally scanned books) and make what they predict will be huge commercial profits as a direct result, while providing the copyright holders with no compensation. The fact that Meta explored licensing but found it too difficult and too slow supports the case that licensing would be the right thing to do.

The Kadrey case reports one other specific element of the training of Llama models: that they are taught not to reproduce more than 50 consecutive words from any one source (even if given highly directive prompts to do so). The fact that this is a deliberate part of the training shows just how prone these technologies are to leaning on what they have read. In a recent Financial Times interview, Professor Emily Bender – coiner of the term stochastic parrots and co-author both of the academic article that brought the term to prominence and of The AI Con – is quoted as calling LLMs “plagiarism machines”.

I have to admit, as may be apparent from my recent reading habits, that I am an AI sceptic. I suspect that we will look back on this period with puzzlement, and wonder why we threw colossal amounts of computing power – and colossal levels of energy in our carbon-constrained world – at jobs that human brains are better at. AI is neither artificial nor intelligent: it isn’t artificial because it depends on human creativity in its training, and on significant, horrible labour (typically cheap, precarious labour in emerging economies) to cleanse the models of the filth they produce because they have been trained on, among other things, the global sewer that is the Internet. It isn’t intelligent because it is just reproducing others’ language patterns based on statistics, “haphazardly stitching together sequences of linguistic forms it has observed in its vast training data…without any reference to meaning”, as the stochastic parrots paper put it. As Bender told the FT, we are “imagining a mind behind the text…the understanding is all on our end”. There will no doubt be jobs that AI technologies are useful for, but like any human tool they are tailored to their task, not a general-purpose vehicle for all activity. Currently we have a hammer and are making the mistake of seeing everything as a nail.

As a result, I suspect that much of the billions being deployed in AI currently will turn out to have been wasted. I should admit also that my view may be coloured by the fact that I entered the investment world exactly at the time of the dotcom bubble. While I avoided losing money in the dotcom bust, I also missed out on investment gains as that bubble inflated.

But this is a blog on fairness, not AI cynicism. The Kadrey decision did not conclude that Meta’s actions were fair, only that the copyright-holders had failed to deploy the arguments that might have shown how unfair the use of their materials was. This will clearly not be the last such case, and while the AI businesses will continue to deploy some of their investors’ millions into their defence, judge Chhabria’s legal conclusions suggest they will have a challenging time winning cases argued on the right basis.

Rather than finding that Meta’s use was fair, the Kadrey decision is highly suggestive that AI is not fair in its use and abuse of copyrighted materials. That feels right: fairness should always tend to rebalance power away from those with billions towards those of whom they take uncompensated advantage.

See also: Learning from the stochastic parrots
Amazon resurrects the worst of the industrial revolution
A just AI transition?

I am happy to confirm as ever that the Sense of Fairness blog is a purely personal endeavour.

Kadrey v Meta, Case No. 23-cv-03417-VC, Summary Judgement 25 June 2025 (Docket Nos 482, 501)

The AI Con: How to Fight Big Tech’s Hype and Create the Future We Want, Emily Bender, Alex Hanna, Bodley Head, 2025

AI Snake Oil: What Artificial Intelligence Can Do, What it Can’t, and How to Tell the Difference, Arvind Narayanan, Sayash Kapoor, Princeton University Press, 2024

Meta seeks $29 billion from private credit giants to fund AI data centres, Eric Platt, Oliver Barnes, Hannah Murphy, Financial Times, 27 June 2025

Nvidia insiders cash out $1 trillion worth of shares, Michael Acton, Patrick Temple-West, Financial Times, 29 June 2025

The Copyright Act, 17 USC

AI sceptic Emily Bender: ‘The emperor has no clothes’, George Hammond, Financial Times, 20 June 2025

On the Dangers of Stochastic Parrots: Can Language Models Be Too Big?, Emily Bender, Timnit Gebru, Angelina McMillan-Major, Shmargaret Shmitchell, Proceedings of FAccT 2021
