Sarah Silverman is the author of The Bedwetter, a comedy memoir. Richard Kadrey wrote Sandman Slim, a fantasy novel series. Christopher Golden wrote Ararat, a supernatural thriller.
These authors might not seem to have much in common with an academic author who writes in history, physics, or chemistry. Or a journalist. Or a poet. Or, for that matter, me, writing this blog post. And yet, these authors may end up representing us all in court.
Many of the recent AI copyright lawsuits are class action lawsuits. This means that they are brought by a small number of plaintiffs who (subject to judicial approval) are granted the right to represent a much larger class. In many of the AI copyright lawsuits, the proposed classes are extraordinarily broad, including many creators who might be surprised that they are being represented. If you live in the US and wrote something that was published online, there is a good chance that you are included in several of these classes.
A very brief background on class action lawsuits
Class actions can be an efficient way of resolving disputes that involve lots of people, allowing for a single resolution that binds many parties when there are common interests and facts. As you can imagine, the class action mechanism can also attract misuse, for example, by plaintiffs (and their attorneys) who may seek large settlements on behalf of a large number of people. Those settlements may benefit the named plaintiffs and their attorneys but they aren’t really aligned with the interests of most class members.
There are rules in place to prevent that kind of abuse. In federal courts (where all copyright lawsuits must be brought), Rule 23 of the Federal Rules of Civil Procedure governs. It provides that:
“One or more members of a class may sue or be sued as representative parties on behalf of all members only if:
(1) the class is so numerous that joinder of all members is impracticable; [“numerosity”]
(2) there are questions of law or fact common to the class; [“commonality”]
(3) the claims or defenses of the representative parties are typical of the claims or defenses of the class; [“typicality”] and
(4) the representative parties will fairly and adequately protect the interests of the class. [“adequacy”]”
The rest of Rule 23 contains a number of other safeguards to protect both class members and defendants. Among them are requirements that the court certify that the class complies with Rule 23, that any proposed settlement be approved by the court, and that class members receive notice of any proposed settlement and an opportunity to object. Additionally, there are a number of rules to ensure that the law firm bringing the suit can fairly and competently represent the class members.
Class definition and class representatives in the AI copyright lawsuits
We believe it’s important for creators to pay attention to these suits because if a class is certified and that class includes those creators, the class representatives will have meaningful legal authority to speak on their behalf.
Rule 23 provides that “at an early practicable time after a person sues,” the court must decide whether to certify the proposed class. Though we are now well over a year into some of the earliest suits filed, this has yet to happen. In the meantime, what we have are proposed class definitions offered by plaintiffs. How broadly or narrowly the plaintiffs define a class will be one of the most important factors in whether the class can be certified, since it directly affects the commonality of facts among the class, the typicality of claims, and whether the representatives can fairly and adequately represent the interests of the class. Plaintiffs have the burden of proving that they have satisfied Rule 23.
In these AI lawsuits, we see some themes in terms of class representatives and proposed classes, with many offering very broad class definitions. For example, in the now-consolidated In re OpenAI ChatGPT Litigation, the class representatives are 11 fiction writers of books such as The Cabin at the End of the World, The Brief Wondrous Life of Oscar Wao, What the Dead Know and others.
They propose to represent a class defined as follows:
“All persons or entities domiciled in the United States that own a United States copyright in any work that was used as training data for the OpenAI Language Models during the Class Period [defined as June 28, 2020 to the present].”
This kind of broad “anyone with a copyright in a work used for training” approach to class definition is repeated in a few other suits. For example, the consolidated Kadrey v. Meta lawsuit has a similar (and overlapping) grouping of fiction author class representatives and an almost identical proposed class definition. Dubus v. NVIDIA is another suit that takes essentially the same approach.
Other AI lawsuits have more variation in class representatives. Huckabee v. Bloomberg, for example, is another suit with a similar class definition (basically, all copyrighted works owned by someone in the US and used for training Bloomberg’s LLM) but with class representatives that are a bit different: mostly authors of religious books and, of course, Mike Huckabee, a politician.
There is at least one class action that is more precise both in terms of proposed class representatives and their relation to the proposed class definition. The now-consolidated Authors Guild v. OpenAI suit has some 28 proposed class representatives, most of whom are authors of best-selling fiction and non-fiction trade books, 14 of whom are members of the Authors Guild. In this suit, the plaintiffs propose two classes: one for fiction authors and one for non-fiction authors. The complaint also places some restrictions around them: class members for fiction works must be “natural persons” who are “sole authors of, and legal or beneficial owners of Eligible Copyrights in” fictional works that were registered with the U.S. Copyright Office and used for training the defendants’ LLMs (and this includes persons who are beneficiaries of works held by literary estates). For nonfiction authors, class members are “[a]ll natural persons, literary trusts, and literary estates in the United States who are legal or beneficial owners of Eligible Nonfiction Copyrights,” which the complaint defines as works used to train defendants’ LLMs and that have an ISBN, with the exception of any books classified as reference works (BISAC code REF).
Some challenges and dangers
When you consider the scale and scope of materials used to train the AI models in question, you can immediately see some of the challenges that are likely to arise with relatively small groups of authors attempting to represent practically all individual U.S. copyright owners.
While the exact training materials used for the models at issue remain opaque, it is clear that they were not trained only on modern fiction. There is widespread acknowledgment that these models are trained on a large amount of content scraped from across the internet using data sources such as Common Crawl. This, in effect, means that these suits implicate the rights of millions of rights holders, with interests as diverse as those of YouTube content creators, computer programmers, novelists, academics, and more.
How these representatives can fairly and adequately represent such a broad and diverse group (especially when many may disagree with the underlying motivations for the suit to begin with) is a tough question. Even the Authors Guild consolidated case, which is much more careful in terms of class definition, includes classes that are breathtakingly broad when one considers the diversity of authorship within them. The fiction author class, for example, could include everyone from NY Times bestselling authors to fan fiction writers. The nonfiction class, which is at least limited to nonfiction book authors of works assigned an ISBN, could similarly include everyone from authors of popular self-help books distributed by the millions to authors of scholarly books with print runs in the low hundreds and distributed online on open-access terms. The interests, financial and otherwise, of those authors can vary significantly.
Beyond the adequacy of representatives (along with questions about whether their experiences are really typical of others in the proposed class), there are other challenges unique to copyright law. One is the opaque nature of ownership: there is no official public record of who owns what, so ascertaining who actually falls within the class is an initial challenge. Compounding that, there is a dizzying variety of unique terms under which works are distributed online, some of which may afford AI developers a viable defense for many works. A fair use defense also requires some level of assessment of the nature of the works used, a fact-intensive inquiry that will vary from one work to another. This just scratches the surface of some of the issues that likely mean there really aren’t common questions of law or fact among the class.
Conclusion
There are good reasons to think that the classes as currently defined in these lawsuits are too broad. For some of the reasons mentioned above, I think it will be difficult for courts to certify them as is. But this doesn’t mean authors and other rightsholders should sit back and assume that their interests won’t be co-opted by others in these suits who seek to represent them. We don’t know when the courts will actually address these class certification issues in these suits. When they do, it will be important for authors to speak up.