Copyright Management Information, 1202(b), and AI

Posted October 30, 2024

This post is by Maria Crusey, a third-year law student at Washington University in St. Louis. Maria has been working with Authors Alliance this semester on a project exploring legal claims in the now 30+ pending copyright AI lawsuits. 

In the recent spate of copyright infringement lawsuits against AI developers, many plaintiffs allege violations of 17 U.S.C. § 1202(b) in their use of copyrighted works for training and development of AI systems.  

Section 1202(b) prohibits the “removal or alteration of copyright management information.” Compared to related provisions in 17 U.S.C. § 1201, which protects against circumvention of copyright protection systems, §1202(b) has seldom been litigated at the appellate level, and there’s a growing divide among district courts about whether §1202(b) should apply to derivative works, particularly those created using AI technology.

At first glance, §1202(b) appears to be a straightforward provision. However, the uptick in §1202(b) claims raises some challenging questions, namely: How does §1202(b) apply to the use of a copyrighted work as part of a dataset that must be cleaned, restructured, and processed in ways that separate copyright management information from the content itself? And how should §1202(b) apply to AI systems that may reproduce small portions of content contained in training data? Answers to these questions may have serious implications in the AI suits because violations of §1202(b) can come with hefty statutory damage awards – between $2,500 and $25,000 for each violation. Spread across millions of works, the damages could be staggering. How the courts resolve this issue could also impact many other reuses of copyrighted works–from analogous uses such as text data mining research to much more routine re-distribution of copyrighted works in other contexts.
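To give a rough sense of scale, the short sketch below multiplies the per-violation range quoted in this post by a number of violations. The figure of one million violations (one per work) is purely a hypothetical chosen for illustration, not a number alleged in any of the suits, and how a court would actually count violations is a separate question.

```python
# Per-violation statutory damages range for §1202 violations, as quoted in this post.
MIN_PER_VIOLATION = 2_500
MAX_PER_VIOLATION = 25_000


def statutory_damages_range(violations: int) -> tuple[int, int]:
    """Return the (low, high) dollar exposure for a given number of violations."""
    if violations < 0:
        raise ValueError("violations must be non-negative")
    return violations * MIN_PER_VIOLATION, violations * MAX_PER_VIOLATION


# Hypothetical assumption: one violation per work across one million works.
low, high = statutory_damages_range(1_000_000)
print(f"${low:,} to ${high:,}")  # prints "$2,500,000,000 to $25,000,000,000"
```

Even at the statutory minimum, the hypothetical exposure runs into the billions of dollars.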

The plaintiffs in one of these AI cases have asked the Ninth Circuit Court of Appeals to accept an interlocutory appeal on just this issue, and we are waiting to see whether the court will accept it.

For an introduction to §1202(b) and observations on this question, among others, read on:

What is § 1202(b) and what is it intended to do?

Broadly, 17 U.S.C. § 1202 is a provision of the Digital Millennium Copyright Act (DMCA) that protects the integrity of copyright management information (“CMI”). Per §1202(c), CMI comprises certain information identifying a copyrighted work, often including the title, the name of the author, and terms and conditions for the use of a work.

Section 1202(b) forbids the alteration or removal of copyright management information. The section provides that:

“[n]o person shall, without the authority of the copyright owner or the law – 

(1) intentionally remove or alter any CMI,

(2) distribute or import for distribution CMI knowing that the CMI has been removed or altered without authority of the copyright owner or the law, or 

(3) distribute, import for distribution, or publicly perform works, copies of works or phonorecords, knowing that copyright management information has been removed or altered without authority of the copyright owner or the law, knowing, or with respect to civil remedies under section 1203, having reasonable grounds to know that it will induce, enable, facilitate, or conceal an infringement of any right under this title.”

17 U.S.C. § 1202(b).

Congress primarily aimed to limit the assistance and enablement of copyright infringement in its enactment of §1202(b). This purpose is evident in the legislative history of the provision. In an address to a congressional subcommittee prior to the adoption of the DMCA, the then–Register of Copyrights, Marybeth Peters, discussed the aims of §1202(b). First, Peters noted that the requirements of §1202(b) would make CMI more reliable and thus aid in the administrability of copyright law. Second, Peters stated that §1202(b) would help prevent instances of copyright infringement that could come from the removal of CMI. The idea is that if a copyrighted work lacks CMI, there is a greater likelihood of infringement, since others may use the work under the pretense that they are the author or copyright holder. In creating a statutory violation for a party’s removal of CMI, regardless of later infringing activity, §1202(b) functions as damage control against potential copyright infringement.

What are the essential elements of a § 1202(b) claim?

To have a claim under §1202(b), a plaintiff must allege particularized facts about the existence and alteration or removal of CMI. Additionally, some courts require a plaintiff to demonstrate that the defendant had knowledge that the CMI was being altered or removed and that the alteration or removal would enable copyright infringement. Finally, some courts have required plaintiffs to show that the work with the altered or removed CMI is an exact copy of the original work–what has become known as the “identicality” requirement. This last “identicality” requirement is one of the main issues in the AI lawsuits raising §1202(b) and is detailed further below.

→ The “Identicality” Requirement

Courts that have imposed “identicality” have required that plaintiffs demonstrate that the work with the removed CMI is an exact copy of the original work and thus is “identical,” except for the missing or altered CMI. 

Suppose, for example, a photographer owns the copyright to a photograph they took. The photographer adds CMI to the photograph and takes care to protect the integrity of the work as it is dispersed online. A third party captures the photograph posted on a website by taking a screenshot and removes the CMI from the copied image while keeping all other aspects of the original photograph the same. The screenshot with the removed CMI is an “exact copy” of the original photograph because the only difference between the copyrighted photograph and the screenshot is the removal of the CMI.

Federal courts are divided in imposing the identicality requirement for §1202(b) claims, though the circuit courts have not yet addressed the issue. Notably, district courts within the Ninth Circuit have varied in their treatment of the identicality requirement. For example, the court for the District of Nevada in Oracle v. Rimini Street declined to impose the identicality requirement because the requirement may weaken the intended protections for copyright holders under §1202(b). Conversely, in Kirk Kara Corp. v. W. Stone & Metal Corp., a court in the Central District of California applied the identicality requirement, though it provided little explanation for why it adopted it. Application of the identicality requirement is also unsettled in district courts beyond the Ninth Circuit (see, for example, this Southern District of Texas case discussing at length the identicality requirement and rejecting it).

What are the §1202(b) claims at issue in the present suits?

The claims in Doe 1 v. GitHub exemplify the §1202(b) issues common among the present suits, and it is the GitHub suit that the Ninth Circuit Court of Appeals is presently deciding whether to take on appeal.

In GitHub, owners of copyrights in software code brought a suit against GitHub, a software developer platform. The plaintiffs alleged that GitHub Copilot, an AI product developed in part by GitHub, illegally removed CMI from their works. The plaintiffs stored their software in GitHub’s publicly accessible software repositories under open-source license agreements. The plaintiffs claimed that GitHub removed CMI from their code and trained the Copilot AI model on the code in violation of the license agreements. Moreover, the plaintiffs claimed that, when prompted to generate software code, Copilot includes unique aspects of the plaintiffs’ code in its outputs. In their complaint, the plaintiffs alleged that all requirements for a valid § 1202(b) claim were met in the present suit. The plaintiffs stressed that, in removing CMI, the defendants failed to prevent users of products from making non-infringing use of the product. Consequently, they claim, the defendants removed the CMI, knowing that it would “induce, enable, facilitate, and/or conceal infringement” of copyrights in violation of the DMCA.

Regarding the §1202(b) claims, the parties contest the application of the identicality requirement. The plaintiffs first argue that § 1202 contains no such requirement: “The plain language of DMCA § 1202 makes it a violation to remove or alter CMI. It does not require that the output work be original or identical to obtain relief. . . By a plain reading of the statute, there is no need for a copy to be identical—there only needs to be copying, which Plaintiffs have amply alleged.” 

As a backstop, the plaintiffs further argue that Copilot does produce “near-identical reproduction[s]” of their copyrighted code and allege this is sufficient to fulfill the identicality requirement under §1202(b). Specifically, plaintiffs claimed that Copilot generates parts of plaintiffs’ code in extra lines of output code that are not relevant to input prompts. Plaintiffs also claimed Copilot generates their code in output code that produces errors due to a mismatch between the directly copied code and the code that would actually fit the prompt. To make this assertion work, plaintiffs distinguish their version of “identicality” –semantically equivalent lines of code–from a reproduction of the whole work. They argue that the defendant’s position, that “the reproduction of short passages that may be part of [a] larger work, rather than the reproduction of an entire work, is insufficient to violate Section 1202,” would lead to absurd results. “By OpenAI’s logic, a party could copy and distribute a fragment of a copyrighted work—say, a chapter of a book, a stanza of a poem, or a scene from a movie—and face no repercussions for infringement.” 

In their reply, the defendants countered that §1202, which defines CMI as relating to a “copy of a work,” requires a complete and identical copy, not just snippets. Defendants noted that the plaintiffs have conceded that Copilot reproduces only snippets of code rather than complete versions of the code. Therefore, the defendants argue, Copilot does not create “identical copies” of the plaintiffs’ complete copyrighted works. The defendants base this argument on the text of the statute (they note that the statute only provides for liability when distributing copies that CMI has been stripped from, not derivatives, abridgments, or other adaptations), and they bolster it by suggesting that allowing §1202 claims for incomplete copies would create chaos for ordinary uses of copyrighted works: “On Plaintiffs’ reading of § 1202, if someone opened an anthology of poetry and typed up a modified version of a single “stanza of a poem,” . . . without including the anthology’s copyright page, a § 1202(b) claim would lie. Plaintiffs’ reading effectively concedes that they are attempting to turn every garden-variety claim of copyright infringement into a DMCA claim, only without the usual limitations and defenses applicable under copyright law. Congress intended no such thing.”

The GitHub court has now addressed the issue several times: it initially dismissed the plaintiffs’ §1202(b)(1) and (b)(3) claims, subsequently denied the plaintiffs’ motion for reconsideration of the claims, allowed the plaintiffs to amend their complaint and try again with more specificity, then dismissed the claims again. The reasoning of the court has been consistent, and largely focused on insufficient allegations of identicality. The court agreed with Defendants that the identicality requirement should apply and that the snippets do not satisfy the requirement. Following the dismissal, the plaintiffs sought and received permission from the district court to file an interlocutory appeal (an appeal on a specific issue before the case is fully resolved, something not usually allowed) to the Court of Appeals for the Ninth Circuit to determine whether § 1202(b)(1) and (b)(3) impose an identicality requirement. The Ninth Circuit is presently considering whether to hear the appeal.

What would the Ninth Circuit assess in the appeal, and what are the implications of the appeal for future lawsuits?

If the appeal is accepted, the Ninth Circuit will determine whether §1202(b)(1) and (b)(3) actually impose an identicality requirement. Moreover, with regard to the facts of the Github case, the court will decide whether the identicality requirement requires exact copying of a complete copyrighted work, or perhaps something less. The Ninth Circuit’s hearing of this appeal would be notable for a number of reasons.

First, as mentioned above, §1202(b) is largely unaddressed by the circuit courts, and explicit appellate guidance has only been provided for the knowledge requirement referenced above. Consequently, determinations of §1202(b) claims are largely informed by varying district court decisions that are binding only on the parties to the suits and provide inconsistent interpretations of the requirements for a claim under the provision. An appellate ruling that accepts or rejects the identicality requirement would create additional binding authority to further clarify courts’ interpretations of §1202(b).

Second, a ruling on the identicality requirement from the Ninth Circuit specifically would be notable because it would be binding on the large number of §1202(b) claims presently being litigated in the Ninth Circuit’s lower courts. And, given how many AI developers operate in California and elsewhere in the Ninth Circuit, the outcome of the appeal would significantly impact future lawsuits that involve §1202(b) claims.

It is hard to predict how the Ninth Circuit might rule, but we can work through some of the implications of the choices the court would have before it: 

If the Ninth Circuit interprets the identicality requirement as requiring a complete and exact copy, it would impose a high standard for the requirement and plaintiffs would likely be constrained in their ability to bring §1202(b) claims. If the court did this, the GitHub plaintiffs’ claims would likely fail, as the alleged copied snippets of code generated by Copilot are not exact copies and do not comprise the complete copyrighted works. This hypothetical standard would be advantageous for individuals who remove CMI from copyrighted works in the course of processing them using AI, as well as those who deploy AI systems that produce small portions of content similar (but not identical) to inputs. So long as the works being processed or distributed are not complete exact copies, individuals would be free to alter the CMI of the works for ease in analyzing the copyrighted information.

Alternatively, the Ninth Circuit could adopt a loose interpretation of identicality in which incomplete and inexact copying would be sufficient. One approach would be to require identicality but not copying of the entire work (something the plaintiffs in the GitHub suit advocate for). How the parties or the Ninth Circuit would formulate a standard for copying that is “less than entire” but still “near identical” is hard to say, but presumably, plaintiffs would have an easier time alleging facts sufficient for a §1202(b) claim. Applied to GitHub, it still seems unclear whether the copied snippets of the plaintiffs’ code in the Copilot outputs could pass muster (this is likely a factual question to be determined at later stages of the litigation). But it could allow claims to at least survive an early motion to dismiss. As such, the adoption of this standard could limit how AI developers engage with works but also potentially affect others, such as researchers using similar techniques to process, clean, and distribute small portions of copyrighted works as part of a dataset.

Finally, the Ninth Circuit may decide to do away with the identicality requirement altogether. While this may seem like a potential boon to plaintiffs, who could allege removal of CMI and distribution of some copied material, no matter how small, plaintiffs would still face substantial challenges. Elimination of the identicality requirement would likely lead to greater weight being placed on the knowledge requirement in courts’ assessments of §1202(b) claims, which requires that defendants know or have reasonable grounds to know that their actions will “induce, enable, facilitate, or conceal an infringement.” In the context of the GitHub case, even without an identicality requirement, the plaintiffs’ §1202(b) claims contain scant factual allegations about the defendants’ CMI removal and knowledge in the court filings to date. For other developers and users of AI, the effects of not having an identicality requirement would likely vary on a case-by-case basis.

Conclusion

Recent copyright infringement suits and the pending appeal to the Ninth Circuit in Doe 1 v. GitHub demonstrate that §1202(b) is having its day in the sun. Although the provision has been overlooked and infrequently litigated in the past, the scope of protections granted by §1202(b) is important for understanding whether and how AI developers can remove CMI when processing, restructuring, and analyzing copyrighted works for AI development. Thus, as lawsuits against AI developers and users continue to progress, the requirements to have a valid §1202(b) claim are sure to become even more contentious.

Text Data Mining Research DMCA Exemption Renewed and Expanded

Posted October 25, 2024
U.S. Copyright Office 1201 Rulemaking Process, taken from https://www.copyright.gov/1201/

Earlier today, the Library of Congress, following recommendations from the U.S. Copyright Office, released its final rule adopting exemptions to the Digital Millennium Copyright Act’s prohibition on circumvention of technological protection measures (e.g., DRM).

As many of you know, we’ve been working closely with members of the text and data-mining community as well as our co-petitioners, the Library Copyright Alliance (LCA) and the American Association of University Professors (AAUP), to petition for renewal of the existing TDM research exemption and to expand it to allow researchers to share their research corpora with other researchers outside of their university (something not previously allowed). The process began over a year ago and involved an in-depth review by the U.S. Copyright Office. We’re incredibly grateful for the expert legal representation before the Office over this past year by UC Berkeley Law’s Samuelson Law, Technology & Public Policy Clinic, and in particular clinic faculty Erik Stallman and Jennifer Urban and Berkeley Law students Christian Howard-Sukhil, Zhudi Huang, and Matthew Cha.

We are very pleased to see that the Librarian of Congress both approved the renewal of the existing exemption and approved an expansion that allows for research universities to provide access to TDM corpora for use by researchers at other universities. 

The expanded rule is poised to make an immediate impact in helping TDM researchers collaborate and build upon each other’s work. As Allison Cooper, director of Kinolab and Associate Professor of Romance Languages and Literatures and Cinema Studies at Bowdoin College, explains:

“This decision will have an immediate impact on the ongoing close-up project that Joel Burges, Emily Sherwood, and I are working on by allowing us to collaborate with researchers like David Bamman, whose expertise in machine learning will be valuable in answering many of the ‘big picture’ questions about the close-up that have come up in our work so far.”

These are the main takeaways from the new rule: 

  • The exemption has been expanded to allow “access” to corpora by researchers at other institutions “solely for purposes of text and data mining research or teaching.” There is no more requirement that access be granted as part of a “collaboration,” so new researchers can ask new and different questions of a corpus. Access must be credentialed and authenticated.
  • The issue of whether a researcher can engage in “close viewing” of a copyrighted work has been resolved—as the explanation for the revised rule puts it, researchers can “view the contents of copyrighted works as part of their research, provided that any viewing that takes place is in furtherance of research objectives (e.g., processing or annotating works to prepare them for analysis) and not for the works’ expressive value.” This is a very helpful clarification!
  • The new rule also modified the existing security requirements, which provide that researchers must put in place adequate security protocols to protect TDM corpora from unauthorized reuse and must share information about those security protocols with rightsholders upon request. That rule has been limited in some ways and expanded in others. The new rule clarifies that trade associations can send inquiries on behalf of rightsholders. However, inquiries must be supported by a “reasonable belief” that the sender’s works are in a corpus being used for TDM research.

Later on, we will post a more in-depth analysis of the new rules–both TDM and others that apply to authors. The Librarian of Congress also authorized the renewal of a number of other rules that support research, teaching, and library preservation. Among them is a renewal of another exemption that Authors Alliance and AAUP petitioned for, allowing for the circumvention of digital locks when using motion picture excerpts in multi-media ebooks. 

Thank you to all of the many, many TDM researchers and librarians we’ve worked with over the last several years to help support this petition. 

You can learn more about TDM and our work on this issue through our TDM resources page, here.

Who Represents You in the AI Copyright Lawsuits? 

Posted October 16, 2024

Sarah Silverman is the author of The Bedwetter, a comedy memoir. Richard Kadrey wrote Sandman Slim, a fantasy novel series. Christopher Golden wrote Ararat, a supernatural thriller.

These authors might not seem to have much in common with an academic author who writes in history, physics, or chemistry. Or a journalist. Or a poet. Or, for that matter, me, writing this blog post.  And yet, these authors may end up representing us all in court. 

A large number of the recent AI copyright lawsuits are class action lawsuits. This means that these lawsuits are brought by a small number of plaintiffs who (subject to judicial approval) are granted the right to represent a much larger class. In many of the AI copyright lawsuits, the proposed classes are extraordinarily broad, including many creators who might be surprised that they are being represented. If you live in the US and wrote something that was published online, there is a good chance that you are included in more than one of these classes.

A very brief background on class action lawsuits

Class actions can be an efficient way of resolving disputes that involve lots of people, allowing for a single resolution that binds many parties when there are common interests and facts. As you can imagine, the class action mechanism can also attract misuse, for example, by plaintiffs (and their attorneys) who may seek large settlements on behalf of a large number of people. Those settlements may benefit the named plaintiffs and their attorneys but they aren’t really aligned with the interests of most class members. 

There are rules in place to prevent that kind of abuse. In federal courts (where all copyright lawsuits must be brought), Rule 23 of the Federal Rules of Civil Procedure governs. It provides that:

“One or more members of a class may sue or be sued as representative parties on behalf of all members only if: 
(1) the class is so numerous that joinder of all members is impracticable; [“numerosity”]
(2) there are questions of law or fact common to the class; [“commonality”]
(3) the claims or defenses of the representative parties are typical of the claims or defenses of the class; [“typicality”] and
(4) the representative parties will fairly and adequately protect the interests of the class. [“adequacy”]”

The rest of Rule 23 contains a number of other safeguards to protect both class members and defendants. Among them are requirements that the court certify that the class complies with Rule 23, that any proposed settlements be approved by the court, and that class members receive notice of any proposed settlement and an opportunity to object. Additionally, there are a number of rules to ensure that the law firm bringing the suit can fairly and competently represent the class members.

Class definition and class representatives in the copyright AI lawsuits

We believe it’s important for creators to pay attention to these suits because if a class is certified and that class includes those creators, the class representatives will have meaningful legal authority to speak on their behalf.  

Rule 23  provides that “at an early practicable time after a person sues,” the court must decide whether to certify the proposed class. Though we are now well over a year into some of the earliest suits filed, this has yet to happen. In the meantime what we have are proposed class definitions offered by plaintiffs. How broadly or narrowly a class is defined by the plaintiffs will be one of the most important factors in whether the class can be certified since it will directly affect the commonality of facts among the class, the typicality of claims, and whether the representatives can fairly and adequately represent the interests of the class. Plaintiffs have the burden of proving that they have satisfied Rule 23. 

In these AI lawsuits, we see some themes in terms of class representatives and proposed classes, with many offering very broad class definitions. For example, in the now-consolidated In re OpenAI ChatGPT Litigation, the class representatives are 11 fiction writers of books such as The Cabin at the End of the World, The Brief Wondrous Life of Oscar Wao, What the Dead Know, and others.

They propose to represent a class defined as follows:  

“All persons or entities domiciled in the United States that own a United States copyright in any work that was used as training data for the OpenAI Language Models during the Class Period [defined as June 28, 2020 to the present].” 

This kind of broad “anyone with a copyright in a work used for training” approach to class definition is repeated in a few other suits. For example, the consolidated Kadrey v. Meta lawsuit has a similar (and overlapping) grouping of fiction author class representatives and an almost identical proposed class definition. Dubus v. NVIDIA is another suit that takes essentially the same approach. 

Other AI lawsuits have more variation in class representatives. Huckabee v. Bloomberg, for example, is another suit with a similar class definition (basically, all copyrighted works owned by someone in the US and used for training Bloomberg’s LLM) but with class representatives that are a bit different: mostly authors of religious books and of course, Mike Huckabee, a politician. 

There is at least one class action that is more precise both in terms of proposed class representatives and their relation to the proposed class definition. The now-consolidated Authors Guild v. OpenAI suit has some 28 proposed class representatives, most of whom are authors of best-selling fiction and non-fiction trade books, 14 of whom are members of the Authors Guild. In this suit, the plaintiffs propose two classes: one for fiction authors and one for non-fiction authors. The complaint also places some restrictions around them: class members for fiction works must be “natural persons” who are “sole authors of, and legal or beneficial owners of Eligible Copyrights in” fictional works that were registered with the U.S. Copyright Office and used for training the defendants’ LLMs (and this includes persons who are beneficiaries of works held by literary estates). For nonfiction authors, class members are “[a]ll natural persons, literary trusts, and literary estates in the United States who are legal or beneficial owners of Eligible Nonfiction Copyrights,” which the complaint defines as works used to train defendants’ LLMs that have an ISBN, with the exception of any books classified as reference works (BISAC code REF).

Some challenges and dangers

When you consider the scale and scope of materials used to train the AI models in question, you can immediately see some of the challenges that are likely to arise with relatively small groups of authors attempting to represent practically all individual U.S. copyright owners. 

While the exact training materials used for the models at issue remain opaque, it’s definitely true that they were not just trained on modern fiction. There is widespread acknowledgment that these models are trained on a large amount of content scraped from across the internet using data sources such as Common Crawl. This, in effect, means that these suits implicate the rights of millions of rights holders, with interests as diverse as those of YouTube content creators, computer programmers, novelists, academics, and more. 

How these representatives can fairly and adequately represent such a broad and diverse group–especially when many may disagree with the underlying motivations for the suit to begin with–is a tough question. Even the Authors Guild consolidated case, which is much more careful in terms of class definition, includes classes that are breathtakingly broad when one considers the diversity of authorship within them. The fiction author class, for example, could include everyone from NY Times bestselling authors to fan fiction writers. The nonfiction class, which is at least limited to nonfiction book authors of works assigned an ISBN, could similarly include everyone from authors of popular self-help books distributed by the millions to authors of scholarly books with print runs in the low hundreds and distributed online on open-access terms. The interests, financial and otherwise, of those authors can vary significantly.

Beyond the adequacy of representatives (along with questions about whether their experiences are really typical of others in the proposed class), there are other challenges unique to copyright law, for example, the opaque nature of ownership (there is no official public record of who owns what), making ascertaining who actually falls within the class an initial challenge. Compounding that, there are a dizzying variety of unique terms under which works are distributed online, some of which may afford AI developers a viable defense for many works. A fair use defense also requires some level of assessment of the nature of the works used, a fact-intensive inquiry that will vary from one work to another. This just scratches the surface of some of the issues that likely mean there really aren’t common questions of law or fact among the class. 

Conclusion

There are good reasons to think that the classes as currently defined in these lawsuits are too broad. For some of the reasons mentioned above, I think it will be difficult for courts to certify them as is. But this doesn’t mean authors and other rightsholders should sit back and assume that their interests won’t be co-opted by others in these suits who seek to represent them. We don’t know when the courts will actually address these class certification issues in these suits. When they do, it will be important for authors to speak up. 

Artist Left with Heavy Fees by Copyright Troll Law Firm

Posted October 11, 2024

Facts of the Case & Fair Use

On September 18, the 5th Circuit decided in Keck v. Mix Creative Learning Center that using copyrighted artwork to teach children how to make art in a similar style does not constitute copyright infringement. The case adds to the well-developed jurisprudence that teaching with copyrighted materials is often protected by fair use.

This case was initially filed in 2021 by plaintiff’s counsel, Mathew Kidman Higbee of Higbee & Associates, a known and prolific copyright litigation firm sometimes accused of troll-like behavior. During the pandemic, the defendant sold a total of six art kits (two of which were purchased by the plaintiff) that included images of the plaintiff’s dog-themed artworks, biographical information, and details on her artistic styles. Additionally, the kits included paint, paintbrushes, and collage paper. The plaintiff’s side argued that including the artworks in teaching kits constituted willful copyright infringement and therefore demanded $900,000 in damages, to make up for the $250 the defendant made in sales.

The district court dismissed all infringement claims in 2022, and last month the 5th Circuit affirmed that including copies of the plaintiff’s artwork in a teaching kit is fair use.

The courts found the first and fourth fair use factors to favor the defendant. Under the first factor, even though the defendant’s use was commercial in nature, by accompanying the artworks with art theory and history, the teaching kit transformed the original decorative purpose of the dog-themed artworks. The 5th Circuit distinguished this case from Warhol by pointing out that, in the Warhol case, the infringing use served the same illustrative purpose as the original work, while in this case, “the art kits had educational objectives, while the original works had aesthetic or decorative objectives.”  

Under the fourth factor, the courts explained that they could not imagine how the market value of the plaintiff’s dog-themed artworks could decrease when included in children’s art lesson kits. The 5th Circuit further pointed out that there was no evidence that a market for licensing artworks for similar teaching kits exists now or is ever likely to develop.

Because these “two most important” factors favored the defendant, the defendant’s use was fair use.

Fee Shifting: Plaintiffs Beware of Copyright Troll Law Firms!

The final outcome of the case: the plaintiff was ordered to cover $102,404 in fees and $165.72 in costs for the defendant.

Even though we are happy for the defendant and her counsel that, after a prolonged legal battle, this well-deserved victory is finally won, it is nevertheless disheartening to see the plaintiff-artist left alone in the end to face the high legal fees of this ill-conceived lawsuit. The plaintiff’s counsel not only failed to advise the plaintiff to act in her own best interest (whether it is to settle the case at the right moment or to pursue more plausible claims), but also conjured up willful infringement claims that were clearly meritless to any trained eye. Even the 5th Circuit Court lamented over this in its opinion, as it begrudgingly upheld the district court’s decision based on the abuse of discretion standard it must follow:

It is troubling that Keck alone will be liable for the high fees incurred by Defendants largely because of Higbee & Associates’ overly aggressive litigation strategy. From our review of the record, the law firm lacked a firm evidentiary basis to pursue hundreds of thousands of dollars in statutory damages against Defendants for willful infringement. Nevertheless, we cannot say, on an abuse of discretion standard, that the district court erred by determining that there was insufficient evidence that the firm’s conduct was both unreasonable and vexatious. … But we warn Higbee & Associates that future conduct of this nature may well warrant sanctions, and nothing in this opinion prevents Higbee & Associates from compensating its client, if appropriate, for the fees that she is now obliged to pay Defendants.

This should serve as a cautionary tale for would-be plaintiffs: copyright lawsuits, like any other type of litigation, are primarily meant to address the damages plaintiffs actually suffered, and the final settlement should make plaintiffs whole again, that is, as if no infringement had ever occurred. Copyright lawsuits (or the threat to sue) should not be undertaken as a way to create brand new income streams, as was the case in the lawsuit described above.

When someone aggressively enforces dubious copyright claims with the sole purpose of collecting exorbitant fees rather than protecting any underlying copyrights, they are called a “copyright troll.” Regrettably, beyond the disreputable law firms that are eager to pursue aggressive claims, many services now exist to tempt creators into troll-like behavior by promising “new licensing income.” The true aim of these services is solely to collect high representation charges from creators when users of the creators’ works are harassed into paying exorbitant settlements. Many victims agree to pay just for the nuisance to stop. This predatory business model has been repeatedly exposed by creators and authors, including famously by Cory Doctorow.

Needless to say, copyright trolls are harmful to the copyright ecosystem. Obviously, innocent users are harmed when slapped with unreasonable demand letters or even frivolous lawsuits. Worse, creators are misled into supporting this unethical practice while deluded into believing they are doggedly following the spirit of the law. Sometimes, as in this case, they are left to face the inevitable consequences of bringing a frivolous lawsuit, while the lawyer or agent that originally led them into the mire gets off free, upward and onward to their next “representation.”

It was very unfortunate that the district court did not fully study the plaintiff’s counsel’s track record and issue appropriate disciplinary orders against him. The problem of copyright trolls will have to be addressed soon in order to preserve a healthy copyright system. 

What is “Derivative Work” in the Digital Age?

Posted October 7, 2024
[Image: Seltzer v. Green Day (top); Kienitz v. Sconnie Nation (bottom)]

Part I: The Problem with “Derivative Work”

The right to prepare derivative works is one of the exclusive rights copyright holders have under §106 of the Copyright Act. Copyright holders’ other exclusive rights include the right to make and distribute copies, and to display or perform a work publicly.

Lately, we’ve seen a congeries of novel conceptions about “derivative works.” For example, a reader of our blog stated that when looking at AI models and AI outputs, works should be considered infringing “derivatives” even when there is no substantial similarity between the infringing AI model/outputs and the ingested originals. Even in the courts, we’ve seen confusion. For example, Hachette v. Internet Archive presented us with the following statement about derivative works:

Changing the medium of a work is a derivative use rather than a transformative one. . . . In fact, we have characterized this exact use―“the recasting of a novel as an e-book”―as a “paradigmatic” example of a derivative work. [citation omitted; emphasis added]

These statements leave one to wonder—what is a copy, a derivative work, an infringing use, and a transformative fair use in the context of U.S. copyright law? In order to have some clarity on these questions, it’s helpful to juxtapose “derivative works” first with “copies” and then with “transformative uses.” We think the confusion about derivative work and its related concepts arises out of using the phrase to mean “a work that is substantially similar to the original work” as well as “a work that is so in an unauthorized way, not excused from liabilities.”

There are many immediate real-world implications of confusion over the meaning of “derivative work.” In privately negotiated agreements, licensees who have a right to make reproductions but not derivative works may be confused as to what medium their use is restricted to. For example, a publisher of a book with a license that allows it to make reproductions but not derivatives might be confused as to whether, under the Hachette court’s reasoning, it is allowed to republish a print book in a digital format such as a simple PDF of a scan. Similarly, for public licenses such as the CC ND licenses, where a licensor restricts the creation of derivative works, downstream users may be unsure whether, say, changing a PDF into a Word document is allowed.

This is also an important topic to explore in the recent hot debates over Controlled Digital Lending and generative artificial intelligence, as well as in an author’s everyday work. For instance, would quoting someone else’s work make your article/book a derivative work of the original?

Part II: “Copies” and “Derivatives”

Our basic understanding of derivative works comes from the 1976 Copyright Act. The §101 definition tells us:

A “derivative work” is a work based upon one or more preexisting works, such as a translation, musical arrangement, dramatization, fictionalization, motion picture version, sound recording, art reproduction, abridgment, condensation, or any other form in which a work may be recast, transformed, or adapted. A work consisting of editorial revisions, annotations, elaborations, or other modifications which, as a whole, represent an original work of authorship, is a “derivative work”.

Circular 14, published by the U.S. Copyright Office, gives some further helpful guidance as to what a §106 derivative work would look like:

To be copyrightable, a derivative work must incorporate some or all of a preexisting “work” and add new original copyrightable authorship to that work. The derivative work right is often referred to as the adaptation right. The following are examples of the many different types of derivative works: 

  • A motion picture based on a play or novel 
  • A translation of a novel written in English into another language
  • A revision of a previously published book 
  • A sculpture based on a drawing 
  • A drawing based on a photograph 
  • A lithograph based on a painting 
  • A drama about John Doe based on the letters and journal entries of John Doe 
  • A musical arrangement of a preexisting musical work 
  • A new version of an existing computer program 
  • An adaptation of a dramatic work 
  • A revision of a website

One immediate observation that can be made from reading these is that “ebook” or “digitized version of a work” is neither listed among, nor similar to, any of the exemplary derivative works in the Copyright Act or the Copyright Office Circular. By contrast, “ebook” or “digitized version of a work” seems to fit much better under the § 101 definition of “copies”:

“Copies” are material objects, other than phonorecords, in which a work is fixed by any method now known or later developed, and from which the work can be perceived, reproduced, or otherwise communicated, either directly or with the aid of a machine or device. The term “copies” includes the material object, other than a phonorecord, in which the work is first fixed.

The most crucial difference between a “copy” and a “derivative work” is whether new authorship is added. If no new authorship is added, merely changing the material that the work is fixed on does not create a new copyrightable derivative work. This, in fact, was observed by many courts before Hachette. For example, in Bridgeman Art Library v. Corel, the court unequivocally held that there is no new copyright granted to photos of public domain paintings.

Additionally, as we know from Feist v. Rural Tel., “[t]he mere fact that a work is copyrighted does not mean that every element of the work may be protected.” Copyright protection is limited to the original elements of a work. We cannot call a work “derivative” of another if it does not incorporate any copyrightable elements from the original copyrighted work. For example, the “Game Genie” device, which let players change elements of a Nintendo game, was not found to be a derivative work by the court because it didn’t incorporate any part of the Nintendo game.

It is clear from this examination that sometimes a later-created work is a copy, sometimes a derivative, and sometimes it may not implicate any of the exclusive rights of the original.

Part III: “Derivative” and “Transformative” Works

Let’s quickly recap the context in which courts are confusing “derivative” and “transformative” works—

A prima facie case of copyright infringement requires the copyright holder to prove (1) ownership of a valid copyright, and (2) inappropriate copying of original elements. We will not go into more details here, but essentially, the inappropriate copying prong requires plaintiffs to assert and prove defendant’s access to the plaintiff’s work as well as a level of similarity between the works in question that shows improper appropriation of the plaintiff’s work. If the similarity between the defendant’s work and protectable elements in the plaintiff’s work is minimal, then there is no infringement. As seen in the  “Game Genie” example above, courts can rely on substantial similarity analysis to determine whether a work is indeed a potentially-infringing copy or derivative of the plaintiff’s work.

Once the plaintiff establishes a prima facie infringement case—e.g., the defendant’s work is shown to be a derivative or a copy of the plaintiff’s registered work—the defendant may still nevertheless be free to make the use if the use falls outside the ambit of the copyright holder’s §106 rights, such as uses that are fair use. Whether a work is a derivative work under § 106 is no longer a relevant inquiry after establishing a prima facie case: this point is starkly obvious when looking at the many plausible defenses a defendant can raise (including fair use) where even the verbatim copying of a work is authorized by law. 

As the court stated in Authors Guild v. HathiTrust, “there are important limits to an author’s rights to control original and derivative works. One such limit is the doctrine of ‘fair use,’ which allows the public to draw upon copyrighted materials without the permission of the copyright holder in certain circumstances.” When a prima facie infringement case is already established, yet a court still discusses whether the defendant’s work is a “derivative work,” at a minimum, the court adds confusion by going beyond the § 101 definition of a derivative work.

In fact, a distinct new significance is being given to “derivative work” in recent years in the context of the “purpose and character” factor of fair use, specifically, when analyzing if a use has a transformative purpose. The shift in a word’s meaning or a concept is not per se unimaginable or objectionable. It is misguided to consider the copyright legal landscape static. As law professor Pamela Samuelson pointed out, before the mid-19th century, most courts did not even think copyright holders were entitled to demand compensation from others preparing derivative works. The 1976 Copyright Act finally codified copyright holders’ exclusive right to prepare derivative works. And, now, some rights holders want the courts to say there are categorical derivative uses that can never be considered fair use.

The Hachette court is among those that have unfortunately bought into this novel approach. The court seems not only to misconstrue the salient distinction between a ‘copy of a work’ and a ‘derivative work’, but also to give heightened protections to works it now defines as ‘derivative’. If this misconception becomes widespread, we will be living in a world where if a use is new-derivative, then it is never transformative (and, if it is not transformative, it is likely not fair). Ultimately, it is purely circular for a court to say that the reason for denying the fair use defense is that the use is derivative. When we buy into this setup of “derivative vs. transformative,” it is difficult to ever say with confidence that a work is transformative, because a transformative use will often also fit the actual § 101 definition of a “derivative work,” just like the Green Day rendition of the plaintiff’s art in Seltzer v. Green Day.

Clearly, if we take “derivative work” at its true § 101 definition, out of all potentially infringing works, “transformative fair use” is not an absolute complement, but a possible subset, of derivative works. We know from Campbell v. Acuff-Rose that “transformativeness is a matter of degree, not a binary;” whereas no such sliding scale is plausible for derivative works. A work is either a derivative or it is not: there’s never a “somewhat derivative” work in copyright. All in all, it makes little sense to frame the issues as “transformative vs. derivative work,” because such discussions inevitably buy into the rhetoric of copyright expansionists. We have already warned the court in Warhol against the danger of speaking heedlessly about derivative works in the context of fair use. We must ensure that the “derivative vs. transformative” dichotomy does not come to dominate future discussions of fair use, so that we conserve the utility and clarity of the fair use doctrine.

The expansion of the relevance of “derivative work” beyond the establishment of a prima facie infringement case not only creates circular reasoning for denying fair use, but also makes it impossible to make sense of the case law we have accumulated on fair use. Take Seltzer v. Green Day, for example, where the court held that a work can be transformative even if that work “makes few physical changes to the original.” The Green Day concert background art with a red cross superimposed was found to be a fair use of the original street art: a classic example of how a prima facie infringing derivative work can nevertheless be a transformative, and thus fair, use. Similarly, in Kienitz v. Sconnie Nation, a derivative use of a photo on a t-shirt was found to be a fair use. Ideas and concepts, including “derivative works,” are only important to the extent they elucidate our understanding of the world. When the use of “derivative works” leads to more confusion than clarity, we should be cautious in adopting the new meaning being superimposed on “derivative works.”

Antitrust Lawsuit Filed Against Large Academic Publishers

Posted September 17, 2024

On September 12, a San Francisco-based law firm filed an antitrust lawsuit on behalf of UCLA professor Lucina Uddin against six prominent academic publishers and the trade association that represents them: Elsevier, John Wiley & Sons, Sage Publications, Springer Nature, Taylor & Francis, Wolters Kluwer, and the International Association of Scientific, Technical, and Medical Publishers (“STM”). The suit is brought on behalf of a class that it defines as “All natural persons residing in the United States who performed peer review services for, or submitted a manuscript for publication to, any of the Publisher Defendants’ peer-reviewed journals from September 12, 2020 to the present.” The complaint lists just one claim for relief: that “Publisher Defendants and their co-conspirators entered into and engaged in unlawful agreements in restraint of the trade and commerce described above in violation of Section 1 of the Sherman Act, 15 U.S.C. § 1.” 

To support this claim, the plaintiff makes three key allegations. Namely, she alleges that the publishers have illegally agreed among themselves to abide by:

  1. a “Single Submission Rule,” where researchers are only allowed to submit a manuscript to one journal for consideration unless the journal rejects it;
  2. a “Unpaid Peer Review Rule,” where journals implement policies to not compensate peer reviewers for their labor; and
  3. a “Gag Rule,” where researchers are not allowed to share or discuss their manuscript once they have submitted it to a journal for consideration before the journal publishes it.

Why would any of these actions constitute an antitrust violation? We thought a little background could be helpful: 

To understand this lawsuit, we must first consider the purpose of U.S. antitrust law. The fundamental goal of antitrust law is to encourage competition and ultimately to promote consumer welfare. The Supreme Court explains that: “Congress designed the Sherman Act as a consumer welfare prescription.” 

Section 1 of the Sherman Antitrust Act does this by prohibiting “[e]very contract, combination in the form of trust or otherwise, or conspiracy, in restraint of trade.” This generally requires proving two things: (1) some sort of agreement or business arrangement, and (2) that this agreement is “in restraint of trade,” i.e., unreasonably harmful to competition.  

Proving an agreement can sometimes be a complicated factual question, though often there are good clues, especially when joint activity is coordinated through a trade association (antitrust lawyers love to quote Adam Smith on trade associations: “People of the same trade seldom meet together, even for merriment and diversion, but the conversation ends in a conspiracy against the public, or in some contrivance to raise prices.”). In this case, the plaintiff says that the agreements are so obvious that they are in fact published openly in several portions of the STM’s “International Ethical Principles for Scholarly Publication,” which are then implemented and enforced by each publisher.

Proving the second part, that the agreement is in restraint of trade or unreasonably harmful to competition, can be more complicated. The courts have developed three different analytical frameworks for evaluating whether conduct harms competition in this context:

  1. A “per se” rule for agreements that are “nakedly” anticompetitive. Examples include agreements to fix prices (what’s alleged here, at least with respect to payment for peer review), bid-rigging, agreements to divide markets, and a few other kinds of less common agreements. 
  2. A “rule of reason” test—which applies in most cases—that weighs the pro-competitive effects of the agreement against anti-competitive effects and prohibits agreements only when the anti-competitive harms outweigh the benefits.
  3. “Intermediate” or “quick-look” scrutiny, which covers a small number of cases in which agreements look suspicious on their face but are not so obviously anticompetitive as to fall under the “per se” rule. 

The plaintiff’s complaint claims that the publishers’ agreements violate the Sherman Act regardless of which of these three tests applies. But often, some of the most significant battles in antitrust lawsuits are about which of these standards applies, since the costs associated with litigating a suit can change dramatically depending on which is used. If the court accepts that the “per se” standard applies, the plaintiff likely wins. If the “quick-look” rule applies, the burden is on the defendant to show that its conduct is not anticompetitive. But if the “rule of reason” standard applies, the suit will likely involve extensive discovery, expert witnesses, and other factual evidence about what exactly the market is, whether the defendants had sufficient market power to negatively affect competition, and whether the agreement would negatively or positively affect competition.

In these cases, defining the relevant market is often crucial. In most antitrust cases, defining the relevant market involves identifying substitutes for the product under review. On this point, the plaintiff argues:  

 “Publication by peer-reviewed journals is a relevant antitrust market. For scholars who seek to communicate their scientific research, there is no adequate substitute for publication in a peer-reviewed journal. Peer-reviewed journals establish the validity of scientific research through the peer review process, communicate that research to the scientific community, and avoid competing claims to the same scientific discovery.”

For market definition, it is important to include only close substitutes and exclude those that are distant. For academic publishing, even though there are many ways authors can share their manuscripts—from emailing their colleagues, to posting on their personal websites, to publishing with upstart new journals—the fact remains that the journals in question are the primary means of dissemination and are the most heavily read and heavily cited. So at first glance, the complaint seems realistic in its formulation of the market at issue. 

The complaint then goes on to argue that the publishers hold significant power in this market and misuse that power, touching on familiar themes such as how these academic publishers extract significant profits, charge high rates for access and increasingly high fees for authors to publish openly, and so on. The plaintiff alleges that this market power has allowed the publishers to make agreements among themselves (the three allegations noted above) in ways that allow them to maximize profits while also maintaining their market power. We note that there may be some other explanations for these practices. In fact, some authors may be proponents of some of them, for example, the single-submission rule. But even if there are other explanations for these rules, the allegation is that, with the current arrangement agreed through STM, no member publisher will even try to compete or develop other approaches that may drive up the price for peer reviewers, compete for placement of papers, etc.

How this lawsuit will turn out is hard to predict. This lawsuit flags some very problematic practices enforced on the academic publishing industry by prominent publishers. But there are many other problems with the academic publishing industry not discussed in the complaint. We’ve long thought that the public-interest nature of academic research and publishing is complicated when paired with commercial publishers who have strong incentives to maximize profits. Of course, even for-profit firms are expected to operate within the law; profit-maximizing to the point of adopting anti-competitive practices is fundamentally at odds with their essential social responsibilities.

Hachette v. Internet Archive Update: Second Circuit Court of Appeals Rules Against Internet Archive

Posted September 5, 2024

We got a disappointing decision yesterday from the Second Circuit Court of Appeals in the long-running Hachette v. Internet Archive (IA) copyright lawsuit about IA’s digitization and lending of books. The Court affirmed the district court’s decision that IA cannot circulate digital copies of books it has legitimately acquired in physical form, even when only the same number of copies as legitimately acquired are circulated to a single user at a time, just as a physical book would be loaned.

The Court, focusing on IA’s lending of digitized books that were available for license as ebooks from the publishers, concluded that IA’s fair use defense fails. We think this decision will result in a meaningful reduction in access to knowledge. This is sad news for many authors who have relied on IA’s Open Library for research and discovery, and for readers who have used Open Library to find authors’ works. However, we also view it as a decision limited to its facts, that is, IA’s particular implementation of controlled digital lending (CDL), and more specifically, its lending of books that are already available in licensed digital formats.

We plan to do a more in-depth analysis of the Court’s decision later, but for now, we offer some initial thoughts. First, there are a couple of bright spots in the opinion: 

1) The Court rejected the district court’s conclusion that IA was engaged in commercial use when looking at the first factor of fair use. The publishers argued IA’s lending of digitized books was commercial in nature because IA received a few thousand dollars from a for-profit used-bookseller and also solicited donations on its website. The Court rightly pointed out that if that was the standard, virtually every nonprofit that solicits donations would by default only be able to engage in commercial use. This was an issue we and others strongly urged the Court to address, and we’re glad it did. 

2)  For the most part, the Court focused its analysis on the facts of the case, which was really about IA lending digitized copies of books that were already available in ebook form and licensable from the publishers. The legal analysis in several places turned on this fact, which we think leaves room to make fair use arguments regarding programs to digitize and make available other books, such as print books for which there is no licensed ebook available, out-of-print books, or orphan works. CDL will remain an important framework, especially considering the lack of an existing digital first-sale doctrine.  

We are also disappointed by several key points in the decision: 

One was the Court’s assessment of the first fair use factor, “purpose and character of the use.” The Court’s analysis of this factor was in some ways unsurprising but nevertheless disappointing. The Court did little more than conclude that the use was not transformative and, therefore, not fair use. Though we think there are strong arguments that CDL is transformative, whether CDL is “transformative” is just one of the supporting rationales for the argument that CDL is fair use. The other justifications—that CDL supports teaching, scholarship, and research, along with complementing the first sale doctrine and supporting the public-interest mission of libraries—are at the heart of CDL. The Court didn’t engage with those other arguments at all and also ignored meaningful discussion of cases where non-transformative copying supported a fair use finding because of the public benefits.

A second key issue is about whether IA’s digital lending negatively impacts the market for the original works. This issue probably deserves a whole blog post to itself, but in short the analysis came down to who shoulders the burden of proving or disproving market harm, and what default assumptions the court has about market harm.  The following quotes from the decision will give you a sense of how the Court analyzed the issue: 

[a]lthough they do not provide empirical data of their own, Publishers assert that they (1) have suffered market harm due to lost eBook licensing fees and (2) will suffer market harm in the future if IA’s practices were to become widespread.  IA argues that Publishers cannot rely on the “common-sense inference” of market harm without data to back that up, citing American Society for Testing & Materials v. Public.Resource.Org, Inc. [citations omitted]. . . . We agree with Publishers’ assessment of market harm. 

Despite IA’s experts having offered meaningful data and analysis indicating a lack of market harm on sales of publishers’ books, the Court went on to say: 

We are likewise convinced that “unrestricted and widespread conduct of the sort engaged in by [IA] would result in a substantially adverse impact on the potential market for [the Works in Suit].” . . . Though Publishers have not provided empirical data to support this observation, we routinely rely on such logical inferences where appropriate in assessing the fourth fair use factor. . . . Thus, we conclude it is “self-evident” that if IA’s use were to become widespread, it would adversely affect Publishers’ markets for the Works in Suit.

We are also disappointed by how the Court portrayed the overall public benefit of IA’s lending and its long-term effect: “while IA claims that prohibiting its practices would harm consumers and researchers, allowing its practices would―and does―harm authors.” We think this is a gross generalization and mischaracterization of how IA’s digital lending affects most authors. Authors are researchers. Authors are readers. IA’s digital library helps authors create new works and supports their interests in having their works read. This ruling may benefit the largest publishers and most prominent authors, but for most, it will end up harming more than it will help. 

The AI Copyright Hype: Legal Claims That Didn’t Hold Up

Posted September 3, 2024

Over the past year, two dozen AI-related lawsuits and their myriad infringement claims have been winding their way through the court system. None have yet reached a jury trial. While we all anxiously await court rulings that can inform our future interaction with generative AI models, in the past few weeks we have suddenly been flooded with news reports bearing titles such as “US Artists Score Victory in Landmark AI Copyright Case,” “Artists Land a Win in Class Action Lawsuit Against A.I. Companies,” and “Artists Score Major Win in Copyright Case Against AI Art Generators.” The list goes on. The exuberant mood in these headlines mirrors the enthusiasm of people actually involved in this particular case (Andersen v. Stability AI). The plaintiffs’ lawyer calls the court’s decision “a significant step forward for the case.” “We won BIG,” writes the plaintiff on X.

In this blog post, we’ll explore the reality behind these headlines and statements. The “BIG” win in fact describes a portion of the plaintiffs’ claims surviving a pretrial motion to dismiss. If you are already familiar with the motion to dismiss per Federal Rules of Civil Procedure Rule 12(b)(6), please refer to Part II to find out what types of claims have been dismissed early on in the AI lawsuits. 

Part I: What is a motion to dismiss?

In the AI lawsuits filed over the last year, the majority of the plaintiffs’ claims have struggled to survive pretrial motions to dismiss. That may lead one to believe that claims made by plaintiffs are scrutinized harshly at this stage. But that is far from the truth. In fact, when looking at the broader legal landscape beyond the AI lawsuits, Rule 12(b)(6) motions are rarely successful.

In order to survive a Rule 12(b)(6) motion to dismiss filed by AI companies, plaintiffs in these lawsuits must make “plausible” claims in their complaint. At this stage, the court will assume that all of the factual allegations made by the plaintiffs are true and interpret everything in a way most favorable to plaintiffs. This allows the court to focus on the key legal questions without getting caught up in disputes about facts. When courts look at plaintiffs’ factual claims in the best possible light, if the defendant AI companies’ liability can plausibly be inferred based on facts stated by plaintiffs, then the claims will survive a motion to dismiss. Notably, the most important issues at the core of these AI lawsuits—namely, whether there has been direct copyright infringement and what may count as a fair use—are rarely decided at this stage, because these claims raise questions about facts as well as the law. 

On the other hand, if the AI companies will prevail as a matter of law even when the plaintiffs’ well-pleaded claims are taken as entirely true, then the plaintiffs’ claims will be dismissed by the court. Merely stating that it is possible that the AI companies have done something unlawful, for instance, will not be enough to survive a motion to dismiss; there must be some reasonable expectation that evidence can be found later during discovery to support the plaintiffs’ claims. 

Procedurally, when a claim is dismissed, the court will often allow the plaintiffs to amend their complaint. That is exactly what happened with Andersen v. Stability AI (the case mentioned at the beginning of this blog post): the plaintiffs’ claims were first dismissed in October last year, and the court allowed the plaintiffs to amend their complaint to address the deficiencies in their allegations. The newly amended complaint contains infringement claims that survived new motions to dismiss, as well as other breach of contract, unjust enrichment, and DMCA claims that again were dismissed.

As you may have guessed, including something like the “motion to dismiss” in our court system can help save time and money, so parties don’t waste precious resources on meritless claims at trial. One judge dismissed a case against OpenAI earlier this year, stating that “the plaintiffs need to understand that they are in a court of law, not a town hall meeting.” The takeaway: plaintiffs need to bring claims that can plausibly entitle them to relief.

Part II: What claims have been dismissed so far?

Most of the AI lawsuits are still at an early stage, and most of the court rulings we have seen so far are in response to the defendants’ motions to dismiss. From these rulings, we have learned which claims are viewed as meritless by courts. 

The removal of copyright management information (“CMI,” which includes information such as the title, the copyright holder, and other identifying information in a copyright notice) is a claim included in almost all plaintiffs’ complaints in the AI lawsuits, and this claim has, without exception, failed to survive motions to dismiss. DMCA Section 1202(b) restricts the intentional, unauthorized removal of CMI. Experts initially considered DMCA 1202(b) one of the biggest hurdles for non-licensed AI training. But courts so far have dismissed all DMCA 1202(b) claims, including in J. Doe 1 v. GitHub, Tremblay v. OpenAI, Andersen v. Stability AI, Kadrey v. Meta Platforms, and Silverman v. OpenAI. The plaintiffs’ DMCA Section 1202(b)(1) claims have failed because plaintiffs were not able to offer any evidence showing that their CMI had been intentionally removed by the AI companies. For example, in Tremblay v. OpenAI and Silverman v. OpenAI, the courts held that the plaintiffs did not plausibly allege that OpenAI intentionally removed CMI when ingesting plaintiffs’ works for training. Additionally, plaintiffs’ DMCA Section 1202(b)(3) claims have failed thus far because they did not fulfill the identicality requirement. For example, in J. Doe 1 v. GitHub, the court pointed out that Copilot’s output did not tend to represent verbatim copies of the original ingested code. We now see plaintiffs voluntarily dropping the DMCA claims in their amended complaints, such as in Leovy v. Google (formerly J.L. v. Alphabet). 

Another claim that has been consistently dismissed by courts is that AI models are infringing derivative works of the training materials. The law defines a derivative work as “a work based upon one or more preexisting works, such as a translation, musical arrangement, … art reproduction, abridgment, condensation, or any other form in which a work may be recast, transformed, or adapted.” To most of us, the idea that the model itself (as opposed to, say, outputs generated by the model) can be considered a derivative work seems to be a stretch. The courts have so far agreed. On November 20, 2023, the court in Kadrey v. Meta Platforms said it is “nonsensical” to consider an AI model a derivative work of a book just because the book is used for training. 

Similarly, claims that all AI outputs should be automatically considered infringing derivative works have been dismissed by courts, because the claims cannot point to specific evidence that an instance of output is substantially similar to an ingested work. In Andersen v. Stability AI, plaintiffs tried to argue “that all elements of [] Anderson’s copyrighted works [] were copied wholesale as Training Images and therefore the Output Images are necessarily derivative;” the court dismissed the argument because, besides the fact that plaintiffs are unlikely to be able to show substantial similarity, “it is simply not plausible that every Training Image used to train Stable Diffusion was copyrighted [] or that all [] Output Images rely upon (theoretically) copyrighted Training Images and therefore all Output images are derivative images. … [The argument for dismissing these claims is strong] especially in light of plaintiffs’ admission that Output Images are unlikely to look like the Training Images.”

Several of these AI cases have raised claims of vicarious liability, that is, liability for the service provider based on the actions of others, such as users of the AI models. Because a vicarious infringement claim must be based on a showing of direct infringement, the vicarious infringement claims were also dismissed in Tremblay v. OpenAI and Silverman v. OpenAI, where plaintiffs could not point to any infringing similarity between AI output and the ingested books.

Many plaintiffs have also raised a number of non-copyright, state law claims (such as negligence or unfair competition) that have largely been dismissed based on copyright preemption. Copyright preemption prevents duplicative state law claims when those state law claims are based on an exercise of rights that are equivalent to those provided for under the federal Copyright Act. In Andersen v. Stability AI, for example, the court dismissed the plaintiffs’ unjust enrichment claim because the plaintiffs failed to add any new elements that would distinguish their claim based on California’s Unfair Competition Law or common law from rights under the Copyright Act.

It is interesting to note that many of the dismissed claims in different AI lawsuits closely mimic one another, such as in Kadrey v. Meta Platforms, Andersen v. Stability AI, Tremblay v. OpenAI, and Silverman v. OpenAI. It turns out that the similarities are no coincidence: all these lawsuits were filed by the same law firm. These mass-produced complaints not only contain overbroad claims that are prone to dismissal but also have overbroad class designations. In the next blog post, we will delve deeper into the class action aspect of the AI lawsuits. 

Authors Alliance and SPARC Supporting Legal Pathways to Open Access for Scholarly Works

Posted August 27, 2024

Authors Alliance and SPARC are excited to announce a new collaboration to address critical legal issues surrounding open access to scholarly publications. 

One of our goals with this project is to clarify legal pathways to open access in support of federal agencies working to comply with the Memorandum on “Ensuring Free, Immediate, and Equitable Access to Federally Funded Research” (the “Nelson Memo”), which was issued by the White House’s Office of Science and Technology Policy in 2022. For more than a decade, federal open access policy was based on an earlier memo instructing federal agencies with research and development budgets over $100 million to make their grant-funded research publicly accessible for free online. The Nelson Memo, drawing from lessons learned during the COVID-19 pandemic, provides important updates to the prior policy. Among the key changes are extending the requirements to all agencies, regardless of budget, and eliminating the 12-month post-publication embargo period on articles. 

The Nelson Memo raises important legal questions for agencies, universities, and individual researchers to consider. To help ensure smooth implementation of the Nelson Memo, we plan to produce a series of white papers addressing these questions. For example, a central issue is the nature and extent of the pre-existing license, known as the “Federal Purpose License,” which all federal grant-making agencies have in works produced using federal funds.  The white papers will outline the background and history of the License, and also address commonly raised questions, including whether the License would support the application of Creative Commons or other public licenses; possible constitutional or statutory obstacles to the use of the License for public access; whether the License may apply to all versions of a work; and whether the use of the License for public access would require modification of university intellectual property policies. 

In addition to the white paper series, we plan to convene a group of experts to update the SPARC Author Addendum. The Addendum was created in 2007 and has been an extremely useful tool in educating authors on how to retain their rights, both to provide open access to their scholarship and to allow for wide use of their work. However, in the nearly two decades since its creation, models for open access and scholarly publishing have changed dramatically. We aim to update the Addendum to more closely reflect the present open access landscape and to help authors to better achieve their scholarship goals.

A final piece of the project is to develop a framework for universities looking to recover rights for faculty in their works, particularly backlist and out-of-print books that are unavailable in electronic form. Though the open access movement has made significant strides in advancing free availability and reuse of scholarly articles, that progress has generally not extended to books and other monographic works, in part because of the non-standard and often complicated nature of book publishing licenses. It has also not done as much to open backfile access to older journal articles. We think a framework for identifying opportunities to recover rights and relicense them under an open access license will help advance open access of these works.

Eric Harbeson

The project will be spearheaded by Eric Harbeson, who joined the Authors Alliance this week as Scholarly Publications Legal Fellow. Eric is a recent graduate of the University of Oregon School of Law. Prior to law school, Eric had a dual career as a librarian/archivist and a musicologist. Eric did extensive work advocating for libraries’ and archives’ copyright interests, especially with respect to preservation of music and sound recordings. Eric’s publications include a well-regarded report on the Music Modernization Act, as well as two scholarly music editions. Eric can be reached at eric@authorsalliance.org.

Clickbait arguments in AI Lawsuits (will number 3 shock you?)

Posted August 15, 2024

Image generated by Canva

The booming AI industry has sparked heated debates over what AI developers are legally allowed to do. So far, we have learned from the US Copyright Office and courts that AI-created works are not protectable unless they are combined with human authorship. 

As we monitor two dozen ongoing lawsuits and regulatory efforts that address various aspects of AI’s legality, we see legitimate legal questions that must be resolved. However, we also see some prominent yet flawed arguments that have been used to inflame discussions, particularly by publisher-plaintiffs and their supporters. For now, let’s focus on some clickbait arguments that sound appealing but are fundamentally baseless. 

Will AI doom human authorship?

Based on current research, AI tools can actually help authors improve their creativity, their productivity, and the longevity of their careers.

When AI tools such as ChatGPT first appeared online, many leading authors and creators publicly endorsed them as useful tools like any other tech innovation that came before them. At the same time, many others claimed that authors and creators of lesser caliber would be disproportionately disadvantaged by the advent of AI. 

This intuition-driven hypothesis, that AI will be the bane of average authors, has so far proved to be misguided.

We now know that AI tools can greatly help authors, especially less creative ones, during the ideation stage. According to a study published last month, AI tools had minimal impact on the output of highly creative authors, but were able to enhance the works of less imaginative authors. 

AI can also serve as a readily-accessible editor for authors. Research shows that AI enhances the quality of routine communications. Without AI-powered tools, a less-skilled person will often struggle with the cognitive burden of managing data, which limits both the quality and quantity of their potential output. AI helps level the playing field by handling data-intensive tasks, allowing writers to focus more on making creative and other crucial decisions about their works. 

It is true that entirely AI-generated works of abysmal quality are available for purchase on some platforms. Some of these works are using human authors’ names without authorization. These AI-generated works may infringe on authors’ right of publicity, but they do not present commercially viable alternatives to books authored by humans. Readers prefer higher-quality works produced with human supervision and intervention (provided that digital platforms do not act recklessly towards their human authors despite generating huge profits from human authors).

Are lawsuits against AI companies brought with authors’ best interest in mind? 

In the ongoing debate over AI, publishers and copyright aggregators have suggested that they have brought these lawsuits to defend the interests of human authors. Consider the New York Times, for example. In its complaint against OpenAI, the NY Times describes its operations as “a creative and deeply human endeavor (¶31)” that necessitates “investment of human capital (¶196).” The NY Times argues that OpenAI has built innovation on the stolen hard work and creative output of journalists, editors, photographers, data analysts, and others. That argument is contrary to what the NY Times once argued in court in New York Times v. Tasini: that authors’ rights must take a backseat to the NY Times’ financial interests in new digital uses. 

It is also hard to believe that many of the publishers and aggregators are on the side of authors when we look at how they have approached licensing deals for AI training. These licensing deals can be extremely profitable for the publishers. For example, Taylor and Francis sold AI training data to Microsoft for $10 million. John Wiley and Sons earned $23 million from a similar deal with an undisclosed tech company. Though we don’t have the details of these agreements, it seems easy to surmise that in return for the money received, the publishers will not harass the AI companies with future lawsuits. (See our previous blog post about these licensing deals and what you can do as an author.) It is ironic how an allegedly unethical and harmful practice quickly becomes acceptable once the publishers are profiting from it.

How much of the millions of dollars changing hands will go to individual authors? Limited data exist. We know that Cambridge University Press, a good-faith outlier, is offering authors 20% royalties if their work is licensed for AI training. Most publishers and aggregators are entirely opaque about how authors are to be compensated in these deals. Take the Copyright Clearance Center (CCC), for example: it offers zero information about how individual authors are consulted or compensated when their works are sold for AI training under the CCC’s AI training license.

This is by no means a new problem for authors. We know that traditionally published book authors receive royalties of around 10% from their publishers: a little under $2 per copy for most books. On an ebook, authors receive a similar amount for each “copy” sold. This small amount handed to authors only starts to look generous when compared to academic publishing, where authors increasingly pay publishers to have their articles published in journals. The journal authors receive zero royalties, despite the publishers’ growing profits.

Even before the advent of AI technology, most authors were struggling to make a living on writing alone. According to an Authors Guild survey in 2018, the median income for full-time writers was $20,300, and for part-time writers, a mere $6,080. Fair wages and equitable profit sharing are issues that need to be settled between authors and publishers, even if publishers try to scapegoat AI companies. 

It’s worth acknowledging that it’s not just publishers and copyright industry organizations filing these lawsuits. Many of these ongoing lawsuits have been filed as class actions, with the plaintiffs claiming to represent a broad class of people who are similarly situated and (so they allege) hold similar views. Most notably, in Authors Guild v. OpenAI, the Authors Guild and its named individual plaintiffs claim to represent all fiction writers in the US who have sold more than 5,000 copies of a work. There’s also another case, which has the support of the Authors Guild, where the plaintiff claims to represent all copyright holders of non-fiction works, including authors of academic journal articles, and several others in which an individual plaintiff asserts the right to represent virtually all copyright holders of any type.

As we (along with many others) have repeatedly pointed out, many authors disagree with the publishers’ and aggregators’ restrictive view on fair use in these cases, and don’t want or need a self-appointed guardian to “protect” their interests. We saw the same over-broad class designation in the Authors Guild v. Google case, which caused many authors to object, including many of our own 200 founding members.

Respect for copyright and human authors’ hard work means no more AI training under US copyright law? 

While we wait for courts to figure out the key questions on infringement and fair use, let’s take a moment to remember what copyright law does not regulate.

Copyright law in the US exists to further the Constitutional goal to “promote the Progress of Science and useful Arts.” In 1991, the Supreme Court held in Feist v. Rural Telephone Service that copyright cannot be granted solely based on how much time or energy authors have expended. “Compensation for hard work” may be a valid ethical discussion, but it is not a relevant topic in the context of copyright law.

Publishers and aggregators preach that people must “respect copyright,” as if copyright is synonymous with the exclusive rights of the copyright holder. This is inaccurate and misleading. In order to safeguard the freedom of expression, copyright is designed to embody not only the rightsholders’ exclusive rights but also many exceptions and limitations to the rightsholders’ exclusive rights. Similarly, there’s no sound legal basis to claim that authors must have absolute control over their own work and its message. Knowledge and culture thrive because authors are permitted to build upon and reinterpret the works of others.

Does this mean I should side with the AI companies in this debate?

Many of the largest AI companies exhibit troubling traits that they have in common with many publishers, copyright aggregators, digital platforms (e.g., Twitter, TikTok, YouTube, Amazon, Netflix, etc.), and many other companies with dominant market power. There’s no transparency or oversight afforded to the authors or the public. The authors and the public have little say in how the AI models are trained, just as we have no influence over how content is moderated on digital platforms, how much authors receive in royalties from the publishers, or how much publishers and copyright aggregators can charge users. None of these crucial systemic flaws will be fixed by granting publishers a share of AI companies’ revenue. 

Copyright also is not the entire story. As we’ve seen recently, there are some significant open questions about the right of publicity and somewhat related concerns about the ability of AI to churn out digital fakes for all sorts of purposes, some of which are innocent, while others are fraudulent, misleading, or exploitative. The US Copyright Office released a report on digital replicas on July 31 addressing the question of digital publicity rights, and on the same day the NO FAKES Act was officially introduced. Will the rights of authors and the public be adequately considered in that debate? Let’s remain vigilant as we wait to see the first-ever AI-generated public figure in a leading role hit theaters in September 2024.