Tag Archives: Copyright

Independent Publisher’s Lawsuit Against Audible Fails, Highlighting the Challenge of Securing Fair Streaming Compensation

Posted February 21, 2025
Adobe Stock Image

Last November, we covered a case where a group of authors complained about McGraw Hill’s interpretation of publishing agreements related to compensation for ebooks. As subscription-based models become increasingly dominant in the publishing industry, authors must be vigilant about how their contracts define compensation. Platforms like Kindle Unlimited, Audible, and academic ebook services are reshaping traditional royalty structures. This is not just a concern for trade books; academic publishing is also shifting towards subscription-based access, as evidenced by ProQuest’s recent announcement that it is ending print sales and moving toward a “Netflix for books” model. 

Here we see yet another case where ambiguous contractual terms resulted in financial loss for an author— 

On February 19, the Second Circuit affirmed the lower court’s dismissal of Teri Woods Publishing’s copyright infringement and breach-of-contract claims against Audible and other audiobook distributors in Teri Woods Publ’g, LLC v. Amazon.com, Inc. The Plaintiff initially granted the rights at issue in this dispute to Urban Audios in a licensing agreement. Urban Audios then granted its rights under that agreement to Blackstone, which in turn sublicensed its rights to Amazon and Audible.

The Plaintiff in this case, Teri Woods Publishing, is an independent publisher founded by urban fiction author Teri Woods. The Plaintiff argued—and the courts ultimately disagreed—that the licensing agreement did not unambiguously permit Defendants to distribute Teri Woods’ audiobooks through the Defendants’ online audiobook streaming subscription services. More specifically, on the question of compensation for online streaming, Plaintiff and Defendants disagreed on whether (1) online streaming counted as “internet downloads” or alternatively “other contrivances, appliances, mediums and means,” and (2) the licensing terms dealing with royalties prohibit subscription streaming.

The licensing terms in question are contained in the licensing agreement Plaintiff entered into in 2018, granting Urban Audios the 

“exclusive unabridged audio publishing rights, to manufacture, market, sell and distribute copies throughout the World, and in all markets, copies of unabridged readings of the [Licensed Works] on cassette, CD, MP3-CD, pre-loaded devices, as Internet downloads and on, and in, other contrivances, appliances, mediums and means (now known and hereafter developed) which are capable of emitting sounds derived for the recording of audiobooks.”

In exchange for this grant of rights, Urban Audios, as the Licensee, was required to pay Plaintiff: 

“(a) Ten percent (10%) of Licensee’s net receipts from catalog, wholesale and other retail sales and rentals of the audio recordings of said literary work; 

(b) Twenty Five percent (25%) of net receipts on all internet downloads of said literary work. 

(c) Twenty Five percent (25%) of net receipts on Playaway format [under certain conditions].”

In case you are not familiar with the services Amazon’s Audible provides: Audible members generally pay a monthly fee to digitally stream or download audiobooks, rather than paying for each specific audiobook they stream or download. This method of distribution, the Plaintiff argues, led to drastically lower compensation than expected, as the audiobooks were made available to subscribers at a fraction of their retail price. 
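To see why the distribution method mattered so much here, consider a back-of-the-envelope comparison. This is only a sketch with invented figures: the retail price, the number of transactions, and the per-stream revenue allocation are all assumptions, not numbers from the case; only the 25% net-receipts royalty rate comes from the quoted license terms.

```python
# Hypothetical comparison of a net-receipts royalty under a retail model
# versus a subscription model. All dollar amounts and counts are invented
# for illustration; only the 25% rate comes from the license terms quoted above.

ROYALTY_RATE = 0.25        # 25% of net receipts on internet downloads (per the license)
UNITS = 1000               # assumed number of downloads/streams in each scenario

# Scenario 1: traditional retail downloads at an assumed $20 price.
RETAIL_PRICE = 20.00
retail_net_receipts = UNITS * RETAIL_PRICE
retail_royalty = retail_net_receipts * ROYALTY_RATE

# Scenario 2: subscription streams. Suppose the distributor allocates only
# a small slice of pooled subscription revenue to each stream (assumed $1.50).
ALLOCATED_PER_STREAM = 1.50
subscription_net_receipts = UNITS * ALLOCATED_PER_STREAM
subscription_royalty = subscription_net_receipts * ROYALTY_RATE

print(f"Retail royalty:       ${retail_royalty:,.2f}")        # $5,000.00
print(f"Subscription royalty: ${subscription_royalty:,.2f}")  # $375.00
```

Because the license pegged compensation to a percentage of whatever the licensee received, rather than to a per-unit price, the same royalty rate produces radically different payouts depending on how the distributor prices and accounts for access.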

Audible has a history of relying on ambiguous contractual terms to reduce author payouts. The “Audiblegate” controversy, for instance, exposed how Audible’s return policy allowed listeners to return audiobooks after extensive use, deducting royalties from authors without transparency. That practice came under legal scrutiny in Golden Unicorn Enters. v. Audible Inc., where authors alleged that Audible deliberately structured its payment model to significantly reduce their earnings (unfortunately, the court in that case also largely sided with Audible).

Despite Audible’s track record, the courts were unsympathetic to Plaintiff’s grievance in the Teri Woods case, and held that the plain meaning of the phrase “other contrivances, appliances, mediums and means (now known and hereafter developed)” in the licensing agreement included digital streams and other future technological developments in distribution services. The courts also observed that the underlying licensing agreement did not provide for the payment of royalties on a per-unit basis; Plaintiff was only entitled to a percentage of “net receipts” received by Urban Audios for sales, rentals, and internet downloads. 

The ambiguities over what constitutes an “internet download,” and whether payment was due on a per-unit basis, were ultimately resolved in Audible’s favor. This case reminds us once again of the importance of adopting clear contractual language. 

Licensing agreements should be drafted with clear and precise language regarding revenue models and payment structures. Subscription-based compensation models, like those employed by Audible, fundamentally differ from traditional sales models, often leading to lower per-unit earnings for authors. By failing to anticipate and address these nuances, authors risk losing control over how their works are monetized. Ensuring that rights, distribution methods, and payment structures are clearly defined can prevent disputes and financial losses down the line.

Many authors assume that digital rights are similar to traditional print rights, but as this case demonstrates, vague phrasing can allow distributors to exploit gaps in understanding. If authors do not explicitly outline limitations on emerging distribution technologies, they may find themselves receiving significantly less compensation than they anticipated when signing the agreement. For example, authors should ensure their contracts specify whether subscription-based revenue falls under traditional royalty calculations, and whether distribution via new technological formats requires renegotiation.

Beyond the issues with ambiguous contractual terms, this case also highlights the broader issue of how digital platforms can negatively impact readers and authors alike. Readers no longer own the books they purchase; instead, they receive licensed access that can be revoked or restricted at any time. This shift undermines the traditional relationship between books and their readers. Authors are equally threatened by these digital intermediaries, who have the power to dictate distribution methods and unilaterally alter revenue models; an author’s right to fair compensation is too often sacrificed along the way. The situation is especially dire with audiobooks, where Audible dominates the market.

Copyrightability and Artificial Intelligence: A new report from the U.S. Copyright Office

Posted February 20, 2025
Uncopyrightable image generated using Google Gemini, illustrating a group of photographers excited to learn that their nearly identical photos of the public domain Washington Monument are all copyrightable. (“The Office receives ten applications, one from each member of a local photography club. All of the photographs depict the Washington Monument and all of them were taken on the same afternoon. Although some of the photographs are remarkably similar in perspective, the registration specialist will register all of the claims.”) (Compendium of Copyright Office Practices, Section 909.1)

Recently, the United States Copyright Office published its Report on Copyright and Artificial Intelligence, Part 2: Copyrightability, the second report in a three-part series. The Office’s reports and additional related resources can be found on the USCO’s Copyright and Artificial Intelligence webpage.

This latest report was the product of longstanding Copyright Office practices, the USCO’s evolving work and registration guidance in this area, rapid technological developments related to Artificial Intelligence, and over 10,000 comments in response to the Office’s August 2023 Notice of Inquiry. Among those commenters, Authors Alliance submitted both an initial comment and a reply comment in late 2023.

In our comments, we urged the Copyright Office not to pursue revisions to the Copyright Act at this time and instead to work toward providing greater clarity for authors of AI-generated and AI-assisted works (“Instead of proposing revisions to the Copyright Act to enshrine the human authorship requirement in law or clarify the human authorship requirement in the context of AI-generated works, the Office should continue to promulgate guidance for would-be registrants.”). We also noted that, as the technology evolves in the coming years, our ideas about the copyrightability of AI-generated and AI-assisted works will likely shift as well.

We are happy to see that the USCO heard our voice, and that of many others, in concluding that there is no need for legislative change at this time (“The vast majority of commenters agreed that existing law is adequate in this area…”) (Report, page ii). We likewise continue to be aligned with the USCO’s view that works wholly generated by Artificial Intelligence are not copyrightable. Reading through the entirety of the report, it is clear that the Office appreciates that some elements of AI-assisted works will be copyrightable, but believes that the level of human control over the AI output will be central to the copyrightability inquiry (“Whether human contributions to AI-generated outputs are sufficient to constitute authorship must be analyzed on a case-by-case basis.”) (“Based on the functioning of current generally available technology, prompts do not alone provide sufficient control.”) (Report, page iii).

The Office’s report does provide some useful clarity. At the same time, it takes some positions that fail to adequately address the complexity of AI-generated works. Below, we will unpack a number of elements of the report that are noteworthy.  

Modifying or arranging AI-generated content

The report makes it clear that the USCO views selection and arrangement of AI-generated material as a viable path to copyrightability for works created in part with AI. In 2023, when reviewing the graphic novel Zarya of the Dawn, “the Office concluded that a graphic novel comprised of human-authored text combined with images generated by the AI service Midjourney constituted a copyrightable work, but that the individual images themselves could not be protected by copyright.” (Copyright Registration Guidance: Works Containing Material Generated by Artificial Intelligence, page 2) Thus, authors who incorporate AI-generated work into a larger work will often be successful in registering the whole work, but will typically need to disclaim any AI-generated elements.

Alternatively, an author who modifies an AI-generated work outside of the AI environment (e.g., an artist who uses Photoshop to make substantial modifications to an AI-generated image), will usually have a path to copyright registration with the USCO. 

The USCO takes the position that most AI-assisted works are not copyrightable

Unlike an AI-generated image that is later modified manually by a human (which may be copyrightable), a work produced through prompt-based modifications performed entirely within the AI environment is one the USCO is clearly reluctant to view as copyrightable. 

Here, the Office’s position regarding Jason Allen’s attempts to register copyright in the two-dimensional artwork Théâtre D’opéra Spatial is illuminating. In developing the image using Midjourney, Allen claimed to have used over 600 text prompts to both generate and alter the image, and further used Photoshop to “beautify and adjust various cosmetic details/flaws/artifacts, etc.,” a process he viewed as copyrightable authorship. In denying his claim, the Office responded that “when an AI technology receives solely a prompt from a human and produces complex written, visual, or musical works in response, the ‘traditional elements of authorship’ are determined and executed by the technology—not the human user.” (88 FR 16190 – Copyright Registration Guidance: Works Containing Material Generated by Artificial Intelligence, page 16192). 

The USCO dismisses the idea that the process of revising prompts to modify AI output is sufficient to claim copyright in the resulting work. (“Inputting a revised prompt does not appear to be materially different in operation from inputting a single prompt. By revising and submitting prompts multiple times, the user is “re-rolling” the dice, causing the system to generate more outputs from which to select, but not altering the degree of control over the process. No matter how many times a prompt is revised and resubmitted, the final output reflects the user’s acceptance of the AI system’s interpretation, rather than authorship of the expression it contains.”) (Report, page 20) (emphasis added).

Within the report, there is no direct examination of the Théâtre D’opéra Spatial copyright claim and lessons to be learned from it. This is likely due to ongoing litigation between Allen and the USCO. While the USCO has significant practical influence on what materials are protectable under copyright, ultimately the decision falls to the courts. So, this suit and others like it will be important to watch.  Still, the lack of a deeper dive into such a real-world example is unfortunate—such examples offer fertile territory for exploring the boundary lines between copyrightable AI-assisted works and those that will remain uncopyrightable.  

The report offers a sense of possibility with regard to copyrightable AI-assisted works

Towards the end of its report, the USCO briefly explores AI platforms that allow for greater control of the final work. Interestingly, it points to specific features of Midjourney, which allows users to select and modify specific regions of an image. The Office views this as meaningfully different from modifying an AI-generated work through prompts alone, but takes no position as to whether that level of control will result in copyrightable works (“Whether such modifications rise to the minimum standard of originality required under Feist will depend on a case-by-case determination. In those cases where they do, the output should be copyrightable.”) (Report, page 27).

Unanswered Questions

Despite the complexity of these issues, the Office has been able to draw some bright lines (e.g., see this webinar on Registration Guidance for Works Containing AI-generated Content). 

Yet, the Office also acknowledges that there are remaining unanswered questions (“So I know that everyone in their particular area of creativity is looking for, you know, more examples and brighter lines. And I think at this point in time, we’re going to be learning as everyone else is learning…we will be providing more guidance as we learn more.”) (Webinar Transcript, Robert Kasunic, page 10) This recognition that the USCO, like everyone, is still learning is refreshing and welcome, given that it’s fairly easy to see that there are murky waters all around. AI-generated works are already frequently a complex hybrid of AI expression and human expression. 

What are some of these questions? 

  1. The technology is still developing, and it seems likely that the legal complexity will become even more pronounced as sophisticated generative AI evolves to respond to fine-grained feedback from users while also offering expression and suggestions that many users will ultimately adopt. Navigating this complexity will require answering a fundamental question: what threshold level of human control over AI-generated expression is necessary as a prerequisite for copyright protection? 
  2. Similarly, what standards might the Copyright Office or the courts develop to prove sufficient human authorship when it is intermingled with AI-generated content? The copyright registration process currently requires very little information and no documentation related to this question. For now, creators don’t have clear guidance on what types of documentation will be most effective if a future dispute arises. 
  3. To the extent that protection does exist in human-guided but AI-produced content, how will or should the courts determine which elements of what appears to users as a single unified work are uncopyrightable because they are AI-generated? Separating human expression that is enmeshed and embedded within uncopyrightable AI expression will require some framework for distinguishing the two in cases of infringement. Although the courts have already developed methods that may shape this (abstraction, filtration, and comparison, for example), it remains far from clear whether such tests will perform adequately for AI-produced content.

We will be watching developments in this space closely and will continue to advocate for reasonable and flexible approaches to copyrightability that align with the practical realities of authorship in an emerging technological landscape.  

Thomson Reuters v. Ross: The First AI Fair Use Ruling Fails to Persuade

Posted February 13, 2025
A confused judge, generated by Gemini AI

Facts of the Case

On February 11, Third Circuit Judge Stephanos Bibas (sitting by designation in the U.S. District Court for the District of Delaware) issued a new summary judgment ruling in Thomson Reuters v. ROSS Intelligence, reversing his own 2023 decision, which had held that a jury must decide the fair use question. The decision is one of the first to address fair use in the context of AI, though the facts of this case differ significantly from those of the many other pending AI copyright suits. 

This ruling focuses on copyright infringement claims brought by Thomson Reuters (TR), the owner of Westlaw, a major legal research platform, against ROSS Intelligence. TR alleged that ROSS improperly used Westlaw’s headnotes and the Key Number System to train its AI system to better match legal questions with relevant case law. 

Westlaw’s headnotes summarize legal principles extracted from judicial opinions. (Note: Judicial opinions are not copyrightable in the US.) The Key Number System is a numerical taxonomy categorizing legal topics and cases. Clicking on a headnote takes users to the corresponding passage in the judicial text. Clicking on the key number associated with a headnote takes users to a list of cases that make the same legal point. 

Importantly, ROSS did not directly ingest the headnotes and the Key Number System to train its model. Instead, ROSS hired LegalEase, a company that provides legal research and writing services, to create training data based on the headnotes and the Key Number System. LegalEase created Bulk Memos—a collection of legal questions paired with four to six possible answers. LegalEase instructed lawyers to use Westlaw headnotes as a reference to formulate the questions in Bulk Memos. LegalEase instructed the lawyers not to copy the headnotes directly. 

ROSS attempted to license the necessary content directly from TR, but TR refused to grant a license because it thought the AI tool contemplated by ROSS would compete with Westlaw.

The financial burden of defending this lawsuit caused ROSS to shut down its operations. ROSS countered TR’s copyright infringement claims with antitrust counterclaims, but those were dismissed by the same judge. 

The New Ruling

The court found that ROSS copied 2,243 headnotes from Westlaw. The court ruled that these headnotes and the Key Number System met the low legal threshold for originality and were copyrightable. The court rejected the merger and scenes à faire defense by ROSS, because, according to the court, the headnotes and the Key Number System were not dictated by necessity. The court also rejected ROSS’s fair use defense on the grounds that the 1st and 4th factors weighed in favor of TR. At this point, the only remaining issue for trial is whether some headnotes’ copyrights had expired or were untimely registered.

The new ruling has drawn mixed reactions—some saying it undermines potential fair use defenses in other AI cases, while others dismiss its significance since its facts are unique. In our view, the opinion is poorly reasoned and disregards well-established case law. Future AI cases must demonstrate why the ROSS Court’s approach is unpersuasive. Here are three key flaws we see in the ruling.   

Problems with the Opinion

  1. Near-Verbatim Summaries are “Original”?

“A block of raw marble, like a judicial opinion, is not copyrightable. Yet a sculptor creates a sculpture by choosing what to cut away and what to leave in place. … A headnote is a short, key point of law chiseled out of a lengthy judicial opinion.” 

— the ROSS court

(Image omitted: an example of a headnote alongside the uncopyrightable judicial text the headnote was based on.)

The court claims that the Westlaw headnotes are original both individually and as a compilation, and the Key Number System is original and protected as a compilation. 

“Original” has a special meaning in US copyright law: it means that a work has a modicum of human creativity that our society would want to protect and encourage. Based on the evidence that survived redaction, it is nearly impossible to find creativity in any individual headnote. The headnotes consist of verbatim copying of uncopyrightable judicial texts, along with some basic paraphrasing of facts. 

As we know, facts are not copyrightable, but expressions of facts often are. One important safeguard for protecting our freedom to reference facts is the merger doctrine. US law has long recognized that when there are only limited ways to express a fact or an idea, those expressions are not considered “original.” The expressions “merge” with the underlying unprotectable fact, and become unprotectable themselves. 

Judge Bibas gets merger wrong—he claims merger does not apply here because “there are many ways to express points of law from judicial opinions.” This view misunderstands the merger doctrine. It is the nature of human language to be capable of conveying the same thing in many different ways, as long as you are willing to do some verbal acrobatics. But when there are only a limited number of reasonable, natural ways to express a fact or idea—especially when textual precision and terms of art are used to convey complex ideas—merger applies. 

There are many good reasons for this to be the law. For one, this is how we avoid giving copyright protection to concise expression of ideas. Fundamentally, we do not need to use copyright to incentivize the simple restatement of facts. As the Constitution intended, copyright law is designed to encourage creativity, not to grant exclusive rights to basic expressions of facts. We want people to state facts accurately and concisely. If we allowed the first person to describe a judicial text in a natural, succinct way to claim exclusive rights over that expression, it would hinder, rather than facilitate, meaningful discussion of said text, and stifle blog posts like this one. 

As to the selection and arrangement of the Key Number System, the court claims that originality exists here, too, because “there are many possible, logical ways to organize legal topics by level of granularity,” and TR exercised some judgment in choosing the particular “level” with its Key Number System. However, cases are tagged with Key Numbers by an automated computer system, and the topics closely mirror what law schools teach their first-year students. 

The court does not say much about why the compilation of the headnotes should receive separate copyright protection, other than that it qualifies as original “factual compilations.” This claim is dubious because the compilation is of uncopyrightable materials, as discussed, and the selection is driven by the necessity to represent facts and law, not by creativity. Even if the compilation of headnotes is indeed copyrightable, using portions of it that are uncopyrightable is decidedly not an infringement, because the US does not protect sui generis database rights.

  2. Can’t Claim Fair Use When Nobody Saw a Copy?

 “[The intermediate-copying cases] are all about copying computer code. This case is not.” 

— the ROSS court, conveniently ignoring BellSouth Advertising & Publishing Corp. v. Donnelley Information Publishing, Inc., 933 F.2d 952 (11th Cir. 1991), and Sundeman v. Seajay Society, Inc., 142 F.3d 194 (4th Cir. 1998).

In deciding whether ROSS’s use of Westlaw’s headnotes and the Key Number System is transformative under the 1st factor, the court took a moment to consider whether the available intermediate copying case law is in favor of ROSS, and quickly decided against it. 

Even though no consumer ever saw the headnotes or the Key Number System in the AI products offered by ROSS, the court claims that the copying of these constitutes copyright infringement because there existed an intermediate copy that contained copyright-restricted materials authored by Westlaw. And, according to the court, intermediate copying can only weigh in favor of fair use for computer code.

Before turning to the actual case law the court is overlooking here, we wonder if Judge Bibas is in fact unpersuaded by his own argument: under the 3rd fair use factor, he admits that only the content made accessible to the public should be taken into consideration when deciding what amount is taken from a copyrighted work compared to the copyrighted work as a whole, which is contrary to what he argues under the 1st factor—that we must examine non-public intermediate copies. 

Intermediate copying is the process of producing a preliminary, non-public work as an interim step in the creation of a new public-facing work. It is well established under US jurisprudence that any type of copying, whether private or public, satisfies a prima facie copyright infringement claim, but the fact that a work was never shared publicly—nor intended to be shared publicly—strongly favors fair use. For example, in BellSouth Advertising & Publishing Corp. v. Donnelley Information Publishing, Inc., the Eleventh Circuit decided that directly copying a competitor’s yellow pages business directory in order to produce a competing yellow pages was fair use when the resulting publicly accessible yellow pages the defendant created did not directly incorporate the plaintiff’s work. Similarly, in Sundeman v. Seajay Society, Inc., the Fourth Circuit concluded that it was fair use when the Seajay Society made an intermediary, entire copy of the plaintiff’s unpublished manuscript for a scholar to study and write about. The scholar wrote several articles about the manuscript, mostly summarizing important facts and ideas (while also using short quotations).

There are many good reasons for allowing intermediate copying. Clearly, we do not want ALL unlicensed copies to be subject to copyright infringement lawsuits, particularly when intermediate copies are made in order to extract unprotectable facts or ideas. More generally, intermediate copying is important to protect because it helps authors and artists create new copyrighted works (e.g., sketching a famous painting to learn a new style, translating a passage to practice your language skills, copying the photo of a politician to create a parody print t-shirt). 

  3. Suddenly, We Have an AI Training Market?

“[I]t does not matter whether Thomson Reuters has used [the headnotes and the Key Number System] to train its own legal search tools; the effect on a potential market for AI training data is enough.”

 — the ROSS court

The 4th fair use factor is very much susceptible to circular reasoning: if a user is making a derivative use of my work, surely that proves a market already exists or will likely develop for that derivative use, and, if a market exists for such a derivative use, then, as the copyright holder, I should have absolute control over such a market.

The ROSS court runs full tilt into this circular trap. In the eyes of the court, ROSS, by virtue of using Westlaw’s data in the context of AI training, has created a legitimate AI training data market that should be rightfully controlled by TR.

Except that our case law instructs that the 4th factor’s “market substitution” analysis considers only markets that are traditional, reasonable, or likely to be developed. As we have already pointed out in a previous blog post, copyright holders must offer concrete evidence of the existence, or the likelihood of development, of a licensing market before they can argue that a secondary use serves as a “market substitute.” If we allowed a copyright holder’s protected market to include everything he is willing to receive licensing fees for, it would all but wipe out fair use in the service of stifling competition. 

Conclusion

The impact of this case is currently limited, both because it is a district court ruling and because it concerns non-generative AI. However, it is important to remain vigilant, as the reasoning put forth by the ROSS court could influence other judges, policymakers, and even the broader public, if left unchallenged.

This ruling combines several problematic arguments that, if accepted more widely, could have significant consequences. First, it blurs the line between fact and expression, suggesting that factual information can become copyrightable simply by being written down by someone in a minimally creative way. Second, it expands copyright enforcement to intermediate copies, meaning that even temporary, non-public use of copyrighted material could be subject to infringement claims. Third, it conjures up a new market for AI training data, regardless of whether such a licensing market is legitimate or even likely to exist.

If these arguments gain traction, they could further entrench the dominance of a few large AI companies. Only major players like Microsoft and Meta will be able to afford AI training licenses, consolidating control over the industry. The AI training licensing terms will be determined solely between big AI companies and big content aggregators, without representation of individual authors or public interest.  The large content aggregators will get to dictate the terms under which creators must surrender rights to their works for AI training, and the AI companies will dictate how their AI models can be used by the general public. 

Without meaningful pushback and policy intervention, smaller organizations and individual creators cannot participate fairly. Let’s not rewrite our copyright laws to entrench this power imbalance even further.

Artificial Intelligence, Authorship, and the Public Interest

Posted January 9, 2025
Photo by Robert Anasch on Unsplash

Today, we’re pleased to announce a new project generously supported by the John S. and James L. Knight Foundation. The project, “Artificial Intelligence, Authorship, and the Public Interest,” aims to identify, clarify, and offer answers to some of the most challenging copyright questions posed by artificial intelligence (AI) and explain how this new technology can best advance knowledge and serve the public interest.

Artificial intelligence has dominated public conversation about the future of authorship and creativity for several years. Questions abound about how this technology will affect creators’ incentives, influence readership, and what it might mean for future research and learning. 

At the heart of these questions is copyright law. Over two dozen class-action copyright lawsuits have been filed between November 2022 and today against companies such as Microsoft, Google, OpenAI, Meta, and others. Additionally, congressional leadership, state legislatures, and regulatory agencies have held dozens of hearings to reconcile existing intellectual property law with artificial intelligence. As one of the primary legal mechanisms for promoting the “progress of science and the useful arts,” copyright law plays a critical role in creating, producing, and disseminating information. 

We are convinced that how policymakers shape copyright law in response to AI will have a lasting impact on whether and how the law supports democratic values and serves the common good. That is why Authors Alliance has already devoted considerable effort to these issues, and this project will allow us to expand those efforts at this critical moment. 

AI Legal Fellow
As part of the project, we're pleased to add an AI Legal Fellow to our team. The position requires a law degree and demonstrated interest and experience in artificial intelligence, intellectual property, and legal technology issues. We're particularly interested in candidates with a demonstrated interest in how copyright law can serve the public interest. The role involves significant research and writing. Pay is $90,000/yr for a two-year term position. Read more about the position here. We'll begin reviewing applications immediately and conduct interviews on a rolling basis until the position is filled.

As we get going, we’ll have much more to say about this project. We will have some funds available to support research subgrants, organize several workshops and symposia, and offer numerous opportunities for public engagement. 

About the John S. and James L. Knight Foundation
We are social investors who support democracy by funding free expression and journalism, arts and culture in community, research in areas of media and democracy, and in the success of American cities and towns where the Knight brothers once had newspapers. Learn more at kf.org and follow @knightfdn on social media.

Developing a public-interest training commons of books

Posted December 5, 2024
Photo by Zetong Li on Unsplash

Authors Alliance is pleased to announce a new project, supported by the Mellon Foundation, to develop an actionable plan for a public-interest book training commons for artificial intelligence. Northeastern University Library will be supporting this project and helping to coordinate its progress.

Access to books will play an essential role in how artificial intelligence develops. AI's Large Language Models (LLMs) have a voracious appetite for text, and there are good reasons to think that these data sets should include books, and lots of them. Over the last 500 years, human authors have written over 129 million books. These volumes, preserved for future generations in some of our most treasured research libraries, are perhaps the best and most sophisticated reflection of all human thinking. Their high editorial quality, breadth, and diversity of content, as well as the unique way they employ long-form narratives to communicate sophisticated and nuanced arguments and ideas, make them ideal sources of training data for AI.

These collections and the text embedded in them should be made available under ethical and fair rules as the raw material that will enable the computationally intense analysis needed to inform new AI models, algorithms, and applications imagined by a wide range of organizations and individuals for the benefit of humanity. 

Currently, AI development is dominated by a handful of companies that, in their rush to beat other competitors, have paid insufficient attention to the diversity of their inputs, questions of truth and bias in their outputs, and questions about social good and access. Authors Alliance, Northeastern University Library, and our partners seek to correct this tilt through the swift development of a counterbalancing project that will focus on AI development that builds upon the wealth of knowledge in nonprofit libraries and that will be structured to consider the views of all stakeholders, including authors, publishers, researchers, technologists, and stewards of collections. 

The main goal of this project is to develop a plan for either establishing a new organization or identifying the relevant criteria for an existing organization (or partnership of organizations) to take on the work of creating and stewarding a large-scale public interest training commons of books.

We seek to answer several key questions, such as: 

  • What the right goals and mission are for such an effort, taking into account both the long and short term;
  • What technical and logistical challenges might arise that differ from existing library-led efforts to provide access to collections as data;
  • How to develop a sufficiently large and diverse corpus to offer a reasonable alternative to existing sources;
  • What a public-interest governance structure should look like, taking into account the particular challenges of AI development;
  • How we, as a collective of stakeholders from authors and publishers to students, scholars, and libraries, can sustainably fund such a commons, including a model for long-term maintenance, transformation, and growth of the corpus over time;
  • Which combination of legal pathways is acceptable to ensure books are lawfully acquired in a way that minimizes legal challenges;
  • How to respect the interests of authors and rightsholders by accounting for concerns about consent, credit, and compensation; and
  • How to distinguish between the different needs and responsibilities of nonprofit researchers, small market entrants, and large commercial actors.

The project will include two meetings during 2025 to discuss these questions and possible ways forward, additional research and conversations with stakeholders, and the development and release of an ambitious yet achievable roadmap.

New White Paper on Open Access and U.S. Federal Information Policy

Posted November 18, 2024
Photo by Sara Cottle on Unsplash

Authors Alliance and SPARC have released the first of four planned white papers addressing legal issues surrounding open access to scholarly publications under the 2022 OSTP memo (the “Nelson Memo”). The white papers are part of a larger project (described here) to support legal pathways to open access. 

This first paper discusses the “Federal Purpose License,” which is newly relevant to discussions of federal public access policies in light of the Nelson Memo.

The white paper is available here and supporting materials are here.

The FPL, found in 2 C.F.R. § 200.315(b), works like any other copyright licensing agreement between two parties. It is a voluntary agreement between author and agency under which, as a condition of federal funding, the agency reserves a nonexclusive license to "reproduce, publish, or otherwise use the work for Federal purposes and to authorize others to do so." The FPL was updated, effective October 1, to clarify that the reserved license specifically includes the right to deposit copyrighted works produced pursuant to a grant in agency-designated public access repositories.

With the OSTP memos instructing all agencies to make the results of federally funded projects available to the public immediately upon publication, the FPL provides an elegant legal basis for doing so. Because the FPL is a signed, written, non-exclusive license that springs to life the moment copyright in the works vests, it survives any future transfers of rights in the work. As part of the Uniform Guidance for all grant-making agencies, it provides consistency across federal grants, simplifying things for grant recipients, who have plenty of other things to worry about (it's not entirely uniform, though, since some agencies have supplemented the FPL with license text of their own, expanding their rights under the license).

This protects both agencies and authors. Agencies must have permission to host and distribute works in their repositories. The FPL ensures that the agency has that authorization and that it continues even after publication rights have been assigned to a publisher. Meanwhile, authors are—or will be—required under their grant agreements to deposit their federally funded peer-reviewed articles in the agency's designated repository. The FPL ensures that, even if an author were to assign exclusive rights in a work to a publisher before complying with the deposit mandate, the author could still make the deposit, despite no longer holding any rights in the work herself.

The paper analyzes two ambiguous points in the FPL: the scope of the rights agencies hold for "Federal purposes," and what rights the agency may subsequently authorize for third parties. As there are no clear answers to these questions, the paper does not draw conclusions; it does, however, attempt to give some context and basis for interpreting the FPL.

The next papers in this series will explore issues surrounding the legal authority underlying the public access policy, article versioning, and the policy’s interaction with institutional IP policies. Stay tuned for more!

Revived Class Action Against McGraw Hill: the Importance of Publishing Contracts

Posted November 15, 2024

On November 6th, the Second Circuit Court of Appeals overturned the lower court's dismissal in Flynn v. McGraw Hill and allowed the plaintiffs' breach of contract claim to move forward.

The breach of contract claim involves McGraw Hill's alleged practice of reducing or ceasing royalty payments on revenues generated through McGraw Hill's online platform, Connect, which has hosted electronic textbooks and related course materials since its launch in 2009. The publishing contracts at issue specified that McGraw Hill would publish the plaintiffs' textbooks "at its own expense" and that royalties would be based on "Publisher's net receipts," defined in most cases as "the Publisher's selling price, less discounts, credits, and returns, or a reasonable reserve for returns." Although the initially signed contracts covered only print works, McGraw Hill later amended them to cover electronic works under the same royalty structure. McGraw Hill paid royalties based on the entire revenue from ebook sales through Connect, which included both the ebook and its accompanying materials, such as PowerPoint lesson plans and test banks.

This changed in 2020, according to the plaintiffs, when McGraw Hill started paying royalties solely on sales attributed to the ebooks, excluding revenue derived from the accompanying materials, despite the fact that those materials cannot be bought independently of the ebook. Under the new practice, McGraw Hill would unilaterally determine which part of the revenue is attributable to the ebooks, the accompanying materials, or the Connect platform, even though sales are always made at a "single unitary price."
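
To make the alleged change concrete, here is a hypothetical sketch in Python. The royalty rate, bundle price, and attribution share are all invented for illustration; none of these figures come from the case record.

```python
# Hypothetical illustration of the royalty change alleged in Flynn v. McGraw Hill.
# All figures below are invented for illustration only.

def royalty(net_receipts: float, rate: float) -> float:
    """Royalty owed as a share of the publisher's net receipts."""
    return net_receipts * rate

bundle_price = 100.0  # the "single unitary price" for ebook + companion materials
rate = 0.15           # assumed contractual royalty rate (hypothetical)

# Pre-2020 practice: royalties paid on the entire bundle revenue.
old_royalty = royalty(bundle_price, rate)

# Alleged post-2020 practice: the publisher attributes only part of the
# unitary price to the ebook and pays royalties on that share alone.
ebook_share = 0.70    # publisher-determined attribution (hypothetical)
new_royalty = royalty(bundle_price * ebook_share, rate)

drop = (old_royalty - new_royalty) / old_royalty
print(f"old: ${old_royalty:.2f}  new: ${new_royalty:.2f}  drop: {drop:.0%}")
```

With these invented numbers, attributing 70% of the unitary price to the ebook produces a 30% drop in royalties, even though the author's contractual rate never changed.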

The plaintiffs argue that this new arrangement violated McGraw Hill’s promise to publish the works “at its own expense,” a provision that should have meant authors wouldn’t be charged for the cost of operating or maintaining the publisher’s infrastructure; this claim is now allowed to go forward. The claim related to “net receipts” was again dismissed.

While the ongoing developments in this case are worth watching closely, it also serves as a timely reminder—especially in light of publishers’ licensing content for AI training—for authors to carefully review and negotiate their publishing agreements, and to rely on the contractual terms that hold publishers accountable to their promises.

Let’s take this opportunity to quickly remind ourselves of a couple of less-discussed contractual terms that may in fact be too important to ignore.

1. “…media now known and may be developed in the future”

The harm plaintiffs claim in this case is a whopping 25% to 35% drop in royalties when works are published on McGraw Hill's online platform. Although the case arose only out of the electronic rights to textbooks, it reminds us how the advent of new technology can easily undermine, rather than boost, the income of authors.

Barely a decade ago, most publishing industry experts believed that the economics of ebook publishing were more favorable to publishers, as ebooks are cheaper to produce than print books, and that authors should therefore expect to receive a much larger share of the revenue—well above the typical 10-15% of the retail price for trade books.

The Flynn case confirms many authors' suspicion that they may not necessarily share in the financial boon brought by new technologies. It is thus important for authors to be wary of a broad copyright license that covers all future technologies for disseminating their works.

It’s worth reviewing terms that address the publisher’s ability to license your works in specific contexts, including digital platforms and emerging technologies that are not named. Instead of “media now known and may be developed in the future,” authors should consider limiting the publication of their works to specific, enumerated media, such as print books or ebooks. Failing that, authors should propose alternative terms that safeguard their interests, such as a clause that allows for rights reversion if royalties fall below a certain level.

2. Royalty Audit

A common feature of publishing contracts is a clause that allows authors to audit the publisher’s accounting. While it may not seem like a top priority at first glance, authors should absolutely take advantage of this provision if it’s included in their agreement. An audit right provides authors with the legal right to review the publisher’s financial records to verify whether they are being compensated fairly and according to the terms of the contract.

Authors in the Flynn case learned about the new royalty arrangement through an email from the publisher. It is of course important for authors to monitor any communications sent by their publishers. However, it is not certain that publishers will always disclose a new method of calculating royalties, and it is certainly not a given that their accounting is free of mistakes. When authors become suspicious of their publisher’s deductions or other financial practices, the ability to audit can be crucial. Publishers may make deductions or shift expenses in ways that are not immediately obvious from the royalty statements authors receive. An audit can help uncover whether a publisher is deducting unjustified expenses (such as fees for maintaining online systems, as in this case). The audit right can be an essential tool for discovering accounting discrepancies and ensuring the publisher is acting in good faith.

As generative AI tools become more prevalent, many authors are concerned about how their works may be used for AI training without their knowledge or consent. It’s important to remember that not all contracts automatically grant publishers or other entities the right to license works for use in AI training. If you have retained sublicensing rights, or your publishing contract offers a broad definition of net receipts or profits, you could be entitled to a share of the revenue your publisher earns from licensing your works for AI training.

Just as with traditional royalties, income from AI licensing should be distributed according to the terms of the contract. If you’re uncertain whether you are being fairly compensated, don’t hesitate to use your audit right to request detailed information from your publisher.

Final Thoughts: Be Proactive and Stay Informed

At the heart of the Flynn v. McGraw Hill case is a breach of contract claim. The plaintiffs argue that McGraw Hill’s royalty deductions for maintaining its online system violated the terms of the agreement. Central to the argument is the publisher’s promise to publish “at its own expense.” This case serves as a prime example of how important it is to scrutinize the details of a publishing agreement, where the devil often lies.

Many publishing agreements are complex and may contain clauses that, while seemingly minor, can have significant financial and creative consequences. It’s essential that authors take the time to review their contracts thoroughly, ideally consulting with colleagues and mentors who have more extensive experience with similar situations, to fully understand—at the very least—how their income will be calculated and what rights they are granting to the publisher.

The DMCA 1201 Rulemaking: Summary, Key Takeaways, and Other Items of Interest

Posted November 8, 2024

Last month, we blogged about the key takeaways from the 2024 TDM exemptions recently put in place by the Librarian of Congress, including how the 2024 exemptions (1) expand researchers’ access to existing corpora, (2) definitively allow the viewing and annotation of copyrighted materials for TDM research purposes, and (3) create new obligations for researchers to disclose security protocols to trade associations. Beyond these key changes, the TDM exemptions remain largely the same: researchers affiliated with universities are allowed to circumvent TPMs to compile corpora for TDM research, provided that those copies of copyrighted materials are legally obtained and adequate security protocols are put in place.

We have since updated our resources page on Text and Data Mining and have incorporated the new developments into our TDM report: Text and Data Mining Under U.S. Copyright Law: Landscape, Flaws & Recommendations.

In this blog post, we share some further reflections on the newly expanded TDM exemptions—including (1) the use of AI tools in TDM research, (2) outside researchers’ access to existing corpora, (3) the disclosure requirement, and (4) a potential TDM licensing market—as well as other insights that emerged during the 9th triennial rulemaking.

The TDM Exemption

In other jurisdictions, such as the EU, Singapore, and Japan, legal provisions that permit “text and data mining” also allow a broad array of uses, such as general machine learning and generative AI model training. In the US, exemptions allowing TDM have so far not explicitly addressed whether AI could be used as a tool for conducting TDM research. In this round of rulemaking, we gained clarity on how AI tools may aid TDM research. Advocates for the TDM exemptions provided ample examples of how machine learning and AI are key to conducting TDM research and asked that “generative AI” not be deemed categorically impermissible as a tool for TDM research. The Copyright Office agreed that a wide array of tools, including AI tools, could be utilized for TDM research under the exemptions, as long as the purpose is to conduct “scholarly text and data mining research and teaching.” The Office was careful to limit its analysis to those uses and not address other applications, such as compiling data—or reusing existing TDM corpora—for training generative AI models; it views those as entirely separate issues from facilitating non-commercial TDM research.

Besides clarifying that AI tools are allowed for TDM research and that viewing and annotation of copyrighted materials are permitted, the new exemptions offer meaningful improvement to TDM researchers’ access to corpora. The previous 2021 exemptions allowed access for purposes of “collaboration,” but many researchers interpreted that narrowly, and the Office confirmed that “collaboration” was not meant to encompass outside research projects entirely unrelated to the original research for which the corpus was created. Under the 2021 exemptions, a TDM corpus could only be accessed by outside researchers if they were working on the same research project as the original compiler of the corpus. The 2024 exemptions’ expansion of access to existing corpora has two main components and advantages.

The expansion now allows new research projects to be conducted on existing corpora, permitting institutions that have created a corpus to provide access “to researchers affiliated with other nonprofit institutions of higher education, with all access provided only through secure connections and on the condition of authenticated credentials, solely for purposes of text and data mining research or teaching.” At the same time, it opens up new possibilities for researchers at institutions that otherwise would not have access, as the new exemption does not require that the outside researchers’ institutions otherwise own copies of works in the corpora. The new exemptions do impose some important limitations: only researchers at institutions of higher education are allowed this access, and nothing more than “access” is allowed—the exemption does not, for example, allow the transfer of a corpus for local use.

The Office emphasized the need for adequate security protections, pointing back to cases such as Authors Guild v. Google and Authors Guild v. HathiTrust, which emphasized how careful both organizations were to prevent their digitized corpora from being misused. To take advantage of this newly expanded TDM exemption, it will be crucial for universities to provide adequate IT support to ensure that technical barriers do not impede TDM researchers. That said, the record for the exemption shows that existing users are exceedingly conscientious when it comes to security: there have been zero reported instances of security breaches or lapses related to TDM corpora compiled and used under the exemptions.

As we previously explained, the security requirements have changed in a few ways. The new rule clarifies that trade associations can send inquiries on behalf of rightsholders. However, inquiries must be supported by a “reasonable belief” that the sender’s works are in a corpus being used for TDM research. It remains to be seen how the new obligation to disclose security measures to trade associations will impact TDM researchers and their institutions. The Register indirectly called out, as unreasonable, the demands that trade associations sent to digital humanities researchers in the middle of the exemption process with a two-week response deadline, and quoted NTIA (which provides input on the exemptions) in agreement that “[t]he timing, targeting, and tenor of these requests [for institutions to disclose their security protocols] are disturbing.” We are hopeful that this discouragement from the Copyright Office will prevent future large-scale harassment of TDM researchers and their institutions, but we will remain vigilant in case trade associations abuse this new power.

Alongside the concerns over disclosure requirements, we have some questions about the Copyright Office’s treatment of fair use as a rationale for circumventing TPMs for TDM research. The Register restated her 2021 conclusion that “under Authors Guild, Inc. v. HathiTrust, lost licensing revenue should only be considered ‘when the use serves as a substitute for the original.’” The Office, in its recommendations, placed considerable weight on the lack of a viable licensing market for TDM, which raises a concern that, in the Office’s view, a use that was once fair and legal might lose that status when the rightsholder starts to offer an adequate licensing option. While this may never become a real issue for the existing TDM exemptions (no licensing option sufficient for the breadth and depth of content TDM researchers need exists, and one seems unlikely ever to develop), it nonetheless contributes to the growing confusion surrounding the stability of a fair use defense in the face of new licensing markets.

These concerns highlight the need for ongoing advocacy in the realm of TDM research. Overall, the Register of Copyrights recognizes TDM as “a relatively new field that is quickly evolving.” This means we could ask the Library of Congress to relax the limitations placed on TDM if we can point to legitimate research-related purposes. But, due to the nature of this process, it also means TDM researchers do not have a permanent and stable right to circumvent TPMs. Because the exemptions remain subject to review every three years, many large trade associations advocate for the TDM exemptions to be greatly limited or even eliminated, wishing to stifle independent TDM research. We will continue to advocate for TDM researchers, as we did during the 8th and 9th triennial rulemakings.

Looking beyond the TDM exemption, we noted a few other developments: 

Warhol has not fundamentally changed fair use

First, the Opponents of renewing the existing exemptions repeatedly pointed to Warhol Foundation v. Goldsmith—the Supreme Court’s most recent fair use opinion—to argue that it changed the fair use analysis such that the existing exemptions should not be renewed. For example, the Opponents argued that the fair use analysis for repairing medical devices changed under Warhol because, in their view, commercial nontransformative uses are now less likely to be fair. The Copyright Office did not agree. The Register said that the same fair use analysis as in 2021 applied and that the Opponents failed “to show that the Warhol decision constitutes intervening legal precedent rendering the Office’s prior fair use analysis invalid.” In another instance, where the Opponents argued that commerciality must be given more weight under Warhol, the Register pointed out that under Warhol commerciality is not dispositive and must be weighed against the purpose of the new use. The arguments for revisiting the 2021 fair use analyses were uniformly rejected, which we think is good news for those of us who believe Warhol should be read as a modest adjustment to fair use, not a wholesale reworking of the doctrine.

Do ownership and control of copies matter for access? 

One of the requests before the Office was an expansion of an exemption that allows for access to preservation copies of computer programs and video games. The Office rejected the main thrust of the request but, in doing so, also provided an interesting clarification that may reveal some of the Office’s thinking about the relationship between fair use and access to copies owned by the user: 

“The Register concludes that proponents did not show that removing the single user limitation for preserved computer programs or permitting off-premises access to video games are likely to be noninfringing. She also notes the greater risk of market harm with removing the video game exemption’s premises limitation, given the market for legacy video games. She recommends clarifying the single copy restriction language to reflect that preservation institutions can allow a copy of a computer program to be accessed by as many individuals as there are circumvented copies legally owned.”

That sounds a lot like an endorsement of the idea that the owned-to-loaned ratio, a key concept in the controlled digital lending analysis, should matter in the fair use analysis (something the court in the Hachette v. Internet Archive controlled digital lending case gave zero weight). For future 1201 exemptions, we will have to wait and see whether the Office uses this framework in other contexts.
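
The constraint the Register describes can be sketched in a few lines; the function name and structure below are ours for illustration, not language from the rule text, and they simply encode the idea that simultaneous users may not exceed legally owned, circumvented copies.

```python
# Sketch of an "owned-to-loaned" access check: a preservation institution may
# let a program be accessed by no more individuals than the number of
# circumvented copies it legally owns. Names are illustrative only.

def may_grant_access(copies_owned: int, current_users: int) -> bool:
    """True if admitting one more simultaneous user still respects the ratio."""
    return current_users < copies_owned

print(may_grant_access(copies_owned=3, current_users=2))  # room for one more
print(may_grant_access(copies_owned=3, current_users=3))  # at the limit
```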

Addressing other non-copyright and AI questions in the 1201 process

The Librarian of Congress’s final rule included a number of notes on issues not addressed by the rulemaking: 

“The Librarian is aware that the Register and her legal staff have invested a great deal of time over the past two years in analyzing the many issues underlying the 1201 process and proposed exemptions. 

Through this work, the Register has come to believe that the issue of research on artificial intelligence security and trustworthiness warrants more general Congressional and regulatory attention. The Librarian agrees with the Register in this assessment. As a regulatory process focused on technological protection measures for copyrighted content, section 1201 is ill-suited to address fundamental policy issues with new technologies.” 

Proponents tried to argue that the software platforms’ restrictions and barriers to conducting AI research, such as account requirements, rate limits, and algorithmic safeguards, are circumventable TPMs under 1201, but the Register disagreed. The Register maintained that the challenges the Proponents described arose not out of circumventable TPMs but out of third-party-controlled Software as a Service platforms. This decision may be instructive for TDM researchers seeking to conduct research on online streaming media or social media posts.

The Librarian’s note went on to say: “The Librarian is further aware of the policy and legal issues involving a generalized ‘right to repair’ equipment with embedded software. These issues have now occupied the White House, Congress, state legislatures, federal agencies, the Copyright Office, and the general public through multiple rounds of 1201 rulemaking. 

Copyright is but one piece in a national framework for ensuring the security, trustworthiness, and reliability of embedded software, as well as other copyright-protected technology that affects our daily lives. Issues such as these extend beyond the reach of 1201 and may require a broader solution, as noted by the NTIA.”

These notes give an interesting, though somewhat confusing, insight into how the Librarian of Congress and the Copyright Office think about the role of 1201 rulemaking when it touches issues beyond copyright’s core concerns. While we agree that 1201 is ill-suited to address fundamental policy issues with new technology, it is also somewhat concerning that the Office and the Librarian view copyright more generally as part of a broader “national framework for ensuring the security, trustworthiness, and reliability of embedded software.” While copyright is, of course, sometimes used to further ends outside its intended purpose, these issues are far from the core constitutional purpose of copyright law, and we think they are best addressed through other means.

Copyright Management Information, 1202(b), and AI

Posted October 30, 2024

This post is by Maria Crusey, a third-year law student at Washington University in St. Louis. Maria has been working with Authors Alliance this semester on a project exploring legal claims in the now 30+ pending copyright AI lawsuits. 

In the recent spate of copyright infringement lawsuits against AI developers, many plaintiffs allege that the developers violated 17 U.S.C. § 1202(b) in their use of copyrighted works for training and development of AI systems.

Section 1202(b) prohibits the “removal or alteration of copyright management information.” Compared to the related provisions in 17 U.S.C. § 1201, which protect against circumvention of copyright protection systems, §1202(b) has seldom been litigated at the appellate level, and there’s a growing divide among district courts about whether §1202(b) should apply to derivative works, particularly those created using AI technology.

At first glance, §1202(b) appears to be a straightforward provision. However, the uptick in §1202(b) claims raises some challenging questions, namely: How does §1202(b) apply to the use of a copyrighted work as part of a dataset that must be cleaned, restructured, and processed in ways that separate copyright management information from the content itself? And how should §1202(b) apply to AI systems that may reproduce small portions of the content contained in training data? Answers to these questions may have serious implications in the AI suits because violations of §1202(b) can come with hefty statutory damage awards: between $2,500 and $25,000 for each violation. Spread across millions of works, the damages could be staggering. How the courts resolve this issue could also impact many other reuses of copyrighted works, from analogous uses such as text and data mining research to much more routine redistribution of copyrighted works in other contexts.
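
To see why the stakes are so high, a back-of-the-envelope calculation helps. The statutory range per violation is from the text above; the one-million-work corpus size is a hypothetical, not a figure from any particular complaint.

```python
# Statutory damages for Section 1202 violations run from $2,500 to $25,000
# per violation. If each work in a training set counted as one violation,
# totals scale linearly with corpus size.
MIN_PER_VIOLATION = 2_500
MAX_PER_VIOLATION = 25_000

def damages_range(num_works: int) -> tuple[int, int]:
    """(low, high) total statutory damages if each work is one violation."""
    return num_works * MIN_PER_VIOLATION, num_works * MAX_PER_VIOLATION

low, high = damages_range(1_000_000)  # hypothetical one-million-work corpus
print(f"${low:,} to ${high:,}")
```

Even at the statutory minimum, a million works would imply $2.5 billion in damages, which is why the scope of §1202(b) matters so much in these suits.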

One of these AI cases has requested that the Ninth Circuit Court of Appeals accept an interlocutory appeal on just this issue, and we are waiting to see whether the court will accept it.

For an introduction to §1202(b) and observations on this question, among others, read on:

What is § 1202(b) and what is it intended to do?

Broadly, 17 U.S.C. § 1202 is a provision of the Digital Millennium Copyright Act (DMCA) that protects the integrity of copyright management information (“CMI”). Per §1202(c), CMI comprises certain information identifying a copyrighted work, often including the title, the name of the author, and terms and conditions for the use of a work.

Section 1202(b) forbids the alteration or removal of copyright management information. The section provides that:

“[n]o person shall, without the authority of the copyright owner or the law – 

(1) intentionally remove or alter any CMI,

(2) distribute or import for distribution CMI knowing that the CMI has been removed or altered without authority of the copyright owner or the law, or 

(3) distribute, import for distribution, or publicly perform works, copies of works or phonorecords, knowing that copyright management information has been removed or altered without authority of the copyright owner or the law, knowing, or with respect to civil remedies under section 1203, having reasonable grounds to know that it will induce, enable, facilitate, or conceal an infringement of any right under this title.”

17 U.S.C. § 1202(b).

Congress primarily aimed to limit the assistance and enablement of copyright infringement in enacting §1202(b). This purpose is evident in the provision’s legislative history. In an address to a congressional subcommittee prior to the adoption of the DMCA, the then–Register of Copyrights, Marybeth Peters, discussed the aims of §1202(b). First, Peters noted that the requirements of §1202(b) would make CMI more reliable and thus aid in the administrability of copyright law. Second, Peters stated that §1202(b) would help prevent instances of copyright infringement that could follow from the removal of CMI. The idea is that if a copyrighted work lacks CMI, there is a greater likelihood of infringement, since others may use the work under the pretense that they are the author or copyright holder. By making a party’s removal of CMI a statutory violation regardless of any later infringing activity, §1202(b) functions as damage control against potential copyright infringement.

What are the essential elements of a § 1202(b) claim?

To have a claim under §1202(b), a plaintiff must allege particularized facts about the existence and alteration or removal of CMI. Additionally, some courts require a plaintiff to demonstrate that the defendant had knowledge that the CMI was being altered or removed and that the alteration or removal would enable copyright infringement. Finally, some courts have required plaintiffs to show that the work with the altered or removed CMI is an exact copy of the original work–what has become known as the “identicality” requirement. This last “identicality” requirement is one of the main issues in the AI lawsuits raising §1202(b) and is detailed further below.

→ The “Identicality” Requirement

Courts that have imposed “identicality” have required that plaintiffs demonstrate that the work with the removed CMI is an exact copy of the original work and thus is “identical,” except for the missing or altered CMI. 

Suppose, for example, a photographer owns the copyright to a photograph they took. The photographer adds CMI to the photograph and takes care to protect the integrity of the work as it is dispersed online. A third party captures the photograph posted on a website by taking a screenshot and removes the CMI from the copied image while keeping all other aspects of the original photograph the same. The screenshot with the removed CMI is an “exact copy” of the original photograph because the only difference between the copyrighted photograph and the screenshot is the removal of the CMI.
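The intuition behind this example can be sketched in code. The check below is a hypothetical illustration only, not the legal test: it treats a copy as “identical except for CMI” if, after naively stripping lines that look like CMI (here, any line containing “Copyright” or “©” — an assumption for illustration), the remaining content matches the original exactly.

```python
# Hypothetical sketch of the "identical except for CMI" intuition.
# What counts as CMI here (any line containing "Copyright" or the ©
# symbol) is a naive assumption for illustration, not the statutory test.

def strip_cmi(text: str) -> str:
    """Remove lines that look like copyright management information."""
    kept = [line for line in text.splitlines()
            if "Copyright" not in line and "©" not in line]
    return "\n".join(kept)

def identical_except_cmi(original: str, copy: str) -> bool:
    """True if the two works match exactly once CMI lines are removed."""
    return strip_cmi(original) == strip_cmi(copy)

original = "© 2025 Jane Doe\nA red bird on a wire.\n"
screenshot = "A red bird on a wire.\n"  # CMI stripped, content intact
snippet = "A red bird"                  # only a fragment of the work

print(identical_except_cmi(original, screenshot))  # True
print(identical_except_cmi(original, snippet))     # False
```

A complete copy with only the CMI removed passes the check, while a partial copy fails it. That gap — whether a fragment can ever satisfy identicality — is roughly the dividing line the courts discussed below disagree about.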

Federal courts are divided on the identicality requirement for §1202(b) claims, and the circuit courts have not yet addressed the issue. Notably, district courts within the Ninth Circuit have varied in their treatment of the requirement. For example, the court for the District of Nevada in Oracle v. Rimini Street declined to impose the identicality requirement, reasoning that it may weaken the intended protections for copyright holders under §1202(b). Conversely, in Kirk Kara Corp. v. W. Stone & Metal Corp., a court in the Central District of California applied the identicality requirement, though it provided little explanation for why it adopted it. Application of the identicality requirement is also unsettled in district courts beyond the Ninth Circuit (see, for example, this Southern District of Texas case discussing the identicality requirement at length and rejecting it).

What are the §1202(b) claims at issue in the present suits?

The claims in Doe 1 v. GitHub exemplify the §1202(b) issues common among the present suits, and it is the GitHub suit that is now before the Ninth Circuit Court of Appeals, which is deciding whether to hear the appeal.

In GitHub, owners of copyrights in software code brought suit against GitHub, a software developer platform. The plaintiffs alleged that Microsoft Copilot, an AI product developed in part by GitHub, illegally removed CMI from their works. The plaintiffs stored their software in GitHub’s publicly accessible software repositories under open-source license agreements. The plaintiffs claimed that GitHub removed CMI from their code and trained the Copilot AI model on the code in violation of the license agreements. Moreover, the plaintiffs claimed that, when prompted to generate software code, Copilot includes unique aspects of the plaintiffs’ code in its outputs. In their complaint, the plaintiffs alleged that all requirements for a valid §1202(b) claim were met. The plaintiffs stressed that, by removing CMI, the defendants prevented users of their products from making non-infringing use of the plaintiffs’ code. Consequently, they claim, the defendants removed the CMI knowing that it would “induce, enable, facilitate, and/or conceal infringement” of copyrights in violation of the DMCA.

Regarding the §1202(b) claims, the parties contest the application of the identicality requirement. The plaintiffs first argue that § 1202 contains no such requirement: “The plain language of DMCA § 1202 makes it a violation to remove or alter CMI. It does not require that the output work be original or identical to obtain relief. . . By a plain reading of the statute, there is no need for a copy to be identical—there only needs to be copying, which Plaintiffs have amply alleged.” 

As a backstop, the plaintiffs further argue that Copilot does produce “near-identical reproduction[s]” of their copyrighted code, and allege this is sufficient to fulfill the identicality requirement under §1202(b). Specifically, plaintiffs claimed that Copilot generates parts of plaintiffs’ code in extra lines of output code that are not relevant to input prompts. Plaintiffs also claimed Copilot generates their code in output code that produces errors due to a mismatch between the directly copied code and the code that would actually fit the prompt. To make this assertion work, plaintiffs distinguish their version of “identicality” (semantically equivalent lines of code) from a reproduction of the whole work. They argue that the defendants’ position, that “the reproduction of short passages that may be part of [a] larger work, rather than the reproduction of an entire work, is insufficient to violate Section 1202,” would lead to absurd results: “By OpenAI’s logic, a party could copy and distribute a fragment of a copyrighted work—say, a chapter of a book, a stanza of a poem, or a scene from a movie—and face no repercussions for infringement.”

In their reply, the defendants countered that §1202, which defines CMI as relating to a “copy of a work,” requires a complete and identical copy, not just snippets. Defendants noted that the plaintiffs have conceded that Copilot reproduces only snippets of code rather than complete versions of the code. Therefore, the defendants argue, Copilot does not create “identical copies” of the plaintiffs’ complete copyrighted works. The argument rests on the text of the statute (which, they note, provides for liability only when distributing copies from which CMI has been stripped, not derivatives, abridgments, or other adaptations), bolstered by the suggestion that allowing §1202 claims for incomplete copies would create chaos for ordinary uses of copyrighted works: “On Plaintiffs’ reading of § 1202, if someone opened an anthology of poetry and typed up a modified version of a single “stanza of a poem,” . . . without including the anthology’s copyright page, a § 1202(b) claim would lie. Plaintiffs’ reading effectively concedes that they are attempting to turn every garden-variety claim of copyright infringement into a DMCA claim, only without the usual limitations and defenses applicable under copyright law. Congress intended no such thing.”

The GitHub court has now addressed the issue several times: it initially dismissed the plaintiffs’ §1202(b)(1) and (b)(3) claims, subsequently denied the plaintiffs’ motion for reconsideration, allowed the plaintiffs to amend their complaint and try again with more specificity, then dismissed the claims again. The court’s reasoning has been consistent, focused largely on insufficient allegations of identicality: it agreed with the defendants that the identicality requirement should apply and that the snippets do not satisfy it. Following the dismissal, the plaintiffs sought and received permission from the district court to file an interlocutory appeal (an appeal on a specific issue before the case is fully resolved, something not usually allowed) to the Court of Appeals for the Ninth Circuit to determine whether §1202(b)(1) and (b)(3) impose an identicality requirement. The Ninth Circuit is presently considering whether to hear the appeal.

What would the Ninth Circuit assess in the appeal, and what are the implications of the appeal for future lawsuits?

If the appeal is accepted, the Ninth Circuit will determine whether §1202(b)(1) and (b)(3) actually impose an identicality requirement. Moreover, with regard to the facts of the Github case, the court will decide whether the identicality requirement requires exact copying of a complete copyrighted work, or perhaps something less. The Ninth Circuit’s hearing of this appeal would be notable for a number of reasons.

First, as mentioned above, §1202(b) is largely unaddressed by the circuit courts, and explicit appellate guidance has only been provided for the knowledge requirement referenced above. Consequently, determinations of §1202(b) claims are largely informed by varying district court decisions that are binding only on the parties to the suits and provide inconsistent interpretations of the requirements for a claim under the provision. An appellate ruling that accepts or rejects the identicality requirement would create additional binding authority to further clarify courts’ interpretations of §1202(b).

Second, a ruling on the identicality requirement from the Ninth Circuit specifically would be notable because it would be binding on the large number of §1202(b) claims presently being litigated in the Ninth Circuit’s lower courts. And, given the concentration of AI developers in California and elsewhere in the Ninth Circuit, the outcome of the appeal would significantly impact future lawsuits that involve §1202(b) claims.

It is hard to predict how the Ninth Circuit might rule, but we can work through some of the implications of the choices the court would have before it: 

If the Ninth Circuit interprets the identicality requirement as requiring a complete and exact copy, it would impose a high standard, and plaintiffs would likely be constrained in their ability to bring §1202(b) claims. Under that standard, the GitHub plaintiffs’ claims would likely fail, as the alleged copied snippets of code generated by Copilot are not exact copies and do not comprise the complete copyrighted works. This standard would be advantageous both for those who remove CMI from copyrighted works in the course of processing them using AI and for those who deploy AI systems that produce small portions of content similar (but not exactly identical) to inputs. So long as the works being processed or distributed are not complete, exact copies, individuals would be free to alter the CMI of the works for ease in analyzing the copyrighted information.

Alternatively, the Ninth Circuit could adopt a loose interpretation of identicality in which incomplete and inexact copying would be sufficient. One approach would be to require identicality but not copying of the entire work (something the plaintiffs in the Github suit advocate for). How the parties or the Ninth Circuit would formulate what standard would apply to this “less than entire” but still “near identical” standard is hard to say, but presumably, plaintiffs would have an easier time alleging facts sufficient for a §1202(b) claim. Applied to Github, it still seems unclear that the copied snippets of the plaintiffs’ code in the Copilot outputs could pass muster (this is likely a factual question to be determined at later stages of the litigation). But it could allow claims to at least survive an early motion to dismiss. As such, the adoption of this standard could limit how AI developers engage with works but also potentially affect others, such as researchers using similar techniques to process, clean, and distribute small portions of copyrighted works as part of a dataset.

Finally, the Ninth Circuit may decide to do away with the identicality requirement altogether. While this may seem like a potential boon to plaintiffs, who could then allege removal of CMI and distribution of some copied material, no matter how small, plaintiffs would still face substantial challenges. Elimination of the identicality requirement would likely lead to greater weight being placed on the knowledge requirement in courts’ assessments of §1202(b) claims, which requires that defendants know or have reasonable grounds to know that their actions will “induce, enable, facilitate, or conceal an infringement.” In the context of the GitHub case, even without an identicality requirement, the plaintiffs’ §1202(b) claims contain scant factual allegations about the defendants’ CMI removal and knowledge in the court filings to date. For other developers and users of AI, the effects of eliminating the identicality requirement would likely vary on a case-by-case basis.

Conclusion

Recent copyright infringement suits and the pending appeal to the Ninth Circuit in Doe 1 v. GitHub demonstrate that §1202(b) is having its day in the sun. Although the provision has been overlooked and infrequently litigated in the past, the scope of protections granted by §1202(b) is important for understanding whether and how AI developers can remove CMI when processing, restructuring, and analyzing copyrighted works for AI development. Thus, as lawsuits against AI developers and users continue to progress, the requirements for a valid §1202(b) claim are sure to become even more contentious.

Text Data Mining Research DMCA Exemption Renewed and Expanded

Posted October 25, 2024
U.S. Copyright Office 1201 Rulemaking Process, taken from https://www.copyright.gov/1201/

Earlier today, the Library of Congress, following recommendations from the U.S. Copyright Office, released its final rule adopting exemptions to the Digital Millennium Copyright Act’s prohibition on circumvention of technological protection measures (e.g., DRM).

As many of you know, we’ve been working closely with members of the text and data-mining community as well as our co-petitioners, the Library Copyright Alliance (LCA) and the American Association of University Professors (AAUP), to petition for renewal of the existing TDM research exemption and to expand it to allow researchers to share their research corpora with other researchers outside of their university (something not previously allowed). The process began over a year ago and followed an in-depth review process by the U.S. Copyright Office, and we’re incredibly grateful for the expert legal representation before the Office over this past year by UC Berkeley Law’s Samuelson Law, Technology & Public Policy Clinic, and in particular clinic faculty Erik Stallman, Jennifer Urban and Berkeley Law students Christian Howard-Sukhil, Zhudi Huang, and Matthew Cha.

We are very pleased to see that the Librarian of Congress both approved the renewal of the existing exemption and approved an expansion that allows for research universities to provide access to TDM corpora for use by researchers at other universities. 

The expanded rule is poised to make an immediate impact by helping TDM researchers collaborate and build upon each other’s work. As Allison Cooper, director of Kinolab and Associate Professor of Romance Languages and Literatures and Cinema Studies at Bowdoin College, explains:

“This decision will have an immediate impact on the ongoing close-up project that Joel Burges, Emily Sherwood, and I are working on by allowing us to collaborate with researchers like David Bamman, whose expertise in machine learning will be valuable in answering many of the ‘big picture’ questions about the close-up that have come up in our work so far.”

These are the main takeaways from the new rule: 

  • The exemption has been expanded to allow “access” to corpora by researchers at other institutions “solely for purposes of text and data mining research or teaching.” There is no longer a requirement that access be granted as part of a “collaboration,” so new researchers can ask new and different questions of a corpus. Access must be credentialed and authenticated.
  • The issue of whether a researcher can engage in “close viewing” of a copyrighted work has been resolved—as the explanation for the revised rule puts it, researchers can “view the contents of copyrighted works as part of their research, provided that any viewing that takes place is in furtherance of research objectives (e.g., processing or annotating works to prepare them for analysis) and not for the works’ expressive value.” This is a very helpful clarification!
  • The new rule also modified the existing security requirements, which provide that researchers must put in place adequate security protocols to protect TDM corpora from unauthorized reuse and must share information about those security protocols with rightsholders upon request. That rule has been limited in some ways and expanded in others. The new rule clarifies that trade associations can send inquiries on behalf of rightsholders. However, inquiries must be supported by a “reasonable belief” that the sender’s works are in a corpus being used for TDM research.

Later on, we will post a more in-depth analysis of the new rules–both TDM and others that apply to authors. The Librarian of Congress also authorized the renewal of a number of other rules that support research, teaching, and library preservation. Among them is a renewal of another exemption that Authors Alliance and AAUP petitioned for, allowing for the circumvention of digital locks when using motion picture excerpts in multi-media ebooks. 

Thank you to all of the many, many TDM researchers and librarians we’ve worked with over the last several years to help support this petition. 

You can learn more about TDM and our work on this issue through our TDM resources page, here.