Category Archives: News

Independent Publisher’s Lawsuit Against Audible Fails, Highlighting the Challenge of Receiving Fair Streaming Compensation

Posted February 21, 2025
Adobe Stock Image

Last November, we covered a case where a group of authors complained about McGraw Hill’s interpretation of publishing agreements related to compensation for ebooks. As subscription-based models become increasingly dominant in the publishing industry, authors must be vigilant about how their contracts define compensation. Platforms like Kindle Unlimited, Audible, and academic ebook services are reshaping traditional royalty structures. This is not just a concern for trade books; academic publishing is also shifting towards subscription-based access, as evidenced by ProQuest’s recent announcement that it is ending print sales and moving toward a “Netflix for books” model. 

Here we see yet another case where ambiguous contractual terms resulted in financial loss for an author— 

On February 19, the Second Circuit affirmed the lower court’s dismissal of Teri Woods Publishing’s copyright infringement and breach-of-contract claims against Audible and other audiobook distributors in Teri Woods Publ’g, LLC v. Amazon.com, Inc. The Plaintiff initially granted the rights at issue in this dispute to Urban Audios in a licensing agreement. Urban Audios then granted those rights to Blackstone, which in turn sublicensed them to Amazon and Audible.

The Plaintiff in this case, Teri Woods Publishing, is an independent publisher founded by urban fiction author Teri Woods. The Plaintiff argued—and the courts ultimately disagreed—that the licensing agreement did not unambiguously permit Defendants to distribute Teri Woods’ audiobooks through the Defendants’ online audiobook streaming subscription services. More specifically, on the question of compensation for online streaming, Plaintiff and Defendants disagreed on whether (1) online streaming counted as “internet downloads” or alternatively “other contrivances, appliances, mediums and means,” and (2) the licensing terms dealing with royalties prohibit subscription streaming.

The licensing terms in question are contained in the licensing agreement Plaintiff entered into in 2018, granting Urban Audios the 

“exclusive unabridged audio publishing rights, to manufacture, market, sell and distribute copies throughout the World, and in all markets, copies of unabridged readings of the [Licensed Works] on cassette, CD, MP3-CD, pre-loaded devices, as Internet downloads and on, and in, other contrivances, appliances, mediums and means (now known and hereafter developed) which are capable of emitting sounds derived for the recording of audiobooks.”

In exchange for this grant of rights, Urban Audios—as the Licensee—must pay Plaintiff: 

“(a) Ten percent (10%) of Licensee’s net receipts from catalog, wholesale and other retail sales and rentals of the audio recordings of said literary work; 

(b) Twenty Five percent (25%) of net receipts on all internet downloads of said literary work. 

(c) Twenty Five percent (25%) of net receipts on Playaway format [under certain conditions].”

In case you are not familiar with the services Amazon’s Audible provides: Audible members generally pay a monthly fee to digitally stream or download audiobooks, rather than paying for each specific audiobook they stream or download. This method of distribution, the Plaintiff argues, led to drastically lower compensation than expected, as the audiobooks were made available to subscribers at a fraction of their retail price.
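To make the economics concrete, here is a minimal sketch of why a percentage-of-net-receipts royalty can collapse under a subscription model. All of the numbers below are invented for illustration and do not come from the case; only the 25% royalty rate echoes the licensing terms quoted above.

```python
# Hypothetical comparison: a 25%-of-net-receipts royalty under
# per-download retail sales vs. a share of a subscription pool.
# All dollar figures are invented for illustration.

def royalty_from_retail(units_sold: int, net_receipt_per_unit: float, rate: float = 0.25) -> float:
    """Royalty when each download generates its own net receipt."""
    return units_sold * net_receipt_per_unit * rate

def royalty_from_subscription(pool_receipts: float, title_share: float, rate: float = 0.25) -> float:
    """Royalty when the licensor is paid from a title's share of a
    subscription revenue pool, regardless of listen counts."""
    return pool_receipts * title_share * rate

# 10,000 downloads at $10 net each...
retail = royalty_from_retail(10_000, 10.00)           # $25,000
# ...versus the same listening paid out of a $50,000 pool
# in which the title earns a 2% share
streaming = royalty_from_subscription(50_000, 0.02)   # $250
```

The royalty rate never changes; what changes is the "net receipts" base it multiplies, which is the gap at the heart of the dispute.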

Audible has a history of relying on ambiguous contractual terms to reduce author payouts. The “Audiblegate” controversy, for instance, exposed how Audible’s return policy allowed listeners to return audiobooks after extensive use, deducting royalties from authors without transparency. That practice came under legal scrutiny in Golden Unicorn Enters. v. Audible Inc., where authors alleged that Audible deliberately structured its payment model to significantly reduce their earnings (unfortunately, the court in that case also largely sided with Audible).

Despite Audible’s track record, the courts were unsympathetic to Plaintiff’s grievance in the Teri Woods case, and held that the plain meaning of the phrase “other contrivances, appliances, mediums and means (now known and hereafter developed)” in the licensing agreement included digital streams and other future technological developments in distribution services. The courts also observed that the underlying licensing agreement did not provide for the payment of royalties on a per-unit basis; Plaintiff was only entitled to a percentage of “net receipts” received by Urban Audios for sales, rentals, and internet downloads. 

The ambiguities over what constitutes an “internet download,” and whether payment was due on a per-unit basis, were ultimately resolved in favor of Audible. This case serves as yet another reminder of the importance of adopting clear contractual language. 

Licensing agreements should be drafted with clear and precise language regarding revenue models and payment structures. Subscription-based compensation models, like those employed by Audible, fundamentally differ from traditional sales models, often leading to lower per-unit earnings for authors. By failing to anticipate and address these nuances, authors risk losing control over how their works are monetized. Ensuring that rights, distribution methods, and payment structures are clearly defined can prevent disputes and financial losses down the line.

Many authors assume that digital rights are similar to traditional print rights, but as this case demonstrates, vague phrasing can allow distributors to exploit gaps in understanding. If authors do not explicitly address emerging distribution technologies, they may find themselves receiving significantly less compensation than they anticipated when signing the agreement. For example, authors should ensure their contracts specify whether subscription-based revenue falls under traditional royalty calculations, and whether distribution via new technological formats requires renegotiation.

Beyond ambiguous contractual terms, this case also highlights the broader issue of how digital platforms can negatively impact readers and authors alike. Readers no longer own the books they purchase; instead, they receive licensed access that can be revoked or restricted at any time. This shift undermines the traditional relationship between books and their readers. Authors are equally threatened by these digital intermediaries, who have the power to dictate distribution methods and unilaterally alter revenue models; an author’s right to fair compensation is too often sacrificed along the way. The situation is especially dire with audiobooks, where Audible dominates the market.

Copyrightability and Artificial Intelligence: A new report from the U.S. Copyright Office

Posted February 20, 2025
Uncopyrightable image generated using Google Gemini, illustrating a group of photographers excited to learn that their nearly identical photos of the public domain Washington Monument are all copyrightable. (“The Office receives ten applications, one from each member of a local photography club. All of the photographs depict the Washington Monument and all of them were taken on the same afternoon. Although some of the photographs are remarkably similar in perspective, the registration specialist will register all of the claims.”) (Compendium of Copyright Office Practices, Section 909.1)

Recently, the United States Copyright Office published its Report on Copyright and Artificial Intelligence, Part 2: Copyrightability, the second report in a three-part series. The Office’s reports and additional related resources can be found on the USCO’s Copyright and Artificial Intelligence webpage.

This latest report was the product of longstanding Copyright Office practices, the USCO’s evolving work and registration guidance in this area, rapid technological developments related to Artificial Intelligence, and over 10,000 reply comments to the Office’s August 2023 Notice of Inquiry. Among those commenters, the Authors Alliance submitted both an initial comment and a reply comment in late 2023.  

In our comments, we urged the Copyright Office not to pursue revisions to the Copyright Act at this time and instead to work towards providing greater clarity for authors of AI-generated and AI-assisted works (“Instead of proposing revisions to the Copyright Act to enshrine the human authorship requirement in law or clarify the human authorship requirement in the context of AI-generated works, the Office should continue to promulgate guidance for would-be registrants.”) We also noted that, as technology evolves in the coming years, our ideas about the copyrightability of AI-generated and AI-assisted works will likely shift as well.

We are happy to see that the USCO heard our voice, and those of many others, in concluding that there is no need for legislative change at this time (“The vast majority of commenters agreed that existing law is adequate in this area…”) (Report, page ii). We likewise continue to be aligned with the USCO’s view that works wholly generated by Artificial Intelligence are not copyrightable. In reading through the entirety of the report, it is clear that the Office appreciates that some elements of AI-assisted works will be copyrightable, but believes that the level of human control over the AI output will be central to the copyrightability inquiry (“Whether human contributions to AI-generated outputs are sufficient to constitute authorship must be analyzed on a case-by-case basis.”) (“Based on the functioning of current generally available technology, prompts do not alone provide sufficient control.”) (Report, page iii).

The Office’s report does provide some useful clarity. At the same time, it takes some positions that fail to adequately address the complexity of AI-generated works. Below, we will unpack a number of elements of the report that are noteworthy.  

Modifying or arranging AI-generated content

The report makes it clear that the USCO views selection and arrangement of AI-generated work as a viable path towards copyrightability of works where AI was an element in the creation of the work. In 2023, when reviewing the graphic novel Zarya of the Dawn, “the Office concluded that a graphic novel comprised of human-authored text combined with images generated by the AI service Midjourney constituted a copyrightable work, but that the individual images themselves could not be protected by copyright.” (Copyright Registration Guidance: Works Containing Material Generated by Artificial Intelligence, page 2) Thus, authors who incorporate AI-generated work into a larger work will often be successful in registering the whole work, but will typically need to disclaim any AI-generated elements.  

Alternatively, an author who modifies an AI-generated work outside of the AI environment (e.g., an artist who uses Photoshop to make substantial modifications to an AI-generated image), will usually have a path to copyright registration with the USCO. 

The USCO takes the position that most AI-assisted works are not copyrightable

Unlike an AI-generated image later modified manually by a human (which may be copyrightable), when prompt-based modifications to AI generated works are performed entirely within the AI environment, it is clear that the USCO is reluctant to view the resulting work as copyrightable. 

Here, the Office’s position regarding Jason Allen’s attempt to register copyright in the two-dimensional artwork Théâtre D’opéra Spatial is illuminating. In developing the image using Midjourney, Allen claimed to have used over 600 text prompts to both generate and alter the image, and further used Photoshop to “beautify and adjust various cosmetic details/flaws/artifacts, etc.,” a process which he viewed as copyrightable authorship. In denying his claim, the Office responded that “when an AI technology receives solely a prompt from a human and produces complex written, visual, or musical works in response, the ‘traditional elements of authorship’ are determined and executed by the technology—not the human user.” (88 FR 16190 – Copyright Registration Guidance: Works Containing Material Generated by Artificial Intelligence, page 16192). 

The USCO dismisses the idea that the process of revising prompts to modify AI output is sufficient to claim copyright in the resulting work. (“Inputting a revised prompt does not appear to be materially different in operation from inputting a single prompt. By revising and submitting prompts multiple times, the user is “re-rolling” the dice, causing the system to generate more outputs from which to select, but not altering the degree of control over the process. No matter how many times a prompt is revised and resubmitted, the final output reflects the user’s acceptance of the AI system’s interpretation, rather than authorship of the expression it contains.”) (Report, page 20) (emphasis added).

Within the report, there is no direct examination of the Théâtre D’opéra Spatial copyright claim and lessons to be learned from it. This is likely due to ongoing litigation between Allen and the USCO. While the USCO has significant practical influence on what materials are protectable under copyright, ultimately the decision falls to the courts. So, this suit and others like it will be important to watch.  Still, the lack of a deeper dive into such a real-world example is unfortunate—such examples offer fertile territory for exploring the boundary lines between copyrightable AI-assisted works and those that will remain uncopyrightable.  

The report offers a sense of possibility with regard to copyrightable AI-assisted works

Towards the end of its report, the USCO briefly explores AI platforms that allow for greater control of the final work. Interestingly, they point to specific features of Midjourney, which allows users to select and modify specific regions of an image. The Office views this as meaningfully different from modifying an AI-generated work through prompts alone, but takes no position as to whether that level of control will result in copyrightable works ( “Whether such modifications rise to the minimum standard of originality required under Feist will depend on a case-by-case determination. In those cases where they do, the output should be copyrightable.”) (Report, page 27).   

Unanswered Questions

Despite the complexity of these issues, the Office has been able to draw some bright lines (e.g., see this webinar on Registration Guidance for Works Containing AI-generated Content). 

Yet, the Office also acknowledges that there are remaining unanswered questions (“So I know that everyone in their particular area of creativity is looking for, you know, more examples and brighter lines. And I think at this point in time, we’re going to be learning as everyone else is learning…we will be providing more guidance as we learn more.”) (Webinar Transcript, Robert Kasunic, page 10) This recognition that the USCO, like everyone, is still learning is refreshing and welcome, given that it’s fairly easy to see that there are murky waters all around. AI-generated works are already frequently a complex hybrid of AI expression and human expression. 

What are some of these questions? 

  1. The technology is still developing and it seems likely that the legal complexity will become even more pronounced as sophisticated generative AI evolves to respond to fine-grained feedback from users, while also offering expression and suggestions that many users will ultimately adopt. Navigating this complexity will be challenging and will require answering a fundamental question: what is the threshold level of human control over AI-generated expression that is necessary as a prerequisite for copyright protection?  
  2. Similarly, what standards might the Copyright Office or the courts develop to prove sufficient human authorship when it is intermingled with AI-generated content? The copyright registration process currently requires very little information and no documentation related to this question. For now, creators don’t have clear guidance on what types of documentation will be most effective if a future dispute arises. 
  3. To the extent that protection does exist in human-guided, but AI-produced content, how will or should the courts determine what are uncopyrightable, AI-generated elements in what will appear to users as a single unified work? Separating human expression that is enmeshed and embedded within uncopyrightable AI expression will require some framework for distinguishing the two in cases of infringement. Although the courts have already developed methods that may shape this (selection, filtration, abstraction, for example), it remains far from clear whether such tests will perform adequately for AI-produced content.

We will be watching developments in this space closely and will continue to advocate for reasonable and flexible approaches to copyrightability that align with the practical realities of authorship in an emerging technological landscape.  

On The New NIH Indirect Cost Guidance

Posted February 18, 2025
Photo of an emergency room with multiple "emergency" signs; red color enhanced.
NIH cuts are an emergency for hospitals (Photo: Eric Harbeson, CC-BY)

A little over a week ago, the National Institutes of Health issued a new guidance policy on indirect costs in Federal grant awards. Presently, NIH negotiates the indirect cost rate with individual institutions through a carefully regulated process that ensures an appropriate rate for a given institution’s unique circumstances, while also providing robust safeguards and auditing requirements to ensure that the rate is no greater than necessary. The new policy—similar to what the previous Trump administration proposed in 2017—would replace the negotiated rates with a standard rate of 15%. For comparison, the average rate among grantee institutions is around 27%, and many of the top research institutions currently have negotiated rates exceeding 50% or even 60%, which amounts to tens of millions of dollars in some cases. The rate cap would apply both to new grants and to all in-progress grants.
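The arithmetic behind those percentages is simple, and a small sketch shows the scale of the cut. The dollar amounts below are hypothetical, and the calculation is simplified: in practice the rate is applied to a modified direct cost base defined by the award, not to raw direct costs.

```python
# Hypothetical, simplified illustration of the proposed 15% cap.
# In real awards the negotiated rate applies to a modified direct
# cost base; here we apply it to direct costs for simplicity.

def indirect_recovery(direct_costs: float, rate: float) -> float:
    """Indirect cost dollars recovered at a given rate."""
    return direct_costs * rate

direct = 1_000_000  # $1M of direct research costs (invented figure)

negotiated = indirect_recovery(direct, 0.55)  # at a 55% negotiated rate
capped = indirect_recovery(direct, 0.15)      # under the 15% cap
shortfall = negotiated - capped               # lost support per $1M of direct costs
```

At a 55% negotiated rate, every $1M of direct costs brings roughly $550,000 in facilities-and-administration support; the cap cuts that to about $150,000, a shortfall of around $400,000 per $1M, which is how multi-hundred-million-dollar institutional losses accumulate.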

Indirect costs are the institutional expenditures that cannot be attributed to a particular research project. These are the costs of keeping the lights on, the lab clean, and the MRI machine running. They pay for biocontainment labs, clinical testing facilities, and computer systems to analyze data, each of which might be shared by multiple NIH-funded projects. Though indirect, they are significant costs incurred by the institution and are an unavoidable part of conducting grant-funded research. From a government efficiency standpoint they are also highly desirable, in that they reduce unnecessary redundancy as well as exceedingly time-consuming and expensive bookkeeping.

Support for indirect costs in grant funds is essential to institutions’ ability to take part in Federal grant-making. If the new guidance policy is allowed to stand, universities collectively expect to lose many hundreds of millions of dollars from the move, losses which in turn will lead to decreases in important, sometimes life-saving research. This new policy has raised serious concerns among affected institutions. 

To say things are moving quickly in Washington these days would be an understatement. The administration has, of course, been releasing a flurry of sometimes sweeping executive orders. The pace is dizzying. In this case, in the space of just four days—two of which were a weekend—the NIH issued its guidance; at least three different lawsuits were filed, each in the District of Massachusetts; and a judge entered a temporary restraining order on the guidance. A hearing on the restraining order is scheduled for February 21 (the cases have not yet been consolidated, though they almost certainly will be if they proceed).

In our view, there are multiple clear violations of law in the guidance, both of statute and of the Constitution. While we await the hearing, we thought it worthwhile to highlight for authors some of the legal challenges it will face. Many others have already written on this topic—for more responses to the guidance policy, we recommend COGR’s collection of responses from the grantee community, as well as this post by Holden Thorp in Science and this post from Lisa Janicke Hinchliffe in Scholarly Kitchen (which draws important connections to scholarly publishing). 

Some Fact-Checking

At the outset, an examination of the issuing guidance reveals holes in the chain of authority that anticipate problems with the new order. For example, the guidance asserts that “NIH may, however, use ‘a rate different from the negotiated rate for either a class of Federal awards or a single Federal award.’ 45 C.F.R. 75.414(c)(1).” The citation at the end refers to Title 45, part 75 of the Code of Federal Regulations, where NIH’s parent agency, the Department of Health and Human Services (HHS), codifies its grant guidelines. Here is the entire paragraph:

“Negotiated indirect cost rates must be accepted by all Federal agencies. A Federal agency may use a rate different from the negotiated rate for either a class of Federal awards or a single Federal award only when required by Federal statute or regulation, or when approved by the awarding Federal agency in accordance with paragraph (c)(3) of this section.” 45 C.F.R. § 75.414(c)(1) (emphasis added).

Note that this paragraph doesn’t say NIH generally may use a different rate, as the guidance appears to claim. Rather, it states the exception—they may not do so unless they are required to by statute or another regulation. Alternatively, under paragraph (c)(3) of the regulation, NIH must “implement, and make publicly available, the policies, procedures and general decision making criteria that their programs will follow to seek and justify deviations from negotiated rates.” (emphasis added). The paragraph doesn’t give NIH general permission, it constrains them.

The notice’s very next sentence provides arguably its most egregious claim, namely that the cap may be applied retroactively to existing grants, in defiance of institutions’ reliance on their contractually negotiated rates. The notice states that “NIH may deviate from the negotiated rate both for future grant awards and, in the case of grants to institutions of higher education (‘IHEs’), for existing grant awards. See 45 CFR Appendix III to Part 75, § C.7.a; see 45 C.F.R. 75.414(c)(1).” The citation, to Appendix III of Part 75, purports to support the claim that NIH may unilaterally, and retroactively, alter the terms of a contract. Here is the cited paragraph, in its entirety: 

“Except as provided in paragraph (c)(1) of § 200.414, Federal agencies must use the negotiated rates in effect at the time of the initial award throughout the life of the Federal award. Award levels for Federal awards may not be adjusted in future years as a result of changes in negotiated rates. “Negotiated rates” per the rate agreement include final, fixed, and predetermined rates and exclude provisional rates. “Life” for the purpose of this subsection means each competitive segment of a project. A competitive segment is a period of years approved by the Federal awarding agency at the time of the Federal award. If negotiated rate agreements do not extend through the life of the Federal award at the time of the initial award, then the negotiated rate for the last year of the Federal award must be extended through the end of the life of the Federal award.” (emphasis added)

Once again, the cited text not only does not support the claim, but if anything forecloses it. This paragraph does not purport to give permission to change an existing agreement. To the contrary, the paragraph requires NIH to respect the negotiated rate for the life of the award. (Sec. 200.414(c)(1), referenced in the appendix, points to the OMB Uniform Guidance, and is essentially the same as HHS’s Sec. 75.414(a), which is discussed above). 

The end result is that the notice rests its legal authority to carry out the policy on regulations that in fact work against the new policy. Not a great start.

Violation of law and policy

Though federal agencies are ultimately under the direction of the President, this does not give the executive branch unfettered authority to dictate an agency’s policies. Agencies act as agents for carrying out the laws passed by Congress. This means that Congress has the last word as to what an agency is authorized to do or not do, or must do or not do. In fact, every act of an agency must, in some way, be tied to an act of Congress (admittedly, the connection is often fairly loose).

Congress has actually prohibited the president—this president—from capping the negotiated indirect cost rates. In 2017, when the president pressed Congress to limit indirect costs to 10% of the grant award, Congress not only rejected the idea, but in Sec. 226 of the Consolidated Appropriations Act of 2018 (p.394) forbade the president from pursuing the policy. Under the Sec. 226 rider, Congress provided that the existing regulations pertaining to indirect costs are to continue, and that the department may not expend funds in pursuit of a contrary policy. The rider has persisted in every appropriations bill since, including the most recent one.

The policy is also contrary to HHS’s own regulations governing new policies such as this one. The notice purports to “implement, and make publicly available, the policies, procedures and general decision making criteria” as required by 45 CFR 75.414(c)(3) (discussed above), but in fact it satisfies only one of the three requirements. The notice publishes a policy (the 15% rate cap), but it does not make the procedures or general criteria available as required by the regulation. And publication must occur prior to the policy’s effective date, not simultaneously with it.

Under the Administrative Procedure Act (APA), in place since 1946, Congress has established the courts’ jurisdiction to review agency actions, such as this one, and to “decide all relevant questions of law.” The courts are empowered to set aside agency actions that are not in accordance with law, whether because they are contrary to the agency’s own regulations, acts of Congress, or the Constitution. As the three complaints observe, Congress has forbidden NIH from changing the system of negotiated indirect costs, and the new policy is also in violation of the agency’s own regulations.

Constitutional violations

The Constitution also has something to say about the guidance. In addition to the separation of powers problems, related to Congress’s actions discussed above, the retroactive nature of the guidance raises problems under the Fifth Amendment’s due process and takings clauses. These problems arise because the guidance professes to alter the indirect costs for existing grants, effectively unilaterally rewriting the grant agreements without regard to the institutions’ justified reliance on the binding nature of the agreements.

Contracts are a form of property, and contracts are binding on the U.S. Government to the same extent that they are on private parties. Though grant agreements are not formally contracts per se, the Supreme Court has observed that legislation enacted under the Spending Clause, as all grants are, is “much in the nature of a contract.”  The grant agreements bind the grantee institution to numerous terms and conditions (some of which could be said to be consideration for the award) in return for federal financial support for the project. The grant agreements are clearly binding on both parties, and renegotiation of a contract requires consent of both parties.

States, for their part, are explicitly forbidden from legislating their way out of contractual agreements, as NIH purports to do here, under the Constitution’s Contracts Clause; but that clause does not apply to the Federal government. Still, the Federal government is prohibited, under the Fifth Amendment, from taking private property (and again, contracts are property) for public use without just compensation, and from depriving a party of property (for any purpose) “without due process of law.” Grantee institutions rely on the government’s promise to follow through on the agreed-upon, negotiated indirect cost rate, and that reliance interest amounts in some cases to hundreds of millions of dollars. NIH’s implementing this new policy with no notice (much less a hearing or opportunity to comment, as the APA would require) sounds a lot like deprivation of property without due process.

Conclusion

NIH-funded research has produced an astonishing amount of highly significant, impactful research, and its role in the biomedical research ecosystem is pivotal. The authors NIH has funded have won every major prize in the field many times over, and their research has saved and improved countless lives. But NIH’s track record is only as strong as its grantees—authors who do the research and the institutions that employ them. If NIH is permitted to recklessly cut its promised support to those grantees, the inevitable resulting loss of research will be a great detriment to the scientific community, both at home and abroad, and to Americans in general.

Thomson Reuters v. Ross: The First AI Fair Use Ruling Fails to Persuade

Posted February 13, 2025
A confused judge, generated by Gemini AI

Facts of the Case

On February 11, Third Circuit Judge Stephanos Bibas (sitting by designation in the U.S. District Court for the District of Delaware) issued a new summary judgment ruling in Thomson Reuters v. ROSS Intelligence. He reversed his own 2023 decision, which had held that a jury must decide the fair use question. The decision is one of the first to address fair use in the context of AI, though the facts of this case differ significantly from those of the many other pending AI copyright suits. 

This ruling focuses on copyright infringement claims brought by Thomson Reuters (TR), the owner of Westlaw, a major legal research platform, against ROSS Intelligence. TR alleged that ROSS improperly used Westlaw’s headnotes and the Key Number System to train its AI system to better match legal questions with relevant case law. 

Westlaw’s headnotes summarize legal principles extracted from judicial opinions. (Note: Judicial opinions are not copyrightable in the US.) The Key Number System is a numerical taxonomy categorizing legal topics and cases. Clicking on a headnote takes users to the corresponding passage in the judicial text. Clicking on the key number associated with a headnote takes users to a list of cases that make the same legal point. 

Importantly, ROSS did not directly ingest the headnotes and the Key Number System to train its model. Instead, ROSS hired LegalEase, a company that provides legal research and writing services, to create training data based on them. LegalEase created Bulk Memos—a collection of legal questions, each paired with four to six possible answers. LegalEase instructed its lawyers to use Westlaw headnotes as a reference when formulating the questions, but not to copy the headnotes directly. 

ROSS attempted to license the necessary content directly from TR, but TR refused to grant a license because it thought the AI tool contemplated by ROSS would compete with Westlaw.

The financial burden of defending this lawsuit caused ROSS to shut down its operations. ROSS countered TR’s copyright infringement claims with antitrust counterclaims, but those were dismissed by the same judge. 

The New Ruling

The court found that ROSS copied 2,243 headnotes from Westlaw. The court ruled that these headnotes and the Key Number System met the low legal threshold for originality and were copyrightable. It rejected ROSS’s merger and scènes à faire defenses because, in its view, the headnotes and the Key Number System were not dictated by necessity. It also rejected ROSS’s fair use defense on the grounds that the 1st and 4th factors weighed in favor of TR. At this point, the only remaining issue for trial is whether the copyrights in some headnotes had expired or were untimely registered.

The new ruling has drawn mixed reactions—some say it undermines potential fair use defenses in other AI cases, while others dismiss its significance because its facts are unique. In our view, the opinion is poorly reasoned and disregards well-established case law. Parties in future AI cases will need to demonstrate why the ROSS court’s approach is unpersuasive. Here are three key flaws we see in the ruling.

Problems with the Opinion

  1. Near-Verbatim Summaries are “Original”?

“A block of raw marble, like a judicial opinion, is not copyrightable. Yet a sculptor creates a sculpture by choosing what to cut away and what to leave in place. … A headnote is a short, key point of law chiseled out of a lengthy judicial opinion.” 

— the ROSS court

(Above: an example of a headnote alongside the uncopyrightable judicial text it was based on.)

The court claims that the Westlaw headnotes are original both individually and as a compilation, and the Key Number System is original and protected as a compilation. 

“Original” has a special meaning in US copyright law: it means that a work has a modicum of human creativity that our society would want to protect and encourage. Based on the evidence that survived redaction, it is nearly impossible to find creativity in any individual headnote. The headnotes consist of verbatim copying of uncopyrightable judicial texts, along with some basic paraphrasing of facts.

As we know, facts are not copyrightable, but expressions of facts often are. One important safeguard for protecting our freedom to reference facts is the merger doctrine. US law has long recognized that when there are only limited ways to express a fact or an idea, those expressions are not considered “original.” The expressions “merge” with the underlying unprotectable fact, and become unprotectable themselves. 

Judge Bibas gets merger wrong—he claims merger does not apply here because “there are many ways to express points of law from judicial opinions.” This view misunderstands the merger doctrine. It is the nature of human language to be capable of conveying the same thing in many different ways, as long as you are willing to do some verbal acrobatics. But when there are only a limited number of reasonable, natural ways to express a fact or idea—especially when textual precision and terms of art are used to convey complex ideas—merger applies. 

There are many good reasons for this to be the law. For one, this is how we avoid giving copyright protection to concise expression of ideas. Fundamentally, we do not need to use copyright to incentivize the simple restatement of facts. As the Constitution intended, copyright law is designed to encourage creativity, not to grant exclusive rights to basic expressions of facts. We want people to state facts accurately and concisely. If we allowed the first person to describe a judicial text in a natural, succinct way to claim exclusive rights over that expression, it would hinder, rather than facilitate, meaningful discussion of said text, and stifle blog posts like this one. 

As to the selection and arrangement of the Key Number System, the court claims that originality exists here, too, because “there are many possible, logical ways to organize legal topics by level of granularity,” and TR exercised some judgment in choosing the particular “level” embodied in its Key Number System. However, cases are tagged with Key Numbers by an automated computer system, and the topics closely mirror what law schools teach their first-year students.

The court does not say much about why the compilation of the headnotes should receive separate copyright protection, other than that it qualifies as original “factual compilations.” This claim is dubious because the compilation is of uncopyrightable materials, as discussed, and the selection is driven by the necessity to represent facts and law, not by creativity. Even if the compilation of headnotes is indeed copyrightable, using portions of it that are uncopyrightable is decidedly not an infringement, because the US does not protect sui generis database rights.

  2. Can’t Claim Fair Use When Nobody Saw a Copy?

 “[The intermediate-copying cases] are all about copying computer code. This case is not.” 

— the ROSS court, conveniently ignoring BellSouth Advertising & Publishing Corp. v. Donnelley Information Publishing, Inc., 933 F.2d 952 (11th Cir. 1991), and Sundeman v. Seajay Society, Inc., 142 F.3d 194 (4th Cir. 1998).

In deciding whether ROSS’s use of Westlaw’s headnotes and the Key Number System is transformative under the 1st factor, the court took a moment to consider whether the available intermediate copying case law is in favor of ROSS, and quickly decided against it. 

Even though no consumer ever saw the headnotes or the Key Number System in the AI products offered by ROSS, the court claims that copying them constitutes copyright infringement because there existed an intermediate copy containing copyright-restricted materials authored by Westlaw. And, according to the court, intermediate copying can weigh in favor of fair use only for computer code.

Before turning to the case law the court overlooks here, we wonder whether Judge Bibas is in fact unpersuaded by his own argument: under the 3rd fair use factor, he admits that only the content made accessible to the public should count when deciding how much is taken from a copyrighted work relative to the work as a whole. That is contrary to what he argues under the 1st factor, namely that we must examine non-public intermediate copies.

Intermediate copying is the process of producing a preliminary, non-public work as an interim step in the creation of a new public-facing work. It is well established under US jurisprudence that any type of copying, whether private or public, satisfies a prima facie copyright infringement claim, but the fact that a work was never shared publicly, nor intended to be, strongly favors fair use. For example, in BellSouth Advertising & Publishing Corp. v. Donnelley Information Publishing, Inc., the Eleventh Circuit decided that directly copying a competitor’s yellow pages business directory in order to produce a competing yellow pages was fair use where the resulting publicly accessible directory did not directly incorporate the plaintiff’s work. Similarly, in Sundeman v. Seajay Society, Inc., the Fourth Circuit concluded that it was fair use when the Seajay Society made a complete intermediate copy of the plaintiff’s unpublished manuscript for a scholar to study and write about. The scholar wrote several articles about the manuscript, mostly summarizing important facts and ideas (while also using short quotations).

There are many good reasons for allowing intermediate copying. Clearly, we do not want ALL unlicensed copies to be subject to copyright infringement lawsuits, particularly when intermediate copies are made in order to extract unprotectable facts or ideas. More generally, intermediate copying is important to protect because it helps authors and artists create new copyrighted works (e.g., sketching a famous painting to learn a new style, translating a passage to practice your language skills, copying the photo of a politician to create a parody print t-shirt). 

  3. Suddenly, We Have an AI Training Market?

“[I]t does not matter whether Thomson Reuters has used [the headnotes and the Key Number System] to train its own legal search tools; the effect on a potential market for AI training data is enough.”

 — the ROSS court

The 4th fair use factor is very much susceptible to circular reasoning: if a user is making a derivative use of my work, surely that proves a market already exists or will likely develop for that derivative use, and, if a market exists for such a derivative use, then, as the copyright holder, I should have absolute control over such a market.

The ROSS court runs full tilt into this circular trap. In the eyes of the court, ROSS, by virtue of using Westlaw’s data in the context of AI training, has created a legitimate AI training data market that should be rightfully controlled by TR.

The problem is that our case law limits the 4th factor’s “market substitution” inquiry to markets that are traditional, reasonable, or likely to be developed. As we have already pointed out in a previous blog post, copyright holders must offer concrete evidence of an existing licensing market, or of the likelihood that one will develop, before they can argue that a secondary use serves as a “market substitute.” If we allowed a copyright holder’s protected market to include everything he is willing to accept licensing fees for, it would all but wipe out fair use in the service of stifling competition.

Conclusion

The impact of this case is currently limited, both because it is a district court ruling and because it concerns non-generative AI. However, it is important to remain vigilant, as the reasoning put forth by the ROSS court could influence other judges, policymakers, and even the broader public, if left unchallenged.

This ruling combines several problematic arguments that, if accepted more widely, could have significant consequences. First, it blurs the line between fact and expression, suggesting that factual information can become copyrightable simply by being written down by someone in a minimally creative way. Second, it expands copyright enforcement to intermediate copies, meaning that even temporary, non-public use of copyrighted material could be subject to infringement claims. Third, it conjures up a new market for AI training data, regardless of whether such a licensing market is legitimate or even likely to exist.

If these arguments gain traction, they could further entrench the dominance of a few large AI companies. Only major players like Microsoft and Meta will be able to afford AI training licenses, consolidating control over the industry. The AI training licensing terms will be determined solely between big AI companies and big content aggregators, without representation of individual authors or public interest.  The large content aggregators will get to dictate the terms under which creators must surrender rights to their works for AI training, and the AI companies will dictate how their AI models can be used by the general public. 

Without meaningful pushback and policy intervention, smaller organizations and individual creators cannot participate fairly. Let’s not rewrite our copyright laws to entrench this power imbalance even further.

Why Bayh-Dole has nothing to do with public access to articles under the Federal Purpose License

Posted February 4, 2025
On the left, a patent drawing of a windmill; on the right, a once-copyrightable poem about a windmill. The image illustrates the difference between patent and copyright.
This image, along with all its components, is in the Public Domain and free for reuse.

In the course of our work on Federal public access policies and the Nelson Memo, one of the objections I’ve encountered recently is that federal agency initiatives to provide immediate public access to scholarly articles run afoul of the Bayh-Dole Act or may imperil a university’s patent rights to inventions created pursuant to federal funding. Another related objection is that Stanford v. Roche, a case about how a university must go about securing rights in patentable inventions from their faculty under Bayh-Dole, affects how universities obtain sufficient rights to comply with federal public access policies. 

I thought it would be worth explaining why we don’t think these are realistic problems for federal public access law or policy. 

Bayh-Dole does not affect copyright in scholarly articles

The Bayh-Dole Act is an amendment to U.S. patent law passed in 1980 that gives nonprofits and small businesses the right to retain patent rights in inventions developed using federal funding. Before Bayh-Dole, some federal agencies’ policies required grant recipients to assign patent rights arising from federally funded research to the government. To encourage institutions receiving federal research funding to commercialize inventions for public benefit, Bayh-Dole instead gave them the right to retain title to those inventions. If a grantee elects to retain title to an invention (rather than assigning it to the government), it must grant the government a nonexclusive, nontransferable, irrevocable, paid-up license to use the invention. Unreasonable refusal to develop or commercialize the invention may result in the government exercising “march-in” rights to license it to others (one of the more controversial parts of the legislation).

The rights that Bayh-Dole secures for government contractors and grantees apply to “subject inventions.” The Act defines “invention” as “any invention or discovery which is or may be patentable or otherwise protectable under [US patent laws], or any novel variety of plant which is or may be protectable under the Plant Variety Protection Act . . . .” In turn, “subject inventions” are “any invention of the contractor conceived or first actually reduced to practice in the performance of work under a funding agreement.” In other words, “subject inventions” are inventions developed within the scope of a Federal grant.

The Nelson Memo also applies to grant outputs, but not to inventions: it covers “peer-reviewed scholarly publications.” Peer-reviewed scholarly publications, of course, are not inventions, nor do any rights under patent law apply to them. Scholarly publications are creative works of authorship, reuse of which is governed by copyright law under Title 17 of the United States Code, not by Bayh-Dole. It is true that copyrights and patents are sometimes discussed together as “intellectual property,” and courts sometimes even borrow concepts from one body of law for the other. But for the most part, different statutes and different cases govern how rights under each may be created, owned, licensed, and used.

Federal regulations about agency ownership and licensing of patent and copyright rights reflect that they are different. As discussed at length in this paper we published a few months ago (or see the one-page summary), grant-making agencies have for nearly half a century reserved certain rights in copyrighted grant outputs under a provision known as the “Federal Purpose License.” That license, which is codified in 2 C.F.R. § 200.315(b), provides that: 

“To the extent permitted by law, the recipient or subrecipient may copyright any work that is subject to copyright and was developed, or for which ownership was acquired, under a Federal award. The Federal agency reserves a royalty-free, nonexclusive, and irrevocable right to reproduce, publish, or otherwise use the work for Federal purposes and to authorize others to do so. This includes the right to require recipients and subrecipients to make such works available through agency-designated public access repositories.” (emphasis added).

Note that the Federal Purpose License is limited to copyrightable works.  By contrast, in the very next sub-section of the regulation, we see that rights in patents are treated differently:  

“[T]he recipient or subrecipient is subject to applicable regulations governing patents and inventions, including government-wide regulations in 37 CFR part 401 [the implementing regulations for Bayh-Dole].” 2 C.F.R. § 200.315(c)(emphasis added).

It is, of course, possible that in the course of federally funded research, one might produce both a patentable invention that is subject to Bayh-Dole and a copyrighted research article on the same subject. But this does not make Bayh-Dole applicable to the copyright in the article, nor does it mean that the Federal Purpose License (a copyright license) affects patent rights under the Bayh-Dole regulations. The copyright provisions cover copyrightable works; the patent provisions, patentable inventions.

Disclosure of Inventions or Discoveries

If you’ve worked with your campus technology transfer office before, you know that public disclosure of new research (e.g., in a research article)  can be a problem if one hopes to obtain a patent for an invention discussed in that publication. U.S. patent law rewards new and non-obvious inventions, and so the law provides in 35 U.S.C. § 102(a) that one is not entitled to a patent if “the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.”

Note that the statute specifically calls out description of the invention “in a printed publication.” Prior printed publication turns on “public accessibility,” which the courts have explained as being “disseminated or otherwise made available to the extent that persons interested and ordinarily skilled in the subject matter or art[,] exercising reasonable diligence, can locate it.” That standard is far lower than the “worldwide free public access” provided by the public access repositories under the Nelson Memo. For example, the Federal Circuit has found that a dissertation shelved and indexed in a card catalog at a German university qualified as publicly accessible. The court has also concluded that an oral presentation of a paper (with dissemination of the paper itself to only six people) at a conference satisfied the test. Similarly, the Federal Circuit has held that electronic distribution via a subscription email list qualified as publicly accessible. The point is that if you have already published a paper in a peer-reviewed journal that sufficiently describes the invention (even if published only via a subscription route and not available for free), you have almost certainly already disclosed the invention. Further expanding its reach through a public access repository would make no difference.

Public access policies implementing the Nelson Memo do not compel researchers or universities to disclose inventions prematurely, and thus have no impact on patentability. They merely require that once you choose to publish your research in an article, the article be made freely accessible to the public, no later than the publication date, in a public access repository. Whether the article is restricted to subscribers or made openly available does not affect its status as a public disclosure for patent purposes.

Stanford v. Roche

Stanford v. Roche is a 2011 Supreme Court case addressing ownership of patent rights in inventions created pursuant to federal funding and subject to Bayh-Dole. The case concerned control over rights in a test kit developed to detect HIV in human blood. As the Court explained the relevant facts:

Dr. Mark Holodniy joined Stanford as a research fellow . . . When he did so, he signed a Copyright and Patent Agreement (CPA) stating that he “agree[d] to assign” to Stanford his “right, title and interest in” inventions resulting from his employment at the University. 

At Stanford Holodniy undertook to develop an improved method for quantifying HIV levels in patient blood samples, using [polymerase chain reaction, or PCR, a Nobel Prize-winning technique developed at Cetus]. Because Holodniy was largely unfamiliar with PCR, his supervisor arranged for him to conduct research at Cetus. As a condition of gaining access to Cetus, Holodniy signed a Visitor’s Confidentiality Agreement (VCA). That agreement stated that Holodniy “will assign and do[es] hereby assign” to Cetus his “right, title, and interest in each of the ideas, inventions and improvements” made “as a consequence of [his] access” to Cetus. 

For the next nine months, Holodniy conducted research at Cetus. 

The conflict was ultimately about whether Stanford could prevent Roche, the company that acquired Cetus’s IP assets, from using the invention. 

The Supreme Court was asked to address the apparent conflict between (1) the ordinary rule in patent law that rights in an invention belong to the inventor and that “in most circumstances, an inventor must expressly grant his rights in an invention to his employer if the employer is to obtain those rights,” and (2) Stanford’s contention that Bayh-Dole changed this ordinary rule and instead gave the university first priority in the invention, such that an individual inventor could not simply sign away rights to a third party.

Stanford made this argument about Bayh-Dole in part to overcome an important holding of the appellate court below: namely, that Stanford’s agreement with Dr. Holodniy was a “mere promise to assign rights in the future, not an immediate transfer of expectant interests,” and therefore came second in line to Holodniy’s agreement with Cetus, through which Cetus “immediately gained equitable title to Holodniy’s inventions.”

The Supreme Court concluded that Bayh-Dole did not disrupt the ordinary rule that inventors own rights in their inventions absent an express assignment, and because Holodniy’s agreement with Stanford used ineffective language to secure for it first priority—“agree to assign” instead of the effective “do hereby assign”—Stanford lost. The practical upshot—many of you may remember this—was that universities rushed to revise their agreements with employees to put in place more effective language securing first-priority rights in inventions of university employees. 

Federal grants and copyright—what’s a university to do?

Stanford v. Roche contains some important lessons for universities, as federal grant recipients, about securing clear and effective rights from employees to comply with their grant obligations. 

As in Stanford v. Roche, it is important for universities (as grantees) to make sure they actually hold sufficient rights in copyrightable works produced under a federal grant so they can comply with federal agencies’ public access requirements. That said, there are some important differences between the patent-assignment issues in Roche and what is required for compliance under the Federal Purpose License.

Probably the biggest factor determining the effectiveness of those licenses will be how universities craft and implement their copyright policies. We’ve touched on this before, and explained that one important consideration is whether copyright law’s “work made for hire” doctrine applies (patent law has no equivalent). Under that doctrine, a work produced within the scope of employment is owned initially by the employer rather than the employee. Whether and how “work made for hire” applies to academic work is contested, but if it does apply, it largely eliminates concerns about the priority of the university’s license, since the university would be the initial owner. That’s true even though most universities (very rightly, in our opinion!) make it clear that individual authors should ultimately control the rights in their works. For instance, the University of Michigan transfers the copyright in scholarly works to its faculty members but reserves the ability to make uses consistent with academic norms, including complying with the Federal Purpose License.

Even without the application of work made for hire, universities can and do use their copyright policies to effectively address ownership and licensing of faculty-created scholarly works. Though we haven’t read every university’s copyright policy, for the most part we’ve found them to be thoughtful about securing from faculty authors at a minimum a non-exclusive license that satisfies the requirements of Section 205(e) of the Copyright Act, giving that license priority over any subsequent transfer such as a publishing agreement with a publisher. We review some of the approaches university policies take in this post, and we plan to release a white paper on this subject in the next few months. If you want to read further now, law professor Eric Priest has a good article, “Copyright and the Harvard Open Access Mandate,” that explains why these kinds of licenses are likely effective.

Conclusion

It’s important to remember that patent law and copyright law are distinct in many ways. While they share some similar concepts, the details are important and ownership and licensing of rights under one can be quite different from the other. The Bayh-Dole Act and other U.S. patent law govern ownership and commercialization of federally funded inventions, but they do not dictate how the Federal Purpose License should be interpreted or applied within the confines of copyright law. 

Artificial Intelligence, Authorship, and the Public Interest

Posted January 9, 2025
Photo by Robert Anasch on Unsplash

Today, we’re pleased to announce a new project generously supported by the John S. and James L. Knight Foundation. The project, “Artificial Intelligence, Authorship, and the Public Interest,” aims to identify, clarify, and offer answers to some of the most challenging copyright questions posed by artificial intelligence (AI) and explain how this new technology can best advance knowledge and serve the public interest.

Artificial intelligence has dominated public conversation about the future of authorship and creativity for several years. Questions abound about how this technology will affect creators’ incentives and readership, and what it might mean for future research and learning.

At the heart of these questions is copyright law. Over two dozen class-action copyright lawsuits have been filed between November 2022 and today against companies such as Microsoft, Google, OpenAI, Meta, and others. Additionally, congressional leadership, state legislatures, and regulatory agencies have held dozens of hearings to reconcile existing intellectual property law with artificial intelligence. As one of the primary legal mechanisms for promoting the “progress of science and the useful arts,” copyright law plays a critical role in creating, producing, and disseminating information. 

We are convinced that how policymakers shape copyright law in response to AI will have a lasting impact on whether and how the law supports democratic values and serves the common good. That is why Authors Alliance has already devoted considerable effort to these issues, and this project will allow us to expand those efforts at this critical moment. 

AI Legal Fellow
As part of the project, we’re pleased to add an AI Legal Fellow to our team. The position requires a law degree and demonstrated interest and experience in artificial intelligence, intellectual property, and legal technology issues. We’re particularly interested in someone with a demonstrated interest in how copyright law can serve the public interest. The role will involve significant research and writing. Pay is $90,000/yr for a two-year term position. Read more about the position here. We’ll begin reviewing applications immediately and conduct interviews on a rolling basis until the position is filled.

As we get going, we’ll have much more to say about this project. We will have some funds available to support research subgrants, organize several workshops and symposia, and offer numerous opportunities for public engagement. 

About the John S. and James L. Knight Foundation
We are social investors who support democracy by funding free expression and journalism, arts and culture in community, research in areas of media and democracy, and in the success of American cities and towns where the Knight brothers once had newspapers. Learn more at kf.org and follow @knightfdn on social media.

Authors Alliance 2024 Annual Report

Posted December 17, 2024

Authors Alliance celebrated an important milestone in 2024: our 10th anniversary! 

Quite a lot has changed since 2014, but our mission remains the same. We exist to advance the interests of authors who want to serve the public good by sharing their creations broadly.  I’m pleased to share our 2024 annual report, where you can find highlights of our work this year to promote laws, policies, and practices that enable authors to reach wide audiences.

Our success in 2024 was largely due to the wonderful collaboration and support we have from our members. You’ll see in the report a number of ongoing projects and issues we are working to address: legal questions about open access publishing, rights reversion at scale, supporting text data mining research, addressing contractual override of fair use,  AI and copyright, and more. As we look to 2025, I would love to hear from you if you have a special interest in any of these projects and would like to contribute your ideas, time, or expertise to help us tackle them.

I’m grateful for those of you who contributed financially to make 2024 a success. Authors Alliance is funded almost entirely by gifts and grants, and so we truly rely on you. As we end the year, I hope you will consider giving if you haven’t done so already. You can donate online here.

Thank you,

Dave Hansen
Executive Director 


Restricting Innovation: How Publisher Contracts Undermine Scholarly AI Research

Posted December 6, 2024
Photo by Josh Appel on Unsplash

This post is by Rachael Samberg, Director, Scholarly Communication & Information Policy, UC Berkeley Library and Dave Hansen, Executive Director, Authors Alliance

This post is about the research and the advancement of science and knowledge made impossible when publishers use contracts to limit researchers’ ability to use AI tools with scholarly works. 

Within the scholarly publishing community, mixed messages pervade about who gets to say when and how AI tools can be used for research reliant on scholarly works like journal articles or books. Some scholars voiced concern (explained more here) when major scholarly publishers like Wiley or Taylor & Francis entered into lucrative contracts with big technology companies to allow AI training without first seeking permission from authors. We suspect that these publishers have the legal right to do so, since most publishers demand that authors hand over extensive rights in exchange for publishing their work. And against the backdrop of dozens of pending AI copyright lawsuits, who can blame the AI companies for paying for licenses, if for no other reason than to avoid the pain of litigation? While it stings to see the same large commercial academic publishers profit yet again off the work academic authors submit to them for free, we continue to think there are good ways for authors to retain a say in the matter.

Big tech companies are one thing, but what about scholarly research? What about the large and growing number of scholars who themselves use copyrighted scholarly content with AI tools to conduct their research? We currently face a situation in which publishers attempt to dictate how and when researchers can do that work, even when authors’ fair use rights to use and derive new understandings from scholarship clearly allow for such uses.

How vendor contracts disadvantage US researchers

We have written elsewhere (in an explainer and a public comment to the Copyright Office) about why training AI tools, particularly in the scholarly and research context, constitutes fair use under U.S. copyright law. Training AI rests on a statutory right already held by all scholarly authors engaging in computational research, a right critical to the advancement of knowledge and one that lawmakers should preserve.

The problem U.S. scholarly authors presently face with AI training is that publishers restrict their access to these statutory rights through contracts that override them: In the United States, publishers can use private contracts to take away statutory fair use rights that researchers would otherwise hold under Federal law. In this case, the private contracts at issue are the electronic resource (e-resource) license agreements that academic research libraries sign to secure campus access to electronic journal, e-book, data, and other content that scholars need for their computational research.

Contractual override of fair use is a problem that disparately disadvantages U.S. researchers. As we have described elsewhere, more than forty countries, including the European Union, expressly reserve text mining and AI training rights for scientific research by research institutions. Scholars in these countries not only need not worry whether their computational research with AI is permitted; they also do not risk having those reserved rights overridden by contract. The European Union’s Copyright in the Digital Single Market Directive and recent AI Act nullify any attempt to circumscribe the text and data mining and AI training rights reserved for scientific research within research organizations. U.S. scholars are not as fortunate. 

In the U.S., most institutional e-resource licenses are negotiated and managed by research libraries, so it is imperative that scholars work closely with their libraries and advocate to preserve their computational research and AI training rights within the e-resource license agreements that universities sign. To that end, we have developed adaptable licensing language to support institutions nationwide in doing just that. But while this language is helpful, the onus of advocating and negotiating for those rights in the contracting process remains. In our experience, it helps to explain to publishers that they must consent to these terms in the European Union and can do so in the U.S. as well. That, combined with strong faculty and administrative support (such as at the University of California), makes for a strong stance against curtailment of these rights.

But we think there are additional practical ways for libraries to illustrate—both to publishers and scholarly authors—exactly what would happen to the advancement of knowledge if publishers’ licensing efforts to curtail AI training were successful. One way to do that is by “unpacking” or decoding a publisher’s proposed licensing restriction, and then demonstrating the impact that provision would have on research projects that were never objectionable to publishers before, and should not be now. We’ll take that approach below.

Decoding a publisher restriction

A commercial publisher recently proposed the following clause in an e-resource agreement:

Customer [the university] and its Authorized Users [the scholars] may not:

  1. directly or indirectly develop, train, program, improve, and/or enrich any artificial intelligence tool (“AI Tool”) accessible to anyone other than Customer and its Authorized Users, whether developed internally or provided by a third party; or
  2. reproduce or redistribute the Content to any third-party AI Tool, except to the extent limited portions of the Content are used solely for research and academic purposes (including to train an algorithm) and where the third-party AI Tool (a) is used locally in a self-hosted environment or closed hosted environment solely for use by Customer or Authorized Users; (b) is not trained or fine-tuned using the Content or any part thereof; and (c) does not share the Content or any part thereof with a third party.  

What does this mean?

  • The first paragraph forbids training or improving any AI tool that is accessible or released to third parties. It further forbids using any computational outputs or analyses derived from the licensed content to train any tool available to third parties. 
  • The second paragraph is perhaps even more concerning. It provides that when using third-party AI tools of any kind, a scholar can use only limited portions of the licensed content with the tools, and is prohibited from doing any training at all of third-party tools, even if the tool is non-generative and the scholar is performing the work in a completely closed and highly secure research environment.

What would the impact of such a restrictive licensing provision be on research? 

It would mean that every single one of the trained tools in the following projects could never be disseminated. In addition, for the projects below that used third-party AI tools, the research would have been prohibited full stop, because the third-party tools in those projects required training, which the publisher above is attempting to prevent:

Tools that could not be disseminated

  1. In 2017, chemists created and trained a generative AI tool on 12,000 published research papers regarding synthesis conditions for metal oxides, so that the tool could identify anticipated chemical outputs and reactions for any given set of synthesis conditions entered into the tool. The generative tool they created is not capable of reproducing or redistributing any licensed content from the papers; it has merely learned conditions and outcomes and can predict chemical reactions based on those conditions and outcomes. And this beneficial tool would be prohibited from dissemination under the publisher’s terms identified above.
  2. In 2018, researchers trained an AI tool (that they had originally created in 2014) to understand whether a character is “masculine” or “feminine” by looking at the tacit assumptions expressed in words associated with that character. That tool can then look at other texts and identify masculine or feminine characters based on what it knows from having been trained before. Scholars can therefore use texts from different time periods with the tool to study representations of masculinity and femininity over time. No licensed content, and no licensed or copyrighted books from a publisher, can ever be released to the world by sharing the trained tool; the trained tool is merely capable of topic modeling. Yet the publisher’s language above would prohibit its dissemination nevertheless. 
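To make concrete how a trained tool of this kind can “know” something without containing any copyrighted text, here is a deliberately simplified sketch. The cue lists and the classification rule below are invented for illustration; the researchers’ actual model learned its word associations statistically from training texts rather than from hand-written lists:

```python
# Toy association lexicons, invented purely for illustration. The actual
# research tool learned associations like these from its training texts;
# it did not rely on curated word lists.
MASCULINE_CUES = {"sword", "beard", "gruff"}
FEMININE_CUES = {"gown", "lace", "gentle"}

def classify_character(description_words):
    """Score a character by counting gendered-association cues.

    The trained artifact holds only word associations and counts,
    never any sentence from the books it was trained on.
    """
    m = sum(word in MASCULINE_CUES for word in description_words)
    f = sum(word in FEMININE_CUES for word in description_words)
    if m == f:
        return "ambiguous"
    return "masculine" if m > f else "feminine"

print(classify_character(["gruff", "beard", "gentle"]))  # "masculine"
```

Sharing a tool like this shares no expression from any licensed book, only learned associations, which is why a blanket ban on disseminating trained tools sweeps far more broadly than copyright requires.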

Tools that could neither be trained nor disseminated 

  1. In 2019, authors used text from millions of books published over a 100-year span to analyze cultural meaning. They did this by training third-party, non-generative AI word-embedding models called Word2Vec and GloVe on multiple textual archives. The tools cannot reproduce content: when shown new text, they merely represent words as numbers, or vectors, to evaluate or predict how semantically or linguistically similar words in a given space are. The similarity of words can reveal cultural shifts in the understanding of socioeconomic factors like class over time. But the publisher’s above licensing terms would prohibit the training of the tools to begin with, much less the sharing of them to support further or different inquiry. 
  2. In 2023, scholars trained a third-party, open-source natural language processing (NLP) tool called ChemDataExtractor (CDE). Among other things, CDE can be used to extract chemical information and properties identified in scholarly papers. In this case, the scholars wanted to teach CDE to parse a specific type of chemical compound: metal-organic frameworks, or MOFs. Generally speaking, the CDE tool works by breaking sentences into “tokens” like parts of speech and referenced chemicals. By correlating tokens, one can determine that a particular chemical compound has certain synthetic properties, topologies, reactions with solvents, and so on. The scholars trained CDE specifically to parse MOF names, synthesis methods, inorganic precursors, and more, and then exported the results into an open-source database that identifies the properties of each compound. Anyone can now use both the trained CDE tool and the database of MOF properties to ask different chemical-property questions or identify additional MOF production pathways, thereby improving materials science for all. Neither the CDE tool nor the MOF database reproduces or contains the underlying scholarly papers that the tool learned from. Yet neither the training of this third-party CDE tool nor its dissemination would be permitted under the publisher’s restrictive licensing language cited above.
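The word-embedding computation described above can be illustrated with a minimal sketch. This is not the researchers’ actual pipeline: the tiny three-dimensional vectors below are invented for demonstration, whereas real Word2Vec or GloVe models learn vectors with hundreds of dimensions from large corpora. The point is that what a trained embedding model stores is geometry, not text:

```python
import math

# Toy 3-dimensional word vectors, invented for illustration only.
# Real Word2Vec/GloVe models learn high-dimensional vectors that encode
# co-occurrence statistics of words, not the underlying text itself.
vectors = {
    "rich":    [0.9, 0.1, 0.3],
    "wealthy": [0.8, 0.2, 0.4],
    "poor":    [-0.7, 0.1, 0.2],
}

def cosine(u, v):
    """Cosine similarity: near 1.0 for similar directions, negative for opposed."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

# Semantically close words point in similar directions...
print(cosine(vectors["rich"], vectors["wealthy"]))  # about 0.98
# ...while contrasting words do not.
print(cosine(vectors["rich"], vectors["poor"]))     # about -0.80
```

Comparing how such similarities shift between models trained on different decades of text is what lets researchers track changing cultural meanings, and none of it requires the model to retain a single sentence of the licensed corpus.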

Indeed, there are hundreds of AI tools that scholars have trained and disseminated—tools that do not reproduce licensed content—and that scholars have created or fine-tuned to extract chemical information, recognize faces, decode conversations, infer character types, and so much more. Restrictive licensing language like that shown above suppresses research inquiries and societal benefits that these tools make possible. It may also disproportionately affect the advancement of knowledge in or about developing countries, which may lack the resources to secure licenses or be forced to rely on open-source or poorly-coded public data—hindering journalism, language translation, and language preservation.

Protecting access to facts

Why are some publishers doing this? Perhaps to reserve the opportunity to develop their own scholarship-trained AI tools, which they could then license at additional cost back to research institutions. We could speculate about motivations, but the upshot is that publishers have been pushing hard to foreclose scholars from training and disseminating AI tools that now “know” something based on the licensed content. That is, such publishers wish to prevent tools from learning facts about the licensed content. 

However, this is precisely the purpose of licensing content. When institutions license content for their scholars to read, they do so for the scholars to learn information from the content. When scholars write or teach about the content, they are not regenerating the actual expression from the content—the part that is protected by copyright; rather, the scholars are conveying the lessons learned from the content—facts not protected by copyright. Prohibiting the training of AI tools and the dissemination of those tools is functionally equivalent to prohibiting scholars from learning anything from the content that institutions are licensing for that very purpose, and that scholars have written to begin with! Publishers should not be able to monopolize the dissemination of information learned from scholarly content, especially when that information is used non-commercially.

For these reasons, when we negotiate to preserve AI usage and training rights, we generally try to achieve outcomes that would promote—rather than prohibit—all of the research projects described above.

The sample language we’ve disseminated empowers others to negotiate for these outcomes. We hope that, when coupled with the advocacy tools we’ve provided above, scholars and libraries can protect their AI usage and training rights, while also being equipped to consider how they want their own works to be used.

Developing a public-interest training commons of books

Posted December 5, 2024
Photo by Zetong Li on Unsplash

Authors Alliance is pleased to announce a new project, supported by the Mellon Foundation, to develop an actionable plan for a public-interest book training commons for artificial intelligence. Northeastern University Library will be supporting this project and helping to coordinate its progress.

Access to books will play an essential role in how artificial intelligence develops. AI’s large language models (LLMs) have a voracious appetite for text, and there are good reasons to think that these data sets should include books, and lots of them. Over the last 500 years, human authors have written over 129 million books. These volumes, preserved for future generations in some of our most treasured research libraries, are perhaps the best and most sophisticated reflection of all human thinking. Their high editorial quality, breadth, and diversity of content, as well as the unique way they employ long-form narratives to communicate sophisticated and nuanced arguments and ideas, make them ideal training data sources for AI.

These collections and the text embedded in them should be made available under ethical and fair rules as the raw material that will enable the computationally intense analysis needed to inform new AI models, algorithms, and applications imagined by a wide range of organizations and individuals for the benefit of humanity. 

Currently, AI development is dominated by a handful of companies that, in their rush to beat competitors, have paid insufficient attention to the diversity of their inputs, to questions of truth and bias in their outputs, and to questions about social good and access. Authors Alliance, Northeastern University Library, and our partners seek to correct this tilt by swiftly developing a counterbalancing project: one that focuses on AI development built upon the wealth of knowledge in nonprofit libraries, and that is structured to consider the views of all stakeholders, including authors, publishers, researchers, technologists, and stewards of collections. 

The main goal of this project is to develop a plan for either establishing a new organization or identifying the relevant criteria for an existing organization (or partnership of organizations) to take on the work of creating and stewarding a large-scale public interest training commons of books.

We seek to answer several key questions, such as: 

  • What are the right goals and mission for such an effort, taking into account both the long and short term?
  • What technical and logistical challenges might differ from existing library-led efforts to provide access to collections as data?
  • How can we develop a sufficiently large and diverse corpus to offer a reasonable alternative to existing sources?
  • What should a public-interest governance structure look like, given the particular challenges of AI development?
  • How do we, as a collective of stakeholders from authors and publishers to students, scholars, and libraries, sustainably fund such a commons, including a model for long-term sustainability for maintenance, transformation, and growth of the corpus over time?
  • Which combination of legal pathways is acceptable to ensure books are lawfully acquired in a way that minimizes legal challenges?
  • How can we respect the interests of authors and rightsholders by accounting for concerns about consent, credit, and compensation?
  • How should we distinguish between the different needs and responsibilities of nonprofit researchers, small market entrants, and large commercial actors?

The project will include two meetings during 2025 to discuss these questions and possible ways forward, additional research and conversations with stakeholders, and the development and release of an ambitious yet achievable roadmap.

Support Authors Alliance!

Posted December 3, 2024

As we end the year, I’m writing to ask for your financial support by giving toward our end-of-year campaign (click here to donate online).

In May, Authors Alliance marked its 10th anniversary. We’ve experienced tremendous support and enthusiasm for our work over the last decade, and your collaboration has been an important part of our success. I hope you’ll help Authors Alliance take on our next decade. 

We’re proud of our work promoting authorship for the public good by supporting authors who write to be read. In the past year, we secured expanded copyright exemptions for text and data mining research, helped defend authors’ fair use rights in court, launched an important initiative to clarify legal pathways for open access to federally funded research, and much more. We’ve also continued to help authors develop a deeper understanding of how complex policy issues can affect their work, drawing over 20,000 attendees to our in-person and online events on topics such as text and data mining, open access, artificial intelligence, and competition law. 

For 2025, we have our work cut out for us. As policymakers actively consider changes to how the law accommodates free expression, access to information, and new technology, we continue to find that we are among the only voices defending authors’ rights to research, write, and share their work for the benefit of the public. Your support for Authors Alliance will help us continue to speak out in support of authors who value the public interest.

Donate Online Today

Thank you,
Dave Hansen
Executive Director