
On The New NIH Indirect Cost Guidance

Posted February 18, 2025
NIH cuts are an emergency for hospitals (Photo: Eric Harbeson, CC-BY)

A little over a week ago, the National Institutes of Health issued a new guidance policy on indirect costs in Federal grant awards. Presently, NIH negotiates the indirect cost rate with individual institutions through a carefully regulated process that ensures an appropriate rate for a given institution’s unique circumstances, while also providing robust safeguards and auditing requirements to ensure that the rate is no greater than necessary. The new policy, similar to what the previous Trump administration proposed in 2017, would replace the negotiated rates with a standard rate of 15%. For comparison, the average rate among grantee institutions is around 27%, and many of the top research institutions currently have negotiated rates exceeding 50% or even 60%, which amount to tens of millions of dollars in some cases. The rate cap would apply both prospectively to new grants and to all in-progress grants.

Indirect costs are the institutional expenditures that cannot be attributed to a particular research project. These are the costs of keeping the lights on, the lab clean, and the MRI machine running. They pay for biocontainment labs, clinical testing facilities, and computer systems for analyzing data, each of which might be shared by multiple NIH-funded projects. Though indirect, they are significant costs incurred by the institution and an unavoidable part of conducting grant-funded research. From a government efficiency standpoint they are also highly desirable, in that they reduce unnecessary redundancy as well as exceedingly time-consuming and expensive bookkeeping.

Support for indirect costs in grant funds is essential to institutions’ ability to take part in Federal grant-making. If the new guidance policy is allowed to stand, universities collectively expect to lose many hundreds of millions of dollars from the move, losses which in turn will lead to decreases in important, sometimes life-saving research. This new policy has raised serious concerns among affected institutions. 

To say that things are moving quickly in Washington these days would be an understatement. The administration has, of course, been releasing a flurry of sometimes sweeping executive orders. The pace is dizzying. In this case, in the space of just four days (two of which were a weekend), NIH issued its guidance; at least three different lawsuits were filed, each in the District of Massachusetts; and a judge entered a temporary restraining order against the guidance. A hearing on the restraining order is scheduled for February 21 (the cases have not yet been consolidated, though they almost certainly will be if they proceed).

In our view, the guidance contains multiple clear violations of law, both of statute and of the Constitution. While we await the hearing, we thought it worthwhile to highlight for authors some of the legal challenges it will face. Many others have already written on this topic. For more responses to the guidance policy, we recommend COGR’s collection of responses from the grantee community, as well as this post by Holden Thorp in Science and this post from Lisa Janicke Hinchliffe in Scholarly Kitchen (which draws important connections to scholarly publishing).

Some Fact-Checking

At the outset, an examination of the guidance itself reveals gaps in its chain of authority that foreshadow problems for the new policy. For example, the guidance asserts that “NIH may, however, use ‘a rate different from the negotiated rate for either a class of Federal awards or a single Federal award.’ 45 C.F.R. 75.414(c)(1).” The citation at the end refers to Title 45, part 75 of the Code of Federal Regulations, where NIH’s parent agency, the Department of Health and Human Services (HHS), codifies its grant guidelines. Here is the entire paragraph:

“Negotiated indirect cost rates must be accepted by all Federal agencies. A Federal agency may use a rate different from the negotiated rate for either a class of Federal awards or a single Federal award only when required by Federal statute or regulation, or when approved by the awarding Federal agency in accordance with paragraph (c)(3) of this section.” 45 C.F.R. § 75.414(c)(1) (emphasis added).

Note that this paragraph doesn’t say NIH generally may use a different rate, as the guidance appears to claim. Rather, it states a narrow exception to the rule that negotiated rates must be accepted: NIH may not use a different rate unless it is required to by statute or regulation, or unless the deviation is approved in accordance with paragraph (c)(3) of the regulation. Paragraph (c)(3), in turn, requires that NIH “implement, and make publicly available, the policies, procedures and general decision making criteria that their programs will follow to seek and justify deviations from negotiated rates.” (emphasis added). The paragraph doesn’t give NIH general permission; it constrains the agency.

The notice’s very next sentence makes what is arguably its most egregious claim, namely that the cap may be applied retroactively to existing grants, in defiance of institutions’ reliance on their contractually negotiated rates. The notice states that “NIH may deviate from the negotiated rate both for future grant awards and, in the case of grants to institutions of higher education (‘IHEs’), for existing grant awards. See 45 CFR Appendix III to Part 75, § C.7.a; see 45 C.F.R. 75.414(c)(1).” The citation to Appendix III of Part 75 purports to support the claim that NIH may unilaterally, and retroactively, alter the terms of a contract. Here is the cited paragraph in its entirety:

“Except as provided in paragraph (c)(1) of § 200.414, Federal agencies must use the negotiated rates in effect at the time of the initial award throughout the life of the Federal award. Award levels for Federal awards may not be adjusted in future years as a result of changes in negotiated rates. “Negotiated rates” per the rate agreement include final, fixed, and predetermined rates and exclude provisional rates. “Life” for the purpose of this subsection means each competitive segment of a project. A competitive segment is a period of years approved by the Federal awarding agency at the time of the Federal award. If negotiated rate agreements do not extend through the life of the Federal award at the time of the initial award, then the negotiated rate for the last year of the Federal award must be extended through the end of the life of the Federal award.” (emphasis added)

Once again, the cited text not only fails to support the claim but, if anything, forecloses it. This paragraph does not purport to give permission to change an existing agreement. To the contrary, it requires NIH to respect the negotiated rate for the life of the award. (Sec. 200.414(c)(1), referenced in the appendix, is part of the OMB Uniform Guidance and is essentially the same as HHS’s Sec. 75.414(c)(1), which is discussed above.)

The end result is that the notice rests its legal authority to carry out the policy on regulations that in fact work against the new policy. Not a great start.

Violation of law and policy

Though federal agencies are ultimately under the direction of the President, this does not give the executive branch unfettered authority to dictate an agency’s policies. Agencies act as agents for carrying out the laws passed by Congress. This means that Congress has the last word as to what an agency is authorized to do or not do, or must do or not do. In fact, every act of an agency must, in some way, be tied to an act of Congress (admittedly, the connection is often fairly loose).

Congress has actually prohibited the president, this very president, from capping the negotiated indirect cost rates. In 2017, when the president pressed Congress to limit indirect costs to 10% of the grant award, Congress not only rejected the idea but, in Sec. 226 of the Consolidated Appropriations Act of 2018 (p. 394), forbade the president from pursuing the policy. Under the Sec. 226 rider, Congress provided that the existing regulations pertaining to indirect costs are to continue, and that the department may not expend funds in pursuing a policy to the contrary. The rider has persisted in every appropriations bill since, including the most recent one.

The policy is also contrary to HHS’s own regulations that govern new policies such as this one. The notice purports to “implement, and make publicly available, the policies, procedures and general decision making criteria” as required by 45 CFR 75.414(c)(3) (discussed above), but in fact it satisfies only one of the three requirements. The notice publishes a policy (the 15% rate cap), but it does not make the procedures or general decision-making criteria available as the regulation requires. And publication must occur before the policy’s effective date, not simultaneously with it.

Under the Administrative Procedure Act (APA), in place since 1946, Congress has established the courts’ jurisdiction to review agency actions, such as this one, and to “decide all relevant questions of law.” The courts are empowered to set aside agency actions that are not in accordance with law, whether because they are contrary to the agency’s own regulations, acts of Congress, or the Constitution. As the three complaints observe, Congress has forbidden NIH from changing the system of negotiated indirect costs, and the new policy is also in violation of the agency’s own regulations.

Constitutional violations

The Constitution also has something to say about the guidance. In addition to the separation of powers problems, related to Congress’s actions discussed above, the retroactive nature of the guidance raises problems under the Fifth Amendment’s due process and takings clauses. These problems arise because the guidance professes to alter the indirect costs for existing grants, effectively unilaterally rewriting the grant agreements without regard to the institutions’ justified reliance on the binding nature of the agreements.

Contracts are a form of property, and contracts are binding on the U.S. Government to the same extent that they are on private parties. Though grant agreements are not formally contracts, the Supreme Court has observed that legislation enacted under the Spending Clause, under which all grants are made, is “much in the nature of a contract.” The grant agreements bind the grantee institution to numerous terms and conditions (some of which could be said to be consideration for the award) in return for federal financial support for the project. The grant agreements are clearly binding on both parties, and renegotiating a contract requires the consent of both parties.

States, for their part, are explicitly forbidden by the Constitution’s Contracts Clause from legislating their way out of contractual agreements, as NIH purports to do here, but that clause does not apply to the Federal government. Still, the Fifth Amendment prohibits the Federal government from taking private property (and again, contracts are property) for public use without just compensation, and from depriving a party of property (for any purpose) “without due process of law.” Grantee institutions rely on the government’s promise to follow through on the agreed-upon, negotiated indirect cost rate, and that reliance interest is in some cases worth hundreds of millions of dollars. NIH’s implementation of this new policy with no notice (much less a hearing or the opportunity to comment that the APA would require) sounds a lot like deprivation of property without due process.

Conclusion

NIH funding has produced an astonishing amount of highly significant, impactful research, and the agency’s role in the biomedical research ecosystem is pivotal. The authors NIH has funded have won every major prize in the field many times over, and their research has saved and improved countless lives. But NIH’s track record is only as strong as its grantees: the authors who do the research and the institutions that employ them. If NIH is permitted to recklessly cut its promised support to those grantees, the inevitable loss of research will greatly harm the scientific community, both at home and abroad, and Americans in general.

Thomson Reuters v. Ross: The First AI Fair Use Ruling Fails to Persuade

Posted February 13, 2025
A confused judge, generated by Gemini AI

Facts of the Case

On February 11, Third Circuit Judge Stephanos Bibas (sitting by designation on the U.S. District Court for the District of Delaware) issued a new summary judgment ruling in Thomson Reuters v. ROSS Intelligence, reversing his own 2023 decision, which had held that a jury must decide the fair use question. The decision is one of the first to address fair use in the context of AI, though the facts of this case differ significantly from those of the many other pending AI copyright suits.

This ruling focuses on copyright infringement claims brought by Thomson Reuters (TR), the owner of Westlaw, a major legal research platform, against ROSS Intelligence. TR alleged that ROSS improperly used Westlaw’s headnotes and the Key Number System to train its AI system to better match legal questions with relevant case law. 

Westlaw’s headnotes summarize legal principles extracted from judicial opinions. (Note: Judicial opinions are not copyrightable in the US.) The Key Number System is a numerical taxonomy categorizing legal topics and cases. Clicking on a headnote takes users to the corresponding passage in the judicial text. Clicking on the key number associated with a headnote takes users to a list of cases that make the same legal point. 

Importantly, ROSS did not directly ingest the headnotes and the Key Number System to train its model. Instead, ROSS hired LegalEase, a company that provides legal research and writing services, to create training data based on the headnotes and the Key Number System. LegalEase created Bulk Memos, a collection of legal questions each paired with four to six possible answers. LegalEase instructed its lawyers to use Westlaw headnotes as a reference when formulating the questions, but not to copy the headnotes directly.

ROSS attempted to license the necessary content directly from TR, but TR refused to grant a license because it thought the AI tool contemplated by ROSS would compete with Westlaw.

The financial burden of defending this lawsuit has forced ROSS to shut down its operations. ROSS countered TR’s copyright infringement claims with antitrust claims, but the same judge dismissed them.

The New Ruling

The court found that ROSS copied 2,243 headnotes from Westlaw. It ruled that these headnotes and the Key Number System met the low legal threshold for originality and were copyrightable. The court rejected ROSS’s merger and scènes à faire defenses because, according to the court, the headnotes and the Key Number System were not dictated by necessity. It also rejected ROSS’s fair use defense on the grounds that the 1st and 4th factors weighed in favor of TR. At this point, the only remaining issue for trial is whether some of the headnotes’ copyrights had expired or were untimely registered.

The new ruling has drawn mixed reactions: some say it undermines potential fair use defenses in other AI cases, while others dismiss its significance because its facts are unique. In our view, the opinion is poorly reasoned and disregards well-established case law. Litigants in future AI cases will need to demonstrate why the ROSS court’s approach is unpersuasive. Here are three key flaws we see in the ruling.

Problems with the Opinion

  1. Near-Verbatim Summaries are “Original”?

“A block of raw marble, like a judicial opinion, is not copyrightable. Yet a sculptor creates a sculpture by choosing what to cut away and what to leave in place. … A headnote is a short, key point of law chiseled out of a lengthy judicial opinion.” 

— the ROSS court

(Image: an example of a headnote alongside the uncopyrightable judicial text it was based on.)

The court claims that the Westlaw headnotes are original both individually and as a compilation, and the Key Number System is original and protected as a compilation. 

“Original” has a special meaning in US copyright law: it means that a work has a modicum of human creativity that our society would want to protect and encourage. Based on the evidence that survived redaction, it is nearly impossible to find creativity in any individual headnote. The headnotes consist of text copied verbatim from uncopyrightable judicial opinions, along with some basic paraphrasing of facts.

As we know, facts are not copyrightable, but expressions of facts often are. One important safeguard for protecting our freedom to reference facts is the merger doctrine. US law has long recognized that when there are only limited ways to express a fact or an idea, those expressions are not considered “original.” The expressions “merge” with the underlying unprotectable fact, and become unprotectable themselves. 

Judge Bibas gets merger wrong—he claims merger does not apply here because “there are many ways to express points of law from judicial opinions.” This view misunderstands the merger doctrine. It is the nature of human language to be capable of conveying the same thing in many different ways, as long as you are willing to do some verbal acrobatics. But when there are only a limited number of reasonable, natural ways to express a fact or idea—especially when textual precision and terms of art are used to convey complex ideas—merger applies. 

There are many good reasons for this to be the law. For one, this is how we avoid giving copyright protection to the concise expression of ideas. Fundamentally, we do not need copyright to incentivize the simple restatement of facts. As the Constitution intended, copyright law is designed to encourage creativity, not to grant exclusive rights over basic expressions of facts. We want people to state facts accurately and concisely. If we allowed the first person to describe a judicial text in a natural, succinct way to claim exclusive rights over that expression, it would hinder, rather than facilitate, meaningful discussion of that text, and stifle blog posts like this one.

As to the selection and arrangement of the Key Number System, the court claims that originality exists here, too, because “there are many possible, logical ways to organize legal topics by level of granularity,” and TR exercised some judgment in choosing the particular “level” for its Key Number System. However, cases are assigned key numbers by an automated computer system, and the topics closely mirror what law schools teach their first-year students.

The court does not say much about why the compilation of the headnotes should receive separate copyright protection, other than that it qualifies as an original “factual compilation.” This claim is dubious because the compilation consists of uncopyrightable materials, as discussed above, and the selection is driven by the need to represent facts and law, not by creativity. Even if the compilation of headnotes is indeed copyrightable, using the uncopyrightable portions of it is decidedly not infringement, because the US does not recognize sui generis database rights.

  2. Can’t Claim Fair Use When Nobody Saw a Copy?

 “[The intermediate-copying cases] are all about copying computer code. This case is not.” 

— the ROSS court conveniently ignoring Bellsouth Advertising & Publishing Corp. v. Donnelley Information Publishing, Inc., 933 F.2d 952 (11th Cir. 1991) and Sundeman v. Seajay Society, Inc., 142 F.3d 194 (4th Cir. 1998).

In deciding whether ROSS’s use of Westlaw’s headnotes and the Key Number System is transformative under the 1st factor, the court took a moment to consider whether the available intermediate copying case law is in favor of ROSS, and quickly decided against it. 

Even though no consumer ever saw the headnotes or the Key Number System in the AI products offered by ROSS, the court claims that copying them constitutes copyright infringement because an intermediate copy existed that contained copyright-restricted material authored by Westlaw. And, according to the court, intermediate copying can weigh in favor of fair use only for computer code.

Before turning to the case law the court is overlooking here, we wonder whether Judge Bibas is himself persuaded by his own argument. Under the 3rd fair use factor, he acknowledges that only the content made accessible to the public should be considered when measuring the amount taken relative to the copyrighted work as a whole. That contradicts his argument under the 1st factor that we must examine non-public intermediate copies.

Intermediate copying is the process of producing a preliminary, non-public work as an interim step in the creation of a new public-facing work. It is well established under US jurisprudence that any type of copying, whether private or public, can satisfy a prima facie copyright infringement claim, but the fact that a work was never shared publicly, nor intended to be, strongly favors fair use. For example, in Bellsouth Advertising & Publishing Corp. v. Donnelley Information Publishing, Inc., the Eleventh Circuit decided that directly copying a competitor’s yellow pages business directory in order to produce a competing yellow pages directory was fair use where the resulting publicly accessible directory the defendant created did not directly incorporate the plaintiff’s work. Similarly, in Sundeman v. Seajay Society, Inc., the Fourth Circuit concluded that it was fair use for the Seajay Society to make an intermediate, complete copy of the plaintiffs’ unpublished manuscript so that a scholar could study and write about it. The scholar wrote several articles about the manuscript, mostly summarizing its important facts and ideas (while also using short quotations).

There are many good reasons for allowing intermediate copying. Clearly, we do not want ALL unlicensed copies to be subject to copyright infringement lawsuits, particularly when intermediate copies are made in order to extract unprotectable facts or ideas. More generally, intermediate copying is important to protect because it helps authors and artists create new copyrighted works (e.g., sketching a famous painting to learn a new style, translating a passage to practice your language skills, copying the photo of a politician to create a parody print t-shirt). 

  3. Suddenly, We Have an AI Training Market?

“[I]t does not matter whether Thomson Reuters has used [the headnotes and the Key Number System] to train its own legal search tools; the effect on a potential market for AI training data is enough.”

 — the ROSS court

The 4th fair use factor is very much susceptible to circular reasoning: if a user is making a derivative use of my work, surely that proves a market already exists or will likely develop for that derivative use, and, if a market exists for such a derivative use, then, as the copyright holder, I should have absolute control over such a market.

The ROSS court runs full tilt into this circular trap. In the eyes of the court, ROSS, by virtue of using Westlaw’s data in the context of AI training, has created a legitimate AI training data market that should be rightfully controlled by TR.

But our case law suggests that the 4th factor’s “market substitution” analysis considers only markets that are traditional, reasonable, or likely to be developed. As we have already pointed out in a previous blog post, copyright holders must offer concrete evidence of an existing licensing market, or of the likelihood that one will develop, before they can argue that a secondary use serves as a “market substitute.” If we allowed a copyright holder’s protected market to include everything for which he is willing to accept licensing fees, it would all but wipe out fair use in the service of stifling competition.

Conclusion

The impact of this case is currently limited, both because it is a district court ruling and because it concerns non-generative AI. However, it is important to remain vigilant, as the reasoning put forth by the ROSS court could influence other judges, policymakers, and even the broader public, if left unchallenged.

This ruling combines several problematic arguments that, if accepted more widely, could have significant consequences. First, it blurs the line between fact and expression, suggesting that factual information can become copyrightable simply by being written down by someone in a minimally creative way. Second, it expands copyright enforcement to intermediate copies, meaning that even temporary, non-public use of copyrighted material could be subject to infringement claims. Third, it conjures up a new market for AI training data, regardless of whether such a licensing market is legitimate or even likely to exist.

If these arguments gain traction, they could further entrench the dominance of a few large AI companies. Only major players like Microsoft and Meta would be able to afford AI training licenses, consolidating control over the industry. AI training licensing terms would be determined solely between big AI companies and big content aggregators, without representation of individual authors or the public interest. The large content aggregators would get to dictate the terms under which creators must surrender rights to their works for AI training, and the AI companies would dictate how their AI models can be used by the general public.

Without meaningful pushback and policy intervention, smaller organizations and individual creators cannot participate fairly. Let’s not rewrite our copyright laws to entrench this power imbalance even further.

Why Bayh-Dole has nothing to do with public access to articles under the Federal Purpose License

Posted February 4, 2025
This image, along with all its components, is in the Public Domain and free for reuse.

In the course of our work on Federal public access policies and the Nelson Memo, one of the objections I’ve encountered recently is that federal agency initiatives to provide immediate public access to scholarly articles run afoul of the Bayh-Dole Act or may imperil a university’s patent rights to inventions created pursuant to federal funding. Another related objection is that Stanford v. Roche, a case about how a university must go about securing rights in patentable inventions from its faculty under Bayh-Dole, affects how universities obtain sufficient rights to comply with federal public access policies.

I thought it would be worth explaining why we don’t think these are realistic problems for federal public access law or policy. 

Bayh-Dole does not affect copyright in scholarly articles

The Bayh-Dole Act is a 1980 amendment to U.S. patent law that gives nonprofits and small businesses the right to retain patent rights in inventions developed using federal funding. Before Bayh-Dole, some federal agencies’ policies required federal grant recipients to assign patent rights arising from federally funded research to the government. To encourage institutions receiving federal research funding to commercialize inventions for public benefit, Bayh-Dole instead gave those institutions the right to retain title to such inventions. If a grantee elects to retain title to an invention, it must grant the government a nonexclusive, nontransferable, irrevocable, paid-up license to use the invention. An unreasonable failure to develop or commercialize the invention may result in the government exercising “march-in” rights to license it to others (one of the more controversial parts of the legislation).

The rights that Bayh-Dole secures for government contractors and grantees apply to “subject inventions.” The Act defines “invention” as “any invention or discovery which is or may be patentable or otherwise protectable under [US patent laws], or any novel variety of plant which is or may be protectable under the Plant Variety Protection Act. . . .” In turn, “subject inventions” are “any invention of the contractor conceived or first actually reduced to practice in the performance of work under a funding agreement.” In other words, “subject inventions” are inventions that were developed within the scope of a Federal grant.

The Nelson Memo also applies to grant outputs, but not to inventions; it applies to “peer-reviewed scholarly publications.” Peer-reviewed scholarly publications, of course, are not inventions, and no rights under patent law apply to them. Scholarly publications are creative works of authorship whose reuse is governed by copyright law under Title 17 of the United States Code, not by Bayh-Dole. It is true that copyrights and patents are sometimes discussed together as “intellectual property,” and courts sometimes even borrow concepts from one body of law for use in the other. But for the most part, different statutes and different cases govern how rights under each may be created, owned, licensed, and used.

Federal regulations about agency ownership and licensing of patent and copyright rights reflect that they are different. As discussed at length in this paper we published a few months ago (or see the one-page summary), grant-making agencies have for nearly half a century reserved certain rights in copyrighted grant outputs under a provision known as the “Federal Purpose License.” That license, which is codified in 2 C.F.R. § 200.315(b), provides that: 

“To the extent permitted by law, the recipient or subrecipient may copyright any work that is subject to copyright and was developed, or for which ownership was acquired, under a Federal award. The Federal agency reserves a royalty-free, nonexclusive, and irrevocable right to reproduce, publish, or otherwise use the work for Federal purposes and to authorize others to do so. This includes the right to require recipients and subrecipients to make such works available through agency-designated public access repositories.” (emphasis added).

Note that the Federal Purpose License is limited to copyrightable works.  By contrast, in the very next sub-section of the regulation, we see that rights in patents are treated differently:  

“[T]he recipient or subrecipient is subject to applicable regulations governing patents and inventions, including government-wide regulations in 37 CFR part 401 [the implementing regulations for Bayh-Dole].” 2 C.F.R. § 200.315(c) (emphasis added).

It is, of course, possible that in the course of federally funded research, one might produce both a patentable invention that is subject to Bayh-Dole and a copyrighted research article on the same subject. But this does not make Bayh-Dole applicable to the copyright rights in the article, nor does it mean that the Federal Purpose License (a copyright license) affects patent rights under Bayh-Dole regulations. The copyright provisions cover copyrightable works; the patent provisions cover patents.

Disclosure of Inventions or Discoveries

If you’ve worked with your campus technology transfer office before, you know that public disclosure of new research (e.g., in a research article) can be a problem if one hopes to obtain a patent for an invention discussed in that publication. U.S. patent law rewards new and non-obvious inventions, and so the law provides in 35 U.S.C. § 102(a) that one is not entitled to a patent if “the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.”

Note that the statute specifically calls out description of the invention “in a printed publication.” Whether a reference qualifies as a printed publication turns on “public accessibility,” which the courts have explained as being “disseminated or otherwise made available to the extent that persons interested and ordinarily skilled in the subject matter or art exercising reasonable diligence[ ] can locate it.” That standard is far lower than the “worldwide free public access” provided by the public access databases under the Nelson memo. For example, the Federal Circuit has found that a dissertation shelved and indexed in a card catalog at a German university was publicly accessible. The court has also concluded that an oral presentation of a paper at a conference (with dissemination of the paper itself to only six people) satisfied the test. Similarly, the Federal Circuit has held that electronic distribution via a subscription email list qualified as publicly accessible. The point is that if you have already published a paper in a peer-reviewed journal that sufficiently describes the invention, even one available only by subscription and not for free, you have almost certainly already disclosed the invention. Further expanding its reach through a public access repository would make no difference.

Public access policies implementing the Nelson Memo do not compel researchers or universities to disclose inventions prematurely, and so they have no impact on patentability. They merely require that once you choose to publish your research in an article, it must be made promptly and freely accessible to the public in a public access repository no later than the publication date. Whether the article is restricted to subscribers only or made openly available does not affect its status as a public disclosure for patent purposes.

Stanford v. Roche

Stanford v. Roche is a 2011 Supreme Court case addressing ownership of patent rights in inventions created with federal funding and subject to Bayh-Dole. The case concerned control over rights in a test kit developed to detect HIV in human blood. As the Court explained the relevant facts:

Dr. Mark Holodniy joined Stanford as a research fellow . . . When he did so, he signed a Copyright and Patent Agreement (CPA) stating that he “agree[d] to assign” to Stanford his “right, title and interest in” inventions resulting from his employment at the University. 

At Stanford Holodniy undertook to develop an improved method for quantifying HIV levels in patient blood samples, using [polymerase chain reaction, or PCR, a Nobel Prize-winning technique developed at Cetus]. Because Holodniy was largely unfamiliar with PCR, his supervisor arranged for him to conduct research at Cetus. As a condition of gaining access to Cetus, Holodniy signed a Visitor’s Confidentiality Agreement (VCA). That agreement stated that Holodniy “will assign and do[es] hereby assign” to Cetus his “right, title, and interest in each of the ideas, inventions and improvements” made “as a consequence of [his] access” to Cetus. 

For the next nine months, Holodniy conducted research at Cetus. 

The conflict was ultimately about whether Stanford could prevent Roche, the company that acquired Cetus’s IP assets, from using the invention. 

The Supreme Court was asked to address the apparent conflict between 1) the ordinary rule in patent law that rights in an invention belong to the inventor and that “in most circumstances, an inventor must expressly grant his rights in an invention to his employer if the employer is to obtain those rights” and 2) Stanford University’s contention that Bayh-Dole changed this ordinary rule and instead gave the university first priority in the invention, such that an individual inventor could not simply sign away rights to a third party.

Stanford made this argument about Bayh-Dole in part to counter an important holding of the appellate court below: that Stanford’s agreement with Dr. Holodniy was a “mere promise to assign rights in the future, not an immediate transfer of expectant interests,” and therefore came second in line to Holodniy’s agreement with Cetus, under which Cetus “immediately gained equitable title to Holodniy’s inventions.”

The Supreme Court concluded that Bayh-Dole did not disrupt the ordinary rule that inventors own rights in their inventions absent an express assignment, and because Holodniy’s agreement with Stanford used ineffective language to secure for it first priority—“agree to assign” instead of the effective “do hereby assign”—Stanford lost. The practical upshot—many of you may remember this—was that universities rushed to revise their agreements with employees to put in place more effective language securing first-priority rights in inventions of university employees. 

Federal grants and copyright—what’s a university to do?

Stanford v. Roche contains some important lessons for universities, as federal grant recipients, about securing clear and effective rights from employees to comply with their grant obligations. 

As in Stanford v. Roche, it is also important, in the context of copyrightable works created with federal funding, for universities (as grantees) to make sure they actually hold sufficient rights in the copyrightable works produced under a grant so that they can comply with federal agencies’ public access requirements. That said, there are some important differences between the patent assignment issues in Roche and what compliance with the Federal Purpose License requires.

Probably the biggest factor determining the effectiveness of those licenses will be how universities craft and implement their copyright policies. We’ve touched on this before, and explained that one important consideration is whether copyright law’s “work made for hire” doctrine applies (patent law has no such doctrine). Under the work made for hire doctrine, a work produced by an employee within the scope of employment is owned initially by the employer rather than the employee. Whether and how the doctrine applies to academic work is contested, but if it does apply, it largely eliminates concerns about the priority of the university’s license, since the university would be the initial owner. That’s true even though most universities (very rightly, in our opinion!) make it clear that individual authors should ultimately control the rights in their works. For instance, the University of Michigan transfers the copyright of scholarly works to its faculty members, but reserves the ability to make uses consistent with academic norms, including complying with a Federal Purpose License.

Even without the application of work made for hire, universities can and do use their copyright policies to effectively address ownership and licensing of faculty-created scholarly works. Though we haven’t read every university’s copyright policy, for the most part we’ve found them to be thoughtful about securing from faculty authors, at a minimum, a non-exclusive license that would satisfy the requirements of Section 205(e) of the Copyright Act, giving that license priority over any subsequent transfer, such as a publishing agreement with a publisher. We review some of the approaches university policies take in this post, and we plan to release a white paper on this subject in the next few months. If you want to read further now, law professor Eric Priest has a good article, “Copyright and the Harvard Open Access Mandate,” that explains why these kinds of licenses are likely effective.

Conclusion

It’s important to remember that patent law and copyright law are distinct in many ways. While they share some concepts, the details matter, and ownership and licensing of rights under one can be quite different from the other. The Bayh-Dole Act and other U.S. patent laws govern ownership and commercialization of federally funded inventions, but they do not dictate how the Federal Purpose License should be interpreted or applied within copyright law.

Artificial Intelligence, Authorship, and the Public Interest

Posted January 9, 2025
Photo by Robert Anasch on Unsplash

Today, we’re pleased to announce a new project generously supported by the John S. and James L. Knight Foundation. The project, “Artificial Intelligence, Authorship, and the Public Interest,” aims to identify, clarify, and offer answers to some of the most challenging copyright questions posed by artificial intelligence (AI) and explain how this new technology can best advance knowledge and serve the public interest.

Artificial intelligence has dominated public conversation about the future of authorship and creativity for several years. Questions abound about how this technology will affect creators’ incentives and readership, and about what it might mean for future research and learning.

At the heart of these questions is copyright law. Since November 2022, more than two dozen class-action copyright lawsuits have been filed against companies such as Microsoft, Google, OpenAI, Meta, and others. Additionally, congressional leadership, state legislatures, and regulatory agencies have held dozens of hearings on how to reconcile existing intellectual property law with artificial intelligence. As one of the primary legal mechanisms for promoting the “progress of science and useful arts,” copyright law plays a critical role in how information is created, produced, and disseminated.

We are convinced that how policymakers shape copyright law in response to AI will have a lasting impact on whether and how the law supports democratic values and serves the common good. That is why Authors Alliance has already devoted considerable effort to these issues, and this project will allow us to expand those efforts at this critical moment. 

AI Legal Fellow
As part of the project, we’re pleased to add an AI Legal Fellow to our team. The position requires a law degree and a demonstrated interest in and experience with artificial intelligence, intellectual property, and legal technology issues. We’re particularly interested in candidates with a demonstrated interest in how copyright law can serve the public interest. This role will require significant research and writing. Pay is $90,000/yr, and it is a two-year term position. Read more about the position here. We’ll begin reviewing applications immediately and will interview on a rolling basis until the position is filled.

As we get going, we’ll have much more to say about this project. We will have some funds available to support research subgrants, and we will organize several workshops and symposia and offer numerous opportunities for public engagement.

About the John S. and James L. Knight Foundation
We are social investors who support democracy by funding free expression and journalism, arts and culture in community, research in areas of media and democracy, and in the success of American cities and towns where the Knight brothers once had newspapers. Learn more at kf.org and follow @knightfdn on social media.

Authors Alliance 2024 Annual Report

Posted December 17, 2024

Authors Alliance celebrated an important milestone in 2024: our 10th anniversary! 

Quite a lot has changed since 2014, but our mission remains the same. We exist to advance the interests of authors who want to serve the public good by sharing their creations broadly. I’m pleased to share our 2024 annual report, where you can find highlights of our work this year to promote laws, policies, and practices that enable authors to reach wide audiences.

Our success in 2024 was largely due to the wonderful collaboration and support we have from our members. You’ll see in the report a number of ongoing projects and issues we are working to address: legal questions about open access publishing, rights reversion at scale, supporting text and data mining research, addressing contractual override of fair use, AI and copyright, and more. As we look to 2025, I would love to hear from you if you have a special interest in any of these projects and would like to contribute your ideas, time, or expertise to help us tackle them.

I’m grateful for those of you who contributed financially to make 2024 a success. Authors Alliance is funded almost entirely by gifts and grants, and so we truly rely on you. As we end the year, I hope you will consider giving if you haven’t done so already. You can donate online here.

Thank you,

Dave Hansen
Executive Director 


Restricting Innovation: How Publisher Contracts Undermine Scholarly AI Research

Posted December 6, 2024
Photo by Josh Appel on Unsplash

This post is by Rachael Samberg, Director, Scholarly Communication & Information Policy, UC Berkeley Library; and Dave Hansen, Executive Director, Authors Alliance.

This post is about the research and the advancement of science and knowledge made impossible when publishers use contracts to limit researchers’ ability to use AI tools with scholarly works. 

Within the scholarly publishing community, mixed messages pervade about who gets to say when and how AI tools can be used for research reliant on scholarly works like journal articles or books. Some scholars voiced concern (explained more here) when major scholarly publishers like Wiley or Taylor & Francis entered lucrative contracts with big technology companies to allow for AI training without first seeking permission from authors. We suspect that these publishers have the legal right to do so, since most publishers demand that authors hand over extensive rights in exchange for publishing their work. And against the backdrop of dozens of pending AI copyright lawsuits, who can blame the AI companies for paying for licenses, if for no other reason than to avoid the pain of litigation? While it stings to see the same large commercial academic publishers profit yet again from the work academic authors submit to them for free, we continue to think there are good ways for authors to retain a say in the matter.

Big tech companies are one thing, but what about scholarly research? What about the large and growing number of scholars who are themselves using scholarly copyrighted content with AI tools to conduct their research? We currently face a situation in which publishers are attempting to dictate how and when researchers can do that work, even when authors’ fair use rights to use and derive new understandings from scholarship clearly allow for such uses.

How vendor contracts disadvantage US researchers

We have written elsewhere (in an explainer and public comment to the Copyright Office) about why training AI tools, particularly in the scholarly and research context, constitutes a fair use under U.S. copyright law. Training AI is critical for the advancement of knowledge; it rests on a statutory right already held by all scholarly authors engaging in computational research, and one that lawmakers should preserve.

The problem U.S. scholarly authors presently face with AI training is that publishers restrict their access to these statutory rights through contracts that override them: In the United States, publishers can use private contracts to take away statutory fair use rights that researchers would otherwise hold under Federal law. In this case, the private contracts at issue are the electronic resource (e-resource) license agreements that academic research libraries sign to secure campus access to electronic journal, e-book, data, and other content that scholars need for their computational research.

Contractual override of fair use is a problem that disproportionately disadvantages U.S. researchers. As we have described elsewhere, more than forty countries, including the member states of the European Union, expressly reserve text mining and AI training rights for scientific research by research institutions. Scholars in these countries not only need not worry whether their computational research with AI is permitted; they also do not risk having those reserved rights overridden by contract. The European Union’s Copyright Digital Single Market Directive and recent AI Act nullify any attempt to circumscribe the text and data mining and AI training rights reserved for scientific research within research organizations. U.S. scholars are not as fortunate.

In the U.S., most institutional e-resource licenses are negotiated and managed by research libraries, so it is imperative that scholars work closely with their libraries and advocate to preserve their computational research and AI training rights within the e-resource license agreements that universities sign. To that end, we have developed adaptable licensing language to support institutions in doing that nationwide. But while this language is helpful, the onus of advocacy and negotiation for those rights in the contracting process remains. Personally, we have found it helpful to explain to publishers that they must consent to these terms in the European Union, and can do so in the U.S. as well. That, combined with strong faculty and administrative support (such as at the University of California), makes for a strong stance against curtailment of these rights.

But we think there are additional practical ways for libraries to illustrate—both to publishers and scholarly authors—exactly what would happen to the advancement of knowledge if publishers’ licensing efforts to curtail AI training were successful. One way to do that is by “unpacking” or decoding a publisher’s proposed licensing restriction, and then demonstrating the impact that provision would have on research projects that were never objectionable to publishers before, and should not be now. We’ll take that approach below.

Decoding a publisher restriction

A commercial publisher recently proposed the following clause in an e-resource agreement:

Customer [the university] and its Authorized Users [the scholars] may not:

  1. directly or indirectly develop, train, program, improve, and/or enrich any artificial intelligence tool (“AI Tool”) accessible to anyone other than Customer and its Authorized Users, whether developed internally or provided by a third party; or
  2. reproduce or redistribute the Content to any third-party AI Tool, except to the extent limited portions of the Content are used solely for research and academic purposes (including to train an algorithm) and where the third-party AI Tool (a) is used locally in a self-hosted environment or closed hosted environment solely for use by Customer or Authorized Users; (b) is not trained or fine-tuned using the Content or any part thereof; and (c) does not share the Content or any part thereof with a third party.  

What does this mean?

  • The first paragraph forbids training or improving any AI tool if it is accessible or released to third parties. It further forbids using any computational outputs or analysis derived from the licensed content to train any tool available to third parties.
  • The second paragraph is perhaps even more concerning. It provides that when using third-party AI tools of any kind, a scholar can use only limited portions of the licensed content with the tools, and is prohibited from doing any training at all of third-party tools, even if it is a non-generative AI tool and the scholar is performing the work in a completely closed and highly secure research environment.

What would the impact of such a restrictive licensing provision be on research? 

It would mean that every single one of the trained tools in the following projects could never be disseminated. In addition, for the projects below that used third-party AI tools, the research would have been prohibited full-stop because the third-party tools in those projects required training which the publisher above is attempting to prevent:

Tools that could not be disseminated

  1. In 2017, chemists created and trained a generative AI tool on 12,000 published research papers regarding synthesis conditions for metal oxides, so that the tool could identify anticipated chemical outputs and reactions for any given set of synthesis conditions entered into the tool. The generative tool they created is not capable of reproducing or redistributing any licensed content from the papers; it has merely learned conditions and outcomes and can predict chemical reactions based on those conditions and outcomes. And this beneficial tool would be prohibited from dissemination under the publisher’s terms identified above.
  2. In 2018, researchers trained an AI tool (that they had originally created in 2014) to understand whether a character is “masculine” or “feminine” by looking at the tacit assumptions expressed in words associated with that character. That tool can then look at other texts and identify masculine or feminine characters based on what it learned during training. Scholars can therefore use the tool with texts from different time periods to study representations of masculinity and femininity over time. Sharing the trained tool could never release licensed content, such as copyrighted books from a publisher, to the world; the trained tool is merely capable of topic modeling. Yet the publisher’s language above would prohibit its dissemination.

Tools that could neither be trained nor disseminated 

  1. In 2019, authors used text from millions of books published over 100 years to analyze cultural meaning. They did this by training third-party non-generative AI word-embedding models called Word2Vec and GloVe on multiple textual archives. The tools cannot reproduce content: when shown new text, they merely represent words as numbers, or vectors, to evaluate or predict how similar words in a given space are semantically or linguistically. The similarity of words can reveal cultural shifts in understanding of socioeconomic factors like class over time. But the publisher’s above licensing terms would prohibit the training of the tools to begin with, much less the sharing of them to support further or different inquiry.
  2. In 2023, scholars trained a third-party-created open-source natural language processing (NLP) tool called Chemical Data Extractor (CDE). Among other things, CDE can be used to extract chemical information and properties identified in scholarly papers. In this case, the scholars wanted to teach CDE to parse a specific type of chemical information: metal-organic frameworks, or MOFs. Generally speaking, the CDE tool works by breaking sentences into “tokens” like parts of speech and referenced chemicals. By correlating tokens, one can determine that a particular chemical compound has certain synthetic properties, topologies, reactions with solvents, etc. The scholars trained CDE specifically to parse MOF names, synthesis methods, inorganic precursors, and more, and then exported the results into an open source database that identifies the MOF properties for each compound. Anyone can now use both the trained CDE tool and the database of MOF properties to ask different chemical property questions or identify additional MOF production pathways, thereby improving materials science for all. Neither the CDE tool nor the MOF database reproduces or contains the underlying scholarly papers that the tool learned from. Yet neither the training of this third-party CDE tool nor its dissemination would be permitted under the publisher’s restrictive licensing language cited above.

Indeed, there are hundreds of AI tools that scholars have trained and disseminated—tools that do not reproduce licensed content—and that scholars have created or fine-tuned to extract chemical information, recognize faces, decode conversations, infer character types, and so much more. Restrictive licensing language like that shown above suppresses research inquiries and societal benefits that these tools make possible. It may also disproportionately affect the advancement of knowledge in or about developing countries, which may lack the resources to secure licenses or be forced to rely on open-source or poorly-coded public data—hindering journalism, language translation, and language preservation.

Protecting access to facts

Why are some publishers doing this? Perhaps to reserve the opportunity to develop their own scholarship-trained AI tools, which they could then license at additional cost back to research institutions. We could speculate about motivations, but the upshot is that publishers have been pushing hard to foreclose scholars from training and disseminating AI tools that now “know” something based on the licensed content. That is, such publishers wish to prevent tools from learning facts about the licensed content.

But learning facts from the content is precisely the purpose of licensing it. When institutions license content for their scholars to read, they are doing so for the scholars to learn information from the content. When scholars write or teach about the content, they are not regenerating its actual expression (the part that is protected by copyright); rather, they are conveying the lessons learned from it, which are facts not protected by copyright. Prohibiting the training of AI tools and the dissemination of those tools is functionally equivalent to prohibiting scholars from learning anything about the content that institutions are licensing for that very purpose, and that scholars wrote to begin with! Publishers should not be able to monopolize the dissemination of information learned from scholarly content, especially when that information is used non-commercially.

For these reasons, when we negotiate to preserve AI usage and training rights, we generally try to achieve the following outcomes which would promote—rather than prohibit—all of the research projects described above:

The sample language we’ve disseminated empowers others to negotiate for these outcomes. We hope that, when coupled with the advocacy tools we’ve provided above, scholars and libraries can protect their AI usage and training rights, while also being equipped to consider how they want their own works to be used.

Developing a public-interest training commons of books

Posted December 5, 2024
Photo by Zetong Li on Unsplash

Authors Alliance is pleased to announce a new project, supported by the Mellon Foundation, to develop an actionable plan for a public-interest book training commons for artificial intelligence. Northeastern University Library will be supporting this project and helping to coordinate its progress.

Access to books will play an essential role in how artificial intelligence develops. AI’s large language models (LLMs) have a voracious appetite for text, and there are good reasons to think that these data sets should include books, and lots of them. Over the last 500 years, human authors have written over 129 million books. These volumes, preserved for future generations in some of our most treasured research libraries, are perhaps the best and most sophisticated reflection of all human thinking. Their high editorial quality, breadth, and diversity of content, as well as the unique way they employ long-form narratives to communicate sophisticated and nuanced arguments and ideas, make them ideal training data sources for AI.

These collections and the text embedded in them should be made available under ethical and fair rules as the raw material that will enable the computationally intense analysis needed to inform new AI models, algorithms, and applications imagined by a wide range of organizations and individuals for the benefit of humanity. 

Currently, AI development is dominated by a handful of companies that, in their rush to beat other competitors, have paid insufficient attention to the diversity of their inputs, questions of truth and bias in their outputs, and questions about social good and access. Authors Alliance, Northeastern University Library, and our partners seek to correct this tilt through the swift development of a counterbalancing project that will focus on AI development that builds upon the wealth of knowledge in nonprofit libraries and that will be structured to consider the views of all stakeholders, including authors, publishers, researchers, technologists, and stewards of collections. 

The main goal of this project is to develop a plan for either establishing a new organization or identifying the relevant criteria for an existing organization (or partnership of organizations) to take on the work of creating and stewarding a large-scale public interest training commons of books.

We seek to answer several key questions, such as: 

  • What are the right goals and mission for such an effort, taking into account both the long and short-term;
  • What are the technical and logistical challenges that might differ from existing library-led efforts to provide access to collections as data;
  • How to develop a sufficiently large and diverse corpus to offer a reasonable alternative to existing sources;
  • What a public-interest governance structure should look like that takes into account the particular challenges of AI development;
  • How do we, as a collective of stakeholders from authors and publishers to students, scholars, and libraries, sustainably fund such a commons, including a model for long-term sustainability for maintenance, transformation, and growth of the corpus over time;
  • Which combination of legal pathways is acceptable to ensure books are lawfully acquired in a way that minimizes legal challenges;
  • How to respect the interests of authors and rightsholders by accounting for concerns about consent, credit, and compensation; and
  • How to distinguish between the different needs and responsibilities of nonprofit researchers, small market entrants, and large commercial actors.

The project will include two meetings during 2025 to discuss these questions and possible ways forward, additional research and conversations with stakeholders, and the development and release of an ambitious yet achievable roadmap.

Support Authors Alliance!

Posted December 3, 2024

As we end the year, I’m writing to ask for your financial support by giving toward our end-of-year campaign (click here to donate online).

In May, Authors Alliance marked its 10th anniversary. We’ve experienced tremendous support and enthusiasm for our work over the last decade, and your collaboration has been an important part of our success. I hope you’ll help Authors Alliance take on our next decade. 

We’re proud of our work promoting authorship for the public good by supporting authors who write to be read. In the past year, we secured expanded copyright exemptions for text and data mining research, helped defend authors’ fair use rights in court, launched an important initiative to clarify legal pathways for open access to federally funded research, and much more. We’ve also continued to help authors develop a deeper understanding of how complex policy issues can affect their work, drawing over 20,000 attendees for our in-person and online events on topics such as text and data mining, open access, artificial intelligence, and competition law.

For 2025, we have our work cut out for us. As policymakers actively consider changes to how the law accommodates free expression, access to information, and new technology, we continue to find that we are among the only voices defending authors’ rights to research, write, and share their work for the benefit of the public. Your support for Authors Alliance will help us continue to speak out in support of authors who value the public interest.

Donate Online Today

Thank you,
Dave Hansen
Executive Director

The DMCA 1201 Rulemaking: Summary, Key Takeaways, and Other Items of Interest

Posted November 8, 2024

Last month, we blogged about the key takeaways from the 2024 TDM exemptions recently put in place by the Librarian of Congress, including how the 2024 exemptions (1) expand researchers’ access to existing corpora, (2) definitively allow the viewing and annotation of copyrighted materials for TDM research purposes, and (3) create new obligations for researchers to disclose security protocols to trade associations. Beyond these key changes, the TDM exemptions remain largely the same: researchers affiliated with universities are allowed to circumvent technological protection measures (TPMs) to compile corpora for TDM research, provided that those copies of copyrighted materials are legally obtained and adequate security protocols are put in place.

We have since updated our resources page on Text and Data Mining and have incorporated the new developments into our TDM report: Text and Data Mining Under U.S. Copyright Law: Landscape, Flaws & Recommendations.

In this blog post, we share some further reflections on the newly expanded TDM exemptions—including (1) the use of AI tools in TDM research, (2) outside researchers’ access to existing corpora, (3) the disclosure requirement, and (4) a potential TDM licensing market—as well as other insights that emerged during the 9th triennial rulemaking.

The TDM Exemption

In other jurisdictions, such as the EU, Singapore, and Japan, legal provisions that permit “text data mining” also allow a broad array of uses, such as general machine learning and generative AI model training. In the US, exemptions allowing TDM have so far not explicitly addressed whether AI could be used as a tool for conducting TDM research. In this round of rulemaking, we gained clarity on how AI tools may aid TDM research. Advocates for the TDM exemptions provided ample examples of how machine learning and AI are key to conducting TDM research and asked that “generative AI” not be deemed categorically impermissible as a tool for TDM research. The Copyright Office agreed that a wide array of tools, including AI tools, could be used for TDM research under the exemptions, as long as the purpose is to conduct “scholarly text and data mining research and teaching.” The Office was careful to limit its analysis to those uses and did not address other applications, such as compiling data (or reusing existing TDM corpora) to train generative AI models; those raise an entirely separate issue from facilitating non-commercial TDM research.

Besides clarifying that AI tools are allowed for TDM research and that viewing and annotation are permitted for copyrighted materials, the new exemptions offer meaningful improvement to TDM researchers’ access to corpora. The previous 2021 exemptions allowed access for purposes of “collaboration,” but many researchers interpreted that narrowly, and the Office confirmed that “collaboration” was not meant to encompass outside research projects entirely unrelated to the original research for which the corpus was created. Under the 2021 exemptions, a TDM corpus could be accessed by outside researchers only if they were working on the same research project as the original compiler of the corpus. The 2024 exemptions’ expansion of access to existing corpora has two main components and advantages.

The expansion now allows for new research projects to be conducted on existing corpora, permitting institutions that have created a corpus to provide access “to researchers affiliated with other nonprofit institutions of higher education, with all access provided only through secure connections and on the condition of authenticated credentials, solely for purposes of text and data mining research or teaching.” At the same time, it also opens up new possibilities for researchers at institutions who otherwise would not have access, as the new exemption does not require a precondition that the outside researchers’ institutions otherwise own copies of works in the corpora. The new exemptions pose some important limitations: only researchers at institutions of higher education are allowed this access, and nothing more than “access” is allowed—it does not, for example, allow the transfer of a corpus for local use. 
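The access conditions quoted above work like a checklist that must be satisfied in full. The sketch below is a hypothetical simplification, not the regulation's text or any institution's actual system: `AccessRequest`, `Purpose`, and `may_grant_access` are invented names, and the sketch ignores other requirements of the exemption, such as lawful acquisition of the copies and the institution's broader security measures.

```python
from dataclasses import dataclass
from enum import Enum


class Purpose(Enum):
    """Hypothetical labels for why a researcher wants corpus access."""
    TDM_RESEARCH = "text and data mining research"
    TDM_TEACHING = "text and data mining teaching"
    OTHER = "other"


@dataclass(frozen=True)
class AccessRequest:
    """Hypothetical fields an institution might check before granting access."""
    affiliated_with_nonprofit_higher_ed: bool  # researcher's home institution
    secure_connection: bool                    # connection the host treats as secure
    authenticated_credentials: bool            # researcher signed in with verified credentials
    purpose: Purpose
    requests_local_copy: bool                  # asking to transfer the corpus for local use


def may_grant_access(req: AccessRequest) -> bool:
    """Return True only if every condition paraphrased from the exemption holds."""
    return (
        req.affiliated_with_nonprofit_higher_ed
        and req.secure_connection
        and req.authenticated_credentials
        and req.purpose in (Purpose.TDM_RESEARCH, Purpose.TDM_TEACHING)
        # The exemption covers "access" only, not transferring the corpus.
        and not req.requests_local_copy
    )


if __name__ == "__main__":
    remote_query = AccessRequest(True, True, True, Purpose.TDM_RESEARCH, False)
    download = AccessRequest(True, True, True, Purpose.TDM_RESEARCH, True)
    print(may_grant_access(remote_query))  # True
    print(may_grant_access(download))      # False
```

Modeling the rule as a single conjunction mirrors the point that failing any one condition, such as requesting a local copy, takes the use outside the expanded access.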

The Office emphasized the need for adequate security protections, pointing back to cases such as Authors Guild v. Google and Authors Guild v. HathiTrust, in which the courts emphasized how careful Google and HathiTrust, respectively, were to prevent their digitized corpora from being misused. To take advantage of this newly expanded TDM exemption, universities will need to provide adequate IT support so that technical barriers do not impede TDM researchers. That said, the record for the exemption shows that existing users are exceedingly conscientious about security: there have been zero reported instances of security breaches or lapses related to TDM corpora compiled and used under the exemptions.

As we previously explained, the security requirements have changed in a few ways. The new rule clarifies that trade associations can send inquiries on behalf of rightsholders. However, inquiries must be supported by a “reasonable belief” that the sender’s works are in a corpus being used for TDM research. It remains to be seen how the new obligation to disclose security measures to trade associations will affect TDM researchers and their institutions. The Register indirectly called out as unreasonable the demands that trade associations sent to digital humanities researchers in the middle of the exemption process, with a two-week response deadline, and quoted NTIA (which provides input on the exemptions) in agreement that “[t]he timing, targeting, and tenor of these requests [for institutions to disclose their security protocols] are disturbing.” We hope that this discouragement from the Copyright Office will prevent future large-scale harassment of TDM researchers and their institutions, but we will also remain vigilant in case trade associations abuse this new power.

Alongside the concerns over disclosure requirements, we have some questions about the Copyright Office’s treatment of fair use as a rationale for circumventing TPMs for TDM research. The Register restated her 2021 conclusion that “under Authors Guild, Inc. v. HathiTrust, lost licensing revenue should only be considered ‘when the use serves as a substitute for the original.’” The Office, in its recommendations, placed considerable weight on the lack of a viable licensing market for TDM, which raises a concern that, in the Office’s view, a use that once was fair and legal might lose that status once the rightsholder starts to offer an adequate licensing option. This may never become a real issue for the existing TDM exemptions, because no sufficient licensing options exist for TDM researchers and, given the breadth and depth of content needed, such a market seems unlikely ever to develop. Nonetheless, the Office’s reasoning contributes to the growing confusion surrounding the stability of a fair use defense in the face of new licensing markets.

These concerns highlight the need for ongoing advocacy in the realm of TDM research. Overall, the Register of Copyrights recognizes TDM as “a relatively new field that is quickly evolving.” This means that we could ask the Library of Congress to relax the limitations placed on TDM if we can point to legitimate research-related purposes. But, due to the nature of this process, it also means TDM researchers do not have a permanent and stable right to circumvent TPMs. Because the exemptions remain subject to review every three years, many large trade associations advocate for the TDM exemptions to be greatly limited or even canceled, hoping to stifle independent TDM research. We will continue to advocate for TDM researchers, as we did during the 8th and 9th triennial rulemakings.

Looking beyond the TDM exemption, we noted a few other developments: 

Warhol has not fundamentally changed fair use

First, the Opponents of renewal of the existing exemptions repeatedly pointed to Warhol Foundation v. Goldsmith, the Supreme Court’s most recent fair use opinion, to argue that the decision had changed the fair use analysis such that the existing exemptions should not be renewed. For example, the Opponents argued that the fair use analysis for repairing medical devices changed under Warhol because, according to them, commercial nontransformative uses were less likely to be fair. The Copyright Office did not agree. The Register said that the same fair use analysis as in 2021 applied and that the Opponents failed “to show that the Warhol decision constitutes intervening legal precedent rendering the Office’s prior fair use analysis invalid.” In another instance where the Opponents argued that commerciality must be given more weight under Warhol, the Register pointed out that under Warhol commerciality is not dispositive and must be weighed against the purpose of the new use. The arguments for revisiting the 2021 fair use analyses were uniformly rejected, which we think is good news for those of us who believe Warhol should be read as making a modest adjustment to fair use and not a wholesale reworking of the fair use doctrine.

Do ownership and control of copies matter for access?

One of the requests before the Office was an expansion of an exemption that allows for access to preservation copies of computer programs and video games. The Office rejected the main thrust of the request but, in doing so, also provided an interesting clarification that may reveal some of the Office’s thinking about the relationship between fair use and access to copies owned by the user: 

“The Register concludes that proponents did not show that removing the single user limitation for preserved computer programs or permitting off-premises access to video games are likely to be noninfringing. She also notes the greater risk of market harm with removing the video game exemption’s premises limitation, given the market for legacy video games. She recommends clarifying the single copy restriction language to reflect that preservation institutions can allow a copy of a computer program to be accessed by as many individuals as there are circumvented copies legally owned.”

That sounds a lot like an endorsement of the idea that the owned-to-loaned ratio, a key concept in the controlled digital lending analysis, should matter in the fair use analysis (a factor to which the court in Hachette v. Internet Archive, the controlled digital lending case, gave zero weight). For future 1201 exemptions, we will have to wait and see whether the Office applies this framework in other contexts.
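The Register’s recommendation that access track the number of legally owned circumvented copies behaves like a simple concurrency cap, which is also how the owned-to-loaned ratio is usually described. The sketch below is a hypothetical illustration, not anything the Office or any preservation institution has specified: `PreservedProgram`, `check_out`, and `check_in` are invented names, and the sketch assumes each individual occupies exactly one owned copy at a time.

```python
from dataclasses import dataclass, field


@dataclass
class PreservedProgram:
    """Hypothetical record of one preserved computer program.

    owned_copies: number of circumvented copies the institution legally owns.
    active_users: identifiers of the people currently accessing a copy.
    """
    title: str
    owned_copies: int
    active_users: set[str] = field(default_factory=set)

    def check_out(self, user: str) -> bool:
        """Grant access only while concurrent users stay within owned copies."""
        if user in self.active_users:
            return True  # this user already holds one of the copies
        if len(self.active_users) >= self.owned_copies:
            return False  # every owned copy is in use; access would exceed 1:1
        self.active_users.add(user)
        return True

    def check_in(self, user: str) -> None:
        """Release the user's copy so someone else can access it."""
        self.active_users.discard(user)


if __name__ == "__main__":
    program = PreservedProgram(title="Example Program", owned_copies=2)
    print(program.check_out("alice"))  # True
    print(program.check_out("bob"))    # True
    print(program.check_out("carol"))  # False: both copies are in use
    program.check_in("alice")
    print(program.check_out("carol"))  # True
```

The key design point is that the number of simultaneous users never exceeds the number of owned copies, so adding access capacity requires owning more copies rather than simply opening the program to more people.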

Addressing other non-copyright and AI questions in the 1201 process

The Librarian of Congress’s final rule included a number of notes on issues not addressed by the rulemaking: 

“The Librarian is aware that the Register and her legal staff have invested a great deal of time over the past two years in analyzing the many issues underlying the 1201 process and proposed exemptions. 

Through this work, the Register has come to believe that the issue of research on artificial intelligence security and trustworthiness warrants more general Congressional and regulatory attention. The Librarian agrees with the Register in this assessment. As a regulatory process focused on technological protection measures for copyrighted content, section 1201 is ill-suited to address fundamental policy issues with new technologies.” 

Proponents tried to argue that the software platforms’ restrictions and barriers to conducting AI research, such as their account requirements, rate limits, and algorithmic safeguards, are circumventable TPMs under 1201, but the Register disagreed. The Register maintained that the challenges Proponents described arose not out of circumventable TPMs but out of third-party controlled Software as a Service platforms. This decision can be illuminating for TDM researchers seeking to conduct TDM research on online streaming media or social media posts.

The Librarian’s note went on to say: “The Librarian is further aware of the policy and legal issues involving a generalized ‘right to repair’ equipment with embedded software. These issues have now occupied the White House, Congress, state legislatures, federal agencies, the Copyright Office, and the general public through multiple rounds of 1201 rulemaking.

Copyright is but one piece in a national framework for ensuring the security, trustworthiness, and reliability of embedded software, as well as other copyright-protected technology that affects our daily lives. Issues such as these extend beyond the reach of 1201 and may require a broader solution, as noted by the NTIA.”

These notes give an interesting, though somewhat confusing, insight into how the Librarian of Congress and the Copyright Office think about the role of 1201 rulemaking in addressing issues that go beyond copyright’s core concerns. While we agree that 1201 is ill-suited to address fundamental policy issues with new technology, it is somewhat concerning that the Office and the Librarian view copyright more generally as part of a broader “national framework for ensuring the security, trustworthiness, and reliability of embedded software.” Copyright is, of course, sometimes used to further ends outside its intended purpose, but these issues are far from the core constitutional purpose of copyright law, and we think they are best addressed through other means.

Copyright Management Information, 1202(b), and AI

Posted October 30, 2024

This post is by Maria Crusey, a third-year law student at Washington University in St. Louis. Maria has been working with Authors Alliance this semester on a project exploring legal claims in the now 30+ pending copyright AI lawsuits. 

In the recent spate of copyright infringement lawsuits against AI developers, many plaintiffs allege that the developers violated 17 U.S.C. § 1202(b) in using copyrighted works to train and develop AI systems.

Section 1202(b) prohibits the “removal or alteration of copyright management information.” Compared to related provisions in 17 U.S.C. § 1201, which protects against circumvention of copyright protection systems, §1202(b) has seldom been litigated at the appellate level, and there’s a growing divide among district courts about whether §1202(b) should apply to derivative works, particularly those created using AI technology.

At first glance, §1202(b) appears to be a straightforward provision. However, the uptick in §1202(b) claims raises some challenging questions, namely: How does §1202(b) apply to the use of a copyrighted work as part of a dataset that must be cleaned, restructured, and processed in ways that separate copyright management information from the content itself? And how should §1202(b) apply to AI systems that may reproduce small portions of content contained in training data? Answers to these questions may have serious implications for the AI suits because violations of §1202(b) can carry hefty statutory damage awards: between $2,500 and $25,000 for each violation. Spread across millions of works, the damages could be staggering. How the courts resolve this issue could also affect many other reuses of copyrighted works, from analogous uses such as text and data mining research to much more routine redistribution of copyrighted works in other contexts.
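To make the scale concrete, the arithmetic below applies the statutory range for §1202 violations (17 U.S.C. § 1203(c)(3)(B)) to a hypothetical training set of N = 1,000,000 works. It assumes, purely for illustration, that each work counts as exactly one violation; courts may count violations differently, so this is a sense of magnitude, not a prediction.

```latex
\begin{aligned}
D_{\min} &= N \times \$2{,}500 = 10^{6} \times \$2{,}500 = \$2.5 \times 10^{9} \quad (\$2.5\ \text{billion}) \\
D_{\max} &= N \times \$25{,}000 = 10^{6} \times \$25{,}000 = \$2.5 \times 10^{10} \quad (\$25\ \text{billion})
\end{aligned}
```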

One of these AI cases has requested that the Ninth Circuit Court of Appeals accept an interlocutory appeal on just this issue, and we are waiting to see whether the court will accept it.

For an introduction to §1202(b) and observations on this question, among others, read on:

What is § 1202(b) and what is it intended to do?

Broadly, 17 U.S.C. § 1202 is a provision of the Digital Millennium Copyright Act (DMCA) that protects the integrity of copyright management information (“CMI”). Per §1202(c), CMI comprises certain information identifying a copyrighted work, often including the title, the name of the author, and terms and conditions for the use of a work.

Section 1202(b) forbids the alteration or removal of copyright management information. The section provides that:

“[n]o person shall, without the authority of the copyright owner or the law – 

(1) intentionally remove or alter any CMI,

(2) distribute or import for distribution CMI knowing that the CMI has been removed or altered without authority of the copyright owner or the law, or 

(3) distribute, import for distribution, or publicly perform works, copies of works or phonorecords, knowing that copyright management information has been removed or altered without authority of the copyright owner or the law,

knowing, or with respect to civil remedies under section 1203, having reasonable grounds to know that it will induce, enable, facilitate, or conceal an infringement of any right under this title.”

17 U.S.C. § 1202(b).

In enacting §1202(b), Congress primarily aimed to limit conduct that assists or enables copyright infringement. This purpose is evident in the legislative history of the provision. In an address to a congressional subcommittee before the DMCA was adopted, the then–Register of Copyrights, Marybeth Peters, discussed the aims of §1202(b). First, Peters noted that §1202(b) would make CMI more reliable and thus aid in the administrability of copyright law. Second, Peters stated that §1202(b) would help prevent copyright infringement that could result from the removal of CMI. The idea is that if a copyrighted work lacks CMI, infringement is more likely, since others may use the work under the pretense that they are the author or copyright holder. By creating a statutory violation for a party’s removal of CMI, regardless of later infringing activity, §1202(b) functions as damage control against potential copyright infringement.

What are the essential elements of a § 1202(b) claim?

To have a claim under §1202(b), a plaintiff must allege particularized facts about the existence and alteration or removal of CMI. Additionally, some courts require a plaintiff to demonstrate that the defendant had knowledge that the CMI was being altered or removed and that the alteration or removal would enable copyright infringement. Finally, some courts have required plaintiffs to show that the work with the altered or removed CMI is an exact copy of the original work–what has become known as the “identicality” requirement. This last “identicality” requirement is one of the main issues in the AI lawsuits raising §1202(b) and is detailed further below.

→ The “Identicality” Requirement

Courts that have imposed “identicality” have required that plaintiffs demonstrate that the work with the removed CMI is an exact copy of the original work and thus is “identical,” except for the missing or altered CMI. 

Suppose, for example, a photographer owns the copyright to a photograph they took. The photographer adds CMI to the photograph and takes care to protect the integrity of the work as it is dispersed online. A third party captures the photograph posted on a website by taking a screenshot and removes the CMI from the copied image while keeping all other aspects of the original photograph the same. The screenshot with the removed CMI is an “exact copy” of the original photograph because the only difference between the copyrighted photograph and the screenshot is the removal of the CMI.

Federal courts are divided on whether to impose the identicality requirement for §1202(b) claims, though the circuit courts have not yet addressed the issue. Notably, district courts within the Ninth Circuit have varied in their treatment of the identicality requirement. For example, the court for the District of Nevada in Oracle v. Rimini Street declined to impose the identicality requirement because it could weaken the protections §1202(b) was intended to give copyright holders. Conversely, in Kirk Kara Corp. v. W. Stone & Metal Corp., a court in the Central District of California applied the identicality requirement, though it provided little explanation for adopting it. Application of the identicality requirement is also unsettled in district courts outside the Ninth Circuit (see, for example, this Southern District of Texas case discussing the identicality requirement at length and rejecting it).

What are the §1202(b) claims at issue in the present suits?

The claims in Doe 1 v. Github exemplify the §1202(b) issues common among the present suits, and the Github suit is the one the Ninth Circuit Court of Appeals is now deciding whether to take up on appeal.

In Github, owners of copyrights in software code brought a suit against GitHub, a software developer platform. The plaintiffs alleged that GitHub Copilot, an AI product developed in part by GitHub, illegally removed CMI from their works. The plaintiffs stored their software in GitHub’s publicly accessible software repositories under open-source license agreements. The plaintiffs claimed that GitHub removed CMI from their code and trained the Copilot AI model on the code in violation of the license agreements. Moreover, the plaintiffs claimed that, when prompted to generate software code, Copilot includes unique aspects of the plaintiffs’ code in its outputs. In their complaint, the plaintiffs alleged that all requirements for a valid § 1202(b) claim were met. The plaintiffs stressed that, in removing CMI, the defendants failed to prevent users of the product from making non-infringing use of it. Consequently, they claim, the defendants removed the CMI knowing that it would “induce, enable, facilitate, and/or conceal infringement” of copyrights in violation of the DMCA.

Regarding the §1202(b) claims, the parties contest the application of the identicality requirement. The plaintiffs first argue that § 1202 contains no such requirement: “The plain language of DMCA § 1202 makes it a violation to remove or alter CMI. It does not require that the output work be original or identical to obtain relief. . . By a plain reading of the statute, there is no need for a copy to be identical—there only needs to be copying, which Plaintiffs have amply alleged.” 

As a backstop, the plaintiffs further argue that Copilot does produce “near-identical reproduction[s]” of their copyrighted code and allege this is sufficient to fulfill the identicality requirement under §1202(b). Specifically, plaintiffs claimed that Copilot generates parts of plaintiffs’ code in extra lines of output code that are not relevant to input prompts. Plaintiffs also claimed Copilot generates their code in output code that produces errors due to a mismatch between the directly copied code and the code that would actually fit the prompt. To make this assertion work, plaintiffs distinguish their version of “identicality” (semantically equivalent lines of code) from a reproduction of the whole work. They argue that the defendants’ position, that “the reproduction of short passages that may be part of [a] larger work, rather than the reproduction of an entire work, is insufficient to violate Section 1202,” would lead to absurd results. “By OpenAI’s logic, a party could copy and distribute a fragment of a copyrighted work—say, a chapter of a book, a stanza of a poem, or a scene from a movie—and face no repercussions for infringement.”

In their reply, the defendants countered that §1202, which defines CMI as relating to a “copy of a work,” requires a complete and identical copy, not just snippets. Defendants noted that the plaintiffs have conceded that Copilot reproduces only snippets of code rather than complete versions of the code. Therefore, the defendants argue, Copilot does not create “identical copies” of the plaintiffs’ complete copyrighted works. The argument rests on the text of the statute (they note that the statute provides for liability only when distributing copies from which CMI has been stripped, not derivatives, abridgments, or other adaptations), and they bolster it by suggesting that allowing §1202 claims for incomplete copies would create chaos for ordinary uses of copyrighted works: “On Plaintiffs’ reading of § 1202, if someone opened an anthology of poetry and typed up a modified version of a single ‘stanza of a poem,’ . . . without including the anthology’s copyright page, a § 1202(b) claim would lie. Plaintiffs’ reading effectively concedes that they are attempting to turn every garden-variety claim of copyright infringement into a DMCA claim, only without the usual limitations and defenses applicable under copyright law. Congress intended no such thing.”

The GitHub court has now addressed the issue several times: it initially dismissed the plaintiffs’ §1202(b)(1) and (b)(3) claims, then denied the plaintiffs’ motion for reconsideration of the claims, allowed the plaintiffs to amend their complaint and try again with more specificity, and then dismissed the claims again. The court’s reasoning has been consistent and has focused largely on insufficient allegations of identicality. The court agreed with the defendants that the identicality requirement should apply and that the snippets do not satisfy it. Following the dismissal, the plaintiffs sought and received permission from the district court to file an interlocutory appeal (an appeal on a specific issue before the case is fully resolved, something not usually allowed) to the Court of Appeals for the Ninth Circuit to determine whether § 1202(b)(1) and (b)(3) impose an identicality requirement. The Ninth Circuit is presently considering whether to hear the appeal.

What would the Ninth Circuit assess in the appeal, and what are the implications of the appeal for future lawsuits?

If the appeal is accepted, the Ninth Circuit will determine whether §1202(b)(1) and (b)(3) actually impose an identicality requirement. Moreover, with regard to the facts of the Github case, the court will decide whether the identicality requirement requires exact copying of a complete copyrighted work, or perhaps something less. The Ninth Circuit’s hearing of this appeal would be notable for a number of reasons.

First, as mentioned above, §1202(b) is largely unaddressed by the circuit courts, and explicit appellate guidance has only been provided for the knowledge requirement referenced above. Consequently, determinations of §1202(b) claims are largely informed by varying district court decisions that are binding only on the parties to the suits and provide inconsistent interpretations of the requirements for a claim under the provision. An appellate ruling that accepts or rejects the identicality requirement would create additional binding authority to further clarify courts’ interpretations of §1202(b).

Second, a ruling on the identicality requirement from the Ninth Circuit specifically would be notable because it would be binding on the large number of §1202(b) claims presently being litigated in the Ninth Circuit’s lower courts. And, given how many AI developers operate in California and elsewhere in the Ninth Circuit, the outcome of the appeal would significantly affect future lawsuits involving §1202(b) claims.

It is hard to predict how the Ninth Circuit might rule, but we can work through some of the implications of the choices the court would have before it: 

If the Ninth Circuit interprets the identicality requirement as requiring a complete and exact copy, it would impose a high standard for the requirement and plaintiffs would likely be constrained in their ability to bring §1202(b) claims. If the court did this, the Github plaintiffs’ claims would likely fail as the alleged copied snippets of code generated by Copilot are not exact copies and do not comprise the complete copyrighted works. This hypothetical standard would be advantageous for individuals who remove CMI from copyrighted works in the course of processing them using AI as well as those who deploy AI systems that produce small portions of content similar (but not exactly so) to inputs.  So long as the works being processed or distributed are not complete exact copies, individuals would be free to alter the CMI of the works for ease in analyzing the copyrighted information. 

Alternatively, the Ninth Circuit could adopt a loose interpretation of identicality in which incomplete and inexact copying would be sufficient. One approach would be to require identicality but not copying of the entire work (something the plaintiffs in the Github suit advocate for). It is hard to say how the parties or the Ninth Circuit would define this “less than entire” but still “near identical” standard, but presumably plaintiffs would have an easier time alleging facts sufficient for a §1202(b) claim. Applied to Github, it is still unclear whether the copied snippets of the plaintiffs’ code in the Copilot outputs could pass muster (this is likely a factual question to be determined at later stages of the litigation), but the standard could at least allow claims to survive an early motion to dismiss. As such, adopting this standard could limit how AI developers engage with works and could also affect others, such as researchers using similar techniques to process, clean, and distribute small portions of copyrighted works as part of a dataset.

Finally, the Ninth Circuit may decide to do away with the identicality requirement altogether. While this may seem like a boon to plaintiffs, who could then base a claim on the removal of CMI and the distribution of some copied material, no matter how small, plaintiffs would still face substantial challenges. Eliminating the identicality requirement would likely lead courts to place greater weight on the knowledge requirement in assessing §1202(b) claims, which requires that defendants know or have reasonable grounds to know that their actions will “induce, enable, facilitate, or conceal an infringement.” In the context of the Github case, even without an identicality requirement, the plaintiffs’ §1202(b) claims contain scant factual allegations about the defendants’ CMI removal and knowledge in the court filings to date. For other developers and users of AI, the effects of not having an identicality requirement would likely vary case by case.

Conclusion

Recent copyright infringement suits and the pending appeal to the Ninth Circuit in Doe 1 v. Github demonstrate that §1202(b) is having its day in the sun. Although the provision has been overlooked and infrequently litigated in the past, the scope of protections granted by §1202(b) is important for understanding whether and how AI developers can remove CMI when processing, restructuring, and analyzing copyrighted works for AI development. Thus, as lawsuits against AI developers and users continue to progress, the requirements for a valid §1202(b) claim are sure to become even more contentious.