
Government-Operated Platforms and the First Amendment in Schiff v. U.S. Office of Personnel Management

Posted March 25, 2025

This is a post authored by Maria Crusey, an intern with Authors Alliance and a third-year law student at Washington University School of Law. 

Introduction

Tension between government actions and freedom of speech under the First Amendment is nothing new. Since the adoption of the First Amendment, individuals and entities have alleged that government actors violated free speech rights through actions ranging from the establishment of campaign finance laws to the imposition of book bans. In the digital age, allegations of free speech violations by government actors have started to take new forms, largely made possible by technological developments. Most recently, in Schiff v. U.S. Office of Personnel Management, filed on March 12, 2025 in the District of Massachusetts, the plaintiffs allege direct censorship by the government through the takedown of select publications posted on a government-operated platform. There are few prior lawsuits in which plaintiffs allege direct censorship by the government on a government-operated platform; most relevant lawsuits instead allege that the government engaged in “censorship by proxy” by requiring online platforms to suppress particular kinds of speech and expression. Consequently, we can look to the paradigmatic “censorship by proxy” case, Murthy v. Missouri, to anticipate how a court might assess the legal arguments advanced in Schiff.

The Complaint

The alleged First Amendment violation in Schiff arises out of the takedown of the plaintiffs’ scholarly publications from Patient Safety Network (PSNet), an online platform that publishes research articles and resources about patient safety. The plaintiffs argue that the removal of only those articles that may “inculcate or promote” the government’s definition of “gender ideology,” pursuant to a recent Executive Order, imposes a viewpoint-based restriction in a government forum, PSNet, and that the removal of such articles is not reasonable in light of PSNet’s purpose. Moreover, the plaintiffs argue they suffer ongoing undue and actual hardship and irreparable injury from the removal of their articles from PSNet and have no adequate remedy at law to correct this injury. As such, the plaintiffs seek to preliminarily and permanently enjoin the defendants from further censoring their research through implementation of the executive order.

PSNet is operated by the Agency for Healthcare Research and Quality (AHRQ), an executive branch agency within the federal government and a part of the Department of Health and Human Services. PSNet is a leading resource for information on patient safety in the United States, and all of its content is free and accessible to the public. PSNet’s scholarly publication facet is managed by its editorial team, a group of editors and a librarian who review submissions and select content for publication in various PSNet collections.

One such collection is PSNet’s Case Studies series. Publications in the Case Studies series are sourced from online form submissions by healthcare providers that include a case description of a given medical error and a short recommendation for how healthcare providers or systems might prevent similar errors from happening in the future and thus increase patient safety. PSNet’s editorial team selects submissions for publication based on a number of criteria, including clinical interest and educational value. Following selection of a case, the editorial team invites healthcare providers to submit a commentary based on the case. All articles published by PSNet include a disclaimer that “[r]eaders should not interpret any statement in this report as an official position of AHRQ or of the U.S. Department of Health and Human Services.”

Plaintiffs Gordon Schiff, M.D., and Celeste Royce, M.D., are associate professors of medicine at Harvard Medical School who publish research articles about patient safety in their respective medical specialties. Both have separately published articles about patient safety on PSNet. PSNet accepted a commentary written by Dr. Schiff and co-authors for a case on suicide assessment and prevention. In the publication process, the authors and the editorial team exchanged multiple drafts of the commentary. The second draft of the commentary included a sentence describing “high risk groups” for suicide as including individuals who identify as lesbian, gay, bisexual, or queer/questioning. The PSNet editorial team did not substantively modify this part of the sentence prior to publication, and the case commentary was published on PSNet on January 7, 2022 under the title “Multiple Missed Opportunities for Suicide Risk Assessment” and included the aforementioned disclaimer.

In a separate publication cycle, Dr. Royce and a co-author submitted a commentary for publication in the Case Studies series on delayed diagnosis of endometriosis. Like Dr. Schiff, Dr. Royce exchanged multiple drafts of her commentary with the PSNet editorial team in the publication process. The commentary included text stating that “endometriosis can occur in trans and non-gender-conforming people” and that lack of understanding of this fact could make diagnosis more challenging. No substantive comments or changes dealt with the statement that “endometriosis can occur in trans and non-gender-conforming people.” Dr. Royce’s case commentary was published on PSNet on June 24, 2020 under the title “Endometriosis Commentary” and included the aforementioned disclaimer.

On January 20, 2025, President Trump issued Executive Order 14168, which directed government agencies to combat “gender ideology,” a concept the order describes as one that “replaces the biological category of sex with an ever-shifting concept of self-assessed gender identity,” by removing all statements that promote or otherwise inculcate gender ideology from federal platforms. The Office of Personnel Management subsequently issued a memo instructing all agencies, including AHRQ, which oversees PSNet, to “take down all outward facing media that inculcate or promote gender ideology” and to report all steps taken to implement this instruction. AHRQ removed articles from PSNet that contained words or terms that “inculcate or promote” the government’s definition of “gender ideology,” taking down the plaintiffs’ articles on January 31, 2025.

Following the takedowns, PSNet’s editorial team shared separate emails with the plaintiffs explaining that Dr. Schiff’s article was removed due to inclusion of the words “transgender” and “LGBTQ” in his article and that Dr. Royce’s article was removed for its inclusion of the phrase “endometriosis can occur in trans and non-gender-conforming people” and the description that it may make diagnosis in those populations more challenging. AHRQ subsequently offered to repost the plaintiffs’ articles on the condition that the plaintiffs would remove the language in violation of the executive order. Both plaintiffs declined after AHRQ did not agree to their proposed revisions. As of March 12, 2025, both articles remain unavailable on PSNet.

As for relief, the plaintiffs request that the court declare: (1) that the Office of Personnel Management’s internal direction to “take down all outward facing media . . . that inculcate or promote gender ideology,” as applied to speech by private individuals and organizations on government-run forums, is unconstitutional and unlawful and (2) that AHRQ’s implementation of the direction by removing or altering speech of private individuals and organizations on government-run forums is unconstitutional and unlawful.

“Censorship by Proxy” in Murthy v. Missouri

In recent years, the government has been sued for engaging in “censorship by proxy” by requiring digital platforms that widely share information to adopt content-moderation practices that comport with government standards. Multiple “censorship by proxy” suits are currently working their way through the federal courts, including one contesting state book bans and another contesting government-promoted “blacklist tools” that allegedly strip media organizations of advertising revenue, and the outcomes of these suits appear to be fact-dependent. The most recent Supreme Court decision on “censorship by proxy,” Murthy v. Missouri, found the plaintiffs were barred from alleging First Amendment violations due to procedural deficiencies in their claims.

In Murthy v. Missouri, two states and five individual social-media users sued executive branch agencies and officials, alleging that they pressured social-media platforms to suppress protected speech. Specifically, the plaintiffs alleged that, during the COVID-19 pandemic, Facebook and Twitter deleted posts deemed to be false regarding measures to combat the spread of the virus, as well as “conspiracy theories” about the origins of the virus. The plaintiffs also alleged these same content-moderation practices were applied to content related to the 2020 United States presidential election. The district court granted the plaintiffs a preliminary injunction, finding that executive branch and agency actors likely coerced and significantly encouraged platforms to make content-moderation decisions that effectively were decisions of the government. The Fifth Circuit affirmed this part of the injunction, finding that both groups of plaintiffs had standing. The individuals had standing because the social media companies had suppressed the plaintiffs’ speech and were likely to do so again in the future, and the states had standing because the platforms limited the states’ “right to listen” to their citizens on social media.

On appeal, the Supreme Court reversed on the basis that neither the individual nor the state plaintiffs had standing. To have standing to bring suit in federal court, a plaintiff must demonstrate they have suffered or will suffer an injury that is concrete, particularized, and imminent; fairly traceable to the challenged action in the suit; and redressable through a favorable ruling by a court. As a preliminary determination, the Court found that the plaintiffs did not seek to enjoin the appropriate parties—the social-media platforms—for their “direct censorship injuries.” Because the social-media platforms, not government actors, were alleged to have implemented the censorship, the Court stated it could not redress injuries caused by third parties not present in the suit. As to the traceability of the alleged injuries, the Court found the platforms had “independent incentives” to moderate content relating to the COVID-19 pandemic and 2020 presidential election. While the government defendants played some role in the moderation choices, there was no evidence to suggest every content-moderation decision alleged to be censorship was made under government direction. As such, the plaintiffs lacked a fairly traceable injury and did not have standing to sue.

The Court also found that the lower courts improperly enjoined the government defendants given the lack of standing. The fact that the plaintiffs sought forward-looking relief in the form of an injunction made any past alleged injuries relevant only if they were predictive of future censorship activities. The record did not include any specific findings about the causation of discrete instances of content moderation. The lower courts instead relied on statements that the platforms censored particular viewpoints on issues and that the government defendants engaged in “a years-long pressure campaign” to ensure viewpoint suppression. The Court rejected these assertions as overly broad and unsupported by the record. Additionally, the Court found the plaintiffs misplaced their reliance on past government censorship as evidence that future censorship was likely. Because the plaintiffs failed to trace the alleged censorship by the platforms to the government’s role in the platforms’ content moderation, their prior harms could not be used to establish standing to seek an injunction to prevent future harms.

The Court concluded by considering, and rejecting, the plaintiffs’ counterarguments in turn. First, the Court found the plaintiffs’ allegation that they suffered ongoing harm from having to self-censor on social media did not support standing. Specifically, under the Court’s precedent, the plaintiffs could not “manufacture standing merely by inflicting harm on themselves based on their fears of hypothetical future harm that is not certainly impending.” Second, the plaintiffs’ argument that the platforms continued to suppress their speech per policies initially adopted at the direction of the government failed due to a lack of redressability. The Court indicated that the lack of ongoing pressure from the government made the platforms’ continued content moderation attributable only to the platforms, not the government, even though the practices may have initially been imposed due to governmental coercion. Moreover, as the government scaled back pandemic response measures, the platforms continued to enforce COVID-19 misinformation policies to the same degree as during the pandemic. Collectively, the Court found these facts suggested that the ongoing harm could only be redressed by legal action against the platforms, not the government, and that such redress could not be pursued in the suit.

Implications of Murthy v. Missouri in Schiff

It is difficult to anticipate how a court’s analysis will proceed in Schiff, but similarities and differences between the Supreme Court’s decision in Murthy and the pending suit in Schiff provide insights as to how a court may assess government censorship imposed through a government-owned platform.

Like the plaintiffs in Murthy, the Schiff plaintiffs sued for a preliminary injunction against the government actors in the case. As a result, they must satisfy the standing requirements necessary to bring suit for an injunction. Several aspects of the Schiff plaintiffs’ situation may help them avoid the pitfalls the Murthy plaintiffs faced in their standing analysis on appeal.

First, the harms alleged by the two Schiff plaintiffs appear to be supported by allegations that are more detailed than those of the Murthy plaintiffs. The Murthy plaintiffs made sweeping allegations of government censorship on behalf of multiple individuals and states in a single action, and, as noted by the Supreme Court, their amended complaint provided few specifics about the causation of individual instances of censorship by particular government defendants. Conversely, the Schiff plaintiffs plead specific facts about a few particular government agencies and particular government actors who engaged in discrete acts to take down the plaintiffs’ online publications in accordance with a particular order issued directly by the executive branch. On their face, these facts appear better positioned to allege a “concrete and particularized harm.”

Second, the traceability of the harms to the plaintiffs, as well as their imminence, may be more easily demonstrated in Schiff. The primary pitfall of the Murthy plaintiffs’ standing argument was the lack of traceability of the act of censoring through content moderation to the government defendants, because the government was not involved in every content-moderation action taken by the third-party social media platforms. In Schiff, the lack of a non-government third party in the implementation and execution of the publication takedowns may strengthen an argument of traceability and prevent the disconnect between government actors and conduct seen in Murthy. Moreover, the facts that the Schiff plaintiffs’ publications were taken down pursuant to an ongoing executive order; that the plaintiffs were not offered viable alternative methods to restore their online publications in accordance with their protected speech; and that the publications have not been republished on PSNet as of March 12, 2025 could suggest the executive order poses an ongoing First Amendment violation.

Third, redressability of the harms to the plaintiffs through enjoining the defendants may be more successful in Schiff than in Murthy. In Murthy, the Supreme Court found redress could not be achieved through enjoining the government defendants given the defendants’ lack of ongoing involvement in the alleged censorship. Since there was no evidence that the platforms’ current content-moderation practices were pursuant to government directives, enjoining the government would not remedy the plaintiffs’ ongoing First Amendment violations. In Schiff, because the plaintiffs’ alleged ongoing First Amendment violations (removal of their online publications from PSNet) are driven by a government directive (the executive order), enjoining the government defendants from enforcing the executive order may be more likely to redress the plaintiffs’ ongoing harm.

Additional Considerations in Schiff

Another feature of Schiff that warrants discussion is the justification provided by the U.S. Office of Personnel Management for adopting its guidelines applying the executive order to the operations of AHRQ. Per the complaint, Charles Ezell, the Acting Director of the U.S. Office of Personnel Management, cited 5 U.S.C. § 1103(a)(1) and (5) as grounds for his authority to issue the guidelines: the Director is charged with “securing accuracy, uniformity, and justice in the functions of the office” and with “executing, administering, and enforcing” the civil service rules and regulations of the President and the Office and the laws governing the civil service. Both of these responsibilities of the Director may be implicated in the suit as arguments advanced by the government to justify the implementation of the executive order.

First, the government could argue that the Director’s implementation of the guidelines and the takedown of the plaintiffs’ publications were appropriate given his charge to secure “accuracy” in the functions of the office. The plaintiffs’ publications that allegedly “promote or inculcate gender ideology” by recognizing gender identities beyond biological sex could be argued to be “inaccurate” in the eyes of the executive branch, such that taking down the articles was necessary to further “accuracy” in the functions of the office.

Second, the government could argue that the Director’s implementation of the guidelines and takedown of the plaintiffs’ publications are necessary as an execution and enforcement of the “civil service rules and regulations of the President and Office.” This argument would hinge upon whether the executive order can be classified as a “civil service rule and regulation.”

If the plaintiffs allege a viable First Amendment claim, both of the above arguments could play into the justification the government would need to provide to excuse the First Amendment violation. The plaintiffs allege the takedown of their publications constitutes a viewpoint-discriminatory restriction on their protected speech. If the executive order is found to be a viewpoint-discriminatory restriction, the First Amendment violation would be assessed under a “strict scrutiny” standard of review, in which the government would have to demonstrate that the executive order is “narrowly tailored” and necessary to achieve a “compelling governmental interest.”

On its face, the executive order appears quite broad with its order that agencies remove all statements that “promote or inculcate gender ideology.” As such, it is uncertain whether it would satisfy the “narrowly tailored” requirement. Moreover, it is uncertain whether combating gender ideology and promoting a policy that there are only two sexes pass muster as a “compelling governmental interest.” The fact that this policy is enshrined in an executive order may heighten the importance of the interest. However, the language justifying the policy in the executive order itself is vague, providing only that “deny[ing] the biological reality of sex . . . is wrong” and “[t]he erasure of sex in language . . . has a corrosive impact not just on women but on the validity of the entire American system.” Consequently, whether the policy advanced by the executive order constitutes a compelling governmental interest may similarly be an open question.

Conclusion

As the Schiff plaintiffs await an appearance from the defendants in the suit, it is uncertain how broadly the executive order will continue to be applied. The plaintiffs in Schiff report that the Office of Personnel Management guidelines resulted in the removal of at least 20 total publications from PSNet, but the number of takedowns attributable to the executive order on other government-operated platforms is unknown. As proceedings in Schiff progress, and as additional suits are filed that similarly allege First Amendment violations arising from the executive order, direct censorship by the government through government-operated platforms is sure to become an even more contentious issue.

AI Licensing: An Interview with Ben Denne of Cambridge University Press

Posted March 17, 2025

We’ve heard from lots of authors with questions about AI licensing of their works by their publishers. Cambridge University Press is one that has been in the news because it has undertaken a project to ask authors to opt into a contract addendum that would allow CUP to license AI rights for their books, giving authors a royalty on AI licensing net revenue. Cambridge has shared an FAQ with authors already, along with a further explanation of its approach last September and a report in January highlighting that it had contacted some 17,000 authors, the majority of whom have opted in. 

Below is an interview with Ben Denne, Director of Publishing, Academic Books, at Cambridge University Press, answering some questions about the program. 

Dave: Thank you, Ben, for talking with me. To start off, could you say what your role is at Cambridge University Press?

Ben: I’m the Director of Publishing for the Academic Books part of the Academic Division of Cambridge. In short, I’m the director overseeing the whole of the Academic Books program for Cambridge, except for the Bibles; that’s a specialist unit that runs separately that I don’t have anything to do with. The program covers our textbooks, our research and reference books, and then a small program of more traditional academic titles that sell to a bit of a wider audience.

Dave:  Thanks. My interest in talking with you is about generative AI licensing. And we’ve had quite a few authors actually forward us some emails that they’ve gotten from Cambridge presenting an AI license addendum to sign that goes with their contract and also an FAQ. I’d like to ask just a few questions about how that’s going and how that works.

What are Cambridge University Press’ goals with AI licensing?

Ben:  That’s a really good question. Broadly speaking, this started to come our way a couple of years ago, around the same time this subject became really noisy. We were looking at it and thinking, what’s the best way through this? How do we appropriately engage in this conversation? And I think it came back to us thinking about encouraging responsible use and thinking about our role as an academic publisher.

And I think our role as an academic publisher is to push the academic debate forward, which means that we want our authors’ books to get read. We want them to get used. We want them to get cited. I think that’s really the kind of spirit we came into this conversation with is thinking, these developments are happening, right, that they’re happening anyway and the best thing we can do as a publisher is try and engage with this debate and push it in a direction that we think really helps to underline those principles of how good research is done.

Dave:  One of the things that I’ve seen with CUP’s rollout with this asking authors is, first of all, that you are asking authors.  Could you talk me through that decision? We’ve seen some other publishers in the news just announce that they have licensing deals with technology companies, and there was no outreach to authors as far as we can tell from those publishers. So could you talk through that thought process of this outreach?

Ben: Sure, so for us, when we first looked at this: we have a contract that authors sign, which is probably in many ways very similar to contracts that they signed with other publishers, and it includes all sorts of clauses about use and wide-ranging licensing rights. One of the things it covers is derivative uses for content and the right to make derivatives. When we looked at that in the context of these AI conversations and licensing, from a legal perspective we thought, well, actually, that derivative-use clause technically does cover us for this kind of work. And I’m sure that’s the conclusion that some other people have reached too.

But we also thought, it just feels a bit like nobody knew that this kind of technology was emerging when they signed those contracts. And so from our perspective, we thought there’s a lot of noise about this subject in the whole ecosystem right now, you know, you can’t read the news without reading about AI, and people are nervous about it, understandably, and all of those kinds of things. So we felt that we should treat this as additional consent and approach it in that spirit. And that really underpins the decision to go out with the addendum for existing contracts.

I don’t want to jump onto any of your other questions, but that kind of principle, that we were going to ask for opt-ins, was important. Authors have to actively opt into this. We’re not saying to them, “If we don’t hear from you, we’ll assume you’ve opted in.” They have to actually come back to us and say that they’re happy for that use to happen.

Dave:  I think one of the things a lot of people don’t think about is how complicated rights clearance is, especially at scale, across a title list that is the size that you have. So this seems to me like a pretty big investment in just doing this process. Could you say how many of these you have sent out? I gather that you’re doing this in batches, but do you have a sense of the scale of how many author addendum requests you anticipate making over the course of however long this process lasts?

Ben: It’s a really good question and it’s a moving target. At this stage, we have sent out multiple thousands. But I think we have about 45,000 books available in print and digitally at the moment. And we’re working our way through that list systematically. So we’re in the thousands, and you’re right, it is a pretty big undertaking; it’s quite a logistical challenge to do. We had to set up a whole new workflow for doing this. We have a team that are working on the addenda and addressing the questions that authors have and all of those kinds of things.

Dave: This is maybe getting in the weeds, but it seems to me like there’s a pretty big difference between figuring this out for a sole-authored, single-part monograph, for instance, which is mostly what I’ve seen come through, and edited volumes. Have you tried to figure out those more complex books with multiple authors, multiple works within them?

Ben: Yeah, so the way it’s working for us is where we have several contracted authors for a book, we’re contacting them all and all of those authors have to opt in in order for us to agree that we have the licensing rights.

For edited volumes with multiple contributors, we’re not contacting the individual contributors for opt-in and there are a couple of different reasons for that. Typically, they don’t get paid royalties and also it would just be impossible for us to do. I mean, that’s logistically, you know, that’s a huge ask. So what we are doing is we are still contacting the editors for those volumes and the editors will opt in or not. So if the editor opts in, our understanding is that they’re opting in on behalf of the contributors as well.

But for multi-authored works, we get in touch with all of them. And in fact, we have quite a sizable number of books which are stuck because we’ve had some authors opt-in and some authors not opt-in.

Dave: This is a pretty fast-moving technology and I think a lot of authors are feeling just uncertain right now. And so I wonder about the opt-in window, if an author declines to opt in right now, is that it? Is there an opportunity to come back later after the dust settles and say, oh, no, actually, you know, I’d be happy to have my work used in this way? 

Ben:  Yeah, definitely. We’re in the process of putting something in place so that if authors don’t opt in now, they are able to come back and opt in later. And by the way, if they don’t opt in, that’s fine for all the reasons that you just said; some people are queasy about this and that’s okay. We’re not putting a hard sell on it.

My sense with this is that for some of the people that we’re speaking to who haven’t opted in, it is because they haven’t yet really seen what the kind of use cases are for this kind of technology. Perhaps as those become more public, people will want to come back and opt in. 

I think some of the things that are out there are going to be quite powerful discovery tools in the future. So we want to make sure the authors do have the opportunity to opt in later if they want to, although we can’t, of course, be sure that if people opt in later the same opportunities will necessarily be available then, since this is quite a fast moving area.

Dave:  For your contracts moving forward for front list books, is a clause like this now a default in those agreements or will authors of new books have the option to opt-in or opt-out for AI licensing?

Ben: Good question. Currently, we have put a clause into our contracts to add AI licensing. But, where authors are asking us to remove that clause, we’re taking it out.

And again, coming back to your point before, those authors could opt in later. But for the contracts as they go out, we have it in as a clause now.

Dave:  Okay. So let’s shift to if you’re gathering all of these rights from authors, presumably at some point, then, you would actually engage in the licensing with technology companies or others. Could you say a little bit about that? Do you have any deals in place with tech companies already? Or, the other thing that I’ve seen is, some publishers have been in the position of not doing those deals directly, but having sort of sub-licensing deals with others; I understand ProQuest Clarivate is doing this, and I think Wiley is as well. Do you have any of those deals in place now?

Ben:  We’re still having those conversations at the moment. And we are talking to a range of different people who are looking at this kind of content. 

Dave:  Okay, that’s really helpful to know.

At the beginning, you talked a little bit about Cambridge University Press’s motivations with engaging in this space and doing licensing. Could you talk a little bit about important factors for what might show up in one of those kinds of deals with tech companies?  For instance, one of the things that I think aligns with the sort of values that you outlined at the beginning and that authors care a lot about is credit, right? We know that, especially for academic authors, credit is incredibly valuable and important. And so I wonder if you’ve thought about how ensuring author credit might factor into any sort of downstream deal that CUP might engage in?

Ben: Absolutely. So we’re having exactly those conversations at the moment with anybody that we’re talking to. And we’ve been very clear with our authors when they’ve asked questions about this, and you may have seen this alluded to in some of the information that you’ve had forwarded to you from authors, that those principles of attribution are 100% what we’re focused on.  Really, they’re kind of a red line for us. 

One of the things we’ve been discussing in lots of conversations with people around this technology is the question of at what level content needs to be attributed. Our sense is that any kind of meaningful extract from somebody else’s work needs to be cited.

I’m kind of repeating myself, but that’s how research works. People build on other people’s work, and so in a scenario where content is being ‘discovered’, if we can’t identify and cite that content, it can’t be accurately attributed. So that’s a red line for us.

Dave: Right. I think figuring out at what level that attribution needs to kick in is a really tricky thing. It seems to me that if you’ve got a foundation model that is pulling in some texts, and then someone’s using, say, ChatGPT to write emails, and somewhere along the way the model gleaned some structural components from sources like academic books, I don’t think that’s the thing most authors care about: being cited for the fact that you helped train this model to understand how to format citations or do other things like that. It’s the intellectual content that matters, and that’s the really tricky piece of it.

Ben: Absolutely, and I don’t have an easy answer for you there. We’re having those conversations at the moment, but our sense is that any sort of direct quote, anything that you would consider to be plagiarism or worthy of credit in a non-AI world, should be attributed.

Dave:  I realize this question is asking a hypothetical because you don’t have any of these agreements in place yet, but it seems to me there’s a pretty big difference between use of Cambridge books for model training and uses such as for Retrieval Augmented Generation (RAG).

Have you thought about those distinctions in terms of how they might affect Cambridge’s willingness to set a price on those things? I assume RAG would come with a higher licensing price than other uses. But could you talk me through that thought process?

Ben: So it’s kind of interesting, because I think there’s a little bit of a gray area, because a lot of the RAG tools are combined with some aspect of an LLM. So they might be looking to summarize some research or write a brief about X, Y, and Z.

I think it is quite interesting that, at the moment, most of the people who come to us worried about this are really anxious about LLMs, but I feel like the really exciting place for academia and research is around that kind of retrieval-augmented generation, because that’s what’s going to help with discoverability for authors. It is difficult to talk about at the moment because we don’t have any public deals that I can point to. But I’d say a lot of the conversations that we’re having are somewhere between those two things: a combination of an LLM that’s generating text and a citation engine or discovery engine sitting over content.

Dave: Leaving aside the legal situation for a moment, one of the things that I hear from authors pretty consistently is the sentiment that these big technology companies coming in are sort of profiting off of content, that they are exploiting it, and so they ought to return something to the system and to authors.

But there’s a really different sentiment about what happens when you have, say, academic researchers using content for AI or text data mining purposes to make new discoveries or learn new things, both about the texts and about the world around them. We work a lot with text data mining researchers who are interested in large aggregations of content, not so they can build the next OpenAI, but so they can understand how language has changed over time, or how culture has changed over time.

I wonder from CUP’s perspective, how do those two different kinds of use cases factor into your thinking about downstream licensing deals for AI/ text data mining?

Ben: Yeah, for us the whole thing is not quite that clear cut, because a lot of the time it’s the big tech companies that are facilitating that discovery, or a lot of that discovery traffic goes through them. So from our perspective, I’m going to say we’re not ruling out working with anyone. We would put any partner that we had through the same diligence process that we would have with onboarding anybody else, but we wouldn’t rule out those conversations with anybody. I think for us, the most important thing comes back to, and I’m going to sound like a stuck record here, those principles of attribution. We have had some preliminary conversations with people who’ve said, “Well, we don’t think it would be possible to do what you’re asking,” and at that point we’re saying, “Well, okay, then you know that’s the red line for us.”

I think there’s quite a bit of cloudy territory between those two things. And I think for us, the most important thing is to make sure that authors are being credited where their work’s being used.

Dave:  All right, I have a hypothetical that I wanted to give you. We see that it’s a 20% royalty calculated on net revenue. Let’s say you received $5 million from an AI licensing deal. Can you walk me through how that might work out for the author? How do you calculate net revenue on that? And then, for the individual author sitting there who sees CUP sign a big deal: what can they expect?

Ben: That’s a tricky one, because it would depend a little bit on the terms of the deal as well. But broadly speaking, the principle is: if that’s the net revenue that we receive, so in your scenario the five million, the full licensing payment is divided out across the list of titles. Authors then earn the royalty for that sale or license type per title, as they do now with all other forms of licensing.

But then, where a licensee can provide accurate title-level usage within their royalty statements, that would be used instead. So in the LLM situation you were just talking about, the payment would be divided among those books. With a retrieval-augmented generation tool, I think that would work much more on the basis of usage. So, depending on which searches within that tool were bringing back particular content, we would be attributing revenue that way.
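Ben’s description above amounts to simple arithmetic, and can be sketched in a few lines. To be clear, this is purely illustrative: the function name, the equal-split fallback, the usage weighting, and the flat 20% royalty rate applied per title are assumptions for the sake of the example, not CUP’s actual accounting.

```python
def allocate_royalties(net_revenue, titles, usage=None, royalty_rate=0.20):
    """Split a licensing payment across titles and compute each author's royalty.

    If the licensee reports title-level usage, each title's share is weighted
    by usage; otherwise the payment is divided equally across all titles.
    """
    if usage:
        total_usage = sum(usage[t] for t in titles)
        shares = {t: net_revenue * usage[t] / total_usage for t in titles}
    else:
        shares = {t: net_revenue / len(titles) for t in titles}
    # Each author earns the royalty rate on their title's share of net revenue.
    return {t: round(shares[t] * royalty_rate, 2) for t in titles}

# Dave's hypothetical: $5M split equally across (say) 50,000 titles
# gives each title a $100 share, so a $20 royalty per title.
equal_split = allocate_royalties(5_000_000, [f"title-{i}" for i in range(50_000)])

# A RAG-style deal might instead weight by retrieval counts.
usage_based = allocate_royalties(5_000_000, ["book-a", "book-b"],
                                 usage={"book-a": 3, "book-b": 1})
```

Under the equal split, every author sees the same modest payment; under usage weighting, heavily retrieved titles earn far more, which is one reason the RAG scenario differs so much from the flat LLM-training case.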

Dave: Okay, that makes a lot of sense. I think this was in the FAQ: one of your use cases is inclusion in an authoritative database that’s used on a perpetual basis. But somewhere there was language about the removal of content once a licensing term has ended. I wonder if you’ve developed thinking internally about what a standard term would be, and how long these things might last?

Ben:  Yeah, I mean, it’s hard, isn’t it? Because where you’re licensing content to train an LLM, it would be sort of insincere to dress that up. Generally, most agreements would be governed by a 2-5 year training term, and at the end of that term the training dataset would be destroyed; however, the licensee would retain the output from the specific models developed during the training term. If they wanted to create new models, they would need to renew the license or extend the term.

For some of the other uses, that’s all being discussed at the moment. I think there is still work to do on this, but there would be standard partnership-length terms. What I would say is that, from our perspective, we think it’s quite likely that in the next few years the focus will move away from training large language models and into that area of discovery, and that these are going to become quite important revenue streams for academic publishers.

Dave: Thanks, very helpful. As you work on these deals, what level of transparency do you plan on offering authors or the general public about what these licenses might look like? At least with other publishers, it’s been quite mysterious – I think with one, we learned about an AI licensing deal in a quarterly earnings report, for instance. I think authors do really care about what the details of these deals look like. 

Ben:  It’s tricky, isn’t it? It’s hard for me to talk about a deal that hasn’t been done already, and of course, these deals can be subject to the same commercial confidentiality requirements as any other partnership. But I think it’s fair to say that Cambridge University Press would endeavor to be pretty transparent about what we’re doing generally and most importantly, be transparent about why we’re doing it. So I don’t think we’d be concealing that information from anybody. And coming back to my point before, we’ve been quite clear that we only want to enter into these kinds of conversations with people that we think are using content responsibly, and we’d always aim to be open.

Dave: A few final questions. First, CUP has published a number of open-access books. For example, I believe CUP was part of the TOME initiative. Do you feel like this kind of addendum is necessary for those open-access books, given that they already have an open license attached to them? Or is it a necessary addition on top of those OA licenses?

Ben: That’s a really good question, and it’s something that we’re grappling with at the moment. Without getting into the weeds around open access, some of it depends on the license. Historically for books, our default open access license was a Creative Commons CC BY-NC license, which prohibits commercial reuse. I think at the moment we’re looking at that (and I think a lot of publishers would say the same thing) and working through how that fits with AI licensing with commercial AI companies. The short answer to your question is that if you have a CC BY license, then people do have a broad license to reuse that content. So at the moment, we’re not actively going after those authors for opt-ins, nor are we including those books in the licensing deals that we’re doing.

But that’s also a relatively small number of books. I can say we are now looking at using CC BY-NC-ND more as the default, which restricts the creation of derivative works. You’ve touched on a conversation that is evolving, but we would be treating AI usage as requiring a derivative license and therefore not covered under a CC BY-NC-ND license.

Dave:  Thanks, that’s very helpful, and I think that’s something a lot of authors are trying to figure out: how does downstream AI use factor into Creative Commons-licensed works? And of course, the underlying legal situation matters. I didn’t ask, but I assume that the rights you’re asking for in this addendum are worldwide, since that affects, for example, whether a usage might be permitted under a particular country’s national law.

Ben: Yes, the rights are worldwide. 

And thinking again about that, it’s interesting, isn’t it? Because even the CC BY license doubles down on that principle of attribution. That’s the nature of the license, so even then some uses may not be covered by it.

Dave:  Right. That attribution piece under the CC-BY license will be an important one [note: this issue is being litigated, most prominently in the Doe v. Github suit]. And then, there’s also the underlying question of what the law allows independently even if there is no license–open license or otherwise. I know right now there’s a consultation that just closed in the UK about what the law should be, and in the US, we’re fighting these things out in the courts. I think there are 39 lawsuits right now pending about various aspects of this, and a key question in most of them is just how far fair use goes. And of course, you know, if fair use applies then you don’t have to worry too much about what the license says, whether it’s CC BY or CC BY NC ND or anything else.  This is like reading tea leaves but I think the prevailing case law indicates that model training and coming up with the weights has a pretty strong fair use case, but for the output side, that’s where I think it starts to stumble a little bit when you’ve got systems that are producing outputs that are substantially similar to the inputs. So I wouldn’t be surprised if in some of these suits, we get a ruling in favor of fair use and then in some of them we get a different outcome. And then, the landscape is just sort of messy.

And I suppose in the UK, I imagine y’all are watching what that legal landscape looks like around the world as it’s changing.

Ben:  Yeah, absolutely. 

Dave: One final question: we’ve talked a lot about licensing books for AI, but CUP has a substantial journal portfolio as well. Can you say anything about CUP’s approach to use of journal content either as AI training data or for other AI uses? 

Ben: We’ve been more focussed on books, as this is where most of the demand has been to date, but we have seen a developing interest in journal content. We are, therefore, currently exploring this form of licensing in a consultative way with our journal partners. 

Dave:  Well, thank you for talking with me. This was really, really helpful, and I think it will be useful for authors who are trying to understand more about what’s going on.

Ben:  It’s been a pleasure.

Authors Alliance Comment on US AI Action Plan

Posted March 14, 2025

Today, we submitted a response to a Request for Information from the Office of Science and Technology Policy (OSTP). The OSTP is seeking to develop an “AI Action Plan,” to sustain and accelerate the development of AI in the United States.  As an organization dedicated to advancing the interests of authors who wish to share their works broadly for the public good, we felt it imperative to weigh in on critical copyright and policy issues impacting AI innovation and access to knowledge.

In our response, we reaffirmed our belief that the use of copyrighted works specifically for AI training (distinct from other AI uses) is a quintessential fair use. We noted that Section 1202(b) of the Copyright Act has little utility and serves as an unnecessary stumbling block to the development of AI. We also highlighted the importance of high quality training data and pointed towards the work that is already being done to develop AI training corpora.  

A Few Key Points from Our Submission

Our response to the OSTP highlights several key areas where federal policy can support both authors and a thriving AI research environment:

1. The Role of Fair Use in AI Model Training

We emphasize that fair use has long been a cornerstone of innovation in the U.S.—enabling everything from web search engines to digitization projects. U.S. copyright law has played a major role both in developing the incredible creative industries housed in the U.S. and in driving leading scientific research and commercial innovation. The key to this innovation policy has been a thoughtful balance: providing copyright holders a degree of control over their works while allowing flexibility for technological innovation and new transformative uses. AI development relies on the ability to analyze large datasets, many of which include copyrighted materials. The uncertainty surrounding the legal status of AI training data due to ongoing litigation threatens to slow innovation. We urge the federal government to explicitly support the application of fair use to AI training and provide much-needed clarity.

2. Addressing the Contractual Override of Fair Use

Many AI developers face contractual barriers that limit their ability to make fair use of content, particularly in text and data mining applications. We recommend legislative measures to prevent contracts from overriding fair use rights, ensuring that AI researchers and developers can continue innovating without undue restrictions.

3. Access to High Quality Datasets

Access to high-quality datasets is a foundational pillar for AI development, enabling models to learn, refine, and iteratively improve. However, the availability of such datasets is often hindered by restrictive licensing agreements, proprietary controls, and inconsistent data standards. To maximize the potential of AI while ensuring ethical and legally sound development, collaborations between academic institutions, libraries, public archives, and technology developers are essential. Government policies should facilitate public-private partnerships that allow for robust and thoughtfully curated datasets, ensuring that AI systems are trained on a rich range of representative materials.

We invite our community of authors, researchers, and policymakers to review our submission. Your engagement is crucial in shaping a responsible and forward-thinking AI policy in the U.S. You can always reach us at info@authorsalliance.org

Updates on AI Copyright Law and Policy: Section 1202 of the DMCA,  Doe v. Github, and the UK Copyright and AI Consultation 

Posted March 7, 2025
Caption: Some district courts have applied DMCA 1202(b) to physical copies, including textiles; if you cut off parts of a fabric that contain copyright information, you could be liable for up to $25,000 in damages.

The US Copyright Act has never been praised for its clarity or its intuitive simplicity—at a whopping 460 pages long, it is filled with hotly debated ambiguities and overly complex provisions. The copyright laws of most other jurisdictions aren’t much better.

Because of this complexity, the implications of changes to copyright law and policy are not always clear to most authors. As we’ve said in the past, many of these issues seem arcane and largely escape public attention. Yet entities with a vested interest in maximalist copyright—often at odds with the public interest—are certainly paying attention, and often claim to speak for all authors when they in fact represent only a small subset. As part of our efforts to advocate for a future where copyright law offers ample clarity, certainty, and a real focus on values such as the advancement of knowledge and free expression, we would like to share two recent projects we have undertaken:

The 1202 Issue Brief and Amicus Brief in Doe v. Github

Authors Alliance has been closely monitoring the impact of Digital Millennium Copyright Act (DMCA) Section 1202. As we have explained in a previous post, Section 1202(b) creates liability for those who remove or alter copyright management information (CMI) or distribute works with removed CMI. This provision, originally intended to prevent widespread piracy, has been increasingly invoked in AI copyright lawsuits, raising significant concerns for lawful uses of copyrighted materials beyond training AI. While on its face, penalties for removing CMI might seem reasonable, the broad scope of CMI (which includes a wide variety of information such as website terms of service, affiliate links, and other information), combined with the challenge of including it with every downstream distribution of incomplete copies (imagine if you had to replicate and distribute something like the Amazon Kindle terms of service every time you quoted text from an ebook), could be very disruptive for many users.

In order to address the confusion regarding the (somewhat inaptly named) “identicality requirement” applied by courts in the 9th Circuit, we have released an issue brief, as well as filed an amicus brief in the Doe v. Github case now pending in the 9th Circuit.

Here are the key reasons why we care—and why you should care—about this seemingly obscure issue:

  • The Precedential Nature of Doe v. Github: The upcoming 9th Circuit case, Doe v. GitHub, will address whether Section 1202(b) should only apply when copies made or distributed are identical (or nearly identical) to the original. Lower courts have upheld this identicality requirement to prevent overbroad applications of the law, and the appellate ruling may set a crucial precedent for AI and fair use.
  • Potential Impact on Otherwise Legal Uses: It is not entirely certain whether fair use is a defense to 1202(b) claims. If the identicality requirement is removed, Section 1202(b) could create liability for transformative fair uses, snippet reuse, text and data mining, and other lawful applications. This would introduce uncertainty for authors, researchers, and educators who rely on copyrighted materials in limited, legal ways. We advocate for maintaining the identicality requirement and clarifying that fair use applies as a defense to Section 1202 claims.
  • Possibility of Frivolous Litigation: Section 1202(b) claims have surged in recent years, particularly in AI-related lawsuits. The statute’s vague language and broad applicability have raised fears that opportunistic litigants could use it to chill innovation, scholarship, and creative expression.

To find out more about what’s at stake, please take a look at our 1202(b) Issue Brief. You are also invited to share your stories with us about how you have navigated this strange statute.

Reply to the UK Open Consultation on Copyright and AI

We have members in the UK, and many of our US-based members publish in the UK. We have been watching developments in UK copyright law closely, and have recently filed a comment to the UK Open Consultation on Copyright and AI. In our comment, we emphasized the importance of ensuring that copyright policy serves the public interest. Our response’s key points include:

  • Competition Concerns: We alerted policymakers that their top objectives must include preventing monopolies from forming in the AI space. If licensing for AI training becomes the norm, we foresee power consolidating in a handful of tech companies and their unbridled monopoly permeating all aspects of our lives within a few decades—if not sooner.
  • Fair Use as a Guiding Principle: We strongly believe that the use of works in the training and development of AI models constitutes fair use under US law. While this issue is currently being tested in courts, case law suggests that fair use will prevail, ensuring that AI training on copyrighted works remains permissible. The UK does not have an identical fair use statute, but has recognized that some of its functions—such as flexibility to permit new technological uses—are valuable. We argue that the wise approach is for the UK to update its laws to ensure its creative and tech sectors can meaningfully participate in the global arena. Our comment called for a broad AI and TDM exception allowing temporary copies of copyrighted works for AI training. We emphasized that when AI models extract uncopyrightable elements, such as facts and ideas, this should remain lawful and protected. 
  • Noncommercial Research Should Be Protected: We strongly advocated for the protection of noncommercial AI research, arguing that academic institutions and their researchers should not face legal barriers when using copyrighted works to train AI models for research purposes. Imposing additional licensing requirements would place undue burdens on academic institutions, which already pay significant fees to access research materials.

Book Talk: Copyright, AI, and Great Power Competition

Register Here

How is artificial intelligence reshaping intellectual property law? And what role does copyright play in the global AI race? Join us for a thought-provoking discussion on Copyright, AI, and Great Power Competition, a new paper by Joshua Levine and Tim Hwang that explores how different nations approach AI policy and copyright regulation—and what’s at stake in the battle for technological dominance.

This event will bring together experts to examine key legal, economic, and geopolitical questions, including:

  • How do copyright laws affect AI innovation?
  • What are the competing regulatory approaches of the U.S., China, and the EU?
  • How should policymakers balance creators’ rights with AI development?

Whether you’re a legal scholar, technologist, policymaker, or just curious about the intersection of AI and copyright, this conversation is not to be missed!

DOWNLOAD

Download Copyright, AI, and Great Power Competition.

ABOUT OUR SPEAKERS

JOSHUA LEVINE is a Research Fellow at the Foundation for American Innovation. His work focuses on policies that foster digital competition and interoperability in digital markets, online expression, and emerging technologies. Before joining FAI, Josh was a Technology and Innovation Policy Analyst at the American Action Forum, where he focused on competition in digital markets, data privacy, and artificial intelligence. He holds a BA in Political Economy from Tulane University and lives in Washington, D.C.

TIM HWANG is General Counsel and a Senior Fellow at the Foundation for American Innovation focused on the intersection of artificial intelligence and intellectual property. He is also a Senior Technology Fellow at the Institute for Progress, where he runs Macroscience. Previously, Hwang served as the General Counsel and VP Operations at Substack, as well as the global public policy lead for Google on artificial intelligence and machine learning. He is the author of Subprime Attention Crisis, a book about the structural vulnerabilities in the market for programmatic advertising.

Dubbed “The Busiest Man on the Internet” by Forbes Magazine, his current research focuses on global competition in artificial intelligence and the political economy of metascience. He holds a J.D. from Berkeley Law School and a B.A. from Harvard College.

REGISTER HERE

Fair Use, Censorship, and Struggle for Control of Facts

Posted February 27, 2025
Caption: 451 is the HTTP status code returned when a webpage is unavailable for legal reasons; it is also the temperature in Fahrenheit at which books catch fire and burn. This public domain image was taken inside the Internet Archive.

Imagine this: a high-profile aerospace and media billionaire threatens to sue you for writing an unauthorized and unflattering biography. In the course of writing, you rely on several news articles, including a series of in-depth pieces about the billionaire’s life written over a decade earlier. Given their closeness in time to real events, you quote, sometimes extensively, from those articles in several places. 

On the eve of publication, your manuscript is leaked. Through one of his associated companies, the billionaire buys up the copyrights to the articles from which you quote. The next day the company files an infringement lawsuit against you. 

Copyright Censorship: a Time-Honored Tradition

It’s easy to imagine such a suit brought by a modern billionaire—perhaps Elon Musk or Jeff Bezos. But using copyright as a tool for censorship is a time-honored tradition. In this case, Howard Hughes tried it out in 1966, using his company Rosemont Enterprises to file suit against Random House for a biography it would eventually publish.

As we’ve seen many times before and since, the courts turned to copyright’s “fair use” right to rescue the biography from censorship. Fair use, the court explained, exists so that “courts in passing upon particular claims of infringement must occasionally subordinate the copyright holder’s interest in a maximum financial return to the greater public interest in the development of art, science and industry.” 

Singling out the biographical nature of the work and its importance in surfacing underlying facts, the court explained: 

Biographies, of course, are fundamentally personal histories and it is both reasonable and customary for biographers to refer to and utilize earlier works dealing with the subject of the work and occasionally to quote directly from such works. . . . This practice is permitted because of the public benefit in encouraging the development of historical and biographical works and their public distribution, e.g., so “that the world may not be deprived of improvements, or the progress of the arts be retarded.”

Fair use playing this role is no accident. As the Supreme Court has explained, the relationship between copyright and free expression is complicated. On the one hand, the Court has explained,  “[T]he Framers intended copyright itself to be the engine of free expression. By establishing a marketable right to the use of one’s expression, copyright supplies the economic incentive to create and disseminate ideas.” But, recognizing that such exclusive control over expression could chill the very speech copyright seeks to enable, the law contains what the Court has described as two “traditional First Amendment safeguards” to ensure that facts and ideas remain available for free reuse: 1) protections against control over facts and ideas, and 2) fair use. 

But rescuing a biography that merely quotes, even extensively, from earlier articles seems like an easy call, especially when the plaintiff has so clearly engineered the copyright suit not to protect legitimate economic interests but to suppress an unpopular narrative.

The world is a little more complicated now. Can fair use continue to protect free expression from excessive enforcement of copyright? I think so, but two key areas are at risk: 

Fair Use and the Archives

It may have escaped your notice that large chunks of online content disappear each year. 

For years, archivists have recognized and worked to address the problem. Websites going dark is an annoyance for most of us, but in some cases, it can have real implications for understanding recent history, even as officially documented. For example, back in 2013, a report revealed that well over half of the websites linked to in Supreme Court opinions no longer work, jeopardizing our understanding of just what went into why and how the Court decided an issue.  

While most websites disappear from benign neglect, others are intentionally taken down to remove records from public scrutiny.  Exhibit A may be the 8,000+ government web pages recently removed by the new presidential administration, but there are many other examples (even whole “reputation management” firms devoted to scrubbing the web of information that may cast one in an unfavorable light). 

The most well-known bulwark against disappearing internet content is the Internet Archive, which has, at this point, archived over 900 billion web pages. Over and over again, we’ve seen its Wayback Machine used to shine a light on history that powerful people would rather keep hidden. It’s also why the Wayback Machine has been blocked or threatened at various times in China, Russia, India, and other jurisdictions where free expression protections are weak.
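For readers who want to check programmatically whether a page survives in the Wayback Machine, the Internet Archive exposes a public availability endpoint (archive.org/wayback/available). The sketch below builds a query URL and reads the documented response shape; the helper names are mine, and the endpoint and field names should be verified against the Archive’s current API documentation before relying on them.

```python
from urllib.parse import urlencode

API = "https://archive.org/wayback/available"

def availability_url(page_url, timestamp=None):
    """Build a Wayback availability-API query for a given page URL."""
    params = {"url": page_url}
    if timestamp:  # YYYYMMDD: ask for the snapshot closest to this date
        params["timestamp"] = timestamp
    return f"{API}?{urlencode(params)}"

def closest_snapshot(response_json):
    """Return the closest archived snapshot URL from an API response,
    or None if the page was never captured."""
    snap = response_json.get("archived_snapshots", {}).get("closest")
    return snap["url"] if snap and snap.get("available") else None

# Example response shape (per the Archive's public documentation):
sample = {
    "archived_snapshots": {
        "closest": {
            "available": True,
            "url": "http://web.archive.org/web/20130919044612/http://example.com/",
            "timestamp": "20130919044612",
            "status": "200",
        }
    }
}
```

Fetching `availability_url(...)` with any HTTP client returns JSON in the `sample` shape above; an empty `archived_snapshots` object means no capture exists.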

It’s not just the open web that is disappearing. A recent report on the problem of “Vanishing Culture” highlights how this challenge pervades modern cultural works. Everything from 90s shareware video games to the entirety of the MTV News Archive is at risk. As Jordan Mechner, a contributor to the report, explains, “historical oblivion is the default, not the exception” to the human record. As the report explains, it’s not just disappearing content that poses a problem: libraries and consumers must also grapple with electronic content that can be remotely changed by publishers or others. As just one example among many, in the last few years we’ve seen surreptitious modifications to ebooks on readers’ devices—some changing important aspects of the plot—for works by authors such as R.L. Stine, Roald Dahl, and Agatha Christie.

The case for preservation as a foundational necessity to combat censorship is straightforward. “There is no political power without power over the archive,” Jacques Derrida reminds us. Without access to a stable, high-fidelity copy of the historical record, there can be no meaningful reflection on what went right or wrong, or holding to account those in power who may oppose an accurate representation of their past. 

What sometimes goes unnoticed is that, without fair use, a large portion of these preservation efforts would be illegal. 

In a world where century-long copyright protection applies automatically to any human expression with even a “modicum of creativity,” virtually everything created in the last century is subject to copyright. This is a problem for digital works because practically any preservation effort involves making copies—often lots of them—to ensure the integrity of the content. Making those copies means that archivists must rely on fair use to preserve these works and make them available in meaningful ways to researchers and others. 
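To make the copying concrete, here is a minimal sketch of the kind of fixity check preservationists run to ensure a stored copy has not silently changed. This is an illustration of the general technique, not a description of any particular archive’s actual pipeline, and the snapshot content is invented:

```python
import hashlib

def fixity(data: bytes) -> str:
    """Return a SHA-256 checksum used to detect silent changes to a preserved copy."""
    return hashlib.sha256(data).hexdigest()

# Archiving requires holding a complete copy of the work and recording
# its checksum at ingest time.
snapshot = b"<html><body>Original article text</body></html>"
recorded = fixity(snapshot)

# Later, the stored copy is re-checked against the recorded checksum.
assert fixity(snapshot) == recorded       # an unchanged copy passes

tampered = snapshot.replace(b"Original", b"Altered")
assert fixity(tampered) != recorded       # any modification is detected
```

The point is simply that every step of this routine workflow, from ingest to verification, involves reproducing the work.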

The upshot is that every time the Internet Archive archives a website, it’s an act of faith in fair use. Is that faith well-founded? 

I think so. But the answer is complicated. 

For preservation efforts like those of the Internet Archive, fair use is a foundation, but not an unshakable one. Two recent cases highlight the risk: one targeting the Archive’s book lending program, the other its “Great 78” record project. Both take issue with how the Archive provides access to preserved digital copies in its collections. While not directly attacking the preservation of those materials, the suits jeopardize their effective use. As archivists have long lamented, “preservation without access is pointless.” 

Beyond direct challenges to fair use, archives are threatened by spurious takedown demands, content removal requests, and legal challenges. Organizations like the Internet Archive have fought back, but many institutions simply cannot afford to, leading to a chilling effect where preservation efforts are scaled back or abandoned altogether.

Compounding this uncertainty is the growing use of technological protection measures (TPMs) and digital rights management (DRM) systems that restrict access to digital works. Under the Digital Millennium Copyright Act (DMCA), circumventing these restrictions is illegal—even for lawful purposes like preservation or research. This creates a paradox where a researcher or archivist may have a clear fair use justification for accessing and copying a work, but breaking an encryption lock to do so could expose them to legal liability.

Additionally, the rise of contractual overrides—such as restrictive licensing agreements on digital platforms—threatens to sideline fair use entirely. Many modern works, including e-books, streaming media, and even scholarly databases, are governed by terms of service that explicitly prohibit copying or analysis, even for noncommercial research. These contracts often supersede fair use rights, leaving archivists and researchers with no legal recourse.

Still, there are reasons for optimism. Courts have generally ruled favorably when fair use is invoked for transformative purposes, such as digitization for research, searchability, and access for disabled users. Landmark decisions, like those in Authors Guild v. Google and Authors Guild v. HathiTrust, upheld fair use in the context of large-scale digital libraries and text-mining projects. These cases suggest that courts recognize the essential role fair use plays in making knowledge accessible, particularly in an era of vast digital information.

Fair Use and the Freedom to Extract 

One of copyright’s other traditional First Amendment protections is that the copyright monopoly does not extend to facts or ideas. Fair use is critical in giving life to this protection by ensuring that facts and ideas remain accessible, providing a “freedom to extract” (a term I borrow from law professor Molly Van Houweling’s recent scholarship) even when they are embedded within copyrighted works. 

Copyright does not and cannot grant exclusive control over facts, but in practice, extracting those facts often requires using the work in ways that implicate the rightsholder’s copyright. Whether journalists referencing past reporting, historians identifying truths in archival materials, or researchers analyzing a vast corpus of written works, fair use provides the necessary legal space to operate without running afoul of copyright protections for rightsholders. 

The need is more urgent than ever given the sheer scale of the modern historical record. In many cases, relying on individual researchers to sift through the record and extract important facts is impractical, if not impossible. Automated tools and processes, including AI and text data mining tools, are now indispensable for processing, retrieving, and analyzing facts from massive amounts of text, images, and audio. From uncovering patterns in historical archives to verifying political statements against prior records, these tools serve as extensions of human analysis, making the extraction of factual information possible at an unprecedented scale. However, these technologies depend on fair use. If every instance of text or data mining required explicit permission from rights holders—who may have economic or political incentives to deny access—the ability to conduct meaningful research and discovery would be crippled.

For example, consider a researcher studying the roots of the opioid crisis, trying to mine the 4 million documents in the Opioid Industry Documents Archive—many of them legal materials, internal company communications, and regulatory filings. These documents, made public through litigation, provide critical insights into how pharmaceutical companies marketed opioids, downplayed their risks, and shaped public policy. But making sense of such a massive trove of records is impossible without computational tools that can analyze trends, track key players, and surface hidden patterns. 

Without fair use, researchers could face legal roadblocks to applying text and data mining techniques to extract the facts buried within these documents. If copyright law were used to restrict or complicate access to these records, it would not only hamper academic research but also shield corporate and governmental actors from exposure and accountability.

Conclusion

As information continues to proliferate across digital media, fair use remains one of the few safeguards ensuring that historical records and cultural artifacts do not become permanently locked away behind copyright barriers. It allows the past to be examined, challenged, and understood. If we allow excessive copyright restrictions to limit the ability to extract and analyze our shared past and culture, we risk not only stifling innovation but also eroding our collective ability to engage with history and truth.

Fair Use Week

This is my contribution to Fair Use Week. To read the other excellent posts from this week, check out Kyle Courtney’s Harvard Library Fair Use Week blog here.

Why Bayh-Dole has nothing to do with public access to articles under the Federal Purpose License

Posted February 4, 2025
On the left, a patent showing a windmill; on the right, a once-copyrightable poem about a windmill, illustrating the difference between patent and copyright.
This image, along with all its components, is in the Public Domain and free for reuse.

In the course of our work on Federal public access policies and the Nelson Memo, one of the objections I’ve encountered recently is that federal agency initiatives to provide immediate public access to scholarly articles run afoul of the Bayh-Dole Act or may imperil a university’s patent rights to inventions created pursuant to federal funding. Another related objection is that Stanford v. Roche, a case about how a university must go about securing rights in patentable inventions from their faculty under Bayh-Dole, affects how universities obtain sufficient rights to comply with federal public access policies. 

I thought it would be worth explaining why we don’t think these are realistic problems for federal public access law or policy. 

Bayh-Dole does not affect copyright in scholarly articles

The Bayh-Dole Act is an amendment to U.S. patent law, passed in 1980, that gives nonprofits and small businesses the right to retain patent rights in inventions developed using federal funding. Before Bayh-Dole, some federal agencies’ policies required grant recipients to assign patent rights arising from federally funded research to the government. To encourage institutions receiving federal research funding to commercialize inventions for public benefit, Bayh-Dole instead gave grantees the right to retain title to those inventions. If a grantee elects to retain title to an invention (rather than assigning it to the government), it must grant the government a nonexclusive, nontransferable, irrevocable, paid-up license to use the invention. Unreasonable refusal to develop or commercialize an invention may result in the government exercising “march-in” rights to license the invention to others (one of the more controversial parts of the legislation). 

The rights that Bayh-Dole secures for government contractors and grantees apply to “subject inventions.” The Act defines “invention” as “any invention or discovery which is or may be patentable or otherwise protectable under [US patent laws], or any novel variety of plant which is or may be protectable under the Plant Variety Protection Act. . . .” In turn, a “subject invention” is “any invention of the contractor conceived or first actually reduced to practice in the performance of work under a funding agreement.” In other words, “subject inventions” are inventions developed within the scope of a federal grant.

The Nelson Memo also applies to grant outputs, but not to inventions; it applies to “peer-reviewed scholarly publications.” Peer-reviewed scholarly publications, of course, are not inventions, nor would any rights under patent law apply to them. Scholarly publications are creative works of authorship, reuse of which is governed by copyright law under Title 17 of the United States Code, not by Bayh-Dole. It is true that copyrights and patents are sometimes discussed together as “intellectual property,” and courts sometimes even borrow concepts from one body of law into the other. But for the most part, different statutes and different cases govern how rights under each may be created, owned, licensed, and used.

Federal regulations about agency ownership and licensing of patent and copyright rights reflect that they are different. As discussed at length in this paper we published a few months ago (or see the one-page summary), grant-making agencies have for nearly half a century reserved certain rights in copyrighted grant outputs under a provision known as the “Federal Purpose License.” That license, which is codified in 2 C.F.R. § 200.315(b), provides that: 

“To the extent permitted by law, the recipient or subrecipient may copyright any work that is subject to copyright and was developed, or for which ownership was acquired, under a Federal award. The Federal agency reserves a royalty-free, nonexclusive, and irrevocable right to reproduce, publish, or otherwise use the work for Federal purposes and to authorize others to do so. This includes the right to require recipients and subrecipients to make such works available through agency-designated public access repositories.” (emphasis added).

Note that the Federal Purpose License is limited to copyrightable works.  By contrast, in the very next sub-section of the regulation, we see that rights in patents are treated differently:  

“[T]he recipient or subrecipient is subject to applicable regulations governing patents and inventions, including government-wide regulations in 37 CFR part 401 [the implementing regulations for Bayh-Dole].” 2 C.F.R. § 200.315(c)(emphasis added).

It is, of course, possible that in the course of federally funded research, one might produce both a patentable invention that is subject to Bayh-Dole and a copyrighted research article on the same subject. But this does not make Bayh-Dole applicable to the copyright rights in the article, nor does it mean that the Federal Purpose License (a copyright license) affects patent rights under Bayh-Dole regulations. The copyright provisions cover the copyrightable works; the patent provisions cover the patents.

Disclosure of Inventions or Discoveries

If you’ve worked with your campus technology transfer office before, you know that public disclosure of new research (e.g., in a research article)  can be a problem if one hopes to obtain a patent for an invention discussed in that publication. U.S. patent law rewards new and non-obvious inventions, and so the law provides in 35 U.S.C. § 102(a) that one is not entitled to a patent if “the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.”

Note that the statute specifically calls out description of the invention “in a printed publication.” Prior printed publication turns on “public accessibility,” which the courts have explained as being “disseminated or otherwise made available to the extent that persons interested and ordinarily skilled in the subject matter or art exercising reasonable diligence[] can locate it.” And so the standard is far lower than the “worldwide free public access” provided by the public access databases under the Nelson Memo. For example, the Federal Circuit has found that a dissertation shelved and indexed in a card catalog at a German university qualified as publicly accessible. The court has also concluded that an oral presentation of a paper (with dissemination of the paper itself to only six people) at a conference satisfied the test. Similarly, the Federal Circuit has held that electronic distribution via a subscription email list qualified as publicly accessible. The point is that if you’ve already published a paper in a peer-reviewed journal that sufficiently describes the invention—even if published only via a subscription route and not available for free—you have almost certainly already disclosed the invention. Further expanding its reach through a public access repository would make no difference. 

Public access policies implementing the Nelson Memo do not compel researchers or universities to disclose inventions prematurely, and thus have no impact on patentability. They merely require that once you choose to publish your research in an article, it must be promptly accessible to the public for free, no later than the publication date, in a public access repository. Whether the article is restricted to subscribers only or made openly available does not affect its status as a public disclosure for patent purposes.

Stanford v. Roche

Stanford v. Roche is a 2011 Supreme Court case addressing ownership of patent rights in inventions created pursuant to federal funding and subject to Bayh-Dole. The case concerned control over rights in a test kit developed to detect HIV in human blood. As the Court explained the relevant facts: 

Dr. Mark Holodniy joined Stanford as a research fellow . . . When he did so, he signed a Copyright and Patent Agreement (CPA) stating that he “agree[d] to assign” to Stanford his “right, title and interest in” inventions resulting from his employment at the University. 

At Stanford Holodniy undertook to develop an improved method for quantifying HIV levels in patient blood samples, using [polymerase chain reaction, or PCR, a Nobel Prize-winning technique developed at Cetus]. Because Holodniy was largely unfamiliar with PCR, his supervisor arranged for him to conduct research at Cetus. As a condition of gaining access to Cetus, Holodniy signed a Visitor’s Confidentiality Agreement (VCA). That agreement stated that Holodniy “will assign and do[es] hereby assign” to Cetus his “right, title, and interest in each of the ideas, inventions and improvements” made “as a consequence of [his] access” to Cetus. 

For the next nine months, Holodniy conducted research at Cetus. 

The conflict was ultimately about whether Stanford could prevent Roche, the company that acquired Cetus’s IP assets, from using the invention. 

The Supreme Court was asked to address the apparent conflict between 1) the ordinary rule in patent law that rights in an invention belong to the inventor, and that “in most circumstances, an inventor must expressly grant his rights in an invention to his employer if the employer is to obtain those rights,” and 2) Stanford’s contention that Bayh-Dole changed this ordinary rule and instead gave the university first priority in the invention, such that an individual inventor couldn’t simply sign away rights to a third party. 

Stanford made this argument about Bayh-Dole in part to contest an important holding of the appellate court below: namely, that Stanford’s agreement with Dr. Holodniy was a “mere promise to assign rights in the future, not an immediate transfer of expectant interests,” and therefore came second in line to Holodniy’s agreement with Cetus, through which Cetus “immediately gained equitable title to Holodniy’s inventions.”

The Supreme Court concluded that Bayh-Dole did not disrupt the ordinary rule that inventors own rights in their inventions absent an express assignment, and because Holodniy’s agreement with Stanford used ineffective language to secure for it first priority—“agree to assign” instead of the effective “do hereby assign”—Stanford lost. The practical upshot—many of you may remember this—was that universities rushed to revise their agreements with employees to put in place more effective language securing first-priority rights in inventions of university employees. 

Federal grants and copyright—what’s a university to do?

Stanford v. Roche contains some important lessons for universities, as federal grant recipients, about securing clear and effective rights from employees to comply with their grant obligations. 

Like in Stanford v. Roche, in the context of copyrightable works created pursuant to federal funding, it’s also important for universities (as grantees) to make sure they actually hold sufficient rights in copyrightable works produced under that grant so they can comply with federal agencies’ public access requirements. That said, there are some important differences between the assignment of patent rights issues in Roche and what is required for compliance under the federal purpose license. 

Probably the biggest determining factor in the effectiveness of those licenses will be how universities craft and implement their copyright policies. We’ve touched on this before, and explained that one important factor to consider is whether copyright law’s “work made for hire” doctrine applies (patent law has no such thing). Under the work made for hire doctrine, a work produced within the scope of employment is initially owned by the employer rather than the employee. Whether and how “work made for hire” applies to academic work is contested, but if it does apply, it largely eliminates concerns about the priority of the university’s license, since the university would be the initial owner. That’s true even though most universities (very rightly, in our opinion!) make it clear that individual authors should ultimately be in control of rights in their works. For instance, the University of Michigan transfers the copyright of scholarly works to its faculty members, but reserves the ability to make uses consistent with academic norms, including complying with a Federal Purpose License.

Even without the application of work made for hire, universities can and do use their copyright policies to effectively address ownership and licensing of faculty-created scholarly works. Though we haven’t read every university’s copyright policy, for the most part we’ve found them to be thoughtful about securing from faculty authors, at a minimum, a non-exclusive license that would satisfy the requirements of Section 205(e) of the Copyright Act, giving the university’s license priority over any subsequent transfers, such as a publishing agreement with a publisher. We review some of the approaches university policies take in this post, and we plan to release a white paper on this subject in the next few months. If you want to read further now, law professor Eric Priest has a good article, “Copyright and the Harvard Open Access Mandate,” that explains why these kinds of licenses are likely effective. 

Conclusion

It’s important to remember that patent law and copyright law are distinct in many ways. While they share some similar concepts, the details are important and ownership and licensing of rights under one can be quite different from the other. The Bayh-Dole Act and other U.S. patent law govern ownership and commercialization of federally funded inventions, but they do not dictate how the Federal Purpose License should be interpreted or applied within the confines of copyright law. 

Artificial Intelligence, Authorship, and the Public Interest

Posted January 9, 2025
Photo by Robert Anasch on Unsplash

Today, we’re pleased to announce a new project generously supported by the John S. and James L. Knight Foundation. The project, “Artificial Intelligence, Authorship, and the Public Interest,” aims to identify, clarify, and offer answers to some of the most challenging copyright questions posed by artificial intelligence (AI) and explain how this new technology can best advance knowledge and serve the public interest.

Artificial intelligence has dominated public conversation about the future of authorship and creativity for several years. Questions abound about how this technology will affect creators’ incentives, influence readership, and what it might mean for future research and learning. 

At the heart of these questions is copyright law. Over two dozen class-action copyright lawsuits have been filed between November 2022 and today against companies such as Microsoft, Google, OpenAI, Meta, and others. Additionally, congressional leadership, state legislatures, and regulatory agencies have held dozens of hearings to reconcile existing intellectual property law with artificial intelligence. As one of the primary legal mechanisms for promoting the “progress of science and the useful arts,” copyright law plays a critical role in creating, producing, and disseminating information. 

We are convinced that how policymakers shape copyright law in response to AI will have a lasting impact on whether and how the law supports democratic values and serves the common good. That is why Authors Alliance has already devoted considerable effort to these issues, and this project will allow us to expand those efforts at this critical moment. 

AI Legal Fellow
As part of the project, we’re pleased to add an AI Legal Fellow to our team. The position requires a law degree and demonstrated interest and experience with artificial intelligence, intellectual property, and legal technology issues. We’re particularly interested in someone with a demonstrated interest in how copyright law can serve the public interest. This role will require significant research and writing. Pay is $90,000/yr for a two-year term position. Read more about the position here. We’ll begin reviewing applications immediately and conduct interviews on a rolling basis until the position is filled. 

As we get going, we’ll have much more to say about this project. We will have some funds available to support research subgrants, organize several workshops and symposia, and offer numerous opportunities for public engagement. 

About the John S. and James L. Knight Foundation
We are social investors who support democracy by funding free expression and journalism, arts and culture in community, research in areas of media and democracy, and in the success of American cities and towns where the Knight brothers once had newspapers. Learn more at kf.org and follow @knightfdn on social media.

Authors Alliance 2024 Annual Report

Posted December 17, 2024

Authors Alliance celebrated an important milestone in 2024: our 10th anniversary! 

Quite a lot has changed since 2014, but our mission remains the same. We exist to advance the interests of authors who want to serve the public good by sharing their creations broadly.  I’m pleased to share our 2024 annual report, where you can find highlights of our work this year to promote laws, policies, and practices that enable authors to reach wide audiences.

Our success in 2024 was largely due to the wonderful collaboration and support we have from our members. You’ll see in the report a number of ongoing projects and issues we are working to address: legal questions about open access publishing, rights reversion at scale, supporting text data mining research, addressing contractual override of fair use,  AI and copyright, and more. As we look to 2025, I would love to hear from you if you have a special interest in any of these projects and would like to contribute your ideas, time, or expertise to help us tackle them.

I’m grateful for those of you who contributed financially to make 2024 a success. Authors Alliance is funded almost entirely by gifts and grants, and so we truly rely on you. As we end the year, I hope you will consider giving if you haven’t done so already. You can donate online here.

Thank you,

Dave Hansen
Executive Director 


Restricting Innovation: How Publisher Contracts Undermine Scholarly AI Research

Posted December 6, 2024
Photo by Josh Appel on Unsplash

This post is by Rachael Samberg, Director, Scholarly Communication & Information Policy, UC Berkeley Library and Dave Hansen, Executive Director, Authors Alliance

This post is about the research and the advancement of science and knowledge made impossible when publishers use contracts to limit researchers’ ability to use AI tools with scholarly works. 

Within the scholarly publishing community, mixed messages pervade about who gets to say when and how AI tools can be used for research reliant on scholarly works like journal articles or books. Some scholars voiced concern (explained more here) when major scholarly publishers like Wiley or Taylor & Francis entered lucrative contracts with big technology companies to allow for AI training without first seeking permission from authors. We suspect that these publishers have the legal right to do so since most publishers demand that authors hand over extensive rights in exchange for publishing their work. And with the backdrop of dozens of pending AI copyright lawsuits, who can blame the AI companies for paying for licenses, if for no other reason than avoiding the pain of litigation? While it stings to see the same large commercial, academic publishers profit yet again off of the work academic authors submit to them for free, we continue to think there are good ways for authors to retain a say in the matter. 

Big tech companies are one thing, but what about scholarly research? What about the large and growing number of scholars who are themselves using copyrighted scholarly content with AI tools to conduct their research? We currently face a situation in which publishers attempt to dictate how and when researchers can do that work, even when authors’ fair use rights to use and derive new understandings from scholarship clearly allow for such uses. 

How vendor contracts disadvantage US researchers

We have written elsewhere (in an explainer and a public comment to the Copyright Office) about why training AI tools, particularly in the scholarly and research context, constitutes a fair use under U.S. copyright law. Training AI in this context rests on a statutory right that all scholarly authors engaging in computational research already hold, one that is critical for the advancement of knowledge and that lawmakers should preserve. 

The problem U.S. scholarly authors presently face with AI training is that publishers restrict their access to these statutory rights through contracts that override them: In the United States, publishers can use private contracts to take away statutory fair use rights that researchers would otherwise hold under Federal law. In this case, the private contracts at issue are the electronic resource (e-resource) license agreements that academic research libraries sign to secure campus access to electronic journal, e-book, data, and other content that scholars need for their computational research.

Contractual override of fair use is a problem that disparately disadvantages U.S. researchers. As we have described elsewhere, more than forty countries, including the member states of the European Union, expressly reserve text mining and AI training rights for scientific research by research institutions. Scholars in these countries not only need not worry whether their computational research with AI is permitted; they also do not risk having those reserved rights overridden by contract. The European Union’s Directive on Copyright in the Digital Single Market and recent AI Act nullify any attempt to circumscribe the text and data mining and AI training rights reserved for scientific research within research organizations. U.S. scholars are not as fortunate. 

In the U.S., most institutional e-resource licenses are negotiated and managed by research libraries, so it is imperative that scholars work closely with their libraries and advocate for preserving their computational research and AI training rights within the e-resource license agreements that universities sign. To that end, we have developed adaptable licensing language to support institutions in doing that nationwide. But while this language is helpful, the onus of advocacy and negotiation for those rights in the contracting process remains. In our experience, it is helpful to explain to publishers that they must consent to these terms in the European Union, and can do so in the U.S. as well. That, combined with strong faculty and administrative support (such as at the University of California), makes for a strong stance against curtailment of these rights.

But we think there are additional practical ways for libraries to illustrate—both to publishers and scholarly authors—exactly what would happen to the advancement of knowledge if publishers’ licensing efforts to curtail AI training were successful. One way to do that is by “unpacking” or decoding a publisher’s proposed licensing restriction, and then demonstrating the impact that provision would have on research projects that were never objectionable to publishers before, and should not be now. We’ll take that approach below.

Decoding a publisher restriction

A commercial publisher recently proposed the following clause in an e-resource agreement:

Customer [the university] and its Authorized Users [the scholars] may not:

  1. directly or indirectly develop, train, program, improve, and/or enrich any artificial intelligence tool (“AI Tool”) accessible to anyone other than Customer and its Authorized Users, whether developed internally or provided by a third party; or
  2. reproduce or redistribute the Content to any third-party AI Tool, except to the extent limited portions of the Content are used solely for research and academic purposes (including to train an algorithm) and where the third-party AI Tool (a) is used locally in a self-hosted environment or closed hosted environment solely for use by Customer or Authorized Users; (b) is not trained or fine-tuned using the Content or any part thereof; and (c) does not share the Content or any part thereof with a third party.  

What does this mean?

  • The first paragraph forbids training or improving any AI tool that is accessible or released to third parties. It further forbids using any computational outputs or analyses derived from the licensed content to train any tool available to third parties. 
  • The second paragraph is perhaps even more concerning. It provides that when using third-party AI tools of any kind, a scholar may use only limited portions of the licensed content with the tools, and is prohibited from doing any training at all of third-party tools, even if the tool is non-generative and the scholar is performing the work in a completely closed and highly secure research environment.

What would the impact of such a restrictive licensing provision be on research? 

It would mean that every single one of the trained tools in the following projects could never be disseminated. In addition, for the projects below that used third-party AI tools, the research would have been prohibited full stop, because the third-party tools in those projects required training, which the publisher above is attempting to prevent:

Tools that could not be disseminated

  1. In 2017, chemists created and trained a generative AI tool on 12,000 published research papers regarding synthesis conditions for metal oxides, so that the tool could identify anticipated chemical outputs and reactions for any given set of synthesis conditions entered into the tool. The generative tool they created is not capable of reproducing or redistributing any licensed content from the papers; it has merely learned conditions and outcomes and can predict chemical reactions based on those conditions and outcomes. And this beneficial tool would be prohibited from dissemination under the publisher’s terms identified above.
  2. In 2018, researchers trained an AI tool (originally created in 2014) to determine whether a character is “masculine” or “feminine” by looking at the tacit assumptions expressed in words associated with that character. The tool can then examine other texts and identify masculine or feminine characters based on what it learned in training. Scholars can therefore use the tool on texts from different time periods to study representations of masculinity and femininity over time. Sharing the trained tool releases no licensed content and no copyrighted books from a publisher to the world; the trained tool is merely capable of topic modeling. Yet the publisher’s language above would prohibit its dissemination nevertheless.
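The point that a trained tool contains no copyrighted text is worth seeing concretely: after training, what remains is a set of numeric word weights, not the source passages. Below is a minimal, purely illustrative sketch of that idea. It is not the researchers’ actual model; all the data, words, and function names are invented for illustration.

```python
# Illustrative only: a "trained" model here is just a table of word
# weights. It contains no sentences from any training text.

# Invented toy training data: word counts associated with each label.
training_counts = {
    "masculine": {"sword": 4, "beard": 3, "duel": 2},
    "feminine":  {"gown": 4, "embroidery": 3, "ballroom": 2},
}

def train(counts):
    """Convert raw word counts into per-label word weights."""
    model = {}
    for label, words in counts.items():
        total = sum(words.values())
        model[label] = {w: n / total for w, n in words.items()}
    return model

def classify(model, text):
    """Score a new passage against each label's learned word weights."""
    tokens = text.lower().split()
    scores = {
        label: sum(weights.get(t, 0.0) for t in tokens)
        for label, weights in model.items()
    }
    return max(scores, key=scores.get)

model = train(training_counts)
print(classify(model, "He drew his sword before the duel"))  # prints "masculine"
```

Sharing `model` shares only fractions like `{"sword": 0.44, ...}`; no training sentence can be recovered from it, which is the crux of the argument above.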

Tools that could neither be trained nor disseminated 

  1. In 2019, authors used text from millions of books published over 100 years to analyze cultural meaning. They did this by training third-party non-generative word-embedding models called Word2Vec and GloVe on multiple textual archives. The tools cannot reproduce content: when shown new text, they merely represent words as numbers, or vectors, to evaluate or predict how semantically or linguistically similar words in a given space are. The similarity of words can reveal cultural shifts in the understanding of socioeconomic factors like class over time. But the publisher’s licensing terms above would prohibit training the tools to begin with, let alone sharing them to support further or different inquiry.
  2. In 2023, scholars trained a third-party open-source natural language processing (NLP) tool called Chemical Data Extractor (CDE). Among other things, CDE can be used to extract chemical information and properties identified in scholarly papers. In this case, the scholars wanted to teach CDE to parse a specific type of chemical information: metal-organic frameworks, or MOFs. Generally speaking, the CDE tool works by breaking sentences into “tokens” like parts of speech and referenced chemicals. By correlating tokens, one can determine that a particular chemical compound has certain synthetic properties, topologies, reactions with solvents, and so on. The scholars trained CDE specifically to parse MOF names, synthesis methods, inorganic precursors, and more—and then exported the results into an open-source database that identifies the MOF properties for each compound. Anyone can now use both the trained CDE tool and the database of MOF properties to ask different chemical-property questions or identify additional MOF production pathways—thereby improving materials science for all. Neither the CDE tool nor the MOF database reproduces or contains the underlying scholarly papers that the tool learned from. Yet neither the training of this third-party CDE tool nor its dissemination would be permitted under the publisher’s restrictive licensing language cited above.
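To make concrete why word-embedding tools like those above cannot reproduce licensed text, here is a minimal sketch of how similarity between word vectors is computed. The three-dimensional vectors below are invented toy values; real Word2Vec and GloVe embeddings are learned from large corpora and typically have hundreds of dimensions. The mechanics, however, are the same: a shared model contains only numbers like these.

```python
import math

# Invented toy 3-dimensional "word vectors" for illustration.
# Real embeddings are learned from text and are much larger.
vectors = {
    "king":  [0.9, 0.8, 0.1],
    "queen": [0.85, 0.75, 0.2],
    "bread": [0.1, 0.2, 0.9],
}

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Semantically related words point in similar directions...
print(cosine_similarity(vectors["king"], vectors["queen"]))
# ...while unrelated words do not.
print(cosine_similarity(vectors["king"], vectors["bread"]))
```

Tracking how such similarities drift across corpora from different decades is, in essence, how the 2019 study measured cultural change. Nothing in the vector table is expressive text from any book.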

Indeed, there are hundreds of AI tools that scholars have trained and disseminated—tools that do not reproduce licensed content—and that scholars have created or fine-tuned to extract chemical information, recognize faces, decode conversations, infer character types, and so much more. Restrictive licensing language like that shown above suppresses the research inquiries and societal benefits that these tools make possible. It may also disproportionately affect the advancement of knowledge in or about developing countries, which may lack the resources to secure licenses or be forced to rely on open-source or poorly coded public data—hindering journalism, language translation, and language preservation.

Protecting access to facts

Why are some publishers doing this? Perhaps to reserve the opportunity to develop and license their own scholarship-trained AI tools, which they could then license at additional cost back to research institutions. We could speculate about motivations, but the upshot is that publishers have been pushing hard to foreclose scholars from training and disseminating AI tools that now “know” something based on the licensed content. That is, such publishers wish to prevent tools from learning facts about the licensed content.

However, this is precisely the purpose of licensing content. When institutions license content for their scholars to read, they do so precisely so that scholars can learn information from it. When scholars write or teach about the content, they are not regenerating the actual expression from the content—the part that is protected by copyright; rather, the scholars are conveying the lessons learned from the content—facts not protected by copyright. Prohibiting the training of AI tools and the dissemination of those tools is functionally equivalent to prohibiting scholars from learning anything about the content that institutions license for that very purpose, and that scholars have written to begin with! Publishers should not be able to monopolize the dissemination of information learned from scholarly content, especially when that information is used non-commercially.

For these reasons, when we negotiate to preserve AI usage and training rights, we generally try to achieve outcomes that would promote—rather than prohibit—all of the research projects described above.

The sample language we’ve disseminated empowers others to negotiate for these outcomes. We hope that, when coupled with the advocacy tools we’ve provided above, scholars and libraries can protect their AI usage and training rights, while also being equipped to consider how they want their own works to be used.