Category Archives: Blog

Artificial Intelligence, Authorship, and the Public Interest

Posted January 9, 2025
Photo by Robert Anasch on Unsplash

Today, we’re pleased to announce a new project generously supported by the John S. and James L. Knight Foundation. The project, “Artificial Intelligence, Authorship, and the Public Interest,” aims to identify, clarify, and offer answers to some of the most challenging copyright questions posed by artificial intelligence (AI) and explain how this new technology can best advance knowledge and serve the public interest.

Artificial intelligence has dominated public conversation about the future of authorship and creativity for several years. Questions abound about how this technology will affect creators’ incentives, how it will influence readership, and what it might mean for future research and learning.

At the heart of these questions is copyright law. Over two dozen class-action copyright lawsuits have been filed between November 2022 and today against companies such as Microsoft, Google, OpenAI, and Meta. Additionally, congressional leadership, state legislatures, and regulatory agencies have held dozens of hearings to reconcile existing intellectual property law with artificial intelligence. As one of the primary legal mechanisms for promoting the “progress of science and the useful arts,” copyright law plays a critical role in how information is created, produced, and disseminated.

We are convinced that how policymakers shape copyright law in response to AI will have a lasting impact on whether and how the law supports democratic values and serves the common good. That is why Authors Alliance has already devoted considerable effort to these issues, and this project will allow us to expand those efforts at this critical moment. 

AI Legal Fellow
As part of the project, we’re pleased to be adding an AI Legal Fellow to our team. The position requires a law degree and demonstrated interest and experience in artificial intelligence, intellectual property, and legal technology issues. We’re particularly interested in candidates who care about how copyright law can serve the public interest. This role will require significant research and writing. Pay is $90,000/yr, and it is a two-year term position. Read more about the position here. We’ll begin reviewing applications immediately and conduct interviews on a rolling basis until the position is filled.

As we get going, we’ll have much more to say about this project. We will have some funds available to support research subgrants, organize several workshops and symposia, and offer numerous opportunities for public engagement. 

About the John S. and James L. Knight Foundation
We are social investors who support democracy by funding free expression and journalism, arts and culture in community, research in areas of media and democracy, and in the success of American cities and towns where the Knight brothers once had newspapers. Learn more at kf.org and follow @knightfdn on social media.

Public Domain Day—A Diversion to Sound Recordings

Posted December 29, 2024
Image of 78 RPM disc on Victor player
78 RPM disc on Victor player (photo © Eric Harbeson)

Happy Public Domain Day! 

Every January 1st the United States adds a new crop of works to its public domain. Though the term of copyright is very long, the Constitution provides that it must—eventually—end. This transition is arguably the most important moment in the life of a creative work, excepting only its initial creation. The end goal of copyright in the first place is to encourage the creation of new works, and the public domain is the shared pool out of which those new works may be freely forged. For a great explainer about the value of the public domain, check out the annual Public Domain Day post by our friends at the Center for the Study of the Public Domain! In this post, we thought we would do something a little different from our normal fare and spend some time talking about sound recordings, which only started entering the public domain in the last few years.

As a general rule, works published prior to 1978 have a maximum copyright term of 95 years. Thus (because copyright terms run through the end of the calendar year) on January 1 all works first published in 1929 will be free to use. Well, almost all works. From a copyright perspective, sound recordings are a bit different in several ways, one of which is that they are subject to somewhat longer protection. The discrepancy illustrates the curious space sound recordings occupy in US copyright law.

The history of why sound recordings are treated differently is fascinating, but it is too involved to do justice in a blog post. For thorough treatments of the subject, check out the fascinating articles by Bruce Epperson and Zvi S. Rosen. Each goes into depth on the legal conflicts that arose as artists, scholars, inventors, and policymakers struggled to understand and form policy around the emergence of two new media—sound recordings and piano rolls—that were unlike anything previously known.

In short, federal copyright law has only protected sound recordings since February 15, 1972, the effective date of the Sound Recordings Act of 1971. Recordings made since that date have been subject to the same laws as any other copyrightable work from the moment they were fixed. However, the Act was not retroactive, so recordings fixed before that date were excluded from federal protection. The Copyright Act of 1976, which completely revised U.S. copyright law, preserved that dichotomy: recordings fixed on or after February 15, 1972 were included; pre-1972 recordings were not.

Though pre-1972 recordings were not protected by federal law, states were free to protect them. States protected pre-1972 recordings perpetually through their common law (as though they were unpublished works, which prior to the 1976 Act were also protected by state common law), with most states then codifying that protection in criminal statutes. Though state protection was in theory perpetual, the 1976 Act nonetheless put a time limit on that protection—all state protection for pre-1972 recordings was to be preempted and cease in 2047 (75 years after the Sound Recordings Act). The Copyright Term Extension Act later extended that date by 20 years. Thus, all pre-1972 recordings, regardless of how old they were, would remain under state protection until 2067. Had the situation not changed, by the time any domestic recordings entered the public domain, the oldest recording would have been more than 200 years old!

The exclusion of pre-1972 recordings from federal law was narrowed in 1994, when the Uruguay Round Agreements Act (URAA) took effect. The URAA amended the Copyright Act to provide retroactive copyright to foreign works that had entered the public domain due to non-compliance with formalities, lack of national eligibility, or because they were pre-1972 recordings. This Act inspired the case of Golan v. Holder, which challenged Congress’s ability to remove works from the public domain (the Supreme Court ultimately upheld that power). Among other things, the URAA brought pre-1972 foreign recordings (which, being protected by state law, were not actually in the public domain) under federal copyright. Meanwhile, domestic pre-1972 recordings remained under the exclusive care of the states, with a true public domain only in the distant future.

One result of leaving protection to the states was variation in treatment from state to state. The term of protection is one good example. Most states protected recordings for the maximum term permitted by Congress; however, some states cut off protection earlier. One notable example is Colorado, which protected recordings only for 56 years—significantly shorter than in other states (Colorado was also one of only two states to refer to that protection as “copyright”). There were also differences in the nature of the protection. All but two states (Indiana and Vermont) enacted criminal statutes codifying protection; only one state (California) also codified civil penalties. Some states had exceptions for non-commercial use, or for libraries, or both; some had neither. Only one state (New York) had established a common law fair use doctrine for sound recordings, and none had codified the doctrine in a statute.

This resulted in some strange paradoxes in the law, often with sharp disparities. One such result was that a recording and the underlying musical composition it embodied might have different terms of protection. As one typical example, the sheet music for George M. Cohan’s 1917 song, “Over There,” entered the public domain at the end of 1992; however, Nora Bayes’s recording of the same song from the same year would have been protected in most states until 2067. Another disparity resulted from differences in federal and state treatment of public performances. Because no state explicitly provided an exclusive right of public performance, for example, no license was needed to publicly play the sound recording of Marvin Gaye’s 1971 hit, “What’s Going On,” but a license was required to play his recording of “Let’s Get It On,” recorded only two years later (licenses were still required for performance of the underlying musical works).

                          Musical Composition   Sound Recording
What’s Going On (1971)    License required      No license required
Let’s Get It On (1973)    License required      License required
Licensing public performance of Marvin Gaye works before the Music Modernization Act.

The situation was unnecessarily complicated, and was frustrating to nearly everyone involved. Recording artists and labels disliked the disparity in performance rights between pre- and post-1972 recordings. Public interest groups, such as librarians and archivists, disliked the lack of uniformity and the only sporadic limitations and exceptions. The issue came to a head when Mark Volman and Howard Kaylan (aka Flo & Eddie), of The Turtles, brought a series of lawsuits attempting to establish that a public performance right existed under the common law of the states (though none had codified one). The failure of those lawsuits in part led to Congress’s passing the Music Modernization Act (MMA) of 2018, which, among other things, finally brought all pre-1972 recordings under federal law. 

However, the MMA did not simply apply the existing federal copyright law to pre-1972 sound recordings. Instead, Congress opted to create a parallel statute, which looks very similar to the “normal” copyright laws but in fact comprises an independent scheme. The distinction is evident in the term of protection. Unlike other copyrightable works from the era, which are treated uniformly, sound recordings have terms that vary depending on the year of first publication. Recordings published between 1923 and 1946 are protected for 100 years from the date of publication, and recordings published between 1947 and 1956 for 110 years. The recordings that are protected the longest—unpublished recordings—are ironically the ones most threatened by extended protection. Those recordings will remain locked until 2067, regardless of their fixation date.
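To make the term rules concrete, here is a minimal sketch in Python; the function name and structure are our own illustration, it covers only the categories discussed in this post, and it ignores the MMA’s other transitional rules.

def pre_1972_public_domain_year(publication_year=None):
    # Unpublished pre-1972 recordings remain protected until 2067,
    # regardless of when they were fixed.
    if publication_year is None:
        return 2067
    if 1923 <= publication_year <= 1946:
        term = 100  # years from first publication
    elif 1947 <= publication_year <= 1956:
        term = 110
    else:
        raise ValueError("outside the ranges discussed in this post")
    # Terms run through the end of the calendar year, so the recording
    # enters the public domain on January 1 of the following year.
    return publication_year + term + 1

print(pre_1972_public_domain_year(1924))  # 2025 -- the "class of 1924" discussed below
print(pre_1972_public_domain_year(1929))  # 2030 -- five more years after 2025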

In addition to establishing limited terms and giving pre-1972 recordings some parity with post-1972 recordings with respect to the public performance right, the MMA established that the most important copyright limitations and exceptions—especially the fair use and first sale doctrines and the library and teaching exceptions—apply to pre-1972 recordings. It also tried some new things! It established a mechanism for making noncommercial use of a recording that isn’t being commercially exploited—perhaps testing the waters of orphan works legislation. It also expanded the Section 108(h) “last 20 years” exception for libraries to apply to all pre-1972 recordings, regardless of publication status.

The MMA left many questions open. For example, prior to the MMA, each state had its own definition of who the default owner of a sound recording was. Congress preserved this confusing and sometimes contradictory patchwork, leaving the state definitions in place. As a result, when ownership has not been established by contract—as is often the case, for example, with archival recordings—ownership will need to be determined by courts. Congress also left in question the relationship between the MMA and pre-1972 foreign recordings, especially as to whether the MMA’s noncommercial use mechanism applies. Since Congress did not create criminal penalties under the MMA, there is also some question as to whether it left the state criminal statutes in place. But Congress did establish, very firmly, that sound recordings would no longer all remain protected for at least another half-century.

Which brings us back to the public domain. As of January 1, most works first published in 1929 will be in the public domain in the USA. 1929 was an important year for sound recordings. It was the last year cylinder recordings were produced. It was the year the last recording studio switched from acoustic to electrical recording techniques (though most had switched a few years earlier). Because of the MMA, those recordings will enter the public domain in the near future, but as a result of the strange history of sound recording copyright, it will not happen this year. Americans will have to wait five more years for the complete works from those eras to finish entering the public domain.

But thanks to the MMA, published recordings from 1924 are entering the public domain—recordings that would otherwise still be locked up! The class of 1924 includes several important recordings, including the very first recordings of George Gershwin’s Rhapsody in Blue, Al Jolson’s “California Here I Come,” and Isham Jones’s “It Had To Be You.” Despite sound recordings trailing other classes of works by a few years, the 2025 Public Domain Day is a good day for sound recording enthusiasts!

Authors Alliance Submits Amicus Brief in Sedlik v. Drachenberg

Posted December 23, 2024
Kat Von D tracing the image of Miles Davis in preparation for inking the tattoo

Although tattoos have existed for as long as recorded human history, legal disputes involving tattoos are a relatively new phenomenon. The case Sedlik v. Drachenberg, currently pending before the 9th Circuit, is particularly notable, as it marks the first instance of a court ruling on an artist’s use of copyrighted imagery in her tattoo art.

More importantly, the case presents the 9th Circuit with its first opportunity to interpret the fair use right in the wake of the Supreme Court’s 2023 Warhol decision. Authors Alliance has been closely monitoring circuit courts’ rulings on fair use and advocating for a proper interpretation of Warhol—including challenging the problematic fair use ruling issued by the 10th Circuit earlier this year, a decision that was later vacated in response to strong pushback from fair use advocates.

At the heart of the Sedlik v. Drachenberg legal debate are two creative professionals with very different backgrounds: 

The plaintiff in this case is Jeffery Sedlik, a successful professional photographer. He took a photo of the jazz legend Miles Davis in 1989—the image at the focal point of the pending dispute.

The defendant, Kat Von Drachenberg (“KVD”), is a celebrity tattoo artist. In recent years, she has shifted away from for-profit tattooing, opting instead to ink clients for free. In 2017, she freehand-tattooed an image of Miles Davis on a client’s arm, drawing largely from the 1989 photograph captured by Sedlik.

Interestingly, neither party is new to the world of litigation. Sedlik has established a reputation for aggressive copyright enforcement—even filing a case with the Copyright Claims Board on its first day of operation. KVD, on the other hand, was sued by a former employee in 2022. 

Sedlik’s claims are straightforward—he alleges that KVD’s tattoo, as well as her social media posts documenting the process of her creating the tattoo, infringe his copyright in the Miles Davis photo.

For Sedlik to state a prima facie case of copyright infringement, he must prove that KVD had access to the Miles Davis photo (which is easy to prove in this case), and that the allegedly infringing tattoo and social media posts are substantially similar to his photo. The district court left the questions of substantial similarity and fair use to the jury, after denying the motions for summary judgment on the copyright infringement issues in May 2022.

The jury returned a verdict in January 2024 that the tattoo inked by KVD and some of her social media posts are not substantially similar to Sedlik’s photo. The jury also determined that the rest of KVD’s social media posts, documenting her process of creating the tattoo in question, were fair use. In short, the jury concluded there was no copyright infringement.

On May 3rd, 2024, the district court judge denied Sedlik’s motions for judgment as a matter of law and for a new trial. Faced with the jury’s adverse decision, Sedlik argued, among other things, that the jury erred in finding no substantial similarity. The judge, however, upheld the jury’s finding that KVD’s works had a different concept and feel from Sedlik’s photo and that KVD only copied the unprotected elements of the photo. Sedlik tried to argue that the legal question of fair use should not have been left to the jury. However, the court was unpersuaded, highlighting that Sedlik had remained silent on this procedural issue until after receiving an unfavorable verdict.  

Following the ruling on his motions, Sedlik appealed, and the case is now in front of the 9th Circuit. Anticipating the far-reaching consequences for artists and authors depending on how the 9th Circuit will interpret Warhol, Authors Alliance filed an amicus brief in support of KVD.

Both Sedlik and KVD argued that Warhol supported their side. Sedlik proposed a novel test: that a fair use must either target the original copyrighted work or otherwise have a compelling justification for the use. In our amicus brief, we explained why that is not the correct reading of Warhol. Under Warhol, a distinct purpose is required for the first factor to tilt in favor of fair use. The Warhol Court only analyzed “targeting” and “compelling justification” because Warhol’s secondary use of the Goldsmith photo shared the exact same purpose as the original: appearing on the cover of a magazine. That is not the case with KVD’s freehand tattoo and Sedlik’s photo: they serve substantially distinct purposes.

Authors routinely borrow from others’ copyrighted works for reporting, research, and teaching, as well as to memorialize, preserve, or provide historical context. These uses have historically been considered fair use, and often have purposes distinct from the copyrighted works used; but they do not necessarily “target” the works being used, nor do they have “compelling justifications” beyond the broad justification that authors are promoting the goal of copyright—”to promote the progress of science and the arts.”

In our brief, we also stressed that a successful commercial entity can nevertheless make noncommercial uses, as demonstrated in the Google Books and Hachette cases. We also argued that social media posts are not commercial by default, simply by virtue of drawing attention to the original poster. Many successful authors maintain an active social media presence. The fact that authors invariably write to capture and build an audience through these sites does not automatically render their uses “commercial.” “Commerciality” under the fair use analysis has always been limited to the act of merchandising in the market, such as selling stamps, t-shirts, or mugs.

Finally, we explained to the court why copyright holders must offer concrete evidence of an existing licensing market, or the likelihood of one developing, before they can argue that a secondary use serves as a “market substitute.” If we accepted Sedlik’s argument that his protected market includes everything he is willing to receive licensing fees for, it would all but wipe out fair use. We want authors and other creatives to continue to engage in fair use, including to document their creative processes—as KVD did in this case in her social media posts—without being told they have to pay for each instance of use as soon as a rightsholder demands it.

Authors Alliance 2024 Annual Report

Posted December 17, 2024

Authors Alliance celebrated an important milestone in 2024: our 10th anniversary! 

Quite a lot has changed since 2014, but our mission remains the same. We exist to advance the interests of authors who want to serve the public good by sharing their creations broadly.  I’m pleased to share our 2024 annual report, where you can find highlights of our work this year to promote laws, policies, and practices that enable authors to reach wide audiences.

Our success in 2024 was largely due to the wonderful collaboration and support we have from our members. You’ll see in the report a number of ongoing projects and issues we are working to address: legal questions about open access publishing, rights reversion at scale, support for text and data mining research, contractual override of fair use, AI and copyright, and more. As we look to 2025, I would love to hear from you if you have a special interest in any of these projects and would like to contribute your ideas, time, or expertise to help us tackle them.

I’m grateful for those of you who contributed financially to make 2024 a success. Authors Alliance is funded almost entirely by gifts and grants, and so we truly rely on you. As we end the year, I hope you will consider giving if you haven’t done so already. You can donate online here.

Thank you,

Dave Hansen
Executive Director 


Restricting Innovation: How Publisher Contracts Undermine Scholarly AI Research

Posted December 6, 2024
Photo by Josh Appel on Unsplash

This post is by Rachael Samberg, Director, Scholarly Communication & Information Policy, UC Berkeley Library and Dave Hansen, Executive Director, Authors Alliance

This post is about the research—and the advancement of science and knowledge—made impossible when publishers use contracts to limit researchers’ ability to use AI tools with scholarly works.

Within the scholarly publishing community, mixed messages abound about who gets to say when and how AI tools can be used for research that relies on scholarly works like journal articles or books. Some scholars voiced concern (explained more here) when major scholarly publishers like Wiley or Taylor & Francis entered into lucrative contracts with big technology companies to allow AI training without first seeking permission from authors. We suspect that these publishers have the legal right to do so, since most publishers demand that authors hand over extensive rights in exchange for publishing their work. And with the backdrop of dozens of pending AI copyright lawsuits, who can blame the AI companies for paying for licenses, if for no other reason than avoiding the pain of litigation? While it stings to see the same large commercial academic publishers profit yet again from the work academic authors submit to them for free, we continue to think there are good ways for authors to retain a say in the matter.

Big tech companies are one thing, but what about scholarly research? What about the large and growing number of scholars who are themselves using copyrighted scholarly content with AI tools to conduct their research? We currently face a situation in which publishers are attempting to dictate how and when researchers can do that work, even when authors’ fair use rights to use and derive new understandings from scholarship clearly allow for such uses.

How vendor contracts disadvantage US researchers

We have written elsewhere (in an explainer and public comment to the Copyright Office) about why training AI tools, particularly in the scholarly and research context, constitutes a fair use under U.S. copyright law. Training AI in this context rests on a statutory right already held by all scholarly authors engaging in computational research—a right that is critical for the advancement of knowledge and that lawmakers should preserve.

The problem U.S. scholarly authors presently face with AI training is that publishers restrict their access to these statutory rights through contracts that override them: In the United States, publishers can use private contracts to take away statutory fair use rights that researchers would otherwise hold under Federal law. In this case, the private contracts at issue are the electronic resource (e-resource) license agreements that academic research libraries sign to secure campus access to electronic journal, e-book, data, and other content that scholars need for their computational research.

Contractual override of fair use is a problem that disparately disadvantages U.S. researchers. As we have described elsewhere, more than forty countries, including the member states of the European Union, expressly reserve text mining and AI training rights for scientific research by research institutions. Scholars in these countries not only do not have to worry about whether their computational research with AI is permitted; they also do not risk having those reserved rights overridden by contract. The European Union’s Copyright in the Digital Single Market Directive and recent AI Act nullify any attempt to circumscribe the text and data mining and AI training rights reserved for scientific research within research organizations. U.S. scholars are not as fortunate.

In the U.S., most institutional e-resource licenses are negotiated and managed by research libraries, so it is imperative that scholars work closely with their libraries and advocate to preserve their computational research and AI training rights within the e-resource license agreements that universities sign. To that end, we have developed adaptable licensing language to support institutions in doing that nationwide. But while this language is helpful, the onus of advocacy and negotiation for those rights in the contracting process remains. Personally, we have found it helpful to explain to publishers that they must consent to these terms in the European Union, and can do so in the U.S. as well. That, combined with strong faculty and administrative support (such as at the University of California), makes for a strong stance against curtailment of these rights.

But we think there are additional practical ways for libraries to illustrate—both to publishers and scholarly authors—exactly what would happen to the advancement of knowledge if publishers’ licensing efforts to curtail AI training were successful. One way to do that is by “unpacking” or decoding a publisher’s proposed licensing restriction, and then demonstrating the impact that provision would have on research projects that were never objectionable to publishers before, and should not be now. We’ll take that approach below.

Decoding a publisher restriction

A commercial publisher recently proposed the following clause in an e-resource agreement:

Customer [the university] and its Authorized Users [the scholars] may not:

  1. directly or indirectly develop, train, program, improve, and/or enrich any artificial intelligence tool (“AI Tool”) accessible to anyone other than Customer and its Authorized Users, whether developed internally or provided by a third party; or
  2. reproduce or redistribute the Content to any third-party AI Tool, except to the extent limited portions of the Content are used solely for research and academic purposes (including to train an algorithm) and where the third-party AI Tool (a) is used locally in a self-hosted environment or closed hosted environment solely for use by Customer or Authorized Users; (b) is not trained or fine-tuned using the Content or any part thereof; and (c) does not share the Content or any part thereof with a third party.  

What does this mean?

  • The first paragraph forbids the training or improving of any AI tool if it is accessible or released to third parties. It further forbids any computational outputs or analyses derived from the licensed content from being used to train any tool available to third parties.
  • The second paragraph is perhaps even more concerning. It provides that when using third-party AI tools of any kind, a scholar can use only limited portions of the licensed content with the tools and is prohibited from doing any training at all of third-party tools—even if the tool is non-generative and the scholar is performing the work in a completely closed and highly secure research environment.

What would the impact of such a restrictive licensing provision be on research? 

It would mean that every single one of the trained tools in the following projects could never be disseminated. In addition, for the projects below that used third-party AI tools, the research would have been prohibited full-stop because the third-party tools in those projects required training which the publisher above is attempting to prevent:

Tools that could not be disseminated

  1. In 2017, chemists created and trained a generative AI tool on 12,000 published research papers regarding synthesis conditions for metal oxides, so that the tool could identify anticipated chemical outputs and reactions for any given set of synthesis conditions entered into the tool. The generative tool they created is not capable of reproducing or redistributing any licensed content from the papers; it has merely learned conditions and outcomes and can predict chemical reactions based on those conditions and outcomes. And this beneficial tool would be prohibited from dissemination under the publisher’s terms identified above.
  2. In 2018, researchers trained an AI tool (that they had originally created in 2014) to understand whether a character is “masculine” or “feminine” by looking at the tacit assumptions expressed in words associated with that character. That tool can then look at other texts and identify masculine or feminine characters based on what it knows from having been trained before. The implication is that scholars can use texts from different time periods with the tool to study representations of masculinity and femininity over time. No licensed content—no licensed or copyrighted books from a publisher—can ever be released to the world by sharing the trained tool; the trained tool is merely capable of topic modeling. But the publisher’s language above would prohibit its dissemination nevertheless.

Tools that could neither be trained nor disseminated 

  1. In 2019, authors used text from millions of books published over 100 years to analyze cultural meaning. They did this by training third-party non-generative AI word-embedding models called Word2Vec and GloVe on multiple textual archives. The tools cannot reproduce content: when shown new text, they merely represent words as numbers, or vectors, to evaluate or predict how similar words in a given space are semantically or linguistically (a toy illustration of this kind of model appears after this list). The similarity of words can reveal cultural shifts in understanding of socioeconomic factors like class over time. But the publisher’s above licensing terms would prohibit the training of the tools to begin with, much less the sharing of them to support further or different inquiry.
  2. In 2023, scholars trained a third-party-created open-source natural language processing (NLP) tool called Chemical Data Extractor (CDE). Among other things, CDE can be used to extract chemical information and properties identified in scholarly papers. In this case, the scholars wanted to teach CDE to parse a specific type of chemical information: metal-organic frameworks, or MoFs. Generally speaking, the CDE tool works by breaking sentences into “tokens” like parts of speech and referenced chemicals. By correlating tokens, one can determine that a particular chemical compound has certain synthetic properties, topologies, reactions with solvents, etc. The scholars trained CDE specifically to parse MoF names, synthesis methods, inorganic precursors, and more—and then exported the results into an open source database that identifies the MoF properties for each compound. Anyone can now use both the trained CDE tool and the database of MoF properties to ask different chemical property questions or identify additional MoF production pathways—thereby improving materials science for all. Neither the CDE tool nor the MoF database reproduces or contains the underlying scholarly papers that the tool learned from. Yet, neither the training of this third-party CDE tool nor its dissemination would be permitted under the publisher’s restrictive licensing language cited above.
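To make clear what a trained word-embedding model of the kind described in the first project actually contains, here is a toy, hypothetical sketch using the open-source gensim library; the corpus and parameter values below are invented for illustration and have nothing to do with the actual studies. The trained model stores only numeric vectors for words, not the text it learned from.

from gensim.models import Word2Vec

# Tiny stand-in corpus (in real research this would be a large licensed archive).
corpus = [
    "the duke rode out at dawn to inspect his estate".split(),
    "the duchess managed the household and its accounts".split(),
    "the laborer walked to the mill before dawn".split(),
    "the seamstress kept the accounts of the household".split(),
]

# Train a small Word2Vec model: each word becomes a dense numeric vector.
model = Word2Vec(corpus, vector_size=25, window=3, min_count=1, seed=42)

# The model can report how similar two words are in vector space,
# but it cannot regenerate or redistribute the sentences it was trained on.
print(model.wv.similarity("duke", "duchess"))
print(model.wv.most_similar("household", topn=3))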

Indeed, there are hundreds of AI tools that scholars have trained and disseminated—tools that do not reproduce licensed content—and that scholars have created or fine-tuned to extract chemical information, recognize faces, decode conversations, infer character types, and so much more. Restrictive licensing language like that shown above suppresses research inquiries and societal benefits that these tools make possible. It may also disproportionately affect the advancement of knowledge in or about developing countries, which may lack the resources to secure licenses or be forced to rely on open-source or poorly-coded public data—hindering journalism, language translation, and language preservation.

Protecting access to facts

Why are some publishers doing this? Perhaps to reserve the opportunity to develop their own scholarship-trained AI tools, which they could then license at additional cost back to research institutions. We could speculate about motivations, but the upshot is that publishers have been pushing hard to foreclose scholars from training and disseminating AI tools that now “know” something based on the licensed content. That is, such publishers wish to prevent tools from learning facts about the licensed content.

However, this is precisely the purpose of licensing content. When institutions license content for their scholars to read, they do so for the scholars to learn information from the content. When scholars write or teach about the content, they are not regenerating the actual expression from the content—the part that is protected by copyright; rather, they are conveying the lessons learned from the content—facts not protected by copyright. Prohibiting the training of AI tools and the dissemination of those tools is functionally equivalent to prohibiting scholars from learning anything about the content that institutions are licensing for that very purpose, and that scholars wrote to begin with! Publishers should not be able to monopolize the dissemination of information learned from scholarly content, especially when that information is used non-commercially.

For these reasons, when we negotiate to preserve AI usage and training rights, we generally try to achieve outcomes that would promote—rather than prohibit—all of the research projects described above.

The sample language we’ve disseminated empowers others to negotiate for these outcomes. We hope that, when coupled with the advocacy tools we’ve provided above, scholars and libraries can protect their AI usage and training rights, while also being equipped to consider how they want their own works to be used.

Developing a public-interest training commons of books

Posted December 5, 2024
Photo by Zetong Li on Unsplash

Authors Alliance is pleased to announce a new project, supported by the Mellon Foundation, to develop an actionable plan for a public-interest book training commons for artificial intelligence. Northeastern University Library will be supporting this project and helping to coordinate its progress.

Access to books will play an essential role in how artificial intelligence develops. AI’s Large Language Models (LLMs) have a voracious appetite for text, and there are good reasons to think that these data sets should include books—and lots of them. Over the last 500 years, human authors have written over 129 million books. These volumes, preserved for future generations in some of our most treasured research libraries, are perhaps the best and most sophisticated reflection of all human thinking. Their high editorial quality, breadth, and diversity of content, as well as the unique way they employ long-form narratives to communicate sophisticated and nuanced arguments and ideas, make them ideal training data sources for AI.

These collections and the text embedded in them should be made available under ethical and fair rules as the raw material that will enable the computationally intense analysis needed to inform new AI models, algorithms, and applications imagined by a wide range of organizations and individuals for the benefit of humanity. 

Currently, AI development is dominated by a handful of companies that, in their rush to beat other competitors, have paid insufficient attention to the diversity of their inputs, questions of truth and bias in their outputs, and questions about social good and access. Authors Alliance, Northeastern University Library, and our partners seek to correct this tilt through the swift development of a counterbalancing project that will focus on AI development that builds upon the wealth of knowledge in nonprofit libraries and that will be structured to consider the views of all stakeholders, including authors, publishers, researchers, technologists, and stewards of collections. 

The main goal of this project is to develop a plan for either establishing a new organization or identifying the relevant criteria for an existing organization (or partnership of organizations) to take on the work of creating and stewarding a large-scale public interest training commons of books.

We seek to answer several key questions, such as: 

  • What are the right goals and mission for such an effort, taking into account both the long and the short term?
  • What technical and logistical challenges might differ from existing library-led efforts to provide access to collections as data?
  • How can we develop a sufficiently large and diverse corpus to offer a reasonable alternative to existing sources?
  • What should a public-interest governance structure look like, given the particular challenges of AI development?
  • How do we, as a collective of stakeholders from authors and publishers to students, scholars, and libraries, sustainably fund such a commons, including a model for long-term sustainability for maintenance, transformation, and growth of the corpus over time?
  • Which combination of legal pathways is acceptable to ensure books are lawfully acquired in a way that minimizes legal challenges?
  • How can we respect the interests of authors and rightsholders by accounting for concerns about consent, credit, and compensation?
  • How should we distinguish between the different needs and responsibilities of nonprofit researchers, small market entrants, and large commercial actors?

The project will include two meetings during 2025 to discuss these questions and possible ways forward, additional research and conversations with stakeholders, and the development and release of an ambitious yet achievable roadmap.

Support Authors Alliance!

Posted December 3, 2024

As we end the year, I’m writing to ask for your financial support by giving toward our end-of-year campaign (click here to donate online).

In May, Authors Alliance marked its 10th anniversary. We’ve experienced tremendous support and enthusiasm for our work over the last decade, and your collaboration has been an important part of our success. I hope you’ll help Authors Alliance take on our next decade. 

We’re proud of our work promoting authorship for the public good by supporting authors who write to be read. In the past year, we secured expanded copyright exemptions for text and data mining research, helped defend authors’ fair use rights in court, launched an important initiative to clarify legal pathways for open access to federally funded research, and much more. We’ve also continued to help authors develop a deeper understanding of how complex policy issues can affect their work, drawing over 20,000 attendees for our in-person and online events on topics such as text and data mining, open access, artificial intelligence, and competition law.

For 2025, we have our work cut out for us. As policymakers actively consider changes to how the law accommodates free expression, access to information, and new technology, we continue to find that we are among the only voices defending authors’ rights to research, write, and share their work for the benefit of the public. Your support for Authors Alliance will help us continue to speak out in support of authors who value the public interest.

Donate Online Today

Thank you,
Dave Hansen
Executive Director

New White Paper on Open Access and U.S. Federal Information Policy

Posted November 18, 2024
Photo by Sara Cottle on Unsplash

Authors Alliance and SPARC have released the first of four planned white papers addressing legal issues surrounding open access to scholarly publications under the 2022 OSTP memo (the “Nelson Memo”). The white papers are part of a larger project (described here) to support legal pathways to open access. 

This first paper discusses the “Federal Purpose License,” which is newly relevant to discussions of federal public access policies in light of the Nelson Memo.

The white paper is available here and supporting materials are here.

The FPL, found in 2 C.F.R. § 200.315(b), works like any other copyright licensing agreement between two parties. It is a voluntary agreement between author and agency under which, as a condition of federal funding, the agency reserves a nonexclusive license to “reproduce, publish, or otherwise use the work for Federal purposes and to authorize others to do so.” The FPL was updated, effective October 1, to clarify that the reserved license specifically includes the right to deposit copyrighted works produced pursuant to a grant in agency-designated public access repositories.

With the OSTP memos instructing all agencies to make the results of federally-funded projects available to the public immediately upon publication, the FPL provides an elegant legal basis for doing so. Because the FPL is a signed, written, non-exclusive license that springs to life the moment copyright in the works vests, it survives any future transfers of rights in the work. As part of the Uniform Guidance for all grant-making agencies, it provides consistency across federal grants, simplifying things for grant recipients, who have plenty of other things to worry about (it’s not entirely uniform, though, since some agencies have supplemented the FPL with license text of their own, expanding their rights under the license).

This protects both agencies and authors. Agencies must have permission in order to host and distribute works in their repositories. The FPL ensures that the agency has that authorization and that it continues even after publication rights have been assigned to a publisher. Meanwhile, authors are—or will be—required under their grant agreements to deposit their federally-funded peer-reviewed articles in the agency’s designated repository. The FPL ensures that, even if an author were to assign exclusive rights in a work to a publisher before complying with the deposit mandate, the author could still deposit the work, despite no longer holding any rights in it herself.

The paper analyzes two ambiguous points in the FPL, namely, the scope of what rights agencies have as “Federal purposes” and what rights the agency may subsequently authorize for third parties. As there are no clear answers to these questions, the paper does not draw conclusions; it does, however, attempt to give some context and basis for how to interpret the FPL.

The next papers in this series will explore issues surrounding the legal authority underlying the public access policy, article versioning, and the policy’s interaction with institutional IP policies. Stay tuned for more!

Revived Class Action Against McGraw Hill: the Importance of Publishing Contracts

Posted November 15, 2024

open book with glasses on top

On November 6th, the 2nd Circuit Court of Appeals overturned the lower court’s dismissal in Flynn v. McGraw Hill, and allowed the plaintiffs’ breach of contract claim to move forward. 

The breach of contract claim involves McGraw Hill’s alleged practice of reducing or ceasing royalty payments on revenues generated through McGraw Hill’s online platform, Connect, which has hosted electronic textbooks and related course materials since its launch in 2009. The publishing contracts at issue specified that McGraw Hill would publish the plaintiffs’ textbooks “at its own expense” and that royalties would be based on “Publisher’s net receipts”—defined mostly as “the Publisher’s selling price, less discounts, credits, and returns, or a reasonable reserve for returns.” Although the initially signed contracts covered only print works, McGraw Hill later amended the contracts to cover electronic works under the same royalty structure. McGraw Hill paid royalties based on the entire revenue from ebook sales through Connect, which included both the ebook and its accompanying materials such as PowerPoint lesson plans and test banks.

This changed in 2020, according to the plaintiffs, when McGraw Hill started paying royalties solely on sales attributed to the ebooks, excluding the revenue derived from the accompanying materials, despite the fact that the accompanying materials cannot be bought independently of the ebook. Under the new practice, McGraw Hill would unilaterally determine which part of the revenue is attributable to the ebooks, their accompanying materials, or the Connect platform, even though the sales are always based on a “single unitary price.”
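To see how such an attribution change can cut royalties even when the sale price never changes, here is a purely hypothetical back-of-the-envelope calculation in Python; the price, royalty rate, and attribution split below are invented for illustration and are not taken from the case.

# Hypothetical numbers for illustration only; not from the case.
bundle_price = 100.00   # single unitary price paid for the ebook plus accompanying materials
royalty_rate = 0.15     # contractual royalty rate on "net receipts"

# Old practice: royalties paid on the entire bundle revenue.
old_royalty = royalty_rate * bundle_price                  # 15.00

# New practice: the publisher unilaterally attributes only part of the revenue to the ebook.
ebook_share = 0.70
new_royalty = royalty_rate * bundle_price * ebook_share    # 10.50

drop = (old_royalty - new_royalty) / old_royalty
print(f"royalty falls from {old_royalty:.2f} to {new_royalty:.2f}, a {drop:.0%} drop")
# An attribution of 65-75% to the ebook produces the 25-35% drop the plaintiffs allege.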

The plaintiffs argue that this new arrangement violated McGraw Hill’s promise to publish the works “at its own expense,” a provision that should have meant authors wouldn’t be charged for the cost of operating or maintaining the publisher’s infrastructure; this claim is now allowed to go forward. The claim related to “net receipts” was again dismissed.

While the ongoing developments in this case are worth watching closely, it also serves as a timely reminder—especially in light of publishers’ licensing content for AI training—for authors to carefully review and negotiate their publishing agreements, and to rely on the contractual terms that hold publishers accountable to their promises.

Let’s take this opportunity to quickly remind ourselves of a couple of less-discussed contractual terms that may in fact be too important to ignore.

1. “…media now known and may be developed in the future”

The harm plaintiffs are claiming, in this case, is a whopping 25% to 35% drop in royalties when works are published on McGraw Hill’s online platform. Although this case only arose out of the electronic rights of textbooks, it reminds us how the advent of new technology could easily undermine instead of boost the income of authors.

Barely a decade ago, most publishing industry experts believed that the economics of e-book publishing were more favorable to publishers, since e-books are cheaper to produce than print books, and that authors should therefore expect to receive a much larger share of the revenue—well above the typical 10-15% of the retail price for trade books.

The Flynn case confirms many authors’ suspicion that authors may not necessarily share in the financial boon brought by new technologies. It is thus important for authors to be wary of a broad copyright license that allows all future technology for disseminating the authors’ works. 

It’s worth reviewing terms that address the publisher’s ability to license your works in specific contexts, including digital platforms and emerging technology that are not named. Instead of “media now known and may be developed in the future,” authors should consider limiting the publication of their works to specific, enumerated media, such as print books or ebooks. Failing that, authors should propose alternative terms that could safeguard their interests, such as a clause that allows for rights reversion if royalties fall below a certain level.

2. Royalty Audit

A common feature of publishing contracts is a clause that allows authors to audit the publisher’s accounting. While it may not seem like a top priority at first glance, authors should absolutely take advantage of this provision if it’s included in their agreement. An audit right provides authors with the legal right to review the publisher’s financial records to verify whether they are being compensated fairly and according to the terms of the contract.

Authors in the Flynn case learned about the new royalties arrangement through an email from the publisher. It is of course important for authors to monitor any communications sent by their publishers. However, it is not certain that publishers will always disclose when they adopt a new method of calculating royalties, and it is certainly not a given that their accounting is free of mistakes. When authors become suspicious of their publisher’s deductions or other financial practices, the ability to audit can be crucial. Publishers may make deductions or shift expenses in ways that are not immediately obvious to authors from the royalties they receive. An audit can help uncover whether a publisher is deducting expenses that are unjustified (such as fees for maintaining online systems, as in this case). The audit right can be an essential tool for discovering accounting discrepancies and ensuring the publisher is acting in good faith.

As generative AI tools become more prevalent, many authors are concerned about how their works may be used for AI training without their knowledge or consent. It’s important to remember that not all contracts automatically grant publishers or other entities the right to license works for use in AI training. If you have retained sublicensing rights, or your publishing contract offers a broader definition of net receipts or profits, you could be entitled to the revenue your publishers earned from selling your works to train AI. 

Just as with traditional royalties, income from AI licensing should be distributed according to the terms of the contract. If you’re uncertain about whether you are getting fairly compensated, don’t hesitate to utilize the auditing right to request detailed information from your publisher.

Final Thoughts: Be Proactive and Stay Informed

At the heart of the Flynn v. McGraw Hill case is a breach of contract claim. The plaintiffs argue that McGraw Hill’s royalty deductions for maintaining its online system violated the terms of the agreement. Central to the argument is the publisher’s promise to ‘publish at its own expense.’ This case serves as a prime example of how important it is to scrutinize the details of a publishing agreement, where the devil often lies.

Many publishing agreements are complex and may contain clauses that, while seemingly minor, can have significant financial and creative consequences. It’s essential that authors take the time to review their contracts thoroughly, ideally consulting with colleagues and mentors who have more extensive experience with similar situations, to fully understand—at the very least—how their income will be calculated and what rights they are granting to the publisher.

The DMCA 1201 Rulemaking: Summary, Key Takeaways, and Other Items of Interest

Posted November 8, 2024

Last month, we blogged about the key takeaways from the 2024 TDM exemptions recently put in place by the Librarian of Congress, including how the 2024 exemptions (1) expand researchers’ access to existing corpora, (2) definitively allow the viewing and annotation of copyrighted materials for TDM research purposes, and (3) create new obligations for researchers to disclose security protocols to trade associations. Beyond these key changes, the TDM exemptions remain largely the same: researchers affiliated with universities are allowed to circumvent TPMs to compile corpora for TDM research, provided that those copies of copyrighted materials are legally obtained and adequate security protocols are put in place.

We have since updated our resources page on Text and Data Mining and have incorporated the new developments into our TDM report: Text and Data Mining Under U.S. Copyright Law: Landscape, Flaws & Recommendations.

In this blog post, we share some further reflections on the newly expanded TDM exemptions—including (1) the use of AI tools in TDM research, (2) outside researchers’ access to existing corpora, (3) the disclosure requirement, and (4) a potential TDM licensing market—as well as other insights that emerged during the 9th triennial rulemaking.

The TDM Exemption

In other jurisdictions, such as the EU, Singapore, and Japan, legal provisions that permit “text data mining” also allow a broad array of uses, such as general machine learning and generative AI model training. In the US, exemptions allowing TDM so far have not explicitly addressed whether AI could be used as a tool for conducting TDM research. In this round of rulemaking, we were able to gain clarity on how AI tools are allowed to aid TDM research. Advocates for the TDM exemptions provided ample examples of how machine learning and AI are key to conducting TDM research and asked that “generative AI” not be deemed categorically impermissible as a tool for TDM research. The Copyright Office agreed that a wide array of tools could be utilized for TDM research under the exemptions, including AI tools, as long as the purpose is to conduct “scholarly text and data mining research and teaching.” The Office was careful to limit its analysis to those uses and not address other applications such as compiling data—or reusing existing TDM corpora—for training generative AI models; those are an entirely separate issue from facilitating non-commercial TDM research.

Besides clarifying that AI tools are allowed for TDM research and that viewing and annotation are permitted for copyrighted materials, the new exemptions offer meaningful improvement to TDM researchers’ access to corpora. The previous 2021 exemptions allowed access for purposes of “collaboration,” but many researchers interpreted that narrowly, and the Office confirmed that “collaboration” was not meant to encompass outside research projects entirely unrelated to the original research for which the corpus was created. Under the 2021 exemptions, a TDM corpus could only be accessed by outside researchers if they are working on the same research project as the original compiler of the corpus. The 2024 exemptions’ expansion of access to existing corpora has two main components and advantages. 

The expansion now allows new research projects to be conducted on existing corpora, permitting institutions that have created a corpus to provide access “to researchers affiliated with other nonprofit institutions of higher education, with all access provided only through secure connections and on the condition of authenticated credentials, solely for purposes of text and data mining research or teaching.” At the same time, it also opens up new possibilities for researchers at institutions that otherwise would not have access, as the new exemption does not require that the outside researchers’ institutions otherwise own copies of works in the corpora. The new exemptions impose some important limitations: only researchers at institutions of higher education are allowed this access, and nothing more than “access” is allowed—the exemption does not, for example, allow the transfer of a corpus for local use.

The Office emphasized the need for adequate security protections, pointing back to cases such as Authors Guild v. Google and Authors Guild v. HathiTrust, which emphasized how careful both organizations were, respectively, to prevent their digitized corpora from being misused. To take advantage of this newly expanded TDM exemption, it will be crucial for universities to provide adequate IT support to ensure that technical barriers do not impede TDM researchers. That said, the record for the exemption shows that existing users are exceedingly conscientious when it comes to security. There have been zero reported instances of security breaches or lapses related to TDM corpora being compiled and used under the exemptions. 

As we previously explained, the security requirements have changed in a few ways. The new rule clarifies that trade associations can send inquiries on behalf of rightsholders, but such inquiries must be supported by a "reasonable belief" that the sender's works are in a corpus being used for TDM research. It remains to be seen how the new obligation to disclose security measures to trade associations will affect TDM researchers and their institutions. The Register also called out, if somewhat indirectly, the demands that trade associations sent to digital humanities researchers in the middle of the exemption process, with a two-week response deadline, as unreasonable, and quoted the NTIA (which provides input on the exemptions) in agreement that "[t]he timing, targeting, and tenor of these requests [for institutions to disclose their security protocols] are disturbing." We are hopeful that this discouragement from the Copyright Office will prevent any future large-scale harassment of TDM researchers and their institutions, but we will remain vigilant in case trade associations abuse this new power.

Alongside the concerns over disclosure requirements, we have some questions about the Copyright Office's treatment of fair use as a rationale for circumventing TPMs for TDM research. The Register restated her 2021 conclusion that "under Authors Guild, Inc. v. HathiTrust, lost licensing revenue should only be considered 'when the use serves as a substitute for the original.'" The Office, in its recommendations, placed considerable weight on the lack of a viable licensing market for TDM, which raises a concern that, in the Office's view, a use that was once fair might lose that status when a rightsholder begins offering an adequate licensing option. This may never become a real issue for the existing TDM exemptions, since no licensing option sufficient for the breadth and depth of content TDM researchers need exists today, and one seems unlikely ever to develop. Still, it contributes to the growing confusion surrounding the stability of a fair use defense in the face of new licensing markets.

These concerns highlight the need for ongoing advocacy in the realm of TDM research. Overall, the Register of Copyrights recognizes TDM as "a relatively new field that is quickly evolving." This means we can ask the Library of Congress to relax the limitations placed on TDM if we can point to legitimate research-related purposes. But, given the nature of this process, it also means TDM researchers do not have a permanent and stable right to circumvent TPMs. Because the exemptions remain subject to review every three years, large trade associations will continue to push for the TDM exemptions to be greatly limited or even eliminated, hoping to stifle independent TDM research. We will continue to advocate for TDM researchers, as we did during the 8th and 9th triennial rulemakings.

Looking beyond the TDM exemption, we noted a few other developments: 

Warhol has not fundamentally changed fair use

Opponents of renewing the existing exemptions repeatedly pointed to Warhol Foundation v. Goldsmith, the Supreme Court's most recent fair use opinion, to argue that it changed the fair use analysis such that the existing exemptions should not be renewed. For example, the Opponents argued that the fair use analysis for repairing medical devices changed under Warhol because, in their view, commercial nontransformative uses are now less likely to be fair. The Copyright Office did not agree. The Register said that the same fair use analysis as in 2021 applied and that the Opponents failed "to show that the Warhol decision constitutes intervening legal precedent rendering the Office's prior fair use analysis invalid." In another instance, where the Opponents argued that commerciality must be given more weight under Warhol, the Register pointed out that under Warhol commerciality is not dispositive and must be weighed against the purpose of the new use. The arguments for revisiting the 2021 fair use analyses were uniformly rejected, which we think is good news for those of us who believe Warhol should be read as a modest adjustment to fair use rather than a wholesale reworking of the doctrine.

Does ownership and control of copies matter for access? 

One of the requests before the Office was an expansion of an exemption that allows for access to preservation copies of computer programs and video games. The Office rejected the main thrust of the request but, in doing so, also provided an interesting clarification that may reveal some of the Office’s thinking about the relationship between fair use and access to copies owned by the user: 

"The Register concludes that proponents did not show that removing the single user limitation for preserved computer programs or permitting off-premises access to video games are likely to be noninfringing. She also notes the greater risk of market harm with removing the video game exemption's premises limitation, given the market for legacy video games. She recommends clarifying the single copy restriction language to reflect that preservation institutions can allow a copy of a computer program to be accessed by as many individuals as there are circumvented copies legally owned."

That sounds a lot like an endorsement of the idea that the owned-to-loaned ratio, a key concept in the controlled digital lending analysis (the principle that the number of digital copies made available at any one time should not exceed the number of lawfully owned copies), should matter in the fair use analysis, which is something the court in Hachette v. Internet Archive, the controlled digital lending case, gave no weight to. For future 1201 exemptions, we will have to wait and see whether the Office will apply this framework in other contexts.

Addressing other non-copyright and AI questions in the 1201 process

The Librarian of Congress’s final rule included a number of notes on issues not addressed by the rulemaking: 

“The Librarian is aware that the Register and her legal staff have invested a great deal of time over the past two years in analyzing the many issues underlying the 1201 process and proposed exemptions. 

Through this work, the Register has come to believe that the issue of research on artificial intelligence security and trustworthiness warrants more general Congressional and regulatory attention. The Librarian agrees with the Register in this assessment. As a regulatory process focused on technological protection measures for copyrighted content, section 1201 is ill-suited to address fundamental policy issues with new technologies.” 

Proponents tried to argue that software platforms' restrictions and barriers to conducting AI research, such as account requirements, rate limits, and algorithmic safeguards, are TPMs subject to circumvention under 1201, but the Register disagreed. The Register maintained that the challenges the Proponents described arose not from circumventable TPMs but from third-party-controlled Software as a Service platforms. This decision may also be illuminating for TDM researchers seeking to conduct TDM research on online streaming media or social media posts.

The Librarian's note went on to say: "The Librarian is further aware of the policy and legal issues involving a generalized 'right to repair' equipment with embedded software. These issues have now occupied the White House, Congress, state legislatures, federal agencies, the Copyright Office, and the general public through multiple rounds of 1201 rulemaking.

Copyright is but one piece in a national framework for ensuring the security, trustworthiness, and reliability of embedded software, as well as other copyright-protected technology that affects our daily lives. Issues such as these extend beyond the reach of 1201 and may require a broader solution, as noted by the NTIA.”

These notes give an interesting, if somewhat confusing, insight into how the Librarian of Congress and the Copyright Office think about the role of 1201 rulemaking when it touches on issues that go beyond copyright's core concerns. While we agree that 1201 is ill-suited to address fundamental policy issues with new technology, it is also somewhat concerning that the Office and the Librarian view copyright more generally as part of a broader "national framework for ensuring the security, trustworthiness, and reliability of embedded software." Of course, copyright is sometimes used to further ends outside its intended purpose, but these issues are far from the core constitutional purpose of copyright law, and we think they are best addressed through other means.