Category Archives: News

Licensing research content via agreements that authorize uses of artificial intelligence

Posted January 10, 2024
Photo by Hal Gatewood on Unsplash

This is a guest post by Rachael G. Samberg, Timothy Vollmer, and Samantha Teremi, professionals within the Office of Scholarly Communication Services at UC Berkeley Library. 

On academic and library listservs, there has emerged an increasingly fraught discussion about licensing scholarly content when scholars’ research methodologies rely on artificial intelligence (AI). Scholars and librarians are rightfully concerned that non-profit educational research methodologies like text and data mining (TDM) that can (but do not necessarily) incorporate usage of AI tools are being clamped down upon by publishers. Indeed, libraries are now being presented with content license agreements that prohibit AI tools and training entirely, irrespective of scholarly purpose. 

Conversely, publishers, vendors, and content creators—a group we’ll call “rightsholders” here—have expressed valid concerns about how their copyright-protected content is used in AI training, particularly in a commercial context unrelated to scholarly research. Rightsholders fear that their livelihoods are being threatened when generative AI tools are trained and then used to create new outputs that they believe could infringe upon or undermine the market for their works.

Within the context of non-profit academic research, rightsholders’ fears about allowing AI training, and especially non-generative AI training, are misplaced. Newly-emerging content license agreements that prohibit usage of AI entirely, or charge exorbitant fees for it as a separately-licensed right, will be devastating for scientific research and the advancement of knowledge. Our aim with this post is to empower scholars and academic librarians with legal information about why those licensing outcomes are unnecessary, and equip them with alternative licensing language to adequately address rightsholders’ concerns.

To that end, we will: 

  1. Explain the copyright landscape underpinning the use of AI in research contexts;
  2. Address ways that AI usage can be regulated to protect rightsholders, while outlining opportunities to reform contract law to support scholars; and 
  3. Conclude with practical language that can be incorporated into licensing agreements, so that libraries and scholars can continue to achieve licensing outcomes that satisfy research needs.

Our guidance is based on legal analysis as well as our views as law and policy experts working within scholarly communication. While your mileage or opinions may vary, we hope that the explanations and tools we provide offer a springboard for discussion within your academic institutions or communities about ways to approach licensing scholarly content in the age of AI research.

Copyright and AI training

As we have recently explored in presentations and posts, the copyright law and policy landscape underpinning the use of AI models is complex, and regulatory decision-making in the copyright sphere will have ramifications for global enterprise, innovation, and trade. A much-discussed group of lawsuits and a parallel inquiry from the U.S. Copyright Office raise important and timely legal questions, many of which we are only beginning to understand. But there are two precepts that we believe are clear now, and that bear upon the non-profit education, research, and scholarship undertaken by scholars who rely on AI models. 

First, as the UC Berkeley Library has explained in greater detail to the Copyright Office, training artificial intelligence is a fair use—and particularly so in a non-profit research and educational context. (For other similar comments provided to the Copyright Office, see, e.g., the submissions of Authors Alliance and Project LEND). Maintaining its continued treatment as fair use is essential to protecting research, including TDM. 

TDM refers generally to a set of research methodologies reliant on computational tools, algorithms, and automated techniques to extract revelatory information from large sets of unstructured or thinly-structured digital content. Not all TDM methodologies require AI models. For instance, the words that 20th-century fiction authors use to describe happiness can be searched for in a corpus of works merely by using algorithms looking for synonyms and variations of words like “happiness” or “mirth,” with no AI involved. But to find examples of happy characters in those books, a researcher would likely need to apply what are called discriminative modeling methodologies that first train AI on examples of what qualities a happy character demonstrates or exhibits, so that the AI can then go and search for occurrences within a larger corpus of works. This latter TDM process involves AI, but not generative AI; and scholars have relied non-controversially on this kind of non-generative AI training within TDM for years. 
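To make the distinction concrete, here is a minimal Python sketch under stated assumptions. The first function is a plain lexical search that counts listed “happiness” words, with no model and no training. The second trains a tiny bag-of-words perceptron, a simple discriminative (non-generative) classifier, on a handful of labeled sentences and then labels new text. The word list, the training sentences, and the function names are all hypothetical illustrations, not any research team’s actual method; real studies use far larger, carefully annotated data and more capable models.

```python
import re
from collections import Counter

# Hypothetical word list; a real study would curate synonyms and variants carefully.
HAPPY_WORDS = {"happiness", "happy", "mirth", "joy", "joyful", "glad", "delight"}

# Hypothetical labeled examples: +1 = happy character, -1 = not happy.
TRAINING_EXAMPLES = [
    ("She laughed and sang all morning, smiling at everyone", 1),
    ("He whistled as he walked, grinning at the sunrise", 1),
    ("She wept alone and would not leave her room", -1),
    ("He scowled and slammed the door in anger", -1),
]

def tokenize(text: str) -> list[str]:
    """Lowercase text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())

def lexical_search(text: str) -> Counter:
    """Non-AI TDM: count occurrences of listed words; nothing is learned."""
    return Counter(tok for tok in tokenize(text) if tok in HAPPY_WORDS)

def train_perceptron(examples, epochs: int = 10):
    """Discriminative training: learn per-word weights that separate
    positive (+1) from negative (-1) examples. No text is generated."""
    weights: dict[str, float] = {}
    bias = 0.0
    for _ in range(epochs):
        for text, label in examples:
            feats = set(tokenize(text))
            score = bias + sum(weights.get(f, 0.0) for f in feats)
            predicted = 1 if score > 0 else -1
            if predicted != label:  # update weights only on mistakes
                for f in feats:
                    weights[f] = weights.get(f, 0.0) + label
                bias += label
    return weights, bias

def classify(weights: dict[str, float], bias: float, text: str) -> int:
    """Apply the trained weights to unseen text; returns +1 or -1."""
    feats = set(tokenize(text))
    score = bias + sum(weights.get(f, 0.0) for f in feats)
    return 1 if score > 0 else -1

print(lexical_search("Her happiness was pure mirth; she was happy."))
w, b = train_perceptron(TRAINING_EXAMPLES)
print(classify(w, b, "She laughed and smiled at the sunrise"))  # 1
print(classify(w, b, "He wept and slammed the door"))  # -1
```

The trained model’s output is only a label and a set of word weights; neither the lexical search nor the classifier reproduces the underlying sentences, which mirrors the “derived data” point discussed in the TDM case law.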

Previous court cases like Authors Guild v. HathiTrust, Authors Guild v. Google, and A.V. ex rel. Vanderhye v. iParadigms have addressed fair use in the context of TDM and confirmed that the reproduction of copyrighted works to create and conduct text and data mining on a collection of copyright-protected works is a fair use. These cases further hold that making derived data, results, abstractions, metadata, or analysis from the copyright-protected corpus available to the public is also fair use, as long as the research methodologies or data distribution processes do not re-express the underlying works to the public in a way that could supplant the market for the originals. 

For the same reasons that the TDM processes constitute fair use of copyrighted works in these contexts, the training of AI tools to do that text and data mining is also fair use. This is in large part because of the same transformativeness of the purpose (under Fair Use Factor 1) and because, just like “regular” TDM that doesn’t involve AI, AI training does not reproduce or communicate the underlying copyrighted works to the public (which is essential to the determination of market supplantation for Fair Use Factor 4). 

But, while AI training is no different from other TDM methodologies in terms of fair use, there is an important distinction to make between the inputs for AI training and generative AI’s outputs. The overall fair use of generative AI outputs cannot always be predicted in advance: The mechanics of generative AI models’ operations suggest that there are limited instances in which generative AI outputs could indeed be substantially similar to (and potentially infringing of) the underlying works used for training; this substantial similarity is typically possible only when a training corpus is rife with numerous copies of the same work. And a recent case filed by the New York Times addresses this potential similarity problem with generative AI outputs.

Yet, training inputs should not be conflated with outputs: The training of AI models by using copyright-protected inputs falls squarely within what courts have already determined in TDM cases to be a transformative fair use. This is especially true when that AI training is conducted for non-profit educational or research purposes, as this bolsters its status under Fair Use Factor 1, which considers both transformativeness and whether the act is undertaken for non-profit educational purposes. 

Were a court to suddenly determine that training AI was not fair use, and AI training was subsequently permitted only on “safe” materials (like public domain works or works for which training permission has been granted via license), this would curtail freedom of inquiry, exacerbate bias in the nature of research questions able to be studied and the methodologies available to study them, and amplify the views of an unrepresentative set of creators given the limited types of materials available with which to conduct the studies.

The second precept we uphold is that scholars’ ability to access the underlying content to conduct fair use AI training should be preserved with no opt-outs from the perspective of copyright regulation. 

The fair use provision of the Copyright Act does not afford copyright owners a right to opt out of allowing other people to use their works in any other circumstance, for good reason: If content creators were able to opt out of fair use, little content would be available freely to build upon. Uniquely allowing fair use opt-outs only in the context of AI training would be a particular threat for research and education, because fair use in these contexts is already becoming an out-of-reach luxury even for the wealthiest institutions. What do we mean?

In the U.S., the prospect of “contractual override” means that, although fair use is statutorily provided for, private parties like publishers may “contract around” fair use by requiring libraries to negotiate for otherwise lawful activities (such as conducting TDM or training AI for research). Academic libraries are forced to pay significant sums each year to try to preserve fair use rights for campus scholars through the database and electronic content license agreements that they sign. This override landscape is particularly detrimental for TDM research methodologies, because TDM research often requires use of massive datasets with works from many publishers, including copyright owners who cannot be identified or who are unwilling to grant such licenses. 

So, if the Copyright Office or Congress were to enable rightsholders to opt out of having their works fairly used for training AI for scholarship, then academic institutions and scholars would face even greater hurdles in licensing content for research. Rightsholders might opt out of allowing their work to be used for AI training fair uses, and then turn around and charge AI usage fees to scholars (or libraries)—essentially licensing back fair uses for research. 

Fundamentally, this undermines lawmakers’ public interest goals: It creates a risk of rent-seeking or anti-competitive behavior through which a rightsholder can demand additional remuneration or withhold granting licenses for activities generally seen as being good for public knowledge or that rely on exceptions like fair use. And from a practical perspective, allowing opt-outs from fair uses would impede scholarship by or for research teams who lack grant or institutional funds to cover these additional licensing expenses; penalize research in or about underfunded disciplines or geographical regions; and result in bias as to the topics and regions that can be studied. 

“Fair use” does not mean “unregulated” 

Although training AI for non-profit scholarly uses is fair use from a copyright perspective, we are not suggesting AI training should be unregulated. To the contrary, we support guardrails because training AI can carry risk. For example, researchers have been able to use generative AI like ChatGPT to solicit personal information by bypassing platform safeguards.

To address issues of privacy, ethics, and the rights of publicity (which govern uses of people’s voices, images, and personas), best practices, private ordering, and other regulations should be adopted. 

For instance, as to best practices, scholar Matthew Sag has suggested preliminary guidelines to avoid violations of privacy and the right to publicity. First, he recommends that AI platforms avoid training their large language models on duplicates of the same work. This would reduce the likelihood that the models could produce copyright-infringing outputs (due to memorization concerns), and it would also lessen the likelihood that any content containing potentially private or sensitive information would be outputted from having been fed into the training process multiple times. Second, Sag suggests that AI platforms engage in “reinforcement learning through human feedback” when training large language models. This practice could cut down on privacy or rights of publicity concerns by involving human feedback at the point of training, instead of leveraging filtering at the output stage.  
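Sag’s first suggestion, avoiding duplicate copies of the same work in a training set, can be approximated for exact duplicates by hashing a normalized form of each document and keeping only the first copy. The Python sketch below assumes a simple normalization rule (lowercasing and collapsing whitespace) and hypothetical function names of our own choosing; it catches only exact or trivially reformatted copies, not near-duplicates such as different editions, which production pipelines typically handle with fuzzier techniques.

```python
import hashlib
import re

def normalize(text: str) -> str:
    """Assumed normalization: lowercase and collapse runs of whitespace so
    trivially different copies of the same text hash identically."""
    return re.sub(r"\s+", " ", text.strip().lower())

def deduplicate(documents: list[str]) -> list[str]:
    """Keep the first copy of each document and drop later exact duplicates
    (after normalization) before the corpus is used for training."""
    seen: set[str] = set()
    unique: list[str] = []
    for doc in documents:
        digest = hashlib.sha256(normalize(doc).encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(doc)
    return unique

corpus = [
    "It was the best of times, it was the worst of times.",
    "It was the best of times,  it was the worst of times.",  # extra space only
    "Call me Ishmael.",
]
print(deduplicate(corpus))  # keeps the first and third entries
```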

Private ordering would rely on platforms or communities to implement appropriate policies governing privacy issues, rights of publicity, and ethical concerns. For example, the UC Berkeley Library has created policies and practices (called “Responsible Access Workflows”) to help it make decisions around whether—and how—special collection materials may be digitized and made available online. Our Responsible Access Workflows require review of collection materials across copyright, contracts, privacy, and ethics parameters. Through careful policy development, the Library applies an ethics of care approach when deciding whether to make collection content that raises ethical concerns available online. Even if content is not shared openly online, that doesn’t mean it’s unavailable to researchers for use in person; we have simply decided not to make that content available in digital formats with lower friction for use. We aim to be transparent about our decision-making, and researchers must make informed decisions about how to use the collections, whether or not they are using them in service of AI.

And finally, concerning regulations, countries like those in the EU have recently introduced an AI training framework that requires, among other things, the disclosure of source content, and the right of content creators to opt out of having their works included in training sets except when the AI training is being done for research purposes by research organizations, cultural heritage institutions, and their members or scholars. United States agencies could consider implementing similar regulations here. 

But from a copyright perspective, and within non-profit academic research, fair use in AI training should be preserved without the opportunity to opt out for the reasons we discuss above. Such an approach regarding copyright would also be consistent with the distinction the EU has made for AI training in academic settings, as the EU’s Digital Single Market Directive treats text and data mining outside the context of scholarly research separately from TDM conducted for scientific research.

While we favor regulation that preserves fair use, it is also important to note that merely preserving fair use rights in scholarly contexts for training AI is not the end of the story in protecting scholarly inquiry. So long as the United States permits contractual override of fair uses, libraries and researchers will continue to be at the mercy of publishers aggregating and controlling what may be done with the scholarly record, even if authors dedicate their content to the public domain or apply a Creative Commons license to it. So in our view, the real work that should be done is pursuing legislative or regulatory arrangements like those of the approximately 40 other countries that have curtailed the ability of contracts to abrogate fair use and other limitations and exceptions to copyright within non-profit scholarly and educational uses. This is a challenging, but important, mission.

Licensing guidance in the meantime 

While the statutory, regulatory, and private governance landscapes are being addressed, libraries and scholars need ways to preserve usage rights for content when training AI as part of their TDM research methodologies. We have developed sample license language intended to address rightsholders’ key concerns while maintaining scholars’ ability to train AI in text and data mining research. We drafted this language to be incorporated into amendments to existing licenses that fail to address TDM, or into stand-alone TDM and AI licenses; however, it is easily adaptable into agreements-in-chief (and we encourage you to do so). 

We are certain our terms can continue to be improved upon over time or be tailored for specific research needs as methodologies and AI uses change. But in the meantime, we think they are an important step in the right direction.

With that in mind, it is important to understand that within contracts applying U.S. law, more specific contract language controls over general language. So, even if there is a clause in a license agreement that preserves fair use, if it is later followed by a TDM clause that restricts how TDM can be conducted (and whether AI can be used), then that more specific language governs TDM and AI usage under the agreement. This means that libraries and scholars must be mindful when negotiating TDM and AI clauses as they may be contracting themselves out of rights they would otherwise have had under fair use. 

So, how can a library or scholar negotiate sufficient AI usage rights while acknowledging the concerns of publishers? We believe publishers have attempted to curb AI usage because they are concerned about: (1) the security of their licensed products, and the fear that researchers will leak or release content behind their paywall; and (2) AI being used to create a competing product that could substitute for the original licensed product and undermine their share of the market. While these concerns are valid, they reflect longstanding fears over users’ potential generalized misuse of licensed materials in which they do not hold copyright. But publishers are already able to—and do—impose contractual provisions disallowing the creation of derivative products and systematically sharing licensed content with third parties, so additionally banning the use of AI in doing so is, in our opinion, unwarranted.

We developed our sample licensing language to precisely address these concerns by specifying in the grant of license that research results may be used and shared with others in the course of a user’s academic or non-profit research “except to the extent that doing so would substantially reproduce or redistribute the original Licensed Materials, or create a product for use by third parties that would substitute for the Licensed Materials.” Our language also imposes reasonable security protections in the research and storage process to quell fears of content leakage. 

Perhaps most importantly, our sample licensing language preserves the right to conduct TDM using “machine learning” and “other automated techniques” by expressly including these phrases in the definition for TDM, thereby reserving AI training rights (including as such AI training methodologies evolve), provided that no competing product or release of the underlying materials is made. 

The licensing road ahead

As legislation and standards around AI continue to develop, we hope to see express contractual allowance for AI training become the norm in academic licensing. Though our licensing language will likely need to adapt to and evolve with policy changes and research or technological advancements over time, we hope the sample language can now assist other institutions in their negotiations, and help set a licensing precedent so that publishers understand the importance of allowing AI training in non-profit research contexts. While a different legislative and regulatory approach may be appropriate in the commercial context, we believe that academic research licenses should preserve the right to incorporate AI, especially without additional costs being passed to subscribing institutions or individual users, as a fundamental element of ensuring a diverse and innovative scholarly record.

Authors Alliance Submits Amicus Brief to the Second Circuit in Hachette Books v. Internet Archive

Posted December 21, 2023
Photo by Dylan Dehnert on Unsplash

We are thrilled to announce that we’ve submitted an amicus brief to the Second Circuit Court of Appeals in Hachette Books v. Internet Archive—the case about whether controlled digital lending is a fair use—in support of the Internet Archive. Authored by Authors Alliance Senior Staff Attorney, Rachel Brooke, the brief reprises many of the arguments we made in our amicus brief in the district court proceedings and elaborates on why and how the lower court got it wrong, and why the case matters for our members and other authors who write to be read.

The Case

We’ve been writing about this case for years—since the complaint was first filed back in 2020. But to recap: a group of trade publishers sued the Internet Archive in federal court in the Southern District of New York over (among other things) the legality of its controlled digital lending (CDL) program. The publishers argued that the practice infringed their copyrights, and Internet Archive defended its project on the grounds that it was fair use. We submitted an amicus brief in support of IA and CDL (which we have long supported as a fair use) to the district court, explaining that copyright is about protecting authors, and many authors strongly support CDL.

The case finally went to oral argument before a judge in March of this year. Unfortunately, the judge ruled against Internet Archive, finding that each of the fair use factors favored the publishers. Internet Archive indicated that it planned to appeal, and we announced that we planned to support them in those efforts. Now, the case is before the Second Circuit Court of Appeals. After Internet Archive filed its opening brief last week, we (and other amici) filed our briefs in support of a reversal of the lower court’s decision.

Our Brief

Our amicus brief argues, in essence, that the district court judge failed to adequately consider the interests of authors. While the commercial publishers in the case did not support CDL, those publishers’ interests do not always align with authors’ and they certainly do not speak for all authors. We conducted outreach to authors, including launching a CDL survey, and uncovered a diversity of views on CDL—most of them extremely positive. We offered up these authors’ perspectives to show the court that many authors do support CDL, contrary to the representations of the publishers. Since copyright is about incentivizing new creation for the benefit of the public and protecting author interests, we felt these views were important for the Second Circuit to hear. 

We also sought to explain how the district court judge got it wrong when it comes to fair use. One of the key findings in the lower court decision was that loans of CDL scans were direct substitutes for loans of licensed ebooks. We explained that this is not the case: a CDL scan is not the same thing as an ebook; the two look different and have different functions and features. And CDL scans can be resources for authors conducting research in some key ways that licensed ebooks cannot. Out of print books and older editions of books, for example, are often available as CDL scans but not as licensed ebooks.

Another issue from the district court opinion that we addressed was the judge’s finding that IA’s use of the works in question was “commercial.” We strongly disagreed with this conclusion: borrowing a CDL scan from IA’s Open Library is free, and the organization—which is also a nonprofit—actually bears a lot of expenses related to digitization. Moreover, the publishers had failed to establish any concrete financial harm they had suffered as a result of IA’s CDL program. We discussed a recent lawsuit in the D.C. Circuit, ASTM v. PRO, to further push back on the district court’s conclusion on commerciality. 

You can read our brief for yourself here, or find it embedded at the bottom of this post. In the new year, you can expect another post or two with more details about our amicus brief and the other amicus briefs that have been, or soon will be, submitted in this case.

What’s Next?

Earlier this week, the publishers proposed that they file their own brief on March 15, 2024—91 days after Internet Archive filed its opening brief. The court’s rules stipulate that any amici supporting the publishers file their briefs within seven days of the publishers’ filing. Then, the parties can decide to submit reply briefs, and will notify the court of their intent to do so. Finally, the parties can choose to request oral argument, though the court might still choose to decide the case “on submission,” i.e., without oral argument. If the case does proceed to oral argument, a three-judge panel will hear from attorneys for each side before rendering its decision. We expect the process to extend into mid-2024, but it can take quite a while for appeals courts to actually hand down their decision. We’ll keep our readers apprised of any updates as the case moves forward.


Authors Alliance Releases New Legal Guide to Writing About Real People

Posted December 5, 2023

We are delighted to announce the publication of our brand new guide, the Authors Alliance Guide to Writing About Real People, a legal guide for authors writing nonfiction works about real people. The guide was written by students in two clinical teams at the UC Berkeley Samuelson Law and Public Policy Clinic—Lily Baggott, Jameson Davis, Tommy Ferdon, Alex Harvey, Emma Lee, and Daniel Todd—as well as clinical supervisors Jennifer Urban and Gabrielle Daley, along with Authors Alliance’s Senior Staff Attorney, Rachel Brooke. The guide was edited by Executive Director Dave Hansen and former Executive Director Brianna Schofield. This long list of names is a testament to the fact that it took a village to create this guide, and we are so excited to finally share it with our members, allies, and any and all authors who need it. You can read and download our guide here.

On Thursday, we are hosting a webinar about our guide, where Authors Alliance staff will share more about what went into producing it, those who partnered with us or supported the guide, and the particulars of the guide’s contents. Sign up here!

The Writing About Real People guide covers several different legal issues that can arise for authors writing about real people in nonfiction books like memoirs, biographies, and other narrative nonfiction projects. The issues it addresses are “causes of action” (or legal theories someone might sue under) based on state law. The requirements and considerations involved vary from state to state, so the guide highlights trends and commonalities among states. Throughout the guide, we emphasize that even though these causes of action might sound scary, the First Amendment to the U.S. Constitution in most cases empowers authors to write freely about topics of their choosing. The causes of action in this guide are exceptions to that rule, and each of them is limited in its reach and scope by the First Amendment’s guarantees. 

False Statements and Portrayals

The first section in the Writing About Real People guide concerns false statements and portrayals. This encompasses two different causes of action: defamation and false light. 

You have probably heard of defamation: it’s one of the most common causes of action related to writing about a real person. Defamation occurs when someone makes a false statement about another person that injures that person’s reputation, when the statement is made with some degree of “fault.” The level of fault required turns on what kind of person the statement is made about. For public people (people with some renown or governmental authority), the speaker must have acted with “actual malice,” meaning knowledge that the statement was false or reckless disregard as to whether it was true. But for private people, the speaker need only have been negligent as to whether the statement was true, meaning that the speaker failed to take an ordinary amount of care in verifying the veracity of the statement. An author might expose themselves to defamation liability if they write something untrue about another person in their published work that is held up as factual, that statement injures a person’s reputation, and the author failed to take the requisite level of care to ensure that the statement was factual. 

False light is similar to defamation, and many states do not recognize false light because these causes of action are so similar. Where defamation concerns false statements represented as factual, false light concerns false portrayals. It can occur when a speaker creates a misleading impression about a subject, through implication or omission, for example. Like defamation, false light requires fault on the part of the speaker, and the public person/private person standards are the same as for defamation. 

Invasions of Privacy

The second section in the Writing About Real People guide concerns invasions of privacy, or violations of a person’s rights to privacy. This covers two related causes of action: intrusion on seclusion and public disclosure of private facts. 

Intrusion on seclusion occurs when someone intentionally intrudes on another’s private place or affairs in a way that is highly offensive—judged by the perspective of an ordinary, reasonable person. For authors, intrusion on seclusion can arise when an author uses research or information-gathering methods that are invasive. This could include things like entering someone’s home without permission or digging through personal information like health or banking records without permission. Intrusion on seclusion might be an issue for authors during the research and writing stages of their processes, not when the work is actually published, as is the case with other causes of action in this guide.

Public disclosure of private facts occurs when someone makes private facts about a person public, when that disclosure is highly offensive and made with some degree of fault, and when the information disclosed doesn’t relate to a matter of public concern. Essentially, public disclosure of private facts liability exists to address situations where a speaker shares highly private information about a person that the public has no interest in knowing about, and the subject suffers as a result. Like defamation and false light, the level of fault required for a speaker to be liable depends on whether the subject is a public or private person, and these levels are the same as for defamation (actual malice for public people, and negligence for private people). This means that authors have much more leeway to share private information about public people than private people. And the “public concern” piece provides even more protection for speech about public people. 

Right of Publicity and Identity Rights

The third section in the Writing About Real People Guide concerns the right of publicity and unauthorized use of identity. Violations of the right of publicity, or unauthorized uses of identity, can occur when someone uses another person’s identity in a way that is “exploitative” and derives a benefit from that use. Importantly for authors, this excludes merely writing about someone in a book, article, or other piece of writing. The right of publicity is mostly concerned with commercial uses, like using someone’s name or likeness to sell a product without permission, but it can also apply to non-commercial uses that are exploitative, like using someone’s identity to generate attention for a work. In most cases, the right of publicity involves uses of someone’s image or likeness rather than just evoking their identity in text, but this is not necessarily the case. This section might be informative for authors who want to use someone’s image on their book cover or evoke an identity in advertising, but most authors merely writing nonfiction text about a real person do not have to worry too much about the right of publicity. 

Practical Guidance

A final section in our guide covers practical guidance for authors on how to avoid legal liability for the causes of action discussed in the guide in ways that are simple to understand and implement. Using reliable research methods and sources, obtaining consent from subjects where that is practicable, and carefully documenting your research and sources can go a long way towards helping you avoid legal liability while still empowering you to write freely.

Join Us for the Launch of Our Latest Legal Guide: Writing About Real People

Register here to join us on December 7, 2023 at 1pm ET / 10am PT for the launch of our latest legal guide, “Writing About Real People.”

Writing about real people can raise a number of complicated legal issues for authors. Laws governing defamation, privacy, and rights of publicity have many fact-specific rules, exceptions, and exceptions to exceptions that can be difficult to navigate without help. We’ve found that these issues can be an obstacle to creation for all types of authors, from bloggers to narrative nonfiction authors to historians, cultural anthropologists, and other scholarly authors.

As part of our widely used series of guides on legal issues for authors, Authors Alliance has created a guide to writing about real people for nonfiction authors. This latest guide covers three main legal issues: false statements and portrayals (e.g., defamation), invasions of privacy, and rights of publicity and identity rights. The guide includes substantial practical guidance, addressing issues such as permission, documenting your research, and working with an IRB.

Join us on December 7 to learn more about what the guide covers, how you might use it in your work, and our plans for accompanying materials we will release in the near future, such as one-page summaries for quick reference.

Authors Alliance Submits Comment to Copyright Office in Generative AI Notice of Inquiry

Posted November 3, 2023
Photo by erica steeves on Unsplash

We are pleased to announce that we have submitted a comment to the Copyright Office in response to its recent notice of inquiry regarding how copyright law interacts with generative AI. In our comment, we shared our views on copyright and generative AI (which you can read about here) and the stories we heard from authors about how they are using generative AI to support their creative labors, their research, and the mundane but important tasks involved in being a working author. The Office received over 10,000 comments in response to its NOI, showing the high level of interest in how copyright regulates AI-generated works and training data for generative AI. We hope the Office will appreciate our perspective as it considers policy interventions to address copyright issues involved in the use of generative AI by creators. You can read our full comment here, or at the bottom of this post.

You can hear more about our comment, and about contributions from other commenters, at the Berkeley Center for Law and Technology virtual roundtable on Monday, November 13th, where Authors Alliance senior staff attorney Rachel Brooke will be a panelist. The event is free and open to the public, and you can sign up here. 

Background

Since the Copyright Office issued an opinion letter on copyright in a graphic novel containing AI-generated images back in February, the debate about copyright and generative AI has grown to a near fever pitch. Authors Alliance has been engaged in these issues since the decision letter was released: we exist to support authors who want to leverage the tools available in the digital age to see their creations reach broad audiences and create innovative new works, and we see generative AI systems as one such tool that can support authors and authorship. We participated in the Copyright Office’s listening session on copyright issues in AI-generated textual works this spring, and were eager to further weigh in as the Copyright Office wades through the thorny issues involved. 

In late August, the Copyright Office issued a notice of inquiry, asking stakeholders to weigh in on a series of questions about copyright policy and generative AI. These were broken down into general questions, questions about training AI models, questions about transparency and recordkeeping, and various issues related to AI outputs—copyrightability, infringement, and labeling and identification. 

Our Comment

Our comment was devoted in large part to sharing the ways that authors are using generative AI systems and tools to support their creative labors and research. We heard from authors who used generative AI systems for ideation, late-stage editing, and generating text. We also learned that authors are using generative AI systems in ways we wouldn’t have anticipated, like creating books of prompts for other authors to use as inputs for generative AI systems. Generative AI has helped authors who don’t publish with conventional publishers create marketing copy and even generate book covers (despite the common adage, these are pretty important for attracting readers). We also heard from researchers using generative AI for literature reviews, as well as to make their writing process more efficient so they can focus on the work of researching and innovating. Generative AI also has the potential to lower barriers to entry for scientific researchers who are not native English speakers but want to contribute to scientific fields whose literature tends to be written in English.

We also spent some time explaining why we view the use of copyrighted materials in training datasets for AI models as fair use, and how fair use analysis applies when copyrighted materials are included in those datasets. The use of creative works in training datasets is transformative, serving a different purpose than the works themselves, regardless of whether the institutions that develop and deploy the models are commercial or nonprofit. And it’s highly unlikely that a generative AI system could harm the markets for the works in the training sets for the underlying models: a generative AI system is not a substitute for a book a reader is interested in reading, for example. We also explained that the market harm consideration (factor four in fair use analysis) should consider the effect of the use (using works as training data for AI models) on the market for the specific work in question (i.e., in an infringement action, the work that is alleged to have been infringed), and not the market for that author’s other works, similar works, or anything else.

Our comment also argued that new copyright legislation on AI, whether to codify copyright’s human authorship requirement and explain how it applies to AI-generated content or to address other issues related to copyright and generative AI, is not warranted. AI systems, AI models, and the ways creators use them are still evolving. Copyright law is already highly flexible, having adapted to new technologies that weren’t anticipated when the underlying legislation was enacted. And legislating around nascent technologies can produce laws that later prove ill-suited to the unexpected challenges those technologies bring about (recall that the DMCA, which has faced a lot of criticism as a statute intended to regulate copyright online, was passed in 1998). We instead suggested that the Office take a “wait and see” approach as generative AI and how we use it continue to develop, rather than recommending legislation to Congress.

Next, we explained why a licensing system for copyrighted works used in AI training data is neither desirable nor practicable. Because we consider the use of copyrighted works in training data to be a fair use, licenses are not necessary in the first place. We also explained the host of problems that either a compulsory licensing regime or a collective licensing scheme would bring about. The sheer size of the datasets used to train AI models makes it difficult to envision systematically seeking licenses for each and every copyrighted work they contain, and because of the “orphan works problem,” a majority of rightsholders might never be found. It’s also not clear who would administer such a licensing regime, and we could not think of any appropriate party that exists or is likely to emerge. The Office’s past failed investigations into possible collective rights management organizations (or CMOs) only underscore this point.

Finally, we echoed our support for the substantial similarity test as a way to handle generative AI outputs that look very similar to existing copyrighted works. The substantial similarity test has been around for decades and has been applied across the country in a variety of contexts. It seems to us to be a good way to approach the rare cases in which generative AI outputs are strikingly similar to copyrighted works (so-called “memorization”) such that a rightsholder might sue for infringement. 

What’s Next?

The same day we submitted our comment, the Biden Administration released an executive order on “Safe, Secure, and Trustworthy Artificial Intelligence,” directing federal agencies to take a variety of measures to ensure that the use of generative AI is not harmful to innovation, privacy, labor, and more. Then on Wednesday, November 1, representatives from a coalition of countries (including the U.S.) signed “The Bletchley Declaration” following an AI Safety Summit in the U.K., warning of the dangers of generative AI and pledging to work together to find solutions. All of this is to say that how public policy should regulate generative AI, and whether and how the law needs to change to accommodate it, is a live issue that continues to evolve every day. Dozens of lawsuits are pending about the interaction between copyright and the use of generative AI systems, and as these cases move through the courts, judges will have their opportunity to weigh in. As ever, we will keep our readers and members apprised of any new legal developments around copyright and generative AI.


Copyright Office Recommends Renewal of the Existing Text Data Mining Exemptions for Literary Works and Films

Posted October 19, 2023
Photo by Tim Mossholder on Unsplash

Authors Alliance is delighted to announce that the Copyright Office has recommended that the Librarian of Congress renew both of the exemptions to DMCA liability for text and data mining in its Notice of Proposed Rulemaking for this year’s DMCA exemptions, released today. While the Librarian of Congress could technically disagree with the recommendation to renew, this rarely if ever happens in practice. 

Renewal Petitions and Recommendations

Authors Alliance petitioned the Office to renew the exemptions in July, along with our co-petitioners the American Association of University Professors and the Library Copyright Alliance. Then, the Office entertained comments from stakeholders and the public at large who wished to make statements in support of or in opposition to renewal of the existing exemptions, before drawing conclusions about renewal in today’s notice. 

The Office did not receive any comments arguing against renewal of the TDM exemption for literary works distributed electronically; our petition was unopposed. The Office agreed with Authors Alliance and our co-petitioners, LCA and AAUP, observing that “researchers are actively relying on the current exemption” and citing an example of such research that we highlighted in our petition. Apparently agreeing with our statement that there have not been “material changes in facts, law, technology, or other circumstances” since the 1201 rulemaking cycle in which the exemption was originally obtained, the Office stated that it intended to recommend that the exemption be renewed.

Our renewal petition for the text and data mining exemption for motion pictures, which is identical to the literary works exemption in all aspects but the type of works involved, did receive one opposition comment, but the Copyright Office found that it did not meet the standard for meaningful opposition and recommended renewal. DVD CCA (the DVD Copy Control Association) and AACS LA (the Advanced Access Content System Licensing Administrator) submitted a joint comment arguing that a statement in our petition indicated a change in the facts surrounding the exemption. More specifically, they argued that our statement that “[c]ommercially licensed text and data mining products continue to be made available to research institutions” constituted an admission that new licensed databases for motion pictures had emerged since the previous rulemaking. DVD CCA and AACS LA did not actually offer any evidence that such databases had emerged. We believed this opposition comment was without merit: while licensed databases for text and data mining of audiovisual works are not as prevalent as those for text-based works, some were available during the 2021 rulemaking and continue to be available today. We are pleased that the Office agreed, citing the previous rulemaking record as supporting evidence.

Expansions and Next Steps

In addition to requesting that the Office renew the current exemptions, we (along with AAUP and LCA) also requested that the Office consider expanding these exemptions to enhance a researcher’s ability to share their corpus with other researchers who are not their direct collaborators. The two processes run in parallel, and today’s announcement means that even if we do not ultimately obtain expanded exemptions, the existing exemptions are very likely to be renewed.

In its NPRM, the Office also announced deadlines for the various submissions that petitions for expansions and new exemptions require. The first round of comments in support of our proposed expansion, including documentary evidence from researchers who are adversely affected by the limited sharing permitted under the existing exemptions, will be due December 22, 2023. Opposition comments are due February 20, 2024, and reply comments to those opposition comments are due March 24, 2024. Later in the spring, there will be a hearing with the Copyright Office regarding our proposed expansion. We will, as always, keep our readers apprised as the process moves forward.

Call to Action: Share your Experiences with Generative AI!

Posted October 9, 2023
Photo by Patrick Fore on Unsplash

Authors Alliance is currently at work on a submission to the Copyright Office regarding our views on generative AI (which you can read about here). If you’re an author who has used generative AI in your research or writing, we’d love to hear from you! Please reach out to Rachel Brooke, Authors Alliance Senior Staff Attorney, at rachel@authorsalliance.org.

Analysis: ASTM v. PRO

Posted September 28, 2023
Photo by John Schnobrich on Unsplash

Last week, the Court of Appeals for the D.C. Circuit released its opinion in American Society for Testing and Materials v. Public.Resource.org (“ASTM v. PRO”), an important fair use case that has been percolating in the D.C. Circuit for the past few years. Authors Alliance filed an amicus brief in the case in support of Public Resource, along with the Library Futures Institute, the EveryLibrary Institute, and Public Knowledge. The case is about public access to the law and the role of fair use in safeguarding that access, but it also has big implications for the ever-evolving doctrine of fair use. In general, we applaud the decision, which found for Public Resource, affirming the importance of access to the law and the important role that the fair use doctrine plays within copyright law. In today’s post, we summarize the case and offer our thoughts about what it might mean for fair use going forward, particularly regarding cases that impact our members and their interests.

Background

The case concerns standard-developing organizations and public access to the standards they produce. These organizations set standards and best practices for “particular industries, products, or problems,” including fire prevention and medical testing, among others. These standards are often incorporated into laws and regulations that govern these industries by various federal, state, and local lawmaking bodies. Government agencies incorporate these standards into law “by reference” when they refer to them in a given regulation, without reproducing the standards verbatim. For example, a federal regulation governing shipyard operators requires them to “select, maintain, and test portable fire extinguishers” in accordance with a particular National Fire Protection Association standard, but that regulation does not reproduce the standard itself. 

Public.Resource.org, a nonprofit organization that disseminates legal materials by posting them publicly online, posted on its website “hundreds of incorporated standards—including standards produced and copyrighted by the plaintiffs.” Then, in 2013, the standard-developing organizations sued for copyright infringement. Public Resource defended its posting of the standards as a fair use, but the lower court disagreed, requiring Public Resource to take the posted standards at issue down. After appeals, further fact development, and multiple hearings at both the district and appellate court levels, the district court ultimately found that Public Resource’s posting of the standards incorporated into law was fair use. The standard-developing organizations appealed again to the D.C. Circuit, which released its decision on September 12th.

Our Amicus Brief

In our amicus brief, we argued that “when a law-making body incorporates a standard by reference into legally-binding rule or regulation, the contents of the whole of that publication must be freely and fully accessible by the public.” Public access to the law is crucial for an informed citizenry and well-functioning democracy, which is why more conventional legal materials—like statutes, regulations, court cases, and agency rulemakings—have long been freely available to the public, online or otherwise. This principle ought to extend to legal standards that are incorporated by reference into law, despite the fact that private organizations create these standards, because incorporation by reference essentially gives them the force of law. We emphasized the potential harm to researchers and librarians were public access to standards incorporated by reference into law restricted. 

In fact, our brief argues that these standards should not be afforded copyright protection at all. Allowing private organizations to claim copyright in what is effectively the law does not serve the core purpose of copyright: to incentivize new creation for the benefit of the public. Materials authored by the federal government are automatically part of the public domain, which also supports the important principle that no one can own the law, an idea enshrined in our Constitution and in court cases dating to the 19th century. Due process, a constitutional principle requiring that the legal rights of all persons be respected, mandates this kind of access, and this principle is often described as “beyond question.” While the standard-setting organizations have online “reading rooms” where the public can access the standards in question, these require users to register, provide personal information, and agree to lengthy terms of service. As we explain in our brief, this is not sufficient for the free public access that the law requires.

The Decision

In its decision, the court determined that Public Resource’s posting of the standards that were incorporated by reference into law was a fair use, holding that three out of the four fair use factors favored a finding of fair use. While the court did not hold that the standards incorporated by reference into law were free from copyright protection, it did affirm the legal and policy justifications for free public access to the law. 

The first fair use factor, the purpose and character of the use, weighed in favor of Public Resource. On this point, the court emphasized that “Public Resource’s use is for nonprofit, educational purposes.” Whether a use is commercial can affect how a court views this factor, as can the degree to which a court finds the use to be “transformative.” The court also found that Public Resource’s use was transformative, in that it served a purpose new and different from that of the works themselves. The original standards were developed to promulgate best practices for industries and problems in the interest of industries and consumers; Public Resource’s purpose, by contrast, was to share with the public “only what the law is, not what industry groups may regard as current best practices.” The court summarized: “Public Resource’s message (‘this is the law’) is very different from the plaintiffs’ message (‘these are current best practices for the engineering of buildings and products’).”

The second fair use factor directs courts to consider the nature of the copyrighted work, in this case the standards that were incorporated by reference into law. The court found that this factor strongly favored a finding of fair use. The further a work is from the “core of intended copyright protection,” i.e., the less creative it is, the more this factor favors fair use. In other words, because the standards at issue were highly factual in nature, rather than creative (like fiction writing), the second factor weighed in favor of fair use.

The third fair use factor considers the amount and substantiality of the portion of the original work that was used, asking whether that portion is reasonable in light of the purpose of the secondary use. The court found that this factor also weighed in favor of fair use. The various standards promulgated by the standard-setting organizations tended to be much longer in their entirety than the portions that were incorporated by reference into law. Public Resource posted only the portions of these standards that were incorporated into law, which was reasonable in light of its purpose of educating the public about what the law is.

The fourth fair use factor considers the effect of the use on the market for the copyrighted works, and the court found that this factor was, on balance, neutral, and “[did] not significantly tip the balance one way or the other.” The standard-setting organizations argued that their customers, industry members that needed to understand best practices, would stop paying for the standards if they could obtain them for free from Public Resource. The court pointed out that only the standards incorporated into law were at issue, and the most up-to-date standards relied on by these industries were not necessarily incorporated into law. Moreover, the standard-setting organizations could not produce any evidence of market harm, even though Public Resource had been posting the standards online for approximately 15 years. The court also indicated that the public benefit of sharing this information had to be balanced against any potential market harm. But because Public Resource’s online posting could possibly have lowered demand for the standards, the court found that this factor was neutral.

Impact on the Fair Use Doctrine

It remains to be seen how this case will impact the fair use doctrine and fair use decisions going forward, but this new precedent seems likely to make a difference in future fair use decisions.

First, the contours of factor one, the purpose and character of the use, are very much a live issue following the recent decision in Warhol Foundation v. Goldsmith. In that case (in which we also submitted an amicus brief, supporting the Warhol Foundation’s fair use argument), the Supreme Court emphasized that Warhol’s use was commercial in finding the use not to be fair. The Court seemed to emphasize commerciality over “transformativeness,” a longstanding aspect of factor one analysis (though it also found the use not to be transformative). The court in ASTM v. PRO certainly discussed commerciality as part of factor one, emphasizing Public Resource’s nonprofit status. But on the question of transformativeness, the court also gave a lengthy and eloquent summary of the different purposes of the two uses, indicating that transformativeness is still an important inquiry and is not necessarily secondary to commerciality.

The weight of commerciality in factor one analysis can make a big difference in the outcome of cases, and it is an issue many have been watching given the recent wave of copyright lawsuits concerning the use of copyrighted works to train generative AI models. This is because while there is a strong argument that the use of training data for these models is highly transformative, it is also true that the companies behind many of the models, like OpenAI, Midjourney, and Stability AI, are commercial in nature and monetize their programs in different ways. The recent ASTM v. PRO decision could affect how courts weigh the commerciality of these companies’ uses of copyrighted training data against the extent to which the uses are transformative, potentially tipping the scale towards fair use in the upcoming copyright lawsuits about generative AI and training data.

Second, the question of market harm in factor four can be a complicated one, and this case may provide some guidance for courts going forward. This issue was central to the recent decision in Hachette Books v. Internet Archive, the case about whether controlled digital lending is a fair use, which we have been covering and involved in for years, notably as an amicus in support of the Internet Archive. In the Hachette decision, the judge found that factor four weighed in favor of the publishers without direct evidence of financial harm, based on the idea that CDL scans could be substitutes for licensed ebooks. But in ASTM v. PRO, the court was skeptical that an allegation of potential market harm, without actual evidence, was sufficiently convincing. Since Hachette has been appealed and will soon be before the Second Circuit, we are hopeful that ASTM v. PRO will be a useful precedent for those judges. Extending the logic of ASTM v. PRO, the publishers may need to demonstrate market harm with tangible evidence (such as concrete evidence of lost sales) in order to prevail on factor four in that case.

An Open Letter Regarding Copyright Reform on Behalf of South African Authors

Posted September 25, 2023
Photo by Jacques Nel on Unsplash

Today we are very pleased to share an open letter regarding copyright reform on behalf of South African authors. The letter is available here and is also available as a PDF (with names as of today) here.

The letter comes at a critical decision-making moment for South Africa’s Copyright Amendment Bill, which has been debated for years (read more here and here on our views). We believe it is important for lawmakers to hear from authors who support this bill, and in particular to hear from us about why we view its fair use provisions and author remuneration provisions so positively.

We welcome other South African authors to add their names to the letter to express their support. You can do so by completing this form.


Assessing the U.S. Copyright Small Claims Court After One Year

Posted September 18, 2023

Authors Alliance members will recall the series of posts we’ve made about the United States’s new copyright small claims court. Below is a post by Dave Hansen and Authors Alliance member Katie Fortney, based on a forthcoming article we recently posted assessing how this court has fared in its first year of operations. This post was originally published on the Kluwer Copyright Blog.

In June 2023 the U.S. Copyright Office celebrated the one-year anniversary of operations of the Copyright Claims Board (“CCB”), a novel small claims court housed within the agency with a budget request for $2.2 million in ongoing yearly costs. Though not entirely unique (e.g., the UK’s IP Enterprise Court has been described as filling a similar role since 2012), the CCB has been closely watched and hotly debated (see here, here, and here).

The CCB was preceded by years of argument about the benefits and risks of such a small claims court. Proponents argued that the CCB would offer rightsholders a low-cost, efficient alternative to litigation in federal courts (which can easily cost over $100,000 to litigate), allowing small creators to more effectively defend their rights. Opponents feared that the CCB would foster abuse, encouraging frivolous lawsuits while creating a trap for unwary defendants.

We set out to assess these arguments in light of data on the CCB’s first year of operation, which we explore in more detail in our article here, forthcoming in the Journal of the Copyright Society of the USA; the data used for the article are available here. This post summarizes that article, which is itself based on an empirical review of the CCB’s first year of operations using data extracted from the CCB’s online filing system for the 487 claims filed with the court between June 2022 and June 2023.

How the CCB Works

To assess the work of the CCB, it’s first important to understand how the new court works. For claimants to successfully pursue a claim, they must first pass three hurdles:

  • their claim must be compliant, which means that it must include some key information regarding, e.g., ownership of a copyright, access to the work by the respondent in order to copy it, and substantial similarity between the allegedly infringing copy and the original;
  • their claim must also be properly served or delivered to the respondent, following the specific procedures that the Copyright Office has established;
  • the claimant must wait 60 days to see if the respondent decides to opt out of the proceedings (in which case the claimant can refile in the more expensive, but more robust, federal district court).

Once the opt-out window has passed, the proceeding becomes “active” and a scheduling order is issued. Then the parties can engage in discovery, have hearings and conferences, and eventually receive a final determination where the CCB may award damages.

CCB By the Numbers

In the CCB’s first year, 487 claims were filed. However, only 43 of those claims (less than 9%) had been issued scheduling orders and made it to the active phase by June 15, 2023.

Meanwhile, 302 cases had been closed, most of them dismissed without prejudice (meaning the case did not reach the merits and the claimant could choose to file again). The remaining claims were either awaiting review by the CCB, or waiting for an action from the claimant like filing an amended claim or filing proof of service.

Though the CCB gives claimants multiple opportunities to amend their complaint to fix problems with it (even offering detailed and helpful suggestions on how to fix those problems), over 150 claims were dismissed because the claimant did not file a proper claim. Failure to state facts sufficient to support access and substantial similarity were common problems, each showing up about 110 times in CCB orders to amend (sometimes in the same order to amend). In some cases, however, there was no way to fix the complaint. For example, 35 claims were trying to pursue cases against foreign respondents, over whom the CCB has no jurisdiction. And over 100 claims were copyright infringement claims where the claimant hadn’t filed for copyright registration of the work allegedly infringed (a prerequisite to filing).

Claimants also had problems with service: 60 claims were dismissed in the first year because claimants didn’t file documentation showing valid service. Finally, opt-out (which some proponents of the CCB feared would undermine the court) is an important but much smaller pathway out of the CCB: it accounted for 35 dismissals.

Because copyright is technical and complicated, it is perhaps not surprising that having a lawyer helps avoid dismissal: 90% of claims from represented claimants had been certified as compliant, compared with only 46% of claims from self-represented claimants. Unrepresented claimants account for over 70% of claims filed, but only 40% of those that make it to the active phase.

Looking more closely at the claimants themselves, we do see aggressive and prolific copyright litigants using the CCB, but not the volume of copyright-troll litigation seen in the past in federal district courts. This may be partly because the Copyright Office took these concerns seriously and adopted rules to discourage it, for example by limiting the number of claims a claimant can file within one year. Repeat filers were few: only nine filers had more than five claims. They do, however, include Higbee and Associates (sometimes referred to as a “troll,” though the label may not fit exactly), with 17 claims, and David C. Deal (another known and aggressive serial copyright litigant), with 20. And the one case in which the CCB had issued a final determination on the merits was decided in favor of David Oppenheimer, who has separately filed more than 170 copyright suits in federal courts.

Because the process has been so slow, it is difficult to evaluate how the CCB is working for respondents. Opponents of the CCB feared that its power to issue default determinations (monetary awards issued when the respondent never appears) could be a trap for the unwary. The CCB has issued only two such determinations so far (both in August 2023, for $3,000 each), and only one final determination that was not the result of a default, withdrawal, or settlement. It is therefore too early to tell how common defaults will be. They will remain an issue to watch, however: in the first year, respondents were as likely to end up on the path to default as they were to participate in a proceeding.

Our Takeaways and Conclusion

On the one hand, we haven’t seen rampant abuse of the system. Serial copyright litigants are certainly using the CCB, but in far smaller numbers than we have seen in federal district court. And damage awards have been modest.

On the other hand, the CCB does not seem to have delivered its promised efficiency for small litigants. For most claimants the system appears too complicated and slow: the CCB issued a final determination in only a single case in its entire first year, and most dismissals stemmed from claimants’ failure to comply adequately with CCB rules. The CCB has already gone to great lengths to explain the process and to help claimants correct errors early on. It may be hard for the CCB to adjust its rules to lower these barriers without sacrificing basic procedural safeguards for respondents (something we think it should not do). Despite the hopes of advocates and legislators and the admirable efforts of those working at the CCB, the early results suggest that complex copyright disputes may simply be ill-suited to a self-service small claims tribunal.