We are pleased to announce that we have submitted a comment to the Copyright Office in response to their recent notice of inquiry regarding how copyright law interacts with generative AI. In our comment, we shared our views on copyright and generative AI (which you can read about here) and the stories we heard from authors about how they are using generative AI to support their creative labors, research, and the mundane but important tasks involved in being a working author. The Office received over 10,000 comments in response to its NOI, showing the high level of interest in how copyright regulates AI-generated works and training data for generative AI. We hope the Office will appreciate our perspective as it considers policy interventions to address copyright issues involved in the use of generative AI by creators. You can read our full comment here, or at the bottom of this post.
You can hear more about our comment, and about contributions from other commenters, at the Berkeley Center for Law and Technology virtual roundtable on Monday, November 13th, where Authors Alliance senior staff attorney Rachel Brooke will be a panelist. The event is free and open to the public, and you can sign up here.
Background
Since the Copyright Office issued an opinion letter on copyright in a graphic novel containing AI-generated images back in February, the debate about copyright and generative AI has grown to a near fever pitch. Authors Alliance has been engaged in these issues since the decision letter was released: we exist to support authors who want to leverage the tools available in the digital age to see their creations reach broad audiences and create innovative new works, and we see generative AI systems as one such tool that can support authors and authorship. We participated in the Copyright Office’s listening session on copyright issues in AI-generated textual works this spring, and were eager to further weigh in as the Copyright Office wades through the thorny issues involved.
In late August, the Copyright Office issued a notice of inquiry, asking stakeholders to weigh in on a series of questions about copyright policy and generative AI. These were broken down into general questions, questions about training AI models, questions about transparency and recordkeeping, and various issues related to AI outputs—copyrightability, infringement, and labeling and identification.
Our Comment
Our comment was devoted in large part to sharing the ways that authors are using generative AI systems and tools to support their creative labors and research. We heard from authors who used generative AI systems for ideation, late-stage editing, and generating text. We also learned that authors are using generative AI systems in ways we wouldn’t have anticipated—like creating books of prompts for other authors to use as inputs for generative AI systems. Generative AI has helped authors who don’t publish with conventional publishers create marketing copy and even generate book covers (despite the common adage, these are pretty important for attracting readers). We also heard from researchers using generative AI for literature reviews as well as to make their writing process more efficient so they can focus on doing the work of researching and innovating. Generative AI also has the potential to lower barriers to entry for scientific researchers who are not native English speakers, but want to make contributions to scientific fields in which literature tends to be written in English.
We also spent some time explaining our views on why the use of copyrighted materials in training datasets for AI models constitutes fair use and how fair use analysis applies when copyrighted materials are included in training datasets. The use of creative works in training datasets is a transformative one with a different purpose than the works themselves—regardless of whether the institutions that develop and deploy them are commercial or nonprofit. And it’s highly unlikely that a generative AI system could harm the markets for the works in the training sets for the underlying models: a generative AI system is not a substitute for a book a reader is interested in reading, for example. We also explained that the market harm consideration (factor four in fair use analysis) should consider the effect of the use (using works as training data for AI models) on the market for the specific work in question (i.e., in an infringement action, the work that is alleged to have been infringed), and not the market for that author’s other works, similar works, or anything else.
Our comment also argued that new copyright legislation on AI—either to codify copyright’s human authorship requirement and explain how it applies to AI-generated content or to address other issues related to copyright and generative AI—is not warranted. AI systems, AI models, and the ways creators use them are still evolving. Copyright law is already highly flexible, having adapted to new technologies that weren’t anticipated when the copyright legislation itself was enacted. And legislating around nascent technologies can result in laws that are eventually ill-suited to deal with unexpected challenges that new technologies bring about (recall that the DMCA, which has faced a lot of criticism as a statute intended to regulate copyright online, was passed in 1998). We instead suggest that the Office stick with a “wait and see” approach as generative AI and how we use it continue to develop rather than recommending legislation to Congress.
Next, we explained why a licensing system for works used in AI training data is neither desirable nor practicable. Because we consider the use of copyrighted works in training data to be a fair use, licenses are not necessary in the first place. We also explained the host of problems that either a compulsory licensing regime or a collective licensing scheme would bring about. The large size of datasets for training AI models makes it difficult to envision systematically seeking licenses for each and every copyrighted work in the training dataset, and the “orphan works problem” means that a majority of rightsholders might not be able to be found. It’s also not clear who would administer licensing under a licensing regime, and we could not think of any appropriate party that exists or is likely to emerge. The Office’s past failed investigations into possible collective rights management organizations (or CMOs) only underscore this point.
Finally, we echoed our support for the substantial similarity test as a way to handle generative AI outputs that look very similar to existing copyrighted works. The substantial similarity test has been around for decades and has been applied across the country in a variety of contexts. It seems to us to be a good way to approach the rare cases in which generative AI outputs are strikingly similar to copyrighted works (so-called “memorization”) such that a rightsholder might sue for infringement.
What’s Next?
The same day we submitted our comment, the Biden Administration released an executive order on “Safe, Secure, and Trustworthy Artificial Intelligence,” directing federal agencies to take a variety of measures to ensure that the use of generative AI is not harmful to innovation, privacy, labor, and more. Then on Wednesday, representatives from a coalition of countries (including the U.S.) signed “The Bletchley Declaration” following an AI Safety Summit in the U.K., warning of the dangers of generative AI and pledging to work together to find solutions. All of this is to say that how public policy should regulate generative AI, and whether and how the law needs to change to accommodate it, is a live issue that continues to evolve every day. Dozens of lawsuits are pending about the interaction between copyright and the use of generative AI systems, and as these cases move through the courts, judges will have their opportunity to weigh in. As ever, we will keep our readers and members apprised of any new legal developments around copyright and generative AI.
When the tech platforms promised a future of “connection,” they were lying. They said their “walled gardens” would keep us safe, but those were prison walls.
The platforms locked us into their systems and made us easy pickings, ripe for extraction. Twitter, Facebook and other Big Tech platforms are hard to leave by design. They hold hostage the people we love, the communities that matter to us, the audiences and customers we rely on. The impossibility of staying connected to these people after you delete your account has nothing to do with technological limitations: it’s a business strategy in service to commodifying your personal life and relationships.
We can – we must – dismantle the tech platforms. In The Internet Con, Cory Doctorow explains how to seize the means of computation, by forcing Silicon Valley to do the thing it fears most: interoperate. Interoperability will tear down the walls between technologies, allowing users to leave platforms, remix their media, and reconfigure their devices without corporate permission.
Interoperability is the only route to the rapid and enduring annihilation of the platforms. The Internet Con is the disassembly manual we need to take back our internet.
ABOUT THE AUTHOR CORY DOCTOROW is a science fiction author, activist and journalist. He is the author of many books, most recently RADICALIZED and WALKAWAY, science fiction for adults; HOW TO DESTROY SURVEILLANCE CAPITALISM, nonfiction about monopoly and conspiracy; IN REAL LIFE, a graphic novel; and the picture book POESY THE MONSTER SLAYER. His latest book is ATTACK SURFACE, a standalone adult sequel to LITTLE BROTHER. In 2020, he was inducted into the Canadian Science Fiction and Fantasy Hall of Fame. He works for the Electronic Frontier Foundation, is a MIT Media Lab Research Affiliate, is a Visiting Professor of Computer Science at Open University, a Visiting Professor of Practice at the University of North Carolina’s School of Library and Information Science and co-founded the UK Open Rights Group.
Book Talk: The Internet Con by Cory Doctorow Tuesday, October 31 @ 10am PT / 1pm ET Register now for the virtual discussion!
Authors Alliance is delighted to announce that the Copyright Office has recommended that the Librarian of Congress renew both of the exemptions to DMCA liability for text and data mining in its Notice of Proposed Rulemaking for this year’s DMCA exemptions, released today. While the Librarian of Congress could technically disagree with the recommendation to renew, this rarely if ever happens in practice.
Renewal Petitions and Recommendations
Authors Alliance petitioned the Office to renew the exemptions in July, along with our co-petitioners the American Association of University Professors and the Library Copyright Alliance. Then, the Office entertained comments from stakeholders and the public at large who wished to make statements in support of or in opposition to renewal of the existing exemptions, before drawing conclusions about renewal in today’s notice.
The Office did not receive any comments arguing against renewal of the TDM exemption for literary works distributed electronically; our petition was unopposed. The Office agreed with Authors Alliance and our co-petitioners, LCA and AAUP, observing that “researchers are actively relying on the current exemption” and citing to an example of such research that we highlighted in our petition. Apparently agreeing with our statement that there have not been “material changes in facts, law, technology, or other circumstances” since the 1201 rulemaking cycle when the exemption was originally obtained, the Office stated it intended to recommend that the exemption be renewed.
Our renewal petition for the text and data mining exemption for motion pictures, which is identical to the literary works exemption in all aspects but the type of works involved, did receive one opposition comment, but the Copyright Office found that it did not meet the standard for meaningful opposition, and recommended renewal. DVD CCA (the DVD Copyright Control Association) and AACS LA (the Advanced Access Content System Licensing Administrator) submitted a joint comment arguing that a statement in our petition indicated that there had been a change in the facts surrounding the exemption. More specifically, they argued that our statement that “[c]ommercially licensed text and data mining products continue to be made available to research institutions” constituted an admission that new licensed databases for motion pictures had emerged since the previous rulemaking. DVD CCA and AACS LA did not actually offer any evidence of the emergence of new licensed databases for motion pictures. We believed this opposition comment was without merit—while licensed databases for text and data mining of audiovisual works are not as prevalent as licensed databases for text and data mining of text-based works, some were available during the 2021 rulemaking, and continue to be available today. We are pleased that the Office agreed, citing to the previous rulemaking record as supporting evidence.
Expansions and Next Steps
In addition to requesting that the Office renew the current exemptions, we (along with AAUP and LCA) also requested that the Office consider expanding these exemptions to enhance a researcher’s ability to share their corpus with other researchers that are not their direct collaborators. The two processes run in parallel, and today’s announcement means that even if we do not ultimately obtain expanded exemptions, the existing exemptions are very likely to be renewed.
In its NPRM, the Office also announced deadlines for the various submissions that petitions for expansions and new exemptions will require. The first round of comments in support of our proposed expansion—including documentary evidence from researchers who are being adversely affected by the limited sharing permitted under the existing exemptions—will be due December 22nd. Opposition comments are due February 20, 2024. Reply comments to these opposition comments are then due March 24, 2024. Then, later in the spring, there will be a hearing with the Copyright Office regarding our proposed expansion. We will—as always—keep our readers apprised as the process moves forward.
Authors Alliance is currently at work on a submission to the Copyright Office regarding our views on generative AI (which you can read about here). If you’re an author who has used generative AI in your research or writing, we’d love to hear from you! Please reach out to Rachel Brooke, Authors Alliance Senior Staff Attorney, at rachel@authorsalliance.org.
Last week, the Court of Appeals for the D.C. Circuit released its opinion in American Society for Testing and Materials v. Public.Resource.org (“ASTM v. PRO”), an important fair use case that has been percolating in the D.C. Circuit for the past few years. Authors Alliance filed an amicus brief in the case in support of Public Resource, along with the Library Futures Institute, the EveryLibrary Institute, and Public Knowledge. The case is about public access to the law and the role of fair use in safeguarding that access, but it also has big implications for the ever-evolving doctrine of fair use. In general, we applaud the decision, which found for Public Resource, affirming the importance of access to the law and the important role that the fair use doctrine plays within copyright law. In today’s post, we summarize the case and offer our thoughts about what it might mean for fair use going forward, particularly regarding cases that impact our members and their interests.
Background
The case concerns standard-developing organizations and public access to the standards they produce. These organizations set standards and best practices for “particular industries, products, or problems,” including fire prevention and medical testing, among others. These standards are often incorporated into laws and regulations that govern these industries by various federal, state, and local lawmaking bodies. Government agencies incorporate these standards into law “by reference” when they refer to them in a given regulation, without reproducing the standards verbatim. For example, a federal regulation governing shipyard operators requires them to “select, maintain, and test portable fire extinguishers” in accordance with a particular National Fire Protection Association standard, but that regulation does not reproduce the standard itself.
Public.Resource.org, a nonprofit organization that disseminates legal materials by posting them publicly online, posted on its website “hundreds of incorporated standards—including standards produced and copyrighted by the plaintiffs.” Then, in 2013, the standard-developing organizations sued for copyright infringement. Public Resource defended its posting of the standards as a fair use, but the lower court disagreed, requiring Public Resource to take the posted standards at issue down. After appeals, further fact development and multiple hearings at both the district court and appellate court level, the district court ultimately found Public Resource’s posting of the standards which were incorporated into law to be fair use. The standard-developing organizations appealed to the appeals court, which released its decision on September 12th.
Our Amicus Brief
In our amicus brief, we argued that “when a law-making body incorporates a standard by reference into legally-binding rule or regulation, the contents of the whole of that publication must be freely and fully accessible by the public.” Public access to the law is crucial for an informed citizenry and well-functioning democracy, which is why more conventional legal materials—like statutes, regulations, court cases, and agency rulemakings—have long been freely available to the public, online or otherwise. This principle ought to extend to legal standards that are incorporated by reference into law, despite the fact that private organizations create these standards, because incorporation by reference essentially gives them the force of law. We emphasized the potential harm to researchers and librarians were public access to standards incorporated by reference into law restricted.
In fact, our brief argues that these standards should not be afforded copyright protection at all. Allowing private organizations to claim copyright in what is effectively the law does not serve the core purpose of copyright—to incentivize new creation for the benefit of the public. Materials authored by the federal government are automatically a part of the public domain, which also supports the important principle that no one can own the law—an idea which is enshrined in our Constitution and court cases dating to the 19th century. Due process—a Constitutional principle requiring the legal rights of all persons to be respected—mandates this kind of access, and it is often painted as one that is “beyond question.” While the standard-setting organizations have online “reading rooms” where the public can access the standards in question, this requires users to register, provide personal information, and agree to lengthy terms of service. As we explain in our brief, this is not sufficient for the free public access that the law requires.
The Decision
In its decision, the court determined that Public Resource’s posting of the standards that were incorporated by reference into law was a fair use, holding that three out of the four fair use factors favored a finding of fair use. While the court did not hold that the standards incorporated by reference into law were free from copyright protection, it did affirm the legal and policy justifications for free public access to the law.
The first fair use factor, the purpose and character of the use, weighed in favor of Public Resource. On this point, the court emphasized that “Public Resource’s use is for nonprofit, educational purposes.” The question of whether a use is commercial can impact the way a court views this factor, as can the degree to which a court finds the use to be “transformative.” The court similarly found that Public Resource’s use was transformative, in that it was new and different from the purpose of the works themselves. Unlike the purposes of the original standards developed by the organizations—to promulgate best practices for industries and problems in the interest of industries and consumers—Public Resource’s purpose was to share with the public “only what the law is, not what industry groups may regard as current best practices.” The court summarized: “Public Resource’s message (‘this is the law’) is very different from the plaintiffs’ message (‘these are current best practices for the engineering of buildings and products’).”
The second fair use factor directs courts to consider the nature of the copyrighted work—in this case, the standards that were incorporated by reference into law. The court found that this factor strongly favored a finding of fair use. The further a work is from the “core of intended copyright protection,” i.e., the more factual and less creative it is, the more this factor favors fair use. In other words, because the standards at issue were highly factual in nature, rather than creative (like fiction writing), the second factor weighed in favor of fair use.
The third fair use factor considers the amount and substantiality of the portion of the original work that was used, asking whether the portion of the work that was used is reasonable in light of the purpose of the secondary user’s use. The court found that this factor also weighed in favor of fair use. The various standards promulgated by the standard-setting organizations tended to be much longer in their entirety than the portions that were incorporated by reference into law. Public Resource only posted the portions of these standards that were incorporated into law, which was of course reasonable in light of its purpose of educating the public about what the law is.
The fourth fair use factor considers the effect of the use on the market for the copyrighted works, and the court found that this factor was, on balance, neutral, and “[did] not significantly tip the balance one way or the other.” The standard-setting organizations argued that their customers—industry members that needed to understand best practices—would fail to pay for the standards if they could obtain them for free from Public Resource. The court pointed out that only the standards incorporated into law were at issue, and the most up-to-date standards relied on by these industries were not necessarily incorporated into law. Moreover, the standard-setting organizations could not actually produce any evidence of market harm, despite the fact that Public Resource had been posting the standards online for approximately 15 years. The court also indicated that the public benefit of sharing this information with the public had to be balanced against any potential market harm. But because there was a possibility that Public Resource’s online posting could have lowered demand for the standards, the court found that this factor was neutral.
Impact on the Fair Use Doctrine
It remains to be seen how this case will impact the fair use doctrine going forward, but the decision seems likely to inform how courts approach at least two questions in future fair use cases.
First, the contours of factor one—the purpose and character of the use—are very much a live issue following the recent decision in Warhol Foundation v. Goldsmith. In that case (in which we also submitted an amicus brief, supporting the Warhol Foundation’s fair use argument), the Supreme Court emphasized the fact that Warhol’s use was commercial in finding the use not to be fair. It seemed to emphasize commerciality over “transformativeness,” a longstanding aspect of factor one analysis (though that court found the use to not be transformative). The court in ASTM v. PRO certainly discussed commerciality as part of factor one, emphasizing Public Resource’s nonprofit status. But regarding the question of transformativeness, the court also gave a lengthy and eloquent summary of the different purposes of the two uses, indicating that transformativeness is still an important inquiry, and is not necessarily secondary to commerciality.
The weight of commerciality in factor one analysis can make a big difference in the outcome of cases, and it is an issue many have been watching given the spate of copyright lawsuits concerning the use of copyrighted works to train generative AI models. This is because while there is a strong argument that the use of training data for these models is highly transformative, it is also true that the companies behind many of the models—like OpenAI, Midjourney, and Stability AI—are commercial in nature, and monetize their programs in different ways. The recent ASTM v. PRO decision could affect how courts weigh the commerciality of these companies’ uses of copyrighted training data against the extent to which the uses are transformative, potentially tipping the scale towards fair use in the upcoming copyright lawsuits about generative AI and training data.
Second, the question of market harm in factor four can be a complicated one, and this case may provide some guidance for courts going forward. This issue came to the fore in the recent decision in Hachette Book Group v. Internet Archive—the case about whether controlled digital lending is a fair use, which we have been covering and involved in for years now, notably as an amicus in support of the Internet Archive. In the Hachette decision, the judge found that factor four weighed in favor of the publishers without direct evidence of financial harm, based on the idea that CDL scans could be substitutes for licensed ebooks. But in ASTM v. PRO, the court was skeptical that an allegation of potential market harm, without actual evidence, was sufficiently convincing. Since Hachette has been appealed and will soon be before the Second Circuit, we are hopeful that ASTM v. PRO will be a useful precedent for those judges. Extending the logic of ASTM v. PRO, it may be that the publishers will need to demonstrate market harm with tangible evidence (such as concrete evidence of lost sales) in that case in order to prevail on factor four.
Today we are very pleased to share an open letter regarding copyright reform on behalf of South African authors. The letter is available here and is also available as a PDF (with names as of today) here.
The letter comes at a critical decision-making moment for South Africa’s Copyright Amendment Bill, which has been debated for years (read more here and here on our views). We believe it is important for lawmakers to hear from authors who support this bill, and in particular hear from us about why we view its fair use provisions and author remuneration provisions so positively.
We welcome other South African authors to add their names to the letter to express their support. You can do so by completing this form.
Authors Alliance members will recall the series of posts we’ve made about the United States’s new copyright small claims court. The below is a post by Dave Hansen and Authors Alliance member Katie Fortney, based on a forthcoming article we recently posted assessing how this court has fared in its first year of operations. This post was originally published on the Kluwer Copyright Blog.
In June 2023 the U.S. Copyright Office celebrated the one-year anniversary of operations of the Copyright Claims Board (“CCB”), a new small claims court housed within the agency with a budget request for $2.2 million in ongoing yearly costs. Though not entirely unique (e.g., the UK’s IP Enterprise court has been described as filling a similar role since 2012), the CCB has been closely watched and hotly debated (see here, here, and here).
The CCB was preceded by years of argument about the benefits and risks of such a small claims court. Proponents argued that the CCB would offer rightsholders a low-cost, efficient alternative to litigation in federal courts (which can easily cost over $100,000 to litigate), allowing small creators to more effectively defend their rights. Opponents feared that the CCB would foster abuse, encouraging frivolous lawsuits while creating a trap for unwary defendants.
We set out to assess these arguments in light of data on the CCB’s first year of operation, which is explored in more detail in our article here, forthcoming in the Journal of the Copyright Society of the USA, with the data used for the article available here. This post summarizes that article, which is itself based on an empirical review of the CCB’s first year of operations using data extracted from the CCB’s online filing system for the 487 claims filed with the court between June 2022 and June 2023.
How the CCB Works
To assess the work of the CCB, it’s first important to understand how the new court works. For claimants to successfully pursue a claim, they must first pass three hurdles:
their claim must be compliant, which means that it must include some key information regarding, e.g., ownership of a copyright, access to the work by the respondent in order to copy it, and substantial similarity between the allegedly infringing copy and the original;
the claimant must serve the respondent and file proof of service with the CCB; and
the claimant must wait 60 days to see if the respondent decides to opt out of the proceedings (in which case the claimant can refile in the more expensive, but more robust federal district court).
Once the opt-out window has passed, the proceeding becomes “active” and a scheduling order is issued. Then the parties can engage in discovery, have hearings and conferences, and eventually receive a final determination where the CCB may award damages.
CCB By the Numbers
In the first year of the CCB, 487 claims were filed. However, only 43 of these 487 claims–less than 9%–had been issued scheduling orders and made it to the active phase by June 15, 2023.
Meanwhile, 302 cases had been closed, most of them dismissed without prejudice (meaning the case did not reach the merits and the claimant could choose to file again). The remaining claims were either awaiting review by the CCB, or waiting for an action from the claimant like filing an amended claim or filing proof of service.
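For readers who want to check the arithmetic behind these figures, here is a minimal Python sketch. It uses only the three counts reported above (487 filed, 43 active, 302 closed). The `share` helper and the variable names are ours for illustration, and the "pending" figure rests on an assumption the post does not state: that no claim is counted as both active and closed.

```python
# Counts as reported in this post for the CCB's first year
# (June 2022 through June 15, 2023).
TOTAL_FILED = 487
ACTIVE = 43
CLOSED = 302


def share(part: int, whole: int) -> float:
    """Return `part` as a percentage of `whole`."""
    return 100 * part / whole


active_pct = share(ACTIVE, TOTAL_FILED)
closed_pct = share(CLOSED, TOTAL_FILED)

# Assumption (ours, not stated in the post): the active and closed
# groups do not overlap, so everything else is still pending.
pending = TOTAL_FILED - ACTIVE - CLOSED

print(f"Active: {active_pct:.1f}% of claims filed")
print(f"Closed: {closed_pct:.1f}% of claims filed")
print(f"Pending, if active and closed do not overlap: {pending}")
```

The active share comes out below 9%, consistent with the figure above. The pending count is only as good as the non-overlap assumption, so treat it as a rough estimate rather than a number taken from the underlying dataset.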
Though the CCB gives claimants multiple opportunities to amend their complaint to fix problems with it (even offering detailed and helpful suggestions on how to fix those problems), over 150 claims were dismissed because the claimant did not file a proper claim. Failure to state facts sufficient to support Access and Substantial Similarity were common problems, showing up about 110 times each in CCB orders to amend (sometimes in the same order to amend). In some cases, however, there was no way to fix the complaint. For example, 35 claims were trying to pursue cases against foreign respondents, over whom the CCB has no jurisdiction. And over 100 claims were copyright infringement claims where the claimant hadn’t filed for copyright registration of the work allegedly infringed (a prerequisite to filing).
Claimants also had problems with service: 60 claims were dismissed in the first year because claimants didn’t file documentation showing that they’d completed valid service. Finally, opt-out (which some proponents of the CCB feared would undermine the court) is an important but much smaller pathway out of the CCB: it accounted for 35 dismissals.
Because copyright is technical and complicated, it may not be surprising to find that having a lawyer helps avoid dismissal: 90% of claims from represented claimants had been certified as compliant; for claims from self-represented claimants, only 46% were compliant. Unrepresented claimants account for over 70% of claims filed, but only 40% of those that make it to the active phase.
Looking more closely at the claimants themselves, we do see that the CCB system is being used by aggressive and prolific copyright litigants, but we haven’t seen the volume of copyright-troll litigation seen in the past in federal district courts. This may be in part because the Copyright Office took these concerns seriously and created rules to discourage it, such as limiting the number of claims a plaintiff can file within one year. The number of repeat filers was low – only nine filers had more than five claims. Those include, however, 17 claims filed by Higbee and Associates (sometimes referred to as a “troll” though the label may not exactly fit), and 20 by David C. Deal (another known and aggressive serial copyright litigant). And the only case in which the CCB had issued an order was in favor of David Oppenheimer, who has separately filed more than 170 copyright suits in federal courts.
Because the process has been so slow, it’s difficult to evaluate how the CCB is working for respondents. Opponents of the CCB feared that its ability to make default determinations (issuing monetary awards when the respondent never shows up) could be a trap for the unwary. The CCB has issued only two such determinations so far (both in August 2023, for $3000 each), and only one final determination that wasn’t the result of a default, withdrawal, or settlement. So, it’s too early to tell how common defaults will be. However, they will continue to be an issue to watch: in the first year, respondents were as likely to end up on the path to default as they were to participate in a proceeding.
Our Takeaways and Conclusion
On the one hand, we haven’t seen rampant abuse of the system. To be sure, serial copyright litigants are actively using the CCB, but in numbers far fewer than previously seen even in federal district court. And damage awards have been modest.
However, it also seems that the CCB has not achieved its promised efficiency for small litigants. For most claimants the system seems to be too complicated and slow: the CCB issued a final determination in only a single case in its entire first year, and the vast majority of claims were dismissed for failure to adequately comply with CCB rules. The CCB has already gone to great lengths to explain the process and to help claimants correct errors early in the process. It may be hard for the CCB to adjust its rules to lower barriers unless it is willing to sacrifice basic procedural safeguards for respondents (something we think it should not do). Despite the hope of advocates and legislators and the admirable efforts of those working at the CCB, the early results lead us to think that it may just be that complex copyright disputes are ill-suited for a self-service small claims tribunal.
Earlier today Authors Alliance joined a broad coalition of public interest organizations, creators, academics, and others in a letter to members of Congress urging caution when considering proposals to revamp copyright law to address concerns about artificial intelligence. As we explained previously, we believe that copyright law currently has the appropriate tools needed to both protect creators and encourage innovation.
As the letter states, the signatories share a common interest in ensuring that artificial intelligence meets its potential to enrich the American economy, empower creatives, accelerate the progress of science and useful arts, and expand humanity’s overall welfare. Many creators are already using AI to conduct innovative new research, address long-standing questions, and produce new creative works. Some, such as some of these artists, have used this technology for many years.
So our message is simple: existing copyright doctrine has evolved and adapted to accommodate many revolutionary technologies, and is well equipped to address the legitimate concerns of creators. Our courts are the proper forum to apply those doctrines to the myriad fact patterns that AI will present over the coming years and decades.
Current copyright law isn’t perfect, and we certainly believe creativity and innovation would benefit from some changes. However, we should be careful about reactionary, alarmist politics. It seldom makes for good law. Unfortunately, that’s what we’re seeing right now with AI, and we hope that Congress has the wisdom to see through it.
We encourage you, our members, to reach out to your own Congressional representative to express the need to tread carefully and, if you are using AI in your work, to explain how. We’d also be very happy to hear from you as we develop our own further policy communications to Congress and to agencies such as the U.S. Copyright Office.
Authors Alliance is pleased to announce that in recent weeks, we have submitted petitions to the Copyright Office requesting that it recommend renewing and expanding the existing text data mining exemptions to DMCA liability. The expansion would make the current legal carve-out that enables text and data mining more flexible, so that researchers can share their corpora of works with other researchers who want to conduct their own text data mining research. On each of these petitions, we were joined by two co-petitioners, the American Association of University Professors and the Library Copyright Alliance. These were short filings, requesting changes and providing brief explanations, and will be the first of many in our efforts to obtain a renewal and expansion of the existing TDM exemptions.
Background
The Digital Millennium Copyright Act (DMCA) includes a provision that forbids people from bypassing technical protection measures on copyrighted works. But it also implements a triennial rulemaking process whereby organizations and individuals can petition for temporary exemptions to this rule. The Office recommends an exemption when its proponents show that they, or those they represent, are “adversely affected in their ability to make noninfringing [fair] uses due to the prohibition on circumventing access controls.” Every three years, petitioners must ask the Office to renew existing exemptions in order for them to continue to apply. Petitioners can also ask the Office to recommend expanding an existing exemption, which requires the same filings and procedure as petitioning for a new exemption.
Back in 2020, during the eighth of these triennial rulemakings, Authors Alliance—along with the Library Copyright Alliance and the American Association of University Professors—petitioned the Copyright Office to create an exemption to DMCA liability that would enable researchers to conduct text and data mining. Text and data mining is a fair use, and the DMCA prohibitions on bypassing DRM and similar technical protection measures made it difficult or even impossible for researchers to conduct text and data mining on in-copyright e-books and films. After a long process which included filing a comment in support of the exemption and an ex parte meeting with the Copyright Office, the Office ultimately recommended that the Librarian of Congress grant our proposed exemption (which she did). The Office also recommended that the exemption be split into two parts, with one exemption addressing literary works distributed electronically, and the other addressing films.
While the ninth triennial rulemaking does not technically happen until 2024, petitions for renewals, expansions, and new exemptions have already been filed.
Our Petitions
Back in early July, we made our first filings with the Copyright Office in the form of renewal petitions for both exemptions. For this step, proponents of current exemptions simply ask the Copyright Office to renew them for another three-year cycle, accompanied by a short explanation of whether and how the exemption is being used and a statement that neither law nor technology has changed such that the exemption is no longer warranted. Other parties are then given an opportunity to respond to or oppose renewal petitions. The Office recommends that exemption proponents who want to expand a current exemption also petition for its renewal, which is just what we did. In our renewal petitions, we explained how researchers are using the exemptions and how neither recent case law nor the continued availability of licensed TDM databases represents a change in the law or technology, making renewal of the TDM exemptions proper and justified. The renewal petitions follow a streamlined process, where they are generally simply granted unless the Office finds there to be “meaningful opposition” to a renewal petition, articulating a change in the law or facts. You can find our renewal petition for the literary works TDM exemption here, and our renewal petition for the film TDM exemption here.
But we also sought to expand the current exemptions, in two petitions submitted a few weeks back. In our expansion petitions, we proposed a simple change that we would like to see made to the current DMCA exemptions for text data mining. In the exemption’s current form, academic researchers can bypass technical protection measures to assemble a corpus on which to conduct TDM research, but they can only share it with other researchers for purposes of “collaboration and verification.” We asked the Office to permit these researchers to share their corpora with other researchers who want to use the corpus to conduct TDM research, but are not direct collaborators. However, this second group of researchers would still have to comply with the various requirements of the exemption, such as complying with security measures. Essentially, we seek to expand the sharing provision of the current exemption while leaving the other provisions intact. This is largely based on feedback we have received from those using the exemption and our understanding of how the regulation can be improved so that their desired noninfringing uses are no longer adversely affected by this limitation. You can find our expansion petition for the literary works TDM exemption here, and our expansion petition for the film TDM exemption here.
What’s Next?
The next step in the triennial rulemaking process is the Copyright Office issuing a notice of proposed rulemaking, where it will lay out its plan of action. While we do not have a set timeline for the notice of proposed rulemaking, during the last rulemaking cycle it was issued in mid-October, meaning it is reasonable to expect the Office to issue this notice in the next two months or so. Then, there will be several rounds of comments in support of or in opposition to the proposals. Finally, the Office will issue a final recommendation, and the Librarian of Congress will issue a final rule. While the Librarian of Congress is not legally obligated to adopt the Copyright Office’s recommendations, they traditionally do. Based on the last cycle, we can expect a final rule to be issued around October 2024. So we are in for a long wait and a lot of work! We will keep our readers updated as the rulemaking moves forward.
Authors Alliance readers will surely have noticed that we have been writing a lot about generative AI and copyright lately. Since the Copyright Office issued its decision letter on copyright registration in a graphic novel that included AI-generated images a few months back, many in the copyright community and beyond have struggled with the open questions around generative AI and copyright.
The Copyright Office has launched an initiative to study generative AI and copyright, and today issued a notice of inquiry to solicit input on the issues involved. The Senate Judiciary Committee has also held multiple hearings on IP rights in AI-generated works, including one last month focused on copyright. And of course there are numerous lawsuits pending over the legality of generative AI, based on theories ranging from copyright infringement to privacy to defamation. It’s also clear that there is little agreement about a one-size-fits-all rule for AI-generated works that applies across industries.
At Authors Alliance, we care deeply about access to knowledge because it supports free inquiry and learning, and we are enthusiastic about ways that generative AI can meaningfully further those ideals. In addition to all the mundane but important efficiency gains generative AI can assist with, we’ve already seen authors incorporate generative AI into their creative processes to produce new works. We’ve also seen researchers incorporate these tools to help make new discoveries. There are also some clear concerns: for example, generative AI tools can make it easier to engage in fraud and deception and to perpetuate disinformation. There have been many calls for legal regulation of generative AI technologies in recent months, and we wanted to share our views on the copyright questions generative AI poses, recognizing that this is a still-evolving set of questions.
Copyright and AI
Copyright is at its core an economic regulation meant to provide incentives for creators to produce and disseminate new expressive works. Ultimately, its goal is to benefit the public by promoting the “progress of science,” as the U.S. Constitution puts it. Because of this, we think new technology should typically be judged by what it accomplishes with respect to those goals, and not by the incidental mechanical or technological means that it uses to achieve its ends.
Within that context, we see generative AI as raising three separate and distinct legal questions. The first and perhaps most contentious is whether fair use should permit use of copyrighted works as training data for generative AI models. The second is how to treat generative AI outputs that are substantially similar to existing copyrighted works used as inputs for training data—in other words, how to navigate claims that generative AI outputs infringe copyright in existing works. The third question is whether copyright protection should apply to new outputs created by generative AI systems. It is important to consider these questions separately, and avoid the temptation to collapse them into a single inquiry, as different copyright principles are involved. In our view, existing law and precedent give us good answers to all three questions, though we know those answers may be unpalatable to different segments of a variety of content industries.
Training Data and Fair Use
The first area of difficulty concerns the input stage of generative AI. Is the use of training data which includes copyrighted works a fair use, or does it infringe on a copyright owner’s exclusive rights in her work? The generative AI models used by companies like OpenAI and Stability AI (the developer of Stable Diffusion) are based on massive sets of training data. Much of the controversy around intellectual property and generative AI concerns the fact that these companies often do not seek permission from rights holders before training their models on works controlled by these rights holders (although some companies, like Adobe, are building generative AI models based on their own stock images, openly-licensed images, and public domain content). Furthermore, due to the size of the data sets and nature of their collection (often obtained via scraping websites), the companies that deploy these models do not make clear what works make up the training data. This question is controversial and highly debated in the context of written works, images, and songs. Some creators and creator communities in these areas have made calls for “consent, credit, and compensation” when their works are included in training data. The obstacle to that point of view is that, if the use of training data is a fair use, none of this is required, at least not by copyright.
We believe that the use of copyrighted works as training data for generative AI tools should generally be considered fair use. We base this view on our reading of numerous fair use precedents, including the Google Books and HathiTrust cases as well as others such as iParadigms. These and other cases support the idea that fair use allows for copying for non-expressive uses: copying done as an “intermediate step” in producing non-infringing content, such as by extracting non-expressive content such as patterns, facts, and data in or about the work. The notion that non-expressive (also called “non-consumptive”) uses do not infringe copyrights is based in large part on a foundational principle in copyright law: copyright protection does not extend to facts or ideas. If it did, copyright law would run the risk of limiting free expression and inhibiting the progress of knowledge rather than furthering it. Using in-copyright works to create a tool or model with a new and different purpose from the works themselves, which does not compete with those works in any meaningful way, is a prototypical fair use. Like the Google Books project (as well as text data mining), generative AI models use data (like copyrighted works) to produce information about the works they ingest, including abstractions and metadata, rather than replicating expressive text.
In addition, fair use of copyrighted works as training data for generative AI has several practical implications for the public utility of these tools. For example, without it, AI could be trained on only “safe materials,” like public domain works or materials specifically authorized for such use. Models already apply certain filters, often excluding hateful content or pornography from their training sets. However, a more general limit on copyrighted content (virtually all creative content published in the last one hundred years) would tend to amplify bias and the views of an unrepresentative set of creators.
Generative AI Outputs and Copyright Infringement
The feature that most distinguishes generative AI from technology in copyright cases that preceded it, such as Google Books and HathiTrust, is that generative AI ingests copyrighted works not only for the purpose of extracting data for analysis or search functionality, but also for the purpose of using this extracted data to produce new content. Can content produced by a generative AI tool infringe on existing copyrights?
Some have argued that the use of training data in this context is not a fair use, and is not truly a “non-expressive use” because generative AI tools produce new works based on data from originals and because these new works could in theory serve as market competitors for works they are trained on. While it is a fair point that generative AI is markedly different from those earlier technologies because of these outputs, the point also conflates the question of inputs and outputs. In our view, using copyrighted works as inputs to develop a generative AI tool is generally not infringement, but this does not mean that the tool’s outputs can’t infringe existing copyrights.
We believe that while the use of inputs as training data is largely justifiable as fair use, it is entirely possible that certain outputs may cross the line into infringement. In some cases, a generative AI tool can fall into the trap of memorizing inputs such that it produces outputs that are essentially identical to a given input. While evidence to date indicates that memorization is rare, it does exist.
So how should copyright law address outputs that are essentially memorized copies of inputs? We think the law already has the tools it needs to address this. Where fair use does not apply, copyright’s “substantial similarity” doctrine is equipped to handle the question of whether a given output is similar enough to an input to be infringing. The substantial similarity doctrine is appropriately focused on protection of creative expression while also providing room for creative new uses that draw on unprotectable facts or ideas. Substantial similarity is nothing new: it has been a part of copyright infringement analysis for decades, and is used by federal courts across the country. And it may well be that standards, such as a set of “Best Practices for Copyright Safety for Generative AI” proposed by law professor Matthew Sag, will become an important measure of assessing whether companies offering generative AI have done enough to guard against the risk of their tools producing infringing outputs.
Copyright Protection of AI Outputs
A third major question is, what exactly is the copyright status of the outputs of generative AI programs: are they protected by copyright at all, and if so, who owns those copyrights? Under the Copyright Office’s recent registration guidance, the answer seems to be that there is no copyright protection in the outputs. This does not sit well with some generative AI companies or many creators who rely on generative AI programs in their own creative work.
We generally agree with the Copyright Office’s recent guidance concerning the copyright status of AI-generated works, and believe that they are unprotected by copyright. This is based on the simple but enduring “human authorship” requirement in copyright law, which dates back to the late 19th century. In order to be protected by copyright, a work must be the product of a human author and contain a modicum of human creativity. Purely mechanical processes that occur without meaningful human creative input cannot generate copyrightable works. The Office has categorized generative AI models as this kind of mechanical tool: the output responds to the human prompt, but the human making the prompt does not have sufficient control over how the model works to make them an “author” of the output for the purposes of copyright law. The district court for D.C. recently issued a decision agreeing with this take in Thaler v. Perlmutter, a case that challenged the human authorship requirement in the context of generative AI.
It’s interesting to note here that in the Copyright Office listening session on text-based works, participants nearly universally agreed that outputs should not be protected by copyright, agreeing with the Copyright Office’s guidance. Yet the other listening sessions had more of a diversity of views. In particular, the participants in the listening sessions on audiovisual works and sound recordings were concerned about this issue. In industries like the music and film industries, where earlier iterations of generative AI tools have long been popular (or are even industry norms), the prospect of being denied copyright protection in songs or films, simply due to the tools used, can understandably be terrifying for creators who want to make a profit from their works. On this front, we’re sympathetic. Creators who rely on their copyrights to defend and monetize their works should be permitted to use generative AI as a creative tool without losing that protection. While we believe that the human authorship requirement is sound, it would be helpful to have more clarity on the status of works that incorporate generative AI content. How much additional human creativity is needed to render an AI-generated work a work of human authorship, and how much can a creator use a generative AI tool as part of their creative process without forgoing copyright protection in the work they produce? The Copyright Office seems to be grappling with these questions as well and seeking to provide additional guidance, such as in a recent webinar with more in-depth registration guidance for creators relying on generative AI tools in their creative efforts.
Other Information Policy Issues Affecting Authors
Generative AI has generated questions in other areas of information policy beyond the copyright questions we discuss above. Fraudulent content or disinformation, the harm caused by deep fakes and soundalikes, defamation, and privacy violations are serious problems that ought to be addressed. Those uses do nothing to further learning, and actually pollute public discourse rather than enhance it. They can also cause real monetary and reputational harm to authors.
In some cases, these issues can be addressed by information policy doctrines outside of copyright, and in others, they can be best handled by regulations or technical standards addressing development and use of generative AI models. A sound application of state laws such as defamation law, right of publicity laws, and various privacy torts could go a long way towards mitigating these harms. Some have proposed that the U.S. implement new legislation to enact a federal right of publicity. This would represent a major change in law and the details of such a proposal would be important. Right now, we are not convinced that this would serve creators better than the existing state laws governing the right of publicity. While it may take some time for courts to figure out how to adapt legal regimes outside of copyright to questions around generative AI, adapting the law to new technologies is nothing new. Other proposals call for regulations like labeling AI-generated content, which could also be reasonable as a tool to combat disinformation and fraudulent content.
In other cases, creators’ interests could be protected through direct regulation of the development and use of generative AI models. For example, certain creators’ desire for consent, credit, and compensation when their works are included in training data sets for generative AI programs is an issue that could perhaps be addressed through regulation of AI models. As for consent, some have called for an opt-out system where creators could have their works removed from the training data, or the deployment of a “do not train” tag similar to the robots.txt “do not crawl” tag. As we explain above, under the view that training data is generally a fair use, this is not required by copyright law. But the view, which many hold, that using copyrighted training data without some sort of recognition of the original creator is unfair may support arguments for other regulatory or technical approaches that would encourage attribution and pathways for distributing new revenue streams to creators.
Similarly, some have called for collective licensing legislation for copyrighted content used to train generative AI models, potentially as an amendment to the Copyright Act itself. We believe that this would not serve the creators it is designed to protect and we strongly oppose it. In addition to conflicting with the fundamental principles of fair use and copyright policy that have made the U.S. a leader in innovation and creativity, collective licensing at this scale would be logistically infeasible and ripe for abuse, and would tend to enrich established, mostly large rights holders while leaving out newer entrants. Similar efforts several years ago were proposed and rejected in the context of mass digitization based on similar concerns.
Generative AI and Copyright Going Forward
What is clear is that the copyright framework for AI-generated works is still evolving, and just about everyone can agree on that. Like many individuals and organizations, our views may well shift as we learn more about the real-world impacts of generative AI on creative communities and industries. It’s likely that as these policy discussions continue to move forward and policymakers, advocacy groups, and the public alike grapple with the open questions involved, the answers to these open questions will continue to develop. Changes in generative AI technology and the models involved may also influence these conversations. Today, the Copyright Office issued a notice of inquiry on the topic of copyright in AI-generated works. We plan to submit a comment sharing our perspective, and are eager to learn about the diversity of views on this important issue.