We are pleased to announce that we have submitted a comment to the Copyright Office in response to its recent notice of inquiry (NOI) regarding how copyright law interacts with generative AI. In our comment, we shared our views on copyright and generative AI (which you can read about here) and the stories we heard from authors about how they are using generative AI to support their creative labors, research, and the mundane but important tasks involved in being a working author. The Office received over 10,000 comments in response to its NOI, showing the high level of interest in how copyright regulates AI-generated works and training data for generative AI. We hope the Office will appreciate our perspective as it considers policy interventions to address copyright issues involved in the use of generative AI by creators. You can read our full comment here, or at the bottom of this post.
You can hear more about our comment, and about contributions from other commenters, at the Berkeley Center for Law and Technology virtual roundtable on Monday, November 13th, where Authors Alliance senior staff attorney Rachel Brooke will be a panelist. The event is free and open to the public, and you can sign up here.
Background
Since the Copyright Office issued an opinion letter on copyright in a graphic novel containing AI-generated images back in February, the debate about copyright and generative AI has grown to a near fever pitch. Authors Alliance has been engaged in these issues since the decision letter was released: we exist to support authors who want to leverage the tools available in the digital age to see their creations reach broad audiences and create innovative new works, and we see generative AI systems as one such tool that can support authors and authorship. We participated in the Copyright Office’s listening session on copyright issues in AI-generated textual works this spring, and were eager to further weigh in as the Copyright Office wades through the thorny issues involved.
In late August, the Copyright Office issued a notice of inquiry, asking stakeholders to weigh in on a series of questions about copyright policy and generative AI. These were broken down into general questions, questions about training AI models, questions about transparency and recordkeeping, and various issues related to AI outputs—copyrightability, infringement, and labeling and identification.
Our Comment
Our comment was devoted in large part to sharing the ways that authors are using generative AI systems and tools to support their creative labors and research. We heard from authors who used generative AI systems for ideation, late-stage editing, and generating text. We also learned that authors are using generative AI systems in ways we wouldn’t have anticipated, such as creating books of prompts for other authors to use as inputs for generative AI systems. Generative AI has helped authors who don’t publish with conventional publishers create marketing copy and even generate book covers (despite the common adage, these are pretty important for attracting readers). We also heard from researchers who use generative AI for literature reviews and to make their writing process more efficient, so they can focus on the work of researching and innovating. Generative AI also has the potential to lower barriers to entry for scientific researchers who are not native English speakers but want to contribute to scientific fields in which the literature tends to be written in English.
We also spent some time explaining our views on why the use of copyrighted materials in training datasets for AI models constitutes fair use and how fair use analysis applies when copyrighted materials are included in training datasets. The use of creative works in training datasets is a transformative one with a different purpose than the works themselves, regardless of whether the institutions that develop and deploy the models are commercial or nonprofit. And it’s highly unlikely that a generative AI system could harm the markets for the works in the training sets for the underlying models: a generative AI system is not a substitute for a book a reader is interested in reading, for example. We also explained that the market harm consideration (factor four in fair use analysis) should consider the effect of the use (using a work as training data for AI models) on the market for the specific work in question (i.e., in an infringement action, the work that is alleged to have been infringed), and not the market for that author’s other works, similar works, or anything else.
Our comment also argued that new copyright legislation on AI—either to codify copyright’s human authorship requirement and explain how it applies to AI-generated content or to address other issues related to copyright and generative AI—is not warranted. AI systems, AI models, and the ways creators use them are still evolving. Copyright law is already highly flexible, having adapted to new technologies that weren’t anticipated when the copyright legislation itself was enacted. And legislating around nascent technologies can result in laws that are eventually ill-suited to deal with unexpected challenges that new technologies bring about (recall that the DMCA, which has faced a lot of criticism as a statute intended to regulate copyright online, was passed in 1998). We instead suggest that the Office stick with a “wait and see” approach as generative AI and how we use it continue to develop rather than recommending legislation to Congress.
Next, we explained why a licensing system for copyrighted works in AI training data is neither desirable nor practicable. Because we consider the use of copyrighted works in training data to be a fair use, licenses are not necessary in the first place. We also explained the host of problems that either a compulsory licensing regime or a collective licensing scheme would bring about. The large size of datasets for training AI models makes it difficult to envision systematically seeking licenses for each and every copyrighted work in the training dataset, and the “orphan works problem” means that a majority of rightsholders might be impossible to find. It’s also not clear who would administer licensing under a licensing regime, and we could not think of any appropriate party that exists or is likely to emerge. The Office’s past failed investigations into possible collective rights management organizations (or CMOs) only underscore this point.
Finally, we reiterated our support for the substantial similarity test as a way to handle generative AI outputs that look very similar to existing copyrighted works. The substantial similarity test has been around for decades and has been applied across the country in a variety of contexts. It seems to us to be a good way to approach the rare cases in which generative AI outputs are strikingly similar to copyrighted works (so-called “memorization”) such that a rightsholder might sue for infringement.
What’s Next?
The same day we submitted our comment, the Biden Administration released an executive order on “Safe, Secure, and Trustworthy Artificial Intelligence,” directing federal agencies to take a variety of measures to ensure that the use of generative AI is not harmful to innovation, privacy, labor, and more. Then on Wednesday, representatives from a coalition of countries (including the U.S.) signed “The Bletchley Declaration” following an AI Safety Summit in the U.K., warning of the dangers of generative AI and pledging to work together to find solutions. All of this is to say that how public policy should regulate generative AI, and whether and how the law needs to change to accommodate it, is a live issue that continues to evolve every day. Dozens of lawsuits are pending about the interaction between copyright and the use of generative AI systems, and as these cases move through the courts, judges will have their opportunity to weigh in. As ever, we will keep our readers and members apprised of any new legal developments around copyright and generative AI.
COLC-2023-0006-8976_attachment_1