An Update on our Text and Data Mining: Demonstrating Fair Use Project

Posted April 28, 2023

Back in December we announced a new Authors Alliance project, Text and Data Mining: Demonstrating Fair Use, which is about lowering and overcoming legal barriers for researchers who seek to exercise their fair use rights, specifically within the context of text and data mining (“TDM”) research under current regulatory exemptions. We’ve heard from lots of you about the need for support in navigating the law in this area. This post gives a few updates.

Text and Data Mining Workshops and Consultations

We’ve had a tremendous amount of interest and engagement with our offers to hold hands-on workshops and trainings on the scope of legal rights for TDM research. Already this spring, we’ve been able to hold two workshops in the Research Triangle hosted at Duke University, and a third workshop at Stanford followed by a lively lunch-time discussion. We have several more coming. Our next stop is in a few weeks at the University of Michigan, and we have plans in the works for workshops in the Boston area, New York, a few locations on the West Coast, and potentially others as well. If you are interested in attending or hosting a workshop with TDM researchers, librarians, or other research support staff, please let us know! We’d love to hear from you. The feedback so far has been really encouraging, and we have heard both from current TDM researchers and those for whom the workshops have opened their eyes to new possibilities. 

ACH Webinar: Overcoming Legal Barriers to Text and Data Mining
Join us! In addition to the hands-on in-person workshops on university campuses, we’re also offering online webinars on overcoming legal barriers to text and data mining. Our first is hosted by the Association for Computers and the Humanities on May 15 at 10am PT / 1pm ET. All are welcome to attend, and we’d love to see you online!
Read more and register here. 

Research 

A second aspect of our project is to research how the current law can both help and hinder TDM researchers, with specific attention to fair use and the DMCA exemption that Authors Alliance obtained for TDM researchers to break digital locks when building a corpus of digital content such as ebooks or DVDs.

Christian Howard-Sukhil, Authors Alliance Text and Data Mining Legal Fellow

To that end, we’re excited to announce that Christian Howard-Sukhil will be joining Authors Alliance as our Text and Data Mining Legal Fellow. Christian holds a PhD in English Language and Literature from the University of Virginia and is currently pursuing a JD from the UC Berkeley School of Law. Christian has extensive digital humanities and text data mining experience, including in previous roles at UVA and Bucknell University. Her work with Authors Alliance will focus on researching and writing about the ways that current law helps or hinders text and data mining researchers in the real world. 

The research portion of this project is focused on the practical implications of the law and will be based heavily on feedback we hear from TDM researchers. We’ve already had the opportunity to gather some feedback from researchers including through the workshops mentioned above, and plan to do more systematic outreach over the coming months. Again, if you’re working in this field (or want to but can’t because of concerns about legal issues), we’d love to hear from you. 

At this stage we want to share some preliminary observations, based on recent research into these issues (supported by the work of several teams of student clinicians) as well as our recent and ongoing work with TDM researchers:

1) License restrictions are a problem. We’ve heard clearly that licenses and terms of use impose a significant barrier to TDM research. While researchers are able to identify uses that would qualify as fair use and also many uses that likely qualify under the DMCA exemption, terms of use accompanying ebook licenses can override both. These terms vary, from very specific prohibitions–e.g., Amazon’s, which says that users “may not attempt to bypass, modify, defeat, or otherwise circumvent any digital rights management system”–to more general prohibitions on uses that go beyond the specific permissions of the license–e.g., Apple’s terms, which state that “No portion of the Content or Services may be transferred or reproduced in any form or by any means, except as expressly permitted.” Even academic licenses, often negotiated by university libraries to have more favorable terms, can still impose significant restrictions on reuse for TDM purposes. Although we haven’t heard of aggressive enforcement of those terms to restrict academic uses, even the mere existence of those terms can have chilling and negative real-world impacts on research using TDM techniques.

The problem of licenses overriding researchers’ rights under fair use and other parts of copyright law is of course not limited to inhibiting text and data mining research. We wrote about the issue, and how easy it is to evade fair use, a few months ago, discussing the many ways that restrictive license terms can inhibit normal, everyday uses of works such as criticism, commentary, and quotation. We are currently working on a separate paper documenting the scope and extent of “contractual override,” and will be part of a symposium on the subject in May, hosted by the Association of Research Libraries and the American University, Washington College of Law Program on Information Justice and Intellectual Property.

2) The TDM exemption is flexible, but local interpretation and support can vary. We’ve heard that the current TDM exemption–allowing researchers to break technological protection measures such as DRM on ebooks and CSS on DVDs–is an important tool to facilitate research on modern digital works. And we believe the terms of that exemption are sufficiently flexible to meet the needs of a variety of research applications (how wide a variety remains to be seen through more research). But local understanding and support for researchers using the exemption can vary. 

For example, the exemption requires that the university that the TDM research is associated with implement “effective security measures” to ensure that the corpus of copyrighted works isn’t used for another purpose. The regulation further explains that in the absence of a standard negotiated with content holders, “effective security measures” means “measures that the institution uses to keep its own highly confidential information secure.” University IT data security standards don’t always use the same language or define their standard to cover “highly confidential information,” and so university IT offices must interpret this language and implement the standard in their own local context. This can create confusion about what precisely universities need to do to secure the TDM corpora.

Some of these definitional issues are likely growing pains: the exemption is still new, and universities need time to understand and implement standards to satisfy its terms in a reasonable way. Still, it will be important to explore further where there is confusion on similar terms and how that might best be resolved.

3) Collaboration and sharing are important. Text and data mining projects are often conceived of as part of a much larger research agenda, with multiple potential research outputs both from the initial inquiry and follow-up studies with a number of researchers, sometimes from a number of institutions. Fair use clearly allows for collaborative TDM work: in Authors Guild v. HathiTrust, a foundational fair use case for TDM research in the US, the entire structure of HathiTrust is a collective of research institutions with shared digital assets. And likewise, the TDM exemption permits a university to provide access to “researchers affiliated with other institutions of higher education solely for purposes of collaboration or replication of the research.” The collaborative aspect of this work raises some challenging questions, both operationally and conceptually. For example, the exemption for breaking digital locks doesn’t define precisely who qualifies as a researcher who is “affiliated,” leaving open questions for universities implementing the regulation. More conceptually, the issue of research collaboration raises questions about how precisely the TDM purpose must be defined when building a corpus under the existing exemption, for example when researchers collaborate but investigate different research questions over time. Finally, the issue of actually sharing copies of the corpus with researchers at other institutions is important because, at least in some cases, local computing power is needed to effectively engage with the data.

Again, just preliminary research, but some interesting and important questions! If you are working in this area in any capacity, we’d love to talk. The easiest way to reach us is at info@authorsalliance.org.

Want to Learn More?
This current Authors Alliance project is generously supported by the Mellon Foundation, which has also supported a number of other important text and data mining projects. We’ve been fortunate to be part of a broader network of individuals and organizations devoted to lowering legal barriers for TDM researchers. This includes efforts spearheaded by a team at UC Berkeley to produce the “Legal Literacies for Text Data Mining” and its current project to address cross-border TDM research, as well as efforts from the Global Network on Copyright and User Rights, which has (among other things) led efforts on copyright exceptions for TDM globally.

Authors Alliance Joins Copyright Office Listening Session On Copyright in AI-Generated Literary Works

Posted April 20, 2023
Photo by Possessed Photography on Unsplash

Yesterday, I represented Authors Alliance in a Copyright Office listening session on copyright issues in AI-generated literary works, the first of two such sessions that the Office convened that afternoon. I was pleased to be invited to share our views with the Office and participate in a rousing discussion among nine other stakeholders, representing a diverse group of industries and positions. Generative AI raises challenging legal questions, particularly for its skeptics, but it also presents some incredible opportunities for authors and other creators.

During the listening session, I emphasized the potential for generative AI programs (like OpenAI’s ChatGPT, Microsoft’s Bing AI, Jasper, and others) to support authorship in a number of different ways. For instance, generative AI programs support authors by increasing the efficiency of some of the practical aspects of being a working author aside from their writings. But more importantly, generative AI programs can actually help authors express themselves and create new works of authorship.

In the first category, generative AI programs can support authors by, for example, helping them create text for pitch letters to send to agents and editors, produce copy for their professional websites, and develop marketing strategies for their books. Making these activities more efficient frees up time for authors to focus on their writing, particularly for authors whose writing time is limited by other commitments. 

In the second category, generative AI has tremendous potential to help authors come up with new ideas for stories, develop characters, summarize their writings, and perform early stage edits of manuscripts. Moreover, and particularly for academic authors, generative AI can be an effective research tool for authors seeking to learn from a large corpus of texts. Generative AI programs can help authors research by providing short and simple summaries of complex issues, surveys of the landscape of various fields, or even guidance on what human works to turn to in their research. Authors Alliance is committed to protecting authors’ right to conduct research, and we see generative AI tools as a new, innovative, and efficient form of conducting this research. Making research easier helps authors save time, and has a particular benefit for authors with disabilities that make it difficult to travel to multiple libraries or otherwise rely on analog forms of research. 

These programs undoubtedly have the potential to serve as powerful creative tools that support authorship in these ways and more, but, when discussing the copyright implications of the programs and the works they produce, it’s important to remember just how new these technologies are. Because generative AI remains in its infancy, and the costs and benefits for different segments of the creative industry have yet to be seen, it seems to me to be sensible to preserve the development of these tools before crafting legal solutions to problems they might pose in the future. And in fact, in our view, U.S. copyright law already has the tools to deal with many of the legal challenges that these programs might pose. When generative AI outputs look too much like the copyrighted inputs they are trained on, the substantial similarity test can be used to assess claims of copyright infringement and to vindicate an author’s exclusive rights in their works when those outputs do infringe.

In any case, in order for generative AI programs to be effective creative tools, it’s necessary that they are trained on large corpora. Narrowing the corpus of works the programs are trained on (through compulsory licensing or other mechanisms) can have disastrous effects. For example, research has shown that narrow data sets are more likely to produce racial and gender bias in AI outputs. In our view, the “input” step, where the programs are trained on a large corpus of works, is a fair use of these texts. And the holdings in Google Books and HathiTrust indicate that it is consistent with fair use to build large corpora of works, including works that remain protected by copyright, for applications such as computational research and information discovery. Additionally, the Copyright Office has recognized this principle in the context of research and scholarship, as demonstrated by its approval of Authors Alliance’s petition for an exemption from DMCA restrictions for text and data mining.

The question of the copyright status of AI-generated works is an important one. Most if not all of the stakeholders participating in this discussion agreed with the Copyright Office’s recent guidance regarding registration in AI-generated works: under ordinary copyright principles, the lack of human authorship means these texts are not protected by copyright. This being said, we also recognize that there may be challenges in reconciling existing copyright principles with these new types of works and the questions about authorship, creativity, and market competition that they might pose. 

But importantly, while this technology is still in its early stages, it serves the core purposes of copyright—furthering the progress of science and the useful arts by incentivizing new creation—to allow these systems to develop and confront new legal challenges as they emerge. Copyright is not only about protecting the exclusive rights of copyright holders (a concern that underlies many arguments against generative AI as a fair use), but incentivizing creativity for the public benefit. The new forms of creation made possible through generative AI can incentivize people who would not otherwise create expressive works to do so, bringing more people into creative industries and adding new creative expression to the world to the benefit of the public.

The listening sessions were recorded, and will be available on the Copyright Office website in the coming weeks. And these listening sessions are only the beginning of the Office’s investigation of copyright in AI-generated works. Other listening sessions on visual works, music, and audiovisual works will be held in the coming weeks, and the Office has indicated that there will be an opportunity for written public comments in order for stakeholders to weigh in further. We are committed to remaining involved in these cutting-edge issues, through written comments and otherwise, and we will keep our readers informed as policy around generative AI continues to evolve.

Authors Alliance Submits Comment to Copyright Office Regarding Ex Parte Communications

Posted April 4, 2023
Photo by erica steeves on Unsplash

Yesterday, Authors Alliance submitted a comment to the U.S. Copyright Office in response to a notice of proposed rulemaking asking for feedback from the public on new rules to govern ex parte communications. “Ex parte communications” refer to communications outside the normal, permitted channels of communication: in this case, communications between organizations or members of the public and Copyright Office staff outside of hearings or other formal proceedings. Ex parte communications with the Copyright Office are important because they allow stakeholders and the Office to work out open questions in rulemakings or other proceedings outside of the formal channels. Authors Alliance relied on our ability to make ex parte communications during the last Section 1201 rulemaking cycle (where we obtained our text data mining exemption) in order to clarify certain issues. Now, the Office is proposing to establish formal rules for how these communications can be made, as well as transparency around them. We support this proposal, and shared our thoughts in a comment. You can read our full comment here.

Judge Rules Against Internet Archive on Controlled Digital Lending

Posted March 28, 2023
Photo by Wesley Tingey on Unsplash

On Friday, Southern District of New York Judge John Koeltl issued a much-anticipated decision in Hachette Book Group v. Internet Archive. Unfortunately, as many of our members and allies are aware, the judge ruled against the Internet Archive, finding that its CDL program was not protected by the doctrine of fair use and granting the publishers’ motion for summary judgment. You can read the 47-page decision for yourself here.

In his fair use analysis, Judge Koeltl found that each of the four fair use factors weighed in favor of the publishers, emphasizing above all else his view that IA’s controlled digital lending program was not transformative, an important consideration under the first fair use factor, which considers the purpose and character of the use. This inquiry also involves asking whether the use in question was commercial. To the surprise of many, the decision stated that IA’s use of the publishers’ works was commercial, because the Open Library is part of the IA’s website, which it uses “to attract new members, solicit donations, and bolster its standing in the library community.” The judge found this to be the case in spite of the fact that IA “does not make a monetary profit” from CDL. In other words, the judge held that the indirect, attenuated benefits the Internet Archive (which is, after all, a nonprofit) reaps from operating the Open Library make its CDL program commercial.

Judge Koeltl gave less attention to the fourth factor in the fair use analysis, “the effect of the use on the potential market for the work,” which is often held up to be of significant importance. One consideration under this factor is whether the use creates a competing substitute for the original work. Unfortunately, on this point too, the court, in our view, missed the mark. This is because the decision does not draw a distinction between CDL scans and ebooks, going so far as to call CDL scans “ebooks” throughout. As we explained in our summary of the proceedings last week, many features of CDL scans and ebooks make them both functionally and aesthetically distinct from one another. By glossing over these differences, the judge reached the conclusion that CDL scans are direct substitutes for licensed ebooks.

Authors Alliance is deeply concerned about the ramifications of this decision, which was exceedingly broad in scope, striking a tremendous blow to the CDL model, rather than only IA’s implementation of it. Local libraries across the country practice CDL, and library patrons and authors alike depend on it to read, research, and participate in academic discourse. 

As it stands, this decision only applies to Internet Archive and is only about the 127 books on which the publishers based their lawsuit. It does not set a binding precedent for any other library, but if left in place (or worse, if affirmed on appeal), it could cause libraries to avoid digitizing and lending books under a CDL model, which in our view would not serve the interests of many authors. This decision makes it harder for those authors to reach wide audiences: CDL enables many authors to reach more readers than they could otherwise, and authors like our members who write to be read would not be served if fewer readers could access their books. 

The decision also hampers efforts to preserve books—aside from IA’s scanning program, there are few if any centralized efforts to preserve books in digital format once their commercial life is over. Without CDL, those books could quite literally disappear, and the knowledge they advance could be lost. IA’s scanning operations do preserve such books, which is one reason we have strongly supported them in this lawsuit. By the same token, if this decision stands, it will also limit authors’ ability to conduct efficient research online. The CDL survey we launched last year revealed that CDL is an effective research tool for authors who need to consult other books as part of their writing process, and in many cases it enables them to access far more works than they could at their local library alone. Authors who rely on CDL in this way would be harmed by this decision, as they could well be forced to undergo a more time-consuming research process, detracting from time that could be spent writing. 

The Internet Archive has already indicated that it will be appealing Judge Koeltl’s ruling, and we look forward to supporting those efforts. We will continue to keep our readers and members apprised of updates as this case moves forward.

Judge Hears Oral Arguments in Hachette Book Group v. Internet Archive

Posted March 20, 2023
Photo by Timothy L Brock on Unsplash

Earlier today, Judge John Koeltl of the Southern District of New York heard oral arguments in Hachette Book Group v. Internet Archive—a case Authors Alliance has been following since the lawsuit was first filed back in 2020. The case is about—among other things—whether Internet Archive’s controlled digital lending program qualifies as a fair use. Authors Alliance submitted an amicus brief in support of the Internet Archive back in July, arguing that CDL serves the interests of authors who write to be read. IA’s attorney cited to our brief during oral argument, and we are pleased that we were able to magnify the voices of authors who write to be read through its submission. You can learn more about the case and read our brief here.

In the hearing, the judge considered each party’s motion for summary judgment. The parties hotly contested a number of key issues in the case, including whether each side’s experts had properly demonstrated market harm (or lack thereof), what the appropriate market to consider was for purposes of fair use analysis, the commerciality of IA’s use, and which legal cases supported the arguments for and against fair use. Judge Koeltl asked the Internet Archive’s attorney a number of probing questions on these points, grappling with the difficult questions in this case. The judge further implied that there may be open issues of fact in this case, which could indicate the need for additional briefings or hearings.

CDL and Commerciality

The parties disagreed on the commerciality of IA’s use when it produces and makes CDL scans available. The publishers’ attorney argued that IA’s CDL operations are “intertwined” with its other functions, such as its ownership of the book vendor Better World Books, and further emphasized the argument that CDL loans result in lost revenue for the publishers. In other words, the publishers contend that the supposed commercial harm that results from CDL lending makes the CDL lending itself commercial. The Internet Archive’s attorney answered that IA is a nonprofit organization that does not profit at all from its CDL program. He pointed to the fact that traditional library lending is not commercial in nature and does not provide libraries like IA with commercial benefits.

CDL and Market Effects

The plaintiffs’ attorney began by setting forth plaintiffs’ views on the issue of market harm, the fourth factor in fair use analysis and one often cited as among the most important in the inquiry. Plaintiffs discussed what they see as massive financial harm stemming from IA’s CDL program, which they estimated to amount to “millions of dollars in licensing revenues.” Plaintiffs also emphasized that, were CDL “given the green light,” or upheld as a fair use, the plaintiffs would suffer even greater losses. Throughout her argument, plaintiffs’ attorney emphasized that the “basic economic principle and common sense is that you cannot compete with free.” In other words, the publishers argue that the ebook library licensing market could collapse altogether if CDL were allowed to continue. Yet this misses the point that CDL is a longstanding and established practice, which has seen adoption and growth in libraries across the country while the ebook licensing market has continued to thrive.

Judge Koeltl, however, pressed the publishers on whether they had shown evidence of actual market harm, i.e., proof that IA’s CDL program had directly harmed their bottom line. In response, plaintiffs criticized the evidence offered by IA’s experts to show that no such harm had occurred. This is a difficult question because the party asserting a fair use defense typically has the burden of showing that the use has not harmed the market, but it is exceedingly difficult to prove a negative.

The judge also questioned whether CDL actually could represent such a loss: the publishers’ argument rests on the premise that libraries loan out CDL scans in lieu of paying to license ebooks, and were CDL not permitted under the law, IA and other libraries would instead choose to pay licensing fees to lend out ebooks. The judge pointed out that the result might in fact be that libraries would choose not to lend digital copies of works out at all, or would instead lend out physical books, undercutting the lost licensing revenue argument. 

IA’s attorney argued that the publishers had not offered empirical evidence of market harm in this case, focusing on the fact that when a library lends out a CDL scan, it does so in lieu of a physical book, “simulating the limitations of physical books.” This is due to CDL’s “owned to loaned” ratio requirement: a library can only loan out as many CDL scans as it has physical books in its collection, and can only loan each scan out to one patron at a time. When a library lends out a CDL scan, it does so in lieu of loaning the physical book, for which it has already paid. And while the plaintiffs mentioned harm to authors (who are, after all, the people that copyright law is intended to protect) several times during their argument, they did this in a way that linked authors with publishers as parties that are financially invested in a work’s sale. Author interests and the finer details of the economics of author income and library lending were absent from the discussion.

The parties also disagreed about which market was the appropriate one to look to when discussing market harm in the context of fair use analysis. The publishers argued, and the judge seemed to assume, that the proper market is the library ebook licensing market. The judge opined that libraries could, instead of using CDL to lend out their books, simply purchase an ebook license. He seemed to view CDL scans and licensed ebooks as one and the same, despite the fact that there are several key differences between these types of loans, both in form and function, as explained in other amicus briefs in the case. Moreover, missing from the argument was the fact that, in many cases, libraries loan out CDL scans because no ebook is available to them: particularly for older books in a publisher’s backlist, or for books that are no longer available commercially, there is in many cases no ebook available, or no ebook available to libraries. Library patrons with print or mobility disabilities in need of digital copies of these kinds of works in order to read them would be greatly harmed if CDL were no longer permitted. 

CDL and Transformativeness

The publishers’ attorney started from the premise that CDL as a use was not transformative, explaining that a licensed ebook and a CDL scan served precisely the same function. In response, IA’s attorney argued that CDL is a transformative use because it “utilizes technology to achieve the transformative purpose of improving efficiency of delivering content without unreasonably encroaching on the rights of the rightsholder.” He further explained that fair uses are favored when they serve the key purpose of copyright: incentivizing new creation for the public benefit without harming the interests of rightsholders. To illustrate these benefits, he cited to Authors Alliance’s amicus brief, in which we explained the myriad ways that CDL benefits authors and can even incentivize the creation of new works.

Adding to its transformativeness argument, IA explained that, when it comes to speculative or actual market harm, such an effect must be balanced against the public benefit that results from the use. And when it comes to CDL, this public benefit is tremendous: numerous amici, as well as Authors Alliance, explained that CDL serves the interests of library patrons, authors, and the public writ large. 

What’s Next?

Now that the judge has heard both sides’ arguments, he will issue a decision in the case. While there is no way of knowing exactly when this will happen, Judge Koeltl is known for issuing decisions fairly quickly, so we may have a decision as soon as later this week. As always, we will keep our members and readers apprised of any developments in this pivotal case as it moves forward.

Copyright Office Issues Opinion Letter on Copyright in AI-Generated Images

Posted March 8, 2023
Photo by Michael Dziedzic on Unsplash

In late February, the Copyright Office issued a letter revoking a copyright registration it had previously granted artist Kristina Kashtanova for a comic that used images generated using Midjourney, a generative AI program that creates images in response to user prompts. While this may seem minor, or simply another data point in the ongoing fight about copyright protection for AI-generated works, the determination is quite significant: it comes at a moment when AI-generated art has captured public attention, and moreover shows the Copyright Office’s thoughts on the important question of whether an artist who relies on a program like Midjourney can obtain copyright protection for an original compilation of AI-generated works. In today’s post, we explain the Copyright Office letter, contextualize it within the growing debate over AI and copyright, and share our thoughts on what all of this might mean for authors who write to be read. 

Copyright and Human Authorship

As technology has advanced to allow the creation of works without the direct involvement of a human, courts have grappled with whether these creations are entitled to copyright protection. In the late 19th century, the Supreme Court established that copyright was intended to protect the products of human labors and creativity, creating the “human authorship” requirement. In an early case on the topic, the Court held that a photograph was copyrightable despite the fact that a camera literally created the image, since photographs were “representatives of original intellectual conceptions of the author.” It cautioned, however, that when it came to creations resulting from processes that were “merely mechanical,” lacking “novelty, invention, or originality” by a human author, such hypothetical works might be beyond the scope of copyright protection.

This principle was tested in the 2010s: in 2011, an Indonesian crested macaque monkey named Naruto seized a photographer’s camera and took hundreds of images of himself. The photographer, David Slater, shared some of these images online, which promptly went viral. Several websites posted these images as well, prompting Slater to assert that he owned the copyright in the images and request their removal. The Wikimedia Foundation, which had uploaded the image to Wikimedia Commons, a repository of public domain and free license content, argued that the image was a part of the public domain due to the lack of a human creator. Several years later, Slater published a book of nature photographs which included Naruto’s selfie. Then, in 2015, the People for the Ethical Treatment of Animals (PETA) filed a lawsuit in the Northern District of California on Naruto’s behalf, asserting that the macaque owned the copyright in the image and requesting damages. The district court judge held that Naruto could not own the copyright in the image due to copyright’s human authorship requirement. However, the judge did indicate that Congress might be free to do away with the human authorship requirement and permit copyright ownership by animals, suggesting that the requirement was not a constitutional one, but indicating that it was beyond the power of the judiciary to decide. The Ninth Circuit Court of Appeals later affirmed the district court’s ruling.

Currently, the Copyright Office is defending a lawsuit in the D.C. district court brought by AI system developer Dr. Stephen Thaler regarding the constitutionality of copyright law’s human authorship requirement. Thaler argues that the Copyright Act does not forbid treating AI systems as “authors” for the purpose of copyright law, and contends that the human authorship principle is unsupported by contemporary case law. While it seems unlikely that Thaler will prevail on this argument, the case will at the very least draw new attention to the human authorship requirement and how it fits into creation in the digital age. 

The Creativity Requirement and Zarya of the Dawn

Kashtanova’s assertion of copyright ownership in her comic, Zarya of the Dawn, is in many ways similar to the photographer David Slater’s claim that he owned the copyright in Naruto’s selfie. In each case, the Copyright Office indicated that when a work is not the product of human authorship, a human may not claim copyright in that work (the latest compendium of Copyright Office practices lists “a photograph taken by a monkey” as an example of work that is not entitled to copyright protection since it does not meet the human authorship requirement). 

Kashtanova’s attorney had argued that Midjourney served “merely as an assistive tool,” and that Kashtanova should be considered the work’s author. But the Office likened Midjourney to a “merely mechanical process” lacking “novelty, invention, or originality” by a human creator, quoting the Supreme Court’s warning about the limits of copyright protection in the 19th century case discussed earlier in this post. And it was not only the human authorship requirement that made Zarya of the Dawn beyond the scope of copyright protection, but also copyright’s creativity requirement: for a work to be copyrightable, it must possess at least a “modicum” of creativity, a very low bar that rarely forecloses copyright protection for works of human authorship. 

The Office explained that Midjourney generates images in response to user prompts, “text commands entered in one of Midjourney’s channels.” But these are not “specific instructions” for generating an image, rather input data that Midjourney compares to its training data before generating an image. The Office also argued that these images lack human authorship because the process is “unpredictable” and “not controlled by the user.” In other words, the “creativity” in these images comes not from the human entering prompts, but from the interaction between the prompt and Midjourney’s training data. This makes it different from a tool like a camera over which a user exercises total control—there is little to no unpredictability when we use digital cameras to photograph the world around us, rather all creative choices come from the human using the device. 

The Office also noted that this opinion was not necessarily the final word on AI-generated images, as “other [generative] AI offerings” might operate differently, such that the creativity and human authorship requirements could be met. Kashtanova argued that minor edits she had made to the images were sufficiently creative to give her copyright ownership in the work as a whole. While the Office disagreed in this specific case (the before and after images demonstrating the editing were nearly identical), it did leave this possibility intact for future cases. Moreover, the Office granted Kashtanova ownership in the comic’s text, which she alone had written, as well as copyright ownership in the compilation of Midjourney-generated images. Compilations of uncopyrightable subject matter can sometimes be protected by copyright, because both the human authorship and creativity requirements are met when a human selects and arranges the material. The copyright owner does not own a copyright in the material itself, but in the original compilation they have created.

What Does this Mean for Authors?

The Copyright Office’s denial of registration in the Midjourney-generated images has important implications for the public domain and authors’ abilities to use new forms of technology as assistive tools in the creation of their works. But the Office’s action also leaves some open questions about the copyright status of images generated by Midjourney and similar systems. One possibility, as Wikimedia asserted in the case of Naruto’s selfie, is that these images are a part of the public domain. Were that to be the case, it could be a boon for artists and creators. Recall that once a work is in the public domain, it becomes free for all to use without fear of copyright infringement. The case of the monkey selfie is further instructive here, as the owner of the camera in that case did not prevail on claiming his own copyright in Naruto’s selfie. By the same token, it is unlikely that the creators of Midjourney could claim a copyright in images like those used by Kashtanova, despite their role in creating and making available the “assistive tool.” 

If AI systems could be used to generate infinite public domain content—whether through text-based systems like ChatGPT or image-generating systems like Midjourney—this would greatly expand public domain content. The public domain can be a boon for creators, as they are free to do anything they wish with this material. On the other hand, some have expressed fear that, should all AI-produced works be considered a part of the public domain, these public domain works could compete with works produced by human authors. It is also important to remember the practical economic realities of systems like Midjourney. Whether or not the Copyright Office and other policymakers determine that AI-generated content is a part of the public domain, the creators of those systems could employ other means to assert ownership or forbid onward uses of the content created by these systems. Contractual override, the employment of so-called “digital locks” like DRM, or other legal and technical mechanisms could conceivably limit authors’ ability to use AI-generated works the way they might use more traditional public domain materials. 

The First Copyright Small Claims Court Judgment

Posted March 6, 2023

Authors Alliance members will recall the posts we’ve made over the years about the enactment and implementation of a new copyright small claims court, the “Copyright Claims Board,”  housed within the U.S. Copyright Office. 

Late last week, the CCB issued its very first judgment. It came in a case brought by photographer David Oppenheimer against a California attorney, David Prutton, who had used an unlicensed copy of one of Oppenheimer’s photos (a picture of the federal courthouse in Oakland) on his solo-practitioner website (h/t to Plagiarism Today, where we first saw reporting about the case, here). 

Screenshot of Prutton’s website, showing use of Oppenheimer’s photo of the Federal Courthouse in Oakland (twin buildings on the right).

The case had a head start because it was originally filed in federal district court, where the parties voluntarily agreed to dismiss the federal case and have it referred to the CCB. You can read the entire history, including all the filings, here. The CCB ruled in favor of Oppenheimer and awarded the photographer statutory damages of $1,000, significantly less than the $30,000 (the maximum amount available to claimants in CCB proceedings) that Oppenheimer originally sought. 

In many ways, this was a pretty easy case for the CCB. Prutton readily admitted that he had used Oppenheimer’s unlicensed photo, in whole, on his website. Though Prutton raised a fair use defense, he argued only one of the four fair use factors. Prutton’s sole contention was that, because the impact on the market was minimal and Oppenheimer had shown no evidence of harm, Prutton should win on the fourth fair use factor. 

The CCB, noting that the fair use factors need to be balanced and weighed together, did its own analysis of all the fair use factors but concluded—rightly, I think—that for the other three fair use factors: 

  • Prutton’s use was not particularly transformative or for a new purpose, weighing against the use;
  • Oppenheimer’s original photo was creative (certainly enough for copyright protection, though reasonable minds might disagree on the extent of the creativity and therefore how strongly this factor should weigh in its favor), weighing against the use;
  • Prutton had used the whole work, not a small portion of it, weighing against the use.

For the fourth fair use factor, Prutton argued that because Oppenheimer showed essentially no history of licensing revenue from this photograph, along with a history of other litigation that tended to indicate that Oppenheimer’s business was primarily oriented toward generating revenue through litigation, there was no meaningful market harm. The CCB disagreed, essentially concluding that it was Prutton’s job to show a lack of market harm (which they said he did not do), and the burden did not rest on Oppenheimer to show evidence of a market. However, because Oppenheimer didn’t show any actual evidence of financial harm, the CCB, when assessing damages, granted an award far below Oppenheimer’s request: his original demand of $30,000 in damages was reduced to just $1,000.

Where the case was a little more interesting was how the CCB addressed Prutton’s defense of “unclean hands,” in which he essentially asks the CCB to excuse his use because Oppenheimer had acted improperly. If you do a quick search for “David Oppenheimer” and “copyright” you will find that Oppenheimer is frequently in court over alleged infringement of rights in his photographs, with fact patterns very similar to the one in this case, including heavy-handed negotiation tactics and aggressive use of litigation. In several of those cases, such as this case in the Western District of North Carolina, courts refused to grant Oppenheimer easy wins—concluding that Oppenheimer’s litigation tactics could reasonably be viewed as so problematic as to block his assertion of rights by the defense of “copyright misuse.” 

The CCB dismissed Prutton’s “unclean hands” defense by highlighting how unusual and extreme a plaintiff’s conduct has to be to fall subject to that general defense. The CCB didn’t, however, really assess Prutton’s more substantial “copyright misuse” defense, perhaps because Prutton didn’t raise it as a separate defense. In my view, copyright misuse may well have been a valid defense in this case. 

As the Western District of North Carolina explained in a previous case brought by Oppenheimer,  “misuse of copyright is a valid affirmative defense where the use of a copyright is contrary to the public policy upon which copyrights are granted. . . . Typically, the defense applies when seeking to avoid anti-competitive behavior, but it can also apply to other scenarios where a copyright owner attempts to extend the copyrights beyond their intended reach. . . . The underlying policy principles behind copyrights extend from the United States Constitution, with the relevant policy here being to promote the ‘useful arts.’” The court in that case concluded that if Oppenheimer’s “purpose in copyrighting the Copyrighted Work was to license it for use when individuals or companies need [his photo] then Plaintiff is likely not misusing his copyrights. Yet, a reasonable jury could find Plaintiff is using copyrights to derive an income from infringement suits and this issue is one of fact that the Court should not decide.” 

Lessons Learned

As this is the very first decision of the CCB, I don’t think we should draw sweeping conclusions from it about how the CCB will do its work. But it is interesting to see that this first case wasn’t exactly a suit between legal amateurs—Oppenheimer is a seasoned litigant who has brought many copyright cases, and Prutton is an attorney (albeit not one who specializes in copyright). Both made significant missteps in the presentation of their cases. And so, one observation I think we can make is that while the copyright small claims system is meant to have low barriers to participation, and the CCB seems inclined to go to extra lengths to help parties understand the process and present cogent filings, the CCB is not going to excuse incomplete argumentation. At least in this case, the CCB refused to assume facts or arguments not presented by the parties. That was true both for the plaintiff and defendant: plaintiffs who make damage assertions are going to need to show evidence of actual harm in order to get awards close to their requested amounts. And defendants who raise defenses will need to fully argue them; glossing over three of the four fair use factors is not a winning strategy. Nor does it seem passing references to defenses such as “unclean hands” and “copyright misuse” will work without adequate support. 

Book Talk: Athena Unbound by Peter Baldwin, Moderated by Chris Bourg

Posted March 3, 2023

“In Athena Unbound, Peter Baldwin offers an admirably pragmatic yet principled approach to the perennial problem of encouraging both the production and distribution of knowledge.” – Paul Romer, Nobel Laureate and University Professor, NYU

Book Talk: Athena Unbound
March 28 @ 10am PT / 1pm ET
Register now for the virtual event

Read or purchase Athena Unbound from MIT Press. (Pub date: March 28, 2023)

Open access (OA) could one day put the sum of human knowledge at our fingertips. But the goal of allowing everyone to read everything faces fierce resistance. In Athena Unbound, Peter Baldwin offers an up-to-date look at the ideals and history behind OA, and unpacks the controversies that arise when the dream of limitless information slams into entrenched interests in favor of the status quo. In addition to providing a clear analysis of the debates, Baldwin focuses on thorny issues such as copyright and ways to pay for “free” knowledge. He also provides a roadmap that would make OA economically viable and, as a result, advance one of humanity’s age-old ambitions.

Baldwin addresses the arguments in terms of disseminating scientific research, the history of intellectual property and copyright, and the development of the university and research establishment. As he notes, the hard sciences have already created a funding model that increasingly provides open access, but at the cost of crowding out the humanities. Baldwin proposes a new system that would shift costs from consumers to producers and free scholarly knowledge from the paywalls and institutional barriers that keep it from much of the world.

Rich in detail and free of jargon, Athena Unbound is an essential primer on the state of the global open access movement.

About our speakers

PETER BALDWIN is Professor of History at UCLA, and Global Distinguished Professor at NYU. His recent books are Command and Persuade: Crime, Law, and the State across History (MIT Press); Fighting the First Wave: Why the Coronavirus Was Tackled So Differently across the Globe; and The Copyright Wars: Three Centuries of Trans-Atlantic Battle. He serves on the boards of the New York Public Library, the American Council of Learned Societies, the Wikimedia Endowment, the Central European University, the Danish Institute of Advanced Studies, and as chair of the Board of the Center for Jewish History. His journalistic writings have appeared in the New York Times, the Los Angeles Times, CNN, Newsweek, New Republic, Huffington Post, Der Spiegel, Berliner Zeitung, Publishers Weekly, American Interest, Chronicle of Higher Education, Prospect, and Zocalo Public Square.

CHRIS BOURG is the Director of Libraries at Massachusetts Institute of Technology (MIT), where she also has oversight of the MIT Press. She is also the founding director of the Center for Research on Equitable and Open Scholarship (CREOS). Prior to assuming her role at MIT, Chris worked for 12 years in the Stanford University Libraries. Before Stanford, she spent 10 years as an active-duty U.S. Army officer, including three years on the faculty at the United States Military Academy at West Point. She received her BA from Duke University, her MA from the University of Maryland, and her MA and Ph.D. in sociology from Stanford.

Jack Daniels v. VIP Products and the Freedom to Parody and Comment in the United States

Posted March 2, 2023

This post was written for the Kluwer Copyright Blog, and is based in part on an amicus brief filed last week by the Harvard Cyberlaw Clinic on behalf of Authors Alliance and ComicMix before the United States Supreme Court in Jack Daniels v. VIP Products.

Ordinarily, authors who write parodies look to copyright limitations and exceptions to protect their rights. In the United States, the doctrine of fair use has been held to permit parody in uses ranging from rap music to children’s books. These fair use rights, the courts have said, have their roots in the U.S. Constitution’s First Amendment protections for freedom of speech.

In a recent case before the U.S. Supreme Court, Jack Daniels v. VIP Products, those parody rights are at risk. In a twist, however, it is not copyright law, but rather an expansive view of trademark law, that poses this threat.

The facts of this case are straightforward: Jack Daniels, creator of the famous Tennessee Whiskey, brought the trademark suit to stop VIP Products from producing a dog toy, which it titled “Bad Spaniels,” in the shape of Jack Daniels’ iconic whiskey bottle and label. Jack Daniels asserts that the Bad Spaniels toy infringes on its trademark and dilutes its brand. VIP Products counters that the toy is meant to parody Jack Daniels’ bottle and is protected speech under the U.S. Constitution’s First Amendment.

Jack Daniel’s Whiskey Bottle (left) and VIP Products’ “Bad Spaniels” dog toy (right). From Jack Daniels Properties, Inc. v. VIP Products, LLC, Case No. 22-148, U.S. Supreme Court, Brief for Petitioner (11 January 2023), page 3, available here.

Although dog toys and whiskey bottles seem relatively inconsequential to literature, parody, and creative work, this case could have a dramatic impact on how authors write about, and parody, famous brands.

Trademarks are a cornerstone of our shared cultural vernacular. Popular brands are woven into the fabric of our national identity, recognizable by and meaningful to those from many different backgrounds. Authors often draw on these shared associations in their literary works, sending beloved fictional characters to real colleges, serving them familiar cereals, and outfitting them in well-known clothing labels. Whether to evoke nostalgia or to immerse their readers, authors use trademarks both to simulate reality and to critique it.

While trademark law aims to protect consumers and prevent confusion as to the source of goods or services, it must be enforced in a manner consistent with the speech protections guaranteed by the First Amendment of the U.S. Constitution. The freedom of authors to use trademarks in their works could be stifled by the threat of litigation. Overenforcement of trademark law runs contrary to both the purpose of intellectual property law and the U.S. constitutional legacy of protecting free expression. Protections for parody in other areas of the law, such as copyright’s fair use doctrine, will be undermined by a trademark ruling that allows for expansive enforcement.

If heightened First Amendment protections are not put in place, the threat of costly legal proceedings may cause creators to avoid the use of trademarks in their artistic works. While trademark law does have other mechanisms to protect authors of parody and commentary, such as a showing that an author’s use does not pose a likelihood of confusion, the process for successfully defending a trademark infringement case is remarkably expensive. In 2020, the American Intellectual Property Law Association reported that the median cost of trademark litigation in the U.S. before even going to trial ranged from $150,000 to $588,000. In the American system, litigants ordinarily bear their own costs, and so even an author who successfully defends such a suit would be on the hook for a large amount in legal fees. While litigation is commonplace for large corporations with significant legal resources, even a single lawsuit could be career-ending for an author without the resources to handle it.

If the threat of legal sanction hangs over the heads of writers, their literary characters may no longer use iPhones, eat at McDonald’s, or visit Disneyland. These uses offer meaningful expressive value to authors. Brands are often intentionally selected as cultural signifiers, chosen for the implicit associations they convey to readers. Cory Doctorow’s Down and Out in the Magic Kingdom (a Disney theme park) would have a different meaning if it were instead titled Down and Out in an Amusement Park. Nor is The Devil Wears Luxury Clothing as evocative as The Devil Wears Prada.

Even when trademarks are evoked in literary circumstances that their owners find distasteful, these uses are still expressive and noncommercial, thus worthy of the highest First Amendment protection. Prioritizing the pecuniary interests of trademark owners over the First Amendment rights of creative artists could lead to a catastrophic chilling effect on authors’ speech based on the perceived risk of litigation, whether or not such risk is actualized. This result is both untenable and entirely unnecessary. It is possible to ensure that trademark owners still have access to a wide variety of robust and reasonable remedies in cases of true infringement without creating unnecessary panic in many other circumstances.

The Supreme Court has a clear doctrinal path to avoiding a speech-suppressive environment. In Rogers v. Grimaldi, 875 F.2d 994 (2d Cir. 1989), the Second Circuit Court of Appeals struck a balance between the interests of trademark owners and First Amendment speech by crafting a clear and efficient test for infringement with appropriate protections for speech. The Rogers court recognized the mark owner’s interest in preventing confusion while ensuring adequate protection for the vital free speech principles at play, and provided a rule to determine at the outset of litigation, before incurring substantial costs, when expressive works infringe trademark rights. Rogers, in short, provided that in cases of artistic or creative works, trademark infringement should only be considered “where the public interest in avoiding consumer confusion outweighs the public interest in free expression.” The court explained that this rule “will normally not support [the] application of [trademark law] unless the title has no artistic relevance to the underlying work whatsoever, or, if it has some artistic relevance, unless the title explicitly misleads as to the source or the content of the work.”

A ruling that substantially adopts a test like that in Rogers would continue to protect the rights of trademark owners, while also giving authors who reference popular brands a clear, consistent, and efficient rule that protects them. A ruling in favor of Jack Daniels, however, could strike fear into the hearts of risk-averse creators, chilling their speech by discouraging them from using certain trademarks in their works altogether. It would undermine the otherwise strong protections that U.S. courts have identified for parodists and other authors in U.S. copyright law, under the doctrine of fair use.

You can read more about our views on the interaction between trademark law and authors’ free expression rights in our amicus brief filed in Jack Daniels v. VIP Products, available here.

Fair Use Week 2023: Looking Back at Google Books Eight Years Later

Posted February 24, 2023
Photo by Patrick Tomasso on Unsplash

This post is authored by Authors Alliance Senior Staff Attorney, Rachel Brooke. 

More recent members and readers may not be aware that Authors Alliance was founded in the wake of Authors Guild v. Google,  a class action fair use case in the Second Circuit that was litigated for nearly a decade, and finally resolved in favor of Google in 2015. The case concerned the Google Books project—an initiative launched by Google whereby the company partnered with university libraries to scan books in their collections. These scans would ultimately be made available as a full-text searchable database for the public to search through for particular terms, with short “snippets” displayed accompanying the search results. Users could not, however, view or read the scanned books in their entirety. The Authors Guild, along with several authors, filed a lawsuit against Google alleging that scanning the books and displaying these snippets constituted copyright infringement.

In addition to Authors Guild representing its members in the litigation, its associated plaintiffs brought the case as a class action, claiming to bring the case on behalf of a broad group of authors:  “[a]ll persons residing in the United States who hold a United States copyright interest in one or more Books reproduced by Google as part of its Library Project” who were either authors or the authors’ heirs.

But many of these authors did not agree with the Authors Guild’s stance in the case, and felt that the Google Books project served their interests in sharing knowledge, seeing their creations be preserved, and reaching readers interested in their work. A group of authors and scholars, many of whom would soon become founding members of Authors Alliance, came together to share their views with the district court. Many of those same authors signed on to amicus briefs before both the district court and Second Circuit explaining why they opposed the litigation and supported Google’s fair use defense. Then, in 2014, Authors Alliance submitted its first amicus brief to the Second Circuit, supporting Google’s ultimately successful fair use defense. The plaintiffs later appealed the Second Circuit’s ruling, asking the Supreme Court to weigh in, but the Court ultimately declined to hear the case, leaving the Second Circuit’s ruling intact. 

Nearly a decade later, the effects of Google Books can still be seen in fair use decisions and copyright policy developments involving the challenges of adapting copyright to the digital world. In today’s post, I’ll reflect on how Google Books can be contextualized within today’s fair use landscape and share my thoughts on what the case can tell us about copyright in the digital world. 

Google Books and Transformativeness

A major question in Authors Guild v. Google was whether Google’s use of the copyrighted works was “transformative,” a key component of the fair use inquiry. When a use is found to be transformative, this in practice weighs heavily in favor of a finding of fair use. In the case, the court found that Google’s scanning, as well as the search and snippet display functions, were transformative because the service “augments public knowledge by making available information about [the] books without providing the public with a substantial substitute for . . . the original works.” This was because Google Books provided information about the books—such as the author and publisher information—without creating substitutes of the original works. In other words, readers could learn about the books they searched through, but could not read the books in full—to do this, those readers would have to purchase or borrow copies through the normal channels. 

Since the doctrine of transformativeness was established in the 1994 landmark Supreme Court case, Campbell v. Acuff-Rose Music, there have been myriad questions about the precise contours of what it means for a use to be transformative. Campbell established that a use is transformative when it endows the secondary work with a “new meaning or message,” but it can be difficult to apply this test in practice, particularly in the context of new or nascent technologies. Google Books tells us that scanning works in order to create a full-text searchable database with limited snippet displays is a transformative use based on its new and different purpose from the purpose of the works themselves. Furthermore, it reinforces the notion that a use is particularly likely to be considered transformative when it serves the underlying purpose of copyright law: incentivizing new creation for the benefit of the public and “enriching public knowledge.” By highlighting that Google contributed to public knowledge about books through its scanning activities and the Google Books search function, the court helped bring fair use for scholarship and research—two key prototypical uses established in the 1976 Copyright Act—into the digital age, setting an important precedent for later cases. 

Google Books and Derivative Works

One of the plaintiffs’ arguments in Google Books was that Google’s full-text searchable database constituted a derivative work. One of a copyright holder’s exclusive rights is the right to prepare derivative works—such as adaptations, abridgements, or translations of the original work—and the plaintiffs alleged that this right had been infringed. The court disagreed, finding that Google’s use had a transformative purpose, whereas derivative works tend to involve a transformation in form, such as the adaptation of a novel into a movie or an audiobook. Furthermore, the court explained that derivative works are “those that re-present the protected aspects of the original work, i.e., its expressive content, converted into an altered form[.]” In contrast, the Google Books project provided information about the books and offered a limited “snippet” view, but did not re-present the expressive content: the full text of the books themselves.

The distinction the court drew between transformative fair uses and derivative works in Google Books is an important one, as it can often be a close question whether a work involves a transformative purpose or merely represents the same work in a new form, without enough added to tip the scales towards fair use. And it is a question that continues to arise in fair use cases today: just last year, the Supreme Court agreed to hear Warhol Foundation v. Goldsmith, a case about whether Andy Warhol’s creation of a series of screenprints of the late musical artist Prince which drew from a photograph taken by photographer Lynn Goldsmith qualified as a fair use. We’ve covered this case extensively on our blog over the past few years, and submitted an amicus brief in the case. Our brief argues (among other things) that Warhol’s screen prints involve much more than a transformation in form: they are stylistically and visually distinct from Goldsmith’s photograph, and endow the photograph with a new meaning or message, making the use highly transformative. 

As in Google Books, the parties and amici in Goldsmith grapple with the line between transformative uses and the creation of derivative works, an often complicated and fact-sensitive determination. In this context, Google Books serves as a reminder that fair use is not a one-size-fits-all determination. Yet it also provides support for arguments advanced by Authors Alliance and others that the mere existence of a transformation in form (in the Google Books case, the transformation from a print book to a scanned copy, and in Goldsmith, the transformation of a black and white photo to a series of colorful screenprints) does not mean that a secondary use cannot be a fair one. Warhol’s use did not merely “re-present the protected aspects of the original work[’s] . . . expressive content,” but was transformative in the different “purpose, character, expression, meaning, and message” it conveyed.

Google Books and Controlled Digital Lending

The practice of controlled digital lending (“CDL”)—and the arguments in favor of it constituting a fair use—can be traced back in part to the fair use principles established and reinforced in Google Books. As I argue in our amicus brief in Hachette Books v. Internet Archive, a case about—among other things—whether CDL constitutes a fair use, Google Books shows that copying the entirety of a work in the process of making a transformative use of it can be fully consistent with fair use. 

Another important suggestion in the Google Books case, made at the district court level, was that the Google Books search function could actually drive book sales: the search results were accompanied by links to purchase the book, and research suggested that this could enhance sales of those books. This is analogous to the effects of library lending: library readers often purchase books by authors they first discovered at the library, an effect that can apply with equal force when the library patron borrows a CDL scan. Indeed, several other amici in Hachette Books argue that the finding that the Google Books search was a fair use lends substantial support to the argument that CDL is a fair use, based on both the factual similarities between the two initiatives and their shared objective of “enriching public knowledge.”

Like Google Books, CDL helps authors reach readers who could not otherwise access their books, and achieves this by scanning books on library shelves. Also like Google Books, CDL helps solve the problem of 20th century works “disappearing”: the commercial life of a book tends to be much shorter than the term of copyright, so when books under copyright go out of print, they can disappear into obscurity. Scanning these books to preserve them ensures that the knowledge they advance will not be lost.

Google Books and Text Data Mining

Text data mining, the process of using automated techniques to quantitatively analyze text and other data, is also widely considered to be a fair use, and this determination similarly rests in part on the foundation established in Google Books. As was the case in Google Books, the results of text data mining research provide information about the works being studied, and cannot in any way serve as substitutes for the content of the works. In fact, one important aspect of the new exemption to DMCA liability for text data mining, which Authors Alliance successfully petitioned for in 2021, is that researchers are not able to use the works in the text data mining corpus for consumptive purposes. And as with Google Books’s snippet view, researchers are able to view the content in a limited manner to verify their findings. The new TDM exemption was a huge win for Authors Alliance members, and something to celebrate for all scholars engaged in this important research. Importantly, the precedent established by Google Books strongly supported both the exemption’s adoption and the Register of Copyrights’ suggestion that text data mining was likely to be a fair use.

Looking Forward: Google Books and Artificial Intelligence

In recent years, scholars and researchers have grappled with the implications of copyright protection for AI-generated content and AI models more generally. The holding in Google Books provides some support for companies’ and researchers’ ability to engage in these activities: one important factor in the case was that Google Books did not harm the market for the books at issue, since the database could not serve as a substitute for the books themselves. Similarly, when copyrighted works are used to train AI, the output cannot serve as a substitute for the copyrighted works, and the market for those works is not harmed, even if, like the plaintiffs in Google Books, the copyright holders might prefer that their works not be used in this way. Google Books establishes that simply because copyrighted works are used as “input” in a given model, this does not mean that the outputs constitute infringement. It is also worth noting that the court found Google’s use to be fair despite the fact that it was a use by a commercial, profit-seeking entity. While a commercial use can sometimes tip the scales against a finding of fair use, this can be overcome by a socially beneficial, transformative purpose. This could arguably apply with equal force to AI models trained on copyrighted works which contribute to our understanding of the world, despite the fact that commercial entities are often the ones deploying these technologies.

Eight years after it was decided, the legacy of Google Books endures in policy debates and copyright lawsuits that capture the public’s attention. Policymakers and judges would be wise to heed the lessons it teaches about the value of advancing public knowledge through digitization and the use of copyrighted works for new and socially beneficial purposes. As we await policy developments regarding text data mining and wait for decisions in Goldsmith and Hachette Books, it is my hope that this legacy will live on, reminding us all of the vast capabilities of information technology to enrich our understanding of the world and advance the progress of knowledge, which, after all, is what copyright law is all about.