Some district courts have applied DMCA Section 1202(b) to physical copies, including textiles, which means that if you cut off parts of a fabric that contain copyright information, you could be liable for up to $25,000 in damages.
The US Copyright Act has never been praised for its clarity or its intuitive simplicity—at a whopping 460 pages long, it is filled with hotly debated ambiguities and overly complex provisions. The copyright laws of most other jurisdictions aren’t much better.
Because of this complexity, the implications of changes to copyright law and policy are not always clear to authors. As we’ve said in the past, many of these issues seem arcane and largely escape public attention. Yet entities with a vested interest in maximalist copyright, often at odds with the public interest, are certainly paying attention, and often claim to speak for all authors when they in fact represent only a small subset. As part of our efforts to advocate for a future where copyright law offers ample clarity, certainty, and real focus on values such as the advancement of knowledge and free expression, we would like to share with you two recent projects we undertook:
The 1202 Issue Brief and Amicus Brief in Doe v. GitHub
Authors Alliance has been closely monitoring the impact of Digital Millennium Copyright Act (DMCA) Section 1202. As we have explained in a previous post, Section 1202(b) creates liability for those who remove or alter copyright management information (CMI) or distribute works with removed CMI. This provision, originally intended to prevent widespread piracy, has been increasingly invoked in AI copyright lawsuits, raising significant concerns for lawful uses of copyrighted materials beyond training AI. While penalties for removing CMI might seem somewhat reasonable on their face, the scope of CMI (including a wide variety of information such as website terms of service, affiliate links, and other information) combined with the challenge of including it with all downstream distribution of incomplete copies (imagine if you had to replicate and distribute something like the Amazon Kindle terms of service every time you quoted text from an ebook) could be very disruptive for many users.
To address the confusion among courts in the 9th Circuit regarding the (somewhat inaptly named) “identicality requirement,” we have released an issue brief, as well as undertaken to file an amicus brief in the Doe v. GitHub case now pending in the 9th Circuit.
Here are the key reasons why we care—and why you should care—about this seemingly obscure issue:
The Precedential Nature of Doe v. GitHub: The upcoming 9th Circuit case, Doe v. GitHub, will address whether Section 1202(b) should only apply when copies made or distributed are identical (or nearly identical) to the original. Lower courts have upheld this identicality requirement to prevent overbroad applications of the law, and the appellate ruling may set a crucial precedent for AI and fair use.
Potential Impact on Otherwise Legal Uses: It is not entirely certain whether fair use is a defense to 1202(b) claims. If the identicality requirement is removed, Section 1202(b) could create liability for transformative fair uses, snippet reuse, text and data mining, and other lawful applications. This would introduce uncertainty for authors, researchers, and educators who rely on copyrighted materials in limited, legal ways. We advocate for maintaining the identicality requirement and clarifying that fair use applies as a defense to Section 1202 claims.
Possibility of Frivolous Litigation: Section 1202(b) claims have surged in recent years, particularly in AI-related lawsuits. The statute’s vague language and broad applicability have raised fears that opportunistic litigants could use it to chill innovation, scholarship, and creative expression.
To find out more about what’s at stake, please take a look at our 1202(b) Issue Brief. You are also invited to share your stories with us, on how you have navigated this strange statute.
Reply to the UK Open Consultation on Copyright and AI
We have members in the UK, and many of our US-based members publish in the UK. We have been watching developments in UK copyright law closely, and recently filed a comment in response to the UK Open Consultation on Copyright and AI. In our comment, we emphasized the importance of ensuring that copyright policy serves the public interest. The key points of our response include:
Competition Concerns: We alerted policymakers that their top objectives must include preventing monopolies from forming in the AI space. If licensing for AI training becomes the norm, we foresee power consolidating in a handful of tech companies and their unbridled monopoly permeating all aspects of our lives within a few decades, if not sooner.
Fair Use as a Guiding Principle: We strongly believe that the use of works in the training and development of AI models constitutes fair use under US law. While this issue is currently being tested in courts, case law suggests that fair use will prevail, ensuring that AI training on copyrighted works remains permissible. The UK does not have an identical fair use statute, but has recognized that some of its functions—such as flexibility to permit new technological uses—are valuable. We argue that the wise approach is for the UK to update its laws to ensure its creative and tech sectors can meaningfully participate in the global arena. Our comment called for a broad AI and TDM exception allowing temporary copies of copyrighted works for AI training. We emphasized that when AI models extract uncopyrightable elements, such as facts and ideas, this should remain lawful and protected.
Noncommercial Research Should Be Protected: We strongly advocated for the protection of noncommercial AI research, arguing that academic institutions and their researchers should not face legal barriers when using copyrighted works to train AI models for research purposes. Imposing additional licensing requirements would place undue burdens on academic institutions, which already pay significant fees to access research materials.
On February 11, Third Circuit Judge Stephanos Bibas (sitting by designation in the U.S. District Court for the District of Delaware) issued a new summary judgment ruling in Thomson Reuters v. ROSS Intelligence. He reversed his previous decision from 2023, which held that a jury must decide the fair use question. The decision was one of the first to address fair use in the context of AI, though the facts of this case differ significantly from the many other pending AI copyright suits.
This ruling focuses on copyright infringement claims brought by Thomson Reuters (TR), the owner of Westlaw, a major legal research platform, against ROSS Intelligence. TR alleged that ROSS improperly used Westlaw’s headnotes and the Key Number System to train its AI system to better match legal questions with relevant case law.
Westlaw’s headnotes summarize legal principles extracted from judicial opinions. (Note: Judicial opinions are not copyrightable in the US.) The Key Number System is a numerical taxonomy categorizing legal topics and cases. Clicking on a headnote takes users to the corresponding passage in the judicial text. Clicking on the key number associated with a headnote takes users to a list of cases that make the same legal point.
Importantly, ROSS did not directly ingest the headnotes and the Key Number System to train its model. Instead, ROSS hired LegalEase, a company that provides legal research and writing services, to create training data based on the headnotes and the Key Number System. LegalEase created Bulk Memos—a collection of legal questions paired with four to six possible answers. LegalEase instructed lawyers to use Westlaw headnotes as a reference to formulate the questions in Bulk Memos. LegalEase instructed the lawyers not to copy the headnotes directly.
ROSS attempted to license the necessary content directly from TR, but TR refused to grant a license because it thought the AI tool contemplated by ROSS would compete with Westlaw.
The court found that ROSS copied 2,243 headnotes from Westlaw. The court ruled that these headnotes and the Key Number System met the low legal threshold for originality and were copyrightable. The court rejected the merger and scènes à faire defenses raised by ROSS, because, according to the court, the headnotes and the Key Number System were not dictated by necessity. The court also rejected ROSS’s fair use defense on the grounds that the 1st and 4th factors weighed in favor of TR. At this point, the only remaining issue for trial is whether some headnotes’ copyrights had expired or were untimely registered.
The new ruling has drawn mixed reactions—some saying it undermines potential fair use defenses in other AI cases, while others dismiss its significance since its facts are unique. In our view, the opinion is poorly reasoned and disregards well-established case law. Future AI cases must demonstrate why the ROSS Court’s approach is unpersuasive. Here are three key flaws we see in the ruling.
Problems with the Opinion
Near-Verbatim Summaries are “Original”?
“A block of raw marble, like a judicial opinion, is not copyrightable. Yet a sculptor creates a sculpture by choosing what to cut away and what to leave in place. … A headnote is a short, key point of law chiseled out of a lengthy judicial opinion.”
— the ROSS court
(↑example of a headnote and the uncopyrightable judicial text the headnote was based on↑)
The court claims that the Westlaw headnotes are original both individually and as a compilation, and the Key Number System is original and protected as a compilation.
“Original” has a special meaning in US copyright law: It means that a work has a modicum of human creativity that our society would want to protect and encourage. Based on the evidence that survived redaction, it is nearly impossible to find creativity in any individual headnote. The headnotes consist of verbatim copying of uncopyrightable judicial texts, along with some basic paraphrasing of facts.
As we know, facts are not copyrightable, but expressions of facts often are. One important safeguard for protecting our freedom to reference facts is the merger doctrine. US law has long recognized that when there are only limited ways to express a fact or an idea, those expressions are not considered “original.” The expressions “merge” with the underlying unprotectable fact, and become unprotectable themselves.
Judge Bibas gets merger wrong—he claims merger does not apply here because “there are many ways to express points of law from judicial opinions.” This view misunderstands the merger doctrine. It is the nature of human language to be capable of conveying the same thing in many different ways, as long as you are willing to do some verbal acrobatics. But when there are only a limited number of reasonable, natural ways to express a fact or idea—especially when textual precision and terms of art are used to convey complex ideas—merger applies.
There are many good reasons for this to be the law. For one, this is how we avoid giving copyright protection to concise expression of ideas. Fundamentally, we do not need to use copyright to incentivize the simple restatement of facts. As the Constitution intended, copyright law is designed to encourage creativity, not to grant exclusive rights to basic expressions of facts. We want people to state facts accurately and concisely. If we allowed the first person to describe a judicial text in a natural, succinct way to claim exclusive rights over that expression, it would hinder, rather than facilitate, meaningful discussion of said text, and stifle blog posts like this one.
As to the selection and arrangement of the Key Number System, the court claims that originality exists here, too, because “there are many possible, logical ways to organize legal topics by level of granularity,” and TR exercised some judgment in choosing the particular “level” with its Key Number System. However, the cases are tagged with Key Number System by an automated computer system, and the topics closely mirror what law schools teach their first-year students.
The court does not say much about why the compilation of the headnotes should receive separate copyright protection, other than that it qualifies as original “factual compilations.” This claim is dubious because the compilation is of uncopyrightable materials, as discussed, and the selection is driven by the necessity to represent facts and law, not by creativity. Even if the compilation of headnotes is indeed copyrightable, using portions of it that are uncopyrightable is decidedly not an infringement, because the US does not protect sui generis database rights.
Can’t Claim Fair Use When Nobody Saw a Copy?
“[The intermediate-copying cases] are all about copying computer code. This case is not.”
— the ROSS court conveniently ignoring Bellsouth Advertising & Publishing Corp. v. Donnelley Information Publishing, Inc., 933 F.2d 952 (11th Cir. 1991) and Sundeman v. Seajay Society, Inc., 142 F. 3d 194 (4th Cir. 1998).
In deciding whether ROSS’s use of Westlaw’s headnotes and the Key Number System is transformative under the 1st factor, the court took a moment to consider whether the available intermediate copying case law is in favor of ROSS, and quickly decided against it.
Even though no consumer ever saw the headnotes or the Key Number System in the AI products offered by ROSS, the court claims that the copying of these constitutes copyright infringement because there existed an intermediate copy that contained copyright-restricted materials authored by Westlaw. And, according to the court, intermediate copying can only weigh in favor of fair use for computer code.
Before turning to the actual case law the court is overlooking here, we wonder if Judge Bibas is in fact unpersuaded by his own argument: under the 3rd fair use factor, he admits that only the content made accessible to the public should be taken into consideration when deciding what amount is taken from a copyrighted work compared to the copyrighted work as a whole, which is contrary to what he argues under the 1st factor—that we must examine non-public intermediate copies.
Intermediate copying is the process of producing a preliminary, non-public work as an interim step in the creation of a new public-facing work. It is well established under US jurisprudence that any type of copying, whether private or public, satisfies a prima facie copyright infringement claim, but the fact that a work was never shared publicly, nor intended to be shared publicly, strongly favors fair use. For example, in Bellsouth Advertising & Publishing Corp. v. Donnelley Information Publishing, Inc., the 11th Circuit decided that directly copying a competitor’s yellow pages business directory in order to produce a competing yellow pages was fair use when the resulting publicly accessible yellow pages the defendant created did not directly incorporate the plaintiff’s work. Similarly, in Sundeman v. Seajay Society, Inc., the Fourth Circuit concluded that it was fair use when the Seajay Society made an intermediary, entire copy of plaintiffs’ unpublished manuscript for a scholar to study and write about. The scholar wrote several articles about it, mostly summarizing important facts and ideas (while also using short quotations).
There are many good reasons for allowing intermediate copying. Clearly, we do not want ALL unlicensed copies to be subject to copyright infringement lawsuits, particularly when intermediate copies are made in order to extract unprotectable facts or ideas. More generally, intermediate copying is important to protect because it helps authors and artists create new copyrighted works (e.g., sketching a famous painting to learn a new style, translating a passage to practice your language skills, copying the photo of a politician to create a parody print t-shirt).
Suddenly, We Have an AI Training Market?
“[I]t does not matter whether Thomson Reuters has used [the headnotes and the Key Number System] to train its own legal search tools; the effect on a potential market for AI training data is enough.”
— the ROSS court
The 4th fair use factor is very much susceptible to circular reasoning: if a user is making a derivative use of my work, surely that proves a market already exists or will likely develop for that derivative use, and, if a market exists for such a derivative use, then, as the copyright holder, I should have absolute control over such a market.
The ROSS court runs full tilt into this circular trap. In the eyes of the court, ROSS, by virtue of using Westlaw’s data in the context of AI training, has created a legitimate AI training data market that should be rightfully controlled by TR.
But our case law suggests that the 4th factor’s “market substitution” inquiry considers only markets that are traditional, reasonable, or likely to be developed. As we have already pointed out in a previous blog post, copyright holders must offer concrete evidence to prove the existence, or likely development, of a licensing market before they can argue that a secondary use serves as a “market substitute.” If we allowed a copyright holder’s protected market to include everything that he’s willing to receive licensing fees for, it would all but wipe out fair use in the service of stifling competition.
The impact of this case is currently limited, both because it is a district court ruling and because it concerns non-generative AI. However, it is important to remain vigilant, as the reasoning put forth by the ROSS court could influence other judges, policymakers, and even the broader public, if left unchallenged.
This ruling combines several problematic arguments that, if accepted more widely, could have significant consequences. First, it blurs the line between fact and expression, suggesting that factual information can become copyrightable simply by being written down by someone in a minimally creative way. Second, it expands copyright enforcement to intermediate copies, meaning that even temporary, non-public use of copyrighted material could be subject to infringement claims. Third, it conjures up a new market for AI training data, regardless of whether such a licensing market is legitimate or even likely to exist.
If these arguments gain traction, they could further entrench the dominance of a few large AI companies. Only major players like Microsoft and Meta will be able to afford AI training licenses, consolidating control over the industry. The AI training licensing terms will be determined solely between big AI companies and big content aggregators, without representation of individual authors or public interest. The large content aggregators will get to dictate the terms under which creators must surrender rights to their works for AI training, and the AI companies will dictate how their AI models can be used by the general public.
Without meaningful pushback and policy intervention, smaller organizations and individual creators cannot participate fairly. Let’s not rewrite our copyright laws to entrench this power imbalance even further.
Kat Von D tracing the image of Miles Davis in preparation for inking the tattoo
Although tattoos have existed for as long as written human history, legal disputes involving tattoos are a relatively new phenomenon. The case Sedlik v. Drachenberg, currently pending before the 9th Circuit, is particularly notable, as it marks the first instance of a court ruling on an artist’s use of copyrighted imagery in her tattoo art.
More importantly, the case presents the 9th Circuit with its first opportunity to interpret the fair use right in the wake of the Supreme Court’s 2023 Warhol decision. Authors Alliance has been closely monitoring circuit courts’ rulings on fair use and advocating for a proper interpretation of Warhol, including by challenging the problematic fair use ruling issued by the 10th Circuit earlier this year, a decision that was later vacated in response to strong pushback from fair use advocates.
At the heart of the Sedlik v. Drachenberg legal debate are two creative professionals with very different backgrounds:
The plaintiff in this case is Jeffrey Sedlik, a successful professional photographer. He took a photo of the jazz legend Miles Davis in 1989, an image that is the focal point of the pending dispute.
The defendant, Kat Von Drachenberg (“KVD”), is a celebrity tattoo artist. In recent years, she has shifted away from for-profit tattooing, opting instead to ink clients for free. In 2017, she freehand-tattooed Miles Davis on a client’s arm, largely drawing from the 1989 photograph captured by Sedlik.
Sedlik’s claims are straightforward: he alleges that KVD’s tattoo, as well as her social media posts documenting the process of her creating the tattoo, infringe his copyright in the Miles Davis photo.
For Sedlik to state a prima facie case of copyright infringement, he must prove that KVD had access to the Miles Davis photo (which is easy to prove in this case), and that the allegedly infringing tattoo and social media posts are substantially similar to the plaintiff’s photo. In this case, the district court left the questions of substantial similarity and fair use to the jury, after denying the motions for summary judgment on copyright infringement issues in May 2022.
The jury returned a verdict in January 2024 that the tattoo inked by KVD and some of her social media posts are not substantially similar to Sedlik’s photo. The jury also determined that the rest of KVD’s social media posts, documenting her process of creating the tattoo in question, were fair use. In short, the jury concluded there was no copyright infringement.
On May 3rd, 2024, the district court judge denied Sedlik’s motions for judgment as a matter of law and for a new trial. Faced with the jury’s adverse decision, Sedlik argued, among other things, that the jury erred in finding no substantial similarity. The judge, however, upheld the jury’s finding that KVD’s works had a different concept and feel from Sedlik’s photo and that KVD only copied the unprotected elements of the photo. Sedlik tried to argue that the legal question of fair use should not have been left to the jury. However, the court was unpersuaded, highlighting that Sedlik had remained silent on this procedural issue until after receiving an unfavorable verdict.
Following the ruling on his motions, Sedlik appealed, and the case is now in front of the 9th Circuit. Anticipating the far-reaching consequences for artists and authors depending on how the 9th Circuit will interpret Warhol, Authors Alliance filed an amicus brief in support of KVD.
Both Sedlik and KVD in this case argued that Warhol supported their side. Sedlik proposed a unique test, that a fair use must either target the original copyrighted work, or otherwise have a compelling justification for the use. In our amicus brief, we illustrated how that is not the correct reading of Warhol. Under Warhol, a distinct purpose is required for the first factor to tilt in favor of fair use. The Warhol Court only analyzed “targeting” and “compelling justification” because Warhol’s secondary use of the Goldsmith photo shared the exact same purpose as the photo, both for the purpose of appearing on the cover of a magazine. This is not the case with KVD’s freehand tattoo and Sedlik’s photo: they serve substantially distinct purposes.
Authors routinely borrow from others’ copyrighted works for reporting, research, teaching, as well as to memorialize, preserve, or provide historical context. These uses by authors have historically been considered fair use, and often have purposes distinct from the copyrighted works used; but they do not necessarily “target” the works being used, nor do they have “compelling justifications” beyond the broad justification that authors are promoting the goal of copyright: “to promote the progress of science and the arts.”
In our brief, we also stressed how a successful commercial entity can nevertheless make noncommercial uses, as already demonstrated in the case of Google Books and Hachette. We also argued that social media posts are not commercial by default, just by virtue of drawing attention to the original poster. Many successful authors maintain an active social media presence. The fact that authors invariably write to capture and build an audience through these sites does not automatically render their uses “commercial.” “Commerciality” under the fair use analysis has always been limited to the act of merchandising in the market, such as selling stamps, t-shirts, or mugs.
Finally, we explained to the court why copyright holders must offer concrete evidence to prove the existence, or likely development, of a licensing market before they can argue that a secondary use serves as a “market substitute.” If we accepted Sedlik’s argument that his protected market includes everything that he’s willing to receive licensing fees for, it would all but wipe out fair use. We want authors and other creatives to continue to engage in fair use, including to document their creative processes, as KVD has done in this case in her social media posts, without being told they have to pay for each instance of use as soon as demanded by a rightsholder.
This post is by Rachael Samberg, Director, Scholarly Communication & Information Policy, UC Berkeley Library and Dave Hansen, Executive Director, Authors Alliance
This post is about the research and the advancement of science and knowledge made impossible when publishers use contracts to limit researchers’ ability to use AI tools with scholarly works.
Within the scholarly publishing community, mixed messages pervade about who gets to say when and how AI tools can be used for research reliant on scholarly works like journal articles or books. Some scholars voiced concern (explained more here) when major scholarly publishers like Wiley or Taylor & Francis entered lucrative contracts with big technology companies to allow for AI training without first seeking permission from authors. We suspect that these publishers have the legal right to do so since most publishers demand that authors hand over extensive rights in exchange for publishing their work. And with the backdrop of dozens of pending AI copyright lawsuits, who can blame the AI companies for paying for licenses, if for no other reason than avoiding the pain of litigation? While it stings to see the same large commercial, academic publishers profit yet again off of the work academic authors submit to them for free, we continue to think there are good ways for authors to retain a say in the matter.
Big tech companies are one thing, but what about scholarly research? What about the large and growing number of scholars who are themselves using scholarly copyrighted content with AI tools to conduct their research? We currently face a situation in which publishers are attempting to dictate how and when researchers can do that work, even when authors’ fair use rights to use and derive new understandings from scholarship clearly allow for such uses.
How vendor contracts disadvantage US researchers
We have written elsewhere (in an explainer and public comment to the Copyright Office) why training AI tools, particularly in the scholarly and research context, constitutes a fair use under U.S. Copyright law. Critical for the advancement of knowledge, training AI is based on a statutory right already held by all scholarly authors engaging in computational research and one that lawmakers should preserve.
The problem U.S. scholarly authors presently face with AI training is that publishers restrict their access to these statutory rights through contracts that override them: In the United States, publishers can use private contracts to take away statutory fair use rights that researchers would otherwise hold under Federal law. In this case, the private contracts at issue are the electronic resource (e-resource) license agreements that academic research libraries sign to secure campus access to electronic journal, e-book, data, and other content that scholars need for their computational research.
Contractual override of fair use is a problem that disparately disadvantages U.S. researchers. As we have described elsewhere, more than forty countries, including those of the European Union, expressly reserve text mining and AI training rights for scientific research by research institutions. Not only do scholars in these countries not have to worry whether their computational research with AI is permitted, but they also do not risk having those reserved rights overridden by contract. The European Union’s Copyright Digital Single Market Directive and recent AI Act nullify any attempt to circumscribe the text and data mining and AI training rights reserved for scientific research within research organizations. U.S. scholars are not as fortunate.
In the U.S., most institutional e-resource licenses are negotiated and managed by research libraries, so it is imperative that scholars work closely with their libraries and advocate to preserve their computational research and AI training rights within the e-resource license agreements that universities sign. To that end, we have developed adaptable licensing language to support institutions in doing that nationwide. But while this language is helpful, the onus of advocacy and negotiation for those rights in the contracting process remains. Personally, we have found it helpful to explain to publishers that they must consent to these terms in the European Union, and can do so in the U.S. as well. That, combined with strong faculty and administrative support (such as at the University of California), makes for a strong stance against curtailment of these rights.
But we think there are additional practical ways for libraries to illustrate—both to publishers and scholarly authors—exactly what would happen to the advancement of knowledge if publishers’ licensing efforts to curtail AI training were successful. One way to do that is by “unpacking” or decoding a publisher’s proposed licensing restriction, and then demonstrating the impact that provision would have on research projects that were never objectionable to publishers before, and should not be now. We’ll take that approach below.
Decoding a publisher restriction
A commercial publisher recently proposed the following clause in an e-resource agreement:
Customer [the university] and its Authorized Users [the scholars] may not:
directly or indirectly develop, train, program, improve, and/or enrich any artificial intelligence tool (“AI Tool”) accessible to anyone other than Customer and its Authorized Users, whether developed internally or provided by a third party; or
reproduce or redistribute the Content to any third-party AI Tool, except to the extent limited portions of the Content are used solely for research and academic purposes (including to train an algorithm) and where the third-party AI Tool (a) is used locally in a self-hosted environment or closed hosted environment solely for use by Customer or Authorized Users; (b) is not trained or fine-tuned using the Content or any part thereof; and (c) does not share the Content or any part thereof with a third party.
What does this mean?
The first paragraph forbids the training or improving of any AI tool if it’s accessible or released to third parties. It further forbids using any computational outputs or analysis derived from the licensed content to train any tool available to third parties.
The second paragraph is perhaps even more concerning. It provides that when using third-party AI tools of any kind, a scholar can use only limited portions of the licensed content with the tools, and is prohibited from doing any training at all of third-party tools, even if it’s a non-generative AI tool and the scholar is performing the work in a completely closed and highly secure research environment.
What would the impact of such a restrictive licensing provision be on research?
It would mean that every single one of the trained tools in the following projects could never be disseminated. In addition, for the projects below that used third-party AI tools, the research would have been prohibited full-stop because the third-party tools in those projects required training which the publisher above is attempting to prevent:
Tools that could not be disseminated
In 2017, chemists created and trained a generative AI tool on 12,000 published research papers regarding synthesis conditions for metal oxides, so that the tool could identify anticipated chemical outputs and reactions for any given set of synthesis conditions entered into the tool. The generative tool they created is not capable of reproducing or redistributing any licensed content from the papers; it has merely learned conditions and outcomes and can predict chemical reactions based on those conditions and outcomes. And this beneficial tool would be prohibited from dissemination under the publisher’s terms identified above.
In 2018, researchers trained an AI tool (that they had originally created in 2014) to understand whether a character is “masculine” or “feminine” by looking at the tacit assumptions expressed in words associated with that character. That tool can then look at other texts and identify masculine or feminine characters based on what it knows from having been trained before. The implications are that scholars can therefore use texts from different time periods with the tool to study representations of masculinity and femininity over time. No licensed content, no licensed or copyrighted books from a publisher can ever be released to the world by sharing the trained tool; the trained tool is merely capable of topic modeling—but the publisher’s above language would prohibit its dissemination nevertheless.
Tools that could neither be trained nor disseminated
In 2019, authors used text from millions of books published over 100 years to analyze cultural meaning. They did this by training third-party non-generative AI word-embedding models called Word2Vec and GloVe on multiple textual archives. The tools cannot reproduce content: when shown new text, they merely represent words as numbers, or vectors, to evaluate or predict how similar words in a given space are semantically or linguistically. The similarity of words can reveal cultural shifts in understanding of socioeconomic factors like class over time. But the publisher’s above licensing terms would prohibit the training of the tools to begin with, much less the sharing of them to support further or different inquiry.
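To make the idea of "words as vectors" concrete, here is a minimal sketch of how similarity between word vectors is commonly measured, using cosine similarity. The three-dimensional vectors and the words chosen are invented for illustration; they are not taken from the 2019 study, and real Word2Vec or GloVe models learn vectors with hundreds of dimensions from large corpora.

```python
import math

# Hypothetical toy vectors, invented for illustration only.
# A trained embedding model would learn values like these from text.
toy_vectors = {
    "wealthy": [0.9, 0.1, 0.3],
    "affluent": [0.8, 0.2, 0.4],
    "factory": [0.1, 0.9, 0.2],
}


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def most_similar(word, vectors):
    """Return the other word whose vector points most nearly the same way."""
    others = (w for w in vectors if w != word)
    return max(others, key=lambda w: cosine_similarity(vectors[word], vectors[w]))


print(most_similar("wealthy", toy_vectors))
```

Note that the only thing stored for each word is a list of numbers. Nothing in this structure holds sentences or passages from the texts a model was trained on, which is the point the paragraph above makes about these tools.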
In 2023, scholars trained a third-party-created open-source natural language processing (NLP) tool called Chemical Data Extractor (CDE). Among other things, CDE can be used to extract chemical information and properties identified in scholarly papers. In this case, the scholars wanted to teach CDE to parse a specific type of chemical information: metal-organic frameworks, or MoFs. Generally speaking, the CDE tool works by breaking sentences into “tokens” like parts of speech and referenced chemicals. By correlating tokens, one can determine that a particular chemical compound has certain synthetic properties, topologies, reactions with solvents, etc. The scholars trained CDE specifically to parse MoF names, synthesis methods, inorganic precursors, and more—and then exported the results into an open source database that identifies the MoF properties for each compound. Anyone can now use both the trained CDE tool and the database of MoF properties to ask different chemical property questions or identify additional MoF production pathways—thereby improving materials science for all. Neither the CDE tool nor the MoF database reproduces or contains the underlying scholarly papers that the tool learned from. Yet, neither the training of this third-party CDE tool nor its dissemination would be permitted under the publisher’s restrictive licensing language cited above.
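The token-based approach described above can be sketched roughly as follows. This is not the CDE tool's actual code or API. It is a simplified stand-in written in plain Python, and the example sentence, the lookup tables, and the function names are all hypothetical.

```python
import re

# Hypothetical sentence, invented for illustration; not quoted from any paper.
sentence = "MOF-5 was synthesized from zinc nitrate in DMF at 100 C."

# Toy lookup tables standing in for what a trained parser would recognize.
KNOWN_COMPOUNDS = {"MOF-5"}
KNOWN_SOLVENTS = {"DMF"}


def tokenize(text):
    """Split text into word-like tokens, keeping names such as 'MOF-5' whole."""
    return re.findall(r"[A-Za-z0-9]+(?:-[A-Za-z0-9]+)*", text)


def extract_facts(text):
    """Pair each recognized compound with each solvent in the same sentence."""
    tokens = tokenize(text)
    compounds = [t for t in tokens if t in KNOWN_COMPOUNDS]
    solvents = [t for t in tokens if t in KNOWN_SOLVENTS]
    return [{"compound": c, "solvent": s} for c in compounds for s in solvents]


print(extract_facts(sentence))
```

The records this sketch produces contain only the extracted facts (a compound name and a solvent), not the sentence they came from. That mirrors the observation above that a database built this way does not reproduce the underlying papers.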
Indeed, there are hundreds of AI tools that scholars have trained and disseminated—tools that do not reproduce licensed content—and that scholars have created or fine-tuned to extract chemical information, recognize faces, decode conversations, infer character types, and so much more. Restrictive licensing language like that shown above suppresses research inquiries and societal benefits that these tools make possible. It may also disproportionately affect the advancement of knowledge in or about developing countries, which may lack the resources to secure licenses or be forced to rely on open-source or poorly-coded public data—hindering journalism, language translation, and language preservation.
Protecting access to facts
Why are some publishers doing this? Perhaps to reserve the opportunity to develop and license their own scholarship-trained AI tools, which they could then license at additional cost back to research institutions. We could speculate about motivations, but the upshot is that publishers have been pushing hard to foreclose scholars from training and disseminating AI tools that now “know” something based on the licensed content. That is, such publishers wish to prevent tools from learning facts about the licensed content.
However, this is precisely the purpose of licensing content. When institutions license content for their scholars to read, they are doing so for the scholars to learn information from the content. When scholars write about it or teach about the content, they are not regenerating the actual expression from the content—the part that is protected by copyright; rather the scholars are conveying the lessons learned from the content—facts not protected by copyright. Prohibiting the training of AI tools and the dissemination of those tools is functionally equivalent to prohibiting scholars from learning anything about the content that institutions are licensing for that very purpose, and that scholars have written to begin with! Publishers should not be able to monopolize the dissemination of information learned from scholarly content, and especially when that information is used non-commercially.
For these reasons, when we negotiate to preserve AI usage and training rights, we generally try to achieve the following outcomes which would promote—rather than prohibit—all of the research projects described above:
The sample language we’ve disseminated empowers others to negotiate for these outcomes. We hope that, when coupled with the advocacy tools we’ve provided above, scholars and libraries can protect their AI usage and training rights, while also being equipped to consider how they want their own works to be used.
We got a disappointing decision yesterday from the Second Circuit Court of Appeals in the long-running Hachette v. Internet Archive (IA) copyright lawsuit about IA’s digitization and lending of books. The Court affirmed the district court’s decision that IA cannot circulate digital copies of books they have legitimately acquired in physical copies, even when only the same number of copies as legitimately acquired are circulated to a single user at a time—just as a physical book would be loaned.
The Court, focusing on IA’s lending of digitized books that were available for license as ebooks from the publishers, concluded that IA’s fair use defense fails. We think this decision will result in a meaningful reduction in access to knowledge. This is sad news for many authors who have relied on IA’s Open Library for research and discovery, and for readers who have used Open Library to find authors’ works. However, we also view it as a decision limited to its facts—that is, IA’s particular implementation of controlled digital lending (CDL), and more specifically, its lending of books that are already available in licensed digital formats.
We plan to do a more in-depth analysis of the Court’s decision later, but for now, we offer some initial thoughts. First, there are a couple of bright spots in the opinion:
1) The Court rejected the district court’s conclusion that IA was engaged in commercial use when looking at the first factor of fair use. The publishers argued IA’s lending of digitized books was commercial in nature because IA received a few thousand dollars from a for-profit used-bookseller and also solicited donations on its website. The Court rightly pointed out that if that was the standard, virtually every nonprofit that solicits donations would by default only be able to engage in commercial use. This was an issue we and others strongly urged the Court to address, and we’re glad it did.
2) For the most part, the Court focused its analysis on the facts of the case, which was really about IA lending digitized copies of books that were already available in ebook form and licensable from the publishers. The legal analysis in several places turned on this fact, which we think leaves room to make fair use arguments regarding programs to digitize and make available other books, such as print books for which there is no licensed ebook available, out-of-print books, or orphan works. CDL will remain an important framework, especially considering the lack of an existing digital first-sale doctrine.
We are also disappointed by several key points in the decision:
One was the Court’s assessment of the first fair use factor, “purpose and character of the use.” The Court’s analysis of this factor was in some ways unsurprising but nevertheless disappointing. The Court did little more than conclude that the use was not transformative and, therefore, not fair use. Though we think there are strong arguments that CDL is transformative, whether CDL is “transformative” is just one of the supporting rationales for the argument that CDL is fair use. The other justifications—that CDL supports teaching, scholarship, and research, along with complementing the first sale doctrine and supporting the public-interest mission of libraries—are at the heart of CDL. The Court didn’t engage with those other arguments at all and also ignored meaningful discussion of cases where non-transformative copying supported a fair use finding because of the public benefits.
A second key issue is about whether IA’s digital lending negatively impacts the market for the original works. This issue probably deserves a whole blog post to itself, but in short the analysis came down to who shoulders the burden of proving or disproving market harm, and what default assumptions the court has about market harm. The following quotes from the decision will give you a sense of how the Court analyzed the issue:
[a]lthough they do not provide empirical data of their own, Publishers assert that they (1) have suffered market harm due to lost eBook licensing fees and (2) will suffer market harm in the future if IA’s practices were to become widespread. IA argues that Publishers cannot rely on the “common-sense inference” of market harm without data to back that up, citing American Society for Testing & Materials v. Public.Resource.Org, Inc. [citations omitted]. . . . We agree with Publishers’ assessment of market harm.
Despite IA’s experts having offered meaningful data and analysis indicating a lack of market harm on sales of publishers’ books, the Court went on to say:
We are likewise convinced that “unrestricted and widespread conduct of the sort engaged in by [IA] would result in a substantially adverse impact on the potential market for [the Works in Suit].” . . . Though Publishers have not provided empirical data to support this observation, we routinely rely on such logical inferences where appropriate in assessing the fourth fair use factor. . . . Thus, we conclude it is “self-evident” that if IA’s use were to become widespread, it would adversely affect Publishers’ markets for the Works in Suit.
We are also disappointed by how the Court portrayed the overall public benefit of IA’s lending and its long-term effect: “while IA claims that prohibiting its practices would harm consumers and researchers, allowing its practices would―and does―harm authors.” We think this is a gross generalization and mischaracterization of how IA’s digital lending affects most authors. Authors are researchers. Authors are readers. IA’s digital library helps authors create new works and supports their interests in having their works read. This ruling may benefit the largest publishers and most prominent authors, but for most, it will end up harming more than it will help.
This post is authored by Authors Alliance Senior Staff Attorney, Rachel Brooke.
More recent members and readers may not be aware that Authors Alliance was founded in the wake of Authors Guild v. Google, a class action fair use case in the Second Circuit that was litigated for nearly a decade, and finally resolved in favor of Google in 2015. The case concerned the Google Books project—an initiative launched by Google whereby the company partnered with university libraries to scan books in their collections. These scans would ultimately be made available as a full-text searchable database for the public to search through for particular terms, with short “snippets” displayed accompanying the search results. Users could not, however, view or read the scanned books in their entirety. The Authors Guild, along with several authors, filed a lawsuit against Google alleging that scanning the books and displaying these snippets constituted copyright infringement.
In addition to Authors Guild representing its members in the litigation, its associated plaintiffs brought the case as a class action, claiming to bring the case on behalf of a broad group of authors: “[a]ll persons residing in the United States who hold a United States copyright interest in one or more Books reproduced by Google as part of its Library Project” who were either authors or the authors’ heirs.
But many of these authors did not agree with the Authors Guild’s stance in the case, and felt that the Google Books project served their interests in sharing knowledge, seeing their creations be preserved, and reaching readers interested in their work. A group of authors and scholars came together to share their views with the district court, many of whom would soon become founding members of Authors Alliance. Many of those same authors signed on to amicus briefs before both the district court and Second Circuit explaining why they opposed the litigation and supported Google’s fair use defense. Then, in 2014, Authors Alliance submitted its first amicus brief to the Second Circuit, supporting Google’s ultimately successful fair use defense. The plaintiffs later appealed the Second Circuit’s ruling, asking the Supreme Court to weigh in, but the Court ultimately declined to hear the case, leaving the Second Circuit’s ruling intact.
Nearly a decade later, the effects of Google Books can still be seen in fair use decisions and copyright policy developments involving the challenges of adapting copyright to the digital world. In today’s post, I’ll reflect on how Google Books can be contextualized within today’s fair use landscape and share my thoughts on what the case can tell us about copyright in the digital world.
Google Books and Transformativeness
A major question in Authors Guild v. Google was whether Google’s use of the copyrighted works was “transformative,” a key component of the fair use inquiry. When a use is found to be transformative, this in practice weighs heavily in favor of a finding of fair use. In the case, the court found that Google’s scanning, as well as the search and snippet display functions, were transformative because the service “augments public knowledge by making available information about [the] books without providing the public with a substantial substitute for . . . the original works.” This was because Google Books provided information about the books—such as the author and publisher information—without creating substitutes of the original works. In other words, readers could learn about the books they searched through, but could not read the books in full—to do this, those readers would have to purchase or borrow copies through the normal channels.
Since the doctrine of transformativeness was established in the 1994 landmark Supreme Court case, Campbell v. Acuff-Rose Music, there have been myriad questions about the precise contours of what it means for a use to be transformative. Campbell established that a use is transformative when it endows the secondary work with a “new meaning or message,” but it can be difficult to apply this test in practice, particularly in the context of new or nascent technologies. Google Books tells us that scanning works in order to create a full-text searchable database with limited snippet displays is a transformative use based on its new and different purpose from the purpose of the works themselves. Furthermore, it reinforces the notion that a use is particularly likely to be considered transformative when it serves the underlying purpose of copyright law: incentivizing new creation for the benefit of the public and “enriching public knowledge.” By highlighting that Google contributed to public knowledge about books through its scanning activities and the Google Books search function, the court helped bring fair use for scholarship and research—two key prototypical uses established in the 1976 Copyright Act—into the digital age, setting an important precedent for later cases.
Google Books and Derivative Works
One of the plaintiffs’ arguments in Google Books was that Google’s full-text searchable database constituted a derivative work. One of a copyright holder’s exclusive rights is the right to prepare derivative works—such as adaptations, abridgements, or translations of the original work—and the plaintiffs alleged that this right had been infringed. The court disagreed, finding that Google’s use had a transformative purpose, whereas derivative works tend to involve a transformation in form, such as the adaptation of a novel into a movie or an audiobook. Furthermore, the court explained that derivative works are “those that re-present the protected aspects of the original work, i.e., its expressive content, converted into an altered form[.]” In contrast, the Google Books project provided information about the books and offered a limited “snippet” view, but did not re-present the expressive content: the full text of the books themselves.
The distinction the court drew between transformative fair uses and derivative works in Google Books is an important one, as it can often be a close question whether a work involves a transformative purpose or merely represents the same work in a new form, without enough added to tip the scales towards fair use. And it is a question that continues to arise in fair use cases today: just last year, the Supreme Court agreed to hear Warhol Foundation v. Goldsmith, a case about whether Andy Warhol’s creation of a series of screenprints of the late musical artist Prince which drew from a photograph taken by photographer Lynn Goldsmith qualified as a fair use. We’ve covered this case extensively on our blog over the past few years, and submitted an amicus brief in the case. Our brief argues (among other things) that Warhol’s screen prints involve much more than a transformation in form: they are stylistically and visually distinct from Goldsmith’s photograph, and endow the photograph with a new meaning or message, making the use highly transformative.
As in Google Books, the parties and amici in Goldsmith grapple with the line between transformative uses and the creation of derivative works, an often complicated and fact-sensitive determination. In this context, Google Books serves as a reminder that fair use is not a one-size-fits-all determination. Yet it also provides support for arguments advanced by Authors Alliance and others that simply because a transformation in form exists—in the Google Books case, the transformation from a print book to a scanned copy, and in Goldsmith, the transformation of a black and white photo to a series of colorful screenprints—does not mean that a secondary use cannot be a fair one. Warhol’s use did not merely “re-present the protected aspects of the original work[’s] . . . expressive content,” but was transformative in the different “purpose, character, expression, meaning, and message” it conveyed.
Google Books and Controlled Digital Lending
The practice of controlled digital lending (“CDL”)—and the arguments in favor of it constituting a fair use—can be traced back in part to the fair use principles established and reinforced in Google Books. As I argue in our amicus brief in Hachette Books v. Internet Archive, a case about—among other things—whether CDL constitutes a fair use, Google Books shows that copying the entirety of a work in the process of making a transformative use of it can be fully consistent with fair use.
Another important suggestion in the Google Books case, made at the district court level, was that the Google Books search function could actually drive book sales: the search results were accompanied by links to purchase the book, and research suggested that this could enhance sales of those books. This is analogous to the effects of library lending: library readers often purchase books by authors they first discovered at the library, an effect which can apply with equal force when the library patron borrows a CDL scan. Indeed, several other amici in Hachette Books argue that the finding that the Google Books search was a fair use lent substantial support for the argument that CDL is a fair use, based on both the factual similarities between the two initiatives and their shared objective of “enriching public knowledge.”
As in Google Books, CDL also helps authors reach readers who could not otherwise access their books, and achieves this through scanning books on library shelves. And also like Google Books, CDL helps solve the problem of 20th century works “disappearing”: the commercial life of a book tends to be much shorter than the term of copyright, so when books under copyright go out of print, they can disappear into obscurity. Scanning these books to preserve them ensures that the knowledge they advance will not be lost.
Google Books and Text Data Mining
Text data mining—the process of using automated techniques aimed at quantitatively analyzing text and other data—is also widely considered to be a fair use, and this determination is similarly built in part on the building blocks established in Google Books. As was the case in Google Books, the results of text data mining research provide information about the works being studied, and cannot in any way serve as substitutes for the content of the works. In fact, one important aspect of the new exemption to DMCA liability for text data mining, which Authors Alliance successfully petitioned for in 2021, is that researchers are not able to use the works in the text data mining corpus for consumptive purposes. And also like Google Books, researchers are able to view the content in a limited manner to verify their findings, analogous to Google Books’s snippet view. The new TDM exemption was a huge win for Authors Alliance members, and something to celebrate for all scholars engaged in this important research. Importantly, the precedent established by Google Books strongly supported its adoption and the Register of Copyrights’ suggestion that text data mining was likely to be a fair use.
Looking Forward: Google Books and Artificial Intelligence
In recent years, scholars and researchers have grappled with the implications of copyright protection on AI-generated content and AI models more generally. The holding in Google Books provides some support for companies’ and researchers’ ability to engage in these activities: one important factor in the case was that Google Books did not harm the market for the books at issue in the case, since the books in the database could not serve as substitutes for the books themselves. Similarly, when copyrighted works are used to train AI, the output cannot serve as a substitute for the copyrighted works, and the market for those works is not harmed, even if—like the plaintiffs in Google Books—the copyright holders might prefer that their works not be used in this way. Google Books establishes that simply because copyrighted works are used as “input” in a given model, this does not mean that the outputs constitute infringement. It is also worth noting that the court found Google’s use to be fair despite the fact that it was a use by a commercial, profit-seeking entity. While a commercial use can sometimes tip the scales in favor of finding a use to not be fair, this can be overcome by a socially beneficial, transformative purpose. This could arguably apply with equal force to AI models trained on copyrighted works which contribute to our understanding of the world, despite the fact that commercial entities are often the ones deploying these technologies.
Eight years after it was decided, the legacy of Google Books endures in policy debates and copyright lawsuits that capture the public’s attention. Policymakers and judges would be wise to heed the lessons it teaches about the value of advancing public knowledge through digitization and the use of copyrighted works for new and socially beneficial purposes. As we await policy developments regarding text data mining and wait for decisions in Goldsmith and Hachette Books, it is my hope that this legacy will live on, reminding us all of the vast capabilities of information technology to enrich our understanding of the world and advance the progress of knowledge, which, after all, is what copyright law is all about.
Given its importance, it may surprise you to learn that fair use is remarkably easy to evade. Savvy copyright owners do it all the time. It takes just two easy steps.
First, you need to write a contract, specifically a “license” for the use of your work. In it, you dictate the terms on which you provide access to your work. You can impose almost any restrictions you like. Sometimes, contracts will restrict certain classes of uses: “you cannot reproduce this content for commercial use” or “you may download one copy of this work for personal consultation; you cannot reproduce or share any part of this work in whole or in part in any form, or share in any form with the public.”
Other contractual terms guard against specific threats. For example, Disney once won a lawsuit over use of its movie trailers, which Disney would license to websites only if they agreed that the website “may not be derogatory to or critical of the entertainment industry or of [Disney] (and its officers, directors, agents, employees, affiliates, divisions and subsidiaries) or of any motion picture produced or distributed by [Disney].”
The key here is that you can essentially rewrite the rules, and forbid those aspects of fair use that you disapprove of. Want to make sure critics can’t use your words against you? Just say they can’t. Want to make sure libraries don’t make preservation copies without paying you first? Want to make sure that instructors of college classes can only use excerpts of your book—even very small excerpts—if they pay every single time? It’s your prerogative.
Second, you need to make sure that everyone who gains access to your work is bound by your license. This sounds hard, but with online distribution, it’s actually pretty easy.
In the world of print copies, this was difficult because copies had a way of traveling beyond the control of the original purchaser. The “first sale” doctrine meant that buyers of copies could freely transfer those copies to third-party buyers (e.g., someone who buys a book at a used book store, or who borrows a book from a library) or give them away. So, even if you got the original buyer to agree to your terms, those downstream users didn’t have to. But there is no widespread acceptance of a buyer’s “digital first sale.” So, buyers can’t just transfer the copies they purchase to downstream users. Everyone who wants access to the digital copy must agree to the license. All you have to do is make sure that your materials are distributed exclusively on digital platforms that are subject to your terms, and you’re all set.
That’s it. Two easy steps and you’ve practically eliminated fair use. For any use you haven’t already authorized, you can just say no, require them to pay whatever you want, or just refuse to grant access. And if they don’t comply, at a minimum you’ve got a slam-dunk breach of contract claim.
Is it Seriously That Easy?
Unfortunately, this two-step approach–sometimes known as “contractual override”–reflects the prevailing wisdom and practice of many copyright owners. It is widely used online, by parties ranging from massive corporations such as Amazon or Netflix to small publishers and news outlets. And though the precedent for it isn’t airtight, when it has come up in court, the licensors have mostly prevailed. Because U.S. law so venerates “freedom of contract,” it has been difficult for policymakers or the courts to address the problem of rightsholders forbidding lawful fair uses under the terms of their licenses.
How did we get to this point? This is not a new or unexpected problem. You can look back to 1993, when law professor Jane Ginsburg foresaw this state of affairs just as the possibilities of the internet were coming into view:
“In the digital environment posited here, contract protection may not be the fragile creature presumed in prior intellectual property preemption decisions. If access to works could be obtained only through the information provider (directly or through an authorized online distributor), and if copying could be electronically tracked or prevented, no ‘third parties’ to the contract would exist. When ‘we’re all connected,’ no functional difference may exist between a contract and a property right. At that point, it becomes necessary to consider whether limitations incorporated in the copyright law should be imported to its contractual substitute.”
Numerous others in the legal community soon made similar observations, such as Julie Cohen, Niva Elkin-Koren, and Andrew Shapiro, among others, who also wrote about aspects of this then-new challenge.
How to Protect Fair Use from Contractual Override
A handful of efforts to address this problem have been mounted in Congress. In 2003 and 2005, Representative Zoe Lofgren introduced a bill appropriately called the BALANCE Act (“Benefit Authors without Limiting Advancement or Net Consumer Expectations”), which addressed both the unavailability of “first sale” in the digital environment and contractual override of fair use. The proposed legislation provided that “[w]hen a digital work is distributed to the public subject to nonnegotiable license terms, such terms shall not be enforceable under the common laws or statutes of any State to the extent that they restrict or limit any of the limitations on exclusive rights under this title.” The BALANCE Act never passed, however, and hasn’t been revisited in Congress since 2005.
Recent actions in other jurisdictions may provide renewed legislative interest and guidance on possible models to adopt. For example, in 2014, the UK passed legislation that limits contractual override of user rights—providing that “to the extent that a term of a contract purports to prevent or restrict the doing of any act which, by virtue of this section, would not infringe copyright, that term is unenforceable.” This language has been applied in the UK to exceptions that allow for making copies for persons with print and other disabilities, research and teaching, and text and data-mining. The EU’s recent Copyright in the Digital Single Market Directive contains similar protections for copyright exceptions, as does Singapore’s recent copyright bill. So far, though, there has been no indication of real interest from Congress in the United States.
It’s also possible that states could craft legislation. There has recently been a surge of interest in bills in a number of states aimed at protecting libraries’ ability to license books on reasonable terms (bills that Authors Alliance generally supports). These bills also go beyond what fair use protects, seeking, for example, to ensure that libraries have broad access to ebooks on “reasonable terms,” and addressing problems of major publishers simply refusing to license books to libraries. Maryland was the first state to actually pass such a law, but it was struck down as preempted by federal copyright law in AAP v. Frosh. The court concluded that because federal copyright law dictates the scope of rights governing public distribution of works, it was impermissible for the state of Maryland to interject its own rules about the scope of the publishers’ distribution rights.
It’s possible that state legislation that is more narrowly tailored—e.g., a state law that focused solely on protecting fair use—would not suffer the same fate as the Maryland law. In fact, the reasoning of the Maryland e-lending case would seem to support such a state law, since a state law protecting fair use would be maintaining, rather than altering, the balance of rights as defined by federal law.
Legal Strategies in Court
It’s also possible that the courts could intervene, though so far they have mostly declined to do so. It seems to me there are two or three viable ways for judicial intervention to be effective:
First, courts could conclude that contracts (created under and governed by state law) are preempted by federal copyright law, which is what defines the scope of copyright’s exclusive rights. The Constitution provides that federal law supersedes conflicting state law, and Congress has provided specific instructions on how such preemption should apply, stating that “all legal or equitable rights that are equivalent to any of the exclusive rights within the general scope of copyright as specified by section 106 . . . are governed exclusively” by federal copyright law. Those exclusive rights of copyright owners are explicitly defined as being “subject to” the limitations including fair use, so it would make some sense for courts to view state law expansions of those rights as being in conflict with and therefore preempted by federal copyright law.
However, there are several negative precedents indicating that this approach may not work. Take Bowers v. Baystate, for example, a Federal Circuit case involving two competing computer aided design (CAD) software companies. Bowers contended that Baystate violated the terms of use on its software by reverse-engineering its product in violation of a clause explicitly prohibiting such use. Baystate contended that such reverse engineering was protected by fair use and that contract terms to the contrary should be preempted as inconsistent with federal law. The Federal Circuit, observing that as a general matter “most courts to examine this issue have found that the Copyright Act does not preempt contractual constraints on copyrighted articles,” concluded that “private parties are free to contractually forego the limited ability to reverse engineer a software product under the exemptions of the Copyright Act. . . . [A] state can permit parties to contract away a fair use defense or to agree not to engage in uses of copyrighted material that are permitted by the copyright law, if the contract is freely negotiated.”
Other courts addressing state contract law and other state law limitations on fair use (e.g., this California right of publicity case) have largely followed the same approach. One notable exception is Vault Corp. v. Quaid Software, Ltd., in which the Fifth Circuit invalidated a Louisiana law that permitted contracts to prohibit reverse engineering, even though federal law provides a specific exception (Section 117) that allows for such reverse engineering. Although not directly addressing fair use, the court’s holding could apply equally to state law contractual restrictions on fair use. The issue has not directly reached the Supreme Court, though there is a case, Genius v. Google, currently pending on a Petition for Certiorari that asks the Court to weigh in on the broader question of when federal law preempts contracts under state law.
Second, courts could conclude that the state common law (the body of law made up of legal principles established by courts over the years) on contracts does not permit contractual restrictions on fair use. This could come in a few different forms. One option might be for courts to consider more seriously the question of whether a valid contract is actually created in the first place, particularly in situations where users have no meaningful opportunity to negotiate terms and little ability to even understand what restrictions they are agreeing to. For years, following the lead of the Seventh Circuit Court of Appeals in ProCD v. Zeidenberg, courts have been willing to accept that a valid agreement is formed even in situations with “shrinkwrap” or “browsewrap” licenses. Despite ongoing criticism by many, this approach has prevailed. Courts might also take the public policy implications of fair use evasion more seriously by invoking traditional rules of contract interpretation that hold terms unenforceable when they violate public policy (e.g., agreements to commit a crime or a tort, or agreements in restraint of trade). To date, however, I’m unaware of any such cases directly applying these principles to contracts that restrict fair use, though there is a large body of case law and this may merit more research.
Third, the courts could apply existing or new equitable doctrines, such as “copyright misuse” or a yet-to-be-defined right of “fair breach” to protect users from overenforcement of contracts that limit fair use. Professor Jane Ginsburg outlines the potential need for courts to develop their own remedy of “fair breach.” She observes that, as with the current licensing environment online, at some point “it becomes necessary to consider whether limitations incorporated in the copyright law should be imported to its contractual substitute. With respect to libraries and their users, one should inquire whether some kind of fair use exception is appropriate. This might take the form of a judge-made right of ‘fair breach,’ or legislatively imposed mandatory library-user rights.”
This idea of “fair breach” has drawn little attention since Ginsburg first identified its need and coined the term, but it merits further attention. “Fair breach” may have some similarity to the existing doctrine of copyright misuse, which could have some application to contracts that restrict fair use. A judge-made doctrine borrowed from the patent law doctrine of patent misuse, copyright misuse has been mostly applied to situations where copyright owners have attempted to exercise their rights to unfairly stifle competition. The primary question with copyright misuse is “whether the copyright is being used in a manner violative of the public policy embodied in the grant of a copyright.” If copyright misuse is found, the copyright isn’t invalidated, but courts have held that the owners’ copyright cannot be enforced to exclude the harmed party’s use. The Supreme Court has yet to acknowledge the existence of this doctrine, but numerous appellate courts have recognized it over the last thirty years.
Video Pipeline, Inc. v. Buena Vista Home Entertainment, Inc. also gives some encouragement. In that case, Video Pipeline brought a declaratory judgment action seeking a judgment that its use of video trailers from Disney and others was not copyright infringement. Among the defenses it cited was copyright misuse on the part of Disney. To support its copyright misuse argument, Video Pipeline pointed to the license term I mentioned at the beginning of this blog post, which conditioned the license on an agreement to not disparage Disney or the entertainment industry. The court ultimately declined to find that those terms constituted copyright misuse, because the contract had a narrow focus and limited application: “we nonetheless cannot conclude on this record that the agreements are likely to interfere with creative expression to such a degree that they affect in any significant way the policy interest in increasing the public store of creative activity. The licensing agreements do not, for instance, interfere with the licensee’s opportunity to express such criticism on other web sites or elsewhere.” However, the court suggested that the outcome could have been different if the restrictions were more far reaching.
Contractual override of fair use poses a real threat to free expression, especially given the increasing limits on distribution of copyrighted works online. Almost all online platforms that distribute copyrighted works impose restrictions that inhibit fair use to some degree. It takes just two easy steps. Thankfully, there are some plausible routes forward for improving the law to protect authors and others who rely on fair use to create new works and share knowledge with the world. There is also some reason for optimism due to renewed interest in the issue among scholars and organizations such as the Association of Research Libraries, which issued a report on contractual override for libraries, and is co-hosting a symposium with Washington College of Law at American University on the subject with perspectives from around the world.
Authors who want to incorporate source materials into their writings with confidence may find themselves faced with more questions than answers. What exactly does fair use mean? What factors do courts consider when evaluating claims of fair use? How does fair use support authors’ research, writing, and publishing goals? Fortunately, help is at hand! This Fair Use/Fair Dealing Week, we’re featuring a selection of resources, briefs, and blog posts to help authors understand and apply fair use.
Fair Use 101
Authors Alliance Guide to Fair Use for Nonfiction Authors: Our guidebook, Fair Use for Nonfiction Authors, covers the basics of fair use, addresses common situations faced by nonfiction authors where fair use may apply, and debunks some common misconceptions about fair use. Download a PDF today.
Authors Alliance Fair Use FAQs: Our Fair Use FAQs cover questions such as:
Can I still claim fair use if I am using copyrighted material that is highly creative?
What if I want to use copyrighted material for commercial purposes?
Does fair use apply to copyrighted material that is unpublished?
Codes of Best Practices in Fair Use: The Center for Media and Social Impact at American University has compiled this collection of Codes of Best Practices in Fair Use for various creative communities, from journalists to librarians to filmmakers.
We’re very pleased to announce a new project for 2023, “Text and Data Mining: Demonstrating Fair Use,” which is generously supported by the Mellon Foundation. The project will focus on lowering and overcoming legal barriers for researchers who seek to exercise their fair use rights, specifically within the context of text data mining (“TDM”) research under current regulatory exemptions.
Fair use is one of the primary legal doctrines that allow researchers to copy, transform, and analyze modern creative works—almost all of which are protected by copyright—for research, educational, and scholarly purposes. Unfortunately, in practice, not everyone is able to use this powerful right. Researchers today face the challenge that fair use is often overridden by a complex web of copyright-adjacent laws. One major culprit is Section 1201 of the Digital Millennium Copyright Act (“DMCA”), which imposes significant liability for users of copyrighted works who circumvent technical protection measures (e.g., content scramble for DVDs), unless those users comply with a series of specific exemptions to Section 1201. These exemptions are lengthy and complex, as is the process to petition for their adoption or renewal, which recurs every three years.
Text data mining is a prime example of work that demonstrates the power of fair use, as it allows researchers to discover and share new insights about how modern language and culture reflect on important issues ranging from our understanding of science to how we think about gender, race, and national identity. Authors Alliance has worked extensively on supporting TDM work in the past, including by successfully petitioning the Copyright Office for a DMCA exemption to allow researchers to break digital locks on films and literary works distributed electronically for TDM research purposes, and this project builds on those previous efforts.
The Text Data Mining: Demonstrating Fair Use project has two goals in 2023:
1) To help a broader and more diverse group of researchers understand their fair use rights and their rights under the existing TDM exemption through one-on-one consultations, creating educational materials, and hosting workshops and other trainings; and
2) To collect and document examples of how researchers are using the current TDM exemption, with the aim of illustrating how the TDM exemption can be applied and highlighting its limitations so that policymakers can improve it in the future.
We’ll be working closely with TDM researchers across the United States, as well as organizations such as the Association for Computers and the Humanities, and will be actively exploring opportunities to work with others. If you have an interest in this project, we would love to hear from you!
About The Andrew W. Mellon Foundation
The Andrew W. Mellon Foundation is the nation’s largest supporter of the arts and humanities. Since 1969, the Foundation has been guided by its core belief that the humanities and arts are essential to human understanding. The Foundation believes that the arts and humanities are where we express our complex humanity, and that everyone deserves the beauty, transcendence, and freedom that can be found there. Through our grants, we seek to build just communities enriched by meaning and empowered by critical thinking, where ideas and imagination can thrive. Learn more at
Fair use is one of the more dynamic topics in copyright law lately: within just the past year and a half, the Supreme Court has issued a decision in or agreed to hear two separate fair use cases that could affect how authors can rely on fair use. Fair use is also a topic Authors Alliance discusses a lot on this blog and elsewhere; we care about fair use so much that we wrote a book on it! While our guide focuses on fair use for nonfiction writers, fiction authors can and do rely on fair use to create new creative works of authorship. One of the clearest examples of fair use in the realm of fiction is parody (a topic we previously discussed in our 2020 series on fair use for fiction authors). At the most basic level, a parody is defined as “a literary or musical work in which the style of an author or work is closely imitated for comic effect or in ridicule.” In today’s post, we will contextualize parodies within copyright and offer some thoughts on how law and practice surrounding parody might affect authors of parodies and authors more generally.
Parody at the Supreme Court
Campbell v. Acuff-Rose Music, the landmark Supreme Court decision that proposed the framework for a use’s “transformativeness” still relied upon today, was itself a case about parody. In the case, the rap group 2 Live Crew recorded and released a song entitled “Pretty Woman.” The song drew from “Oh, Pretty Woman,” a rock ballad by Roy Orbison and William Dees which was featured in the film Pretty Woman. After reusing a familiar line from the song, the 2 Live Crew song “degenerates into a play on words, substituting predictable lyrics with shocking ones.” The song “juxtaposes the romantic musings of a man whose fantasy comes true, with degrading taunts, a bawdy demand for sex, and a sigh of relief from paternal responsibility.” The Court interpreted this change “as a comment on the naiveté of the original of an earlier day, as a rejection of its sentiment that ignores the ugliness of street life and the debasement that it signifies.” While many remember Campbell for the concept of transformativeness that it established as part of the test for fair use, it also shows the strong legal protection for parodies.
Parodies and Artistic Judgments
One issue that sometimes arises when an author defends their use of another’s work as a fair use parody is the question of whether it is a parody at all. Courts require more than claims from the author that their work is a parody in order to consider it as such. The question of whether the apparent parody uses the first work for a different purpose, or in a way that is transformative, or merely reuses existing material to unfairly benefit from that creative output, involves some artistic judgment by a court. While courts have, in different ways, resisted being slotted into the role of art critic, some judgment as to a work’s “parodic character” is inevitable.
The recent fair use case, TCA v. McCollum, is an illustrative example of this problem. The case concerned the use of a portion of the comedic routine “Who’s on First,” written by Bud Abbott and Lou Costello, in an original play entitled Hand to God, described as an “irreverent puppet comedy” about “a possessed Christian-ministry [sock] puppet.” While the play was not described as a parody, it did arguably use the comedic routine for comedic effect or ridicule, making it at the very least “parody-like.” In Hand to God, “Who’s on First” takes place as a conversation between the protagonist and Tyrone, a sock puppet worn on the character’s hand. The actor performs both roles, using different voices for the character and the sock puppet. In 2015, a district court found the use of the routine in the play to be “highly transformative,” because “[w]hereas the original Routine involved two actors whose performance falls in the vaudeville genre, Hand to God has only one actor performing the Routine in order to illustrate a larger point.” It explained that “[t]he contrast between [the character’s] seemingly soft-spoken personality and the actual outrageousness of his inner nature, which he expresses through the sock puppet, is, among other things, a darkly comedic critique of the social norms governing a small town in the Bible Belt.”
But the next year, the Second Circuit reversed the district court’s decision, finding that the use of “Who’s on First?” in Hand to God was not transformative or a fair use. The court argued that neither the playwright nor the district court had explained how the use of “Who’s on First?” specifically served the “darkly comedic” aim of the play. The court added that its conclusion was bolstered by the fact that the playwright presented a portion of the routine “almost verbatim,” apparently ignoring the fact that verbatim copying can be fair use in some instances. In the view of the Second Circuit, the use of the routine did not “add something new” to “Who’s on First?” with “new expression, meaning, or message.” It is difficult to square this with the astute observations of the district court regarding the creative use of “Who’s on First” in Hand to God, and it raises the question of how much artistic judgment is involved when judges decide whether a work is transformative or parodic in character.
Parody in Practice
While the fair use doctrine provides strong legal protection for parodies, in practice, authors might still be cautious about whether and how they create parodies of literary works. The possibility of facing a lawsuit related to a new book is daunting, even when those lawsuits are entirely unsuccessful. The cost of defending a copyright infringement suit can be very high, and authors are already strapped for the time and resources they need to create new works of authorship. This threat of copyright liability can lead some authors and other creators to be cautious about creating parodies. For example, there has recently been news of an upcoming horror film parody of Winnie-the-Pooh involving live-action actors, entitled Winnie the Pooh: Blood and Honey. This work arguably fits squarely within the definition of parody, turning Pooh and Piglet from lovable, naive animated characters into gruesome killers portrayed by actors. Yet the filmmaker waited until Winnie-the-Pooh by A.A. Milne entered the public domain before creating the film. Moreover, the filmmaker took additional precautions to avoid antagonizing the Milne estate or Disney, which owns copyrights in the character as presented in Disney’s Winnie-the-Pooh films and television shows. The filmmaker made sure to pattern the character designs off the drawings in Milne’s book rather than the Disney TV show or films, and even omitted characters like Tigger who did not appear until Milne’s later The House at Pooh Corner, which has not yet entered the public domain.
Parodies and the law around them are an important topic for authors who care about fair use. Fiction authors may be inspired to create new parodies, and nonfiction authors too can take lessons from the laws around parody as to transformativeness and practical caution, even when the law is on one’s side.