Tag Archives: Fair Use

The DMCA 1201 Rulemaking: Summary, Key Takeaways, and Other Items of Interest

Posted November 8, 2024

Last month, we blogged about the key takeaways from the 2024 TDM exemptions recently put in place by the Librarian of Congress, including how the 2024 exemptions (1) expand researchers’ access to existing corpora, (2) definitively allow the viewing and annotation of copyrighted materials for TDM research purposes, and (3) create new obligations for researchers to disclose security protocols to trade associations. Beyond these key changes, the TDM exemptions remain largely the same: researchers affiliated with universities are allowed to circumvent TPMs to compile corpora for TDM research, provided that those copies of copyrighted materials are legally obtained and adequate security protocols are put in place.

We have since updated our resources page on Text and Data Mining and have incorporated the new developments into our TDM report: Text and Data Mining Under U.S. Copyright Law: Landscape, Flaws & Recommendations.

In this blog post, we share some further reflections on the newly expanded TDM exemptions—including (1) the use of AI tools in TDM research, (2) outside researchers’ access to existing corpora, (3) the disclosure requirement, and (4) a potential TDM licensing market—as well as other insights that emerged during the 9th triennial rulemaking.

The TDM Exemption

In other jurisdictions, such as the EU, Singapore, and Japan, legal provisions that permit “text data mining” also allow a broad array of uses, such as general machine learning and generative AI model training. In the US, exemptions allowing TDM so far have not explicitly addressed whether AI could be used as a tool for conducting TDM research. In this round of remaking, we were able to gain clarity on how AI tools are allowed to aid TDM research. Advocates for the TDM exemptions provided ample examples of how machine learning and AI are key to conducting TDM research and asked that “generative AI” not be deemed categorically impermissible as a tool for TDM research. The Copyright Office agreed that a wide array of tools could be utilized for TDM research under the exemptions, including AI tools, as long as the purpose is to conduct “scholarly text and data mining research and teaching.” The Office was careful to limit its analysis to those uses and not address other applications such as compiling data—or reusing existing TDM corpora—for training generative AI models; those are an entirely separate issue from facilitating non-commercial TDM research.

Besides clarifying that AI tools are allowed for TDM research and that viewing and annotation are permitted for copyrighted materials, the new exemptions offer meaningful improvement to TDM researchers’ access to corpora. The previous 2021 exemptions allowed access for purposes of “collaboration,” but many researchers interpreted that narrowly, and the Office confirmed that “collaboration” was not meant to encompass outside research projects entirely unrelated to the original research for which the corpus was created. Under the 2021 exemptions, a TDM corpus could only be accessed by outside researchers if they are working on the same research project as the original compiler of the corpus. The 2024 exemptions’ expansion of access to existing corpora has two main components and advantages. 

The expansion now allows for new research projects to be conducted on existing corpora, permitting institutions that have created a corpus to provide access “to researchers affiliated with other nonprofit institutions of higher education, with all access provided only through secure connections and on the condition of authenticated credentials, solely for purposes of text and data mining research or teaching.” At the same time, it also opens up new possibilities for researchers at institutions who otherwise would not have access, as the new exemption does not require a precondition that the outside researchers’ institutions otherwise own copies of works in the corpora. The new exemptions pose some important limitations: only researchers at institutions of higher education are allowed this access, and nothing more than “access” is allowed—it does not, for example, allow the transfer of a corpus for local use. 

The Office emphasized the need for adequate security protections, pointing back to cases such as Authors Guild v. Google and Authors Guild v. HathiTrust, which emphasized how careful both organizations were, respectively, to prevent their digitized corpora from being misused. To take advantage of this newly expanded TDM exemption, it will be crucial for universities to provide adequate IT support to ensure that technical barriers do not impede TDM researchers. That said, the record for the exemption shows that existing users are exceedingly conscientious when it comes to security. There have been zero reported instances of security breaches or lapses related to TDM corpora being compiled and used under the exemptions. 

As we previously explained, the security requirements are changed in a few ways. The new rule clarifies that trade associations can send inquiries on behalf of rightsholders. However, inquiries must be supported by a “reasonable belief” that the sender’s works are in a corpus being used for TDM research. It remains to be seen how the new obligation to disclose security measures to trade associations would impact TDM researchers and their institutions. The Register circuitously called out demands by trade associations sent to digital humanities researchers in the middle of the exemption process with a two-week response deadline as unreasonable and quoted NTIA (which provides input on the exemptions) in agreement that  “[t]he timing, targeting, and tenor of these requests [for institutions to disclose their security protocols] are disturbing.”  We are hopeful that this discouragement from the Copyright Office will prevent any future large-scale harassment towards TDM researchers and their institutions, but we will also remain vigilant in case trade associations were to abuse this new power. 

Alongside the concerns over disclosure requirements, we have some questions about the Copyright Office’s treatment of fair use as a rationale for circumventing TPMs for TDM research. The Register restated her 2021 conclusion that “under Authors Guild, Inc. v. HathiTrust, lost licensing revenue should only be considered ‘when the use serves as a substitute for the original.’” The Office, in its recommendations, placed considerable weight on the lack of a viable licensing market for TDM, which raises a concern that, in the Office’s view, a use that once was fair and legal might lose that status when the rightsholder starts to offer an adequate licensing option. While this may never become a real issue for the existing TDM exemptions (because no sufficient licensing options exist for TDM researchers, and for the breadth and depth of content needed, it seems unlikely to ever develop), it nonetheless contributes to the growing confusion surrounding the stability of a fair use defense in the face of new licensing markets. 

These concerns highlight the need for ongoing advocacy in the realm of TDM research. Overall, the Register of Copyright recognizes TDM as “a relatively new field that is quickly evolving.” This means that we could ask the Library of Congress to relax the limitations placed on TDM if we can point to legitimate research-related purposes. But, due to the nature of this process, it also means TDM researchers do not have a permanent and stable right to circumvent TPMs. As the exemptions remain subject to review every three years, many large trade associations advocate for the TDM exemptions to be greatly limited or even canceled, wishing to stifle independent TDM research. We will continue to advocate for TDM researchers, as we did during the 8th and 9th triennial rulemaking. 

Looking beyond the TDM exemption, we noted a few other developments: 

Warhol has not fundamentally changed fair use

First, the Opponents of the renewal of the existing exemptions repeatedly pointed to Warhol Foundation v. Goldsmith—the Supreme Court’s most recent fair use opinion—to argue that it has changed the fair use analysis such that the existing exemptions should not be renewed. For example, the Opponents argued that the fair use analysis for repairing medical devices changed under Warhol because, according to them, commercial nontransformative uses were less likely to be fair. The Copyright Office did not agree. The Register said that the same fair use analysis as in 2021 applied and that the Opponents failed “to show that the Warhol decision constitutes intervening legal precedent rendering the Office’s prior fair use analysis invalid.” In another instance where the Opponents tried to argue that commerciality must be given more weight under Warhol, the Register pointed out that under Warhol commerciality is not dispositive and must be weighed against the purpose of the new use.  The arguments for revisiting the 2021 fair use analyses were uniformly rejected, which we think is good news for those of us who believe Warhol should be read as making a modest adjustment to fair use and not a wholesale reworking of the fair use doctrine. 

Does ownership and control of copies matter for access? 

One of the requests before the Office was an expansion of an exemption that allows for access to preservation copies of computer programs and video games. The Office rejected the main thrust of the request but, in doing so, also provided an interesting clarification that may reveal some of the Office’s thinking about the relationship between fair use and access to copies owned by the user: 

The Register concludes that proponents did not show that removing the single user limitation for preserved computer programs or permitting off-premises access to video games are likely to be noninfringing. She also notes the greater risk of market harm with removing the video game exemption’s premises limitation, given the market for legacy video games. She recommends clarifying the single copy restriction language to reflect that preservation institutions can allow a copy of a computer program to be accessed by as many individuals as there are circumvented copies legally owned.”

That sounds a lot like an endorsement of the idea that the owned-to-loaned ratio, a key concept in the controlled digital lending analysis, should matter in the fair use analysis (which is something the Hachette v. Internet Archive controlled digital lending court gave zero weight to). For future 1201 exemptions, we will have to wait and see whether the Office will use this framework in other contexts. 

Addressing other non-copyright and AI questions in the 1201 process

The Librarian of Congress’s final rule included a number of notes on issues not addressed by the rulemaking: 

“The Librarian is aware that the Register and her legal staff have invested a great deal of time over the past two years in analyzing the many issues underlying the 1201 process and proposed exemptions. 

Through this work, the Register has come to believe that the issue of research on artificial intelligence security and trustworthiness warrants more general Congressional and regulatory attention. The Librarian agrees with the Register in this assessment. As a regulatory process focused on technological protection measures for copyrighted content, section 1201 is ill-suited to address fundamental policy issues with new technologies.” 

Proponents tried to argue that the software platforms’ restrictions and barriers to conducting AI research, such as their account requirements, rate limits, and algorithmic safeguards, are circumventable TPMs under 1201, but the Register disagreed. The Register maintained that the challenges Proponents described arose not out of circumventable TPMs but out of third-party controlled Software as a Service platforms. This decision can be illuminating for TDM researchers seeking to conduct TDM research on online streaming media or social media posts.

The Librarian’s note went on to say: “The Librarian is further aware of the policy and legal issues involving a generalized ‘‘right to repair’’ equipment with embedded software. These issues have now occupied the White House, Congress, state legislatures, federal agencies, the Copyright Office, and the general public through multiple rounds of 1201 rulemaking. 

Copyright is but one piece in a national framework for ensuring the security, trustworthiness, and reliability of embedded software, as well as other copyright-protected technology that affects our daily lives. Issues such as these extend beyond the reach of 1201 and may require a broader solution, as noted by the NTIA.”

These notes give an interesting, though a bit confusing, insight into how the Librarian of Congress and the Copyright Office think about the role of 1201 rulemaking when they address issues that go beyond copyright’s core concerns. While we can agree that 1201 is ill-suited to address fundamental policy issues with new technology, it is also somewhat concerning that the Office and the Librarian view copyright more generally as part of a broader “national framework for ensuring the security, trustworthiness, and reliability of embedded software.”  While of course, copyright is sometimes used to further ends outside of its intended purpose, these issues are far from the core constitutional purpose of copyright law and we think they are best addressed through other means. 

Artist Left with Heavy Fees by Copyright Troll Law Firm

Posted October 11, 2024

Facts of the Case & Fair Use

On September 18, the 5th Circuit decided in Keck v. Mix Creative Learning Center that using copyrighted artwork to teach children how to make art in a similar style does not constitute copyright infringement. The case adds to the well-developed jurisprudence that teaching with copyrighted materials is often protected by fair use.

This case was initially filed in 2021 by plaintiff’s counsel, Mathew Kidman Higbee, a known and prolific copyright litigation firm sometimes accused of troll-like behavior.  During the pandemic, the defendant sold a total of six art kits (out of the six kits sold, two were purchased by the plaintiff) that included images of the plaintiff’s dog-themed artworks, biographical information, and details on her artistic styles. Additionally, the kit included paint, paintbrushes, and collage paper. The plaintiff’s side argued that including the artworks in teaching kits constituted willful copyright infringement and therefore demanded $900,000 in damages—to make up for the $250 the defendant made in sales. 

The district court dismissed all infringement claims in 2022; and last month, the 5th Circuit court affirmed that including copies of plaintiff’s artwork in a teaching kit is fair use. 

The courts found the first and fourth fair use factors to favor the defendant. Under the first factor, even though the defendant’s use was commercial in nature, by accompanying the artworks with art theory and history, the teaching kit transformed the original decorative purpose of the dog-themed artworks. The 5th Circuit distinguished this case from Warhol by pointing out that, in the Warhol case, the infringing use served the same illustrative purpose as the original work, while in this case, “the art kits had educational objectives, while the original works had aesthetic or decorative objectives.”  

Under the fourth factor, courts explained that they cannot imagine how the market value of plaintiff’s dog-themed artworks could decrease when included in children’s art lesson kits. The 5th Circuit Court further pointed out that there was no evidence that a market for licensing artworks for similar teaching kits exists now or is ever likely to develop. 

Because these “two most important” factors favored the defendant, the defendant’s use was fair use.

Fee Shifting: Plaintiffs Beware of Copyright Troll Law Firms!

The final outcome of the case: the plaintiff was ordered to cover $102,404 in fees and $165.72 in costs for the defendant.

Even though we are happy for the defendant and her counsel that, after a prolonged legal battle, this well-deserved victory is finally won, it is nevertheless disheartening to see the plaintiff-artist left alone in the end to face the high legal fees of this ill-conceived lawsuit. The plaintiff’s counsel not only failed to advise the plaintiff to act in her own best interest (whether it is to settle the case at the right moment or to pursue more plausible claims), but also conjured up willful infringement claims that were clearly meritless to any trained eye. Even the 5th Circuit Court lamented over this in its opinion, as it begrudgingly upheld the district court’s decision based on the abuse of discretion standard it must follow:

It is troubling that Keck alone will be liable for the high fees incurred by Defendants largely because of Higbee & Associates’ overly aggressive litigation strategy. From our review of the record, the law firm lacked a firm evidentiary basis to pursue hundreds of thousands of dollars in statutory damages against Defendants for willful infringement. Nevertheless, we cannot say, on an abuse of discretion standard, that the district court erred by determining that there was insufficient evidence that the firm’s conduct was both unreasonable and vexatious. … But we warn Higbee & Associates that future conduct of this nature may well warrant sanctions, and nothing in this opinion prevents Higbee & Associates from compensating its client, if appropriate, for the fees that she is now obliged to pay Defendants.

This should serve as a cautionary tale for would-be plaintiffs: copyright lawsuits, like any other type of litigation, are primarily meant to address the damages plaintiffs actually suffered, and the final settlement should make plaintiffs whole again—that is, as if no infringement has ever occurred. Copyright lawsuits (or the threat to sue) should not be undertaken as a way to create brand new income streams, such as was the case in the lawsuit described above. 

When someone aggressively enforces dubious copyright claims with the sole purpose of collecting exorbitant fees rather than protecting any underlying copyrights, they are called a “copyright troll.” Regrettably, beyond the disreputable law firms that are enthused to pursue aggressive claims, many services now exist to tempt creators into troll-like behavior by promising “new licensing income.” The true aim of these services is solely to collect high representation charges from creators, when users of the creators’ works are harassed into paying exorbitant settlements. Many victims often agree to pay just for the nuisance to stop. This predatory business model has been repeatedly exposed by creators and authors, including famously by Cory Doctorow

Needless to say, copyright trolls are harmful to the copyright ecosystem. Obviously, innocent users are harmed when slapped with unreasonable demand letters or even frivolous lawsuits. Worse, creators are misled into supporting this unethical practice while deluded into believing they are doggedly following the spirit of the law—sometimes, as was in this case, they are left to face the inevitable consequences of bringing a frivolous lawsuit, while the lawyer or agent that originally led them into the mire gets off free, upward and onward to their next “representation.” 

It was very unfortunate that the district court did not fully study the plaintiff’s counsel’s track record and issue appropriate disciplinary orders against him. The problem of copyright trolls will have to be addressed soon in order to preserve a healthy copyright system. 

What is “Derivative Work” in the Digital Age?

Posted October 7, 2024
on the top, Seltzer v. Green Day; on the bottom, Kienitz v. Sconnie Nation

Part I: The Problem with “Derivative Work”

The right to prepare derivative works is one of the exclusive rights copyright holders have under §106 of the Copyright Act. Other copyright holders’ exclusive rights include the right to make and distribute copies, and to display or perform a work publicly. 

Lately, we’ve seen a congeries of novel conceptions about “derivative works.” For example, a reader of our blog stated that when looking at AI models and AI outputs, works should be considered infringing “derivatives” even when there is no substantial similarity between the infringing AI model/outputs and the ingested originals. Even in the courts, we’ve seen confusion, for example, Hachette v. Internet Archive presented us with the following statement about derivative works:

Changing the medium of a work is a derivative use rather than a transformative one. . . . In fact, we have characterized this exact use―“the recasting of a novel as an e-book”―as a “paradigmatic” example of a derivative work. [citation omitted; emphasis added]

These statements leave one to wonder—what is a copy, a derivative work, an infringing use, and a transformative fair use in the context of U.S. copyright law? In order to have some clarity on these questions, it’s helpful to juxtapose “derivative works” first with “copies” and then with “transformative uses.” We think the confusion about derivative work and its related concepts arises out of using the phrase to mean “a work that is substantially similar to the original work” as well as “a work that is so in an unauthorized way, not excused from liabilities.”

There are many immediate real world implications for confusion over the meaning of “derivative work.” In privately negotiated agreements, licensees who have a right to make reproductions but not derivative works may be confused as to what medium their use is restricted to. For example, a publisher of a book with a license that allows it to make reproductions but not derivatives might be confused as to whether, under the Hachette court’s reasoning, it is allowed to republish a print book in a digital format such as a simple PDF of a scan. Similarly, for public licenses, such as the CC ND licenses, where a licensor stipulates restriction on the creation of derivative works, it causes confusion for downstream users whether, say, changing a pdf into a Word document is allowed. 

This is also an important topic to explore both in the recent hot debates over Controlled Digital Lending and generative artificial intelligence, as well as in an author’s everyday work—for instance, would quoting someone else’s work make your article/book a derivative work of the original? 

Part II: “Copies” and “Derivatives”

Our basic understanding of derivative works comes from the 1976 Copyright Act. The §101 definition tells us:

A “derivative work” is a work based upon one or more preexisting works, such as a translation, musical arrangement, dramatization, fictionalization, motion picture version, sound recording, art reproduction, abridgment, condensation, or any other form in which a work may be recast, transformed, or adapted. A work consisting of editorial revisions, annotations, elaborations, or other modifications which, as a whole, represent an original work of authorship, is a “derivative work”.

The U.S. Copyright Office published Circular 14 gives some further helpful guidance as to what a §106 derivative work would look like:

To be copyrightable, a derivative work must incorporate some or all of a preexisting “work” and add new original copyrightable authorship to that work. The derivative work right is often referred to as the adaptation right. The following are examples of the many different types of derivative works: 

  • A motion picture based on a play or novel 
  • A translation of an novel written in English into another language
  • A revision of a previously published book 
  • A sculpture based on a drawing 
  • A drawing based on a photograph 
  • A lithograph based on a painting 
  • A drama about John Doe based on the letters and journal entries of John Doe 
  • A musical arrangement of a preexisting musical work 
  • A new version of an existing computer program 
  • An adaptation of a dramatic work 
  • A revision of a website

One immediate observation that can be made from reading these, is that “ebook” or “digitized version of a work” is not listed as, nor similar to any of the exemplary derivative works in the Copyright Act or the Copyright Office Circular. By contrast, “ebook” or “digitized version of a work” seems to fit much better under the § 101 definition of “copies”:

“Copies” are material objects, other than phonorecords, in which a work is fixed by any method now known or later developed, and from which the work can be perceived, reproduced, or otherwise communicated, either directly or with the aid of a machine or device. The term “copies” includes the material object, other than a phonorecord, in which the work is first fixed.

The most crucial difference between a “copy” and a “derivative work” is whether new authorship is added. If no new authorship is added, merely changing the material that the work is fixed on does not create a new copyrightable derivative work. This, in fact, is observed by many courts before Hachette. For example, in Corel v. Bridgeman Art Gallery, the court unequivocally held that there is no new copyright granted to photos of public domain paintings. 

Additionally, as we know from Feist v. Rural Tel., “[t]he mere fact that a work is copyrighted does not mean that every element of the work may be protected.” Copyright protection is only limited to the original elements of a work. We cannot call a work “derivative” of another if it does not incorporate any copyrightable elements from the original copyrighted work. For example, the “Game Genie” device, which let players change elements of a Nintendo game, was not found to be a derivative work by the court because it didn’t incorporate any part of the Nintendo game. 

It is clear from this examination that sometimes a later-created work is a copy, sometimes a derivative, and sometimes it may not implicate any of the exclusive rights of the original.

Part III: “Derivative” and “Transformative” Works

Let’s quickly recap the context in which courts are confusing “derivative” and “transformative” works—

A prima facie case of copyright infringement requires the copyright holder to prove (1) ownership of a valid copyright, and (2) inappropriate copying of original elements. We will not go into more details here, but essentially, the inappropriate copying prong requires plaintiffs to assert and prove defendant’s access to the plaintiff’s work as well as a level of similarity between the works in question that shows improper appropriation of the plaintiff’s work. If the similarity between the defendant’s work and protectable elements in the plaintiff’s work is minimal, then there is no infringement. As seen in the  “Game Genie” example above, courts can rely on substantial similarity analysis to determine whether a work is indeed a potentially-infringing copy or derivative of the plaintiff’s work.

Once the plaintiff establishes a prima facie infringement case—e.g., the defendant’s work is shown to be a derivative or a copy of the plaintiff’s registered work—the defendant may still nevertheless be free to make the use if the use falls outside the ambit of the copyright holder’s §106 rights, such as uses that are fair use. Whether a work is a derivative work under § 106 is no longer a relevant inquiry after establishing a prima facie case: this point is starkly obvious when looking at the many plausible defenses a defendant can raise (including fair use) where even the verbatim copying of a work is authorized by law. 

As the court stated in Authors Guild v. Hathitrust, “there are important limits to an author’s rights to control original and derivative works. One such limit is the doctrine of ‘fair use,’ which allows the public to draw upon copyrighted materials without the permission of the copyright holder in certain circumstances.” When a prima facie infringement case is already established, yet a court still discusses whether the defendant’s work is a “derivative work,” at a minimum, the court adds confusion by beyond the § 101  definition of a derivative work. 

In fact, a distinct new significance is being given to “derivative work” in recent years in the context of the “purpose and character” factor of fair use, specifically, when analyzing if a use has a transformative purpose. The shift in a word’s meaning or a concept is not per se unimaginable or objectionable. It is misguided to consider the copyright legal landscape static. As law professor Pamela Samuelson pointed out, before the mid-19th century, most courts did not even think copyright holders were entitled to demand compensation from others preparing derivative works. The 1976 Copyright Act finally codified copyright holders’ exclusive right to prepare derivative works. And, now, some rights holders want the courts to say there are categorical derivative uses that can never be considered fair use.

The Hachette court is among those that have unfortunately bought into this novel approach. The court seems not only to misconstrue the salient distinction between a ‘copy of a work’ and a ‘derivative work’, they appear to give heightened protections to works they now define as ‘derivative’. If this misconception becomes widespread, we will be living in a world where if a use is new-derivative, then it is never transformative (and, if it is not transformative, it is likely not fair). Ultimately, it is purely circular for a court to say that the reason for denying the fair use defense is that the use is derivative. When we buy into this setup of “derivative v.s. transformative,” it is difficult to ever say with confidence that a work is transformative, because at the same time we remember how a transformative use should often fit in the actual definition of derivative work under § 101, “derivative”—just like the Green Day rendition of the plaintiff’s art in Seltzer v. Green Day.  

Clearly, if we take “derivative work” at its true § 101 definition, out of all potentially infringing works, “transformative fair use” is not an absolute complement, but a possible subset, of derivative works. We know from Campbell v. Acuff-Rose that “transformativeness is a matter of degree, not a binary;” whereas no such sliding scale is plausible for derivative works. A work is either a derivative or it is not: there’s never a “somewhat derivative” work in copyright. All in all, it makes little sense to frame the issues as “transformative v.s. derivative work”—such discussions inevitably buy into the rhetorics of copyright expansionists. We have already warned the court in Warhol against the danger of speaking heedlessly about derivative works in the context of fair use. We must ensure that the “derivative v.s. transformative” dichotomy does not come to dominate future discussions of fair use, so that we conserve the utility and clarity of the fair use doctrine.

The expansion of the relevance of “derivative work” beyond the establishment of a prima facie infringement case not only creates a circular reasoning for denying fair use, but also makes it impossible to make sense of the case law we have accumulated on fair use. Take Seltzer v. Green Day for example, the court held that a work can be transformative even if that work “makes few physical changes to the original.” The Green Day concert background art with a red cross superimposed was found to be a fair use of the original street art—a classic example of how a prima facie infringing derivative work can nevertheless be a transformative, and thus fair, use. Similarly, in Kienitz v. Sconnie Nation, a derivative use of a photo on a tshirt was found to be a fair use. Ideas and concepts, including “derivative works,” are only important to the extent they elucidate our understanding of the world. When the use of “derivative works” leads to more confusion than clarity, we should be cautious in adopting the new meaning being superimposed on “derivative works.”

Clickbait arguments in AI Lawsuits (will number 3 shock you?)

Posted August 15, 2024

Image generated by Canva

The booming AI industry has sparked heated debates over what AI developers are legally allowed to do. So far, we have learned from the US Copyright Office and courts that AI created works are not protectable, unless it is combined with human authorship. 

As we monitor two dozen ongoing lawsuits and regulatory efforts that address various aspects of AI’s legality, we see legitimate legal questions that must be resolved. However, we also see some prominent yet flawed arguments that have been used to enflame discussions, particularly by publisher-plaintiffs and their supporters. For now, let’s focus on some clickbait arguments that sound appealing but are fundamentally baseless. 

Will AI doom human authorship?

Based on current research, AI tools can actually help authors improve creativity, productivity, as well as the longevity of their career

When AI tools such as ChatGPT first appeared online, many leading authors and creators publicly endorsed it as a useful tool like any other tech innovation that came before it. At the same time, many others claimed that authors and creators of lesser caliber will be disproportionately disadvantaged by the advent of AI. 

This intuition-driven hypothesis, that AI will be the bane of average authors, has so far proved to be misguided.

We now know that AI tools can greatly help authors during the ideation stage, especially for less creative authors. According to a study published last month, AI tools had minimal impact on the output of highly creative authors, but were able to enhance the works of less imaginative authors. 

AI can also serve as a readily-accessible editor for authors. Research shows that AI enhances the quality of routine communications. Without AI-powered tools, a less-skilled person will often struggle with the cognitive burden of managing data, which limits both the quality and quantity of their potential output. AI helps level the playing field by handling data-intensive tasks, allowing writers to focus more on making creative and other crucial decisions about their works. 

It is true that entirely AI-generated works of abysmal quality are available for purchase on some platforms. Some of these works are using human authors’ names without authorization. These AI-generated works may infringe on authors’ right of publicity, but they do not present commercially-viable alternatives to books authored by humans. Readers prefer higher-quality works produced with human supervision and interference (provided that digital platforms do not act recklessly towards their human authors despite generating huge profits from human authors).

Are lawsuits against AI companies brought with authors’ best interest in mind? 

In the ongoing debate over AI, publishers and copyright aggregators have suggested that they have brought these lawsuits to defend the interests of human authors. Consider the New York Times for example, in its complaint against OpenAI, NY Times describes their operations as “a creative and deeply human endeavor (¶31)” that necessitates “investment of human capital (¶196).” NY Times argues that OpenAI has built innovation on the stolen hard work and creative output from journalists, editors, photographers, data analysts, and others—an argument contrary to what the NY Times once argued in court in New York Times v. Tasini,  that authors’ rights must take a backseat to NY Times’ financial interests in new digital uses.  

It is also hard to believe that many of the publishers and aggregators are on the side of authors when we look at how they have approached licensing deals for AI training. These licensing deals can be extremely profitable for the publishers. For example, Taylor and Francis sold AI training data to OpenAI for 10 million USD. John Wiley and Sons earned $23 million from a similar deal with a non-disclosed tech company. Though we don’t have the details of these agreements, it seems easy to surmise that in return for the money received, the publishers will not harass the AI companies with future lawsuits. (See our previous blog post about these licensing deals and what you can do as an author.) It is ironic how an allegedly unethical and harmful practice quickly becomes acceptable once the publishers are profiting from it.

How much of the millions of dollars changing hands will go to individual authors? Limited data exist. We know that Cambridge University Press, a good-faith outlier, is offering authors 20% royalties if their work is licensed for AI training. Most publishers and aggregators are entirely opaque about how authors are to be compensated in these deals. Take the Copyright Clearance Center (CCC) for example, it offers zero information about how individual authors are consulted or compensated when their works are sold for AI training under CCC AI training license.

This is by no means a new problem for authors. We know that traditionally-published book authors receive around 10% of royalties from their publishers: a little under $2 per copy for most books. On an ebook, authors receive a similar amount for each “copy” sold. This little amount handed to authors only starts to look generous when compared to academic publishing, where authors increasingly pay publishers to have their articles published in journals. The journal authors receive zero royalties, despite the publishers’ growing profit

Even before the advent of AI technology, most authors were struggling to make a living on writing alone. According to an Authors Guild’s survey in 2018, the median income for full-time writers was $20,300, and for part-time writers, a mere $6,080. Fair wage and equitable profit sharing is an issue that needs to be settled between authors and publishers, even if publishers try to scapegoat AI companies. 

It’s worth acknowledging that it’s not just publishers and copyright industry organizations filing these lawsuits. Many of these ongoing lawsuits have been filed as class actions, with the plaintiffs claiming to represent a broad class of people who are similarly situated and (thus they alleged) hold similar views. Most notably, in Authors Guild v. OpenAI, Authors Guild and its named individual plaintiffs claim to represent all fiction writers in the US who have sold more than 5000 copies of a work. There’s also another case where plaintiff claims to represent all copyright holders of non-fiction works, including authors of academic journal articles, which got support from Authors Guild, and several others in which an individual plaintiff asserts the right to represent virtually all copyright holders of any type

As we (along with many others) have repeatedly pointed out, many authors disagree with the publishers and aggregators’ restrictive view on fair use in these cases, and don’t want or need a self-appointed guardian to “protect” their interests.  We have seen the same over-broad class designation in the Authors Guild v. Google case, which caused many authors to object, including many of our own 200 founding members.

Respect for copyright and human authors’ hard work means no more AI training under US copyright law? 

While we wait for courts to figure out the key questions on infringement and fair use, let’s take a moment to remember what copyright law does not regulate.

Copyright law in the US exists to further the Constitutional goal to “promote the Progress of Science and useful Arts.” In 1991, the Supreme Court held in Feist v. Rural Telephone Service that copyright cannot be granted solely based on how much time or energy authors have expended. “Compensation for hard work“ may be a valid ethical discussion, but it is not a relevant topic in the context of copyright law.

Publishers and aggregators preach that people must “respect copyright,” as if copyright is synonymous with the exclusive rights of the copyright holder. This is inaccurate and misleading. In order to safeguard the freedom of expression, copyright is designed to embody not only the rightsholders’ exclusive rights but also many exceptions and limitations to the rightsholders’ exclusive rights. Similarly, there’s no sound legal basis to claim that authors must have absolute control over their own work and its message. Knowledge and culture thrives because authors are permitted to build upon and reinterpret the works of others

Does this mean I should side with the AI companies in this debate?

Many of the largest AI companies exhibit troubling traits that they have in common with many publishers, copyright aggregators, digital platforms (e.g., Twitter, TikTok, Youtube, Amazon, Netflix, etc.), and many other companies with dominant market power. There’s no transparency or oversight afforded to the authors or the public. The authors and the public have little say in how the AI models are trained, just like how we have no influence over how content is moderated on digital platforms, how much royalties authors receive from the publishers, or how much publishers and copyright aggregators can charge users. None of these crucial systematic flaws will be fixed by granting publishers a share of AI companies’ revenue. 

Copyright also is not the entire story. As we’ve seen recently, there are some significant open questions about the right of publicity and somewhat related concerns about the ability of AI to churn out digital fakes for all sorts of purposes, some of which are innocent, but others are fraudulent, misleading, or exploitative. The US Copyright Office released a report on digital replicas on July 31 addressing the question of digital publicity rights, and on the same day the NO FAKES Act was officially introduced. Will the rights of authors and the public be adequately considered in that debate? Let’s remain vigilant as we wait to see the first-ever AI-generated public figure in a leading role to hit theaters in September 2024.

Introducing the Authors Alliance’s First Zine: Can Authors Address AI Bias?

Posted May 31, 2024

This guest post was jointly authored by Mariah Johnson and Marcus Liou, student attorneys in Georgetown’s Intellectual Property and Information Policy (iPIP) Clinic.

Generative AI (GenAI) systems perpetuate biases, and authors can have a potent role in mitigating such biases.

But GenAI is generating controversy among authors. Can authors do anything to ensure that these systems promote progress rather than prevent it? Authors Alliance believes the answer is yes, and we worked with them to launch a new zine, Putting the AI in Fair Use: Authors’ Abilities to Promote Progress, that demonstrates how authors can share their works broadly to shape better AI systems. Drawing together Authors Alliance’s past blog posts and advocacy discussing GenAI, copyright law, and authors, this zine emphasizes how authors can help prevent AI bias and protect “the widest possible access to information of all kinds.” 

As former Copyright Register Barbara Ringer articulated, protecting that access requires striking a balance with “induc[cing] authors and artists to create and disseminate original works, and to reward them for their contributions to society.” The fair use doctrine is often invoked to do that work. Fair use is a multi-factor standard that allows limited use of copyrighted material—even without authors’ credit, consent, or compensation–that asks courts to examine:

(1) the purpose and character of the use, 

(2) the nature of the copyrighted work, 

(3) the amount or substantiality of the portion used, and 

(4) the effect of the use on the potential market for or value of the work. 

While courts have not decided whether using copyrighted works as training data for GenAI is fair use, past fair use decisions involving algorithms, such as Perfect 10, iParadigms, Google Books, and HathiTrust favored the consentless use of other people’s copyrighted works to create novel computational systems. In those cases, judges repeatedly found that algorithmic technologies aligned with the Constitutional justification for copyright law: promoting progress.

But some GenAI outputs prevent progress by projecting biases. GenAI outputs are biased in part because they use biased, low friction data (BLFD) as training data, like content scraped from the public internet. Examples of BLFD include Creative Commons (CC) licensed works, like Wikipedia, and works in the public domain. While Wikipedia is used as training data in most AI systems, its articles are overwhelmingly written by men–and that bias is reflected in shorter and fewer articles about women. And because the public domain cuts off in the mid-1920s, those works often reflect the harmful gender and racial biases of that time. However, if authors allow their copyrighted works to be used as GenAI training data, those authors can help mitigate some of the biases embedded in BLFD. 

Current biases in GenAI are disturbing. As we discuss in our zine, word2vec is a very popular toolkit used to help machine learning (ML) models recognize relationships between words–like women as homemakers and Black men with the word “assaulted.” Similarly, OpenAI’s GenAI chatbox ChatGPT, when asked to generate letters of recommendation, used “expert,” “reputable,” and “authentic” to describe men and  “beauty,” “stunning,” and “emotional” for women, discounting women’s competency and reinforcing harmful stereotypes about working women. An intersectional perspective can help authors see the compounding impact of these harms. What began as a legal framework to describe why discrimination law did not adequately address harms facing Black women, it is now used as a wider lens to consider how marginalization affects all people with multiple identities. Coined by Professor Kimberlé Crenshaw in the late 1980s, intersectionality uses critical theory like Critical Race Theory, feminism, and working-class studies together as “a lens . . . for seeing the way in which various forms of inequality often operate together and exacerbate each other.” Contemporary authors’ copyrighted works often reflect the richness of intersectional perspectives, and using those works as training data can help mitigate GenAI bias against marginalized people by introducing diverse narratives and inclusive language. Not always–even recent works reflect bias–but more often than might be possible currently.

Which brings us back to fair use. Some corporations may rely on the doctrine to include more works by or about marginalized people in an attempt to mitigate GenAI bias. Professor Mark Lemley and Bryan Casey have suggested “[t]he solution [to facial recognition bias] is to build bigger databases overall or to ‘oversample’ members of smaller groups” because “simply restricting access to more data is not a viable solution.” Similarly, Professor Matthew Sag notes that “[r]estricting the training data for LLMs to public domain and open license material would tend to encode the perspectives, interests, and biases of a distinctly unrepresentative set of authors.” However, many marginalized people may wish to be excluded from these databases rather than have their works or stories become grist for the mill. As Dr. Anna Lauren Hoffman warns, “[I]nclusion reinforces the structural sources of violence it supposedly addresses.”

Legally, if not ethically, fair use may moot the point. The doctrine is flexible, fact-dependent, and fraught. It’s also fairly predictable, which is why legal precedent and empirical work have led many legal scholars to believe that using copyrighted works as training data to debias AI will be fair use–even if that has some public harms. Back in 2017, Professor Ben Sobel concluded that “[i]f engineers made unauthorized use of copyrighted data for the sole purpose of debiasing an expressive program, . . . fair use would excuse it.” Professor Amanda Levendowski has explained why and how “[f]air use can, quite literally, promote creation of fairer AI systems.” More recently, Dr. Mehtab Khan and Dr. Alex Hanna  observed that “[a]ccessing copyright work may also be necessary for the purpose of auditing, testing, and mitigating bias in datasets . . . [and] it may be useful to rely on the flexibility of fair use, and support access for researchers and auditors.” 

No matter how you feel about it, fair use is not the end of the story. It is ill-equipped to solve the troubling growth of AI-powered deepfakes. After being targeted by sexualized deepfakes, Rep. Ocasio-Cortez described “[d]eepfakes [as] absolutely a way of digitizing violent humiliation against other people.” Fair use will not solve the intersectional harms of AI-powered face surveillance either. Dr. Joy Buolamwini and Dr. Timnit Gebru evaluated leading gender classifiers used to train face surveillance technologies and discovered that they more accurately classified males over females and lighter-skinned over darker-skinned people. The researchers also discovered that the “classifiers performed worst on darker female subjects.” While legal scholars like Professors Shyamkrishna Balganesh, Margaret Chon, and Cathay Smith argue that copyright law can protect privacy interests, like the ones threatened by deepfakes or face surveillance, federal privacy laws are a more permanent, comprehensive way to address these problems.

But who has time to wait on courts and Congress? Right now, authors can take proactive steps to ensure that their works promote progress rather than prevent it. Check out the Authors Alliance’s guides to Contract Negotiations, Open Access, Rights Reversion, and Termination of Transfer to learn how–or explore our new zine, Putting the AI in Fair Use: Authors’ Abilities to Promote Progress.

You can find a PDF of the Zine here, as well as printer-ready copies here and here.