Introducing the Authors Alliance’s First Zine: Can Authors Address AI Bias?

Posted May 31, 2024

This guest post was jointly authored by Mariah Johnson and Marcus Liou, student attorneys in Georgetown’s Intellectual Property and Information Policy (iPIP) Clinic.

Generative AI (GenAI) systems perpetuate biases, and authors can have a potent role in mitigating such biases.

But GenAI is generating controversy among authors. Can authors do anything to ensure that these systems promote progress rather than prevent it? Authors Alliance believes the answer is yes, and we worked with them to launch a new zine, Putting the AI in Fair Use: Authors’ Abilities to Promote Progress, that demonstrates how authors can share their works broadly to shape better AI systems. Drawing together Authors Alliance’s past blog posts and advocacy discussing GenAI, copyright law, and authors, this zine emphasizes how authors can help prevent AI bias and protect “the widest possible access to information of all kinds.” 

As former Register of Copyrights Barbara Ringer articulated, protecting that access requires striking a balance with “induc[ing] authors and artists to create and disseminate original works, and to reward them for their contributions to society.” The fair use doctrine is often invoked to do that work. Fair use is a multi-factor standard that allows limited use of copyrighted material, even without authors’ credit, consent, or compensation, and asks courts to examine:

(1) the purpose and character of the use, 

(2) the nature of the copyrighted work, 

(3) the amount or substantiality of the portion used, and 

(4) the effect of the use on the potential market for or value of the work. 

While courts have not decided whether using copyrighted works as training data for GenAI is fair use, past fair use decisions involving algorithms, such as Perfect 10, iParadigms, Google Books, and HathiTrust, favored the consentless use of other people’s copyrighted works to create novel computational systems. In those cases, judges repeatedly found that algorithmic technologies aligned with the Constitutional justification for copyright law: promoting progress.

But some GenAI outputs prevent progress by projecting biases. GenAI outputs are biased in part because they use biased, low-friction data (BLFD) as training data, like content scraped from the public internet. Examples of BLFD include Creative Commons (CC) licensed works, like Wikipedia, and works in the public domain. While Wikipedia is used as training data in most AI systems, its articles are overwhelmingly written by men–and that bias is reflected in shorter and fewer articles about women. And because the public domain cuts off in the mid-1920s, those works often reflect the harmful gender and racial biases of that time. However, if authors allow their copyrighted works to be used as GenAI training data, those authors can help mitigate some of the biases embedded in BLFD.

Current biases in GenAI are disturbing. As we discuss in our zine, word2vec is a very popular toolkit used to help machine learning (ML) models recognize relationships between words, and the relationships it learns include biased ones, like associating women with homemakers and Black men with the word “assaulted.” Similarly, OpenAI’s GenAI chatbot ChatGPT, when asked to generate letters of recommendation, used “expert,” “reputable,” and “authentic” to describe men and “beauty,” “stunning,” and “emotional” for women, discounting women’s competency and reinforcing harmful stereotypes about working women. An intersectional perspective can help authors see the compounding impact of these harms. Coined by Professor Kimberlé Crenshaw in the late 1980s, intersectionality uses critical theory like Critical Race Theory, feminism, and working-class studies together as “a lens . . . for seeing the way in which various forms of inequality often operate together and exacerbate each other.” What began as a legal framework to describe why discrimination law did not adequately address harms facing Black women is now used as a wider lens to consider how marginalization affects all people with multiple identities. Contemporary authors’ copyrighted works often reflect the richness of intersectional perspectives, and using those works as training data can help mitigate GenAI bias against marginalized people by introducing diverse narratives and inclusive language. Not always–even recent works reflect bias–but more often than might be possible currently.

Which brings us back to fair use. Some corporations may rely on the doctrine to include more works by or about marginalized people in an attempt to mitigate GenAI bias. Professor Mark Lemley and Bryan Casey have suggested “[t]he solution [to facial recognition bias] is to build bigger databases overall or to ‘oversample’ members of smaller groups” because “simply restricting access to more data is not a viable solution.” Similarly, Professor Matthew Sag notes that “[r]estricting the training data for LLMs to public domain and open license material would tend to encode the perspectives, interests, and biases of a distinctly unrepresentative set of authors.” However, many marginalized people may wish to be excluded from these databases rather than have their works or stories become grist for the mill. As Dr. Anna Lauren Hoffman warns, “[I]nclusion reinforces the structural sources of violence it supposedly addresses.”

Legally, if not ethically, fair use may moot the point. The doctrine is flexible, fact-dependent, and fraught. It’s also fairly predictable, which is why legal precedent and empirical work have led many legal scholars to believe that using copyrighted works as training data to debias AI will be fair use–even if that has some public harms. Back in 2017, Professor Ben Sobel concluded that “[i]f engineers made unauthorized use of copyrighted data for the sole purpose of debiasing an expressive program, . . . fair use would excuse it.” Professor Amanda Levendowski has explained why and how “[f]air use can, quite literally, promote creation of fairer AI systems.” More recently, Dr. Mehtab Khan and Dr. Alex Hanna observed that “[a]ccessing copyright work may also be necessary for the purpose of auditing, testing, and mitigating bias in datasets . . . [and] it may be useful to rely on the flexibility of fair use, and support access for researchers and auditors.”

No matter how you feel about it, fair use is not the end of the story. It is ill-equipped to solve the troubling growth of AI-powered deepfakes. After being targeted by sexualized deepfakes, Rep. Ocasio-Cortez described “[d]eepfakes [as] absolutely a way of digitizing violent humiliation against other people.” Fair use will not solve the intersectional harms of AI-powered face surveillance either. Dr. Joy Buolamwini and Dr. Timnit Gebru evaluated leading gender classifiers used to train face surveillance technologies and discovered that they more accurately classified males over females and lighter-skinned over darker-skinned people. The researchers also discovered that the “classifiers performed worst on darker female subjects.” While legal scholars like Professors Shyamkrishna Balganesh, Margaret Chon, and Cathay Smith argue that copyright law can protect privacy interests, like the ones threatened by deepfakes or face surveillance, federal privacy laws are a more permanent, comprehensive way to address these problems.

But who has time to wait on courts and Congress? Right now, authors can take proactive steps to ensure that their works promote progress rather than prevent it. Check out the Authors Alliance’s guides to Contract Negotiations, Open Access, Rights Reversion, and Termination of Transfer to learn how–or explore our new zine, Putting the AI in Fair Use: Authors’ Abilities to Promote Progress.

You can find a PDF of the Zine here, as well as printer-ready copies here and here.

Books are Big AI’s Achilles Heel

Posted May 13, 2024

By Dave Hansen and Dan Cohen

Image of the Rijksmuseum by Michael D Beckwith. Image dedicated to the Public Domain.

Rapidly advancing artificial intelligence is remaking how we work and live, a revolution that will affect us all. While AI’s impact continues to expand, the operation and benefits of the technology are increasingly concentrated in a small number of gigantic corporations, including OpenAI, Google, Meta, Amazon, and Microsoft.

Challenging this emerging AI oligopoly seems daunting. The latest AI models now cost billions of dollars, beyond the budgets of startups and even elite research universities, which have often generated the new ideas and innovations that advance the state of the art.

But universities have a secret weapon that might level the AI playing field: their libraries. Computing power may be one important part of AI, but the other key ingredient is training data. Immense scale is essential for this data—but so is its quality.

Given their voracious appetite for text to feed their large language models, leading AI companies have taken all the words they can find, including from online forums, YouTube subtitles, and Google Docs. This is not exactly “the best that has been thought and said,” to use Matthew Arnold’s pointed phrase. In Big AI’s haphazard quest for quantity, quality has taken a back seat. The frequency of “hallucinations,” inaccuracies currently endemic to AI outputs, is cause for even greater concern.

The obvious way to rectify this lack of quality and tenuous relationship to the truth is by ingesting books. Since the advent of the printing press, authors have published well over 100 million books. These volumes, preserved for generations on the shelves of libraries, are perhaps the most sophisticated reflection of human thinking from the beginning of recorded history, holding within them some of our greatest (and worst) ideas. On average, they have exceptional editorial quality compared to other texts, capture a breadth and diversity of content and a vivid mix of styles, and use long-form narrative to communicate nuanced arguments and concepts.

The major AI vendors have sought to tap into this wellspring of human intelligence to power the artificial, although often through questionable methods. Some companies have turned to an infamous set of thousands of books, apparently retrieved from pirate websites without permission, called “Books3.” They have also sought licenses directly from publishers, using their massive budgets to buy what they cannot scavenge. Meta even considered purchasing one of the largest publishers in the world, Simon & Schuster.

As the bedrock of our shared culture, and as the possible foundation for better artificial intelligence, books are too important to flow through these compromised or expensive channels. What if there were a library-managed collection made available to a wide array of AI researchers, including at colleges and universities, nonprofit research institutions, and small companies as well as large ones?

Such vast collections of digitized books exist right now. Google, by pouring millions of dollars into its long-running book scanning project, has access to over 40 million books, a valuable asset it undoubtedly would like to keep exclusive. Fortunately, those digitized books are also held by Google’s partner libraries. Research libraries and other nonprofits have additional stockpiles of digitized books from their own scanning operations, derived from books in their own collections. Together, they represent a formidable aggregation of texts.

A library-led training data set of books would diversify and strengthen the development of AI. Digitized research libraries are more than large enough, and of substantially higher quality, to offer a compelling alternative to existing scattershot data sets. These institutions and initiatives have already worked through many of the most challenging copyright issues, at least for how fair use applies to nonprofit research uses such as computational analysis. Whether fair use also applies to commercial AI, or models built from iffy sources like Books3, remains to be seen.

Library-held digital texts come from lawfully acquired books—an investment of billions of dollars, it should be noted, just like those big data centers—and libraries are innately respectful of the interests of authors and rightsholders by accounting for concerns about consent, credit, and compensation. Furthermore, they have a public-interest disposition that can take into account the particular social and ethical challenges of AI development. A library consortium could distinguish between the different needs and responsibilities of academic researchers, small market entrants, and large commercial actors. 

If we don’t look to libraries to guide the training of AI on the profound content of books, we will see a reinforcement of the same oligopolies that rule today’s tech sector. Only the largest, most well-resourced companies will acquire these valuable texts, driving further concentration in the industry. Others will be prevented from creating imaginative new forms of AI based on the best that has been thought and said. By democratizing access, as they have always done, libraries can support learning and research for all, ensuring that AI becomes the product of the many rather than the few.

Further reading on this topic: “Towards a Books Data Commons for AI Training,” by Paul Keller, Betsy Masiello, Derek Slater, and Alek Tarkowski.

This week, Authors Alliance celebrates its 10th anniversary with an event in San Francisco on May 17 (We still have space! Register for free here) titled “Authorship in an Age of Monopoly and Moral Panics,” where we will highlight obstacles and opportunities of new technology. This piece is part of a series leading up to the event.

Authors Alliance Submits Amicus Brief in Tiger King Fair Use Case

Posted May 6, 2024

By Dave Hansen

Have you ever used a photograph to illustrate a historical event in your writing? Or quoted, say, from a letter to point out some fact that the author conveyed in their writing? According to the 10th Circuit, these aren’t the kinds of uses that fair use supports.

On Thursday, Authors Alliance joined with EFF, the Association of Research Libraries, the American Library Association, and Public Knowledge in filing an amicus brief asking the 10th Circuit Court of Appeals to reconsider its recent fair use decision in Whyte Monkee v. Netflix.

The case is about Netflix’s use of a funeral video recording in its documentary series Tiger King, a true crime documentary about Joseph Maldonado, aka Joe Exotic, an eccentric zookeeper, media personality, exotic animal owner, and convicted felon. The recording at issue was created by Timothy Sepi/Whyte Monkee as a memorial for Travis Maldonado, Joe Exotic’s late husband. Netflix used about 60 seconds of the funeral video in its show. Its purpose was, among other things, to “illustrate Mr. Exotic’s purported megalomania, even in the face of tragedy.”

A three-judge panel of the 10th Circuit issued its opinion in late March, concluding that Netflix’s use was not “transformative” under the first fair use factor and therefore disfavored as a fair use. The panel relied heavily on the Supreme Court’s recent decision in Andy Warhol Foundation v. Goldsmith, taking that case to mean that uses that do not comment on or criticize the artistic and creative aspects of the underlying work are generally disfavored. So, the court concluded:

Defendants’ use of the Funeral Video is not transformative under the first fair use factor. Here, Defendants did not comment on or “target” Mr. Sepi’s work at all; instead, Defendants used the Funeral Video to comment on Joe Exotic. More specifically, Defendants used the Funeral Video to illustrate Mr. Exotic’s purported megalomania, even in the face of tragedy. By doing so, Defendants were providing a historical reference point in Mr. Exotic’s life and commenting on Mr. Exotic’s showmanship. However, Defendants’ use did not comment on Mr. Sepi’s video—i.e., its creative decisions or its intended meaning.

You can probably see the problem. Fair use has, for a very long time, supported a wide variety of other uses that incorporate existing works as historical reference points and illustrations. Although the Supreme Court talked a lot about criticism and comment in its Warhol opinion (which made sense, given that the use before it was a purported artistic commentary), I think very few people interpreted that decision to mean that only commentary and criticism are permissible transformative fair uses. But as our brief points out, the panel’s decision essentially converts the Supreme Court’s decision in Warhol from a nuanced reaffirmation of fair use precedent into a radical rewrite of the law that only supports those kinds of uses. 

Our brief argues that the 10th Circuit misread the Supreme Court’s opinion in Warhol, and that it ignored decades of fair use case law. We point to a few good examples – e.g., Time v. Bernard Geis (a 1968 case finding fair use of a recreation of the famous Zapruder film in a book titled “Six Seconds in Dallas,” analyzing President Kennedy’s assassination), New Era Publications v. Carol Publishing (a 1990 case supporting reuse of lengthy quotations of L. Ron Hubbard in a book about him, to make a point about Hubbard’s “hypocrisy and pomposity”), and Bill Graham Archives v. Dorling Kindersley (a 2006 case finding fair use of Grateful Dead concert posters in a book using them as historical reference points).

Our brief also highlights how communities of practice such as documentary filmmakers, journalists, and nonfiction writers have come to rely on fair use to support these types of uses–so much so that these practices are codified in best practices here, here, and even in Authors Alliance’s own Fair Use for Nonfiction Authors guide.

Although it is rare for appellate courts to grant rehearing of already issued opinions, this opinion has drawn quite a lot of negative attention. In addition to our amicus brief, there were amicus briefs filed in support of rehearing from: 

Given the broad and negative reach of this decision, I hope the 10th Circuit will pay attention and grant the request. 

Book Talk – Unlocking the Digital Age: The Musician’s Guide to Research, Copyright & Publishing

Posted March 27, 2024

Join us for a book talk with ANDREA I. COPLAND & KATHLEEN DeLAURENTI about UNLOCKING THE DIGITAL AGE, a crucial resource for early career musicians navigating the complexities of the digital era.

REGISTER NOW

“[Musicians,] Use this book as a tool to enhance your understanding, protect your creations, and confidently step into the world of digital music. Embrace the journey with the same fervor you bring to your music and let this guide be a catalyst in shaping a fulfilling and sustainable musical career.”
– Dean Fred Bronstein, THE PEABODY INSTITUTE OF THE JOHNS HOPKINS UNIVERSITY

Based on coursework developed at the Peabody Conservatory, Unlocking the Digital Age: The Musician’s Guide to Research, Copyright, and Publishing by Andrea I. Copland and Kathleen DeLaurenti serves as a crucial resource for early career musicians navigating the complexities of the digital era. This guide bridges the gap between creative practice and scholarly research, empowering musicians to confidently share and protect their work as they expand their performing lives beyond the concert stage as citizen artists. It offers a plain language resource that helps early career musicians see where creative practice and creative research intersect and how to traverse information systems to share their work. As professional musicians and researchers, the authors’ experiences on stage and in academia make this guide an indispensable tool for musicians aiming to thrive in the digital landscape.

Copland and DeLaurenti will be in conversation with musician and educator Kyoko Kitamura. Music librarian Matthew Vest will facilitate our discussion.

Unlocking the Digital Age: The Musician’s Guide to Research, Copyright, and Publishing is available to read & download.

About our speakers

ANDREA I. COPLAND is an oboist, music historian, and librarian based in Baltimore, MD. Andrea has dual master’s of music degrees in oboe performance and music history from the Peabody Institute of the Johns Hopkins University and is currently Research Coordinator at the Répertoire International de la Presse Musicale (RIPM) database. She is also a teaching artist with the Baltimore Symphony Orchestra’s OrchKids program and writes a public musicology blog, Outward Sound, on Substack.

KATHLEEN DeLAURENTI is the Director of the Arthur Friedheim Library at the Peabody Institute of The Johns Hopkins University where she also teaches Foundations of Music Research in the graduate program. Previously, she served as scholarly communication librarian at the College of William and Mary where she participated in establishing state-wide open educational resources (OER) initiatives. She is co-chair of the Music Library Association (MLA) Legislation Committee as well as a member of the Copyright Education sub-committee of the American Library Association (ALA) and is past winner of the ALA Robert Oakley Memorial Scholarship for copyright research. DeLaurenti is passionate about copyright education, especially for musicians. She is active in communities of practice working on music copyright education, sustainable economic models for artists and musicians, and policy for a balanced copyright system. DeLaurenti served as the inaugural Open Access Editor of MLA and continues to serve on the MLA Open Access Editorial Board. She holds an MLIS from the University of Washington and a BFA in vocal performance from Carnegie Mellon University.

KYOKO KITAMURA is a Brooklyn-based vocal improviser, bandleader, composer and educator, currently co-leading the quartet Geometry (with cornetist Taylor Ho Bynum, guitarist Joe Morris and cellist Tomeka Reid) and the trio Siren Xypher (with violist Melanie Dyer and pianist Mara Rosenbloom). A long-time collaborator of legendary composer Anthony Braxton, Kitamura appears on many of his releases and is the creator of the acclaimed 2023 documentary Introduction to Syntactical Ghost Trance Music which DownBeat Magazine calls “an invaluable resource for Braxton-philes.” Active in interdisciplinary performances, Kitamura recently provided vocals for, and appeared in, artist Matthew Barney’s 2023 five-channel installation Secondary.

MATTHEW VEST is the Music Inquiry and Research Librarian at UCLA. His research interests include change leadership in higher education, digital projects and publishing for music and the humanities, and composers working at the margins of the second Viennese School. He has also worked in the music libraries at the University of Virginia, Davidson College, and Indiana University and is the Open Access Editor for the Music Library Association.

Book Talk: UNLOCKING THE DIGITAL AGE
April 3 @ 10am PT / 1pm ET
VIRTUAL
Register now!

Announcing Departure of Rachel Brooke, Authors Alliance Senior Staff Attorney

Posted March 18, 2024
Photo by Jan Tinneberg on Unsplash

Dear Authors Alliance Members, Friends, and Allies,

It is with a heavy heart that I am announcing my departure from Authors Alliance. For me, the development is bittersweet—in a few weeks, I will be starting a new job at a law firm where I’ll focus on litigation and developing my advocacy skills in a new way. I’m excited for this next chapter, but I’ll sorely miss being an Authors Alliance staff member and working to advance the interests of our members, a dedicated and engaged community of authors who care deeply about access to knowledge and culture. 

My time at Authors Alliance has seen a lot of change, both on an organizational level and in terms of the world around us. I joined as a staff attorney in late 2020, during a stormy political season and in the midst of a public health crisis. Working with our former executive director, Brianna Schofield, I got to know this community and began to understand what mattered to you. I wrote one of our guides, Third-Party Permissions and How to Clear Them, drawing on my past experience working as a literary agent in addition to what I had learned about copyright law and the particular needs of our members. I also spent nine months as our interim executive director before Dave joined us back in 2022. Along the way, with the blessing and guidance of our outstanding board of directors, Authors Alliance began to focus more on policy and scale back our education work. Back in 2014, there was a dearth of these kinds of educational resources for authors, but that has changed over time, particularly with the increasing presence of scholarly communications offices to guide academic scholars.

This week is my last as an employee of Authors Alliance, and next week will be my first as a regular member. During my years with Authors Alliance, I’ve been asked a lot of times “who can join” and whether a person “qualified” as an author. Unlike other authors’ organizations, we don’t gatekeep when it comes to membership. If you—like me—write, for business or for pleasure, and you—like me—believe in our mission, Authors Alliance would love to have you join as a member. And what I love about this organization is that it truly does want to be responsive to the needs of its members. Our two amicus briefs in the Hachette Books v. Internet Archive litigation (that Dave and Kyle Courtney wrote about just last week) were based on a survey we conducted of members and other authors, because we saw how author interests were taking a back seat to the interests of large publishers in the litigation. I wrote both of these briefs, and it was an absolute pleasure to use my legal training to share this important perspective with the courts. 

We created our most recent guide, Writing About Real People, because we so often heard from nonfiction authors writing about real people who had questions about whether they might be exposing themselves to legal risk. The same is true for the permissions guide—it was partially inspired by the fact that a guest blog post on clearing rights for images had been one of our most popular of all time, indicating the need for this kind of resource. We began conducting advocacy work in the realm of AI and copyright because it was clear that generative AI had the potential to reshape authorship and intellectual property laws, and we thought our voice could be useful as a sensible, measured one that remained optimistic about technology and innovation. 

On a personal level, being an attorney for Authors Alliance has given me both a strong sense of job satisfaction and the feeling that my work is helping people and making a difference in the world (something many lawyers can only dream of!). Whether it is seeing our views shape the development of the laws and regulations governing information policy, or hearing from an author who got their rights back or successfully negotiated with their publisher to retain their copyright, the effects of our work have reminded me that our organization really matters. It’s one I have been honored to be a part of for the past three and a half years. Please feel free to reach out over email (for now, you can reach me at rachel@authorsalliance.org) in the next few days, or add me on Twitter or LinkedIn. I’d love to stay engaged with this community, even if I’m no longer involved professionally. I also plan to attend our 10th Anniversary celebration in May, and hope to see many of our members and allies there!

Fondly,

Rachel

Publishers’ brief in Hachette v. Internet Archive: First Impressions

Posted March 15, 2024

Dave Hansen and Kyle Courtney jointly authored this post. They are also the authors of a White Paper on Controlled Digital Lending of Library Books. We are not, as the Publishers claim in their brief on page 13, a “cadre of boosters.” We wrote the paper independently as part of our combined decades of work on libraries and access to knowledge.

Earlier today the publishers (Hachette, HarperCollins, John Wiley, and Penguin Random House) filed their reply brief on appeal in their long-running lawsuit against Internet Archive, which challenges (among other things) the practice of controlled digital lending.

In the months since the decision, we have been observing all the hot takes, cheers, jeers, and awkward declarations about the case, the Internet Archive itself, and Controlled Digital Lending (CDL).

This post is not part of that fanfare. Here, we want to identify a few critical issues that the publishers focus on in their brief, including some questionable fair use analysis that they repeat from the district court below. Much of the brief is framed in heated rhetoric that may cause alarm, but much like publishers’ announcements about interlibrary loan, e-reserves, or document delivery, we believe controlled digital lending is here to stay, regardless of the lower court’s poor copyright analysis and the publishers’ current brief.

Framing the Question

As is often the case, the parties disagree on what this case is actually about. For its part, Internet Archive says in its “Statement of the Issue on Appeal” that the question is “whether Internet Archive’s controlled digital lending is fair use.” Publishers, on the other hand, reframe the question more broadly, in a way that, combined with their arguments throughout the brief, seems intended not just to kill IA’s implementation of controlled digital lending but to encourage the court to rule in a way that would call into question all other library applications of CDL. They say that the question is “whether IA’s infringement of the Publishers’ Works is fair use based on IA’s CDL theories and practices.”

This litigation, coordinated by the AAP, seems to us an attempt to undermine what libraries have done for centuries: lend the books that they already lawfully own. Ironically, the opposition calls CDL a made-up theory created by a “cadre of boosters,” but in actuality, it’s the publishers’ licensing system that is a modern, made-up invention. The works themselves are unchanged, but the nature of digital delivery allows publishers to charge people in new ways. There is nothing in the Copyright Act that states ebook licensing is, or should be, the default way for libraries to acquire and lend books.

Commercial vs. Non-Profit Use

One of the most criticized aspects of the decision below is the lower court’s conclusion that IA’s activities are commercial, as opposed to non-profit. The publishers’ brief enthusiastically embraces this conclusion, while also attempting to drive a wedge between IA’s lending and that of other libraries: “IA’s practices are distinctly commercial – especially in comparison to public and academic libraries.”

The district court concluded that IA’s activity was commercial because it “stands to profit” through its partnership with Better World Books on its website, and by “us[ing] its Website to attract new members, solicit donations, and bolster its standing in the library community” (p. 26).

As many amici pointed out earlier in the appeal, the use of a nonprofit’s website to solicit donations is routine; it would be chilling for sites like Wikipedia, Project Gutenberg, HathiTrust and others (all of whom filed briefs in this case) to face heightened copyright liability just because they seek donations in combination with aspects of their sites that rely on a fair use assertion. The publishers attempt to distance themselves from this absurd result (“The concern that Judge Koeltl’s analysis ‘would render virtually all nonprofit uses commercial’ is wildly overblown”), but it is clear from the number and diversity of amici who filed to speak to just this issue that the concern is very real.

As for Better World Books (BWB): BWB is an online bookstore and a Certified B Corporation, meaning that it achieves high standards of social and environmental performance, transparency, and accountability. B Corps are committed to using business as a force for good in the world. According to its website, BWB donates books to nonprofit organizations, including the Internet Archive. As of November 2019, IA and BWB have a partnership to digitize books for preservation purposes.

The focus on the supposedly commercial relationship with Better World Books (a used book reseller) seems to us a stretch based on the facts. The publishers’ brief makes a big deal of Better World Books (referencing them over 20 times in the brief), and argues that IA’s use is commercial because a) IA encourages readers to purchase books through links on its site to Better World Books, and b) Better World Books donates some funds back to IA. The first point is perplexing–one would think they’d be pleased that readers are encouraged to purchase copies of their books–even if on the used market. But the latter point about Better World Books’ commercial influence on IA’s operation is just not rooted in the facts of the case. As IA laid out in its opening brief, it has only received $5,561.41 from Better World Books in the relevant time frame. That’s an infinitesimally small drop in the bucket compared to the costs that IA has borne to digitize and lend books for no monetary return from readers. It’s hard to see how such an amount could be construed to tilt IA’s entire operation into a commercial activity. 

For anyone who has actually worked on such projects, it is clear that IA is not archiving or lending books for commercial purposes. The idea that there is money to be made in doing so is laughable. Instead, it is providing access to knowledge and cultural heritage. This fundamental point somehow got lost on the publishers on the road to enormous profits.

eBooks vs. Digitized Books

There are lots of nuances that got lost in the decision below, which we believe were helpfully addressed by amici filings earlier in this appeal (e.g., the privacy implications of licensed ebooks vs. CDL copies lent by libraries).  The publishers seem happy to gloss over the details again in this brief, particularly when it comes to the differences between licensed ebooks and those that are lent out with CDL. 

First, the publishers’ brief makes clear they really don’t like it when books are available for free. They use the word 33 times (about every other page of the brief)! Many of the references obscure what “free” really means, though – for example, asserting that “Two Publishers believe that 39-50% of American ebook consumers read their ebooks for free from libraries rather than paying for their own commercial ebooks” (emphasis added) while ignoring the exorbitant costs and other burdens placed on libraries and the public to fund that licensed access. This is a major part of why libraries have responded both by embracing CDL and by advocating for laws that would require fair licensing terms for ebooks. 

Second, as far as market harm goes, the publishers assert that “IA offered the Publishers’ library and consumer customers a free competing substitute to the authorized ebook editions,” essentially arguing that “you can’t compete with free.” But that is just not true. Examples are trivially easy to conjure up: open source software vs. Microsoft or iOS. How often do you run into someone who uses LibreOffice, OpenOffice, or Ubuntu? And of course in creative industries, we’ve seen this kind of model take hold in numerous areas, including book publishing, with “freemium” models.

That’s because products that are free often offer a different user experience than those that aren’t. Usually when someone opts to pay, they’re paying for an enhanced experience. The same holds true of books scanned for CDL vs. licensed ebooks. CDL books are just that – they are digitized physical books. They don’t have the nice, crisp text of licensed ebooks, nor the interactive features. You can’t highlight, or change the font, or look up a word by touching it, or do any of the myriad of functions that you can with an ebook. 

That a library is loaning and controlling those copies is also a major distinguishing factor, because borrowing a book from a library (along with all the special privacy protections one receives) provides a vastly different reading environment than one in which vendors can scrape, process and sell data about your reading experience. Notably, the publishers did not engage with this argument. 

“IA refuses to pay the customary price and join the Publishers’ thriving market for authorized library ebooks…”

Good gravy! According to the publishers, libraries should be forced to pay over and over again for the same book, to join a market that there is no evidence they are harming. 

The publishers devote a large portion of their brief – nearly 20 pages – to arguing about market harm. Most of it comes down to the assertion that the mere fact of the existence of a digital book market means that CDL must negatively impact the rightsholders’ profits (despite no empirical evidence of market harm). The lower court decision stated that IA has the “burden to show a lack of market harm” (p. 43), and concluded (without reference to meaningful evidence) “that harm here is evident” (p. 44), an assumption which the publishers are happy to rest on. 

There is a genuinely important legal question raised here about which party needs to prove what when it comes to market harm. The publishers’ brief relies heavily on the idea that IA bears the burden on every point of its fair use defense, especially market harm. But as IA points out in its opening brief, 

“Although the Supreme Court has stated fair use is an affirmative defense for which defendants bear the burden (Campbell, 510 U.S. at 1177), it has also suggested this burden may apply differently to noncommercial uses than commercial ones. Sony stated that noncommercial cases require “a showing by a preponderance of the evidence that some meaningful likelihood of future harm exists.” 464 U.S. at 417; see Princeton Univ. Press v. Mich. Document Servs., Inc., 99 F.3d 1381, 1385- 86 (6th Cir. 1996) (“The burden of proof as to market effect rests with the copyright holder if the challenged use is of a ‘noncommercial’ nature.”). 

Conclusion

The brief is predictably hyperbolic, and continues to refuse to allow any room for digital lending based on a misreading, in our view, of precedents such as Sony, TVEyes, and ReDigi. But CDL is not some form of library-sanctioned piracy. CDL is based in copyright, fair use, and the public mission of libraries, while also broadening access to the books that library systems spend billions of dollars to collect and maintain for the public—including long-neglected, out-of-print books with enormous social and scholarly value and books for which commercial ebook licenses are not available.

During the pandemic, the importance of digital library access became strikingly apparent. It is unfortunate that the Publishers chose that moment of national emergency to sue a non-profit library for loaning books digitally. CDL simply seeks to preserve the library’s long-established and vital mission to collect and lend books in an increasingly licensed-access digital world.

Writing About Real People Update: Right of Publicity, Voice Protection, and Artificial Intelligence

Posted March 7, 2024
Photo by Jason Rosewell on Unsplash

Some of you may recall that Authors Alliance published our long-awaited guide, Writing About Real People, earlier this year. One of the major topics in the guide is the right of publicity—a right to control use of one’s own identity, particularly in the context of commercial advertising. These issues have been in the news a lot lately as generative AI poses new questions about the scope and application of the right of publicity. 

Sound-alikes and the Right of Publicity

One important right of publicity question in the genAI era concerns the increasing prevalence of “sound-alikes” created using generative AI systems. The issue of AI-generated voices that mimic real people came to the public’s attention with the apparently convincing “Heart on My Sleeve” song, imitating Drake and the Weeknd, and tools that facilitate creating songs imitating popular singers have increased in number and availability.

AI-generated soundalikes are a particularly interesting use of this technology when it comes to the right of publicity because one of the seminal right of publicity cases, taught in law schools and mentioned in primers on the topic, concerns a sound-alike from the analog world. In 1986, the Ford Motor Company hired an advertising agency to create a TV commercial. The agency obtained permission to use “Do You Wanna Dance,” a song Bette Midler had famously covered, in its commercial. But when the ad agency approached Midler about actually singing the song for the commercial, she refused. The agency then hired a former backup singer of Midler’s to record the song, apparently asking the singer to imitate Midler’s voice in the recording. A federal court found that this violated Midler’s right of publicity under California law, even though her voice was not actually used. Extending this holding to AI-generated voices seems logical and straightforward—it is not about the precise technology used to create or record the voice, but about the end result the technology is used to achieve. 

Right of Publicity Legislation

The right of publicity is a matter of state law. In some states, like California and New York, the right of publicity is established via statute, and in others, it’s a matter of common law (or judge-made law). In recent months, state legislatures have proposed new laws that would codify or expand the right of publicity. Similarly, many have called for the establishment of a federal right of publicity, specifically in the context of harms caused by the rise of generative AI. One driving force behind calls for the establishment of a federal right of publicity is the patchwork nature of state right of publicity laws: in some states, the right of publicity extends only to someone’s name, image, likeness, voice, and signature, but in others, it’s much broader. While AI-generated content and the ways in which it is being used certainly pose new challenges for courts considering right of publicity violations, we are skeptical that new legislation is the best solution. 

In late January, the No Artificial Intelligence Fake Replicas and Unauthorized Duplications Act of 2024 (or “No AI FRAUD Act”) was introduced in the House of Representatives. The No AI FRAUD Act would create a property-like right in one’s voice and likeness, which is transferable to other parties. It targets voice “cloning services” and mentions the “Heart on My Sleeve” controversy specifically. But civil society groups and advocates for free expression have raised alarm about the ways in which the bill would make it easier for creators to actually lose control over their own personality rights while also impinging on others’ First Amendment rights due to its overbreadth and the property-like nature of the right it creates. While the No AI FRAUD Act contains language stating that the First Amendment is a defense to liability, it’s unclear how effective this would be in practice (and as we explain in the Writing About Real People Guide, the First Amendment is always a limitation on laws affecting freedom of expression). 

The Right of Publicity and AI-Generated Content

In the past, the right of publicity has been described as “name, image, and likeness” rights. What is interesting about AI-generated content and the right of publicity is that a person’s likeness can be used in a more complete way than ever before. In some cases, both their appearance and voice are imitated, associated with their name, and combined in a way that makes the imitation more convincing. 

What is different about this iteration of right of publicity questions is the actors behind the production of the soundalikes and imitations, and, to a lesser extent, the harms that might flow from these uses. A recent use of a different celebrity’s likeness in connection with an advertisement is instructive on this point. Earlier this year, advertisements emerged on various platforms featuring an AI-generated Taylor Swift participating in a Le Creuset cookware giveaway. These ads contained two separate layers of deceptiveness: most obviously, that Swift was AI-generated and did not personally appear in the ad, but more bafflingly, that they were not Le Creuset ads at all. The ads were part of a scam whereby users might pay for cookware they would never receive, or enter credit card details which could then be stolen or otherwise used for improper purposes. Compared to more traditional conceptions of advertising, the unfair advantages and harms caused by the use of Swift’s voice and likeness are much more difficult to trace. Taylor Swift’s likeness and voice were appropriated by scammers to trick the public into thinking they were interacting with Le Creuset advertising. 

It may be that the right of publicity as we know it (and as we discuss it in the Writing About Real People Guide) is not well-equipped to deal with these kinds of situations. But it seems to us that codifying the right of publicity in federal law is not the best approach. Just as Bette Midler had a viable right of publicity claim under California law back in 1992, Taylor Swift would likely have a viable claim against Le Creuset if her likeness had been used by that company in connection with commercial advertising. The problem is not the “patchwork of state laws,” but that this kind of doubly-deceptive advertising is not commercial advertising at all. On a practical level, it’s unclear what party could even be sued over this kind of use. Certainly not Le Creuset. And it seems to us unfair to say that the creator of the AI technology used should be left holding the bag, just because someone used it for fraudulent purposes. The real fraudsters—anonymous but likely not impossible to track down—are the ones who can and should be pursued under existing fraud laws. 

Authors Alliance has said elsewhere that reforms to copyright law cannot be the solution to any and all harms caused by generative AI. The same goes for the intellectual property-like right of publicity. Sensible regulation of platforms, stronger consumer protection laws, and better means of detecting and exposing AI-generated content are possible solutions to the problems that the use of AI-generated celebrity likenesses has brought about. To instead expand intellectual property rights under a federal right of publicity statute risks infringing on our First Amendment freedoms of speech and expression.

Why Fair Use Supports Non-Expressive Uses

Posted February 29, 2024

This post is part of the Fair Use Week series, cross-posted at https://sites.harvard.edu/fair-use-week/2024/02/29/fair-use-week-2024-day-four-with-guest-expert-dave-hansen/

AI programs and their outputs raise all sorts of interesting questions–now found in the form of some 20+ lawsuits, many of them massive class actions.

One of the most important questions is whether it is permissible to use copyrighted works as training data to develop AI models themselves, on top of which AI services like ChatGPT are built (read here for a good overview of the component parts and “supply chain” of generative AI, reviewed through a legal lens).

For the question of fair use of AI training data, you’ll find that almost everyone writing about this question in the US context says the answer turns on two or three precedents–especially the Google Books case and the HathiTrust case–and a concept referred to as “non-expressive use” (or sometimes “non-consumptive use”). This concept of non-expressive use and those cases have proven to be foundational for all sorts of applications that extend well beyond generative AI, including basic web search, plagiarism detection tools, and text and data mining research. Since this idea has received so much attention, I thought this Fair Use Week was a good opportunity to explore what this concept is. 

What is non-expressive use? 

Non-expressive use refers to uses that involve copying, but don’t communicate the expressive aspects of the work to be read or otherwise enjoyed. It is a term coined, as far as I can tell, by law professor Matthew Sag in a series of papers: first in “Copyright and Copy-Reliant Technology” (in which he observes that courts have been approving of such uses–for example in search engine cases–albeit without a coherent framework), then more directly in “Orphan Works as Grist for the Data Mill,” and later in an article titled “The New Legal Landscape for Text Mining and Machine Learning.” You can do much better than this blog post if you just read Matt’s articles. But, since you’re here, the argument is basically built on two propositions:  

Proposition #1: “Facts are not copyrightable” is a phrase you’ll hear somewhere near the beginning of the lecture on copyright 101. It, along with the “idea-expression” dichotomy and some related doctrines, is among the ways that copyright law draws a line between protected content and those underlying facts and ideas that anyone is free to use. These protections for free use of facts and ideas are more than just a line in the sand drawn by Congress or the courts. As the U.S. Supreme Court in Eldred v. Ashcroft most recently explained: 

“[The] idea/expression dichotomy strike[s] a definitional balance between the First Amendment and the Copyright Act by permitting free communication of facts while still protecting an author’s expression. Due to this distinction, every idea, theory, and fact in a copyrighted work becomes instantly available for public exploitation at the moment of publication.” (citations and quotations omitted). 

The law has therefore recognized the distinction between expressive and non-expressive works (for example, copyright exists in a novel, but not in a phone book), and that this distinction is so important that the Constitution mandates it. The exact contours of this line have been the subject of a long and not always consistent history, but have slowly come into focus in cases from Baker v. Selden (1879) (“there is a clear distinction between the book, as such, and the art which it is intended to illustrate”) to Feist Publications v. Rural Telephone (1991) (no copyright in telephone white pages). 

Proposition #2: Fair use is also one of the Copyright Act’s First Amendment safeguards, per the Supreme Court in Eldred. The “transformative use” analysis, in particular, does a lot of work in giving breathing room for others to use existing works in ways that allow for their own criticism and comment. It also has provided ample space for uses that rely on copying to unearth facts and ideas contained within and about underlying works, particularly when doing so in a way that provides a net social benefit. 

Transformative use, though not always easy to define in practice, favors uses that avoid substituting for the original expression, but that reuse that content in new ways, with new meaning, message and purpose. While this can apply to downstream expressive uses (e.g., parody is the paradigmatic example that relies on reusing expression itself), its application to non-expressive uses can look even stronger. This is why you find courts like the 9th Circuit in a case about image search saying things like “a search engine may be more transformative than a parody because a search engine provides an entirely new use for the original work, while a parody typically has the same entertainment purpose as the original work,” where search engines copy underlying works primarily for the purpose of helping users discover them. 

Fair use for non-expressive use

We now have several cases that address non-expressive uses for computational analysis of texts. Three cases in particular stand out. In A.V. ex rel. Vanderhye v. iParadigms, the Fourth Circuit in 2009 analyzed a plagiarism detection tool that ingested papers and then created a “digital fingerprint” to match them to duplicate content using a statistical technique originally designed to analyze brain waves. The court there concluded that “iParadigms’ use of these works was completely unrelated to expressive content” and therefore constituted transformative fair use. Then in Authors Guild v. HathiTrust and Authors Guild v. Google, we saw the Second Circuit in successive opinions in 2014 and 2015 approve of copying at a massive scale of books used for the purpose of full-text search of those books and related computational, analytical uses. The court, in Google Books, fully briefed on the implications of these projects for computational analysis of texts, explained: 

As with HathiTrust (and iParadigms), the purpose of Google’s copying of the original copyrighted books is to make available significant information about those books, permitting a searcher to identify those that contain a word or term of interest, as well as those that do not include reference to it. In addition, through the ngrams tool, Google allows readers to learn the frequency of usage of selected words in the aggregate corpus of published books in different historical periods. We have no doubt that the purpose of this copying is the sort of transformative purpose described in Campbell.

Example from the Digital Humanities Scholars brief in the Google Books case,
illustrating one text mining use enabled by the Google Books corpus. 
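
For readers curious what this kind of analysis looks like in practice, here is a minimal sketch in Python of the word-frequency-over-time idea the court describes. It is not Google's ngrams implementation; the function name, the grouping by decade, and the toy corpus are all assumptions made for illustration. Note that the output is a set of numbers about the texts, not the texts themselves.

```python
from collections import defaultdict

def word_frequency_by_decade(corpus, term):
    """Return {decade: share of all words in that decade equal to `term`}.

    `corpus` is an iterable of (year, text) pairs. Splitting on whitespace
    is a simplification; punctuation handling is left out of this sketch.
    """
    hits = defaultdict(int)    # occurrences of the term per decade
    totals = defaultdict(int)  # total word count per decade
    term = term.lower()
    for year, text in corpus:
        decade = (year // 10) * 10
        words = text.lower().split()
        totals[decade] += len(words)
        hits[decade] += words.count(term)
    return {d: hits[d] / totals[d] for d in sorted(totals) if totals[d]}

# Hypothetical toy corpus standing in for a library of digitized books.
toy_corpus = [
    (1851, "the whale and the sea"),
    (1855, "the sea was calm"),
    (1922, "the city and the radio"),
]
freq = word_frequency_by_decade(toy_corpus, "sea")
for decade, share in freq.items():
    print(decade, round(share, 3))
```

Building the table requires copying every book in full, but nothing a reader could enjoy as a substitute for the books comes out the other end.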

So, back to AI 

There are certainly limits to how much of an underlying work can be described before one crosses the line from non-expressive to substantial use of expressive content. For example, uses that reproduce extensive facts from underlying works to merely repackage content for the same purpose as the original works may face challenges, as in the case of Castle Rock Entertainment v. Carol Publishing (about Carol Publishing’s “Seinfeld Aptitude Test” based on facts from the Seinfeld series), which the court concluded was made merely to “repackage Seinfeld to entertain Seinfeld viewers.” And there are real questions (discussed in two excellent recent essays, here and here) about how the law may respond in practice to AI products, particularly ones where outputs look–or at least can be made to look–suspiciously similar to inputs used as training data.

How AI models work is explained much more thoroughly (and much better) elsewhere, but the basic idea is that they are built by developing extraordinarily robust word vectors used to represent the relationships between words. To do this well, these models need to train on a large and diverse set of texts to build out a good model of how humans communicate in a variety of contexts. In short, these models copy texts for the purpose of developing a model to describe facts about the underlying works and the relationship of words within them and with each other. What’s new is that we can now do this at a level of complexity and scale almost unimaginable before. Scale and complexity don’t change the underlying principles at issue, however, and so this kind of training seems to me clearly within the bounds of non-expressive use as approved already by the courts in the cases cited above that authors, researchers, and the tech industry have been relying on for nearly a decade. 
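
To make the idea of a "word vector" a little more concrete, here is a toy sketch in Python. It is a deliberate simplification and not how any production AI model is trained: modern models learn dense vectors with neural networks, while this sketch swaps in simple co-occurrence counts, and the sentences, function names, and window size are assumptions for illustration. The point it shows is that the result is a set of statistics about which words appear near which other words.

```python
import math
from collections import Counter, defaultdict

def cooccurrence_vectors(sentences, window=2):
    """Map each word to a Counter of the words seen within `window` positions."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.lower().split()
        for i, word in enumerate(words):
            start = max(0, i - window)
            end = min(len(words), i + window + 1)
            for j in range(start, end):
                if j != i:
                    vectors[word][words[j]] += 1
    return vectors

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Hypothetical training sentences standing in for a large, diverse corpus.
toy_sentences = [
    "the cat drinks milk",
    "the dog drinks water",
    "the judge reads briefs",
]
vecs = cooccurrence_vectors(toy_sentences)
sim_cat_dog = cosine(vecs["cat"], vecs["dog"])
sim_cat_judge = cosine(vecs["cat"], vecs["judge"])
print(sim_cat_dog > sim_cat_judge)
```

In this toy corpus, "cat" ends up closer to "dog" than to "judge" because the two animals share more neighboring words, which is a fact about how the words are used rather than a copy of any sentence.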

Fair Use Week Webinar: Fair Use in Text Data Mining and Artificial Intelligence

Posted February 16, 2024
Text Miner, generated by MidJourney

Computational research techniques such as text and data mining (TDM) hold tremendous opportunities for researchers across the disciplines, ranging from mining scientific articles to create better systematic reviews or curated chemical property datasets, to building a corpus of films to understand how concepts of gender, race, and identity are shared over time. Unfortunately, legal uncertainty, whether through copyright or restrictive terms of use, can stifle this research. Recent copyright lawsuits, such as the high-profile cases brought against Microsoft, GitHub, and Stability AI, underscore the legal complications.

So how can fair use allow for computational research techniques? Join us for this Fair Use Week webinar, co-sponsored with the Library Copyright Institute, to find out! 

Wednesday, February 28, 2024
1pm – 2:30pm ET / 10am – 11:30am PT
Register here

We’ve written quite a bit about fair use in TDM and AI for research applications already, and the topic is certainly complicated. Join us for this event to hear live from legal experts and researchers. We plan to include substantial time for Q&A, so bring your questions! Panelists include: 

  • Dave Hansen, Executive Director, Authors Alliance
  • Rachael Samberg, Scholarly Communications Officer, UC Berkeley
  • Lauren Tilton, Claiborne Robins Professor of Liberal Arts and Digital Humanities, University of Richmond

Book Talk: Wrong Way by Joanne McNeil

Posted February 13, 2024

Join us for a VIRTUAL book talk with author Joanne McNeil about her latest book, WRONG WAY, which examines the treacherous gaps between the working and middle classes wrought by the age of AI. McNeil will be in conversation with author Sarah Jaffe.

This is the first Internet Archive / Authors Alliance book talk for a work of fiction! Come for a reading, stay for a thoughtful conversation between McNeil & Jaffe about the labor implications of artificial intelligence.

February 29 @ 10am PT / 1pm ET
VIRTUAL

REGISTER NOW

WRONG WAY was named one of the best books of 2023 by the New Yorker and Esquire. It was the Endless Bookshelf Book of the Year and named one of the best tech books by the LA Times.

“Wrong Way is a chilling portrait of economic precarity, and a disturbing reminder of how attempts to optimize life and work leave us all alienated.”
—Adrienne Westenfeld, Esquire

For years, Teresa has passed from one job to the next, settling into long stretches of time, struggling to build her career in any field or unstick herself from an endless cycle of labor. The dreaded move from one gig to another is starting to feel unbearable. When a recruiter connects her with a contract position at AllOver, it appears to check all her prerequisites for a “good” job. It’s a fintech corporation with progressive hiring policies and a social justice-minded mission statement. Their new service for premium members: a functional fleet of driverless cars. The future of transportation. As her new-hire orientation reveals, the distance between AllOver’s claims and its actions is wide, but the lure of financial stability and a flexible schedule is enough to keep Teresa driving forward.

Joanne McNeil, who often reports on how the human experience intersects with labor and technology, brings blazing compassion and criticism to Wrong Way, examining the treacherous gaps between the working and middle classes wrought by the age of AI. Within these divides, McNeil turns the unsaid into the unignorable, and captures the existential perils imposed by a nonstop, full-service gig economy.

REGISTER NOW

About our speakers

JOANNE MCNEIL was the inaugural winner of the Carl & Marilynn Thoma Art Foundation’s Arts Writing Award for an emerging writer. She has been a resident at Eyebeam, a Logan Nonfiction Program fellow, and an instructor at the School for Poetic Computation.
Joanne is the author of Lurking: How a Person Became a User.

SARAH JAFFE is an author, independent journalist, and a co-host of Dissent magazine’s Belabored podcast.
