Author Archives: Authors Alliance

Hachette v. Internet Archive Update: Second Circuit Court of Appeals Rules Against Internet Archive

Posted September 5, 2024

We got a disappointing decision yesterday from the Second Circuit Court of Appeals in the long-running Hachette v. Internet Archive (IA) copyright lawsuit about IA’s digitization and lending of books. The Court affirmed the district court’s decision that IA cannot circulate digital copies of books they have legitimately acquired in physical copies, even when only the same number of copies as legitimately acquired are circulated to a single user at a time—just as a physical book would be loaned.
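For readers less familiar with the mechanics, the "owned-to-loaned" constraint described above can be sketched in a few lines of Python. This is only an illustration under our own assumptions: the class name, method names, and patron identifiers are hypothetical and do not describe IA's actual systems.

```python
class ControlledLendingPool:
    """Hypothetical model of one title lent under an owned-to-loaned rule."""

    def __init__(self, owned_copies):
        if owned_copies < 0:
            raise ValueError("owned_copies must be non-negative")
        self.owned_copies = owned_copies  # physical copies lawfully acquired
        self.borrowers = set()            # patrons currently holding a digital loan

    def checkout(self, patron):
        """Lend one digital copy only if the patron has none and a copy is free."""
        if patron in self.borrowers or len(self.borrowers) >= self.owned_copies:
            return False
        self.borrowers.add(patron)
        return True

    def checkin(self, patron):
        """Return a loan, freeing that copy for the next patron."""
        self.borrowers.discard(patron)


pool = ControlledLendingPool(owned_copies=2)
results = [pool.checkout(p) for p in ("ana", "ben", "cy")]  # third request must wait
pool.checkin("ana")
results.append(pool.checkout("cy"))  # succeeds once a copy has been returned
print(results)
```

The point of the sketch is simply that the number of simultaneous loans never exceeds the number of copies owned, which is how a physical shelf behaves too.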

The Court, focusing on IA’s lending of digitized books that were available for license as ebooks from the publishers, concluded that IA’s fair use defense fails. We think this decision will result in a meaningful reduction in access to knowledge. This is sad news for many authors who have relied on IA’s Open Library for research and discovery, and for readers who have used Open Library to find authors’ works. However, we also view it as a decision limited to its facts—that is, IA’s particular implementation of controlled digital lending (CDL), and more specifically, its lending of books that are already available in licensed digital formats.

We plan to do a more in-depth analysis of the Court’s decision later, but for now, we offer some initial thoughts. First, there are a couple of bright spots in the opinion: 

1) The Court rejected the district court’s conclusion that IA was engaged in commercial use when looking at the first factor of fair use. The publishers argued IA’s lending of digitized books was commercial in nature because IA received a few thousand dollars from a for-profit used-bookseller and also solicited donations on its website. The Court rightly pointed out that if that were the standard, virtually every nonprofit that solicits donations would by default only be able to engage in commercial use. This was an issue we and others strongly urged the Court to address, and we’re glad it did.

2)  For the most part, the Court focused its analysis on the facts of the case, which was really about IA lending digitized copies of books that were already available in ebook form and licensable from the publishers. The legal analysis in several places turned on this fact, which we think leaves room to make fair use arguments regarding programs to digitize and make available other books, such as print books for which there is no licensed ebook available, out-of-print books, or orphan works. CDL will remain an important framework, especially considering the lack of an existing digital first-sale doctrine.  

We are also disappointed by several key points in the decision: 

One was the Court’s assessment of the first fair use factor, “purpose and character of the use.” The Court’s analysis of this factor was in some ways unsurprising but nevertheless disappointing. The Court did little more than conclude that the use was not transformative and, therefore, not fair use. Though we think there are strong arguments that CDL is transformative, whether CDL is “transformative” is just one of the supporting rationales for the argument that CDL is fair use. The other justifications—that CDL supports teaching, scholarship, and research, along with complementing the first sale doctrine and supporting the public-interest mission of libraries—are at the heart of CDL. The Court didn’t engage with those other arguments at all and also offered no meaningful discussion of cases where non-transformative copying supported a fair use finding because of the public benefits.

A second key issue is about whether IA’s digital lending negatively impacts the market for the original works. This issue probably deserves a whole blog post to itself, but in short the analysis came down to who shoulders the burden of proving or disproving market harm, and what default assumptions the court has about market harm.  The following quotes from the decision will give you a sense of how the Court analyzed the issue: 

[a]lthough they do not provide empirical data of their own, Publishers assert that they (1) have suffered market harm due to lost eBook licensing fees and (2) will suffer market harm in the future if IA’s practices were to become widespread.  IA argues that Publishers cannot rely on the “common-sense inference” of market harm without data to back that up, citing American Society for Testing & Materials v. Public.Resource.Org, Inc. [citations omitted]. . . . We agree with Publishers’ assessment of market harm. 

Despite IA’s experts having offered meaningful data and analysis indicating a lack of market harm on sales of publishers’ books, the Court went on to say: 

We are likewise convinced that “unrestricted and widespread conduct of the sort engaged in by [IA] would result in a substantially adverse impact on the potential market for [the Works in Suit].” . . . Though Publishers have not provided empirical data to support this observation, we routinely rely on such logical inferences where appropriate in assessing the fourth fair use factor. . . . Thus, we conclude it is “self-evident” that if IA’s use were to become widespread, it would adversely affect Publishers’ markets for the Works in Suit.

We are also disappointed by how the Court portrayed the overall public benefit of IA’s lending and its long-term effect: “while IA claims that prohibiting its practices would harm consumers and researchers, allowing its practices would―and does―harm authors.” We think this is a gross generalization and mischaracterization of how IA’s digital lending affects most authors. Authors are researchers. Authors are readers. IA’s digital library helps authors create new works and supports their interests in having their works read. This ruling may benefit the largest publishers and most prominent authors, but for most, it will end up harming more than it will help. 

Seeking Authors & Books to Feature in Our Book Talk Series with Internet Archive

Posted July 18, 2024

Authors and Publishers: We are looking for books (both new & classic titles) to feature in our popular book talk series.

Since 2023, Authors Alliance and Internet Archive have partnered on a series of virtual book talks highlighting issues of importance to the library and information communities. Last year, more than 2,000 people attended our virtual and in-person talks. You can watch those talks now at https://archive.org/details/booktalks.

Themes

We are particularly interested in highlighting books that touch on one (or more!) of the following themes:

  1. Libraries & Literacy
  2. Book Culture & the History of the Book
  3. Internet Policy
  4. Copyright & Intellectual Property Rights
  5. Artificial Intelligence & its impact
  6. Computing & Internet History
  7. Supporting Democracies

Contact

If you are an author or publisher with a book (either new or backlist) that would be a good fit for our series, please reach out to Chris Freeland, director of library services at Internet Archive, at chrisfreeland@archive.org today!

Book Talk: The Secret Life of Data

How data surveillance, digital forensics, and generative AI pose new long-term threats and opportunities—and how we can use them to make better decisions in the face of technological uncertainty.

Book Talk: The Secret Life of Data
April 18 @ 10am PT / 1pm ET ONLINE
Register now!

“I have been waiting a long time for a clearly written book that cuts through the hype and describes how data—big and small, old and new—actually operate in our lives. Neither utopian nor dystopian, The Secret Life of Data just tells it like it is.”   
—Siva Vaidhyanathan, Professor of Media Studies, The University of Virginia; author of Antisocial Media and The Googlization of Everything (And Why We Should Worry)

In The Secret Life of Data, Aram Sinnreich and Jesse Gilbert explore the many unpredictable, and often surprising, ways in which data surveillance, AI, and the constant presence of algorithms impact our culture and society in the age of global networks. The authors build on this basic premise: no matter what form data takes, and what purpose we think it’s being used for, data will always have a secret life. How this data will be used, by other people in other times and places, has profound implications for every aspect of our lives—from our intimate relationships to our professional lives to our political systems.

ABOUT OUR SPEAKERS

ARAM SINNREICH is an author, professor, and musician. He is Chair of Communication Studies at American University. His books include Mashed Up, The Piracy Crusade, The Essential Guide to Intellectual Property, and A Second Chance for Yesterday (published as R. A. Sinn).

JESSE GILBERT is an interdisciplinary artist exploring the intersection of visual art, sound, and software design at his firm Dark Matter Media. He was the founding Chair of the Media Technology department at Woodbury University, and he has taught interactive software design at both CalArts and UC San Diego.

DR. LAURA DENARDIS is Professor and Endowed Chair in Technology, Ethics, and Society and Director of the Center for Digital Ethics at Georgetown University in Washington, DC.  Her book The Internet in Everything: Freedom and Security in a World with No Off Switch (Yale University Press) was recognized as a Financial Times Top Technology Book of 2020. Among her seven books, The Global War for Internet Governance (Yale University Press) is considered a definitive source for understanding cyber governance debates and solutions. Professor DeNardis is an affiliated Fellow of the Yale Information Society Project, where she previously served as Executive Director, and is a life Member of the Council on Foreign Relations. She holds engineering degrees and a PhD in Science and Technology Studies, and was awarded a postdoctoral fellowship from Yale Law School.

Publishers’ brief in Hachette v. Internet Archive: First Impressions

Posted March 15, 2024

Dave Hansen and Kyle Courtney jointly authored this post. They are also the authors of a White Paper on Controlled Digital Lending of Library Books. We are not, as the Publishers claim in their brief on page 13, a “cadre of boosters.” We wrote the paper independently as part of our combined decades of work on libraries and access to knowledge.

Earlier today the publishers (Hachette, HarperCollins, John Wiley, and Penguin Random House) filed their reply brief on appeal in their long-running lawsuit against Internet Archive, which challenges (among other things) the practice of controlled digital lending.

In the months since the district court’s decision, we have watched the hot takes, cheers, jeers, and awkward declarations about the case, the Internet Archive itself, and Controlled Digital Lending (CDL).

This post is not part of that fanfare. Here, we want to identify a few critical issues that the publishers focus on in their brief, including some questionable fair use analysis that they repeat from the district court below. Much of the brief is framed in heated rhetoric that may cause alarm, but much like publishers’ announcements about interlibrary loan, e-reserves, or document delivery, we believe controlled digital lending is here to stay, regardless of the lower court’s poor copyright analysis and the publishers’ current brief.

Framing the Question

As is often the case, the parties disagree on what this case is actually about. For its part, Internet Archive says in its “Statement of the Issue on Appeal” that the question is “whether Internet Archive’s controlled digital lending is fair use.” Publishers, on the other hand, reframe the question more broadly, which, in combination with their arguments throughout the brief, seems intended not just to kill IA’s implementation of controlled digital lending, but to encourage the court to rule in a way that would call into question all other library applications of CDL. They say that the question is “whether IA’s infringement of the Publishers’ Works is fair use based on IA’s CDL theories and practices.”

This litigation, coordinated by the AAP,  seems to us an attempt to undermine what libraries have done for centuries: lend the books that they already lawfully own. Ironically, the opposition calls CDL a made-up theory created by a “cadre of boosters,” but in actuality, it’s the publishers’ licensing system that is a modern, made-up invention. The works themselves are unchanged, but the nature of digital delivery allows publishers to charge people in new ways. There is nothing in the Copyright Act that states ebook licensing is, or should be, the default way for libraries to acquire and lend books. 

Commercial vs. Non-Profit Use

One of the most criticized aspects of the decision below is the lower court’s conclusion that IA’s activities are commercial, as opposed to non-profit. The publishers’ brief enthusiastically embraces this conclusion, while also attempting to drive a wedge between IA’s lending and that of other libraries: “IA’s practices are distinctly commercial – especially in comparison to public and academic libraries.”

The district court concluded that IA’s activity was commercial because it “stands to profit” through its partnership with Better World Books on its website, and by “us[ing] its Website to attract new members, solicit donations, and bolster its standing in the library community” (p. 26).

As many amici pointed out earlier in the appeal, the use of a nonprofit’s website to solicit donations is routine; it would be chilling for sites like Wikipedia, Project Gutenberg, HathiTrust and others (all of whom filed briefs in this case) to face heightened copyright liability just because they seek donations in combination with aspects of their sites that rely on a fair use assertion. The publishers attempt to distance themselves from this absurd result (“The concern that Judge Koeltl’s analysis ‘would render virtually all nonprofit uses commercial’ is wildly overblown”), but it is clear from the number and diversity of amici who filed to speak to just this issue that the concern is very real.

As for Better World Books (BWB): BWB  is an online bookstore and a Certified B Corporation, meaning that it achieves high standards of social and environmental performance, transparency, and accountability. B Corps are committed to using business as a force for good in the world. According to its website, BWB donates books to nonprofit organizations, including the Internet Archive. As of November 2019, IA and BWB have a partnership to digitize books for preservation purposes. 

The focus on the supposedly commercial relationship with Better World Books (a used book reseller) seems to us a stretch based on the facts. The publishers’ brief makes a big deal of Better World Books (referencing them over 20 times in the brief), and argues that IA’s use is commercial because a) IA encourages readers to purchase books through links on its site to Better World Books, and b) Better World Books donates some funds back to IA. The first point is perplexing–one would think they’d be pleased that readers are encouraged to purchase copies of their books–even if on the used market. But the latter point about Better World Books’ commercial influence on IA’s operation is just not rooted in the facts of the case. As IA laid out in its opening brief, it has only received $5,561.41 from Better World Books in the relevant time frame. That’s an infinitesimally small drop in the bucket compared to the costs that IA has borne to digitize and lend books for no monetary return from readers. It’s hard to see how such an amount could be construed to tilt IA’s entire operation into a commercial activity.

For anyone who has actually worked on such projects, it is clear that IA is not archiving or lending books for commercial purposes. The idea that there is money to be made in doing so is laughable. Instead, it is providing access to knowledge and cultural heritage. This fundamental point somehow got lost on the publishers on the road to enormous profits.

eBooks vs. Digitized Books

There are lots of nuances that got lost in the decision below, which we believe were helpfully addressed by amici filings earlier in this appeal (e.g., the privacy implications of licensed ebooks vs. CDL copies lent by libraries).  The publishers seem happy to gloss over the details again in this brief, particularly when it comes to the differences between licensed ebooks and those that are lent out with CDL. 

First, the publishers’ brief makes clear they really don’t like it when books are available for free. They use the word 33 times (about every other page of the brief)! Many of the references obscure what “free” really means though – for example, asserting that “Two Publishers believe that 39-50% of American ebook consumers read their ebooks for free from libraries rather than paying for their own commercial ebooks” (emphasis added) while ignoring the exorbitant costs and other burdens placed on libraries and the public to fund that licensed access. This is a major part of why libraries have responded both by embracing CDL and by advocating for laws that would require fair licensing terms for ebooks.

Second, as far as market harm goes, the Publishers assert that “IA offered the Publishers’ library and consumer customers a free competing substitute to the authorized ebook editions,” essentially arguing that “you can’t compete with free.” But that is just not true. Examples are trivially easy to conjure up: open source software vs. Microsoft or iOS. How often do you run into someone who uses LibreOffice, OpenOffice, or Ubuntu? And of course in creative industries, we’ve seen this kind of model take hold in numerous areas, including book publishing, with “freemium” models.

That’s because products that are free often offer a different user experience than those that aren’t. Usually when someone opts to pay, they’re paying for an enhanced experience. The same holds true of books scanned for CDL vs. licensed ebooks. CDL books are just that – they are digitized physical books. They don’t have the nice, crisp text of licensed ebooks, nor the interactive features. You can’t highlight, or change the font, or look up a word by touching it, or do any of the myriad of functions that you can with an ebook. 

That a library is loaning and controlling those copies is also a major distinguishing factor, because borrowing a book from a library (along with all the special privacy protections one receives) provides a vastly different reading environment than one in which vendors can scrape, process and sell data about your reading experience. Notably, the publishers did not engage with this argument. 

“IA refuses to pay the customary price and join the Publishers’ thriving market for authorized library ebooks…”

Good gravy! According to the publishers, libraries should be forced to pay over and over again for the same book, to join a market that there is no evidence they are harming.

The publishers devote a large portion of their brief – nearly 20 pages – to arguing about market harm. Most of it comes down to the assertion that the mere fact of the existence of a digital book market means that CDL must negatively impact the rightsholders’ profits (despite no empirical evidence of market harm). The lower court decision stated that IA has the “burden to show a lack of market harm” (p. 43), and concluded (without reference to meaningful evidence) that “that harm here is evident” (p. 44), an assumption which the publishers are happy to rest on.

There is a genuinely important legal question raised here about which party needs to prove what when it comes to market harm. The publishers’ brief relies heavily on the idea that IA bears the burden on every point of its fair use defense, especially market harm. But as IA points out in its opening brief,

“Although the Supreme Court has stated fair use is an affirmative defense for which defendants bear the burden (Campbell, 510 U.S. at 1177), it has also suggested this burden may apply differently to noncommercial uses than commercial ones. Sony stated that noncommercial cases require “a showing by a preponderance of the evidence that some meaningful likelihood of future harm exists.” 464 U.S. at 417; see Princeton Univ. Press v. Mich. Document Servs., Inc., 99 F.3d 1381, 1385-86 (6th Cir. 1996) (“The burden of proof as to market effect rests with the copyright holder if the challenged use is of a ‘noncommercial’ nature.”).”

Conclusion

The brief is predictably hyperbolic, and it refuses to allow any room for digital lending based on a misreading, in our view, of precedents such as Sony, TVEyes, and ReDigi. But CDL is not some form of library-sanctioned piracy. CDL is based in copyright, fair use, and the public mission of libraries, while also broadening access to the books that library systems spend billions of dollars to collect and maintain for the public—including long-neglected, out-of-print books with enormous social and scholarly value and books for which commercial ebook licenses are not available.

During the pandemic, the importance of digital library access became strikingly apparent. It is unfortunate that the Publishers chose that moment of national emergency to sue a non-profit library for loaning books digitally. CDL simply seeks to preserve the library’s long-established and vital mission to collect and lend books in an increasingly licensed-access digital world.

Authors Alliance 10th Anniversary Event: Authorship in an Age of Monopoly and Moral Panics

Register here for this IN-PERSON event
hosted in San Francisco at the Internet Archive on May 17

Moral panics about technology are nothing new for creators. Copyright, in particular, has been a favorite tool to excite outrage. We were told that the motion picture industry would “bleed and bleed and hemorrhage” if the law didn’t prohibit VCRs. Because of the photocopier, industry experts warned that “the day may not be far off when no one need purchase books.” MP3 players, we were told, would leave us with no professional musicians, but only amateurs.

Today, we are told that librarians lending books online will undo the publishing industry, and that AI will destroy entire creative industries as we know them.  At the same time, authors face real and unprecedented challenges in reaching readers, working within an increasingly consolidated publishing marketplace, a concentrated technology stack that seems aimed at optimizing ad revenue over all else, and a labyrinth of private agreements over which authors have almost no say. 

So what’s real and what’s hyperbole? Join us on May 17th to celebrate Authors Alliance’s 10th anniversary and be part of an engaging discussion with leading experts to cut through the hype and hear about the real challenges and opportunities facing authors who want to be read. 

The event will include a keynote address from author, activist, and journalist Cory Doctorow, as well as a series of panel discussions with leading experts on authorship, law, technology, and publishing.

Register here
Hosted in person in San Francisco at the Internet Archive
May 17, 2024
4:00pm to 7:00pm
Reception to Follow

4:00 Welcome & Introduction:  Dave Hansen, Executive Director of Authors Alliance

4:15 to 5:15 Technology, the Law, and Authorship

Moderator: Marta Belcher, President and Chair of the Filecoin Foundation as well as the Filecoin Foundation for the Decentralized Web

  • Pamela Samuelson, Richard M. Sherman Distinguished Professor of Law and Information at the University of California, Berkeley
  • David Bamman, Associate Professor, School of Information, University of California, Berkeley
  • Sasha Stiles, award-winning poet,  language artist and AI researcher

5:15 to 6:00 Platforms, the Publishing Industry, and the Public Interest

Moderator: Corynne McSherry, Legal Director, Electronic Frontier Foundation

  • Daphne Keller, Director of the Program on Platform Regulation at Stanford’s Cyber Policy Center 
  • Alison Mudditt, CEO of the Public Library of Science (PLOS)
  • Brewster Kahle, Digital Librarian and founder of the Internet Archive

6:00 to 6:45 Keynote:  Cory Doctorow, science fiction author, activist and journalist 

6:45 Closing remarks

7:00 Reception to follow

For those of you who can’t join us in person, the event will be recorded and video shared out to Authors Alliance members (so if you aren’t a member, join (for free) today!)

Why Fair Use Supports Non-Expressive Uses

Posted February 29, 2024

This post is part of the Fair Use Week series, cross-posted at https://sites.harvard.edu/fair-use-week/2024/02/29/fair-use-week-2024-day-four-with-guest-expert-dave-hansen/

AI programs and their outputs raise all sorts of interesting questions–now found in the form of some 20+ lawsuits, many of them massive class actions.

One of the most important questions is whether it is permissible to use copyrighted works as training data to develop AI models themselves, on top of which AI services like ChatGPT are built (read here for a good overview of the component parts and “supply chain” of generative AI, reviewed through a legal lens).

For the question of fair use of AI training data, you’ll find that almost everyone writing about this question in the US context says the answer turns on two or three precedents–especially the Google Books case and the HathiTrust case–and a concept referred to as “non-expressive use” (or sometimes “non-consumptive use”). This concept of non-expressive use and those cases have proven to be foundational for all sorts of applications that extend well beyond generative AI, including basic web search, plagiarism detection tools, and text and data mining research. Since this idea has received so much attention, I thought this Fair Use Week was a good opportunity to explore what this concept is.

What is non-expressive use? 

Non-expressive use refers to uses that involve copying, but don’t communicate the expressive aspects of the work to be read or otherwise enjoyed. It is a term coined, as far as I can tell, by law professor Matthew Sag in a series of papers titled “Copyright and Copyright Reliant Technology” (in which he observes that courts have been approving of such uses–for example in search engine cases–albeit without a coherent framework) and then more directly in “Orphan Works as Grist for the Data Mill” and later in an article titled “The New Legal  Landscape for Text Mining and Machine Learning.”  You can do much better than this blog post if you just read Matt’s articles. But, since you’re here, the argument is basically built on two propositions:  

Proposition #1: “Facts are not copyrightable” is a phrase you’ll hear somewhere near the beginning of the lecture on copyright 101. It, along with the “idea-expression” dichotomy and some related doctrines, is one of the ways that copyright law draws a line between protected content and those underlying facts and ideas that anyone is free to use. These protections for free use of facts and ideas are more than just a line in the sand drawn by Congress or the courts. As the U.S. Supreme Court in Eldred v. Ashcroft most recently explained:

“[The] idea/expression dichotomy strike[s] a definitional balance between the First Amendment and the Copyright Act by permitting free communication of facts while still protecting an author’s expression. Due to this distinction, every idea, theory, and fact in a copyrighted work becomes instantly available for public exploitation at the moment of publication.” (citations and quotations omitted).

The law has therefore recognized the distinction between expressive and non-expressive works (for example, copyright exists in a novel, but not in a phone book), and this distinction is so important that the Constitution mandates it. The exact contours of this line have been the subject of a long and not always consistent history, but have slowly come into focus in cases from Baker v. Selden (1879) (“there is a clear distinction between the book, as such, and the art which it is intended to illustrate”) to Feist Publications v. Rural Telephone (1991) (no copyright in telephone white pages).

Proposition #2: Fair use is also one of the Copyright Act’s First Amendment safeguards, per the Supreme Court in Eldred. The “transformative use” analysis, in particular, does a lot of work in giving breathing room for others to use existing works in ways that allow for their own criticism and comment. It also has provided ample space for uses that rely on copying to unearth facts and ideas contained within and about underlying works, particularly when doing so in a way that provides a net social benefit.

Transformative use, though not always easy to define in practice, favors uses that avoid substituting for the original expression, but that reuse that content in new ways, with new meaning, message and purpose. While this can apply to downstream expressive uses (e.g., parody is the paradigmatic example that relies on reusing expression itself), its application to non-expressive uses can look even stronger. This is why you find courts like the 9th Circuit in a case about image search saying things like “a search engine may be more transformative than a parody because a search engine provides an entirely new use for the original work, while a parody typically has the same entertainment purpose as the original work,” where search engines copy underlying works primarily for the purpose of helping users discover them. 

Fair use for non-expressive use

We now have several cases that address non-expressive uses for computational analysis of texts. Three cases in particular stand out. In A.V. ex rel. Vanderhye v. iParadigms, the Fourth Circuit in 2009 analyzed a plagiarism detection tool that ingested papers and then created a “digital fingerprint” to match them to duplicate content using a statistical technique originally designed to analyze brain waves. The court there concluded that “iParadigms’ use of these works was completely unrelated to expressive content” and therefore constituted transformative fair use. Then in Authors Guild v. HathiTrust and Authors Guild v. Google, we saw the Second Circuit in successive opinions in 2014 and 2015 approve of copying at a massive scale of books used for the purpose of full-text search of those books and related computational, analytical uses. The court, in Google Books, fully briefed on the implications of these projects for computational analysis of texts, explained:

As with HathiTrust (and iParadigms), the purpose of Google’s copying of the original copyrighted books is to make available significant information about those books, permitting a searcher to identify those that contain a word or term of interest, as well as those that do not include reference to it. In addition, through the ngrams tool, Google allows readers to learn the frequency of usage of selected words in the aggregate corpus of published books in different historical periods. We have no doubt that the purpose of this copying is the sort of transformative purpose described in Campbell.

Example from the Digital Humanities Scholars brief in the Google Books case,
illustrating one text mining use enabled by the Google Books corpus. 
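To make the idea of a frequency tool concrete, here is a minimal Python sketch of the kind of calculation the court describes: how often a word appears in texts from different periods. It is a toy illustration only. The function name, the tokenization rule, and the tiny corpus are my own assumptions, not a description of how Google's ngrams tool is built.

```python
import re
from collections import Counter


def term_frequency_by_year(corpus, term):
    """Return {year: share of all tokens equal to `term`} for a {year: [texts]} corpus."""
    shares = {}
    for year, texts in corpus.items():
        # Assumed tokenization: lowercase runs of letters and apostrophes.
        tokens = [t for text in texts for t in re.findall(r"[a-z']+", text.lower())]
        counts = Counter(tokens)
        shares[year] = counts[term] / len(tokens) if tokens else 0.0
    return shares


# Hypothetical two-period corpus, invented for illustration.
toy_corpus = {
    1900: ["the telegraph office was closed", "a telegraph arrived"],
    2000: ["the internet was slow", "no telegraph here only the internet"],
}
shares = term_frequency_by_year(toy_corpus, "telegraph")
print(shares)
```

Notice what comes out: a number per period, which is a fact about the texts. None of the sentences themselves are communicated to the reader.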

So, back to AI 

There are certainly limits to how much of an underlying work can be described before one crosses the line from non-expressive to substantial use of expressive content. For example, uses that reproduce extensive facts from underlying works to merely repackage content for the same purpose as the original works may face challenges, as in the case of Castle Rock Entertainment v. Carol Publishing (about Carol Publishing’s “Seinfeld Aptitude Test” based on facts from the Seinfeld series), which the court concluded was made merely to “repackage Seinfeld to entertain Seinfeld viewers.” And there are real questions (discussed in two excellent recent essays, here and here) about how the law may respond in practice to AI products, particularly ones where outputs look–or at least can be made to look–suspiciously similar to inputs used as training data.

How AI models work is explained much more thoroughly (and much better) elsewhere, but the basic idea is that they are built by developing extraordinarily robust word vectors used to represent the relationships between words. To do this well, these models need to train on a large and diverse set of texts to build out a good model of how humans communicate in a variety of contexts. In short, these models copy texts for the purpose of developing a model that describes facts about the underlying works and the relationships of words within them and with each other. What’s new is that we can now do this at a level of complexity and scale almost unimaginable before. Scale and complexity don’t change the underlying principles at issue, however, and so this kind of training seems to me clearly within the bounds of non-expressive use as already approved by the courts in the cases cited above, cases that authors, researchers, and the tech industry have been relying on for nearly a decade.
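
To make the idea of word vectors a little more concrete, here is a toy sketch in Python. It rests on simplifying assumptions: real AI models learn dense vectors with neural networks rather than by counting neighboring words, and the four-sentence corpus, the window size, and the function names `cooccurrence_vectors` and `cosine` are all made up for this example. The point is only that what the process retains is a set of numbers describing how words relate to one another.

```python
from collections import Counter, defaultdict
from math import sqrt


def cooccurrence_vectors(sentences, window=2):
    """Map each word to counts of the words seen within `window` positions of it."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, word in enumerate(tokens):
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical corpus, invented for illustration only.
corpus = [
    "the judge read the brief",
    "the judge read the opinion",
    "the author wrote the book",
    "the author wrote the essay",
]
vectors = cooccurrence_vectors(corpus)
print("brief vs. opinion:", cosine(vectors["brief"], vectors["opinion"]))
print("brief vs. book:", cosine(vectors["brief"], vectors["book"]))
```

In this toy corpus, “brief” and “opinion” score as more similar than “brief” and “book,” because the first pair appears in the same surrounding context. Nothing in the resulting vectors reproduces the sentences themselves.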

Fair Use Week Webinar: Fair Use in Text Data Mining and Artificial Intelligence

Posted February 16, 2024
Text Miner, generated by MidJourney

Computational research techniques such as text and data mining (TDM) hold tremendous opportunities for researchers across the disciplines, ranging from mining scientific articles to create better systematic reviews or curated chemical property datasets, to building a corpus of films to understand how concepts of gender, race, and identity are shared over time. Unfortunately, legal uncertainty, whether arising from copyright or from restrictive terms of use, can stifle this research. Recent copyright lawsuits, such as the high-profile cases brought against Microsoft, GitHub, and Stability AI, underscore the legal complications.

So how can fair use allow for computational research techniques? Join us for this Fair Use Week webinar, co-sponsored with the Library Copyright Institute, to find out!

Wednesday, February 28, 2024
1pm – 2:30pm ET / 10am – 11:30am PT
Register here

We’ve written quite a bit about fair use in TDM and AI for research applications already, and the topic is certainly complicated. Join us for this event to hear live from legal experts and researchers. We plan to include substantial time for Q&A, so bring your questions! Panelists include: 

  • Dave Hansen, Executive Director, Authors Alliance
  • Rachael Samberg, Scholarly Communications Officer, UC Berkeley
  • Lauren Tilton, Claiborne Robins Professor of Liberal Arts and Digital Humanities, University of Richmond

Book Talk: Wrong Way by Joanne McNeil

Posted February 13, 2024

Join us for a VIRTUAL book talk with author Joanne McNeil about her latest book, WRONG WAY, which examines the treacherous gaps between the working and middle classes wrought by the age of AI. McNeil will be in conversation with author Sarah Jaffe.

This is the first Internet Archive / Authors Alliance book talk for a work of fiction! Come for a reading, stay for a thoughtful conversation between McNeil & Jaffe about the labor implications of artificial intelligence.

February 29 @ 10am PT / 1pm ET
VIRTUAL

REGISTER NOW

WRONG WAY was named one of the best books of 2023 by the New Yorker and Esquire. It was the Endless Bookshelf Book of the Year and named one of the best tech books by the LA Times.

“Wrong Way is a chilling portrait of economic precarity, and a disturbing reminder of how attempts to optimize life and work leave us all alienated.”
—Adrienne Westenfeld, Esquire

For years, Teresa has passed from one job to the next, settling into long stretches of time, struggling to build her career in any field or unstick herself from an endless cycle of labor. The dreaded move from one gig to another is starting to feel unbearable. When a recruiter connects her with a contract position at AllOver, it appears to check all her prerequisites for a “good” job. It’s a fintech corporation with progressive hiring policies and a social justice-minded mission statement. Their new service for premium members: a functional fleet of driverless cars. The future of transportation. As her new-hire orientation reveals, the distance between AllOver’s claims and its actions is wide, but the lure of financial stability and a flexible schedule is enough to keep Teresa driving forward.

Joanne McNeil, who often reports on how the human experience intersects with labor and technology, brings blazing compassion and criticism to Wrong Way, examining the treacherous gaps between the working and middle classes wrought by the age of AI. Within these divides, McNeil turns the unsaid into the unignorable, and captures the existential perils imposed by a nonstop, full-service gig economy.

REGISTER NOW

About our speakers

JOANNE MCNEIL was the inaugural winner of the Carl & Marilynn Thoma Art Foundation’s Arts Writing Award for an emerging writer. She has been a resident at Eyebeam, a Logan Nonfiction Program fellow, and an instructor at the School for Poetic Computation.
Joanne is the author of Lurking: How a Person Became a User.

SARAH JAFFE is an author, independent journalist, and a co-host of Dissent magazine’s Belabored podcast.

A Copyright Small Claims Update: Defaults and Failure to Opt Out

Posted February 1, 2024

For a few years now, we’ve been tracking the new copyright small claims court known as the Copyright Claims Board (CCB). My last update was in September, when I posted a summary of a paper I wrote with Katie Fortney analyzing data about the court’s first year of operations (thanks entirely to Katie for doing the hard work of extracting that data and sharing it in an easy-to-understand format).

As explained then, the CCB has been slow in processing cases; when I last wrote, it had entered a final judgment on the merits in only one case. It has now issued a total of 18 final determinations, about half of which are default determinations (cases where the respondent failed to appear or refused to participate in the CCB process). The facts for most of these cases are not very interesting, but two of the most recent caught my attention.

Oakes v. Heart of Gold Pageant System

The first case, Oakes v. Heart of Gold Pageant System Inc., highlights a concern raised by opponents of the CCB when it was being debated in Congress. Namely, the CCB’s ability to make default determinations could be a trap for unwary defendants who don’t understand what the CCB is, what a case before it could mean for them, or what their rights are to opt out of a CCB proceeding.

The facts are unspectacular: Oakes, a professional photographer represented by Higbee & Associates, filed a CCB complaint against Heart of Gold and its owner, Angel Jameson, for using photographs taken by Oakes on Heart of Gold’s Facebook page and in materials for events it sponsored. Oakes originally filed the claim in July 2022 and then refiled it in August 2022 with some corrections. Oakes then provided the CCB with the required proof of service (proof that Oakes had adequately informed Heart of Gold and Jameson of the CCB claim) in October 2022. 

At this point, the ball was in Heart of Gold and Jameson’s court; she could either respond and defend her use, or (if done within 60 days of service) opt out of the CCB proceeding altogether. Unfortunately for her, she did neither, which resulted in a default determination against her for $4,500. 

We learn in the final determination a little more about Jameson’s lack of participation. As the CCB recounts in its final default determination: 

“At multiple points in this procedural history, Jameson has contacted the CCB, and after communicating with staff, has affirmed each time her intent to not participate in this proceeding.”

“Jameson initially contacted the Board in response to this Zoom link, expressing her disbelief that the Board is a government tribunal.”

“Jameson then sent another email in response to the First Default, requesting an ‘official day in court.’”

“In a subsequent call with CCB staff in March, Jameson indicated that she would not participate.”

“Shortly after the order scheduling the hearing, Jameson contacted the U.S. Copyright Office’s Public Information Office, who placed her in contact with CCB staff. In a follow-up call, CCB staff again explained the proceeding and Jameson again affirmed that she would not participate in the proceeding.”

Jameson missed her opportunity to opt out early in the case; she had a sixty-day window to do so, as defined by CCB regulations. Her later protests were therefore ineffective as an opt-out, even though it seems clear that she did not want her case to be heard by the CCB.

Joe Hand Promotions v. Dawson 

A second default determination case offers a slightly different view of how the CCB treats defaults. The facts are similarly straightforward: Joe Hand is a company that “specializes in commercially licensing premier sporting events to commercial locations such as bars, restaurants, lounges, clubhouses, and similar establishments.” Joe Hand had obtained the exclusive right to sell pay-per-view access to a boxing event, “Deontay Wilder vs. Tyson Fury II,” to commercial establishments, including bars. Joe Hand provided evidence that a California bar, “Bottoms Up,” had shown the match without permission.

Joe Hand (a frequent filer with the CCB, with 33 cases to its name) ran into a problem in this case, however, because it didn’t actually file its case against Bottoms Up, but instead against the individual who is listed on the bar’s liquor license and ownership documents, Mary Dawson. Even in Dawson’s absence, the CCB was unwilling to rubber-stamp Joe Hand’s claims against her. The final determination explained:

“Beyond the conclusory and clearly boilerplate allegations in the Claim that Dawson (and now-dismissed respondent Giglio) ‘owned, operated, maintained, and controlled the commercial business known as Bottoms Up Bar & Grill’ and ‘had a right and ability to supervise the activities of the Establishment on the date of the Program and had an obvious and direct financial interest in the activities of the Establishment on the date of the Program’ (Dkt. 1), Claimant offers absolutely no information linking Respondent to the infringement.”

I will spare you the details, but the CCB went on to cite case after case explaining why courts have routinely rejected such boilerplate claims and have required plaintiffs to at least allege meaningful facts connecting an individual to an act of infringement. Even in this default case, where Dawson was not present to defend herself, the CCB put in the effort on her behalf.

Takeaways

I have a few observations. In the first case, given that Jameson clearly did not want her case heard before the CCB, I think it would have been fair for the CCB to allow her a second chance to opt out. At least on the record we have available, there is no indication that the CCB offered her that chance. Although the normal opt-out period extends only sixty days after service, the CCB opt-out regulations also state that “the Board may extend the 60-day period to opt out in exceptional circumstances and in the interests of justice.”

It seems to me, given the newness of the CCB system, the small number of cases filed to date, and the relative lack of awareness among most people that the CCB is a legitimate government forum (Jameson expressed such doubt herself), the “interests of justice” may well dictate a more flexible approach at least at the outset of operations of the CCB. 

The CCB has demonstrated an extraordinary willingness to offer helpful guidance, flexibility, and multiple opportunities to claimants, and so respondents may have expected a similar approach to help them along through the process. At least in this case, we see a more stringent approach. An obvious takeaway for respondents, then, is to pay attention to notices about CCB claims and associated deadlines, and to opt out early in the process if they don’t want their case heard there.

The Dawson case, however, does show that the CCB isn’t willing to let claimants make unsubstantiated claims against absent respondents. Though Joe Hand is surely familiar with the process and it would have been easy for the CCB to accept its barebones allegations against Dawson as true, the CCB itself made the case, with ample legal support, that even claims against absent respondents require claimants to make a real case.

Overall, these are just two cases, so I don’t want to read too much into them. But it’s already looking like a large portion of CCB cases will be defaults (10 out of the 18 final determinations to date, and more than half of the existing active cases are trending in that direction). So, it’s good to keep an eye on how the CCB will treat these types of cases, given the risks they pose for unwary and uninformed respondents.

Authors Alliance 2023 Annual Report

Posted January 23, 2024

Authors Alliance is pleased to share our 2023 annual report, where you can find highlights of our work in 2023 to promote laws, policies, and practices that enable authors to reach wide audiences. In the report, you can read about how we’re helping authors meet their dissemination goals for their works, representing their interests in the courts, and otherwise working to advocate for authors who write to be read. 

Click here to read the full report.