
Hachette v. Internet Archive Update: Oral Argument Before the Second Circuit Court of Appeals

This is a short update on the Hachette v. Internet Archive controlled digital lending lawsuit, which is currently pending on appeal before the Second Circuit Court of Appeals. The court held oral argument in the case today. [July 2 update: a recording of the hearing is available here.]

We’ve covered the background of this suit numerous times – it is in essence about whether it is permissible for libraries to digitize and lend books in their collections in a lend-like-print manner (e.g., only providing access to one user at a time based on the number of copies the library owns in print). 

At this point, both parties have fully briefed the court on their legal arguments, bolstered on both sides by numerous amicus briefs explaining the broader implications of the case for authors, publishers, libraries, and readers (you can find the full docket, including these briefs, online here).

Our amicus brief, which received a nice shout-out from Internet Archive’s counsel at oral argument today, was filed in support of the Internet Archive and controlled digital lending. It argues that many authors benefit from CDL because it enhances access to their work, aids in preservation, and supports their efforts to research and build upon existing works to create new ones.

What happened at oral argument

Compared to the District Court proceedings, this oral argument went much better for Internet Archive. Whether Internet Archive will prevail is another question, but it did seem to me the panel was genuinely trying to understand the basic rationale for CDL, whether there is a credible argument for distinguishing between CDL copies and licensed ebooks, and what kind of burden the plaintiff or defendant should bear in proving or disproving market harm. Overall, I felt the panel gave both sides a fair hearing and is interested in the broader implications of this case. 

A few highlights: 

  • It almost seemed that the panel assumed the district court got it wrong when it concluded that Internet Archive’s use was commercial in nature, rather than nonprofit (an important distinction in fair use cases). The district court adopted a novel approach, finding that IA’s connection with Better World Books and its solicitation of donations on webpages that employ CDL pushed it into the “commercial” category. The panel on appeal seemed skeptical, for example, commenting on how meager the $5000 was that Internet Archive actually made on the arrangement. Looking beyond controlled digital lending, this is an important issue for all nonprofit users, and I’m hopeful that the Second Circuit sees the importance of correcting the lower court on this point. 
  • At least some members of the panel seemed to appreciate the incongruity of a first sale doctrine that applies only to physical books but somehow not to digital lending. One particularly good question on this, directed to the publishers’ counsel, was about whether in the absence of section 109, library physical lending would be permissible as a fair use or otherwise. This was helpful, I think, because it stripped away the focus on the text of 109 and refocused the discussion on the underlying principles of exhaustion–i.e., what rights do libraries and other owners of copies get when they buy copies. 

There were also a few concerning exchanges: 

  • At one point, there was a line of questioning about whether fair use could override or provide for a broader scope of uses than what Congress provided to libraries in Section 108 (the part of the Copyright Act that has very specific exceptions for things like libraries making preservation copies). Even the publishers’ lawyer wasn’t willing to argue that libraries’ rights are fully covered by Section 108 and that fair use doesn’t apply–likely because that issue was addressed directly in Authors Guild v. HathiTrust, and she knew it–but it was a concerning exchange nonetheless.

I also came away with several questions:

  • Each member of the panel asked probing questions to both sides about the importance of market harm and, more specifically, what kind of proof is required to demonstrate market harm to the publishers. It was hard to tell which direction any were leaning on this–while there was some acknowledgment that there wasn’t really any hard evidence about the market effect, members of the panel also made several remarks about the logic of CDL copies replacing ebook sales as being common sense. 
  • The panel asked a number of questions about the role of fair use in responding to new technology. Should fair use be employed to help smooth over bumps caused by new technology, or should courts be more conservative in its application in cases where Congress has chosen not to act? Despite several questions about this issue, I came away with no clear read on what the panel thought might be the correct framework in a case like this.

It’s folly to predict, but I came away optimistic that the panel will correct many of the errors from the District Court below. 

Introducing the Authors Alliance’s First Zine: Can Authors Address AI Bias?

Posted May 31, 2024

This guest post was jointly authored by Mariah Johnson and Marcus Liou, student attorneys in Georgetown’s Intellectual Property and Information Policy (iPIP) Clinic.

Generative AI (GenAI) systems perpetuate biases, and authors can have a potent role in mitigating such biases.

But GenAI is generating controversy among authors. Can authors do anything to ensure that these systems promote progress rather than prevent it? Authors Alliance believes the answer is yes, and we worked with them to launch a new zine, Putting the AI in Fair Use: Authors’ Abilities to Promote Progress, that demonstrates how authors can share their works broadly to shape better AI systems. Drawing together Authors Alliance’s past blog posts and advocacy discussing GenAI, copyright law, and authors, this zine emphasizes how authors can help prevent AI bias and protect “the widest possible access to information of all kinds.” 

As former Register of Copyrights Barbara Ringer articulated, protecting that access requires balancing it against the need to “induc[e] authors and artists to create and disseminate original works, and to reward them for their contributions to society.” The fair use doctrine is often invoked to do that work. Fair use is a multi-factor standard that allows limited use of copyrighted material, even without authors’ credit, consent, or compensation, and asks courts to examine:

(1) the purpose and character of the use, 

(2) the nature of the copyrighted work, 

(3) the amount or substantiality of the portion used, and 

(4) the effect of the use on the potential market for or value of the work. 

While courts have not decided whether using copyrighted works as training data for GenAI is fair use, past fair use decisions involving algorithms, such as Perfect 10, iParadigms, Google Books, and HathiTrust, favored the consentless use of other people’s copyrighted works to create novel computational systems. In those cases, judges repeatedly found that algorithmic technologies aligned with the Constitutional justification for copyright law: promoting progress.

But some GenAI outputs prevent progress by projecting biases. GenAI outputs are biased in part because they use biased, low friction data (BLFD) as training data, like content scraped from the public internet. Examples of BLFD include Creative Commons (CC) licensed works, like Wikipedia, and works in the public domain. While Wikipedia is used as training data in most AI systems, its articles are overwhelmingly written by men–and that bias is reflected in shorter and fewer articles about women. And because the public domain cuts off in the mid-1920s, those works often reflect the harmful gender and racial biases of that time. However, if authors allow their copyrighted works to be used as GenAI training data, those authors can help mitigate some of the biases embedded in BLFD. 

Current biases in GenAI are disturbing. As we discuss in our zine, word2vec is a very popular toolkit used to help machine learning (ML) models recognize relationships between words–relationships like associating women with homemakers and Black men with the word “assaulted.” Similarly, OpenAI’s GenAI chatbot ChatGPT, when asked to generate letters of recommendation, used “expert,” “reputable,” and “authentic” to describe men and “beauty,” “stunning,” and “emotional” for women, discounting women’s competency and reinforcing harmful stereotypes about working women. An intersectional perspective can help authors see the compounding impact of these harms. What began as a legal framework for explaining why discrimination law did not adequately address harms facing Black women is now used as a wider lens for considering how marginalization affects all people with multiple identities. Coined by Professor Kimberlé Crenshaw in the late 1980s, intersectionality draws on critical theory like Critical Race Theory, feminism, and working-class studies together as “a lens . . . for seeing the way in which various forms of inequality often operate together and exacerbate each other.” Contemporary authors’ copyrighted works often reflect the richness of intersectional perspectives, and using those works as training data can help mitigate GenAI bias against marginalized people by introducing diverse narratives and inclusive language. Not always–even recent works reflect bias–but more often than might be possible currently.
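To make the word-association problem concrete: embedding bias of the kind word2vec exhibits is typically quantified with cosine similarity between word vectors. The tiny three-dimensional vectors below are made up for this illustration (real embeddings have hundreds of dimensions, and the specific numbers here are our own); the point is only how such skew is measured:

```python
import math

# Hand-made 3-dimensional toy vectors -- NOT real word2vec embeddings, just an
# illustration of how researchers quantify embedding bias with cosine similarity.
vectors = {
    "woman":     [0.9, 0.1, 0.3],
    "man":       [0.1, 0.9, 0.3],
    "homemaker": [0.8, 0.2, 0.1],
    "engineer":  [0.2, 0.8, 0.1],
}

def cosine(a, b):
    """Cosine similarity: 1.0 means the vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# In a biased embedding, "homemaker" sits much closer to "woman" than to "man".
print(cosine(vectors["woman"], vectors["homemaker"]))  # ~0.97
print(cosine(vectors["man"], vectors["homemaker"]))    # ~0.37
```

Diversifying the training data shifts these geometries, which is the mechanism by which broader corpora can soften the associations described above.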

Which brings us back to fair use. Some corporations may rely on the doctrine to include more works by or about marginalized people in an attempt to mitigate GenAI bias. Professor Mark Lemley and Bryan Casey have suggested “[t]he solution [to facial recognition bias] is to build bigger databases overall or to ‘oversample’ members of smaller groups” because “simply restricting access to more data is not a viable solution.” Similarly, Professor Matthew Sag notes that “[r]estricting the training data for LLMs to public domain and open license material would tend to encode the perspectives, interests, and biases of a distinctly unrepresentative set of authors.” However, many marginalized people may wish to be excluded from these databases rather than have their works or stories become grist for the mill. As Dr. Anna Lauren Hoffman warns, “[I]nclusion reinforces the structural sources of violence it supposedly addresses.”

Legally, if not ethically, fair use may moot the point. The doctrine is flexible, fact-dependent, and fraught. It’s also fairly predictable, which is why legal precedent and empirical work have led many legal scholars to believe that using copyrighted works as training data to debias AI will be fair use–even if that has some public harms. Back in 2017, Professor Ben Sobel concluded that “[i]f engineers made unauthorized use of copyrighted data for the sole purpose of debiasing an expressive program, . . . fair use would excuse it.” Professor Amanda Levendowski has explained why and how “[f]air use can, quite literally, promote creation of fairer AI systems.” More recently, Dr. Mehtab Khan and Dr. Alex Hanna observed that “[a]ccessing copyright work may also be necessary for the purpose of auditing, testing, and mitigating bias in datasets . . . [and] it may be useful to rely on the flexibility of fair use, and support access for researchers and auditors.”

No matter how you feel about it, fair use is not the end of the story. It is ill-equipped to solve the troubling growth of AI-powered deepfakes. After being targeted by sexualized deepfakes, Rep. Ocasio-Cortez described “[d]eepfakes [as] absolutely a way of digitizing violent humiliation against other people.” Fair use will not solve the intersectional harms of AI-powered face surveillance either. Dr. Joy Buolamwini and Dr. Timnit Gebru evaluated leading gender classifiers used to train face surveillance technologies and discovered that they more accurately classified males over females and lighter-skinned over darker-skinned people. The researchers also discovered that the “classifiers performed worst on darker female subjects.” While legal scholars like Professors Shyamkrishna Balganesh, Margaret Chon, and Cathay Smith argue that copyright law can protect privacy interests, like the ones threatened by deepfakes or face surveillance, federal privacy laws are a more permanent, comprehensive way to address these problems.

But who has time to wait on courts and Congress? Right now, authors can take proactive steps to ensure that their works promote progress rather than prevent it. Check out the Authors Alliance’s guides to Contract Negotiations, Open Access, Rights Reversion, and Termination of Transfer to learn how–or explore our new zine, Putting the AI in Fair Use: Authors’ Abilities to Promote Progress.

You can find a PDF of the Zine here, as well as printer-ready copies here and here.

Authors Alliance Submits Amicus Brief in Tiger King Fair Use Case

Posted May 6, 2024

By Dave Hansen

Have you ever used a photograph to illustrate a historical event in your writing? Or quoted, say from a letter, to point out some fact that the author conveyed in their writing? According to the 10th Circuit, these aren’t the kinds of uses that fair use supports. 

On Thursday, Authors Alliance joined with EFF, the Association of Research Libraries, the American Library Association, and Public Knowledge in filing an amicus brief asking the 10th Circuit Court of Appeals to reconsider its recent fair use decision in Whyte Monkee v. Netflix.

The case is about Netflix’s use of a funeral video recording in its documentary series Tiger King, a true crime documentary about Joseph Maldonado, aka Joe Exotic, an eccentric zookeeper, media personality, exotic animal owner, and convicted felon. The recording at issue was created by Timothy Sepi/Whyte Monkee as a memorial for Travis Maldonado, Joe Exotic’s late husband. Netflix used about 60 seconds of the funeral video in its show. Its purpose was, among other things, to “illustrate Mr. Exotic’s purported megalomania, even in the face of tragedy.”

A three-judge panel of the 10th Circuit issued its opinion in late March, concluding that Netflix’s use was not “transformative” under the first fair use factor and therefore disfavored as a fair use. The panel relied heavily on the Supreme Court’s recent decision in Andy Warhol v. Goldsmith, taking that case to mean that uses that do not comment or criticize the artistic and creative aspects of the underlying work are generally disfavored. So, the court concluded: 

Defendants’ use of the Funeral Video is not transformative under the first fair use factor. Here, Defendants did not comment on or “target” Mr. Sepi’s work at all; instead, Defendants used the Funeral Video to comment on Joe Exotic. More specifically, Defendants used the Funeral Video to illustrate Mr. Exotic’s purported megalomania, even in the face of tragedy. By doing so, Defendants were providing a historical reference point in Mr. Exotic’s life and commenting on Mr. Exotic’s showmanship. However, Defendants’ use did not comment on Mr. Sepi’s video—i.e., its creative decisions or its intended meaning.

You can probably see the problem. Fair use has, for a very long time, supported a wide variety of other uses that incorporate existing works as historical reference points and illustrations. Although the Supreme Court talked a lot about criticism and comment in its Warhol opinion (which made sense, given that the use before it was a purported artistic commentary), I think very few people interpreted that decision to mean that only commentary and criticism are permissible transformative fair uses. But as our brief points out, the panel’s decision essentially converts the Supreme Court’s decision in Warhol from a nuanced reaffirmation of fair use precedent into a radical rewrite of the law that only supports those kinds of uses. 

Our brief argues that the 10th Circuit misread the Supreme Court’s opinion in Warhol, and that it ignored decades of fair use case law. We point to a few good examples – e.g., Time v. Bernard Geis (a 1968 case finding fair use of a recreation of the famous Zapruder film in a book titled “Six Seconds in Dallas,” analyzing President Kennedy’s assassination), New Era Publications v. Carol Publishing (a 1990 case supporting reuse of lengthy quotations from L. Ron Hubbard’s writings in a book about him, to make a point about Hubbard’s “hypocrisy and pomposity”) and Bill Graham Archives v. Dorling Kindersley (a 2006 case finding fair use of Grateful Dead concert posters in a book using them as historical reference points).

Our brief also highlights how communities of practice such as documentary filmmakers, journalists, and nonfiction writers have come to rely on fair use to support these types of uses–so much so that these practices are codified in published best practices here and here, and even in Authors Alliance’s own Fair Use for Nonfiction Authors guide.

Although it is rare for appellate courts to grant rehearing of already issued opinions, this opinion has drawn quite a lot of negative attention. In addition to our amicus brief, several other amici also filed briefs in support of rehearing.

Given the broad and negative reach of this decision, I hope the 10th Circuit will pay attention and grant the request. 

Authors Alliance 10th Anniversary Event: Authorship in an Age of Monopoly and Moral Panics

Register here for this IN-PERSON event
hosted in San Francisco at the Internet Archive on May 17

Moral panics about technology are nothing new for creators. Copyright, in particular, has been a favorite tool to excite outrage. We were told that the motion picture industry would “bleed and bleed and hemorrhage” if the law didn’t prohibit VCRs. Because of the photocopier, industry experts warned that “the day may not be far off when no one need purchase books.” MP3 players, we were told, would leave us with no professional musicians, only amateurs.

Today, we are told that librarians lending books online will undo the publishing industry, and that AI will destroy entire creative industries as we know them. At the same time, authors face real and unprecedented challenges in reaching readers, working within an increasingly consolidated publishing marketplace, a concentrated technology stack that seems aimed at optimizing ad revenue over all else, and a labyrinth of private agreements over which authors have almost no say.

So what’s real and what’s hyperbole? Join us on May 17th to celebrate Authors Alliance’s 10th anniversary and be part of an engaging discussion with leading experts to cut through the hype and hear about the real challenges and opportunities facing authors who want to be read. 

The event will include a keynote address from author, activist, and journalist Cory Doctorow, as well as a series of panel discussions with leading experts on authorship, law, technology, and publishing.

Register here
Hosted in person in San Francisco at the Internet Archive
May 17, 2024
4:00pm to 7:00pm
Reception to Follow

4:00 Welcome & Introduction:  Dave Hansen, Executive Director of Authors Alliance

4:15 to 5:15 Technology, the Law, and Authorship

Moderator: Marta Belcher, President and Chair of the Filecoin Foundation as well as the Filecoin Foundation for the Decentralized Web

  • Pamela Samuelson, Richard M. Sherman Distinguished Professor of Law and Information at the University of California, Berkeley
  • David Bamman, Associate Professor, School of Information, University of California, Berkeley
  • Sasha Stiles, award-winning poet, language artist, and AI researcher

5:15 to 6:00 Platforms, the Publishing Industry, and the Public Interest

Moderator: Corynne McSherry, Legal Director, Electronic Frontier Foundation

  • Daphne Keller, Director of the Program on Platform Regulation at Stanford’s Cyber Policy Center 
  • Alison Mudditt, CEO of the Public Library of Science (PLOS)
  • Brewster Kahle, Digital Librarian and founder of the Internet Archive

6:00 to 6:45 Keynote:  Cory Doctorow, science fiction author, activist and journalist 

6:45 Closing remarks

7:00 Reception to follow

For those of you who can’t join us in person, the event will be recorded and the video shared with Authors Alliance members (so if you aren’t a member, join (for free) today!)

Authors Alliance Submits Amicus Brief to the Second Circuit in Hachette Books v. Internet Archive

Posted December 21, 2023

We are thrilled to announce that we’ve submitted an amicus brief to the Second Circuit Court of Appeals in Hachette Books v. Internet Archive—the case about whether controlled digital lending is a fair use—in support of the Internet Archive. Authored by Authors Alliance Senior Staff Attorney Rachel Brooke, the brief reprises many of the arguments we made in our amicus brief in the district court proceedings, elaborates on why and how the lower court got it wrong, and explains why the case matters for our members and other authors who write to be read.

The Case

We’ve been writing about this case for years—since the complaint was first filed back in 2020. But to recap: a group of trade publishers sued the Internet Archive in federal court in the Southern District of New York over (among other things) the legality of its controlled digital lending (CDL) program. The publishers argued that the practice infringed their copyrights, and Internet Archive defended its program on the grounds that it was fair use. We submitted an amicus brief in support of IA and CDL (which we have long supported as a fair use) to the district court, explaining that copyright is about protecting authors, and that many authors strongly support CDL.

The case finally went to oral argument before a judge in March of this year. Unfortunately, the judge ruled against Internet Archive, finding that each of the fair use factors favored the publishers. Internet Archive indicated that it planned to appeal, and we announced that we planned to support them in those efforts. Now, the case is before the Second Circuit Court of Appeals. After Internet Archive filed its opening brief last week, we (and other amici) filed our briefs in support of a reversal of the lower court’s decision.

Our Brief

Our amicus brief argues, in essence, that the district court judge failed to adequately consider the interests of authors. While the commercial publishers in the case did not support CDL, those publishers’ interests do not always align with authors’, and they certainly do not speak for all authors. We conducted outreach to authors, including launching a CDL survey, and uncovered a diversity of views on CDL—most of them extremely positive. We offered up these authors’ perspectives to show the court that many authors do support CDL, contrary to the representations of the publishers. Since copyright is about incentivizing new creation for the benefit of the public and protecting author interests, we felt these views were important for the Second Circuit to hear.

We also sought to explain how the district court judge got it wrong when it comes to fair use. One of the key findings in the lower court decision was that loans of CDL scans were direct substitutes for loans of licensed ebooks. We explained that this is not the case: a CDL scan is not the same thing as an ebook; the two look different and have different functions and features. And CDL scans can be resources for authors conducting research in some key ways that licensed ebooks cannot. Out-of-print books and older editions, for example, are often available as CDL scans but not as licensed ebooks.

Another issue from the district court opinion that we addressed was the judge’s finding that IA’s use of the works in question was “commercial.” We strongly disagreed with this conclusion: borrowing a CDL scan from IA’s Open Library is free, and the organization—which is also a nonprofit—actually bears a lot of expenses related to digitization. Moreover, the publishers had failed to establish any concrete financial harm they had suffered as a result of IA’s CDL program. We discussed a recent lawsuit in the D.C. Circuit, ASTM v. PRO, to further push back on the district court’s conclusion on commerciality. 

You can read our brief for yourself here, or find it embedded at the bottom of this post. In the new year, you can expect another post or two with more details about our amicus brief and the other amicus briefs that have been, or soon will be, submitted in this case.

What’s Next?

Earlier this week, the publishers proposed that they file their own brief on March 15, 2024—91 days after Internet Archive filed its opening brief. The court’s rules stipulate that any amici supporting the publishers file their briefs within seven days of the publishers’ filing. Then, the parties can decide to submit reply briefs, and will notify the court of their intent to do so. Finally, the parties can choose to request oral argument, though the court might still decide the case “on submission,” i.e., without oral argument. If the case does proceed to oral argument, a three-judge panel will hear from attorneys for each side before rendering a decision. We expect briefing to extend into mid-2024, and it can take quite a while after that for an appeals court to actually hand down its decision. We’ll keep our readers apprised of any updates as the case moves forward.


An Open Letter Regarding Copyright Reform on Behalf of South African Authors

Posted September 25, 2023

Today we are very pleased to share an open letter regarding copyright reform on behalf of South African authors. The letter is available here and is also available as a PDF (with names as of today) here.

The letter comes at a critical decision-making moment for South Africa’s Copyright Amendment Bill, which has been debated for years (read more here and here on our views). We believe it is important for lawmakers to hear from authors who support this bill, and in particular to hear from us about why we view its fair use provisions and author remuneration provisions so positively.

We welcome other South African authors to add their names to the letter to express their support. You can do so by completing this form.


Prosecraft, text and data mining, and the law

Posted August 14, 2023

Last week you may have read about a website called prosecraft.io, which indexed some 25,000 books and provided a variety of data about their texts (how long, how many adverbs, how much passive voice), along with a chart showing sentiment analysis of each work and short snippets from the text itself: the two paragraphs the site judged most and least vivid. Overall, it was a somewhat interesting tool, promoted to authors as a way to understand how their work compares to other published works.
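For readers curious what statistics like these actually involve, here is a minimal sketch in Python. The heuristics (an “-ly” test for adverbs, a “to be + -ed” test for passive voice) are crude stand-ins of our own for illustration, not what prosecraft actually used:

```python
import re

def text_stats(text):
    """Rough prosecraft-style statistics for a passage of text."""
    words = re.findall(r"[A-Za-z']+", text.lower())
    # Crude adverb heuristic: words ending in "-ly" (misses "well", catches "only").
    adverbs = [w for w in words if w.endswith("ly") and len(w) > 3]
    # Crude passive-voice heuristic: a form of "to be" followed by an "-ed" word.
    passive = re.findall(r"\b(?:is|are|was|were|been|being)\s+\w+ed\b", text.lower())
    return {
        "words": len(words),
        "adverbs": len(adverbs),
        "passive_constructions": len(passive),
    }

stats = text_stats("The door was opened slowly. She quickly realized the room was empty.")
print(stats)  # {'words': 12, 'adverbs': 2, 'passive_constructions': 1}
```

Run over a whole book rather than two sentences, counts like these are exactly the sort of aggregate, non-expressive data the site reported.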

The news cycle about prosecraft.io was about the campaign to get its creator Benji Smith to take the site down (he now has) based on allegations of copyright infringement. A Gizmodo story about it generated lots of attention, and it’s been written up extensively, for example here, here, here, and here.  

It’s written about enough that I won’t repeat the whole saga here. However, I think a few observations are worth sharing:  

1) Don’t get your legal advice from Twitter (or whatever it’s called)

“Fair Use does not, by any stretch of the imagination, allow you to use an author’s entire copyrighted work without permission as a part of a data training program that feeds into your own ‘AI algorithm.’” – Linda Codega, Gizmodo (a sentiment that was retweeted extensively)

Fair use actually allows quite a few situations where you can copy an entire work, including situations when you can use it as part of a data training program (and calling an algorithm “AI” doesn’t magically transform it into something unlawful). For example, way back in 2002 in Kelly v. Arriba Soft, the 9th Circuit concluded that it was fair use to make full copies of images found on the internet for the purpose of enabling web image search. Similarly, in A.V. ex rel. Vanderhye v. iParadigms, the 4th Circuit in 2009 concluded that it was fair use to make full-text copies of academic papers for use in a plagiarism detection tool.

Most relevant to prosecraft, in Authors Guild v. HathiTrust (2014) and Authors Guild v. Google (2015), the Second Circuit held that Google’s copying of millions of books for purposes of creating a massive search engine of their contents was fair use. Google produced full-text searchable databases of the works and displayed short snippets containing whatever term the user had searched for (quite similar to prosecraft’s outputs). That functionality also enabled a wide range of computer-aided textual analysis, as the court explained:

The search engine also makes possible new forms of research, known as “text mining” and “data mining.” Google’s “ngrams” research tool draws on the Google Library Project corpus to furnish statistical information to Internet users about the frequency of word and phrase usage over centuries.  This tool permits users to discern fluctuations of interest in a particular subject over time and space by showing increases and decreases in the frequency of reference and usage in different periods and different linguistic regions. It also allows researchers to comb over the tens of millions of books Google has scanned in order to examine “word frequencies, syntactic patterns, and thematic markers” and to derive information on how nomenclature, linguistic usage, and literary style have changed over time. Authors Guild, Inc., 954 F.Supp.2d at 287. The district court gave as an example “track[ing] the frequency of references to the United States as a single entity (‘the United States is’) versus references to the United States in the plural (‘the United States are’) and how that usage has changed over time.”
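The ngram analysis the court describes boils down to counting how often a phrase appears in each year’s texts, normalized by corpus size. As a toy sketch, with a made-up miniature corpus standing in for Google’s scanned books:

```python
# Toy corpus keyed by year -- a tiny, invented stand-in for the millions of
# scanned books behind Google's actual ngrams tool.
corpus = {
    1850: "the united states are a union of states",
    1900: "the united states is a nation and the united states is growing",
    1950: "the united states is a world power",
}

def phrase_rate(corpus, phrase):
    """Frequency of `phrase` per year, normalized by the number of n-grams."""
    n = len(phrase.split())
    rates = {}
    for year, text in corpus.items():
        words = text.split()
        grams = [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]
        rates[year] = grams.count(phrase) / max(len(grams), 1)
    return rates

singular = phrase_rate(corpus, "united states is")
plural = phrase_rate(corpus, "united states are")
print(singular)  # {1850: 0.0, 1900: 0.2, 1950: 0.2}
print(plural)    # nonzero only for 1850
```

Plotted over a real corpus, rates like these produce exactly the singular-versus-plural “United States” trend the district court cited.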

While there are a number of generative AI cases pending (a nice summary of them is here) that I agree raise some additional legal questions beyond those directly answered in Google Books, the kind of textual analysis that prosecraft.io offered seems remarkably similar to the kinds of things that the courts have already said are permissible fair uses. 

2) Text and data mining analysis has broad benefits

Not only is text mining fair use, it also yields some amazing insights that truly “promote the progress of Science,” which is what copyright law is all about.  Prosecraft offered some pretty basic insights into published books – how long, how many adverbs, and the like. I can understand opinions being split on whether that kind of information is actually helpful for current or aspiring authors. But, text mining can reveal so much more. 

In the submission Authors Alliance made to the US Copyright Office three years ago in support of a Section 1201 Exemption permitting text data mining, we explained:

TDM makes it possible to sift through substantial amounts of information to draw groundbreaking conclusions. This is true across disciplines. In medical science, TDM has been used to perform an overview of a mass of coronavirus literature. Researchers have also begun to explore the technique’s promise for extracting clinically actionable information from biomedical publications and clinical notes. Others have assessed its promise for drawing insights from the masses of medical images and associated reports that hospitals accumulate.

In social science, studies have used TDM to analyze job advertisements to identify direct discrimination during the hiring process. It has also been used to study police officer body-worn camera footage, uncovering that police officers speak less respectfully to Black than to white community members even under similar circumstances.

TDM also shows great promise for drawing insights from literary works and motion pictures. Regarding literature, some 221,597 fiction books were printed in English in 2015 alone, more than a single scholar could read in a lifetime. TDM allows researchers to “‘scale up’ more familiar humanistic approaches and investigate questions of how literary genres evolve, how literary style circulates within and across linguistic contexts, and how patterns of racial discourse in society at large filter down into literary expression.” TDM has been used to “observe trends such as the marked decline in fiction written from a first-person point of view that took place from the mid-late 1700s to the early-mid 1800s, the weakening of gender stereotypes, and the staying power of literary standards over time.” Those who apply TDM to motion pictures view the technique as every bit as promising for their field. Researchers believe the technique will provide insight into the politics of representation in the Network era of American television, into what elements make a movie a Hollywood blockbuster, and into whether it is possible to identify the components that make up a director’s unique visual style [citing numerous letters in support of the TDM exemption from researchers].

3) Text and data mining is not new and it’s not a threat to authors

Text mining of the sort prosecraft seemed to employ isn't some kind of new phenomenon. Marti Hearst, a professor at UC Berkeley's iSchool, explained the basics in this classic 2003 piece. Each year, scores of computer science students build course projects that do almost exactly what prosecraft was producing. Textbooks like Matt Jockers's Text Analysis with R for Students of Literature are widely used across the U.S. to teach these techniques. Our submissions during our petition for the DMCA exemption for text and data mining back in 2020 included 14 separate letters of support from authors and researchers engaged in text data mining research, and even more researchers are currently working on TDM projects. While fears over generative AI may be justified for some creators (and we are certainly not oblivious to the threat of various forms of economic displacement), it's important to remember that text data mining on textual works is not the same as generative AI. On the contrary, it is a fair use that enriches and deepens our understanding of literature rather than harming the authors who create it.

Copyright Office Holds Listening Session on Copyright Issues in AI-Generated Visual Works

Photo by Debby Hudson on Unsplash

Earlier this week, the Copyright Office convened a second listening session on the topic of copyright issues in AI-generated expressive works as part of its initiative to study and understand the issue, following its listening session on copyright issues in AI-generated textual works a few weeks back (in which Authors Alliance participated). Tuesday's sessions covered copyright issues in images created by generative AI programs, a topic that has garnered substantial public attention and controversy in recent months.

Participants in the listening sessions included a variety of professional artist organizations, like the National Press Photographers Association, the Graphic Artists Guild, and Professional Photographers of America; companies that have created the generative AI tools under discussion, like Stability AI, Jasper AI, and Adobe; several individual artists; and a variety of law school professors, attorneys, and think tanks representing varied and diverse views on copyright issues in AI-generated images. 

Generative AI as a Powerful Artistic Tool

Most if not all of the listening sessions’ participants agreed that generative AI programs had the potential to be incredible tools for artists. Like earlier technological developments such as manual cameras and, much more recently, image editing software like Photoshop, generative AI programs can minimize or eliminate some of the “mechanical” aspects of creation, making creation less time-consuming. But participants disagreed on the impact these tools are having on artists and whether the tools themselves or copyright law ought to be reformed to address these effects. 

Visual artists, and those representing them, tended to caution that these tools should be developed in a way that does not hurt the livelihoods of the artists who created the images the programs are trained on. While a more streamlined creative process makes things easier for artists relying on generative AI in their creation, it could also mean fewer opportunities for other artists. When a single designer can easily create background art with Midjourney, for example, they might not need to hire another designer for that task. This helps the first designer to the detriment of the second. Those representing the companies that create and market generative AI programs, including Jasper AI and Stability AI, focused on the ways that their tools are already helping artists: these tools can generate inspiration images as "jumping off points" for visual artists and lower barriers to entry for aspiring visual artists who may not have the technical skills to create visual art without support from these kinds of tools. 

On the other hand, some participants voiced concerns about ethical issues in AI-generated works. A representative from the National Press Photographers Association mentioned concerns that AI-generated images could be used for “bad uses,” and creators of the training data could be associated with these kinds of uses. Deepfakes and “images used to promote social unrest” are some of the uses that photojournalists and other creators are concerned about. 

Copyright Registration in AI-Generated Visual Art

Several participants expressed approval of the Copyright Office’s recent guidance regarding registration in AI-generated works, but others called for greater clarity in the registration guidance. The guidance reiterates that there is no copyright protection in works created by generative AI programs, because of copyright’s human authorship requirement. It instructs creators that they can only obtain copyright registration for the portions of the work they actually created, and must disclose the role of generative AI tools in creating their works if it is more than de minimis. An author can also obtain copyright protection for a selection and arrangement of AI-generated works as a compilation, but not in the AI-generated images themselves. Yet open questions, particularly in the context of AI-generated visual art, remain: how much does an artist need to add to an image to render it their own creation, rather than the product of a generative AI tool? In other words, how much human creativity is needed to transform an AI-generated image into the product of original human creation for the purposes of copyright? How are we to address situations where a human and AI program “collaborate” on the creation of a work? The fact that the Office’s guidance requires applicants to disclose if they used AI programs in the creation of their work also leaves open questions. If an artist uses a generative AI program to create just one element of a larger work, or as a tool for inspiration, must that be disclosed in copyright registration applications? 

The attorney for Kristina Kashtanova, the artist who applied for a copyright registration for her graphic novel, Zarya of the Dawn, also spoke. If you haven't been tracking it, Zarya of the Dawn included many AI-generated images and sparked many of the conversations around copyright in AI-generated visual works (you can read our previous coverage of the Office's decision letter on Zarya of the Dawn here). Kashtanova's attorney raised more questions about the registration guidance. She pointed out that the amount of creativity required to create a copyrighted work is very low—a mere "modicum" of creativity suffices, meaning that vast quantities of works (like each of the photographs we take with our smartphones) are eligible for copyright protection. Why, then, is the bar higher when it comes to AI-generated works? Kashtanova certainly had to be quite creative to put together her graphic novel, and the act of writing a prompt for the image generator, refining that prompt, and re-prompting the tool until the creator gets an image they are satisfied with requires a fair amount of creative human input. More, one might argue, than is required to take a quick digital photograph. The registration guidance attempts to solve the problem of copyright protection in works not created by a human, but in so doing, it creates different copyrightability standards for different types of creative processes. 

These questions will become all the more relevant as artists increasingly rely on AI programs to create their works. The representative from Getty Images stated that more than half of their consumers now use generative AI programs to create images as part of their workflows, and several of the professional artist organizations noted that many of their members were similarly taking up generative AI tools in their creation.

Calls For Greater Transparency

Many participants expressed a desire for the companies designing and making available generative AI programs to be more transparent about the contents of these tools' training data. This appealed both to artists who were concerned that their works were used to train the models and felt this was fundamentally unfair, and to those with ethical concerns around scraping or potential copyright infringement. In response to these critiques, Adobe explained that it sought to develop its new AI image generator, Firefly (currently in beta testing), in a way that addresses these kinds of concerns. Adobe explained that it planned to train its tool on openly licensed images, seeking to "drive transparency standards" and "deploy [the] technology responsibly in a way that respects creators and our communities at large." The representative from Getty Images also called for greater transparency in training data. Getty stated that transparency could help mitigate the legal and economic risks associated with the use of generative AI programs—potential copyright claims as well as the possibility of harming the visual artists who created the underlying works they are trained on. 

Opt-Outs and Licensing 

Related to calls for transparency, much of the discussion centered on attempts to permit artists to opt out of having their works included in the training data used for generative AI programs. Several participants discussed a "do not train" tag as a way for creators to opt out of being included in training data, analogous to robots.txt, the standard that lets websites indicate to web crawlers and other web robots that they don't wish to be visited. Adobe said it intended to train its new generative AI tool, Firefly, on openly licensed images and make it easy for artists to opt out with a "do not train" tag, apparently in response to these types of concerns. Yet some rightsholder groups pointed out that compliance with this tag may be uneven—indeed, robots.txt itself is a voluntary standard, and so-called bad robots like spam bots often ignore it. 
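For context, robots.txt is simply a plain-text file served at a site's root, and a "do not train" directive would presumably work the same way. A sketch of what a standard file looks like (the AI-specific lines are purely hypothetical: "ExampleTrainingBot" is an invented crawler name, not part of any adopted standard):

```
# robots.txt, served at https://example.com/robots.txt
# Compliance is voluntary: well-behaved crawlers honor these rules,
# but nothing technically prevents a "bad robot" from ignoring them.

User-agent: *
Disallow: /private/

# Hypothetical: asking a single AI-training crawler to stay away entirely
User-agent: ExampleTrainingBot
Disallow: /
```

Like robots.txt, a "do not train" tag would express a preference rather than impose a technical barrier, which is exactly the enforcement gap the rightsholder groups flagged.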

Works available under permissive licenses, like Creative Commons' various licenses, have been suggested as good candidates for training data to avoid potential rights issues, though several participants pointed out that there may be compliance issues when it comes to commercial uses of these tools, as well as attribution requirements. The participant representing the American Society for Collective Rights Licensing voiced support for proposals to implement a collective licensing scheme to compensate artists whose works are used to train generative AI programs, echoing earlier suggestions by groups such as the Authors Guild. 

One visual artist argued fervently that an opt-out standard was not enough: in her view, visual artists should have to opt in to having their works included in training data, because an opt-out system harms artists without much of an online presence or the digital literacy to affirmatively opt out. In general, the artist participants voiced strong opposition to having their works included without compensation, a position many creators with concerns about generative AI have taken. But Jasper AI expressed its view that training generative AI programs with visual works found across the Internet was a transformative use of that data, all but implying that this kind of training was fair use (a position Authors Alliance has taken). It was notable that so few participants suggested that the ingestion of visual works of art for the purposes of training generative AI programs was a fair use, particularly compared to the arguments in the listening session on text-based works. This may well be due to ongoing lawsuits, inherent differences between image-based and text-based outputs, or the general tenor of conversations around AI-generated visual art. Many of the participants spoke of anecdotal evidence that graphic artists are already facing job loss and economic hardship as a result of the emergence of AI-generated visual art.

‘Negotiating with the Dead’

Posted January 30, 2023

This is a guest post by Meera Nair, PhD, Copyright Specialist for the Northern Alberta Institute of Technology (NAIT), commenting on the recent extension of copyright term in Canada. It was originally published at https://fairduty.wordpress.com/2023/01/10/negotiating-with-the-dead/.

When it became evident that our copyright term was to be extended by twenty years, with no measures to mitigate the excess damage wrought by such action, Margaret Atwood’s book of this title kept returning to mind. A foray into the relationships that exist between writers and writing, a book where the word copyright did not feature among those ruminations, the title nonetheless feels apt for the days ahead.

Works of long-since-dead authors will now—in the best of situations—literally become objects of negotiation. This is purportedly to the benefit of those authors’ heirs, whereas on balance the true beneficiaries will be international publishing conglomerates and collective societies. In the worst of situations though, works will simply fade away with no surviving copy to emerge seventy years after their authors’ deaths. Those authors will be forgotten, and the public domain will remain poorer.

Atwood has been a prominent advocate for a stronger scope of protection in the name of copyright, famously remembered for her characterization of exceptions as expropriation and theft during a Standing Committee Meeting of the Department of Canadian Heritage in 1996. Two decades later, when she gave the 2016 CLC Kreisel Lecture at the University of Alberta, fair dealing was called out by name. Nonetheless, that lecture was a delight to listen to, grounded as it was on Atwood’s own experiences of being a Canadian writer.

It is her life that lies at the foundation of Negotiating, which took form through the Empson Lectures at the University of Cambridge in 2000. The combination of literature, literary criticism, book history, and history itself, written as only Margaret Atwood can, makes for compelling reading. In this book she comes perhaps closest to answering an age-old question about writing: what does it mean to write? There is no neat and tidy answer; at the very least it is blood, sweat, and tears amid negotiations between oneself, the society of the living, but also that of the dead.

To be sure, financial wherewithal is relevant to any impetus to write. Money appears approximately three times among the 74 reasons for writing taken “from the words of writers themselves (xx-xxii).” Yet, perhaps unintentionally, Atwood lays bare why copyright was not, nor ever will be, a broad determinant of success (either literary or material) for Canadian writers and publishers. From identifying the limitations of the Canadian publishing sector in the early to mid-twentieth century (to say there was disinterest in Canadian authors is putting it mildly), to stripping away the facades of originality and individuality (which underpin copyright’s structure of rights) in literary endeavor, there is much here to remind us that Canada’s phenomenal success in developing literary talent (see here and here) has occurred despite copyright, not because of it.

After borrowing the book repeatedly from the Edmonton Public Library, I had to buy it. Or rather, I had to buy it in the original form. Because what I had borrowed was a book titled On Writers and Writing, by Margaret Atwood, identified as a Canadian reprint of her earlier work, Negotiating with the Dead.

My preference was to buy Negotiating; in the peculiarities of my own mind, somehow it felt more authentic. As it turned out though, my instincts were correct. The two books are not the same. The difference lies, not in Atwood’s words, but in the representation of what copyright is. While both books specify the copyright as belonging to O.W. Toad (the name of Atwood’s enterprise), similarity ends there.

In Negotiating, published by The Press Syndicate of The University of Cambridge, readers are told: “This book is in copyright. Subject to statutory exceptions and to the provisions of relevant collective licensing agreements, no reproduction of any part may take place without the written permission of Cambridge University Press (emphasis mine).”

There it is. A clear indication that statutory exceptions exist and are relevant; meaning that some reproduction might not require permission. Whereas in Writers, published by Emblem (an imprint of McClelland & Stewart, a division of Random House of Canada Limited, a Penguin Random House Company), readers are told that permission is always needed for even a particle copied:

“All rights reserved. The use of any part of this publication reproduced, transmitted in any form or by any means, electronic, mechanical, photocopying, recording, or otherwise, or stored in a retrieval system, without the prior written consent of the publisher – or, in the case of photocopying or other reprographic copying, a license from the Canadian Copyright Licensing Agency – is an infringement of the copyright law (emphasis mine).”

Despite what a publisher might prefer, Canada’s Copyright Act permits unauthorized uses of insubstantial parts of a work and unauthorized uses of substantial parts which comport with fair dealing or other exceptions. As the Supreme Court (with unanimity) stated in 2004, “the fair dealing exception is perhaps more properly understood as an integral part of the Copyright Act than simply a defence. Any act falling within the fair dealing exception will not be an infringement of copyright (para 48).” And yet, willful misinformation is standard fare among books issued in Canada.

Given the stunting of our public domain by term extension, fair dealing is even more important now as it provides some allowance of use of older, protected, material. But even a large and liberal interpretation of fair dealing, as required by our Supreme Court, is no substitute for a vibrant public domain.

With the Act expected to undergo change this year, Canada could still introduce a system of registration associated with the longer term of copyright. Owners of works that continue to be commercially successful fifty years after an author's death will likely choose to register and thus receive the additional twenty years of protection, whereas works without such commercial longevity, and works that were never intended for revenue generation, would likely not be registered and would enter the public domain without the twenty-year delay. Such a system was recommended by a former Industry Committee to uphold our obligations under CUSMA, ensure that commercial works which may benefit from a longer term are able to capture that gain, and continue to grow the public domain.

The difficulty is to convey to current Canadian lawmakers the importance of the public domain. Too often, its intangibility has meant that the public domain is perceived as being of lesser value, as if an author's work being unprotected somehow marks both the work and its author as unworthy. Even the way older works are spoken of, that they have "fallen into the public domain," carries an aura of degradation familiar from the plight of "fallen women." Whereas the public domain is precisely the opposite; it enables new works to emerge. As Jessica Litman wrote in The Public Domain (1990):

To say that every new work is in some sense based on the works that preceded it is such a truism that it has long been a cliche, invoked but not examined. …  The public domain should be understood not as the realm of material undeserving of protection, but as a device that permits the rest of the system to work by leaving the raw material of authorship available for authors to use (966-968).

That this truism went unexamined and unarticulated is a testament to the difficulty of capturing the intricacy of the relationships between old works and new authors. Margaret Atwood not only undertook such an exploration but also elegantly articulated the journey that underlies every literary endeavor.

It is only fitting then that Margaret Atwood should have the last words:

… All writers must go from now to once upon a time; all must go from here to there; all must descend to where the stories are kept; all must take care not to be captured and held immobile by the past. And all must commit acts of larceny, or else of reclamation, depending how you look at it. The dead may guard the treasure, but it’s useless treasure unless it can be brought back into the land of the living and allowed to enter time once more – which means to enter the realm of audience, the realm of readers, the realm of change (p.178).

Authors Alliance Annual Report: 2022 In Review

Authors Alliance is pleased to share this year’s annual report, where you can find highlights of our work in 2022 to promote laws, policies, and practices that enable authors to reach wide audiences. In the report, you can read about how we’re helping authors meet their dissemination goals for their works, representing their interests in the courts, and otherwise working to advocate for authors who write to be read. 

Click here to view the report in your browser.