Author Archives: Dave Hansen

Some Initial Thoughts on the US Copyright Office Report on AI and Digital Replicas

Posted August 1, 2024

On July 31, 2024, the U.S. Copyright Office published Part 1 of its report summarizing the Office’s ongoing initiative on artificial intelligence. This first part of the report addresses digital replicas, in other words, how AI is used to realistically but falsely portray people in digital media. In its report, the Office recommends new federal legislation that would create a new right to control “digital replicas,” which it defines as “a video, image, or audio recording that has been digitally created or manipulated to realistically but falsely depict an individual.”

We remain somewhat skeptical that such a right would do much to address the most troubling abuses such as deepfakes, revenge porn, and financial fraud. But, as the report points out, a growing number of varied state legislative efforts are already in the works, making a stronger case for unifying such rules at the federal level, with an opportunity to ensure adequate protections are in place for creators.  

The backdrop for the inquiry and report is a fast-developing space of state-led legislation, including legislation on deepfakes. Earlier this year, Tennessee became the first state to enact such a law, the ELVIS Act (TN HB 2091), while other states have mostly focused on addressing deepfakes in the context of sexual acts and political campaigns. New state laws continue to be introduced, making the space harder and harder to navigate for creators, AI companies, and consumers alike. A federal right of publicity in the context of AI has already been discussed in Congress, and just yesterday a new bill, the “NO FAKES Act,” was formally introduced.

Authors Alliance has watched the development of this US Copyright Office initiative closely. In August 2023, the Office issued a notice of inquiry, asking stakeholders to weigh in on a series of questions about copyright policy and generative AI. Our comment in response was devoted in large part to sharing the ways that authors are using generative AI, explaining how fair use should apply to AI training, and urging the USCO to be cautious in recommending new legislation to Congress.

This report and recommendation from the Copyright Office could have a meaningful impact on authors and other creators, including both those whose personality and images are subject to use with AI systems and those who are actively using AI in their writing and research. Below are our preliminary thoughts on what the Copyright Office recommends, which it summarizes in the report as follows:

“We recommend that Congress establish a federal right that protects all individuals during their lifetimes from the knowing distribution of unauthorized digital replicas. The right should be licensable, subject to guardrails, but not assignable, with effective remedies including monetary damages and injunctive relief. Traditional rules of secondary liability should apply, but with an appropriately conditioned safe harbor for OSPs. The law should contain explicit First Amendment accommodations. Finally, in recognition of well-developed state rights of publicity, we recommend against full preemption of state laws.”

Initial Impressions

Overall, this seems like a well-researched and thoughtful report, given that the Office had to navigate a huge number of comments and opinions (over 10,000 comments were submitted). The report also incorporates many more recent developments, including numerous new state laws and federal legislative proposals.

Things we like: 

  • In the context of an increasing number of state legislative efforts—some overbroad and more likely to harm creators than to help them—we appreciate the Office’s recognition that a patchwork of laws can pose a real problem for users and creators who are trying to understand their legal obligations when using AI that references and implicates real people.
  • The report also recognizes that the collection of concerns motivating digital replica laws—things like control of personality, privacy, fraud, and deception—are not at their core copyright concerns. “Copyright and digital replica rights serve different policy goals; they should not be conflated.” This matters a lot for what the scope of protection and other details for a digital replica right looks like. Copy-pasting copyright’s life+70 term of protection, for example, makes little sense (and the Office recognizes this, for example, by rejecting the idea of posthumous digital replica rights). 
  • The Office also suggests limiting the transferability of rights. We think this is a good idea to protect individuals from unanticipated downstream uses by companies that might persuade them to sign away rights in unfavorable long-term deals. “Unlike publicity rights, privacy rights, almost without exception, are waivable or licensable, but cannot be assigned outright. Accordingly, we recommend a ban on outright assignments, and the inclusion of appropriate guardrails for licensing, such as limitations in duration and protection for minors.”
  • The Office explicitly rejects the idea of a new digital replica right covering “artistic style.” We agree that protection of artistic style is a bad idea. Creators of all types have always used existing styles and methods as a baseline to build upon, and doing so has resulted in a rich body of new works. Allowing for control over “style,” however well-defined, would impinge on these new creations. Strong federal protection over “style” would also contradict traditional limitations on rights, such as Section 102(b)’s limits on copyrightable subject matter and the idea/expression dichotomy, which are rooted in the Constitution.

Some concerns: 

  • The Office’s proposal would apply to the distribution of digital replicas, which are defined as “a video, image, or audio recording that has been digitally created or manipulated to realistically but falsely depict an individual.” This definition is quite broad and could potentially include a large number of relatively common and mostly innocuous uses—e.g., taking a photo with your phone of a person and applying a standard filter on your camera app could conceivably fall within the definition. 
  • First Amendment rights to free expression are critical for protecting uses for news reporting, artistic uses, parody, and so on. Expressive uses of digital replicas—e.g., a documentary that uses AI to replicate a recent event involving recognizable people, or reproduction in a comedy show to poke fun at politicians—could be significantly hindered by an expansive digital replica right unless it has robust free expression protections. Of course, the First Amendment applies regardless of the passing of a new law, but it will be important for any proposed legislation to find ways to allow people to exercise those rights effectively. As the report explains, comments were split. Some, like the Motion Picture Association, proposed enumerated exceptions for expressive use, while others, such as the Recording Industry Association of America, took the position that “categorical exclusions for certain speech-oriented uses are not constitutionally required and, in fact, risk overprotection of speech interests at the expense of important publicity interests.”

We tend to think that most laws should skew toward “overprotection of speech interests,” but the devil is in the details on how to do so. The report leaves much to be desired on how to do this effectively in the context of digital replicas. For its part, “[t]he Office stresses the importance of explicitly addressing First Amendment concerns. While acknowledging the benefits of predictability, we believe that in light of the unique and evolving nature of the threat to an individual’s identity and reputation, a balancing framework is preferable.” One thing to watch in future proposals is what such a balancing framework actually includes, and how easy or difficult it is to assert protection of First Amendment rights under this balancing framework. 

  • The Office rejects the idea that Section 230 should provide protection for online service providers if they host content that runs afoul of the proposed new digital replica rights. Instead, the Office suggests something like a modified version of the Copyright Act’s DMCA Section 512 notice-and-takedown process. This isn’t entirely outlandish—the DMCA process mostly works, and if this new proposed digital replica right is to be effective in practice, asking large service providers that benefit from hosting content to be responsive in cases of allegedly infringing content may make sense. But the Office says that it doesn’t believe the existing DMCA process should be the model, and points to its own Section 512 report for how a revised version for digital replicas might work. If the Office’s 512 study is a guide to what a notice-and-takedown system could look like for digital replicas, there is reason to be concerned. While the study rejected some of the worst ideas for changing the existing system (e.g., a notice-and-staydown regime), it also repeatedly diminished the importance of ideas that would help protect creators with real First Amendment and fair use interests.
  • The motivations for the proposed digital replica right are quite varied. For some commenters, it’s an objection to the commercial exploitation of public figures’ images or voices. For others, the need is to protect against invasions of privacy. For yet others, it is to prevent consumer confusion and fraud. The Office acknowledges these different motivating factors in its report and attempts in its recommendations to balance the competing interests among them. But there are still real areas of discontinuity—e.g., the basic structure of the right the Office proposes is intellectual-property-like, yet it doesn’t really make a lot of sense to try to address some of the most pernicious fraudulent uses, such as deepfakes to manipulate public opinion, revenge porn, or scam phone calls, with a privately enforced property right oriented toward commercialization. Discovering and stopping those uses requires a very different approach, and one that this particular proposal seems ill-equipped to deal with.

Barely a few months ago, we were extremely skeptical that new federal legislation on digital replicas was a good idea. We’re still not entirely convinced, but the rash of new and proposed state laws does give us some pause. While the federal legislative process is fraught, it is also far from ideal for authors and creators to operate under a patchwork of varying state laws, especially those that provide little protection for expressive uses. Overall, we hope certain aspects of this report can positively influence the debate about existing federal proposals in Congress, but remain concerned about the lack of detail about protections for First Amendment rights. 

In the meantime, you can check out our two new resource pages on Generative AI and Personality Rights to get a better understanding of the issues.

What happens when your publisher licenses your work for AI training? 

Posted July 30, 2024
Photo by Andrew Neel on Unsplash

Over the last year, we’ve seen a number of major deals inked between companies like OpenAI and news publishers. In July 2023, OpenAI entered into a two-year deal with The Associated Press for ChatGPT to ingest the publisher’s news stories. In December 2023, OpenAI announced its first non-US partnership to train ChatGPT on German publisher Axel Springer’s content, including Business Insider. This was then followed by similar deals in March 2024 with Le Monde and Prisa Media, news publishers from France and Spain. These partnerships are likely sought in an effort to avoid litigation like the case OpenAI and Microsoft are currently defending from the New York Times.

As it turns out, such deals are not limited to OpenAI or newsrooms. Book publishers have also gotten into the mix. Numerous reports recently pointed out that, based on Taylor & Francis’s parent company’s market update, the British academic publishing giant has agreed to a $10 million AI training deal with Microsoft. Earlier this year, another major academic publisher, John Wiley and Sons, recorded $23 million in one-time revenue from a similar deal with an undisclosed tech company. Meta even considered buying Simon & Schuster or paying $10 per book to acquire its rights portfolio for AI training.

With few exceptions (a notable one being Cambridge University Press), publishers have not bothered to ask their authors whether they approve of these agreements. 

Does AI training require licensing to begin with? 

First, it’s worth appreciating that these deals are made against a backdrop of legal uncertainty. There are more than two dozen AI copyright lawsuits in the United States alone, most of them turning on one key question: whether AI developers must obtain permission to scrape content to train AI models, or whether fair use already allows this kind of training use even without permission.

The arguments for and against fair use for AI training data are well explained elsewhere. We think there are strong arguments, based on cases like Authors Guild v. Google, Authors Guild v. HathiTrust, and AV ex rel. Vanderhye v. iParadigms, that the courts will conclude that copying to train AI models is fair use. We also think there are really good policy reasons to think this could be a good outcome if we want to encourage future AI development that isn’t dominated by only the biggest tech giants and that results in systems that produce less biased outputs. But we won’t know for sure whether fair use covers any and all AI training until some of these cases are resolved.

Even if you are firmly convinced that fair use protects this kind of use (and AI model developers have strong incentives to hold this belief), there are lots of other reasons why AI developers might seek licenses in order to navigate around the issue. This includes very practical reasons, like securing access to content in formats that make training easier, or content accompanied by structured, enhanced metadata. Given the pending litigation, licenses are also a good way to avoid costly litigation (copyright lawsuits are expensive, even if you win). 

Although one can hardly blame these tech companies for making a smart business decision to avoid potential litigation, this could have a broader systemic impact on other players in the field, including the academic researchers who would like to rely on fair use to train AI. As IP scholar James Gibson explains, when risk-averse users create new licensing markets in gray areas of copyright law, copyright holders’ exclusive rights expand, and the public interest diminishes. The less we rely on fair use, the weaker it becomes.

Finally, it’s worth noting that fair use is only available in the US and a few other jurisdictions. In other jurisdictions, such as within the EU, using copyrighted materials for AI training (especially for commercial purposes) may require a license. 

To sum up: even though it may not be legally necessary to acquire copyright licenses for AI training, it seems that licensing deals between publishers and AI companies are highly likely to continue. 

So, can publishers just do this without asking authors? 

In a lot of cases, yes, publishers can license AI training rights without asking authors first. Many publishing contracts include a full and broad grant of rights–sometimes even a full transfer of copyright to the publisher for them to exploit those rights and to license the rights to third parties. For example, this typical academic publishing agreement provides that “The undersigned authors transfer all copyright ownership in and relating to the Work, in all forms and media, to the Proprietor in the event that the Work is published.” In such cases, when the publisher becomes the de facto copyright holder of a work, it’s difficult for authors to stake a copyright claim when their works are being sold to train AI.

Not all publishing contracts are so broad, however. For example, in the Model Publishing Contract for Digital Scholarship (which we have endorsed), the publisher’s sublicensing rights are limited and specifically defined, and profits resulting from any exploitation of a work must be shared with authors.  

There are lots of variations, and specific terms matter. Some publisher agreements are far more limited–transferring only limited publishing and subsidiary rights. These limitations in the past have prompted litigation over whether the publisher or the author gets to control rights for new technological uses. Results have been highly dependent on the specific contract language used. 

There are also instances where publishers aren’t even sure of what they own. For example, in the drawn-out copyright lawsuit brought by Cambridge University Press, Oxford University Press, and Sage against Georgia State University, the court dropped 16 of the 74 claimed instances of infringement because the publishers couldn’t produce documentation that they actually owned rights in the works they were suing over. This same lack of clarity contributed to the litigation and proposed settlement in the Google Books case, which is probably our closest analogy in terms of mass digitization and reuse of books (for a good discussion of these issues, see page 479 of this law review article by Pamela Samuelson about the Google Books settlement).

This is further complicated by the fact that authors are sometimes entitled to reclaim their rights, such as through rights reversion clauses and copyright termination. Just because a publisher can produce documentation of a copyright assignment does not necessarily mean that the publisher is still the current copyright holder of a work.

We think it is certainly reasonable to be skeptical about the validity of blanket licensing schemes between large corporate rights holders and AI companies, at least when they are done at very large scale. Even though in some instances publishers do hold rights to license AI training, it is dubious whether they actually hold, and sufficiently document, all of the purported rights of all works being licensed for AI training.

Can authors at least insist on a cut of the profit? 

It can feel pretty bad to discover that massively profitable publishers are raking in yet more money by selling licensing rights to your work, while you’re cut out of the picture. If they’re making money, why not the author? 

It’s worth pointing out that, at least for academic authors, this isn’t exactly a novel situation–most academic authors make very little in royalties on their books, and nothing at all on their articles, while commercial publishers like Elsevier, Wiley, and SpringerNature sell subscription access at a healthy profit. Unless you have retained sublicensing rights or your publishing contract has a profit-sharing clause, you are, unfortunately, unlikely to profit from the budding licensing market for AI training.

So what are authors to do? 

We could probably start most posts like this with a big red banner that says “READ YOUR PUBLISHING CONTRACT!! (and negotiate it too).” Be on the lookout for what you are authorizing your publisher to do with your rights, and for any language about reuse or the sublicensing of subsidiary rights.

You might also want to look for terms in your contract that speak to royalties and shares of licensing revenue. Some contracts have language that will allow you to demand an accounting of royalties; this may be an effective means of learning more about licensing deals associated with your work. 

You can also take a closer look at clauses that allow you to revert rights–many contracts will include a clause under which authors can regain rights when their book falls below a certain sales threshold or otherwise becomes “out of print.” Even without such clauses, it is reasonable for authors to negotiate a reversion of rights when their books are no longer generating revenue. Our resources on reversion will give you a more in-depth look at this issue.

Finally, you can voice your support for fair use in the context of licensing copyrighted works for AI training. We think fair use is especially important to preserve for non-commercial uses. For example, academic uses could be substantially stifled if paid-for licensing for permission to engage in AI research or related uses becomes the norm. And in those cases, the windfall publishers hope to pocket isn’t coming from some tech giant, but ultimately is at the expense of researchers, their libraries and universities, and the public funding that goes to support them.

Introducing Yuanxiao Xu, Authors Alliance’s New Staff Attorney

Posted July 23, 2024
Yuanxiao Xu, Authors Alliance Staff Attorney

By Dave Hansen

Today I’m pleased to introduce to the Authors Alliance community Yuanxiao Xu, who will be taking on the role of Authors Alliance’s Staff Attorney.  

Over the past few years, Authors Alliance has been more active than ever before advocating for the interests of authors before courts and administrative agencies. Our involvement has ranged from advocacy in high-profile cases such as Warhol Foundation v. Goldsmith and Hachette Books v. Internet Archive to less visible but important regulatory filings. For example, last December we filed a comment explaining the importance of federal agencies having the legal tools to promote open access to scholarly research outputs funded through grants.  On top of those advocacy efforts, we remain committed to helping authors navigate the law through our legal guides and other educational resources. The most recent significant addition is our practical guide titled Writing about Real People.

Our advocacy and educational work requires substantial legal, copyright, and publishing expertise. We’re therefore very fortunate to welcome Yuanxiao to the team to support our efforts. Yuanxiao joins us from previous legal roles with Creative Commons, the Dramatists Guild of America, and the University of Michigan Libraries. She has substantial experience advising academic authors and other creators on issues related to plagiarism, copyright infringement, fair use, licensing, and music copyright. She received her JD from the University of Michigan and is licensed to practice law in the State of New York.

As we grapple with difficult issues such as AI and authorship, ongoing publishing industry consolidation, and attacks on the integrity of institutions like libraries, I’m very excited to work with Yuanxiao to further develop and implement our legal strategy in a way that supports authors who care deeply about the public interest. 

“I’m thrilled to join Authors Alliance to collaborate with our community and sister organizations and together advocate for a better copyright ecosystem for authors and creatives. I hope to strive for a future where the interests of creators and the public do not take a back seat to the profit-maximizing agenda of big entertainment companies and tech giants,” says Yuanxiao.

We’re always pleased to hear from our members about ways that we might be able to help support their efforts to reach readers and have their voices heard. If you’d like to get in touch with Yuanxiao directly, you can reach her at xu@authorsalliance.org.

Hachette v. Internet Archive Update: Oral Argument Before the Second Circuit Court of Appeals

This is a short update on the Hachette v. Internet Archive controlled digital lending lawsuit, which is currently pending on appeal before the Second Circuit Court of Appeals. The court held oral argument in the case today. [July 2 update: a recording of the hearing is available here.]

We’ve covered the background of this suit numerous times – it is in essence about whether it is permissible for libraries to digitize and lend books in their collections in a lend-like-print manner (e.g., only providing access to one user at a time based on the number of copies the library owns in print). 

At this point, both parties have fully briefed the court on their legal arguments, bolstered on both sides by numerous amicus briefs explaining the broader implications of the case for authors, publishers, libraries, and readers (you can find the full docket, including these briefs online here). 

Our amicus brief, filed in support of the Internet Archive and controlled digital lending (and which received a nice shout-out from Internet Archive’s counsel in oral argument today), argues that many authors benefit from CDL because it enhances access to their work, aids in preservation, and supports their efforts to research and build upon existing works to create new ones.

What happened at oral argument

Compared to the District Court proceedings, this oral argument went much better for Internet Archive. Whether Internet Archive will prevail is another question, but it did seem to me the panel was genuinely trying to understand the basic rationale for CDL, whether there is a credible argument for distinguishing between CDL copies and licensed ebooks, and what kind of burden the plaintiff or defendant should bear in proving or disproving market harm. Overall, I felt the panel gave both sides a fair hearing and is interested in the broader implications of this case. 

A few highlights: 

  • It almost seemed that the panel assumed the district court got it wrong when it concluded that Internet Archive’s use was commercial in nature, rather than nonprofit (an important distinction in fair use cases). The district court adopted a novel approach, finding that IA’s connection with Better World Books and its solicitation of donations on webpages that employ CDL pushed it into the “commercial” category. The panel on appeal seemed skeptical, for example, commenting on how meager the $5,000 was that Internet Archive actually made on the arrangement. Looking beyond controlled digital lending, this is an important issue for all nonprofit users, and I’m hopeful that the Second Circuit sees the importance of correcting the lower court on this point.
  • At least some members of the panel seemed to appreciate the incongruity of a first sale doctrine that applies only to physical books but somehow not to digital lending. One particularly good question on this, directed to the publishers’ counsel, was about whether in the absence of section 109, library physical lending would be permissible as a fair use or otherwise. This was helpful, I think, because it stripped away the focus on the text of 109 and refocused the discussion on the underlying principles of exhaustion–i.e., what rights do libraries and other owners of copies get when they buy copies. 

There were also a few concerning exchanges: 

  • At one point, there was a line of questioning about whether fair use could override or provide for a broader scope of uses than what Congress had provided to libraries in Section 108 (the part of the Copyright Act that has very specific exceptions for things like libraries making preservation copies). Even the publishers’ lawyer wasn’t willing to argue that libraries’ rights are fully covered by Section 108 and that fair use doesn’t apply–likely because that issue was addressed directly in Authors Guild v. HathiTrust, and she knew it–but it was a concerning exchange nonetheless.

I also came away with several questions:

  • Each member of the panel asked probing questions to both sides about the importance of market harm and, more specifically, what kind of proof is required to demonstrate market harm to the publishers. It was hard to tell which direction any were leaning on this–while there was some acknowledgment that there wasn’t really any hard evidence about the market effect, members of the panel also made several remarks about the logic of CDL copies replacing ebook sales as being common sense. 
  • The panel asked a number of questions about the role of fair use in responding to new technology. Should fair use be employed to help smooth over bumps caused by new technology, or should courts be more conservative in its application in cases where Congress has chosen not to act? Despite several questions about this issue, I came away with no clear read on what the panel thought might be the correct framework in a case like this.

It’s folly to predict, but I came away optimistic that the panel will correct many of the errors from the District Court below. 

Introducing the Authors Alliance’s First Zine: Can Authors Address AI Bias?

Posted May 31, 2024

This guest post was jointly authored by Mariah Johnson and Marcus Liou, student attorneys in Georgetown’s Intellectual Property and Information Policy (iPIP) Clinic.

Generative AI (GenAI) systems perpetuate biases, and authors can have a potent role in mitigating such biases.

But GenAI is generating controversy among authors. Can authors do anything to ensure that these systems promote progress rather than prevent it? Authors Alliance believes the answer is yes, and we worked with them to launch a new zine, Putting the AI in Fair Use: Authors’ Abilities to Promote Progress, that demonstrates how authors can share their works broadly to shape better AI systems. Drawing together Authors Alliance’s past blog posts and advocacy discussing GenAI, copyright law, and authors, this zine emphasizes how authors can help prevent AI bias and protect “the widest possible access to information of all kinds.” 

As former Copyright Register Barbara Ringer articulated, protecting that access requires striking a balance with “induc[ing] authors and artists to create and disseminate original works, and to reward them for their contributions to society.” The fair use doctrine is often invoked to do that work. Fair use is a multi-factor standard that allows limited use of copyrighted material–even without authors’ credit, consent, or compensation–and asks courts to examine:

(1) the purpose and character of the use, 

(2) the nature of the copyrighted work, 

(3) the amount and substantiality of the portion used, and 

(4) the effect of the use on the potential market for or value of the work. 

While courts have not decided whether using copyrighted works as training data for GenAI is fair use, past fair use decisions involving algorithms, such as Perfect 10, iParadigms, Google Books, and HathiTrust, favored the consentless use of other people’s copyrighted works to create novel computational systems. In those cases, judges repeatedly found that algorithmic technologies aligned with the Constitutional justification for copyright law: promoting progress.

But some GenAI outputs prevent progress by projecting biases. GenAI outputs are biased in part because they use biased, low-friction data (BLFD) as training data, like content scraped from the public internet. Examples of BLFD include Creative Commons (CC) licensed works, like Wikipedia, and works in the public domain. While Wikipedia is used as training data in most AI systems, its articles are overwhelmingly written by men–and that bias is reflected in shorter and fewer articles about women. And because the public domain cuts off in the mid-1920s, those works often reflect the harmful gender and racial biases of that time. However, if authors allow their copyrighted works to be used as GenAI training data, those authors can help mitigate some of the biases embedded in BLFD.

Current biases in GenAI are disturbing. As we discuss in our zine, word2vec is a very popular toolkit used to help machine learning (ML) models recognize relationships between words–like associating women with homemakers and Black men with the word “assaulted.” Similarly, OpenAI’s GenAI chatbot ChatGPT, when asked to generate letters of recommendation, used “expert,” “reputable,” and “authentic” to describe men and “beauty,” “stunning,” and “emotional” for women, discounting women’s competency and reinforcing harmful stereotypes about working women. An intersectional perspective can help authors see the compounding impact of these harms. What began as a legal framework to describe why discrimination law did not adequately address harms facing Black women is now used as a wider lens to consider how marginalization affects all people with multiple identities. Coined by Professor Kimberlé Crenshaw in the late 1980s, intersectionality uses critical theory like Critical Race Theory, feminism, and working-class studies together as “a lens . . . for seeing the way in which various forms of inequality often operate together and exacerbate each other.” Contemporary authors’ copyrighted works often reflect the richness of intersectional perspectives, and using those works as training data can help mitigate GenAI bias against marginalized people by introducing diverse narratives and inclusive language. Not always–even recent works reflect bias–but more often than might be possible currently.

Which brings us back to fair use. Some corporations may rely on the doctrine to include more works by or about marginalized people in an attempt to mitigate GenAI bias. Professors Mark Lemley and Bryan Casey have suggested “[t]he solution [to facial recognition bias] is to build bigger databases overall or to ‘oversample’ members of smaller groups” because “simply restricting access to more data is not a viable solution.” Similarly, Professor Matthew Sag notes that “[r]estricting the training data for LLMs to public domain and open license material would tend to encode the perspectives, interests, and biases of a distinctly unrepresentative set of authors.” However, many marginalized people may wish to be excluded from these databases rather than have their works or stories become grist for the mill. As Dr. Anna Lauren Hoffman warns, “[I]nclusion reinforces the structural sources of violence it supposedly addresses.”

Legally, if not ethically, fair use may moot the point. The doctrine is flexible, fact-dependent, and fraught. It’s also fairly predictable, which is why legal precedent and empirical work have led many legal scholars to believe that using copyrighted works as training data to debias AI will be fair use–even if that has some public harms. Back in 2017, Professor Ben Sobel concluded that “[i]f engineers made unauthorized use of copyrighted data for the sole purpose of debiasing an expressive program, . . . fair use would excuse it.” Professor Amanda Levendowski has explained why and how “[f]air use can, quite literally, promote creation of fairer AI systems.” More recently, Dr. Mehtab Khan and Dr. Alex Hanna observed that “[a]ccessing copyright work may also be necessary for the purpose of auditing, testing, and mitigating bias in datasets . . . [and] it may be useful to rely on the flexibility of fair use, and support access for researchers and auditors.” 

No matter how you feel about it, fair use is not the end of the story. It is ill-equipped to solve the troubling growth of AI-powered deepfakes. After being targeted by sexualized deepfakes, Rep. Ocasio-Cortez described “[d]eepfakes [as] absolutely a way of digitizing violent humiliation against other people.” Fair use will not solve the intersectional harms of AI-powered face surveillance either. Dr. Joy Buolamwini and Dr. Timnit Gebru evaluated leading gender classifiers used to train face surveillance technologies and discovered that the classifiers identified males more accurately than females and lighter-skinned people more accurately than darker-skinned people. The researchers also discovered that the “classifiers performed worst on darker female subjects.” While legal scholars like Professors Shyamkrishna Balganesh, Margaret Chon, and Cathay Smith argue that copyright law can protect privacy interests, like the ones threatened by deepfakes or face surveillance, federal privacy laws are a more permanent, comprehensive way to address these problems.

But who has time to wait on courts and Congress? Right now, authors can take proactive steps to ensure that their works promote progress rather than prevent it. Check out the Authors Alliance’s guides to Contract Negotiations, Open Access, Rights Reversion, and Termination of Transfer to learn how–or explore our new zine, Putting the AI in Fair Use: Authors’ Abilities to Promote Progress.

You can find a PDF of the Zine here, as well as printer-ready copies here and here.

Book Talk: Attack from Within by Barbara McQuade

This event is canceled due to a scheduling issue. We will repost when it is rescheduled.

Join us for a VIRTUAL book talk with legal scholar BARBARA McQUADE on her New York Times bestseller, ATTACK FROM WITHIN, about disinformation’s impact on democracy. NYU professor and author CHARLTON McILWAIN will facilitate our discussion.

REGISTER NOW

“A comprehensive guide to the dynamics of disinformation and a necessary call to the ethical commitment to truth that all democracies require.”

Timothy Snyder, author of the New York Times bestseller On Tyranny

American society is more polarized than ever before. We are strategically being pushed apart by disinformation—the deliberate spreading of lies disguised as truth—and it comes at us from all sides: opportunists on the far right, Russian trolls, misinformed social media influencers, among others. It’s endangering our democracy and causing havoc in our electoral system, schools, hospitals, workplaces, and in our Capitol. Advances in technology, including rapid developments in artificial intelligence, threaten to make the problems even worse by amplifying false claims and manufacturing credibility.

In Attack from Within, legal scholar and analyst Barbara McQuade shows us how to identify the ways disinformation is seeping into all facets of our society and how we can fight against it. The book includes:

  • The authoritarian playbook: a brief history of disinformation, from Mussolini and Hitler to Bolsonaro and Trump, chronicling the ways in which authoritarians have used disinformation to seize and retain power.
  • Disinformation tactics—like demonizing the other, seducing with nostalgia, silencing critics, muzzling the media, condemning the courts, and stoking violence—and why they work.
  • An explanation of why America is particularly vulnerable to disinformation and how it exploits our First Amendment Freedoms, sparks threats and violence, and destabilizes social structures.
  • Real, accessible solutions for countering disinformation and maintaining the rule of law such as making domestic terrorism a federal crime, increasing media literacy in schools, criminalizing doxxing, and much more.

Disinformation is designed to evoke a strong emotional response to push us toward more extreme views, unable to find common ground with others. The false claims that led to the breathtaking attack on our Capitol in 2021 may have been only a dress rehearsal. Attack from Within shows us how to prevent it from happening again, thus preserving our country’s hard-won democracy.

ABOUT OUR SPEAKERS

BARBARA McQUADE is a professor at the University of Michigan Law School, where she teaches criminal law and national security law. She is also a legal analyst for NBC News and MSNBC. From 2010 to 2017, McQuade served as the U.S. Attorney for the Eastern District of Michigan. Appointed by President Barack Obama, she was the first woman to serve in that position. McQuade also served as vice chair of the Attorney General’s Advisory Committee and co-chaired its Terrorism and National Security Subcommittee.

Before her appointment as U.S. Attorney, McQuade served as an Assistant U.S. Attorney in Detroit for 12 years, including service as Deputy Chief of the National Security Unit. In that role, she prosecuted cases involving terrorism financing, foreign agents, threats, and export violations. McQuade serves on a number of non-profit boards, and served on the Biden-Harris Transition Team in 2020-2021. She has been recognized by The Detroit News with the Michiganian of the Year Award, the Detroit Free Press with the Neal Shine Award for Exemplary Regional Leadership, Crain’s Detroit Business as a Newsmaker of the Year and one of Detroit’s Most Influential Women, and the Detroit Branch NAACP and Arab American Civil Rights League with their Tribute to Justice Award. McQuade is a graduate of the University of Michigan and its law school. She and her husband live in Ann Arbor, Michigan, and have four children.

CHARLTON McILWAIN
Author of the recent book, Black Software: The Internet & Racial Justice, From the Afronet to Black Lives Matter, Dr. Charlton McIlwain is Vice Provost for Faculty Development, Pathways & Public Interest Technology at New York University, where he is also Professor of Media, Culture, and Communication at NYU Steinhardt. He works at the intersections of computing technology, race, inequality, and racial justice activism. He has served as an expert witness in landmark U.S. Federal Court cases on reverse redlining/racial targeting in mortgage lending and recently testified before the U.S. House Committee on Financial Services about the impacts of automation and artificial intelligence on the financial services sector. He is the author of the PolicyLink report Algorithmic Discrimination: A Framework and Approach to Auditing & Measuring the Impact of Race-Targeted Digital Advertising. He writes regularly for The Guardian, Slate’s Future Tense, MIT Technology Review, and other outlets about the intersection of race and technology. McIlwain is the founder of the Center for Critical Race & Digital Studies, and is Board President at Data & Society Research Institute. He leads NYU’s Alliance for Public Interest Technology, is NYU’s Designee to the Public Interest Technology University Network, and serves on the executive committee as co-chair of the ethics panel for the International Panel on the Information Environment.

Book Talk: Attack from Within by Barbara McQuade
Thursday, June 6 @ 10am PT / 1pm ET
Register now for the virtual event!

Books are Big AI’s Achilles Heel

Posted May 13, 2024

By Dave Hansen and Dan Cohen

Image of the Rijksmuseum by Michael D Beckwith. Image dedicated to the Public Domain.

Rapidly advancing artificial intelligence is remaking how we work and live, a revolution that will affect us all. While AI’s impact continues to expand, the operation and benefits of the technology are increasingly concentrated in a small number of gigantic corporations, including OpenAI, Google, Meta, Amazon, and Microsoft.

Challenging this emerging AI oligopoly seems daunting. The latest AI models now cost billions of dollars, beyond the budgets of startups and even elite research universities, which have often generated the new ideas and innovations that advance the state of the art.

But universities have a secret weapon that might level the AI playing field: their libraries. Computing power may be one important part of AI, but the other key ingredient is training data. Immense scale is essential for this data—but so is its quality.

Given their voracious appetite for text to feed their large language models, leading AI companies have taken all the words they can find, including from online forums, YouTube subtitles, and Google Docs. This is not exactly “the best that has been thought and said,” to use Matthew Arnold’s pointed phrase. In Big AI’s haphazard quest for quantity, quality has taken a back seat. The frequency of “hallucinations”—inaccuracies currently endemic to AI outputs—is cause for even greater concern.

The obvious way to rectify this lack of quality and tenuous relationship to the truth is by ingesting books. Since the advent of the printing press, authors have published well over 100 million books. These volumes, preserved for generations on the shelves of libraries, are perhaps the most sophisticated reflection of human thinking from the beginning of recorded history, holding within them some of our greatest (and worst) ideas. On average, they have exceptional editorial quality compared to other texts; they capture a breadth and diversity of content and a vivid mix of styles, and they use long-form narrative to communicate nuanced arguments and concepts.

The major AI vendors have sought to tap into this wellspring of human intelligence to power the artificial, although often through questionable methods. Some companies have turned to an infamous set of thousands of books, apparently retrieved from pirate websites without permission, called “Books3.” They have also sought licenses directly from publishers, using their massive budgets to buy what they cannot scavenge. Meta even considered purchasing one of the largest publishers in the world, Simon & Schuster.

As the bedrock of our shared culture, and as the possible foundation for better artificial intelligence, books are too important to flow through these compromised or expensive channels. What if there were a library-managed collection made available to a wide array of AI researchers, including at colleges and universities, nonprofit research institutions, and small companies as well as large ones?

Such vast collections of digitized books exist right now. Google, by pouring millions of dollars into its long-running book scanning project, has access to over 40 million books, a valuable asset they undoubtedly would like to keep exclusive. Fortunately, those digitized books are also held by Google’s partner libraries. Research libraries and other nonprofits have additional stockpiles of digitized books from their own scanning operations, derived from books in their own collections. Together, they represent a formidable aggregation of texts.

A library-led training data set of books would diversify and strengthen the development of AI. Digitized research libraries are more than large enough, and of substantially higher quality, to offer a compelling alternative to existing scattershot data sets. These institutions and initiatives have already worked through many of the most challenging copyright issues, at least for how fair use applies to nonprofit research uses such as computational analysis. Whether fair use also applies to commercial AI, or models built from iffy sources like Books3, remains to be seen.

Library-held digital texts come from lawfully acquired books—an investment of billions of dollars, it should be noted, just like those big data centers—and libraries are innately respectful of the interests of authors and rightsholders by accounting for concerns about consent, credit, and compensation. Furthermore, they have a public-interest disposition that can take into account the particular social and ethical challenges of AI development. A library consortium could distinguish between the different needs and responsibilities of academic researchers, small market entrants, and large commercial actors. 

If we don’t look to libraries to guide the training of AI on the profound content of books, we will see a reinforcement of the same oligopolies that rule today’s tech sector. Only the largest, most well-resourced companies will acquire these valuable texts, driving further concentration in the industry. Others will be prevented from creating imaginative new forms of AI based on the best that has been thought and said. As they have always done, libraries can democratize access to support learning and research for all, ensuring that AI becomes the product of the many rather than the few.

Further reading on this topic: “Towards a Books Data Commons for AI Training,” by Paul Keller, Betsy Masiello, Derek Slater, and Alek Tarkowski.

This week, Authors Alliance celebrates its 10th anniversary with an event in San Francisco on May 17 (We still have space! Register for free here) titled “Authorship in an Age of Monopoly and Moral Panics,” where we will highlight obstacles and opportunities of new technology. This piece is part of a series leading up to the event.

Authors Alliance Submits Amicus Brief in Tiger King Fair Use Case

Posted May 6, 2024

By Dave Hansen

Have you ever used a photograph to illustrate a historical event in your writing? Or quoted, say, from a letter to point out some fact that the author conveyed in their writing? According to the 10th Circuit, these aren’t the kinds of uses that fair use supports. 

On Thursday, Authors Alliance joined with EFF, the Association of Research Libraries, the American Library Association, and Public Knowledge in filing an amicus brief asking the 10th Circuit Court of Appeals to reconsider its recent fair use decision in Whyte Monkee v. Netflix. 

The case is about Netflix’s use of a funeral video recording in its documentary series Tiger King, a true crime documentary about Joseph Maldonado, aka Joe Exotic, an eccentric zookeeper, media personality, exotic animal owner, and convicted felon. The recording at issue was created by Timothy Sepi/Whyte Monkee as a memorial for Travis Maldonado, Joe Exotic’s late husband. Netflix used about 60 seconds of the funeral video in its show. Its purpose was, among other things, to “illustrate Mr. Exotic’s purported megalomania, even in the face of tragedy.” 

A three-judge panel of the 10th Circuit issued its opinion in late March, concluding that Netflix’s use was not “transformative” under the first fair use factor and therefore disfavored as a fair use. The panel relied heavily on the Supreme Court’s recent decision in Andy Warhol v. Goldsmith, taking that case to mean that uses that do not comment or criticize the artistic and creative aspects of the underlying work are generally disfavored. So, the court concluded: 

Defendants’ use of the Funeral Video is not transformative under the first fair use factor. Here, Defendants did not comment on or “target” Mr. Sepi’s work at all; instead, Defendants used the Funeral Video to comment on Joe Exotic. More specifically, Defendants used the Funeral Video to illustrate Mr. Exotic’s purported megalomania, even in the face of tragedy. By doing so, Defendants were providing a historical reference point in Mr. Exotic’s life and commenting on Mr. Exotic’s showmanship. However, Defendants’ use did not comment on Mr. Sepi’s video—i.e., its creative decisions or its intended meaning.

You can probably see the problem. Fair use has, for a very long time, supported a wide variety of other uses that incorporate existing works as historical reference points and illustrations. Although the Supreme Court talked a lot about criticism and comment in its Warhol opinion (which made sense, given that the use before it was a purported artistic commentary), I think very few people interpreted that decision to mean that only commentary and criticism are permissible transformative fair uses. But as our brief points out, the panel’s decision essentially converts the Supreme Court’s decision in Warhol from a nuanced reaffirmation of fair use precedent into a radical rewrite of the law that only supports those kinds of uses. 

Our brief argues that the 10th Circuit misread the Supreme Court’s opinion in Warhol, and that it ignored decades of fair use case law. We point to a few good examples–e.g., Time v. Bernard Geis (a 1968 case finding fair use of a recreation of the famous Zapruder film in a book titled “Six Seconds in Dallas,” analyzing President Kennedy’s assassination), New Era Publications v. Carol Publishing (a 1990 case supporting reuse of lengthy quotations from L. Ron Hubbard’s writings in a book about him, to make a point about Hubbard’s “hypocrisy and pomposity”), and Bill Graham Archives v. Dorling Kindersley (a 2006 case finding fair use of Grateful Dead concert posters in a book using them as historical reference points). 

Our brief also highlights how communities of practice such as documentary filmmakers, journalists, and nonfiction writers have come to rely on fair use to support these types of uses–so much so that these practices are codified in best practices here, here, and even in Authors Alliance’s own Fair Use for Nonfiction Authors guide. 

Although it is rare for appellate courts to grant rehearing of already issued opinions, this opinion has drawn quite a lot of negative attention. In addition to our amicus brief, there were amicus briefs filed in support of rehearing from: 

Given the broad and negative reach of this decision, I hope the 10th Circuit will pay attention and grant the request. 

Book Talk – Unlocking the Digital Age: The Musician’s Guide to Research, Copyright & Publishing

Posted March 27, 2024

Join us for a book talk with ANDREA I. COPLAND & KATHLEEN DeLAURENTI about UNLOCKING THE DIGITAL AGE, a crucial resource for early career musicians navigating the complexities of the digital era.

REGISTER NOW

“[Musicians,] Use this book as a tool to enhance your understanding, protect your creations, and confidently step into the world of digital music. Embrace the journey with the same fervor you bring to your music and let this guide be a catalyst in shaping a fulfilling and sustainable musical career.”
– Dean Fred Bronstein, THE PEABODY INSTITUTE OF THE JOHNS HOPKINS UNIVERSITY

Based on coursework developed at the Peabody Conservatory, Unlocking the Digital Age: The Musician’s Guide to Research, Copyright, and Publishing by Andrea I. Copland and Kathleen DeLaurenti [READ NOW] serves as a crucial resource for early career musicians navigating the complexities of the digital era. This guide bridges the gap between creative practice and scholarly research, empowering musicians to confidently share and protect their work as they expand their performing lives beyond the concert stage as citizen artists. It offers a plain language resource that helps early career musicians see where creative practice and creative research intersect and how to traverse information systems to share their work. The authors’ experiences on stage and in academia, as professional musicians and researchers, make this guide an indispensable tool for musicians aiming to thrive in the digital landscape.

Copland and DeLaurenti will be in conversation with musician and educator, Kyoko Kitamura. Music librarian Matthew Vest will facilitate our discussion.

Unlocking the Digital Age: The Musician’s Guide to Research, Copyright, and Publishing is available to read & download.

REGISTER NOW

About our speakers

ANDREA I. COPLAND is an oboist, music historian, and librarian based in Baltimore, MD. Andrea holds dual master’s degrees in oboe performance and music history from the Peabody Institute of the Johns Hopkins University and is currently Research Coordinator at the Répertoire International de la Presse Musicale (RIPM) database. She is also a teaching artist with the Baltimore Symphony Orchestra’s OrchKids program and writes a public musicology blog, Outward Sound, on Substack.

KATHLEEN DeLAURENTI is the Director of the Arthur Friedheim Library at the Peabody Institute of The Johns Hopkins University where she also teaches Foundations of Music Research in the graduate program. Previously, she served as scholarly communication librarian at the College of William and Mary where she participated in establishing state-wide open educational resources (OER) initiatives. She is co-chair of the Music Library Association (MLA) Legislation Committee as well as a member of the Copyright Education sub-committee of the American Library Association (ALA) and is past winner of the ALA Robert Oakley Memorial Scholarship for copyright research. DeLaurenti is passionate about copyright education, especially for musicians. She is active in communities of practice working on music copyright education, sustainable economic models for artists and musicians, and policy for a balanced copyright system. DeLaurenti served as the inaugural Open Access Editor of MLA and continues to serve on the MLA Open Access Editorial Board. She holds an MLIS from the University of Washington and a BFA in vocal performance from Carnegie Mellon University.

KYOKO KITAMURA is a Brooklyn-based vocal improviser, bandleader, composer, and educator, currently co-leading the quartet Geometry (with cornetist Taylor Ho Bynum, guitarist Joe Morris and cellist Tomeka Reid) and the trio Siren Xypher (with violist Melanie Dyer and pianist Mara Rosenbloom). A long-time collaborator of legendary composer Anthony Braxton, Kitamura appears on many of his releases and is the creator of the acclaimed 2023 documentary Introduction to Syntactical Ghost Trance Music, which DownBeat Magazine calls “an invaluable resource for Braxton-philes.” Active in interdisciplinary performances, Kitamura recently provided vocals for, and appeared in, artist Matthew Barney’s 2023 five-channel installation Secondary.

MATTHEW VEST is the Music Inquiry and Research Librarian at UCLA. His research interests include change leadership in higher education, digital projects and publishing for music and the humanities, and composers working at the margins of the second Viennese School. He has also worked in the music libraries at the University of Virginia, Davidson College, and Indiana University and is the Open Access Editor for the Music Library Association.

Book Talk: UNLOCKING THE DIGITAL AGE
April 3 @ 10am PT / 1pm ET
VIRTUAL
Register now!

Hachette v. IA Amicus Briefs: Highlight on Privacy and Controlled Digital Lending

Posted January 16, 2024

Photo by Matthew Henry on Unsplash

Over the holidays you may have read about the amicus brief we submitted in the Hachette v. Internet Archive case about library controlled digital lending (CDL), which we’ve been tracking for quite some time. Our brief was one of 11 amicus briefs that explained to the court the broader implications of the case. Internet Archive already has a short overview of the others (together representing 20 organizations and 298 individuals–mostly librarians and legal experts). 

I thought it would be worthwhile to highlight some of the important issues identified by these amici that did not receive much attention earlier in the lawsuit. This post is about the reader’s privacy issues raised by several amici in support of Internet Archive and CDL. Later this week we’ll have another post focused on briefs and arguments about why the district court inappropriately construed Internet Archive’s lending program as “commercial.” 

Privacy and CDL 

One aspect of library lending that’s really special is the privacy that readers are promised when they check out a book. Most states have laws that require libraries to protect readers’ privacy, something libraries enthusiastically embrace (see, e.g., the ALA Library Bill of Rights) as a way to foster free inquiry and learning among readers. Unlike Amazon, which keeps and tracks detailed reader information for every ebook purchase–dates, times, what page you spent time on, what you highlighted–libraries strive to minimize the data they keep on readers. This protects readers from data breaches and other third-party demands for that data. 

The brief from the Center for Democracy and Technology, Library Freedom Project, and Public Knowledge spends nearly 40 pages explaining why the court should consider reader privacy as part of its fair use calculus. Represented by Jennifer Urban and a team of students at the Samuelson Law, Technology and Public Policy Clinic at UC Berkeley Law (disclosure: the clinic represents Authors Alliance on some matters, and we are big fans of their work), the brief masterfully explains the importance of this issue. From their brief, below is a summary of the argument (edited down for length): 

The conditions surrounding access to information are important. As the Supreme Court has repeatedly recognized, privacy is essential to meaningful access to information and freedom of inquiry. But in ruling against the Internet Archive, the district court did not consider one of CDL’s key advantages: it preserves libraries’ ability to safeguard reader privacy. When employing CDL, libraries digitize their own physical materials and loan them on a digital-to-physical, one-to-one basis with controls to prevent redistribution or sharing. CDL provides extensive, interrelated benefits to libraries and patrons, such as increasing accessibility for people with disabilities or limited transportation, improving access to rare and fragile materials, facilitating interlibrary resource sharing—and protecting reader privacy. For decades, libraries have protected reader privacy, as it is fundamental to meaningful access to information. Libraries’ commitment is reflected in case law, state statutes, and longstanding library practices. CDL allows libraries to continue protecting reader privacy while providing access to information in an increasingly digital age. Indeed, libraries across the country, not just the Internet Archive, have deployed CDL to make intellectual materials more accessible. And while increasing accessibility, these CDL systems abide by libraries’ privacy protective standards. 

Commercial digital lending options, by contrast, fail to protect reader privacy; instead, they threaten it. These options include commercial aggregators—for-profit companies that “aggregate” digital content from publishers and license access to these collections to libraries and their patrons—and commercial e-book platforms, which provide services for reading digital content via e-reading devices, mobile applications (“apps”), or browsers. In sharp contrast to libraries, these commercial actors track readers in intimate detail. Typical surveillance includes what readers browse, what they read, and how they interact with specific content—even details like pages accessed or words highlighted. The fruits of this surveillance may then be shared with or sold to third parties. Beyond profiting from an economy of reader surveillance, these commercial actors leave readers vulnerable to data breaches by collecting and retaining vast amounts of sensitive reader data. Ultimately, surveilling and tracking readers risks chilling their desire to seek information and engage in the intellectual inquiry that is essential to American democracy. 

Readers should not have to choose to either forfeit their privacy or forgo digital access to information; nor should libraries be forced to impose this choice on readers. CDL provides an ecosystem where all people, including those with mobility limitations and print disabilities, can pursue knowledge in a privacy-protective manner. . . . 

An outcome in this case that prevents libraries from relying on fair use to develop and deploy CDL systems would harm readers’ privacy and chill access to information. But an outcome that preserves CDL options will preserve reader privacy and access to information. The district court should have more carefully considered the socially beneficial purposes of library-led CDL, which include protecting patrons’ ability to access digital materials privately, and the harm to copyright’s public benefit of disallowing libraries from using CDL. Accordingly, the district court’s decision should be reversed.

The court below considered CDL copies and licensed ebook copies as essentially equivalent and concluded that the CDL copies IA provided acted as substitutes for licensed copies. Authors Alliance’s amicus brief points out some of the ways that CDL copies actually differ significantly from licensed copies. It seems to me that this additional point about the protection of reader privacy–and the protection of free inquiry that comes with it–is exactly the kind of distinguishing public benefit that the lower court should have considered but did not. 

You can read the full brief from the Center for Democracy and Technology, Library Freedom Project, and Public Knowledge here.