Join us for a book talk with ANDREA I. COPLAND & KATHLEEN DeLAURENTI about UNLOCKING THE DIGITAL AGE, a crucial resource for early career musicians navigating the complexities of the digital era.
“[Musicians,] Use this book as a tool to enhance your understanding, protect your creations, and confidently step into the world of digital music. Embrace the journey with the same fervor you bring to your music and let this guide be a catalyst in shaping a fulfilling and sustainable musical career.” – Dean Fred Bronstein, THE PEABODY INSTITUTE OF THE JOHNS HOPKINS UNIVERSITY
Based on coursework developed at the Peabody Conservatory, Unlocking the Digital Age: The Musician’s Guide to Research, Copyright, and Publishing by Andrea I. Copland and Kathleen DeLaurenti serves as a crucial resource for early career musicians navigating the complexities of the digital era. This guide bridges the gap between creative practice and scholarly research, empowering musicians to confidently share and protect their work as they expand their performing lives beyond the concert stage as citizen artists. It offers a plain language resource that helps early career musicians see where creative practice and creative research intersect and how to traverse information systems to share their work. As professional musicians and researchers, the authors’ experiences on stage and in academia make this guide an indispensable tool for musicians aiming to thrive in the digital landscape.
Copland and DeLaurenti will be in conversation with musician and educator, Kyoko Kitamura. Music librarian Matthew Vest will facilitate our discussion.
Unlocking the Digital Age: The Musician’s Guide to Research, Copyright, and Publishing is available to read & download.
ANDREA I. COPLAND is an oboist, music historian, and librarian based in Baltimore, MD. Andrea has dual master’s of music degrees in oboe performance and music history from the Peabody Institute of the Johns Hopkins University and is currently Research Coordinator at the Répertoire International de la Presse Musicale (RIPM) database. She is also a teaching artist with the Baltimore Symphony Orchestra’s OrchKids program and writes a public musicology blog, Outward Sound, on substack.
KATHLEEN DeLAURENTI is the Director of the Arthur Friedheim Library at the Peabody Institute of The Johns Hopkins University where she also teaches Foundations of Music Research in the graduate program. Previously, she served as scholarly communication librarian at the College of William and Mary where she participated in establishing state-wide open educational resources (OER) initiatives. She is co-chair of the Music Library Association (MLA) Legislation Committee as well as a member of the Copyright Education sub-committee of the American Library Association (ALA) and is past winner of the ALA Robert Oakley Memorial Scholarship for copyright research. DeLaurenti is passionate about copyright education, especially for musicians. She is active in communities of practice working on music copyright education, sustainable economic models for artists and musicians, and policy for a balanced copyright system. DeLaurenti served as the inaugural Open Access Editor of MLA and continues to serve on the MLA Open Access Editorial Board. She holds an MLIS from the University of Washington and a BFA in vocal performance from Carnegie Mellon University.
KYOKO KITAMURA is a Brooklyn-based vocal improviser, bandleader, composer and educator, currently co-leading the quartet Geometry (with cornetist Taylor Ho Bynum, guitarist Joe Morris and cellist Tomeka Reid) and the trio Siren Xypher (with violist Melanie Dyer and pianist Mara Rosenbloom). A long-time collaborator of legendary composer Anthony Braxton, Kitamura appears on many of his releases and is the creator of the acclaimed 2023 documentary Introduction to Syntactical Ghost Trance Music, which DownBeat Magazine calls “an invaluable resource for Braxton-philes.” Active in interdisciplinary performances, Kitamura recently provided vocals for, and appeared in, artist Matthew Barney’s 2023 five-channel installation Secondary.
MATTHEW VEST is the Music Inquiry and Research Librarian at UCLA. His research interests include change leadership in higher education, digital projects and publishing for music and the humanities, and composers working at the margins of the Second Viennese School. He has also worked in the music libraries at the University of Virginia, Davidson College, and Indiana University and is the Open Access Editor for the Music Library Association.
Book Talk: UNLOCKING THE DIGITAL AGE April 3 @ 10am PT / 1pm ET VIRTUAL Register now!
Dear Authors Alliance Members, Friends, and Allies,
It is with a heavy heart that I am announcing my departure from Authors Alliance. For me, the development is bittersweet—in a few weeks, I will be starting a new job at a law firm where I’ll focus on litigation and developing my advocacy skills in a new way. I’m excited for this next chapter, but I’ll sorely miss being an Authors Alliance staff member and working to advance the interests of our members, a dedicated and engaged community of authors who care deeply about access to knowledge and culture.
My time at Authors Alliance has seen a lot of change, both on an organizational level and in terms of the world around us. I joined as a staff attorney in late 2020, during a stormy political season and in the midst of a public health crisis. Working with former executive director, Brianna Schofield, I got to know this community and began to understand what mattered to you. I wrote one of our guides, Third-Party Permissions and How to Clear Them, drawing on my past experience working as a literary agent in addition to what I had learned about copyright law and the particular needs of our members. I also spent nine months as our interim executive director before Dave joined us back in 2022. Along the way, with the blessing and guidance of our outstanding board of directors, Authors Alliance began to focus more on policy and scale back our education work. Back in 2014, there was a dearth of these kinds of educational resources for authors, but that has changed over time, particularly with the increasing presence of scholarly communications offices to guide academic scholars.
This week is my last as an employee of Authors Alliance, and next week will be my first as a regular member. During my years with Authors Alliance, I’ve been asked a lot of times “who can join” and whether a person “qualified” as an author. Unlike other authors’ organizations, we don’t gatekeep when it comes to membership. If you—like me—write, for business or for pleasure, and you—like me—believe in our mission, Authors Alliance would love to have you join as a member. And what I love about this organization is that it truly does want to be responsive to the needs of its members. Our two amicus briefs in the Hachette Books v. Internet Archive litigation (that Dave and Kyle Courtney wrote about just last week) were based on a survey we conducted of members and other authors, because we saw how author interests were taking a back seat to the interests of large publishers in the litigation. I wrote both of these briefs, and it was an absolute pleasure to use my legal training to share this important perspective with the courts.
We created our most recent guide, Writing About Real People, because we so often heard from nonfiction authors writing about real people who had questions about whether they might be exposing themselves to legal risk. The same is true for the permissions guide—it was partially inspired by the fact that a guest blog post on clearing rights for images had been one of our most popular of all time, indicating the need for this kind of resource. We began conducting advocacy work in the realm of AI and copyright because it was clear that generative AI had the potential to reshape authorship and intellectual property laws, and we thought our voice could be useful as a sensible, measured one that remained optimistic about technology and innovation.
On a personal level, being an attorney for Authors Alliance has given me both a strong sense of job satisfaction and the feeling that my work is helping people and making a difference in the world (something many lawyers can only dream of!). Whether it is seeing our views shape the development of the laws and regulations governing information policy, or hearing from an author who got their rights back or successfully negotiated with their publisher to retain their copyright, the effects of our work have reminded me that our organization really matters. It’s one I have been honored to be a part of for the past three and a half years. Please feel free to reach out over email (for now, you can reach me at rachel@authorsalliance.org) in the next few days, or add me on twitter or LinkedIn—I’d love to stay engaged with this community, even if I’m no longer involved professionally. I also plan to attend our 10th Anniversary celebration in May, and hope to see many of our members and allies there!
Dave Hansen and Kyle Courtney jointly authored this post. They are also the authors of a White Paper on Controlled Digital Lending of Library Books. We are not, as the Publishers claim in their brief on page 13, a “cadre of boosters.” We wrote the paper independently as part of our combined decades of work on libraries and access to knowledge.
Earlier today the publishers (Hachette, Harper Collins, John Wiley, and Penguin Random House) filed their reply brief on appeal in their long-running lawsuit against Internet Archive, which challenges (among other things) the practice of controlled digital lending.
In the months after the decision, we observed the hot takes, cheers, jeers, and awkward declarations about the case, the Internet Archive itself, and Controlled Digital Lending (CDL).
This post is not part of that fanfare. Here, we want to identify a few critical issues that the publishers focus on in their brief, including some questionable fair use analysis that they repeat from the district court below. Much of the brief is framed in heated rhetoric that may cause alarm, but much like publishers’ announcements about interlibrary loan, e-reserves, or document delivery, we believe controlled digital lending is here to stay, regardless of the lower court’s poor copyright analysis and the publishers’ current brief.
Framing the Question
As is often the case, the parties disagree on what this case is actually about. For its part, Internet Archive says in its “Statement of the Issue on Appeal” that the question is “whether Internet Archive’s controlled digital lending is fair use.” Publishers, on the other hand, reframe the question more broadly, which, in combination with their arguments throughout the brief, seems intended not just to kill IA’s implementation of controlled digital lending, but to encourage the court to rule in a way that would call into question all other library applications of CDL. They say that the question is “whether IA’s infringement of the Publishers’ Works is fair use based on IA’s CDL theories and practices.”
This litigation, coordinated by the AAP, seems to us an attempt to undermine what libraries have done for centuries: lend the books that they already lawfully own. Ironically, the opposition calls CDL a made-up theory created by a “cadre of boosters,” but in actuality, it’s the publishers’ licensing system that is a modern, made-up invention. The works themselves are unchanged, but the nature of digital delivery allows publishers to charge people in new ways. There is nothing in the Copyright Act that states ebook licensing is, or should be, the default way for libraries to acquire and lend books.
Commercial vs. Non-Profit Use
One of the most criticized aspects of the decision below is the lower court’s conclusion that IA’s activities are commercial, as opposed to non-profit. The publishers’ brief enthusiastically embraces this conclusion, while also attempting to drive a wedge between IA’s lending and that of other libraries: “IA’s practices are distinctly commercial – especially in comparison to public and academic libraries.”
The district court concluded that IA’s activity was commercial because it “stands to profit” through its partnership with Better World Books on its website, and by “us[ing] its Website to attract new members, solicit donations, and bolster its standing in the library community” (p. 26).
As many amici pointed out earlier in the appeal, the use of a nonprofit’s website to solicit donations is routine; it would be chilling for sites like Wikipedia, Project Gutenberg, HathiTrust and others (all of whom filed briefs in this case) to face heightened copyright liability just because they seek donations in combination with aspects of their sites that rely on a fair use assertion. The publishers attempt to distance themselves from this absurd result (“The concern that Judge Koeltl’s analysis ‘would render virtually all nonprofit uses commercial’ is wildly overblown”), but it is clear from the number and diversity of amici who filed to speak to just this issue that the concern is very real.
As for Better World Books (BWB): BWB is an online bookstore and a Certified B Corporation, meaning that it achieves high standards of social and environmental performance, transparency, and accountability. B Corps are committed to using business as a force for good in the world. According to its website, BWB donates books to nonprofit organizations, including the Internet Archive. As of November 2019, IA and BWB have a partnership to digitize books for preservation purposes.
The focus on the supposedly commercial relationship with Better World Books (a used book reseller) seems to us a stretch based on the facts. The publishers’ brief makes a big deal of Better World Books (referencing them over 20 times in the brief), and argues that IA’s use is commercial because a) IA encourages readers to purchase books through links on its site to Better World Books, and b) Better World Books donates some funds back to IA. The first point is perplexing–one would think they’d be pleased that readers are encouraged to purchase copies of their books–even if on the used market. But the latter point about Better World Books’ commercial influence on IA’s operation is just not rooted in the facts of the case. As IA laid out in its opening brief, it has only received $5,561.41 from Better World Books in the relevant time frame. That’s an infinitesimally small drop in the bucket compared to the costs that IA has borne to digitize and lend books for no monetary return from readers. It’s hard to see how such an amount could be construed to tilt IA’s entire operation into a commercial activity.
For anyone who has actually worked on such projects, it is clear that IA is not archiving or lending books for commercial purposes. The idea that there is money to be made in doing so is laughable. Instead, it is providing access to knowledge and cultural heritage. This fundamental point somehow got lost on the publishers on the road to enormous profits.
eBooks vs. Digitized Books
There are lots of nuances that got lost in the decision below, which we believe were helpfully addressed by amici filings earlier in this appeal (e.g., the privacy implications of licensed ebooks vs. CDL copies lent by libraries). The publishers seem happy to gloss over the details again in this brief, particularly when it comes to the differences between licensed ebooks and those that are lent out with CDL.
First, the publishers’ brief makes clear they really don’t like it when books are available for free. They use the word 33 times (about every other page of the brief)! Many of the references obscure what “free” really means though – for example, asserting that “Two Publishers believe that 39-50% of American ebook consumers read their ebooks for free from libraries rather than paying for their own commercial ebooks” (emphasis added) while ignoring the exorbitant costs and other burdens placed on libraries and the public to fund that licensed access. This is a major part of why libraries have responded both by embracing CDL and by advocating for laws that would require fair licensing terms for ebooks.
Second, as far as market harm goes, the publishers assert that “IA offered the Publishers’ library and consumer customers a free competing substitute to the authorized ebook editions,” essentially arguing that “you can’t compete with free.” But that is just not true. Examples are trivially easy to conjure up: open source software vs. Microsoft or iOS. How often do you run into someone who uses LibreOffice, OpenOffice, or Ubuntu? And of course in creative industries, we’ve seen this kind of model take hold in numerous areas, including book publishing, with “freemium” models.
That’s because products that are free often offer a different user experience than those that aren’t. Usually when someone opts to pay, they’re paying for an enhanced experience. The same holds true of books scanned for CDL vs. licensed ebooks. CDL books are just that – they are digitized physical books. They don’t have the nice, crisp text of licensed ebooks, nor the interactive features. You can’t highlight, or change the font, or look up a word by touching it, or do any of the myriad of functions that you can with an ebook.
That a library is loaning and controlling those copies is also a major distinguishing factor, because borrowing a book from a library (along with all the special privacy protections one receives) provides a vastly different reading environment than one in which vendors can scrape, process and sell data about your reading experience. Notably, the publishers did not engage with this argument.
“IA refuses to pay the customary price and join the Publishers’ thriving market for authorized library ebooks…”
Good gravy! According to the publishers, libraries should be forced to pay over and over again for the same book, to join a market that there is no evidence they are harming.
The publishers devote a large portion of their brief, nearly 20 pages, to arguing about market harm. Most of it comes down to the assertion that the mere fact of the existence of a digital book market means that CDL must negatively impact the rightsholders’ profits (despite no empirical evidence of market harm). The lower court decision stated that IA has the “burden to show a lack of market harm” (p. 43), and concluded (without reference to meaningful evidence) that “that harm here is evident” (p. 44), an assumption which the publishers are happy to rest on.
There is a genuinely important legal question raised here about which party needs to prove what when it comes to market harm. The publishers’ brief relies heavily on the idea that IA bears the burden on every point of its fair use defense, especially market harm. But as IA points out in its opening brief,
“Although the Supreme Court has stated fair use is an affirmative defense for which defendants bear the burden (Campbell, 510 U.S. at 1177), it has also suggested this burden may apply differently to noncommercial uses than commercial ones. Sony stated that noncommercial cases require “a showing by a preponderance of the evidence that some meaningful likelihood of future harm exists.” 464 U.S. at 417; see Princeton Univ. Press v. Mich. Document Servs., Inc., 99 F.3d 1381, 1385-86 (6th Cir. 1996) (“The burden of proof as to market effect rests with the copyright holder if the challenged use is of a ‘noncommercial’ nature.”).”
Conclusion
The brief is predictably hyperbolic, and continues to refuse to allow for any room for digital lending based on a misreading, in our view, of precedents such as Sony, TVEyes, and ReDigi. But, CDL is not some form of library-sanctioned piracy. CDL is based in copyright, fair use, and the public mission of libraries, while also broadening access to the books that library systems spend billions of dollars to collect and maintain for the public—including long-neglected, out-of-print books with enormous social and scholarly value and books for which commercial ebook licenses are not available.
During the pandemic, the importance of digital library access became strikingly apparent. It is unfortunate that the Publishers chose that moment of national emergency to sue a non-profit library for loaning books digitally. CDL simply seeks to preserve the library’s long-established and vital mission to collect and lend books in an increasingly licensed-access digital world.
Some of you may recall that Authors Alliance published our long-awaited guide, Writing About Real People, earlier this year. One of the major topics in the guide is the right of publicity—a right to control use of one’s own identity, particularly in the context of commercial advertising. These issues have been in the news a lot lately as generative AI poses new questions about the scope and application of the right of publicity.
Sound-alikes and the Right of Publicity
One important right of publicity question in the genAI era concerns the increasing prevalence of “sound-alikes” created using generative AI systems. The issue of AI-generated voices that mimicked real people came to the public’s attention with the apparently convincing “Heart on My Sleeve” song, imitating Drake and the Weeknd, and tools that facilitate creating songs imitating popular singers have increased in number and availability.
AI-generated soundalikes are a particularly interesting use of this technology when it comes to the right of publicity because one of the seminal right of publicity cases, taught in law schools and mentioned in primers on the topic, concerns a sound-alike from the analog world. In 1986, the Ford Motor Company hired an advertising agency to create a TV commercial. The agency obtained permission to use “Do You Wanna Dance,” a song Bette Midler had famously covered, in its commercial. But when the ad agency approached Midler about actually singing the song for the commercial, she refused. The agency then hired a former backup singer of Midler’s to record the song, apparently asking the singer to imitate Midler’s voice in the recording. A federal court found that this violated Midler’s right of publicity under California law, even though her voice was not actually used. Extending this holding to AI-generated voices seems logical and straightforward—it is not about the precise technology used to create or record the voice, but about the end result the technology is used to achieve.
Right of Publicity Legislation
The right of publicity is a matter of state law. In some states, like California and New York, the right of publicity is established via statute, and in others, it’s a matter of common law (or judge-made law). In recent months, state legislatures have proposed new laws that would codify or expand the right of publicity. Similarly, many have called for the establishment of a federal right of publicity, specifically in the context of harms caused by the rise of generative AI. One driving force behind calls for the establishment of a federal right of publicity is the patchwork nature of state right of publicity laws: in some states, the right of publicity extends only to someone’s name, image, likeness, voice, and signature, but in others, it’s much broader. While AI-generated content and the ways in which it is being used certainly pose new challenges for courts considering right of publicity violations, we are skeptical that new legislation is the best solution.
In late January, the No Artificial Intelligence Fake Replicas and Unauthorized Duplications Act of 2024 (or “No AI FRAUD Act”) was introduced in the House of Representatives. The No AI FRAUD Act would create a property-like right in one’s voice and likeness, which is transferable to other parties. It targets voice “cloning services” and mentions the “Heart on My Sleeve” controversy specifically. But civil society groups and advocates for free expression have raised alarm about the ways in which the bill would make it easier for creators to actually lose control over their own personality rights while also impinging on others’ First Amendment rights due to its overbreadth and the property-like nature of the right it creates. While the No AI FRAUD Act contains language stating that the First Amendment is a defense to liability, it’s unclear how effective this would be in practice (and as we explain in the Writing About Real People Guide, the First Amendment is always a limitation on laws affecting freedom of expression).
The Right of Publicity and AI-Generated Content
In the past, the right of publicity has been described as “name, image, and likeness” rights. What is interesting about AI-generated content and the right of publicity is that a person’s likeness can be used in a more complete way than ever before. In some cases, both their appearance and voice are imitated, associated with their name, and combined in a way that makes the imitation more convincing.
What is different about this iteration of right of publicity questions is the actors behind the production of the soundalikes and imitations, and, to a lesser extent, the harms that might flow from these uses. A recent use of a different celebrity’s likeness in connection with an advertisement is instructive on this point. Earlier this year, advertisements emerged on various platforms featuring an AI-generated Taylor Swift participating in a Le Creuset cookware giveaway. These ads contained two separate layers of deceptiveness: most obviously, that Swift was AI-generated and did not personally appear in the ad, but more bafflingly, that they were not Le Creuset ads at all. The ads were part of a scam whereby users might pay for cookware they would never receive, or enter credit card details which could then be stolen or otherwise used for improper purposes. Compared to more traditional conceptions of advertising, the unfair advantages and harms caused by the use of Swift’s voice and likeness are much more difficult to trace. Taylor Swift’s likeness and voice were appropriated by scammers to trick the public into thinking they were interacting with Le Creuset advertising.
It may be that the right of publicity as we know it (and as we discuss it in the Writing About Real People Guide) is not well-equipped to deal with these kinds of situations. But it seems to us that codifying the right of publicity in federal law is not the best approach. Just as Bette Midler had a viable claim under California’s right of publicity statute back in 1992, Taylor Swift would likely have a viable claim against Le Creuset if her likeness had been used by that company in connection with commercial advertising. The problem is not the “patchwork of state laws,” but that this kind of doubly-deceptive advertising is not commercial advertising at all. On a practical level, it’s unclear what party could even be sued over this kind of use. Certainly not Le Creuset. And it seems to us unfair to say that the creator of the AI technology used should be left holding the bag, just because someone used it for fraudulent purposes. The real fraudsters—anonymous but likely not impossible to track down—are the ones who can and should be pursued under existing fraud laws.
Authors Alliance has said elsewhere that reforms to copyright law cannot be the solution to any and all harms caused by generative AI. The same goes for the intellectual property-like right of publicity. Sensible regulation of platforms, stronger consumer protection laws, and better means of detecting and exposing AI-generated content are possible solutions to the problems that the use of AI-generated celebrity likenesses has brought about. To instead expand intellectual property rights under a federal right of publicity statute risks infringing on our First Amendment freedoms of speech and expression.
For a few years, we’ve been tracking the new copyright small claims court known as the Copyright Claims Board. My last update was in September when I posted a summary of a paper I wrote with Katie Fortney summarizing data about the first year of operations of the court (thanks entirely to Katie for doing the hard work of extracting that data and sharing it in an easy-to-understand format).
As explained then, the CCB has been slow in processing cases; it only entered a final judgment on the merits in one case when I last wrote. It has now issued a total of 18 final determinations, about half of which are default determinations (cases where the respondent failed to appear or refused to participate in the CCB process). The facts for most of these cases are not very interesting, but two of the most recent caught my attention.
Oakes v. Heart of Gold Pageant System
The first case, Oakes v. Heart of Gold Pageant System Inc., highlights a concern raised by opponents of the CCB when it was being debated in Congress. Namely, the CCB’s ability to make default determinations could be a trap for unwary defendants who don’t understand what the CCB is, what a case before it could mean for them, or what their rights are to opt out of a CCB proceeding.
The facts are unspectacular: Oakes, a professional photographer represented by Higbee & Associates, filed a CCB complaint against Heart of Gold and its owner, Angel Jameson, for using photographs taken by Oakes on Heart of Gold’s Facebook page and in materials for events it sponsored. Oakes originally filed the claim in July 2022 and then refiled it in August 2022 with some corrections. Oakes then provided the CCB with the required proof of service (proof that Oakes had adequately informed Heart of Gold and Jameson of the CCB claim) in October 2022.
At this point, the ball was in Heart of Gold and Jameson’s court; she could either respond and defend her use, or (if done within 60 days of service) opt out of the CCB proceeding altogether. Unfortunately for her, she did neither, which resulted in a default determination against her for $4,500.
We learn in the final determination a little more about Jameson’s lack of participation. As the CCB recounts in its final default determination:
“At multiple points in this procedural history, Jameson has contacted the CCB, and after communicating with staff, has affirmed each time her intent to not participate in this proceeding.”
“Jameson initially contacted the Board in response to this Zoom link, expressing her disbelief that the Board is a government tribunal.”
“Jameson then sent another email in response to the First Default, requesting an ‘official day in court.’”
“In a subsequent call with CCB staff in March, Jameson indicated that she would not participate.”
“Shortly after the order scheduling the hearing, Jameson contacted the U.S. Copyright Office’s Public Information Office, who placed her in contact with CCB staff. In a follow-up call, CCB staff again explained the proceeding and Jameson again affirmed that she would not participate in the proceeding.”
Jameson missed her opportunity to opt out early in the case – she had a sixty-day window to do so, as defined by CCB regulations. So, her protests later were ineffective to opt out, even though it seems clear that she did not want her case to be heard by the CCB.
Joe Hand Promotions v. Dawson
A second default determination case offers a slightly different view of how the CCB treats defaults. The facts are similarly straightforward: Joe Hand is a company that “specializes in commercially licensing premier sporting events to commercial locations such as bars, restaurants, lounges, clubhouses, and similar establishments.” Joe Hand had obtained the exclusive right to sell pay-per-view access to a boxing event, “Deontay Wilder vs. Tyson Fury II,” to commercial establishments, including bars. Joe Hand provided evidence that a California bar, “Bottoms Up,” had shown the match without permission.
Joe Hand (a frequent filer with the CCB, with 33 cases to its name) ran into a problem in this case, however, because it didn’t actually file its case against Bottoms Up, but instead against the individual who is listed on the bar’s liquor license and ownership documents, Mary Dawson. Even in Dawson’s absence, the CCB was unwilling to rubber-stamp Joe Hand’s claims against her. The final determination explained,
“Beyond the conclusory and clearly boilerplate allegations in the Claim that Dawson (and now-dismissed respondent Giglio) ‘owned, operated, maintained, and controlled the commercial business known as Bottoms Up Bar & Grill’ and ‘had a right and ability to supervise the activities of the Establishment on the date of the Program and had an obvious and direct financial interest in the activities of the Establishment on the date of the Program’ (Dkt. 1), Claimant offers absolutely no information linking Respondent to the infringement.”
I will spare you the details, but the CCB went on to cite case after case explaining why courts have routinely rejected such boilerplate claims, and required plaintiffs to at least allege meaningful facts connecting an individual to an act of infringement. Even in this default case where Dawson was not present to defend herself, the CCB put in the effort on her behalf.
Takeaways
I have a few observations. In the first case, given that Jameson clearly did not want her case heard before the CCB, I think it would have been fair for the CCB to allow her a second chance to opt out. At least on the record we have available, there is no indication that the CCB offered her that chance. Although the normal opt-out period extends only sixty days after service, the CCB opt-out regulations also state that “the Board may extend the 60-day period to opt out in exceptional circumstances and in the interests of justice.”
It seems to me, given the newness of the CCB system, the small number of cases filed to date, and the relative lack of awareness among most people that the CCB is a legitimate government forum (Jameson expressed such doubt herself), the “interests of justice” may well dictate a more flexible approach at least at the outset of operations of the CCB.
The CCB has demonstrated an extraordinary willingness to offer helpful guidance, flexibility, and multiple opportunities to claimants, and so respondents may have expected a similar approach to help them along through the process. At least in this case, we see a more stringent approach. An obvious takeaway for respondents, then, is to pay attention to notices about CCB claims and associated deadlines, and to opt out early in the process if they don’t want their case heard there.
The Dawson case, however, does show that the CCB isn’t willing to let claimants make unsubstantiated claims against absent respondents. Though Joe Hand is surely familiar with the process and it would have been easy for the CCB to accept its barebones allegations against Dawson as true, the CCB made the case itself–with ample legal support–that even claims against absent respondents require claimants to make a real case.
Overall, these are just two cases, so I don’t want to read into them too much. But it’s already looking like a large portion of CCB cases will be defaults (10 out of the 18 final determinations to date, and more than half of the existing active cases are trending in that direction). So, it’s good to keep an eye on how the CCB will treat these types of cases, given the risks they pose for unwary and uninformed respondents.
To recap: our expansion petitions ask the Copyright Office to modify the existing TDM exemption so that researchers who assemble corpora of ebooks or films on which to conduct text and data mining are able to share those corpora with other academic researchers, where this second group of researchers qualifies under the exemption. Under the current exemption, academic researchers are only able to share their corpora with other qualified researchers for purposes of “collaboration and verification.” This simple change would eliminate the need for duplicative efforts to remove digital locks from ebooks and films, a time- and resource-intensive process, broadening the group of academic researchers who are able to use the exemption.
Our comment argues that the existing TDM exemption has begun to enable valuable digital humanities research and teaching, but that the proposed expansion would go much further towards enabling this research and helping TDM researchers reach their goals. The comment is accompanied by 13 letters of support from researchers, educators, and funding organizations, highlighting the research that has been done in reliance on the exemption, and explaining why this expansion is necessary. Our thanks go out to our stellar clinical team at UC Berkeley’s Samuelson Law, Technology & Public Policy Clinic—law students Mathew Cha and Zhudi Huang, and clinical supervisor Jennifer Urban—for writing and submitting this comment on our behalf. We are also grateful to our co-petitioners, the Library Copyright Alliance and American Association of University Professors, for their support on this comment.
Ambiguity in “Collaboration”
One reason the expansion is necessary is the uncertainty over what constitutes “collaboration” under the existing exemption. Researchers have open questions about what level of individual contribution to a project would make researchers “collaborators” under the exemption. As our comment explains, collaboration can come in a number of different forms, from “formal collaborations under the auspice of a grant, [to] ad hoc collaborations that result from two teams discovering that they are working on similar material to the same ends, or even discussions at conferences between members of a loose network of scholars working on the same broad set of interests.” But it is not clear which of these activities is “collaboration” for the purposes of the exemption. And this uncertainty has had a chilling effect on the socially valuable research made possible by the exemption.
Costly Corpora Creation
Our comment also highlights the vast costs that go into creating a usable corpus for TDM research. Institutions whose researchers are conducting TDM research pursuant to the exemption must lawfully own the works in question, or license them through a license that is not time-limited. But these costs pale in comparison to the required computing resources—a cost which is compounded by the exemption’s strict security requirements—and human labor involved in bypassing technical protection measures and assembling a corpus. Moreover, it’s important to recognize that there is simply not a tremendous amount of grant funding or even institutional support available to TDM researchers.
Because corpora are so costly to assemble and create, we believe it to be reasonable to permit researchers to share their corpora with researchers at other institutions who want to conduct independent TDM research on these corpora. As the exemption currently stands, researchers interested in pre-existing corpora must duplicate the efforts of the previous researchers, incurring massive costs along the way. We’ve already seen indications that these costs can lead researchers to avoid certain research questions and areas of study altogether. As our comment explains, this “duplicative circumvention” can be avoided by changing the language of the exemption to permit corpora sharing between qualified researchers at separate institutions.
Equity Issues
Worse still, not all institutions are able to bear these expenses. Our comment explains how the current exemption’s prohibition on sharing beyond collaboration and verification, and the consequent duplication of prior labor, “create[s] barriers that can prevent smaller and less-well-resourced institutions from conducting TDM research at all.” This creates inequity in what type of institutions can support TDM projects, and what types of researchers can conduct them. The unfortunate result has been that large institutions that have “the resources to compensate and maintain technical staff and infrastructure” are able to support TDM research under the exemption, while smaller institutions are not.
Values of Corpora Sharing
Our comment explains how allowing limited sharing of corpora under the exemption would go a long way towards lowering barriers to entry for TDM research and ameliorating the equity issues described above. Since digital humanities is already an under-resourced field, the effects of enabling researchers to share their corpora with other academic researchers could be quite profound.
Researchers who wrote letters in support of the petition described a multitude of exciting projects, and have built “a rich set of corpora to study, such as a collection of fiction written by African American writers, a collection of books banned in the United States, and a curated corpus of movies and television with an ‘emphasis on racial, ethnic, sexual, and gender diversity.’” Many of those who wrote letters in support of our petition recounted requests they’ve gotten from other researchers to use their corpora, and were frustrated that the exemption’s prohibition on non-collaborative sharing and their limited capacity for collaboration prevented them from sharing these corpora.
Allowing new researchers with new research questions to study these corpora could reveal new insights about these bodies of work. As we explain, “in the same way a single literary work or motion picture can evince multiple meanings based on the lens of analysis used, when different researchers study one corpus, they are able to pose different research questions and apply different methodologies, ultimately revealing new and original findings . . . . Enabling broader sharing and thus, increasing the number of researchers that can study a corpus, will allow a body of works to be better understood beyond the initial ‘limited set of research questions.’”
Fair Use
The 1201 rulemaking process for exemptions to DMCA § 1201’s prohibition on breaking digital locks requires that the proposed activity be a fair use. In the 2021 proceedings, the Office recognized TDM for research and teaching purposes as a fair use. Because the expansion we’re seeking is relatively minor, our comment explains that the types of uses we are asking the Office to permit researchers to make are also fair uses. Our comment explains that each of the four fair use factors favors fair use in the context of the proposed expansion. We further explain why the enhanced sharing the expansion would provide does not harm the market for the original works under factor four: because institutions must lawfully own (or license under a non-time-limited license) the works that their researchers wish to conduct TDM on, it makes no difference from a market standpoint whether researchers bypass technical protection measures themselves, or share another institution’s corpus. Copyright holders are not harmed when researchers at one institution share a corpus created by researchers at another institution, since both institutions must purchase the works in order to be eligible under the exemption.
What’s Next?
If there are parties that oppose our proposed expansion, they have until February 20th to submit opposition comments to the Copyright Office. Then, on March 19th, our reply comments to any opposition comments will be due. We will keep our readers and members apprised as the process continues to move forward.
Authors Alliance is pleased to share our 2023 annual report, where you can find highlights of our work in 2023 to promote laws, policies, and practices that enable authors to reach wide audiences. In the report, you can read about how we’re helping authors meet their dissemination goals for their works, representing their interests in the courts, and otherwise working to advocate for authors who write to be read.
Our last post highlighted one of the amicus briefs filed in the Hachette v. Internet Archive lawsuit, which made the point that controlled digital lending serves important privacy interests for library readers. Today I want to highlight a second new issue introduced on appeal and addressed by almost every amicus: the proper way to assess whether a given use is “non-commercial.”
“Non-commercial” use is important because the first fair use factor directs courts to assess “the purpose and character of the use, including whether such use is of a commercial nature or is for nonprofit educational purposes.” Before the district court, neither Internet Archive (IA) nor the amici who filed in support of IA paid much attention to whether IA’s use was commercial, I think because lending books for free to library patrons seemed to us a paradigmatic example of non-commercial use. It came as a shock, therefore, when the District Court in this case concluded that “IA stands to profit” from its use and that the use was therefore commercial.
The Court’s reasoning was odd. While it recognized that IA “is a non-profit organization that does not charge patrons to borrow books and because private reading is noncommercial in nature,” the court concluded that because IA gains “an advantage or benefit from its distribution and use” of the works at issue, its use was commercial. Among the “benefits” that the court listed:
IA exploits the Works in Suit without paying the customary price
IA uses its Website to attract new members, solicit donations, and bolster its standing in the library community.
Better World Books also pays IA whenever a patron buys a used book from BWB after clicking on the “Purchase at Better World Books” button that appears on the top of webpages for ebooks on the Website.
Although almost every amicus addressed the problems with this approach to “non-commercial” use, three briefs, in particular, added important additional context, explaining both why the district court was wrong on the law and why its rule would have dramatically negative implications for other libraries and nonprofit organizations.
First, the Association of Research Libraries and the American Library Association, represented by Brandon Butler, make a forceful legal argument in their amicus brief about why the district court’s baseline formulation of commerciality (benefit without paying the customary price) was wrong:
The district court’s determination that the Internet Archive (“IA”) was engaged in a “commercial” use for purposes of the first statutory factor is based on a circular argument that seemingly renders every would-be fair use “commercial” so long as the user benefits in some way from their use. This cannot be the law, and in the Second Circuit it is not. The correct standard is clearly stated in American Geophysical Union v. Texaco Inc., 60 F. 3d 913 (2d Cir. 1994), a case the district court ignored entirely.
ARL and ALA then go on to highlight numerous examples of appellate courts (including the Second Circuit) rejecting this approach, such as the 11th Circuit in the Georgia State e-reserves copyright lawsuit: “Of course, any unlicensed use of copyrighted material profits the user in the sense that the user does not pay a potential licensing fee, allowing the user to keep his or her money. If this analysis were persuasive, no use could qualify as ‘nonprofit’ under the first factor.”
The constitutional goal of copyright protection is to “promote the progress of science and useful arts,” Art. I, sec. 8, cl. 8, and the first copyright law was “an act for the encouragement of learning,” Cambridge University Press v. Patton, 769 F.3d 1232, 1256 (11th Cir. 2014). This case provides an opportunity for this Court to reaffirm that vision by recognizing the special role that noncommercial, nonprofit uses play in supporting freedom of speech and access to knowledge.
The IP Professors Brief then goes on to highlight the many ways that Congress has indicated that library lending should be treated favorably because it furthers objectives of supporting learning, and how the court’s constrained reading of “non-commercial” is actually in conflict with how that term is used elsewhere in the Copyright Act (for example, Sections 111, 114, and 118 for non-commercial broadcasters, or Section 1008 for non-commercial consumers who copy music). The brief then makes a strong case not only that the district court was mistaken, but also that library lending should presumptively be treated as non-commercial.
Finally, we see the amicus brief from the Wikimedia Foundation, Creative Commons, and Project Gutenberg, represented by Jef Pearlman and a team of students at the USC IP & Technology Law Clinic. Their brief highlighted in detail the practical challenges that the district court’s approach to non-commercial use would pose for all sorts of online nonprofits. The brief explains how nonprofits that raise money will inevitably include donation buttons on pages with fair use content, rely on volunteer contributions, and engage in revenue-generating activities to support their work, which in some cases requires millions of dollars for technical infrastructure. The brief explains:
The district court defined “commercial” under the first fair use factor far too broadly, inextricably linking secondary uses to fundraising even when those activities are, in practice, completely unrelated. In evaluating what constitutes commercial use, the district court misapplied several considerations and ignored other critical considerations. As a result, the district court’s ruling threatens nonprofit organizations who make fair use of copyrighted works. Adopting the district court’s approach would threaten both the processes of nonprofit fundraising and the methods by which educational nonprofits provide their services.
Over the holidays you may have read about the amicus brief we submitted in the Hachette v. Internet Archive case about library controlled digital lending (CDL), which we’ve been tracking for quite some time. Our brief was one of 11 amicus briefs filed that explained to the court the broader implications of the case. Internet Archive itself has a short overview of the others already (representing 20 organizations and 298 individuals–mostly librarians and legal experts).
I thought it would be worthwhile to highlight some of the important issues identified by these amici that did not receive much attention earlier in the lawsuit. This post is about the reader privacy issues raised by several amici in support of Internet Archive and CDL. Later this week we’ll have another post focused on briefs and arguments about why the district court inappropriately construed Internet Archive’s lending program as “commercial.”
The brief from the Center for Democracy and Technology, Library Freedom Project, and Public Knowledge spends nearly 40 pages explaining why the court should consider reader privacy as part of its fair use calculus. Represented by Jennifer Urban and a team of students at the Samuelson Law, Technology and Public Policy Clinic at UC Berkeley Law (disclosure: the clinic represents Authors Alliance on some matters, and we are big fans of their work), the brief masterfully explains the importance of this issue. From their brief, below is a summary of the argument (edited down for length):
The conditions surrounding access to information are important. As the Supreme Court has repeatedly recognized, privacy is essential to meaningful access to information and freedom of inquiry. But in ruling against the Internet Archive, the district court did not consider one of CDL’s key advantages: it preserves libraries’ ability to safeguard reader privacy. When employing CDL, libraries digitize their own physical materials and loan them on a digital-to-physical, one-to-one basis with controls to prevent redistribution or sharing. CDL provides extensive, interrelated benefits to libraries and patrons, such as increasing accessibility for people with disabilities or limited transportation, improving access to rare and fragile materials, facilitating interlibrary resource sharing—and protecting reader privacy. For decades, libraries have protected reader privacy, as it is fundamental to meaningful access to information. Libraries’ commitment is reflected in case law, state statutes, and longstanding library practices. CDL allows libraries to continue protecting reader privacy while providing access to information in an increasingly digital age. Indeed, libraries across the country, not just the Internet Archive, have deployed CDL to make intellectual materials more accessible. And while increasing accessibility, these CDL systems abide by libraries’ privacy protective standards.
Commercial digital lending options, by contrast, fail to protect reader privacy; instead, they threaten it. These options include commercial aggregators—for-profit companies that “aggregate” digital content from publishers and license access to these collections to libraries and their patrons—and commercial e-book platforms, which provide services for reading digital content via e-reading devices, mobile applications (“apps”), or browsers. In sharp contrast to libraries, these commercial actors track readers in intimate detail. Typical surveillance includes what readers browse, what they read, and how they interact with specific content—even details like pages accessed or words highlighted. The fruits of this surveillance may then be shared with or sold to third parties. Beyond profiting from an economy of reader surveillance, these commercial actors leave readers vulnerable to data breaches by collecting and retaining vast amounts of sensitive reader data. Ultimately, surveilling and tracking readers risks chilling their desire to seek information and engage in the intellectual inquiry that is essential to American democracy.
Readers should not have to choose to either forfeit their privacy or forgo digital access to information; nor should libraries be forced to impose this choice on readers. CDL provides an ecosystem where all people, including those with mobility limitations and print disabilities, can pursue knowledge in a privacy-protective manner. . . .
An outcome in this case that prevents libraries from relying on fair use to develop and deploy CDL systems would harm readers’ privacy and chill access to information. But an outcome that preserves CDL options will preserve reader privacy and access to information. The district court should have more carefully considered the socially beneficial purposes of library-led CDL, which include protecting patrons’ ability to access digital materials privately, and the harm to copyright’s public benefit of disallowing libraries from using CDL. Accordingly, the district court’s decision should be reversed.
The court below considered CDL copies and licensed ebook copies as essentially equivalent and concluded that the CDL copies IA provided acted as substitutes for licensed copies. Authors Alliance’s amicus brief points out some of the ways that CDL copies actually differ significantly from licensed copies. It seems to me that this additional point about protection of reader privacy–and the protection of free inquiry that comes with it–is exactly the kind of distinguishing public benefit that the lower court should have considered but did not.
You can read the full brief from the Center for Democracy and Technology, Library Freedom Project, and Public Knowledge here.
This is a guest post by Rachael G. Samberg, Timothy Vollmer, and Samantha Teremi, professionals within the Office of Scholarly Communication Services at UC Berkeley Library.
On academic and library listservs, there has emerged an increasingly fraught discussion about licensing scholarly content when scholars’ research methodologies rely on artificial intelligence (AI). Scholars and librarians are rightfully concerned that non-profit educational research methodologies like text and data mining (TDM) that can (but do not necessarily) incorporate usage of AI tools are being clamped down upon by publishers. Indeed, libraries are now being presented with content license agreements that prohibit AI tools and training entirely, irrespective of scholarly purpose.
Conversely, publishers, vendors, and content creators—a group we’ll call “rightsholders” here—have expressed valid concerns about how their copyright-protected content is used in AI training, particularly in a commercial context unrelated to scholarly research. Rightsholders fear that their livelihoods are being threatened when generative AI tools are trained and then used to create new outputs that they believe could infringe upon or undermine the market for their works.
Within the context of non-profit academic research, rightsholders’ fears about allowing AI training, and especially non-generative AI training, are misplaced. Newly-emerging content license agreements that prohibit usage of AI entirely, or charge exorbitant fees for it as a separately-licensed right, will be devastating for scientific research and the advancement of knowledge. Our aim with this post is to empower scholars and academic librarians with legal information about why those licensing outcomes are unnecessary, and equip them with alternative licensing language to adequately address rightsholders’ concerns.
To that end, we will:
Explain the copyright landscape underpinning the use of AI in research contexts;
Address ways that AI usage can be regulated to protect rightsholders, while outlining opportunities to reform contract law to support scholars; and
Conclude with practical language that can be incorporated into licensing agreements, so that libraries and scholars can continue to achieve licensing outcomes that satisfy research needs.
Our guidance is based on legal analysis as well as our views as law and policy experts working within scholarly communication. While your mileage or opinions may vary, we hope that the explanations and tools we provide offer a springboard for discussion within your academic institutions or communities about ways to approach licensing scholarly content in the age of AI research.
Copyright and AI training
As we have recently explored in presentations and posts, the copyright law and policy landscape underpinning the use of AI models is complex, and regulatory decision-making in the copyright sphere will have ramifications for global enterprise, innovation, and trade. A much-discussed group of lawsuits and a parallel inquiry from the U.S. Copyright Office raise important and timely legal questions, many of which we are only beginning to understand. But there are two precepts that we believe are clear now, and that bear upon the non-profit education, research, and scholarship undertaken by scholars who rely on AI models.
First, as the UC Berkeley Library has explained in greater detail to the Copyright Office, training artificial intelligence is a fair use, and particularly so in a non-profit research and educational context. (For other similar comments provided to the Copyright Office, see, e.g., the submissions of Authors Alliance and Project LEND.) Maintaining its continued treatment as fair use is essential to protecting research, including TDM.
TDM refers generally to a set of research methodologies reliant on computational tools, algorithms, and automated techniques to extract revelatory information from large sets of unstructured or thinly-structured digital content. Not all TDM methodologies necessitate usage of AI models in doing so. For instance, the words that 20th century fiction authors use to describe happiness can be searched for in a corpus of works merely by using algorithms looking for synonyms and variations of words like “happiness” or “mirth,” with no AI involved. But to find examples of happy characters in those books, a researcher would likely need to apply what are called discriminative modeling methodologies that first train AI on examples of what qualities a happy character demonstrates or exhibits, so that the AI can then go and search for occurrences within a larger corpus of works. This latter TDM process involves AI, but not generative AI; and scholars have relied non-controversially on this kind of non-generative AI training within TDM for years.
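To make the non-AI end of that spectrum concrete, here is a minimal sketch of a keyword-based search of the kind described above. It is only an illustration under stated assumptions: the word list, the two sample texts, and the function name count_term_hits are all hypothetical, and a real project would work from a curated vocabulary and a lawfully acquired corpus.

```python
import re
from collections import Counter

# Hypothetical vocabulary: synonyms and variants a researcher might choose by hand.
HAPPINESS_TERMS = {"happiness", "happy", "mirth", "joy", "joyful", "delight"}

def count_term_hits(text, terms=HAPPINESS_TERMS):
    """Count how often each listed term appears in the text, ignoring case."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(word for word in words if word in terms)

# Hypothetical two-document corpus, invented purely for illustration.
corpus = {
    "novel_a": "Her happiness was plain; joy and mirth filled the house.",
    "novel_b": "He was not happy, and no delight could be found.",
}

# Per-title tallies of matched terms; no model training is involved.
hits = {title: count_term_hits(text) for title, text in corpus.items()}
```

A search like this only finds the listed words themselves. It also matches the word "happy" in "not happy," which is one reason that finding happy characters, as opposed to happiness vocabulary, tends to call for the trained discriminative models mentioned above.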
Previous court cases like Authors Guild v. HathiTrust, Authors Guild v. Google, and A.V. ex rel. Vanderhye v. iParadigms have addressed fair use in the context of TDM and confirmed that the reproduction of copyrighted works to create and conduct text and data mining on a collection of copyright-protected works is a fair use. These cases further hold that making derived data, results, abstractions, metadata, or analysis from the copyright-protected corpus available to the public is also fair use, as long as the research methodologies or data distribution processes do not re-express the underlying works to the public in a way that could supplant the market for the originals.
For the same reasons that the TDM processes constitute fair use of copyrighted works in these contexts, the training of AI tools to do that text and data mining is also fair use. This is in large part because of the same transformativeness of the purpose (under Fair Use Factor 1) and because, just like “regular” TDM that doesn’t involve AI, AI training does not reproduce or communicate the underlying copyrighted works to the public (which is essential to the determination of market supplantation for Fair Use Factor 4).
But, while AI training is no different from other TDM methodologies in terms of fair use, there is an important distinction to make between the inputs for AI training and generative AI’s outputs. The overall fair use of generative AI outputs cannot always be predicted in advance: The mechanics of generative AI models’ operations suggest that there are limited instances in which generative AI outputs could indeed be substantially similar to (and potentially infringing of) the underlying works used for training; this substantial similarity is possible typically only when a training corpus is rife with numerous copies of the same work. And a recent case filed by the New York Times addresses this potential similarity problem with generative AI outputs.
Yet, training inputs should not be conflated with outputs: The training of AI models by using copyright-protected inputs falls squarely within what courts have already determined in TDM cases to be a transformative fair use. This is especially true when that AI training is conducted for non-profit educational or research purposes, as this bolsters its status under Fair Use Factor 1, which considers both transformativeness and whether the act is undertaken for non-profit educational purposes.
Were a court to suddenly determine that training AI was not fair use, and AI training was subsequently permitted only on “safe” materials (like public domain works or works for which training permission has been granted via license), this would curtail freedom of inquiry, exacerbate bias in the nature of research questions able to be studied and the methodologies available to study them, and amplify the views of an unrepresentative set of creators given the limited types of materials available with which to conduct the studies.
The second precept we uphold is that scholars’ ability to access the underlying content to conduct fair use AI training should be preserved with no opt-outs from the perspective of copyright regulation.
The fair use provision of the Copyright Act does not afford copyright owners a right to opt out of allowing other people to use their works in any other circumstance, for good reason: If content creators were able to opt out of fair use, little content would be available freely to build upon. Uniquely allowing fair use opt-outs only in the context of AI training would be a particular threat for research and education, because fair use in these contexts is already becoming an out-of-reach luxury even for the wealthiest institutions. What do we mean?
In the U.S., the prospect of “contractual override” means that, although fair use is statutorily provided for, private parties like publishers may “contract around” fair use by requiring libraries to negotiate for otherwise lawful activities (such as conducting TDM or training AI for research). Academic libraries are forced to pay significant sums each year to try to preserve fair use rights for campus scholars through the database and electronic content license agreements that they sign. This override landscape is particularly detrimental for TDM research methodologies, because TDM research often requires use of massive datasets with works from many publishers, including copyright owners who cannot be identified or who are unwilling to grant such licenses.
So, if the Copyright Office or Congress were to enable rightsholders to opt out of having their works fairly used for training AI for scholarship, then academic institutions and scholars would face even greater hurdles in licensing content for research. Rightsholders might opt out of allowing their work to be used for AI training fair uses, and then turn around and charge AI usage fees to scholars (or libraries), essentially licensing back fair uses for research.
Fundamentally, this undermines lawmakers’ public interest goals: It creates a risk of rent-seeking or anti-competitive behavior through which a rightsholder can demand additional remuneration or withhold granting licenses for activities generally seen as being good for public knowledge or that rely on exceptions like fair use. And from a practical perspective, allowing opt-outs from fair uses would impede scholarship by or for research teams who lack grant or institutional funds to cover these additional licensing expenses; penalize research in or about underfunded disciplines or geographical regions; and result in bias as to the topics and regions that can be studied.
“Fair use” does not mean “unregulated”
Although training AI for non-profit scholarly uses is fair use from a copyright perspective, we are not suggesting AI training should be unregulated. To the contrary, we support guardrails because training AI can carry risk. For example, researchers have been able to use generative AI like ChatGPT to solicit personal information by bypassing platform safeguards.
To address issues of privacy, ethics, and the rights of publicity (which govern uses of people’s voices, images, and personas), best practices, private ordering, and other regulations should be adopted.
For instance, as to best practices, scholar Matthew Sag has suggested preliminary guidelines to avoid violations of privacy and the right of publicity. First, he recommends that AI platforms avoid training their large language models on duplicates of the same work. This would reduce the likelihood that the models could produce copyright-infringing outputs (due to memorization concerns), and it would also lessen the likelihood that content containing potentially private or sensitive information would be output as a result of having been fed into the training process multiple times. Second, Sag suggests that AI platforms engage in “reinforcement learning through human feedback” when training large language models. This practice could cut down on privacy or rights of publicity concerns by involving human feedback at the point of training, instead of leveraging filtering at the output stage.
Private ordering would rely on platforms or communities to implement appropriate policies governing privacy issues, rights of publicity, and ethical concerns. For example, the UC Berkeley Library has created policies and practices (called “Responsible Access Workflows”) to help it make decisions around whether, and how, special collection materials may be digitized and made available online. Our Responsible Access Workflows require review of collection materials across copyright, contracts, privacy, and ethics parameters. Through careful policy development, the Library applies an ethics of care approach when making collection content that raises ethical concerns available online. Even if content is not shared openly online, that doesn’t mean it is unavailable to researchers for use in person; we simply have decided not to make that content available in digital formats with lower friction for use. We aim to provide transparent information about our decision-making, and researchers must make informed decisions about how to use the collections, whether or not they are using them in service of AI.
And finally, concerning regulations, countries like those in the EU have recently introduced an AI training framework that requires, among other things, the disclosure of source content, and the rights for content creators to opt out of having their works included in training sets except when the AI training is being done for research purposes by research organizations, cultural heritage institutions, and their members or scholars. United States agencies could consider implementing similar regulations here.
But from a copyright perspective, and within non-profit academic research, fair use in AI training should be preserved without the opportunity to opt out for the reasons we discuss above. Such an approach regarding copyright would also be consistent with the distinction the EU has made for AI training in academic settings, as the EU’s Digital Single Market Directive bifurcates practices outside the context of scholarly research.
While we favor regulation that preserves fair use, it is also important to note that merely preserving fair use rights in scholarly contexts for training AI is not the end of the story in protecting scholarly inquiry. So long as the United States permits contractual override of fair uses, libraries and researchers will continue to be at the mercy of publishers aggregating and controlling what may be done with the scholarly record, even if authors dedicate their content to the public domain or apply a Creative Commons license to it. So in our view, the real work that should be done is pursuing legislative or regulatory arrangements like those of the approximately 40 other countries that have curtailed the ability of contracts to abrogate fair use and other limitations and exceptions to copyright within non-profit scholarly and educational uses. This is a challenging, but important, mission.
Licensing guidance in the meantime
While the statutory, regulatory, and private governance landscapes are being addressed, libraries and scholars need ways to preserve usage rights for content when training AI as part of their TDM research methodologies. We have developed sample license language intended to address rightsholders’ key concerns while maintaining scholars’ ability to train AI in text and data mining research. We drafted this language to be incorporated into amendments to existing licenses that fail to address TDM, or into stand-alone TDM and AI licenses; however, it is easily adaptable into agreements-in-chief (and we encourage you to do so).
We are certain our terms can continue to be improved upon over time or be tailored for specific research needs as methodologies and AI uses change. But in the meantime, we think they are an important step in the right direction.
With that in mind, it is important to understand that within contracts applying U.S. law, more specific language controls over general language in a contract. So, even if there is a clause in a license agreement that preserves fair use, if it is later followed by a TDM clause that restricts how TDM can be conducted (and whether AI can be used), then that more specific language governs TDM and AI usage under the agreement. This means that libraries and scholars must be mindful when negotiating TDM and AI clauses as they may be contracting themselves out of rights they would otherwise have had under fair use.
So, how can a library or scholar negotiate sufficient AI usage rights while acknowledging the concerns of publishers? We believe publishers have attempted to curb AI usage because they are concerned about: (1) the security of their licensed products, and the fear that researchers will leak or release content behind their paywall; and (2) AI being used to create a competing product that could substitute for the original licensed product and undermine their share of the market. While these concerns are valid, they reflect longstanding fears over users’ potential generalized misuse of licensed materials in which they do not hold copyright. But publishers are already able to, and do, impose contractual provisions disallowing the creation of derivative products and the systematic sharing of licensed content with third parties, so additionally banning the use of AI in doing so is, in our opinion, unwarranted.
We developed our sample licensing language to precisely address these concerns by specifying in the grant of license that research results may be used and shared with others in the course of a user’s academic or non-profit research “except to the extent that doing so would substantially reproduce or redistribute the original Licensed Materials, or create a product for use by third parties that would substitute for the Licensed Materials.” Our language also imposes reasonable security protections in the research and storage process to quell fears of content leakage.
Perhaps most importantly, our sample licensing language preserves the right to conduct TDM using “machine learning” and “other automated techniques” by expressly including these phrases in the definition for TDM, thereby reserving AI training rights (including as such AI training methodologies evolve), provided that no competing product or release of the underlying materials is made.
The licensing road ahead
As legislation and standards around AI continue to develop, we hope to see express contractual allowance for AI training become the norm in academic licensing. Though our licensing language will likely need to adapt to and evolve with policy changes and research or technological advancements over time, we hope the sample language can now assist other institutions in their negotiations, and help set a licensing precedent so that publishers understand the importance of allowing AI training in non-profit research contexts. While a different legislative and regulatory approach may be appropriate in the commercial context, we believe that academic research licenses should preserve the right to incorporate AI, especially without additional costs being passed to subscribing institutions or individual users, as a fundamental element of ensuring a diverse and innovative scholarly record.