Author Archives: Thomas Padilla

The Public Interest Corpus: An Update and Opportunities for Co-Development 

Posted February 24, 2025

In December 2024 we announced a new project to develop a public interest AI training corpus focused on books. Over the last few months we’ve been actively engaging a diverse set of stakeholders in the development of The Public Interest Corpus.

The Public Interest Corpus is focused on developing large-scale, high-quality AI training data from the world’s memory organizations that serve the public interest. In the aggregate, memory organizations like libraries and archives are in a prime position to address this need given a multi-century focus on developing high-quality, locally and globally comprehensive collections of books, newspapers, scholarly journals, photographs, manuscript materials, and more. We seek to prioritize uses of The Public Interest Corpus that promote learning, access to knowledge, and broad benefits to the public. 

Project Team and Advisory Board

The project team consists of Dave Hansen, Executive Director of Authors Alliance, and Dan Cohen, Vice Provost for Information Collaboration, Dean of the Library, and Professor of History at Northeastern University. In January, I joined the team as the Public Interest AI Strategist. In this capacity I will draw on extensive experience building community around responsible AI and the responsible computational use of memory organization collections as data. Giulia Taurino recently joined the team as Project Coordinator. Giulia holds a doctoral degree in Media Studies and Visual Arts from the University of Bologna and the University of Montreal and is currently a member of the NULab for Digital Humanities and Computational Social Science and of the AI & Arts interest group at The Alan Turing Institute.

The project team is guided by a strong advisory board composed of senior leaders and experts who think deeply about how authors, libraries, and AI can better serve the public interest. 

  • David Bamman, Associate Professor, UC Berkeley School of Information
  • Sandra Aya Enimil, Director of Scholarly Communications and Collection Strategy, Yale University Library
  • Mike Furlough, Executive Director, HathiTrust
  • David Smith, Associate Professor, Khoury College of Computer Sciences, Northeastern University
  • Claire Stewart, Dean of Libraries and University Librarian, University of Illinois, Urbana-Champaign 
  • Mehtab Khan, Assistant Professor of Law, Cleveland State University College of Law
  • Rachael Samberg, Director, Scholarly Communications and Information Policy, UC Berkeley Library
  • Robin Sloan, New York Times best-selling science fiction author
  • Günter Waibel, Associate Vice Provost & Executive Director, California Digital Library
  • Martha Whitehead, Vice President for the Harvard Library and University Librarian, Harvard University
  • John Wilkin, CEO, LYRASIS
  • Suzanne Wones, University Librarian, UC Berkeley Library
  • Ted Underwood, Professor of Information Science and English, University of Illinois at Urbana-Champaign

How You Can Get Involved

Over the next year, the project team will engage a diverse set of stakeholders in a co-development process that directly informs the priorities, strategies, and partnerships of The Public Interest Corpus. To kick things off, we are holding a working event at Northeastern University Library in Boston, Massachusetts on March 3, where a group of senior library administrators, publishers, disciplinary researchers, authors, and technical experts will workshop core legal, technical, business model, and governance challenges.

Moving forward, we intend to hold additional focused in-person and virtual working events with a broad range of communities. We strongly believe that engaging with diverse stakeholders in a co-development process will be key to the success of this effort. If you are interested in participating in a future event or hosting a Public Interest Corpus event, or if you have other ideas for how we might collaborate, please let us know via the following form.

We look forward to advancing a public interest solution with you all.