AI Licensing: An Interview with Ben Denne of Cambridge University Press

Posted March 17, 2025

We’ve heard from lots of authors with questions about AI licensing of their works by their publishers. Cambridge University Press is one that has been in the news because it has undertaken a project to ask authors to opt into a contract addendum that would allow CUP to license AI rights for their books, giving authors a royalty on AI licensing net revenue. Cambridge has shared an FAQ with authors already, along with a further explanation of its approach last September and a report in January highlighting that it had contacted some 17,000 authors, the majority of whom have opted in. 

Below is an interview with Ben Denne, Director of Publishing, Academic Books, at Cambridge University Press, answering some questions about the program. 

Dave: Thank you, Ben, for talking with me. To start off, could you say what your role is at Cambridge University Press?

Ben: I’m the Director of Publishing for the Academic Books part of the Academic Division of Cambridge. In short, I’m the director overseeing the whole of the books program, the Academic Books program for Cambridge, except for the Bibles. That’s a specialist unit that runs separately that I don’t have anything to do with, but that means our textbooks, our research and reference books, and then we have a kind of small program of more traditional academic titles that sell a bit more to a bit of a wider audience.

Dave:  Thanks. My interest in talking with you is about generative AI licensing. And we’ve had quite a few authors actually forward us some emails that they’ve gotten from Cambridge presenting an AI license addendum to sign that goes with their contract and also an FAQ. I’d like to ask just a few questions about how that’s going and how that works.

What are Cambridge University Press’ goals with AI licensing?

Ben: That’s a really good question. Broadly speaking, this started to come our way a couple of years ago, at the same time as this subject became really noisy. We were looking at it and thinking, what’s the best way through this? How do we appropriately engage in this conversation? And I think it came back to us thinking about encouraging responsible use and thinking about our role as an academic publisher.

And I think our role as an academic publisher is to push the academic debate forward, which means that we want our authors’ books to get read. We want them to get used. We want them to get cited. I think that’s really the kind of spirit we came into this conversation with is thinking, these developments are happening, right, that they’re happening anyway and the best thing we can do as a publisher is try and engage with this debate and push it in a direction that we think really helps to underline those principles of how good research is done.

Dave:  One of the things that I’ve seen with CUP’s rollout with this asking authors is, first of all, that you are asking authors.  Could you talk me through that decision? We’ve seen some other publishers in the news just announce that they have licensing deals with technology companies, and there was no outreach to authors as far as we can tell from those publishers. So could you talk through that thought process of this outreach?

Ben: Sure, so for us, when we first looked at this, we have a contract that authors sign, which is, probably in many ways, very similar to contracts that they signed with other publishers, and it includes all sorts of clauses about use and wide ranging licensing rights.  And one of the things it covers is derivative uses for content and the right to make derivatives. Our sense with that is when we looked at this in the context of these AI conversations and licensing, from a legal perspective, we looked at that and thought, well actually that derivative use clause technically does cover us for this kind of work. And I’m sure that’s the conclusion that some other people have reached too.

But we also thought, it just feels a bit like nobody knew that this kind of technology was emerging when they signed those contracts. And so from our perspective, we thought there’s a lot of noise about this subject in the whole ecosystem right now, you know, you can’t read the news without reading about AI, and people are nervous about it, understandably, and all of those kinds of things. So we felt that we should treat this as additional consent and approach it in that spirit. And that really underpins the decision to go out with the addendum for existing contracts.

I don’t want to jump onto any of your other questions, but that kind of principle, that we were going to ask for opt-ins, was important. Authors have to actively opt into this. We’re not saying to them, “If we don’t hear from you, we’ll assume you’ve opted in.” They have to actually come back to us and say that they’re happy for that use to happen.

Dave:  I think one of the things a lot of people don’t think about is how complicated rights clearance is, especially at scale, across a title list that is the size that you have. So this seems to me like a pretty big investment in just doing this process. Could you say how many of these  you have sent out? I gather that you’re doing this in batches, but do you have a sense of the scale of how many author addendum requests you anticipate making over the course of however long this process lasts?

Ben: It’s a really good question and it’s a moving target. At this stage, we have sent out multiple thousands. But I think we have about 45,000 books available in print and digitally at the moment. And we’re working our way through that list systematically. So we’re in the thousands, and you’re right: it is a pretty big undertaking. You know, it’s quite a logistical challenge to do. We had to set up a whole kind of new workflow for doing this. We have a team that is working on the addenda and addressing the questions that authors have and all of those kinds of things.

Dave: This is maybe getting in the weeds, but it seems to me like there’s a pretty big difference between figuring this out for a sole-authored, single-part monograph, for instance, which is mostly what I’ve seen come through, and edited volumes. Have you tried to figure out those more complex books with multiple authors, multiple works within them?

Ben: Yeah, so the way it’s working for us is where we have several contracted authors for a book, we’re contacting them all and all of those authors have to opt in in order for us to agree that we have the licensing rights.

For edited volumes with multiple contributors, we’re not contacting the individual contributors for opt-in and there are a couple of different reasons for that. Typically, they don’t get paid royalties and also it would just be impossible for us to do. I mean, that’s logistically, you know, that’s a huge ask. So what we are doing is we are still contacting the editors for those volumes and the editors will opt in or not. So if the editor opts in, our understanding is that they’re opting in on behalf of the contributors as well.

But for multi-authored works, we get in touch with all of them. And in fact, we have quite a sizable number of books which are stuck because we’ve had some authors opt-in and some authors not opt-in.

Dave: This is a pretty fast-moving technology and I think a lot of authors are feeling just uncertain right now. And so I wonder about the opt-in window, if an author declines to opt in right now, is that it? Is there an opportunity to come back later after the dust settles and say, oh, no, actually, you know, I’d be happy to have my work used in this way? 

Ben: Yeah, definitely. We’re in the process of putting something in place so that if authors don’t opt in now they are able to come back and opt in later. And by the way, if they don’t opt in, that’s fine for all the reasons that you just said; some people are queasy about this and that’s okay. We’re not putting a hard sell on it.

My sense with this is that for some of the people that we’re speaking to who haven’t opted in, it is because they haven’t yet really seen what the kind of use cases are for this kind of technology. Perhaps as those become more public, people will want to come back and opt in. 

I think some of the things that are out there are going to be quite powerful discovery tools in the future. So we want to make sure the authors do have the opportunity to opt in later if they want to, although we can’t, of course, be sure that if people opt in later the same opportunities will necessarily be available then, since this is quite a fast moving area.

Dave:  For your contracts moving forward for front list books, is a clause like this now a default in those agreements or will authors of new books have the option to opt-in or opt-out for AI licensing?

Ben: Good question. Currently, we have put a clause into our contracts to add AI licensing. But, where authors are asking us to remove that clause, we’re taking it out.

And again, coming back to your point before, those authors could opt in later. But for the contracts as they go out, we have it in as a clause now.

Dave: Okay. So let’s shift: if you’re gathering all of these rights from authors, presumably at some point you would actually engage in the licensing with technology companies or others. Could you say a little bit about that? Do you have any deals in place with tech companies already? Or, the other thing that I’ve seen is, some publishers have been in the position of not doing those deals directly, but having sort of sub-licensing deals with others. I understand ProQuest Clarivate is doing this, and I think Wiley is as well. Do you have any of those deals in place now?

Ben:  We’re still having those conversations at the moment. And we are talking to a range of different people who are looking at this kind of content. 

Dave:  Okay, that’s really helpful to know.

At the beginning, you talked a little bit about Cambridge University Press’s motivations with engaging in this space and doing licensing. Could you talk a little bit about important factors for what might show up in one of those kinds of deals with tech companies?  For instance, one of the things that I think aligns with the sort of values that you outlined at the beginning and that authors care a lot about is credit, right? We know that, especially for academic authors, credit is incredibly valuable and important. And so I wonder if you’ve thought about how ensuring author credit might factor into any sort of downstream deal that CUP might engage in?

Ben: Absolutely. So we’re having exactly those conversations at the moment with anybody that we’re talking to. And we’ve been very clear with our authors when they’ve asked questions about this, and you may have seen this alluded to in some of the information that you’ve had forwarded to you from authors, that those principles of attribution are 100% what we’re focused on.  Really, they’re kind of a red line for us. 

One of the things we’ve been discussing in lots of conversations with people around this technology is the question of at what level content needs to be attributed. Our sense with this is that any kind of meaningful extract from somebody else’s work needs to be cited.

I’m kind of repeating myself, but that’s how research works. People build on other people’s work, and so in a scenario where content is being ‘discovered’, if we can’t identify and cite that content, it can’t be accurately attributed. So that’s a red line for us.

Dave: Right. I think figuring out that attribution, like at what level does that attribution need to kick in, is a really tricky thing. It seems to me that if you’ve got a foundation model that is pulling in some texts and then someone’s using, say, ChatGPT to write emails and somewhere in the model it gleans some structural components from sources like academic books, I don’t think that’s the thing most authors care about – being cited for the fact that you helped train this model to understand how to format citations or do other things like that. It’s the intellectual content that matters and that’s the really tricky piece of it.

Ben: Absolutely and I don’t have an easy answer for you there. So we’re having those conversations at the moment, but our sense is that any sort of direct quote, anything that could be, you know, anything that you would consider to be plagiarism or worthy of credit in a non-AI world should be attributed.

Dave:  I realize this question is asking a hypothetical because you don’t have any of these agreements in place yet, but it seems to me there’s a pretty big difference between use of Cambridge books for model training and uses such as for Retrieval Augmented Generation (RAG).

Have you thought about those distinctions in terms of how that might affect differences in Cambridge’s willingness to set a price on those things?  I assume retrieval-augmented generation (RAG) would come with a higher licensing price than others. But could you talk me through that thought process?

Ben: So it’s kind of interesting because I think there’s a little bit of a gray area, because I think a lot of the RAG tools are combined with some aspect of LLM. So they might be looking to summarize some research or write a brief about X, Y, and Z.

I think it is quite interesting at the moment that most of the questions we get from people who are worried about this are really anxious about LLMs, but I feel like the really exciting place for academia and research is around that kind of retrieval augmented generation because that’s what’s going to help with discoverability for authors.  It is difficult to talk about at the moment because we don’t have any public deals that I can point to. But I’d say a lot of the conversations that we’re having are somewhere between those two things, you know, so it’s a combination of an  LLM that’s generating text and a citation engine or discovery engine sitting over content.

Dave: Leaving aside the legal situation for a moment, one of the things that I hear from authors pretty consistently is the sentiment that with these big technology companies coming in, they feel that these companies are sort of profiting off of content that they are exploiting. And so they ought to return something to the system and to authors.

But there’s a really different sentiment about what happens when you have, say, academic researchers using content for AI or text data mining purposes to make new discoveries or learn new things both about the texts and about the world around them. We work a lot with text data mining researchers who are interested in large aggregations of content, not so they can build the next OpenAI, but so they can understand how language has changed over time, or how culture has changed over time.

I wonder from CUP’s perspective, how do those two different kinds of use cases factor into your thinking about downstream licensing deals for AI/ text data mining?

Ben: Yeah, I think for us that the primary thing we’re really trying to lean on, because of course the whole thing is not quite that clear cut, because a lot of the time it’s the big tech companies that are facilitating a lot of that discovery or that a lot of the kind of discovery traffic goes through them. So I think from our perspective, I’m going to say we’re not ruling out working with anyone. We would put anybody– any partner that we had– through the same diligence process that we would have with onboarding anybody else, but we wouldn’t rule out those conversations with anybody. I think for us, the most important thing is coming back to, and I’m going to sound like a stuck record here, but those principles of attribution. And we have had conversations, some preliminary conversations with people who’ve said, “Well, we don’t think it would be possible to do what you’re asking,” and at that point, we’re saying, “well, okay, then you know that’s the red line for us.”

I think there’s quite a bit of cloudy territory between those two things. And I think for us, the most important thing is to make sure that authors are being credited where their work’s being used.

Dave: All right, I have a hypothetical that I wanted to give you. So we see that it’s a 20% royalty calculated on net revenue. Let’s say you received $5 million from an AI licensing deal. Can you walk me through how that might work out for the author? How do you calculate net revenue on that? And then, for the individual author sitting there who sees that CUP has signed a big deal, what can they expect?

Ben: That’s a tricky one because it would depend a little bit on the terms of the deal as well. But broadly speaking, the principle is, if that’s the net revenue that we receive (in your situation, you had five million in there), the full licensing payment is divided out across the list of titles. Authors then earn the royalty for that sale or license type per title, as they do now with all other forms of licensing.

But, then, where a licensee can provide accurate title-level usage within their royalty statements, this would instead be used. So in an LLM situation that you were just talking about, that would be divided among those books.  With the retrieval augmented generation tool, I think that would work much more around the basis of usage. So, depending on what searches within that tool were bringing back particular content, then we would be attributing revenue that way.
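To make the two allocation approaches Ben describes a little more concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not CUP's actual calculation: the interview says only that a payment is "divided out across the list of titles" or, where title-level usage is reported, attributed by usage. The equal per-title split, the proportional usage split, the figure of 10,000 titles, and all function names below are hypothetical. How a royalty is then shared among co-authors of one title is not covered.

```python
from fractions import Fraction

# 20% royalty on net revenue, the rate mentioned in the interview.
ROYALTY_RATE = Fraction(20, 100)


def even_split(net_revenue, titles):
    """Assumed method: divide net revenue equally across licensed titles."""
    if not titles:
        raise ValueError("at least one title is required")
    share = Fraction(net_revenue) / len(titles)
    return {title: share for title in titles}


def usage_split(net_revenue, usage_counts):
    """Assumed method: divide net revenue in proportion to reported usage."""
    total_usage = sum(usage_counts.values())
    if total_usage <= 0:
        raise ValueError("total usage must be positive")
    return {
        title: Fraction(net_revenue) * count / total_usage
        for title, count in usage_counts.items()
    }


def author_royalties(revenue_by_title, rate=ROYALTY_RATE):
    """Apply the royalty rate to each title's share of net revenue."""
    return {title: revenue * rate for title, revenue in revenue_by_title.items()}


if __name__ == "__main__":
    # Hypothetical: $5 million net revenue spread evenly over 10,000 titles.
    titles = [f"title-{i}" for i in range(10_000)]
    per_title = even_split(5_000_000, titles)
    royalties = author_royalties(per_title)
    print(float(per_title["title-0"]), float(royalties["title-0"]))

    # Hypothetical: $1,000 attributed by usage counts reported by a licensee.
    by_usage = usage_split(1_000, {"Book A": 3, "Book B": 1})
    print({t: float(v) for t, v in author_royalties(by_usage).items()})
```

Under these assumptions the per-title royalty on an evenly split deal depends heavily on how many titles are in the licensed list, whereas a usage-based split concentrates revenue on the books that a retrieval tool actually surfaces.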

Dave: Okay, that makes a lot of sense. I think this was in the FAQ: one of your use cases is in an authoritative database that’s used on a perpetual basis. But there was somewhere that talked about the removal of content once a licensing term has ended. I wonder if you’ve developed thinking internally about what a standard term would be, how long these things might last? 

Ben: Yeah, I mean, it’s hard, isn’t it? Because where you’re licensing content to train an LLM, it would be sort of insincere to dress that up. Generally most agreements would be governed by a 2-5 year training term, and at the end of that term the training data set would be destroyed; however, they would retain the output from the specific models that were developed during the training term. If they wanted to create new models they would need to renew the license or extend the term.

For some of the other uses, that’s all being discussed at the moment. I think there is still work on this, but there would be standard partnership-length terms. What I would say is that from our perspective, we think it’s quite likely that in the next few years the focus will move away from training large language models and into that area of discovery, and that these are going to become quite important revenue streams for academic publishers.

Dave: Thanks, very helpful. As you work on these deals, what level of transparency do you plan on offering authors or the general public about what these licenses might look like? At least with other publishers, it’s been quite mysterious – I think with one, we learned about an AI licensing deal in a quarterly earnings report, for instance. I think authors do really care about what the details of these deals look like. 

Ben:  It’s tricky, isn’t it? It’s hard for me to talk about a deal that hasn’t been done already, and of course, these deals can be subject to the same commercial confidentiality requirements as any other partnership. But I think it’s fair to say that Cambridge University Press would endeavor to be pretty transparent about what we’re doing generally and most importantly, be transparent about why we’re doing it. So I don’t think we’d be concealing that information from anybody. And coming back to my point before, we’ve been quite clear that we only want to enter into these kinds of conversations with people that we think are using content responsibly, and we’d always aim to be open.

Dave: A few final questions. First, CUP has published a number of open-access books. For example, I believe CUP was part of the TOME initiative.  Do you feel like this kind of addendum is necessary for those open-access books, given that they already have some sort of open license attached to them? Or do you think that this is a necessary addition to those OA licenses? 

Ben: That’s a really good question, and it’s something that we’re grappling with at the moment. Without getting into the kind of weeds around open access, some of it depends on the license. Historically for books, our default open access license was a Creative Commons CC BY NC license, which prohibits commercial reuse. I think at the moment, we’re looking at that (and I think a lot of publishers would say the same thing) and working through how that fits with AI licensing with commercial AI companies. The short answer to your question is if you have a CC BY license, then people do have a broad license to reuse that content. So at the moment, we’re not actively going after those authors for opt-ins, nor are we including those books in the licensing deals that we’re doing, but that’s also a relatively small number of books.

I can say, we are now looking at using more CC-BY-NC-ND as the default, which restricts the creation of derivative works. You’ve touched on a conversation that is evolving, but we would be treating AI usage as requiring a derivative license and therefore not covered under a CC-BY-NC-ND license.

Dave:  Thanks, that’s very helpful and I think that’s something a lot of authors are trying to figure out: how does AI downstream use factor into Creative Commons licensed works? And of course, the underlying legal situation matters. I didn’t ask, but I assume that the rights that you’re asking for in this addendum are worldwide, since that affects for example whether usage might be permitted under national law. 

Ben: Yes, the rights are worldwide. 

And thinking again about that, I mean, it’s interesting, isn’t it? Because even under the CC-BY license, it doubles down on that principle of attribution as well. That’s the nature of the license so some uses even then may not be covered by that license.

Dave:  Right. That attribution piece under the CC-BY license will be an important one [note: this issue is being litigated, most prominently in the Doe v. Github suit]. And then, there’s also the underlying question of what the law allows independently even if there is no license–open license or otherwise. I know right now there’s a consultation that just closed in the UK about what the law should be, and in the US, we’re fighting these things out in the courts. I think there are 39 lawsuits right now pending about various aspects of this, and a key question in most of them is just how far fair use goes. And of course, you know, if fair use applies then you don’t have to worry too much about what the license says, whether it’s CC BY or CC BY NC ND or anything else.  This is like reading tea leaves but I think the prevailing case law indicates that model training and coming up with the weights has a pretty strong fair use case, but for the output side, that’s where I think it starts to stumble a little bit when you’ve got systems that are producing outputs that are substantially similar to the inputs. So I wouldn’t be surprised if in some of these suits, we get a ruling in favor of fair use and then in some of them we get a different outcome. And then, the landscape is just sort of messy.

And I suppose in the UK, I imagine y’all are watching what that legal landscape looks like around the world as it’s changing.

Ben:  Yeah, absolutely. 

Dave: One final question: we’ve talked a lot about licensing books for AI, but CUP has a substantial journal portfolio as well. Can you say anything about CUP’s approach to use of journal content either as AI training data or for other AI uses? 

Ben: We’ve been more focussed on books, as this is where most of the demand has been to date, but we have seen a developing interest in journal content. We are, therefore, currently exploring this form of licensing in a consultative way with our journal partners. 

Dave:  Well, thank you for talking. And this was really, really helpful. And I think that this will be useful for authors who are trying to understand just more about what’s going on.

Ben:  It’s been a pleasure.

