OpenAI content boss: ‘Incumbent’ on us to help small publishers, not just the giants.
This story was first published by Press Gazette, written by Charlotte Tobitt. It is republished here with permission. (WAN-IFRA newsletter)
Tom Rubin explains the thinking behind partnership deals for large and small publishers at the WAN-IFRA World News Media Congress in Copenhagen.
It is “incumbent” on OpenAI to make sure smaller publishers get the same potential benefits from platforms like ChatGPT as large dominant names, its chief of intellectual property and content has said.
Tom Rubin also clarified a common misconception about the partnerships being signed between OpenAI and news publishers: they are “largely not” about training but are instead focused on the display of news content and use of the tools and tech.
Rubin was speaking at the WAN-IFRA World News Media Congress in Copenhagen on the same day OpenAI announced “strategic content and product partnerships” with both The Atlantic and Vox Media, following in the footsteps of other large publishers including News Corp, Dotdash Meredith, the Financial Times and Axel Springer.
OpenAI also revealed a partnership with WAN-IFRA to train teams from 128 newsrooms in Europe, Asia Pacific, Latin America and South Asia in using AI tools.
Rubin, formerly chief intellectual property strategy counsel at Microsoft, said it was “very important” that resources “don’t just go to large companies but that small, independent publications have the ability to learn and leverage the technology”.
He said that, despite OpenAI’s multiple partnerships so far, “one of the things that we were very focused on is ensuring that the opportunity exists more broadly”.
He also cited an agreement signed last year under which OpenAI gave $5m to the American Journalism Project to help local newsrooms deploy AI.
One of the “key outcomes” of these types of projects, he said, is to help newsrooms become “centres of excellence that can educate others, train others, and provide scalable” training.
He added that ChatGPT provides an “important opportunity for publishers because they’ll be able to target content more specifically to readers who are interested in it, therefore can take advantage of the fact that we have over 100 million users and create a relationship between those users and even a small publication’s content.
“So it’s incumbent upon us that way to make sure that the visibility that’s given isn’t just to large companies and articles, but to smaller publications and indeed, when a user has a very particular interest, or even a niche interest, that’s a particular opportunity for a small publisher using our technology to associate that content with the user and put it before them.”
The focus on reach echoes what Financial Times chief executive John Ridding told the conference on Tuesday. Of the FT’s deal with OpenAI, Ridding said: “The payment matters for principle and revenue of course, but also important is the opportunity to extend our reach and understand how users will interact with AI.”
OpenAI publisher partnerships ‘a priority focus’
Asked why there had been a flurry of partnership deals announced recently, Rubin said: “We identified the opportunity for the news industry from the beginning. The work that we started doing in the news sector was very shortly after the launch of ChatGPT.
“We identified it as a particular opportunity because we have the tools to really facilitate positive outcomes in the industry and we wanted to ensure that there was an understanding and adoption of the technology by those who wanted to experiment. We got a lot of interest from a lot of publishers and so that’s why it continues to be a priority focus.”
Despite some misconceptions, Rubin made clear that the news partnerships that have been announced “largely” do not include terms around permission for training on the publishers’ content.
“I just want to clarify that the news partnerships that you read about are largely not about that. They are about display, they are about use of our tools.”
He noted that a major part of many of the deals – the display of the publishers’ content in ChatGPT and OpenAI’s other products – has not yet gone live.
“The goals of the partnerships are to help and guide news organizations into the use of a technology that will be beneficial to them,” he said. “They’re very excited about the deployment of the technology. We know more about technology than they do so we very much engage with them on educating about use cases for the technology.
“And the other significant component of the deals is that – you haven’t yet seen it in our products – but the organizations that we have partnerships with, the deals are substantially about displaying their content within the context of our products.”
OpenAI training ‘not carried over’ once bots are blocked
Rubin also confirmed that LLMs are “trained on a periodic basis”, usually annually or less regularly. He said this means that if publishers have opted out of having their websites crawled, which they were first able to do last summer, their content will not appear in the next version of ChatGPT or any other OpenAI product. The company revealed this week it has begun training its “next frontier model”.
Rubin said: “It’s important to realise that when models are trained, they’re trained from scratch, so it means that if a copyright owner has excluded their content from training, that will not appear in the large language model that’s being produced.
“So it’s not as if content is carried over. The training of the model starts from scratch and so the request to not be included and to be excluded from training data is effective, and it ensures that the content will not be used.”
Rubin acknowledged that one of the major concerns for publishers is disintermediation from their users and said both they and tech platforms need to be careful about “exacerbating what has demonstrably been an issue for the last several years.
“So I think that’s certainly one concern and one lesson is ensuring that the opportunity that new technology creates is an opportunity that flows to the providers of the information, and we’re very focused on that.”
He added that one of the major problems from disintermediation is the “siphoning of advertising revenue” from publishers to platforms (in the UK last year the ad market grew overall but publishers saw none of it).
“OpenAI is a company that does not have an advertising model, it doesn’t have any intention of having an advertising model, and we think that should align the interests of publishing houses and technology platforms better.”